Advanced Artificial Intelligence & Machine Learning
Q95 / 100

What is speculative decoding in LLMs?

The correct answer is B) An inference optimization where a small draft model generates tokens quickly, which the large target model verifies in parallel, achieving speedup without quality loss

Explanation

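Speculative decoding (Leviathan et al., 2022) pairs a small draft model with the large target model. The draft model proposes k tokens autoregressively, and the target model scores all k positions in a single parallel forward pass. Each draft token x is accepted with probability min(1, p(x)/q(x)), where p and q are the target and draft probabilities for that token. At the first rejection, a replacement token is sampled from the normalized residual max(0, p - q) and the remaining draft tokens are discarded; if all k tokens are accepted, one additional token is sampled from the target distribution. The output distribution is therefore identical to sampling from the target model alone, which is why there is no quality loss. The paper reports speedups of roughly 2-3x; the actual gain depends on how often the draft tokens are accepted.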
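
The verification step can be illustrated with a short Python sketch. It is a toy version of the accept/reject rule only, assuming the per-position distributions are already available as plain lists of probabilities; no real model is called, and the names verify_draft, sample, q_draft, and p_target are hypothetical, chosen for this example.

```python
import random


def sample(weights, rng):
    """Draw an index from a discrete distribution (weights need not sum to 1)."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def verify_draft(draft_tokens, q_draft, p_target, rng):
    """Accept or reject k draft tokens against the target distributions.

    draft_tokens: k token ids proposed by the draft model
    q_draft:      k draft distributions; q_draft[i] produced draft_tokens[i]
    p_target:     k + 1 target distributions, as if from one parallel pass
    Returns the accepted prefix plus one token chosen by the target.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_target[i], q_draft[i]
        # Accept with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q) and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng))
        return out
    # All k tokens accepted: draw one extra token from the target.
    out.append(sample(p_target[len(draft_tokens)], rng))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]                    # toy draft
    p = [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]   # toy target
    print(verify_draft([0, 1], q, p, rng))
```

Each call yields between 1 and k + 1 tokens, so the target model always makes progress even when the first draft token is rejected. When the draft and target distributions agree exactly, every draft token is accepted.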
