Advanced
Artificial Intelligence & Machine Learning
Q95 / 100
What is speculative decoding in LLMs?
The correct answer is B) An inference optimization where a small draft model generates tokens quickly, which the large target model verifies in parallel, achieving speedup without quality loss
Explanation
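Speculative decoding (Leviathan et al., 2022): a small draft model generates k candidate tokens autoregressively, and the large target model then scores all k positions in a single parallel forward pass. Each draft token x is accepted with probability min(1, p(x)/q(x)), where p is the target model's distribution and q is the draft model's. At the first rejection, the remaining draft tokens are discarded and a replacement token is sampled from the normalized residual max(0, p - q). If all k tokens are accepted, the target model contributes one extra token from the same pass. Each step therefore yields between 1 and k+1 tokens, and the output follows exactly the target model's distribution, which is why there is no quality loss. Reported speedups are roughly 2-3x on typical text.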
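The draft-then-verify loop can be illustrated with a minimal sketch over a toy vocabulary. It assumes the standard acceptance rule (accept a draft token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual max(0, p - q)). The names `speculative_step`, `target_probs`, and `draft_probs` are hypothetical stand-ins for real models, not part of any library, and the target model's single parallel pass is simulated here with a plain loop.

```python
import random

def speculative_step(target_probs, draft_probs, prefix, k, rng):
    """One speculative decoding step; returns between 1 and k+1 new tokens.

    target_probs / draft_probs are hypothetical callables that map a token
    prefix (list of ints) to a list of next-token probabilities.
    """
    # 1. The draft model proposes k tokens autoregressively.
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = rng.choices(range(len(q)), weights=q)[0]
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. The target model scores all k+1 positions. A real system does this
    #    in one batched forward pass; this sketch simply loops.
    p_dists = [target_probs(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Verify draft tokens left to right.
    accepted = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            continue
        # Rejected: resample from the residual and drop the remaining drafts.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        accepted.append(rng.choices(range(len(residual)), weights=residual)[0])
        return accepted

    # All k drafts accepted: take one extra token from the target model.
    bonus = p_dists[k]
    accepted.append(rng.choices(range(len(bonus)), weights=bonus)[0])
    return accepted

# Toy usage: context-independent distributions over a 3-token vocabulary.
rng = random.Random(0)
new_tokens = speculative_step(
    lambda ctx: [0.6, 0.3, 0.1],   # assumed target distribution p
    lambda ctx: [0.2, 0.5, 0.3],   # assumed draft distribution q
    prefix=[], k=4, rng=rng,
)
print(new_tokens)
```

The closer the draft distribution q is to the target distribution p, the more draft tokens survive verification per step, which is where the speedup comes from.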