Advanced
Artificial Intelligence & Machine Learning
Q85 / 100
What is mechanistic interpretability in LLMs?
The correct answer is B) A research program reverse-engineering the specific circuits and algorithms implemented by transformer weights to understand exactly how models compute their outputs
Explanation
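Mechanistic interpretability, pursued by groups such as Anthropic and EleutherAI, identifies circuits inside trained transformers: induction heads, copy suppression heads, and the indirect object identification circuit are well-known examples. An induction head, for instance, finds an earlier occurrence of the current token and copies the token that followed it, so a sequence [A][B] ... [A] is completed with [B]. The aim is to understand a transformer's computations fully, much as one would reverse-engineer a piece of software.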
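Of the circuits named above, the induction head is the easiest to illustrate. The sketch below is only a toy, pure-Python rendering of the rule such a head is usually described as implementing (match an earlier copy of the current token, then copy its successor). The function name `induction_prediction` is hypothetical, and real heads operate on attention weights and residual-stream vectors rather than on literal strings.

```python
from typing import Optional, Sequence


def induction_prediction(tokens: Sequence[str]) -> Optional[str]:
    """Toy version of the induction rule: [A][B] ... [A] -> [B].

    Hypothetical helper for illustration only; it works on token strings,
    not on a model's activations.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest, looking for a
    # previous occurrence of the current token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy the token that followed the earlier occurrence.
            return tokens[i + 1]
    # No earlier occurrence: the rule makes no prediction.
    return None
```

For example, `induction_prediction(["A", "B", "C", "A"])` returns `"B"`, while a sequence with no repeated token returns `None`. Mechanistic interpretability work asks whether, and through which attention heads, a trained model actually implements a rule of this kind.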