Advanced Artificial Intelligence & Machine Learning
Q85 / 100

What is mechanistic interpretability in LLMs?

Correct Answer

B) A research program reverse-engineering the specific circuits and algorithms implemented by transformer weights to understand exactly how models compute their outputs

Explanation

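Mechanistic interpretability, pursued by groups such as Anthropic and EleutherAI, identifies concrete circuits inside trained transformers. Examples include induction heads, which continue a repeated pattern (after [A][B] ... [A], predict [B]); copy suppression heads, which lower the logit of a token that appears earlier in the context when earlier components are predicting it; and the indirect object identification (IOI) circuit, which completes "When Mary and John went to the store, John gave a drink to" with "Mary". The goal is to understand transformer computations fully, much as one would reverse-engineer a compiled program.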
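As a rough illustration, the behaviour attributed to induction heads can be written out as an explicit rule. The sketch below is plain Python, not a transformer, and the function name `induction_predict` is hypothetical. Real induction heads are usually described as implementing a soft, attention-based version of this lookup (a previous-token head composing with the induction head) rather than an exact search.

```python
from typing import Hashable, Optional, Sequence


def induction_predict(tokens: Sequence[Hashable]) -> Optional[Hashable]:
    """Hypothetical helper: predict the next token by the rule [A][B] ... [A] -> [B].

    Looks back for the most recent earlier occurrence of the final token
    (prefix matching) and returns the token that followed it (copying).
    Returns None when the final token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest, skipping the final one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # i <= len(tokens) - 2, so tokens[i + 1] always exists.
            return tokens[i + 1]
    return None


# Usage: "the" was previously followed by "cat", so the rule predicts "cat".
print(induction_predict(["the", "cat", "sat", "on", "the"]))
```

Writing the rule down is the easy part. The interpretability work lies in showing that, and where, a trained model's weights implement something like it.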
