Advanced
Artificial Intelligence & Machine Learning
Q90 / 100
What is activation engineering/steering in LLMs?
Correct Answer
B) Directly modifying intermediate activations during inference to steer model behavior (adding a direction to the residual stream to induce concepts like "banana" or "French")
Explanation
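Activation steering (also called activation engineering; see Representation Engineering, Zou et al.) works in two steps: identify a direction in activation space that corresponds to a concept, then add a scaled copy of that direction to the residual stream during the forward pass. Only the activations change, so the model's behavior shifts without fine-tuning or any weight updates. The technique is closely related to mechanistic interpretability, which studies how such concepts are represented inside the network.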
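A minimal pure-Python sketch of the two steps, under loud assumptions: `ToyResidualModel` is a hypothetical two-layer toy residual network rather than a real LLM or library API, the "prompts" are hand-picked 2-D vectors, and the direction is taken as the difference of mean activations between concept and neutral inputs. That difference-of-means recipe is one common choice, not necessarily the exact method of Zou et al.

```python
from typing import Callable, List, Optional

Vector = List[float]

def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def scale(v: Vector, s: float) -> Vector:
    return [s * x for x in v]

def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

class ToyResidualModel:
    """Hypothetical stand-in for a transformer: each layer writes an
    update into the residual stream (x <- x + f_l(x))."""

    def __init__(self, layers: List[Callable[[Vector], Vector]]):
        self.layers = layers

    def forward(self, x: Vector, steer_layer: Optional[int] = None,
                steer_vec: Optional[Vector] = None) -> List[Vector]:
        """Return the residual stream after every layer. If steer_layer is
        set, steer_vec is added to the stream right after that layer;
        the layer functions (the "weights") are never modified."""
        states = []
        for i, f in enumerate(self.layers):
            x = add(x, f(x))
            if steer_layer == i and steer_vec is not None:
                x = add(x, steer_vec)
            states.append(x)
        return states

def steering_vector(model: ToyResidualModel, concept_inputs: List[Vector],
                    neutral_inputs: List[Vector], layer: int) -> Vector:
    """Assumed recipe: mean activation on concept inputs minus mean
    activation on neutral inputs, read at the chosen layer."""
    concept = mean([model.forward(x)[layer] for x in concept_inputs])
    neutral = mean([model.forward(x)[layer] for x in neutral_inputs])
    return add(concept, scale(neutral, -1.0))

model = ToyResidualModel([
    lambda x: [0.5 * v for v in x],  # layer 0: rescales the stream
    lambda x: [x[1], 0.0],           # layer 1: copies coordinate 1 into coordinate 0
])
concept_inputs = [[1.0, 2.0], [1.0, 4.0]]  # toy stand-ins for "French" prompts
neutral_inputs = [[1.0, 0.0], [1.0, 0.0]]  # toy stand-ins for neutral prompts

direction = steering_vector(model, concept_inputs, neutral_inputs, layer=0)
alpha = 1.0  # steering strength (hypothetical value)
plain = model.forward([1.0, 0.0])[-1]
steered = model.forward([1.0, 0.0], steer_layer=0,
                        steer_vec=scale(direction, alpha))[-1]
print(direction, plain, steered)  # [0.0, 4.5] [1.5, 0.0] [6.0, 4.5]
```

The same neutral input produces a different final state once the direction is injected, while the layer functions stay untouched. In a real PyTorch model, the addition is typically done with a forward hook on a transformer block (`torch.nn.Module.register_forward_hook`), and `alpha` trades off steering strength against output fluency.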