Advanced Artificial Intelligence & Machine Learning
Q90 / 100

What is activation engineering/steering in LLMs?

Correct Answer

B) Directly modifying intermediate activations during inference to steer model behavior (adding a direction to the residual stream to induce concepts like "banana" or "French")

Explanation

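Activation steering (see also Representation Engineering, Zou et al.) works in two steps. First, identify a direction in activation space that corresponds to a concept. One common approach is to take the difference between the mean activations on prompts that express the concept and on prompts that do not. Second, add that direction, scaled by a strength coefficient, to the intermediate activations (typically the residual stream at a chosen layer) during the forward pass. The weights are never updated, so this changes model behavior at inference time without fine-tuning. The technique is closely related to mechanistic interpretability, which studies how such concept directions are represented inside the model.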
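The two steps can be illustrated with a toy sketch, assuming a hand-written stand-in for a transformer rather than a real LLM. Everything named here is hypothetical: `ToyResidualModel`, `residual_at`, `steering_vector`, `steer_hook`, the strength `alpha`, and the 3-dimensional "prompt" vectors. The steering direction uses a difference of means, which is one simple choice and not necessarily the method Zou et al. use. In a real framework, the hook would typically be something like a PyTorch forward hook that returns a modified layer output.

```python
from typing import Callable, List, Optional

Vector = List[float]
# A hook receives (layer_index, residual) and returns a possibly modified residual.
Hook = Callable[[int, Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class ToyResidualModel:
    """Hypothetical stand-in for a transformer: each layer adds its output
    back into the residual stream (x <- x + f(x))."""

    def __init__(self, layers: List[Callable[[Vector], Vector]]):
        self.layers = layers  # never modified: steering leaves weights untouched

    def forward(self, x: Vector, hook: Optional[Hook] = None) -> Vector:
        for i, layer in enumerate(self.layers):
            x = add(x, layer(x))
            if hook is not None:
                x = hook(i, x)  # the hook may read or rewrite the residual stream
        return x


def residual_at(model: ToyResidualModel, x: Vector, layer_idx: int) -> Vector:
    """Capture the residual stream right after layer `layer_idx`."""
    captured: List[Vector] = []

    def capture(i: int, resid: Vector) -> Vector:
        if i == layer_idx:
            captured.append(list(resid))
        return resid  # read-only: activations pass through unchanged

    model.forward(x, hook=capture)
    return captured[0]


def steering_vector(model, concept_inputs, neutral_inputs, layer_idx) -> Vector:
    """Step 1: mean activation on concept inputs minus mean on neutral inputs."""
    pos = mean([residual_at(model, x, layer_idx) for x in concept_inputs])
    neg = mean([residual_at(model, x, layer_idx) for x in neutral_inputs])
    return add(pos, scale(neg, -1.0))


def steer_hook(direction: Vector, layer_idx: int, alpha: float) -> Hook:
    """Step 2: add alpha * direction to the residual stream after `layer_idx`."""

    def hook(i: int, resid: Vector) -> Vector:
        return add(resid, scale(direction, alpha)) if i == layer_idx else resid

    return hook


# Hypothetical 3-dimensional model with two linear layers.
model = ToyResidualModel([
    lambda x: [0.1 * x[1], 0.1 * x[2], 0.0],
    lambda x: [0.0, 0.1 * x[0], 0.1 * x[1]],
])
concept = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]  # stand-ins for concept prompts
neutral = [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]  # stand-ins for neutral prompts

v = steering_vector(model, concept, neutral, layer_idx=0)
prompt = [0.0, 0.5, 0.5]
baseline = model.forward(prompt)
steered = model.forward(prompt, hook=steer_hook(v, layer_idx=0, alpha=4.0))
# Projection onto the concept direction is larger when steering is applied.
print(dot(baseline, v), dot(steered, v))
```

Because `forward` only rewrites activations through the hook, the same model produces the baseline output again once the hook is removed. This mirrors the point that steering is an inference-time intervention rather than a change to the weights.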