Advanced
Artificial Intelligence & Machine Learning
Q96 / 100
What is Constitutional AI, and how does RLAIF differ from RLHF?
Correct Answer
B) RLHF uses human preferences to train a reward model; RLAIF (RL from AI feedback) uses AI-generated preferences, enabling scale without proportional human annotation cost
Explanation
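Constitutional AI trains a model against a written set of principles (a "constitution") instead of relying only on human labels for each judgment.
RLHF: human raters compare pairs of model outputs → the comparisons train a reward model → the policy is optimized against that reward model, typically with PPO.
RLAIF: an AI model compares the outputs, in Constitutional AI by judging them against the constitutional principles → reward model → RL.
RLAIF scales better because labeling cost no longer grows with human annotation effort, but the result is only as good as the AI labeler's judgment.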
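The shared middle step of both pipelines (preferences → reward model) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any production system is built: responses are reduced to two made-up features (helpfulness, harmfulness), `toy_ai_judge` is a hypothetical stand-in for an AI labeler applying one principle, the reward model is linear, and the final RL step (for example PPO) is left out. Passing a function that looks up human annotations instead of `toy_ai_judge` would give the RLHF variant of the same loop.

```python
import math
import random

def toy_ai_judge(a, b):
    """Hypothetical AI labeler: returns True if response a is preferred to b.
    Toy principle: prefer the less harmful response, break ties on helpfulness.
    Each response is a (helpfulness, harmfulness) tuple."""
    return (-a[1], a[0]) >= (-b[1], b[0])

def train_reward_model(pairs, prefers_first, epochs=200, lr=0.1):
    """Fit a linear reward r(x) = w . x on pairwise preferences using the
    Bradley-Terry loss -log(sigmoid(r(chosen) - r(rejected)))."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for a, b in pairs:
            chosen, rejected = (a, b) if prefers_first(a, b) else (b, a)
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient-descent step: d(loss)/dw = -(1 - sigmoid(margin)) * diff
            g = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Synthetic response pairs; in practice these would be sampled model outputs.
rng = random.Random(0)
pairs = [((rng.random(), rng.random()), (rng.random(), rng.random()))
         for _ in range(50)]

# RLAIF: the preference labels come from the AI judge, not from human raters.
w = train_reward_model(pairs, toy_ai_judge)
```

The learned weight on harmfulness ends up negative, so the reward model reproduces the judge's principle. That also shows the dependence noted above: whatever the labeler prefers, correct or not, is what the reward model learns.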