Intermediate Artificial Intelligence & Machine Learning
Q51 / 100

What is RLHF (Reinforcement Learning from Human Feedback)?

The correct answer is B) A training approach where human preferences between model outputs are used to train a reward model, which guides RL fine-tuning for better alignment with human values

Explanation

RLHF (used in InstructGPT and ChatGPT): collect human comparisons of model outputs → train a reward model on those comparisons → fine-tune the model with PPO to maximize the learned reward. This markedly improves instruction following and helpfulness and reduces harmful outputs.
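The reward-model step can be illustrated with a minimal sketch. It assumes a pairwise (Bradley-Terry style) loss, -log(sigmoid(r_chosen - r_rejected)), which is a common choice for training on human comparisons. The linear reward model, the two-feature representation of an output, and the toy comparison data are all hypothetical and chosen only to keep the example self-contained. Real systems score text with a neural network, and the PPO step is not shown here.

```python
import math


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = r_chosen - r_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def reward(weights, features):
    """Hypothetical linear reward model: a dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(comparisons, n_features, lr=0.5, epochs=200):
    """Fit weights by stochastic gradient descent on (chosen, rejected) pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -sigmoid(-margin) = -1 / (1 + exp(margin))
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights


# Toy data (assumed): each output is described by [helpfulness, harmfulness],
# and each pair is (output the human preferred, output the human rejected).
comparisons = [
    ([0.9, 0.1], [0.2, 0.1]),
    ([0.6, 0.0], [0.7, 0.9]),
    ([0.8, 0.2], [0.3, 0.6]),
]

weights = train_reward_model(comparisons, n_features=2)
for chosen, rejected in comparisons:
    print(reward(weights, chosen) > reward(weights, rejected))
```

In the RL stage, a score like `reward(...)` would serve as the signal that PPO maximizes. The loss is smallest when the preferred output is scored well above the rejected one, which is how the reward model comes to reflect the human comparisons.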
