What is reinforcement learning?

Answer

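Reinforcement learning (RL) is a machine learning paradigm in which an agent learns to make decisions by interacting with an environment. At each step, the agent observes the current state, takes an action, and receives a reward. The goal is to learn a policy (a mapping from states to actions) that maximizes the expected cumulative reward over time. The key concepts are the exploration vs. exploitation tradeoff (trying new actions versus repeating ones already known to pay off), Markov Decision Processes (the usual formal model of the environment), Q-learning (a value-based method), and policy gradient methods (which optimize the policy directly). RL systems have beaten world-champion players in games such as Go (AlphaGo) and Dota 2 (OpenAI Five), and RL is used in robotics, recommendation systems, and reinforcement learning from human feedback (RLHF) for aligning large language models (LLMs).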
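
To make the observe/act/reward loop concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The environment is a made-up five-state chain invented for this illustration (the agent starts at state 0 and gets a reward of 1 for reaching state 4), and the names `step`, `train`, and `greedy_policy`, as well as the hyperparameter values, are assumptions rather than part of any library.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the Q-table as a list of [left, right] values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Exploration: try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploitation: pick the best-known action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """The learned policy: the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setup the learned policy should choose "right" (action 1) in every non-terminal state, since that is the shortest path to the reward. Real problems typically have far larger state spaces, which is where function approximation (for example, neural networks) replaces the table.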