Reinforcement learning (RL) is a branch of machine learning in which an agent learns to make decisions by interacting with an environment and receiving feedback. There is no labeled dataset. Instead, the agent takes actions, observes the results, and receives numerical rewards or penalties. Over many iterations, often millions, it learns a policy: a mapping from each state to the action to take, chosen to maximize long-term cumulative reward.

The setup consists of an agent placed in an environment. At each step the agent takes an action, and the environment returns a new state and a reward signal. The agent's only objective is to maximize total reward over time. Through repeated trial and error, it can discover strategies that no human explicitly programmed.

RL powered DeepMind's AlphaGo, which defeated top professional Lee Sedol at Go in 2016, playing moves that experts considered highly unconventional. Today RL is used to train robots to walk and navigate obstacle courses, and it is applied to autonomous driving in unpredictable traffic. It is also central to reinforcement learning from human feedback (RLHF), the process used to make large language models more useful and less harmful.
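The agent-environment loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Corridor` environment, the `train` and `greedy_policy` helpers, and all hyperparameter values are hypothetical, and tabular Q-learning is only one of many RL algorithms, chosen here because it is short.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..n-1, start at cell 0,
    reward 1.0 for reaching the rightmost cell, 0.0 otherwise."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the move.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: q[state][action] estimates long-term reward."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Target: immediate reward plus discounted best future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """The learned policy: the highest-valued action in each state."""
    return [0 if row[0] > row[1] else 1 for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The table `q` plays the role of the agent's learned knowledge, and `greedy_policy` reads the policy out of it. In this toy setting the agent is never told that moving right is correct; it infers that from the reward signal alone.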