Reinforcement Learning

Reinforcement learning (RL) is a branch of machine learning in which an agent learns to make decisions by interacting with an environment and receiving feedback. There is no labeled dataset: the agent takes actions, observes the results, and receives numerical rewards or penalties.

Over many iterations, often millions, the agent builds a policy: a mapping from each state to the action it should take there, chosen to maximize long-term cumulative reward. The setup consists of an agent placed in an environment. At each step the agent takes an action, and the environment returns a new state and a reward signal. The agent's only objective is to maximize total reward over time.
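The agent-environment loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the code behind any particular system: it assumes tabular Q-learning as the learning method, and the 4x4 grid layout, the -1 step cost, the discount factor `gamma`, and the exploration `decay` schedule are all hypothetical choices made for the example. The +100 goal reward, -50 obstacle penalty, 0.10 learning rate, and 0.9 starting exploration rate are borrowed from the demo on this page.

```python
import random

# Hypothetical 4x4 grid world; layout and step cost are assumptions.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
OBSTACLES = {(1, 1), (2, 2)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)  # walls clamp movement
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 100.0, True
    if nxt in OBSTACLES:
        return nxt, -50.0, True
    return nxt, -1.0, False  # small step cost (assumption)


def train(episodes=2000, alpha=0.10, gamma=0.9, epsilon=0.9,
          decay=0.995, seed=0):
    """Agent: learn a table of Q-values by trial and error."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in range(n)}
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.randrange(n)
            else:
                a = max(range(n), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update toward reward plus discounted best next value.
            best_next = 0.0 if done else max(q[(nxt, i)] for i in range(n))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            steps += 1
        epsilon = max(0.1, epsilon * decay)  # explore less as training goes on
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned policy: always take the highest-valued action."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Here the policy is implicit in the table: in each state, the learned rule is to pick the action with the highest Q-value. Real systems usually replace the table with a neural network, but the loop of act, observe, and update stays the same.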

Through rapid trial and error, the agent can discover strategies that no human explicitly programmed. RL was a core component of DeepMind's AlphaGo, which defeated world champion Lee Sedol at Go in 2016, in part by playing moves that human experts had not seriously considered. Today RL is used to train robots to walk and to move through obstacle courses, and it is being applied to driving autonomous vehicles through unpredictable traffic.

It is also central to reinforcement learning from human feedback (RLHF), the process used to make large language models more useful and less harmful.

Interactive demo: Reinforcement Learning Agent

Watch an AI agent learn to navigate to the goal through trial and error. The grid shows the agent, a goal (+100 reward), and obstacles (-50 reward), with cells shaded from low to high Q-value. The demo tracks the episode number, steps, total reward, and last reward. At episode 1 it shows an exploration rate of 0.900 and a learning rate of 0.10.
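The demo's references to Q-values, a learning rate, and an exploration rate suggest a tabular Q-learning agent, although the page does not say so. Under that assumption, the standard update after taking action a_t in state s_t would be the following, where \alpha is the learning rate and \gamma is a discount factor (an assumption here, since the demo does not display one):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

As a worked example, suppose every Q-value starts at 0 and the agent steps into the goal for the first time. With \alpha = 0.10 and r_{t+1} = 100, and treating the goal as terminal so the \max term is 0, the update gives Q = 0 + 0.10 (100 + 0 - 0) = 10. The exploration rate plays a separate role: under an epsilon-greedy scheme, a value of 0.9 would mean the agent picks a random action 90% of the time and the highest-valued action otherwise.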
