Reinforcement Learning (RL)
A machine learning paradigm in which an agent learns by interacting with an environment and receiving rewards or penalties. Examples: AlphaGo, RLHF for LLMs.
From: LLM Wiki URL: llm-wiki.pages.dev/concepts/reinforcement-learning Created: June 21, 2026 Updated: June 21, 2026 Read time: 1 min
Reinforcement Learning
Definition
Reinforcement learning (RL) is a machine learning paradigm in which an agent learns by interacting with an environment: it takes actions, receives rewards or penalties in return, and adjusts its behavior to maximize cumulative reward.
Components
- Agent: the decision-maker
- Environment: the world the agent acts in
- State: a representation of the current situation
- Action: a choice available to the agent
- Reward: a scalar feedback signal
- Policy: the agent's strategy, a mapping from states to actions
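The way these components fit together can be sketched as a simple interaction loop. The example below is a minimal illustration, not a reference implementation: `CorridorEnv`, `StepResult`, `random_policy`, and `run_episode` are hypothetical names invented for this sketch, and the reward values are arbitrary assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    state: int      # next state observed by the agent
    reward: float   # scalar feedback for the action just taken
    done: bool      # True once the episode has ended


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..length-1.

    The agent starts in cell 0; the episode ends when it reaches the last cell.
    """

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> StepResult:
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # assumed: small step cost, bonus at goal
        return StepResult(self.state, reward, done)


def random_policy(state: int, rng: random.Random) -> int:
    """Policy: maps a state to an action. This one ignores the state."""
    return rng.choice([0, 1])


def run_episode(env: CorridorEnv, rng: random.Random, max_steps: int = 200) -> float:
    """Run one episode and return the total (undiscounted) reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = random_policy(state, rng)   # agent chooses an action
        result = env.step(action)            # environment responds
        total += result.reward
        state = result.state
        if result.done:
            break
    return total


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), random.Random(0)))
```

A learning algorithm would replace `random_policy` with a policy that improves from the observed rewards.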
Algorithms
- Q-learning (1989)
- DQN (Deep Q-Network, 2013), DeepMind
- Policy Gradient (REINFORCE, 1992)
- Actor-Critic (A2C, A3C)
- PPO (Proximal Policy Optimization, 2017), OpenAI
- GRPO (Group Relative Policy Optimization, 2024), DeepSeek
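As an illustration of the first entry in the list, here is a minimal sketch of tabular Q-learning with an epsilon-greedy behavior policy. The chain environment (`step`), the helper names `train` and `greedy_policy`, and the hyperparameters `alpha`, `gamma`, and `epsilon` are assumptions made for this example only. The deep variants in the list (such as DQN) replace the table with a neural network and are not shown.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = left, 1 = right


def step(state: int, action: int) -> tuple:
    """Hypothetical toy dynamics: deterministic moves along a chain.

    Returns (next_state, reward, done). Reward is 1.0 only on reaching the goal.
    """
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0) -> list:
    """Learn a Q-table q[state][action] with the Q-learning update rule."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act
            # greedily, breaking ties between equal Q-values at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # The target bootstraps from the greedy value of the next state;
            # terminal states contribute no future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q: list) -> list:
    """Best action for each non-terminal state according to the Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

On this deterministic chain the learned greedy policy moves right in every non-terminal state, since that is the shortest path to the only rewarded transition.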
Applications
- Game AI: AlphaGo, AlphaZero, OpenAI Five (Dota 2)
- Robotics: locomotion, manipulation
- RLHF (reinforcement learning from human feedback): LLM alignment
- Autonomous driving