Reinforcement Learning
Definition
Reinforcement learning (RL) is a machine learning paradigm in which an agent learns by interacting with an environment, receiving rewards or penalties for its actions, with the goal of maximizing cumulative reward over time.
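One common way to make "reward over time" precise is the discounted return, sketched below. The symbols are assumptions introduced here for illustration and do not appear in the notes: r_t is the reward at step t, and gamma is a discount factor between 0 and 1 that weights near-term rewards more heavily.

```latex
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\qquad 0 \le \gamma \le 1
```

For example, with gamma = 0.9 and rewards 1, 1, 1 over three steps, the return is 1 + 0.9 + 0.81 = 2.71.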
Components
- Agent: the decision maker
- Environment: the world the agent acts in
- State: a representation of the current situation
- Action: a choice available to the agent
- Reward: scalar feedback signal
- Policy: the agent's strategy, a mapping from states to actions
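The components above fit together in an interaction loop, which the following minimal sketch tries to illustrate. Everything in it is hypothetical and chosen for illustration: `CorridorEnv` is a made-up five-cell corridor, the reward values (+1.0 at the goal, -0.1 per step) are arbitrary, and the `reset`/`step` method names only loosely follow a common convention rather than any specific library's API.

```python
import random


class CorridorEnv:
    """Hypothetical environment: cells 0..4 on a line; reaching cell 4 ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # State: the agent's current cell
        self.state = 0
        return self.state

    def step(self, action):
        # Action: 0 = move left, 1 = move right (clamped at the walls)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Reward: scalar feedback (values are illustrative assumptions)
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def random_policy(state, rng):
    # Policy: maps a state to an action; this one ignores the state entirely
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Run one agent-environment interaction loop and return (total reward, trajectory)."""
    state = env.reset()
    total = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state, rng)                  # agent decides
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        total += reward
        state = next_state
        if done:
            break
    return total, trajectory


if __name__ == "__main__":
    total, trajectory = run_episode(CorridorEnv(), random_policy, random.Random(0))
    print(f"steps={len(trajectory)} total_reward={total:.2f}")
```

A random policy eventually stumbles onto the goal but wastes steps; a learning algorithm would replace `random_policy` with one that improves from the observed rewards.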
Algorithms
- Q-learning (1989)
- DQN (Deep Q-Network, 2013) — DeepMind
- Policy Gradient (REINFORCE, 1992)
- Actor-Critic (A2C, A3C)
- PPO (Proximal Policy Optimization, 2017) — OpenAI
- GRPO (Group Relative Policy Optimization, 2024) — DeepSeek
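As a concrete instance of the first entry in the list, here is a minimal sketch of tabular Q-learning. It assumes a hypothetical five-state corridor with arbitrary rewards (+1.0 at the goal, -0.1 per step), and the hyperparameters `alpha`, `gamma`, and `epsilon` are illustrative defaults, not recommended values. The deep and policy-based methods in the list replace the table with neural networks and are not shown.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Hypothetical dynamics: move one cell, clamped at the walls."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
                     max_steps=200, seed=0):
    """Learn a table q[state][action] with the Q-learning update rule."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy behaviour: mostly exploit, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Target bootstraps from the best action value in the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action for each non-terminal state according to the table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

The key line is the update toward `reward + gamma * max(q[next_state])`: the target uses the greedy value of the next state regardless of which action the agent actually takes next, which is what makes Q-learning off-policy.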
Applications
- Game AI — AlphaGo, AlphaZero, OpenAI Five (Dota 2)
- Robotics — locomotion, manipulation
- RLHF (reinforcement learning from human feedback) for LLM alignment
- Autonomous driving