Reinforcement Learning (RL)

A machine learning paradigm in which an agent learns by interacting with an environment and receiving rewards or penalties. Examples include AlphaGo and RLHF for LLMs.

RL loop: agent → action → environment → reward. Algorithms: Q-learning, DQN, PPO, GRPO. Applications: game AI (AlphaGo, AlphaZero), robotics, RLHF for LLM alignment.

Also known as: RL, pembelajaran penguatan
Reinforcement Learning

Definition

Reinforcement learning (RL) is a machine learning paradigm in which an agent learns by interacting with an environment: it takes actions, receives rewards or penalties in response, and adjusts its behavior to maximize cumulative reward.

Components

  • Agent — the decision-maker
  • Environment — the world the agent acts in
  • State — a representation of the current situation
  • Action — a choice available to the agent
  • Reward — scalar feedback signal
  • Policy — the agent's strategy, mapping states to actions
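
The components above can be sketched as a single interaction loop. This is a minimal illustration only: `CorridorEnv`, `random_policy`, and `run_episode` are hypothetical names invented for this example, and the reward values (+1.0 at the goal, -0.1 per step) are arbitrary assumptions, not part of any standard library.

```python
import random
from dataclasses import dataclass


@dataclass
class CorridorEnv:
    """Hypothetical environment: a corridor of `length` cells, goal in the last cell."""
    length: int = 5
    position: int = 0

    def reset(self) -> int:
        """Start a new episode and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action: int) -> tuple[int, float, bool]:
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # assumed reward scheme
        return self.position, reward, done


def random_policy(state: int, rng: random.Random) -> int:
    """Policy: maps a state to an action (here it ignores the state)."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps: int = 100) -> float:
    """Agent-environment loop: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)             # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward                  # scalar feedback accumulates
        if done:
            break
    return total_reward
```

Calling `run_episode(CorridorEnv(), random_policy, random.Random(0))` returns the total reward of one episode. A learning algorithm would replace `random_policy` with a policy that improves from the observed rewards.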

Algorithms

  • Q-learning (1989)
  • DQN (Deep Q-Network, 2013) — DeepMind
  • Policy Gradient (REINFORCE, 1992)
  • Actor-Critic (A2C, A3C)
  • PPO (Proximal Policy Optimization, 2017) — OpenAI
  • GRPO (Group Relative Policy Optimization, 2024) — DeepSeek
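
As a concrete instance of the first entry in the list, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. The five-cell corridor task, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode count) are assumptions chosen for illustration. The deep variants in the list (DQN and later) replace the table with a neural network, which this sketch does not attempt.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1  # assumed reward scheme
    return next_state, reward, done


def train_q_table(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q(s, a) with the Q-learning update rule."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 200:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: r + gamma * max_a' Q(s', a'); no bootstrap at terminal states.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Extract the greedy action for each state from the learned table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

After training, `greedy_policy(train_q_table())` should choose action 1 (right) in every non-terminal cell. The entry for the goal cell is never updated and carries no meaning.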

Applications

  • Game AI — AlphaGo, AlphaZero, OpenAI Five (Dota 2)
  • Robotics — locomotion, manipulation
  • RLHF — LLM alignment
  • Autonomous driving
