AlphaZero

A generic reinforcement learning (RL) algorithm from DeepMind that mastered chess, shogi, and Go from scratch through self-play. It defeated Stockfish (chess), Elmo (shogi), and AlphaGo Zero (Go).

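AlphaZero: a single algorithm for three games, combining Monte Carlo tree search (MCTS), a deep neural network, and self-play. In 100-game matches it beat Stockfish 8 with 28 wins, 72 draws, and no losses; Elmo with 90 wins, 2 draws, and 8 losses; and AlphaGo Zero with 60 wins and 40 losses.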

Definition

AlphaZero is a generic reinforcement learning algorithm from DeepMind that mastered chess, shogi, and Go from scratch, learning entirely through self-play.
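AlphaZero pairs MCTS with a neural network that supplies move priors and position values. As a rough illustration of how the search might choose which move to explore next, here is a minimal sketch of a PUCT-style selection step. The `Node` class, its field names, and the constant `c_puct = 1.5` are assumptions made for this example, not DeepMind's implementation, and the sketch assumes each child's value is stored from the point of view of the player choosing at the parent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node (hypothetical structure for illustration)."""
    prior: float                 # P(s, a): move probability from the network
    visit_count: int = 0         # N(s, a): how often this move was explored
    value_sum: float = 0.0       # sum of values backed up through this node
    children: dict = field(default_factory=dict)  # action -> Node

    def q_value(self) -> float:
        # Mean backed-up value; 0.0 for unvisited nodes (an assumed default).
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Return the (action, child) pair that maximises Q + U."""
    total_visits = sum(child.visit_count for child in node.children.values())

    def score(child: Node) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q_value() + u

    return max(node.children.items(), key=lambda item: score(item[1]))


# Usage: "b" has a lower prior than "a" but a much better average value so far.
root = Node(prior=1.0)
root.children = {
    "a": Node(prior=0.6, visit_count=10, value_sum=2.0),
    "b": Node(prior=0.3, visit_count=1, value_sum=0.9),
    "c": Node(prior=0.1),
}
action, _ = select_child(root)
print(action)  # prints "b" for these numbers
```

The bonus term shrinks as a move accumulates visits, so the search gradually shifts from trusting the network's priors to trusting the values it has measured.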

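Achievements (December 2017)

| Game  | Opponent     | Wins | Draws | Losses | Training |
|-------|--------------|------|-------|--------|----------|
| Chess | Stockfish 8  | 28   | 72    | 0      | 4 hours  |
| Shogi | Elmo         | 90   | 2     | 8      | 2 hours  |
| Go    | AlphaGo Zero | 60   | 0     | 40     | 8 hours  |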
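To compare match results that mix wins and draws, one common convention is to count a draw as half a point and, optionally, to convert the resulting score into an approximate rating gap. The sketch below applies that convention to the chess and shogi results; the function names are made up for this example, and the logistic Elo formula is a standard model rather than anything specific to AlphaZero. A 100-game match gives only a rough estimate.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_gap(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    return 400 * math.log10(score / (1 - score))


chess = score_fraction(wins=28, draws=72, losses=0)  # vs Stockfish 8
shogi = score_fraction(wins=90, draws=2, losses=8)   # vs Elmo

print(f"chess: score {chess:.2f}, gap of about {elo_gap(chess):.0f} Elo")
print(f"shogi: score {shogi:.2f}, gap of about {elo_gap(shogi):.0f} Elo")
```

Read this way, the chess match (no losses but many draws) corresponds to a smaller rating gap than the shogi match, even though AlphaZero lost eight shogi games.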

Significance

  • A single algorithm for three different games
  • No human knowledge: only the rules of each game are provided
  • A general-purpose approach, often cited as a step toward artificial general intelligence (AGI)
  • Superhuman play after only hours of training
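One way to picture "a single algorithm, given only the rules" is a self-play loop written against a small rules interface, so that switching games means swapping in a different rules object. The sketch below is an illustration under stated assumptions: the `Game` protocol and its method names are hypothetical, the toy `Nim` game stands in for chess, shogi, or Go, and a random move chooser stands in for the MCTS-plus-network policy.

```python
import random
from typing import Protocol


class Game(Protocol):
    """Rules-only interface (hypothetical method names)."""
    def initial_state(self): ...
    def player_to_move(self, state) -> int: ...   # +1 or -1
    def legal_actions(self, state) -> list: ...
    def next_state(self, state, action): ...
    def winner(self, state): ...  # +1, -1, 0 for a draw, None if unfinished


class Nim:
    """Toy game: take 1 to 3 stones; whoever takes the last stone wins."""
    def initial_state(self):
        return (10, 1)  # (stones left, player to move)

    def player_to_move(self, state):
        return state[1]

    def legal_actions(self, state):
        return [n for n in (1, 2, 3) if n <= state[0]]

    def next_state(self, state, action):
        stones, player = state
        return (stones - action, -player)

    def winner(self, state):
        stones, player = state
        # The player who just moved (-player) took the last stone.
        return -player if stones == 0 else None


def random_policy(game, state, rng):
    # Placeholder for the move choice made by search plus network.
    return rng.choice(game.legal_actions(state))


def self_play(game: Game, choose_action, rng: random.Random):
    """Play one game and label each position with the final outcome."""
    state, history = game.initial_state(), []
    while game.winner(state) is None:
        action = choose_action(game, state, rng)
        history.append((state, action))
        state = game.next_state(state, action)
    result = game.winner(state)
    # Value target is from the viewpoint of the player to move in each state.
    return [(s, a, result * game.player_to_move(s)) for s, a in history]


examples = self_play(Nim(), random_policy, random.Random(0))
print(len(examples), "training examples from one game")
```

Nothing in `self_play` refers to Nim specifically, which is the design point: game knowledge enters only through the rules object, and the outcome labels it produces are the kind of signal a value network could be trained on.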

Later Developments

  • MuZero (2019): learns without being given the rules of the game
  • AlphaProof (2024): olympiad-level mathematics
  • AlphaGeometry (2024): geometry problems
  • OpenAI Five (2018): Dota 2
