Statistical Learning
Definition
Statistical learning is the statistics-based approach to machine learning, built on probability, likelihood, and Bayesian inference. It was the dominant paradigm from roughly 1990 to 2010.
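The three ingredients named in the definition can be illustrated with a coin-flip (Bernoulli) model. This is a minimal sketch only: the data (7 successes in 10 trials), the uniform Beta(1, 1) prior, and the function names are assumptions made for illustration, not part of these notes.

```python
import math

def log_likelihood(p, successes, trials):
    """Bernoulli log-likelihood of the data for a success probability p (0 < p < 1)."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

def mle(successes, trials):
    """Maximum-likelihood estimate of p: the observed success rate."""
    return successes / trials

def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Bayesian update of a Beta(prior_a, prior_b) prior with Bernoulli data.

    Returns the posterior parameters plus the posterior mean and variance.
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, variance

# Hypothetical data: 7 successes out of 10 trials.
successes, trials = 7, 10
p_hat = mle(successes, trials)
a, b, post_mean, post_var = beta_posterior(successes, trials)

print("MLE:", p_hat)
print("Posterior: Beta(%g, %g), mean %.3f, variance %.4f" % (a, b, post_mean, post_var))
```

The maximum-likelihood estimate is a single number, while the Bayesian posterior is a whole distribution whose variance expresses how uncertain the estimate still is after 10 observations.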
Algorithms
- Linear and Logistic Regression (least squares dates to the early 1800s, logistic regression to the mid-1900s; both still popular)
- Naive Bayes (1950s)
- Decision Trees (ID3, CART; 1980s)
- k-Nearest Neighbors
- Hidden Markov Models (HMM)
- Conditional Random Fields (CRF)
- Support Vector Machines (SVM) — Vapnik, 1995
- Bayesian Networks — Judea Pearl
- Gaussian Mixture Models (GMM)
- Random Forests (2001)
- Gradient Boosting (XGBoost, LightGBM)
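As one concrete instance from this list, the following is a minimal sketch of Gaussian Naive Bayes in plain Python. It assumes continuous features modeled with one normal distribution per class and feature. The toy data, the function names, and the small `var_floor` added to each variance are illustrative assumptions, not a reference implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate log-priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero when a feature is constant in a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # "Naive" assumption: features are independent given the class,
        # so their Gaussian log-densities simply add up.
        for value, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: two features, two well-separated classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
prediction_a = predict_gaussian_nb(model, [1.1, 2.0])
prediction_b = predict_gaussian_nb(model, [3.1, 4.0])
print(prediction_a, prediction_b)
```

Working in log space keeps the product of many small densities from underflowing, which is the usual design choice for this classifier.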
Era
- Dominant: 1990-2010
- Displaced: 2012+ (AlexNet, deep learning)
- Hybrid: 2020+ (neural networks combined with statistical methods)
- Today: still typically the strongest choice for tabular data
Modern Applications
- XGBoost, LightGBM, CatBoost: dominate Kaggle tabular competitions
- SVM: still used for small datasets
- Bayesian methods: uncertainty quantification
- Time series: ARIMA, Prophet
- A/B testing: statistical inference
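The A/B testing item can be made concrete with a two-proportion z-test, one common way to do the statistical inference. This is a minimal sketch using only the Python standard library. The conversion counts and the function name are hypothetical, and the normal approximation it relies on assumes reasonably large samples.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between A and B.

    Uses the pooled proportion under the null hypothesis of equal rates.
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: 200/2000 conversions in A, 260/2000 in B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print("z = %.3f, p = %.4f" % (z, p_value))
```

A small p-value indicates that a difference this large would be unlikely if both variants truly converted at the same rate. The significance threshold (0.05 is a common convention) should be fixed before the experiment runs.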