Statistics for ML #8 — Bayes' Theorem
Bayes’ Theorem is the mathematical foundation of rational belief update. It is arguably the most important equation in statistics and modern ML.
The Formula
\[P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}\]

| Term | Name | Meaning |
|---|---|---|
| \(P(H \mid E)\) | Posterior | Updated belief after seeing evidence |
| \(P(E \mid H)\) | Likelihood | How probable the evidence is if H is true |
| \(P(H)\) | Prior | Initial belief before seeing evidence |
| \(P(E)\) | Marginal likelihood | Normalising constant (often intractable) |
Medical Diagnosis Example
A disease affects 1% of the population. A test has 99% sensitivity (true positive rate) and 95% specificity (true negative rate).
If a patient tests positive, what is the probability they have the disease?
\[P(D \mid +) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx 0.167\]
Despite a 99% sensitive test, only about 1 in 6 positives actually have the disease when prevalence is low. This is the base rate fallacy: ignoring the prior.
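The arithmetic above can be sketched in a few lines of Python, using the example's numbers (1% prevalence, 99% sensitivity, 95% specificity):

```python
# Base-rate example: P(disease | positive test) via Bayes' theorem
prior = 0.01          # P(disease): 1% prevalence
sensitivity = 0.99    # P(+ | disease)
specificity = 0.95    # P(- | no disease)

false_positive_rate = 1 - specificity  # P(+ | no disease) = 0.05
# Denominator P(+) via the law of total probability:
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
posterior = sensitivity * prior / p_positive  # P(disease | +)

print(round(posterior, 3))  # 0.167, i.e. roughly 1 in 6
```

Note how the denominator is exactly the law of total probability from the extended form of the theorem.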
Extended Form (Law of Total Probability in denominator)
\[P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{\sum_k P(E \mid H_k) \cdot P(H_k)}\]

Bayesian Updating: Sequential Learning
Start with prior → observe data → compute posterior → use posterior as new prior → observe more data → …
This is exactly how online learning and Bayesian neural networks work.
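A minimal sketch of this loop (my own illustration, not from the post), using the conjugate Beta-Binomial pair: a Beta(α, β) prior updated on k successes in n trials gives a Beta(α + k, β + n − k) posterior, so each posterior can serve directly as the next prior:

```python
# Sequential Bayesian updating with a Beta-Bernoulli conjugate pair.
# Updating batch-by-batch gives the same posterior as updating on all
# the data at once -- yesterday's posterior is today's prior.
alpha, beta = 2, 5            # prior: Beta(2, 5)

batches = [(9, 12), (5, 8)]   # (successes, trials) per batch; 14/20 in total
for k, n in batches:
    alpha += k                # posterior becomes the new prior
    beta += n - k

print(alpha, beta)            # 16 11 -> Beta(16, 11), as if all 20 trials were seen at once
```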
Bayes in ML
- Naive Bayes classifier: Applies Bayes with conditional independence assumption
- Bayesian optimisation: For hyperparameter tuning
- Bayesian neural networks: Distributions over weights, not point estimates
- MAP estimation: Maximum A Posteriori = MLE + prior regularisation
```python
from sklearn.naive_bayes import GaussianNB

# Naive Bayes for classification (X_train, y_train, X_test assumed defined)
gnb = GaussianNB(priors=[0.3, 0.7])  # set class priors instead of estimating them from data
gnb.fit(X_train, y_train)
probs = gnb.predict_proba(X_test)    # posterior P(class | features) per sample
```
```r
# Bayesian updating in R with the bayesrules package
library(bayesrules)

# Prior: Beta(2, 5); likelihood: Binomial(n, p)
# Posterior after y = k successes in n trials: Beta(2 + k, 5 + n - k)
plot_beta_binomial(alpha = 2, beta = 5, y = 14, n = 20)
```
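To make the MAP bullet concrete, a small sketch (my own illustration, with the same Beta(2, 5) prior and 14 successes in 20 trials as the R example): the MAP estimate is the mode of the posterior, which pulls the MLE towards the prior:

```python
# MAP vs MLE for a Bernoulli parameter with a Beta(alpha, beta) prior.
alpha, beta = 2, 5
k, n = 14, 20                 # 14 successes in 20 trials

mle = k / n                                          # maximum likelihood estimate
map_est = (alpha + k - 1) / (alpha + beta + n - 2)   # mode of the Beta(16, 11) posterior

print(mle, map_est)           # 0.7 0.6 -- the prior acts as a regulariser
```

With more data, the prior's pull fades and MAP converges to the MLE, which is why MAP is often described as MLE plus prior regularisation.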
Previous: #7 Conditional Probability | Next: #9 Random Variables
