Statistics for ML #8 — Bayes' Theorem
Bayes’ Theorem is the mathematical foundation of rational belief update. It is arguably the most important equation in statistics and modern ML.
The Formula
\[P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}\]

| Term | Name | Meaning |
|---|---|---|
| \(P(H \mid E)\) | Posterior | Updated belief after seeing evidence |
| \(P(E \mid H)\) | Likelihood | How probable the evidence is if H is true |
| \(P(H)\) | Prior | Initial belief before seeing evidence |
| \(P(E)\) | Marginal likelihood | Normalising constant (often intractable) |
Medical Diagnosis Example
A disease affects 1% of the population. A test has 99% sensitivity (true positive rate) and 95% specificity (true negative rate).
If a patient tests positive, what is the probability they have the disease?
\[P(D \mid +) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx 0.167\]
Despite a 99% sensitive test, only about 1 in 6 positives actually have the disease when prevalence is low. This is the base rate fallacy: ignoring the prior.
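The arithmetic above can be sketched in a few lines of Python, using the example's numbers (1% prevalence, 99% sensitivity, 95% specificity):

```python
# Base-rate example: P(disease | positive test) via Bayes' theorem
prior = 0.01          # P(disease): 1% prevalence
sensitivity = 0.99    # P(+ | disease)
specificity = 0.95    # P(- | no disease)

false_positive_rate = 1 - specificity  # P(+ | no disease) = 0.05
# Denominator P(+) via the law of total probability:
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
posterior = sensitivity * prior / p_positive  # P(disease | +)

print(round(posterior, 3))  # 0.167, i.e. roughly 1 in 6
```

Note how the denominator is exactly the law of total probability from the extended form of the theorem.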
Extended Form (Law of Total Probability in denominator)
\[P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{\sum_k P(E \mid H_k) \cdot P(H_k)}\]

Bayesian Updating: Sequential Learning
Start with prior → observe data → compute posterior → use posterior as new prior → observe more data → …
This is exactly how online learning and Bayesian neural networks work.
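A minimal sketch of this loop (my own illustration, not from the post), using the conjugate Beta-Binomial pair: a Beta(α, β) prior updated on k successes in n trials gives a Beta(α + k, β + n − k) posterior, so each posterior can serve directly as the next prior:

```python
# Sequential Bayesian updating with a Beta-Bernoulli conjugate pair.
# Updating batch-by-batch gives the same posterior as updating on all
# the data at once -- yesterday's posterior is today's prior.
alpha, beta = 2, 5            # prior: Beta(2, 5)

batches = [(9, 12), (5, 8)]   # (successes, trials) per batch; 14/20 in total
for k, n in batches:
    alpha += k                # posterior becomes the new prior
    beta += n - k

print(alpha, beta)            # 16 11 -> Beta(16, 11), as if all 20 trials were seen at once
```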
Bayes in ML
- Naive Bayes classifier: Applies Bayes with conditional independence assumption
- Bayesian optimisation: For hyperparameter tuning
- Bayesian neural networks: Distributions over weights, not point estimates
- MAP estimation: Maximum A Posteriori = MLE + prior regularisation
```python
from sklearn.naive_bayes import GaussianNB

# Naive Bayes for classification (X_train, y_train, X_test assumed defined)
gnb = GaussianNB(priors=[0.3, 0.7])  # set class priors instead of estimating them from data
gnb.fit(X_train, y_train)
probs = gnb.predict_proba(X_test)    # posterior P(class | features) per sample
```
```r
# Bayesian updating in R with the bayesrules package
library(bayesrules)

# Prior: Beta(2, 5); likelihood: Binomial(n, p)
# Posterior after y = k successes in n trials: Beta(2 + k, 5 + n - k)
plot_beta_binomial(alpha = 2, beta = 5, y = 14, n = 20)
```
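To make the MAP bullet concrete, a small sketch (my own illustration, with the same Beta(2, 5) prior and 14 successes in 20 trials as the R example): the MAP estimate is the mode of the posterior, which pulls the MLE towards the prior:

```python
# MAP vs MLE for a Bernoulli parameter with a Beta(alpha, beta) prior.
alpha, beta = 2, 5
k, n = 14, 20                 # 14 successes in 20 trials

mle = k / n                                          # maximum likelihood estimate
map_est = (alpha + k - 1) / (alpha + beta + n - 2)   # mode of the Beta(16, 11) posterior

print(mle, map_est)           # 0.7 0.6 -- the prior acts as a regulariser
```

With more data, the prior's pull fades and MAP converges to the MLE, which is why MAP is often described as MLE plus prior regularisation.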
Previous: #7 Conditional Probability | Next: #9 Random Variables
