Statistics for ML #6 — Probability Axioms & Rules
Probability is the mathematical language of uncertainty. Every ML model — from logistic regression to deep neural networks — is built on probability theory.
Kolmogorov’s Three Axioms (1933)
For any event A in sample space Ω:
- Non-negativity: P(A) ≥ 0
- Normalization: P(Ω) = 1
- Additivity: If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
Everything else in probability theory is derived from these three axioms.
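The axioms can be checked directly on a finite sample space; a minimal sketch using a fair six-sided die (the event sets are chosen for illustration):

```python
# Sample space for a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}
p = {outcome: 1 / 6 for outcome in omega}

def prob(event):
    """P(A) as the sum of the outcome probabilities in A."""
    return sum(p[o] for o in event)

A = {1, 2}   # event "roll 1 or 2"
B = {5, 6}   # event "roll 5 or 6" (disjoint from A)

assert prob(A) >= 0                                     # non-negativity
assert abs(prob(omega) - 1) < 1e-12                     # normalization
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12   # additivity
```

Any assignment of non-negative numbers to outcomes that sums to 1 satisfies the axioms; the fair die is just the simplest case.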
Key Rules
Complement Rule: \(P(A^c) = 1 - P(A)\)
Addition Rule (General): \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
Multiplication Rule: \(P(A \cap B) = P(A) \cdot P(B|A)\)
Independence: A and B are independent if: \(P(A \cap B) = P(A) \cdot P(B)\)
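Each of these rules can be verified numerically on a finite sample space; a sketch with fair-die events (the sets A, B, C are illustrative):

```python
# Finite sample space: a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(A) under the uniform distribution on omega."""
    return len(event) / len(omega)

A = {1, 2, 3}   # "roll at most 3"
B = {2, 4, 6}   # "roll an even number"
C = {1, 4}      # chosen so that A and C come out independent

# Complement rule: P(A^c) = 1 - P(A)
assert abs(prob(omega - A) - (1 - prob(A))) < 1e-12

# General addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert abs(prob(A | B) - (prob(A) + prob(B) - prob(A & B))) < 1e-12

# Multiplication rule: P(A ∩ B) = P(A) · P(B|A)
p_b_given_a = prob(A & B) / prob(A)
assert abs(prob(A & B) - prob(A) * p_b_given_a) < 1e-12

# Independence: P(A ∩ C) = 1/6 = P(A) · P(C), so A and C are independent;
# but P(A ∩ B) = 1/6 ≠ 1/4 = P(A) · P(B), so A and B are not
assert abs(prob(A & C) - prob(A) * prob(C)) < 1e-12
assert abs(prob(A & B) - prob(A) * prob(B)) > 1e-3
```

Note that independence is a property of the probabilities, not of whether the events "feel" related: A and C above share the outcome 1, yet they are independent.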
Law of Total Probability
If {B₁, B₂, …, Bₙ} is a partition of Ω: \(P(A) = \sum_{i=1}^{n} P(A|B_i) \cdot P(B_i)\)
This is the foundation of marginalisation in Bayesian ML.
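The sum is straightforward to compute; a sketch with a hypothetical two-urn partition (the probabilities below are made up for illustration):

```python
# Partition: pick urn 1 with probability 0.3, urn 2 with probability 0.7
p_b = [0.3, 0.7]            # P(B_i) — hypothetical
p_a_given_b = [0.8, 0.4]    # P(A | B_i), e.g. "draw a red ball" — hypothetical

# Law of total probability: P(A) = Σ P(A|B_i) · P(B_i)
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))
# 0.3 · 0.8 + 0.7 · 0.4 = 0.24 + 0.28 = 0.52
assert abs(p_a - 0.52) < 1e-12
```

Marginalising a latent variable in a Bayesian model is exactly this sum (or an integral) over the partition induced by the latent variable's values.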
ML Connections
- Naive Bayes: assumes conditional feature independence: \(P(x_1, \dots, x_n \mid y) = \prod_i P(x_i \mid y)\)
- Logistic Regression: models \(P(Y=1 \mid X)\) directly
- Random Forest: each tree gives a probability estimate; the forest averages them
Calibration: is \(\hat{P}(Y=1 \mid X=x)\) truly the probability of the event?
# In sklearn, predict_proba() returns probability estimates
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)  # X_train, y_train: your training data
probs = model.predict_proba(X_test)[:, 1]  # estimated P(Y=1 | X)
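One simple way to probe calibration is to bin the predicted probabilities and compare the mean prediction in each bin with the empirical positive rate. A self-contained sketch on simulated data from a perfectly calibrated "model" (labels drawn as Bernoulli of the predicted probability):

```python
import random

random.seed(0)

# Simulate a perfectly calibrated predictor: for each case, predict p
# and draw the label as Bernoulli(p)
preds, labels = [], []
for _ in range(100_000):
    p = random.random()
    preds.append(p)
    labels.append(1 if random.random() < p else 0)

# Reliability check in one bin: among cases with 0.4 <= p < 0.6,
# the empirical positive rate should be close to the mean predicted p
in_bin = [(p, y) for p, y in zip(preds, labels) if 0.4 <= p < 0.6]
mean_pred = sum(p for p, _ in in_bin) / len(in_bin)
frac_pos = sum(y for _, y in in_bin) / len(in_bin)
assert abs(mean_pred - frac_pos) < 0.02
```

For a real classifier you would apply the same binning to held-out predictions; a large gap between mean prediction and positive rate in a bin signals miscalibration.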
Previous: #5 Covariance & Correlation | Next: #7 Conditional Probability
