Statistics for ML #100 — A/B Testing & Experimentation Design
Post #100/100 in the Statistics for ML series — Md Salek Miah, Statistician & ML Researcher, SUST, Bangladesh.
A/B Testing is the gold standard for causal inference in experimental settings. It is hypothesis testing applied to business, clinical, and policy decisions.
The Framework
- Define hypothesis: H₀: μ_A = μ_B vs H₁: μ_A ≠ μ_B
- Calculate required sample size (power analysis)
- Randomise units to treatments A and B
- Run experiment — collect data
- Analyse — t-test, z-test, or Bayesian approach
- Decide — reject or fail to reject H₀
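For a binary success metric, the analysis step above can be sketched as a hand-rolled two-proportion z-test. The counts below are hypothetical, and `two_proportion_ztest` is an illustrative helper, not a library function:

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(x_a, n_a, x_b, n_b):
    """Two-sided z-test of H0: p_A = p_B using the pooled proportion."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Hypothetical counts: 670/1000 successes in A, 720/1000 in B
z, p = two_proportion_ztest(670, 1000, 720, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these counts the 5-point gap clears the α = 0.05 threshold, so H₀ would be rejected.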
Sample Size Formula
\[n = \frac{2(z_{\alpha/2} + z_\beta)^2 \sigma^2}{\delta^2}\]
where δ is the minimum detectable effect, σ² the outcome variance, and z_β the standard normal quantile corresponding to power 1 − β (β is the Type II error rate, not the power itself).
Common Pitfalls
- Peeking: Stopping early when significant — inflates Type I error
- Multiple metrics: Testing many outcomes → Bonferroni correction needed
- Network effects: Spillover between treatment/control (SUTVA violation)
- Novelty effect: Short-run engagement boost from any change
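The peeking pitfall is easy to demonstrate by simulation: run A/A tests (no true effect) and stop at the first of several interim looks where p < α. A minimal sketch — the simulation sizes and the `peeking_false_positive_rate` helper are my own illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def peeking_false_positive_rate(n_sims=2000, n_max=1000, checks=10, alpha=0.05):
    """Fraction of A/A experiments (H0 true) that ever cross p < alpha
    across `checks` interim looks."""
    checkpoints = np.linspace(n_max // checks, n_max, checks, dtype=int)
    false_positives = 0
    for _ in range(n_sims):
        a = rng.normal(size=n_max)
        b = rng.normal(size=n_max)  # same distribution: no true effect
        for n in checkpoints:
            _, p = stats.ttest_ind(a[:n], b[:n])
            if p < alpha:  # analyst "peeks" and stops at significance
                false_positives += 1
                break
    return false_positives / n_sims

rate = peeking_false_positive_rate()
print(f"Type I error with 10 peeks: {rate:.3f} (nominal 0.05)")
```

The realised error rate lands well above the nominal 5%, which is why interim looks require sequential-testing corrections.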
from scipy import stats
import numpy as np

np.random.seed(42)  # reproducible simulation

# Sample size per group for a two-sample test on a standardised effect
def required_n(effect_size, alpha=0.05, power=0.80):
    z_alpha = stats.norm.ppf(1 - alpha/2)
    z_beta = stats.norm.ppf(power)
    return int(np.ceil(2 * (z_alpha + z_beta)**2 / effect_size**2))

# For a ~5% relative lift in SBA rate (0.67 → 0.705)
p1, p2 = 0.67, 0.705
effect = (p2 - p1) / np.sqrt((p1*(1-p1) + p2*(1-p2))/2)  # standardised difference (a Cohen's d analogue)
n = required_n(effect)
print(f'Required n per group = {n}')

# Analyse simulated results (a t-test on 0/1 outcomes approximates the two-proportion z-test)
control_sba = np.random.binomial(1, 0.67, n)
treatment_sba = np.random.binomial(1, 0.705, n)
t_stat, p_val = stats.ttest_ind(treatment_sba, control_sba)
print(f't-stat = {t_stat:.3f}, p-value = {p_val:.4f}')
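The Bayesian approach mentioned in the framework can be sketched with a Beta-Binomial conjugate model. The success counts below are hypothetical, chosen to mirror the 0.67 → 0.705 scenario above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcomes: 670/1000 successes in control, 720/1000 in treatment
succ_a, n_a = 670, 1000
succ_b, n_b = 720, 1000

# Beta(1, 1) uniform prior; the posterior is Beta(1 + successes, 1 + failures)
post_a = rng.beta(1 + succ_a, 1 + n_a - succ_a, size=100_000)
post_b = rng.beta(1 + succ_b, 1 + n_b - succ_b, size=100_000)

# Monte Carlo summaries of the posterior difference
prob_b_better = (post_b > post_a).mean()
lift = (post_b - post_a).mean()
print(f"P(treatment > control) = {prob_b_better:.3f}")
print(f"Posterior mean lift    = {lift:.4f}")
```

Instead of a p-value, this yields a direct probability statement about which variant is better, which is often easier to act on.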
🎉 Series Complete!
Congratulations on reaching the end of the Statistics for ML — 100 Posts series!
This series covered everything from basic data types to causal inference, designed for statisticians, epidemiologists, and ML practitioners working on real-world health data.
What We Covered
- 📊 Part 1 (Posts 1–20): Foundations — data types, distributions, CLT, sampling
- 📈 Part 2 (Posts 21–35): Probability distributions — Bernoulli to multivariate Normal
- 🔬 Part 3 (Posts 36–50): Statistical inference — MLE, hypothesis testing, ANOVA
- 📉 Part 4 (Posts 51–63): Regression — OLS to regularisation
- 🤖 Part 5 (Posts 64–78): ML concepts — bias-variance, cross-validation, ROC
- 🧠 Part 6 (Posts 79–87): Bayesian & probabilistic ML
- 🔥 Part 7 (Posts 88–96): Deep learning foundations
- 🌍 Part 8 (Posts 97–100): Advanced — time series, survival, causal inference, A/B testing
Connect With Me
Md Salek Miah | saleksta@gmail.com
