Big Picture
Tutorial 3 verified that the feature table looks reasonable. Distributions were centered correctly. No obvious pipeline bugs. But "looks reasonable" is not the same as "is provably random."
This tutorial introduces formal hypothesis testing. You will learn to test whether observed data matches what a random process should produce. The tool: the chi-squared test.
The chi-squared test compares observed frequencies to expected frequencies. If they match closely, the data passes. If they diverge too much, something non-random might be happening.
1. Where We Are in the Journey
Tutorial 1 cleaned the data. Tutorial 2 built features. Tutorial 3 performed exploratory checks. All of that was preparation.
Tutorial 4 is where the science begins. We test a specific claim: "This lottery is random." That claim is the null hypothesis.
If the lottery is truly random, we know what the distributions should look like. Odd counts should follow a binomial distribution. Decade buckets should be roughly uniform. We can compute those expectations precisely.
The chi-squared test measures how far the observed data deviates from those expectations. If the deviation is small, randomness is plausible. If the deviation is large, something else might be going on.
The workflow: raw data → validated data → feature table → EDA → hypothesis testing → advanced models.
We are at step five.
2. The Null Hypothesis: What "Random" Means
The null hypothesis is the default assumption. It is the claim we test against. For lottery analysis, the null hypothesis is simple: the lottery is random.
If the null hypothesis is true, we can predict what the data should look like. For example:

- The odd count per draw should follow a binomial distribution.
- The decade buckets should be hit roughly uniformly (adjusted for bucket size).
These are testable predictions. If the data matches them, the null hypothesis holds. If the data deviates significantly, we have evidence against randomness.
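These expectations can be computed directly. As a small sketch, here is the binomial prediction for odd counts, assuming each of the 5 balls is odd with probability 0.5 (a simplification; the true odds fraction depends on the number pool):

```python
from scipy.stats import binom

# If each of 5 balls is odd with probability 0.5, the number of odd
# balls per draw follows Binomial(n=5, p=0.5).
probs = binom.pmf(range(6), n=5, p=0.5)

for k, p in enumerate(probs):
    print(f"P({k} odd balls) = {p:.4f}")

# The six probabilities cover every possible outcome, so they sum to 1.
print(f"Total: {probs.sum():.4f}")
```

Multiplying each probability by the total number of draws gives the expected frequency for that category.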
3. The Chi-Squared Test: Measuring Deviation
The chi-squared test compares observed frequencies (what actually happened) to expected frequencies (what should happen if H₀ is true).
The statistic is χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, where Oᵢ is the observed count and Eᵢ the expected count in category i. Squaring the differences keeps positive and negative deviations from canceling out, and dividing by the expected count normalizes each term so that categories with different frequencies are weighted fairly.
- If observed ≈ expected → χ² is small → data matches null hypothesis
- If observed ≠ expected → χ² is large → data deviates from null hypothesis
The p-value tells you how likely it is to see a χ² value this large (or larger) if the null hypothesis is true. Small p-value = unlikely under randomness = evidence against H₀.
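As a minimal illustration with made-up data (60 rolls of a six-sided die, not lottery draws):

```python
from scipy.stats import chisquare

# 60 rolls of a die: under H0 (fair die) each face is expected 10 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]

chi2_stat, p_value = chisquare(observed, expected)
print(f"chi2 = {chi2_stat:.3f}, p = {p_value:.3f}")  # chi2 = 1.000

# A large p-value means deviations this size are routine for a fair die,
# so we fail to reject H0.
```

Note that `chisquare` requires the observed and expected totals to agree; both sum to 60 here.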
4. Worked Example: Testing Odd Count Distribution
Let's test whether the odd count distribution matches the binomial expectation. If you visualized this result, it would look like the "GOOD: Peaks at 2-3" chart from our Tutorial 3 reference set.
H₀: Odd counts follow a binomial distribution with n=5, p=0.5

Observed frequencies:
- 1 odd ball: 195 draws
- 2 odd balls: 403 draws
- 3 odd balls: 420 draws
- 4 odd balls: 211 draws
- 5 odd balls: 42 draws

Total draws = 1,269. Expected probabilities from binomial: [0.029, 0.153, 0.318, 0.330, 0.170, 0.034]

Expected frequencies:
- 1 odd ball: 194.2 draws
- 2 odd balls: 403.5 draws
- 3 odd balls: 418.8 draws
- 4 odd balls: 215.7 draws
- 5 odd balls: 43.1 draws
Degrees of freedom = 6 - 1 = 5
Using scipy: p-value ≈ 0.991
p = 0.991 is much larger than 0.05
Conclusion: Fail to reject H₀. The odd count distribution is consistent with randomness.
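The p-value in this example is the upper-tail probability of the χ² distribution with 5 degrees of freedom. Given any statistic and degrees of freedom, that final step is the survival function (the statistic value below is hypothetical, for illustration only):

```python
from scipy.stats import chi2

df = 5           # 6 categories - 1
chi2_stat = 0.7  # hypothetical chi-squared statistic

# p-value = probability of seeing a statistic at least this large under H0
p_value = chi2.sf(chi2_stat, df)
print(f"p = {p_value:.3f}")

# The larger the statistic, the smaller the tail probability.
assert chi2.sf(5.0, df) < chi2.sf(0.7, df)
```

`scipy.stats.chisquare` performs this lookup internally, which is why it returns both the statistic and the p-value.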
5. Code Roadmap: What the Script Does (and Why This Order)
The script performs four main steps:
1. Load the data: read the feature table created in Tutorial 2.
2. Test odd counts: count how many draws have 0, 1, 2, 3, 4, or 5 odd balls, compare to the theoretical binomial distribution, and compute the chi-squared statistic and p-value.
3. Test decade uniformity: count how many total hits each decade bucket received, compare to the uniform expectation (adjusted for decade 0-9 having only 9 numbers), and compute the chi-squared statistic and p-value.
4. Report: display both test results with p-values and conclusions about whether to reject the null hypothesis.
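The size adjustment in step 3 deserves a closer look: under uniformity, each decade's expected share is proportional to how many numbers it contains. A sketch of that calculation (the total of 6,345 hits is hypothetical, i.e. 1,269 draws × 5 balls):

```python
# Powerball white balls run 1-69, so the first decade holds only
# 9 numbers (1-9) while every other decade holds 10.
decade_sizes = [9, 10, 10, 10, 10, 10, 10]
total_size = sum(decade_sizes)  # 69 numbers overall

total_hits = 6345  # hypothetical: 1,269 draws x 5 balls each

# Each decade's expected hits = (its share of the number pool) x total hits
expected = [size / total_size * total_hits for size in decade_sizes]
labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69']
for label, e in zip(labels, expected):
    print(f"{label}: {e:.1f} expected hits")
```

Skipping this adjustment would make the 0-9 bucket look suspiciously cold even in a perfectly random lottery.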
6. Python Implementation
Here is the complete script. It runs two chi-squared tests: one for odd count distribution and one for decade uniformity.
"""Tutorial 4: Test if the lottery is behaving randomly using chi-squared tests"""
import pandas as pd
import numpy as np
from scipy.stats import chisquare
# --- Load feature table ---
print("Loading feature table from Tutorial 2...")
features = pd.read_parquet('data/processed/features_powerball.parquet')
print(f"Loaded {len(features)} draws\n")
# --- Test 1: Odd Count Distribution ---
print("Test 1: Odd Count Distribution")
print("-" * 40)
# Count how many draws have 0, 1, 2, 3, 4, 5 odd balls
observed_odd_counts = features['odd_count'].value_counts().sort_index()
observed_frequencies = [observed_odd_counts.get(i, 0) for i in range(6)]
print("\nObserved frequencies:")
for i, count in enumerate(observed_frequencies):
print(f" {i} odd balls: {count} draws")
# Calculate expected frequencies under the null hypothesis
# (binomial distribution with n=5, p=0.5)
total_draws = len(features)
expected_probabilities = [0.029, 0.153, 0.318, 0.330, 0.170, 0.034]
expected_frequencies = [prob * total_draws for prob in expected_probabilities]
print("\nExpected frequencies (if random):")
for i, count in enumerate(expected_frequencies):
print(f" {i} odd balls: {count:.1f} draws")
# Perform chi-squared test
chi2_stat, p_value = chisquare(observed_frequencies, expected_frequencies)
print(f"\nChi-squared statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {len(observed_frequencies) - 1}")
# Interpret the result
alpha = 0.05
if p_value < alpha:
print(f"\nConclusion: REJECT null hypothesis (p < {alpha})")
print("The odd count distribution does NOT match random expectation.")
else:
print(f"\nConclusion: FAIL TO REJECT null hypothesis (p >= {alpha})")
print("The odd count distribution is consistent with randomness.")
# --- Test 2: Decade Distribution ---
print("\n\nTest 2: Decade Distribution (Uniformity)")
print("-" * 40)
decade_columns = ['decade_00_09', 'decade_10_19', 'decade_20_29',
'decade_30_39', 'decade_40_49', 'decade_50_59', 'decade_60_69']
# Count total hits in each decade
observed_decade_counts = [features[col].sum() for col in decade_columns]
print("\nObserved decade counts:")
decades = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69']
for decade, count in zip(decades, observed_decade_counts):
print(f" {decade}: {count} hits")
# Under uniformity, each decade should get roughly the same number of hits
# But decade 0-9 only has 9 numbers vs 10 for others, so adjust expected values
decade_sizes = [9, 10, 10, 10, 10, 10, 10] # Number of balls in each decade
total_size = sum(decade_sizes)
total_hits = sum(observed_decade_counts)
expected_decade_counts = [(size / total_size) * total_hits for size in decade_sizes]
print("\nExpected decade counts (if uniform):")
for decade, count in zip(decades, expected_decade_counts):
print(f" {decade}: {count:.1f} hits")
# Perform chi-squared test
chi2_stat, p_value = chisquare(observed_decade_counts, expected_decade_counts)
print(f"\nChi-squared statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {len(observed_decade_counts) - 1}")
# Interpret the result
if p_value < alpha:
print(f"\nConclusion: REJECT null hypothesis (p < {alpha})")
print("The decade distribution is NOT uniform.")
else:
print(f"\nConclusion: FAIL TO REJECT null hypothesis (p >= {alpha})")
print("The decade distribution is consistent with uniformity.")
# --- Summary ---
print("\n\nSummary")
print("-" * 40)
print("Both tests used alpha = 0.05 (5% significance level)")
print("If p-value < 0.05, we reject the null hypothesis")
print("If p-value >= 0.05, we fail to reject (data looks random)\n")7. How to Run the Script
- You must have run Tutorial 2 first (to create features_powerball.parquet)
- Install scipy: pip install scipy

```shell
# Windows (PowerShell)
cd C:\path\to\tutorials
python tutorial4_probability_tests.py

# Mac / Linux (Terminal)
cd /path/to/tutorials
python3 tutorial4_probability_tests.py
```

8. How to Interpret the Results
If p-value < 0.05:
- Reject H₀ (the null hypothesis)
- The data does NOT look random
- Evidence of structure, bias, or pattern
- Worth investigating further

If p-value >= 0.05:
- Fail to reject H₀
- The data is consistent with randomness
- No evidence of non-random patterns
- Does not prove randomness, just says "no red flags"
9. What You Now Understand (and why it matters later)
You know how to formulate a null hypothesis. You know how to compute a chi-squared statistic and interpret a p-value. You can test whether observed frequencies match theoretical expectations.
This establishes the baseline. If chi-squared tests pass, the lottery looks random at the level of basic frequency counts. Any pattern-detection method you build later needs to beat this baseline to be worth attention.
Tutorial 5 will introduce Bayesian inference and posterior distributions. That framework lets you ask more nuanced questions than "random or not?" It lets you quantify uncertainty and update beliefs as evidence accumulates.