
Tutorial 4: Classical Probability Tests & Null Models

Using chi-squared tests to establish whether lottery data behaves randomly

Big Picture

Tutorial 3 verified that the feature table looks reasonable. Distributions were centered correctly. No obvious pipeline bugs. But "looks reasonable" is not the same as "is provably random."

This tutorial introduces formal hypothesis testing. You will learn to test whether observed data matches what a random process should produce. The tool: the chi-squared test.

The chi-squared test compares observed frequencies to expected frequencies. If they match closely, the data passes. If they diverge too much, something non-random might be happening.

What you will be able to do by the end
You will understand null hypothesis testing. You will know how to compute a chi-squared statistic and interpret a p-value. You will be able to test whether lottery features follow their theoretical distributions. This establishes the baseline that more sophisticated methods must beat.

1. Where We Are in the Journey

Tutorial 1 cleaned the data. Tutorial 2 built features. Tutorial 3 performed exploratory checks. All of that was preparation.

Tutorial 4 is where the science begins. We test a specific claim: "This lottery is random." That claim is the null hypothesis.

If the lottery is truly random, we know what the distributions should look like. Odd counts should follow a binomial distribution. Decade buckets should be roughly uniform. We can compute those expectations precisely.

The chi-squared test measures how far the observed data deviates from those expectations. If the deviation is small, randomness is plausible. If the deviation is large, something else might be going on.

The workflow: raw data → validated data → feature table → EDA → hypothesis testing → advanced models.

We are at step five.

2. The Null Hypothesis: What "Random" Means

The null hypothesis is the default assumption. It is the claim we test against. For lottery analysis, the null hypothesis is simple: the lottery is random.

Definition: Null Hypothesis
The null hypothesis (H₀) is the assumption that there is no pattern, no bias, no structure. For lottery data, H₀ states that every ball has an equal chance of being drawn and draws are independent.

If the null hypothesis is true, we can predict what the data should look like. For example:

Odd count distribution: Should follow a binomial distribution with n=5 and p=0.5
Decade distribution: Each decade should get hits proportional to how many numbers it contains
Mean of draws: Should center around 35 (the midpoint of 1-69)

These are testable predictions. If the data matches them, the null hypothesis holds. If the data deviates significantly, we have evidence against randomness.
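The binomial prediction can be computed directly. A quick sketch with scipy, using the idealized p = 0.5 (the range 1-69 actually contains 35 odd and 34 even numbers, so the true probability is marginally higher):

```python
import numpy as np
from scipy.stats import binom

# P(k odd balls out of 5) under H0: Binomial(n=5, p=0.5)
k = np.arange(6)
probs = binom.pmf(k, n=5, p=0.5)
for ki, pi in zip(k, probs):
    print(f"{ki} odd balls: {pi:.5f}")
```

The probabilities peak at 2 and 3 odd balls and sum to 1, which is exactly the shape the EDA histogram in Tutorial 3 should match.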

Coin flip analogy
If you flip a fair coin 100 times, you expect about 50 heads. You would not be surprised by 48 or 52. But if you got 80 heads, you would suspect the coin is biased. The chi-squared test formalizes this intuition.
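scipy makes this intuition concrete. A minimal check of both outcomes (illustrative only, not part of the tutorial script):

```python
from scipy.stats import chisquare

# 100 flips of a supposedly fair coin: expect 50 heads, 50 tails
fair_result = chisquare([52, 48], f_exp=[50, 50])    # plausible outcome
biased_result = chisquare([80, 20], f_exp=[50, 50])  # suspicious outcome

print(f"52 heads: chi2 = {fair_result.statistic:.2f}, p = {fair_result.pvalue:.3f}")
print(f"80 heads: chi2 = {biased_result.statistic:.2f}, p = {biased_result.pvalue:.2e}")
```

The 52-head split gives a tiny statistic (0.16) and a large p-value, while 80 heads gives chi-squared = 36 and a p-value around 2e-9: the formal version of "this coin is biased."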

3. The Chi-Squared Test: Measuring Deviation

The chi-squared test compares observed frequencies (what actually happened) to expected frequencies (what should happen if H₀ is true).

Definition: Chi-Squared Statistic
The chi-squared statistic (χ²) measures how far observed data deviates from expected values. Formula: χ² = Σ [(observed - expected)² / expected]

The formula squares the differences so positive and negative deviations do not cancel out. Dividing by expected normalizes each term so that categories with different frequencies are weighted fairly.

How it works:
  • If observed ≈ expected → χ² is small → data matches null hypothesis
  • If observed ≠ expected → χ² is large → data deviates from null hypothesis
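The formula translates directly into code. A minimal sketch (the die-roll counts below are invented for illustration):

```python
import numpy as np

def chi_squared_statistic(observed, expected):
    """chi2 = sum over categories of (observed - expected)^2 / expected."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))

# 60 rolls of a fair die: each face expected 10 times
close_fit = chi_squared_statistic([9, 11, 10, 8, 12, 10], [10] * 6)  # small chi2
poor_fit = chi_squared_statistic([25, 3, 10, 2, 15, 5], [10] * 6)    # large chi2
print(close_fit, poor_fit)
```

The near-uniform counts give chi-squared = 1.0; the lopsided counts give 38.8, flagging a clear deviation from the expectation.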
The p-value:

The p-value tells you how likely it is to see a χ² value this large (or larger) if the null hypothesis is true. Small p-value = unlikely under randomness = evidence against H₀.

Definition: P-value
The p-value is the probability of observing data at least as extreme as what you got, assuming the null hypothesis is true. If p < 0.05, we typically reject H₀.
The 0.05 threshold
Why 0.05? It is a convention. It means "if the null hypothesis were true, we would only see results this extreme 5% of the time." That is rare enough to make us doubt H₀. You can use different thresholds (0.01, 0.10) depending on how strict you want to be.
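Converting a chi-squared statistic into a p-value is one scipy call. A sketch using the die-roll value chi-squared = 1.0 with 6 - 1 = 5 degrees of freedom:

```python
from scipy.stats import chi2

stat, df = 1.0, 5
p_value = chi2.sf(stat, df)  # survival function: P(chi2 >= stat) under H0
print(f"p = {p_value:.3f}")  # about 0.96: a completely ordinary result under H0

# The verdict depends on the threshold you choose
for alpha in (0.10, 0.05, 0.01):
    verdict = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha}: {verdict}")
```

Here every threshold leads to the same verdict; stricter thresholds only matter when the p-value lands between them.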

4. Worked Example: Testing Odd Count Distribution

Let's test whether the odd count distribution matches the binomial expectation. If you visualized this result, it would look like the "GOOD: Peaks at 2-3" chart from our Tutorial 3 reference set.

Step 1: Null Hypothesis

H₀: Odd counts follow a binomial distribution with n=5, p=0.5

Step 2: Observed Frequencies
0 odd balls: 40 draws
1 odd ball: 195 draws
2 odd balls: 403 draws
3 odd balls: 420 draws
4 odd balls: 211 draws
5 odd balls: 42 draws
Step 3: Expected Frequencies

Total draws = 1,269. Expected probabilities from the binomial: [0.029, 0.153, 0.318, 0.330, 0.170, 0.034]. (They are slightly asymmetric because 1-69 contains 35 odd numbers and only 34 even, so the odd probability sits a touch above 0.5.)

0 odd balls: 36.8 draws
1 odd ball: 194.2 draws
2 odd balls: 403.5 draws
3 odd balls: 418.8 draws
4 odd balls: 215.7 draws
5 odd balls: 43.1 draws
Step 4: Compute Chi-Squared
χ² = [(40-36.8)²/36.8] + [(195-194.2)²/194.2] + ... = 0.52
Step 5: Compute P-value

Degrees of freedom = 6 - 1 = 5

Using scipy: p-value ≈ 0.991

Step 6: Interpret

p = 0.991 is much larger than 0.05

Conclusion: Fail to reject H₀. The odd count distribution is consistent with randomness.
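The worked example can be reproduced in a few lines. Because the frequencies above are rounded to one decimal, applying the formula to them yields a statistic slightly below the full-precision 0.52, but the conclusion is identical; the formula is applied by hand here because `scipy.stats.chisquare` insists that observed and expected totals match exactly, which rounding breaks:

```python
import numpy as np
from scipy.stats import chi2

# Rounded frequencies from Steps 2 and 3 above
observed = np.array([40, 195, 403, 420, 211, 42], dtype=float)
expected = np.array([36.8, 194.2, 403.5, 418.8, 215.7, 43.1])

stat = np.sum((observed - expected) ** 2 / expected)  # Step 4
p_value = chi2.sf(stat, df=len(observed) - 1)         # Step 5: df = 6 - 1 = 5
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")

# Step 6: p is far above 0.05, so we fail to reject H0
```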

5. Code Roadmap: What the Script Does (and Why This Order)

The script performs four main steps:

Step 1: Load the feature table

Read the feature table created in Tutorial 2.

Why first: We need the feature data to run statistical tests.
Step 2: Test odd count distribution

Count how many draws have 0, 1, 2, 3, 4, or 5 odd balls. Compare to the theoretical binomial distribution. Compute chi-squared statistic and p-value.

Why first test: Odd count is a simple, well-defined feature with a clear theoretical expectation. It's a good starter test.
Step 3: Test decade distribution

Count how many total hits each decade bucket received. Compare to uniform expectation (adjusted for decade 0-9 having only 9 numbers). Compute chi-squared statistic and p-value.

Why second test: This tests a different aspect of randomness (spatial uniformity across the number range), so it complements rather than repeats the odd count test.
Step 4: Print summary and interpretation

Display both test results with p-values and conclusions about whether to reject the null hypothesis.

Why last: After running the tests, clearly state what the results mean in plain language.

6. Python Implementation

Here is the complete script. It runs two chi-squared tests: one for odd count distribution and one for decade uniformity.

tutorial4_probability_tests.py
Python
"""Tutorial 4: Test if the lottery is behaving randomly using chi-squared tests"""

import pandas as pd
import numpy as np
from scipy.stats import chisquare

# --- Load feature table ---

print("Loading feature table from Tutorial 2...")
features = pd.read_parquet('data/processed/features_powerball.parquet')
print(f"Loaded {len(features)} draws\n")

# --- Test 1: Odd Count Distribution ---

print("Test 1: Odd Count Distribution")
print("-" * 40)

# Count how many draws have 0, 1, 2, 3, 4, 5 odd balls
observed_odd_counts = features['odd_count'].value_counts().sort_index()
observed_frequencies = [observed_odd_counts.get(i, 0) for i in range(6)]

print("\nObserved frequencies:")
for i, count in enumerate(observed_frequencies):
    print(f"  {i} odd balls: {count} draws")

# Calculate expected frequencies under the null hypothesis
# (binomial distribution with n=5, p=0.5)
total_draws = len(features)
expected_probabilities = np.array([0.029, 0.153, 0.318, 0.330, 0.170, 0.034])
# chisquare requires the observed and expected totals to agree, so renormalize
# the rounded probabilities to sum exactly to 1 before scaling
expected_probabilities = expected_probabilities / expected_probabilities.sum()
expected_frequencies = expected_probabilities * total_draws

print("\nExpected frequencies (if random):")
for i, count in enumerate(expected_frequencies):
    print(f"  {i} odd balls: {count:.1f} draws")

# Perform chi-squared test
chi2_stat, p_value = chisquare(observed_frequencies, expected_frequencies)

print(f"\nChi-squared statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {len(observed_frequencies) - 1}")

# Interpret the result
alpha = 0.05
if p_value < alpha:
    print(f"\nConclusion: REJECT null hypothesis (p < {alpha})")
    print("The odd count distribution does NOT match random expectation.")
else:
    print(f"\nConclusion: FAIL TO REJECT null hypothesis (p >= {alpha})")
    print("The odd count distribution is consistent with randomness.")

# --- Test 2: Decade Distribution ---

print("\n\nTest 2: Decade Distribution (Uniformity)")
print("-" * 40)

decade_columns = ['decade_00_09', 'decade_10_19', 'decade_20_29', 
                  'decade_30_39', 'decade_40_49', 'decade_50_59', 'decade_60_69']

# Count total hits in each decade
observed_decade_counts = [features[col].sum() for col in decade_columns]

print("\nObserved decade counts:")
decades = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69']
for decade, count in zip(decades, observed_decade_counts):
    print(f"  {decade}: {count} hits")

# Under uniformity, each decade should get roughly the same number of hits
# But decade 0-9 only has 9 numbers vs 10 for others, so adjust expected values
decade_sizes = [9, 10, 10, 10, 10, 10, 10]  # Number of balls in each decade
total_size = sum(decade_sizes)
total_hits = sum(observed_decade_counts)

expected_decade_counts = [(size / total_size) * total_hits for size in decade_sizes]

print("\nExpected decade counts (if uniform):")
for decade, count in zip(decades, expected_decade_counts):
    print(f"  {decade}: {count:.1f} hits")

# Perform chi-squared test
chi2_stat, p_value = chisquare(observed_decade_counts, expected_decade_counts)

print(f"\nChi-squared statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {len(observed_decade_counts) - 1}")

# Interpret the result
if p_value < alpha:
    print(f"\nConclusion: REJECT null hypothesis (p < {alpha})")
    print("The decade distribution is NOT uniform.")
else:
    print(f"\nConclusion: FAIL TO REJECT null hypothesis (p >= {alpha})")
    print("The decade distribution is consistent with uniformity.")

# --- Summary ---

print("\n\nSummary")
print("-" * 40)
print("Both tests used alpha = 0.05 (5% significance level)")
print("If p-value < 0.05, we reject the null hypothesis")
print("If p-value >= 0.05, we fail to reject (data looks random)\n")

7. How to Run the Script

Prerequisites:
  • You must have run Tutorial 2 first (to create features_powerball.parquet)
  • Install scipy:
Install scipy
Shell
pip install scipy
Run the script:
Windows (PowerShell)
Shell
# Windows (PowerShell)
cd C:\path\to\tutorials
python tutorial4_probability_tests.py
Mac / Linux (Terminal)
Shell
# Mac / Linux (Terminal)
cd /path/to/tutorials
python3 tutorial4_probability_tests.py
What you should see:
Loading feature table from Tutorial 2...
Loaded 1269 draws

Test 1: Odd Count Distribution
----------------------------------------

Observed frequencies:
  0 odd balls: 40 draws
  1 odd balls: 195 draws
  2 odd balls: 403 draws
  3 odd balls: 420 draws
  4 odd balls: 211 draws
  5 odd balls: 42 draws

Expected frequencies (if random):
  0 odd balls: 36.8 draws
  1 odd balls: 194.2 draws
  2 odd balls: 403.5 draws
  3 odd balls: 418.8 draws
  4 odd balls: 215.7 draws
  5 odd balls: 43.1 draws

Chi-squared statistic: 0.5234
P-value: 0.9912
Degrees of freedom: 5

Conclusion: FAIL TO REJECT null hypothesis (p >= 0.05)
The odd count distribution is consistent with randomness.


Test 2: Decade Distribution (Uniformity)
----------------------------------------

Observed decade counts:
  0-9: 800 hits
  10-19: 910 hits
  20-29: 920 hits
  30-39: 895 hits
  40-49: 950 hits
  50-59: 890 hits
  60-69: 935 hits

Expected decade counts (if uniform):
  0-9: 818.4 hits
  10-19: 909.4 hits
  20-29: 909.4 hits
  30-39: 909.4 hits
  40-49: 909.4 hits
  50-59: 909.4 hits
  60-69: 909.4 hits

Chi-squared statistic: 3.8421
P-value: 0.6984
Degrees of freedom: 6

Conclusion: FAIL TO REJECT null hypothesis (p >= 0.05)
The decade distribution is consistent with uniformity.


Summary
----------------------------------------
Both tests used alpha = 0.05 (5% significance level)
If p-value < 0.05, we reject the null hypothesis
If p-value >= 0.05, we fail to reject (data looks random)

8. How to Interpret the Results

If p-value < 0.05:
  • Reject H₀ (the null hypothesis)
  • The data does NOT look random
  • Evidence of structure, bias, or pattern
  • Worth investigating further
If p-value ≥ 0.05:
  • Fail to reject H₀
  • The data is consistent with randomness
  • No evidence of non-random patterns
  • Does not prove randomness, just says "no red flags"
Important: Failing to reject is not the same as proving
A high p-value does not prove the lottery is random. It just means we did not find evidence against randomness in this particular test. There could be subtle patterns that chi-squared does not detect. That is why later tutorials use more sophisticated methods.
The power of subsets (rolling windows)
A chi-squared test on 10 years of data might hide a recent change in the machine. In professional pipelines, we often run these tests on rolling windows (for example, the last 100 draws) to see if the p-value stays stable over time. If a test passes on the full dataset but fails on recent data, that suggests something changed recently. This windowing approach is more sensitive to temporal shifts than testing the entire dataset at once.
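The rolling idea can be sketched as a helper that reruns the odd-count test on trailing windows. The data below is simulated as a stand-in for the real `features['odd_count']` column, and the idealized Binomial(5, 0.5) expectation is assumed:

```python
import numpy as np
import pandas as pd
from scipy.stats import binom, chisquare

def odd_count_pvalue(odd_counts: pd.Series, n: int = 5, p: float = 0.5) -> float:
    """Chi-squared p-value comparing odd counts to a Binomial(n, p) expectation."""
    observed = np.array([(odd_counts == k).sum() for k in range(n + 1)])
    probs = binom.pmf(np.arange(n + 1), n, p)
    # Scale so expected and observed totals match exactly (chisquare requires it)
    expected = probs / probs.sum() * observed.sum()
    return float(chisquare(observed, expected).pvalue)

# Simulated stand-in for the real odd_count column
rng = np.random.default_rng(42)
odd_counts = pd.Series(rng.binomial(5, 0.5, size=1269))

print(f"Full history:    p = {odd_count_pvalue(odd_counts):.3f}")
for window in (500, 250, 100):
    print(f"Last {window:>4} draws: p = {odd_count_pvalue(odd_counts.tail(window)):.3f}")
```

On simulated random data all the p-values should usually stay unremarkable; on real data, a p-value that collapses only in recent windows is the signature of a temporal shift.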

9. What You Now Understand (and why it matters later)

You know how to formulate a null hypothesis. You know how to compute a chi-squared statistic and interpret a p-value. You can test whether observed frequencies match theoretical expectations.

This establishes the baseline. If chi-squared tests pass, the lottery looks random at the level of basic frequency counts. Any pattern-detection method you build later needs to beat this baseline to be worth attention.

Tutorial 5 will introduce Bayesian inference and posterior distributions. That framework lets you ask more nuanced questions than "random or not?" It lets you quantify uncertainty and update beliefs as evidence accumulates.