Lottery Lab Research
A comprehensive analysis testing whether machine learning and advanced statistics can predict lottery numbers. Spoiler: they can't.
Methodologies & Techniques
Complete Research Modules
MODULE 1: Data Validation & Format Standardization
Before I could say anything serious about lottery randomness, I needed to make sure the data itself was clean, consistent, and comparable. Lotteries quietly change their formats over time: they adjust number ranges, tweak odds, and introduce new features. That creates a critical problem for any analysis that pretends all drawings are coming from the same process. If you pool drawings from incompatible formats, you silently break the mathematical assumptions behind almost every statistical test. This module walks through how I validated 3,358 drawings across more than 15 years, identified all format transitions, and then filtered down to analysis-ready datasets that actually meet the core requirement for everything that follows: each drawing can be treated as one sample from a single, stable underlying probability distribution.
A lot of casual lottery analysis starts with ideas like "hot" and "cold" numbers. It sounds simple: count how often each number appears, then look for outliers. The problem is that this only makes sense if every drawing in your dataset comes from the same game definition. If your Powerball dataset spans the October 2015 format change, when the game switched from 5/59 to 5/69 for the main numbers, any statistical test that treats all drawings as identically distributed is already invalid. Numbers 60 through 69 literally could not appear before 2015 because they were not part of the game. Treating "missing" 60s as evidence that 60 is cold is not insight, it is a modeling error.
This is not a minor technical nuisance. Most of the methods I use later in the project rely on the assumption that each observation is drawn from the same underlying probability distribution. In statistics this is the identically distributed part of the i.i.d. assumption. When a lottery changes its format, it changes the sample space and the probabilities. At that point, you are no longer looking at one process over time, you are looking at several different experiments stitched together.
Lotteries are attractive as testbeds for statistical methods precisely because the draws are designed to be as close to provably random as we can make them. At the same time, the long operational history of these games means their rules have evolved intentionally: number ranges have shifted, odds have been pushed up to grow jackpots, and new play options have been introduced. Those changes make the raw historical data heterogeneous. If I ignore that and treat the whole history as if it came from one fixed game, I would be building subtle but serious violations straight into the foundation of my analysis.
In total, I scraped 3,358 drawings going back to 2010: 1,858 for Powerball and 1,500 for Mega Millions. About 29 percent of those drawings come from older formats that are not compatible with the current game structure. That left me with a practical choice that affects the rest of the project: either use all the data and accept that some methods are applied outside their ideal assumptions, or restrict the analysis to format-compatible draws and lose almost a third of the observations.
I opted for the format-consistent subset. Having fewer draws that better match the model assumptions felt preferable to stretching those assumptions across incompatible formats. That choice carries through into every later module. The final analysis datasets contain only current-format drawings: 1,550 for Powerball (2015 to present) and 830 for Mega Millions (2017 to present). From this point forward, when I talk about "the Powerball dataset" or "the Mega Millions dataset," I mean these filtered, format-consistent subsets. Every hypothesis test, visualization, Bayesian model, and machine learning experiment in later modules is built on the assumption that these drawings are independent realizations from a single well-defined distribution. This module documents the steps I took to support that assumption.
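As a concrete illustration, here is a minimal sketch of that filtering step, assuming the scraped drawings sit in CSV files with a draw_date column (the paths and column name are hypothetical; the cutoff dates are the publicly announced format changes and should be re-verified against the scraped data):

```python
import pandas as pd

# Publicly announced current-format start dates (worth re-checking against the
# data): Powerball moved to 5/69 + 1/26 in October 2015, Mega Millions to
# 5/70 + 1/25 in October 2017.
FORMAT_CUTOFFS = {
    "powerball": "2015-10-07",
    "mega_millions": "2017-10-31",
}

def load_format_consistent(path, game):
    """Keep only drawings from the current game format, so every row can be
    treated as a sample from a single, stable probability distribution."""
    df = pd.read_csv(path, parse_dates=["draw_date"])
    cutoff = pd.Timestamp(FORMAT_CUTOFFS[game])
    return df[df["draw_date"] >= cutoff].reset_index(drop=True)

pb = load_format_consistent("data/powerball.csv", "powerball")          # ~1,550 rows
mm = load_format_consistent("data/mega_millions.csv", "mega_millions")  # ~830 rows
```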
MODULE 2: Advanced Feature Engineering for Lottery Drawing Analysis
This module takes the raw lottery drawings from Module 1 and turns them into a dense, high-dimensional feature space. For every Powerball and Mega Millions drawing, I engineered over one hundred numerical features that summarize distributional properties, temporal behavior, parity patterns, gaps since last appearance, co-occurrence structure, positional effects, seasonality, and jackpot context. The idea is straightforward: if there is any systematic structure in these games, it should show up somewhere in this 108-dimensional description of each draw. If the games are truly random, these features should behave like noise, and later modules are set up to test that assumption.
A single drawing by itself does not tell you much. Take an example like [12, 23, 34, 45, 56] with a Powerball of 15. It is just six integers on a page. What matters for research is how drawings behave over time: how often numbers show up, how tightly they bunch together, whether the mix of high and low numbers drifts, and whether anything about one drawing tells you the slightest thing about the next.
Feature engineering is how I make those questions precise. Instead of staring at the raw numbers, I convert each draw into a vector of 108 measurements: means, ranges, parity counts, gap statistics, rolling-window summaries, network-based co-occurrence metrics, calendar indicators, jackpot levels, and more. Each feature encodes a specific hypothesis people like to argue about in practice: hot and cold numbers, overdue numbers, lucky primes, suspicious streaks, calendar effects, and so on.
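To make this concrete, here is a minimal sketch of a handful of these per-drawing features (the function and feature names are my own illustration; the real pipeline computes 108 of them):

```python
import numpy as np

def basic_draw_features(white_balls, history):
    """A few illustrative per-drawing features: distributional summaries,
    parity, and gap-since-last-appearance ("overdue") statistics."""
    balls = np.sort(np.asarray(white_balls))
    last_seen = {}  # most recent draw index at which each ball appeared
    for i, draw in enumerate(history):
        for b in draw:
            last_seen[b] = i
    gaps = [len(history) - last_seen.get(int(b), -1) for b in balls]
    return {
        "mean": float(balls.mean()),
        "range": int(balls.max() - balls.min()),
        "odd_count": int(np.sum(balls % 2 == 1)),
        "low_half_count": int(np.sum(balls <= 35)),   # 1-35 vs 36-69 split
        "mean_gap_since_seen": float(np.mean(gaps)),  # "overdue" numbers
    }

history = [[3, 17, 29, 44, 61], [5, 12, 23, 47, 68]]  # toy prior draws
print(basic_draw_features([12, 23, 34, 45, 56], history))
```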
Under the null hypothesis that the draws are i.i.d. and uniform over the allowed numbers, none of these features should predict anything about the future. They should line up with their theoretical distributions, show no temporal structure, and collapse to pure noise once I try to forecast ahead. The purpose of Module 2 is to build a feature space rich enough that this null hypothesis is actually testable in Modules 3–8 instead of being assumed without evidence.
MODULE 3: Visualization & Exploratory Analysis
With 108 features per drawing, each lottery draw lives in a 108-dimensional space that no human can visualize directly. I used several complementary visualization tools—PCA, t-SNE, UMAP, and autoencoders—plus frequency heatmaps, gap-length distributions, and correlation analysis to look for structure: clusters, manifolds, trends over time, anything that might hint at exploitable patterns. Across all of these methods, the data consistently behaved like high-dimensional random noise. No clear clusters, no obvious hidden geometry, no persistent hot or cold regions—just the kind of uniform scatter you would expect from a well-behaved random process.
High-dimensional data is hard to reason about. With 108 engineered features per drawing, each Powerball or Mega Millions draw is a point in 108-dimensional space. Patterns that would be obvious in 2D or 3D—clusters, gradients, smooth curves—can be completely invisible once you move into very high dimensions.
In 2D you can glance at a scatterplot and see whether points form clumps or follow a line. In 108D, two things go wrong: distances between points all start to look similar, and our geometric intuitions stop working. This is the classic curse of dimensionality.
To get any intuition back, I used dimensionality reduction: compressing 108 features down to 2 or 3 while trying to preserve the important structure of the data. Different techniques preserve different notions of structure—linear vs. nonlinear, local neighborhoods vs. global shape, deterministic vs. stochastic. If there were any real structure hiding in the feature space, at least one of these methods should have revealed it.
The plan for this module was:
- PCA to look for low-dimensional linear structure.
- t-SNE and UMAP to look for nonlinear manifolds and clusters.
- Autoencoders to test whether a neural network could discover a compact latent representation.
- Classical visualizations (heatmaps, gap distributions, correlation plots, time-series) to sanity-check basic behaviors.
What follows is the story of running all of those tests and seeing them repeatedly line up with the randomness hypothesis.
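As a rough sketch of the reduction step, assuming the Module 2 features sit in a matrix X with one row per drawing (the simulated matrix below is only a placeholder for the real features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Placeholder for the real 108-feature matrix from Module 2
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.standard_normal((1550, 108)))

# Linear structure: variance captured by the leading components
pca = PCA(n_components=10).fit(X)
print("top-10 PCA explained variance:", round(pca.explained_variance_ratio_.sum(), 3))

# Nonlinear structure: 2-D embedding to eyeball for clusters or manifolds
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print("t-SNE embedding shape:", embedding.shape)
```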
MODULE 4: Bayesian Model Zoo
Frequentist tests are good at telling you when something is off, but they are terrible at expressing confidence when everything looks fine. A p-value can say “we didn’t see enough evidence to reject randomness,” but it can’t say “we have strong evidence that the lottery really is random.” Module 4 is my attempt to address that gap. Here I put together a Bayesian model zoo: five separate Bayesian models that attack lottery randomness from different angles. Across Powerball and Mega Millions, the results are surprisingly consistent: decisive evidence for uniformity, moderate evidence for temporal independence, strong validation of the physical drawing process through position effects, extremely strong pooling toward uniformity, and zero predictive power from 108 engineered features. This module serves as a statistical core for the entire project: it helps turn “probably random” into “here is quantitative, multi-angle evidence for randomness.”
A typical lottery analysis might end with something like: “we ran a chi-square test and failed to reject the null.” That is technically correct, but not very satisfying. It leaves open all the usual worries: maybe the test did not have enough power, maybe there is bias we just did not happen to see, or maybe uniformity is only one of many plausible explanations.
Bayesian inference lets me ask a stronger question: Given the data I actually observed, how much more likely is it that the lottery is random than that it is biased? That comparison is captured by the Bayes Factor (BF). BF = 10 means the data are ten times more likely under the null than under the alternative; BF = 0.1 means the data are ten times more likely under the alternative. Crucially, Bayes Factors allow me to quantify evidence for the null, not just against it.
Bayesian models also provide full posterior distributions over parameters instead of a single point estimate. That means I can compute credible intervals with direct probabilistic meaning (for example, “there is a 95% probability that θ lies in this interval”), visualize uncertainty, and reason about higher-level quantities such as the concentration parameter κ in the hierarchical Dirichlet model that summarizes how strongly the data want to pull everything toward uniformity.
In other words, Module 4 is where I can move from saying “we didn’t see evidence of non-randomness” to saying: “we saw overwhelming, quantified evidence for randomness, and here is exactly how strong that evidence is.”
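A minimal sketch of the uniformity Bayes factor, assuming per-ball frequency counts and a symmetric Dirichlet prior for the biased alternative (the prior choice and function names are my own simplification, not the full model zoo):

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_dirichlet(counts, alpha=1.0):
    """Log marginal likelihood of the counts under a 'biased' model whose cell
    probabilities get a symmetric Dirichlet(alpha) prior.
    (The multinomial coefficient is omitted; it cancels in the Bayes factor.)"""
    counts = np.asarray(counts, dtype=float)
    k, n = counts.size, counts.sum()
    return (gammaln(k * alpha) - gammaln(n + k * alpha)
            + np.sum(gammaln(counts + alpha) - gammaln(alpha)))

def log_uniform_likelihood(counts):
    """Log likelihood under exact uniformity p_k = 1/K (coefficient omitted)."""
    counts = np.asarray(counts, dtype=float)
    return -counts.sum() * np.log(counts.size)

def bf_uniform_vs_biased(counts, alpha=1.0):
    """BF > 1 means the counts favor uniformity over the biased alternative."""
    return np.exp(log_uniform_likelihood(counts) - log_marginal_dirichlet(counts, alpha))

# Per-ball counts from a simulated, Powerball-sized, format-consistent history
rng = np.random.default_rng(0)
counts = np.zeros(69, dtype=int)
for _ in range(1550):
    counts[rng.choice(69, size=5, replace=False)] += 1
print(f"BF (uniform vs biased): {bf_uniform_vs_biased(counts):.2e}")
```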
MODULE 5: Deep Learning Model Zoo
Bayesian methods in Module 4 found essentially zero predictive signal in the 108 engineered features. But what if the patterns are nonlinear, high-dimensional, or too complex for traditional statistics? In this module I tried to test that possibility with a deep-learning "model zoo": Variational Autoencoders, Normalizing Flows, Transformers with Bayesian uncertainty heads (four variants), DeepSets, and a Bayesian Neural Network. After cleaning up the target definition so every model predicts the same scalar response (the sum-of-z score used in Modules 4 and 6) and re-aggregating results, the story becomes very simple: all architectures achieve baseline-level performance (RMSE ≈ 0.84–1.0 on the standardized target) and well-calibrated uncertainty once I apply a single global scale factor. Even the more complex attention-based models agree with the humble historical-mean baseline: there is nothing to learn. This module also sets up a key methodological lesson—when results look too good, you first check the target definition and validation scheme (Module 11), not the architecture.
Module 4's Bayesian regression found R² < 0.02: the linear models explained essentially none of the variance in the target. Deep learning, in contrast, is built to mine structure in high-dimensional, nonlinear data. If lottery drawings hide subtle interactions between engineered features—gaps, recency, co-occurrence statistics, autoregressive summaries—this is where transformers, flows, and Bayesian neural nets should shine.
In principle, deep models could exploit many different kinds of structure:
- Attention mechanisms might discover long-range temporal dependencies that AR models miss.
- Variational autoencoders could compress drawings into a low-dimensional latent manifold if any hidden structure exists.
- Normalizing flows can represent extremely flexible probability distributions and would deviate from a simple Gaussian or uniform law if there were multi-modal clusters.
- DeepSets architectures are specifically designed for unordered sets and should exploit any set-level regularities in the five white balls.
- Bayesian neural networks provide full predictive distributions; if there were weak but real signals, they should reflect both improved accuracy and calibrated uncertainty.
All models were asked to predict the same scalar target: a standardized score related to the sum of the five white balls (the same target used in the corrected Module 4 and in Module 6). This keeps the comparison apples-to-apples across architectures. If any network significantly beat the historical-mean baseline, that would be evidence for exploitable patterns.
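For reference, the historical-mean baseline is essentially this (a sketch with a simulated stand-in for the standardized sum-of-z target):

```python
import numpy as np

def baseline_rmse(y_train, y_test):
    """RMSE of the historical-mean baseline: always predict the mean of the
    training portion of the standardized target."""
    pred = np.full_like(y_test, y_train.mean())
    return float(np.sqrt(np.mean((y_test - pred) ** 2)))

# Simulated stand-in for the standardized target, split chronologically
rng = np.random.default_rng(0)
y = rng.standard_normal(1550)
split = int(0.8 * len(y))
print(baseline_rmse(y[:split], y[split:]))   # ~1.0 when the target is pure noise
```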
MODULE 6: Calibration, Ensembling, and Diagnostic Evaluation
Module 5 showed that none of the deep learning models materially improved upon simple baselines on the standardized lottery target. In Module 6, I shifted focus from point accuracy to uncertainty: not just what models predicted, but how well their predictive intervals matched reality. I evaluated calibration for every model, then built several ensembles to see whether combining models could improve reliability. A consistent pattern emerged: DeepSets and some transformer variants produced stable, conservative uncertainty estimates; the Bayesian neural network (BNN) was sharply overconfident; and ensemble performance depended heavily on how weights were assigned. These results provided a small set of calibrated models I could safely use in later simulation and diagnostic modules.
Module 5 evaluated models at the level of point estimates (mean predictions) using RMSE. That established the limits of predictive accuracy but did not address uncertainty. For the later modules, I needed models that not only predicted means but also quantified how uncertain they were.
In Module 6, I treated each model as a probabilistic forecaster and asked three questions:
- RQ1: Calibration. Did empirical frequencies match the model’s stated probabilities over many draws?
- RQ2: Ensembling. Could combining models improve calibration or accuracy?
- RQ3: Downstream use. Which calibrated models were reliable enough to support later simulation and structural analysis?
Powerball and Mega Millions were analyzed separately, but with identical code paths and diagnostics. This made it straightforward to separate dataset effects from model behavior.
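A minimal sketch of the calibration check behind RQ1, assuming each model emits a Gaussian predictive mean and standard deviation per draw (the understated sigma below is a made-up illustration of the overconfident BNN-style failure mode, not the actual model output):

```python
import numpy as np
from scipy.stats import norm

def interval_coverage(y_true, mu, sigma, levels=(0.5, 0.8, 0.9, 0.95)):
    """Empirical coverage of central predictive intervals for a Gaussian
    forecaster with per-draw mean mu and standard deviation sigma.
    A well-calibrated model's coverage sits close to each nominal level."""
    y_true, mu, sigma = map(np.asarray, (y_true, mu, sigma))
    coverage = {}
    for level in levels:
        half_width = norm.ppf(0.5 + level / 2) * sigma
        coverage[level] = float(np.mean(np.abs(y_true - mu) <= half_width))
    return coverage

# Overconfident example: sigma is understated, so coverage falls below nominal
rng = np.random.default_rng(0)
y = rng.standard_normal(830)
print(interval_coverage(y, mu=np.zeros_like(y), sigma=np.full_like(y, 0.6)))
```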
MODULE 7: Universe Generation and Realism Evaluation
Up to Module 6 I had mostly asked predictive questions: can any model say something useful about the next lottery draw, and if not, how honest are its uncertainty estimates about that failure? In Module 7 I flipped the problem around. Instead of predicting the next draw, I trained deep generative models to create entire synthetic "universes" of Powerball and Mega Millions drawings and then asked how realistic those universes looked compared to history. This raised a subtle issue: even a generator that sampled exactly from the true uniform lottery law could not match a finite historical dataset perfectly. The gap between the two, the true law and the finite empirical record, set an upper bound on how well any non-resampling generative model could ever do.
Deep generative models are usually evaluated on rich, structured data: images, audio, and text. In those settings there is a lot to learn, and the question is whether a model can exploit that structure. Lottery data live at the opposite extreme. The physical mechanism is deliberately designed to approximate independent, uniform sampling over a discrete space. By the time I reached this module, the previous parts of Lottery Lab had already shown that point predictors and calibrated forecasters could not beat trivial baselines.
That led to a different question: if the data are essentially random, what does it mean for a generative model to be "good"? How close could a model reasonably get to the empirical distribution of historical draws, given that the theoretical law is uniform but the observed sample is finite and noisy? Answering that question required me to define a notion of universe realism and to quantify the gap between three objects:
- the true lottery law (uniform over valid ticket combinations),
- the empirical distribution of actual historical draws, and
- the synthetic universes generated by various models.
In other words, Module 7 was less about squeezing out the last bit of predictive accuracy and more about defining what “realistic” means when the ground truth process is random and high entropy.
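One way to make that finite-sample gap concrete is to simulate it directly: even an exactly uniform generator lands some distance away from the finite history. The sketch below uses per-ball frequencies and total variation distance as the realism metric; the actual module may rely on different statistics.

```python
import numpy as np

def number_frequencies(draws, n_max):
    """Per-ball relative frequencies over a set of 5-number draws."""
    counts = np.bincount(np.concatenate(draws) - 1, minlength=n_max)
    return counts / counts.sum()

def tv_distance(p, q):
    """Total variation distance between two frequency vectors."""
    return 0.5 * float(np.abs(p - q).sum())

rng = np.random.default_rng(0)
n_max, n_draws = 69, 1550

# Stand-in for the historical record (here itself simulated as uniform)
history = [rng.choice(n_max, size=5, replace=False) + 1 for _ in range(n_draws)]
hist_freq = number_frequencies(history, n_max)

# Finite-sample floor: typical distance between the history and a fresh
# universe drawn from the true uniform law.  A non-resampling generator
# should not be expected to land closer than this on average.
floor = [
    tv_distance(hist_freq,
                number_frequencies([rng.choice(n_max, size=5, replace=False) + 1
                                    for _ in range(n_draws)], n_max))
    for _ in range(200)
]
print(f"finite-sample floor (mean TV distance): {np.mean(floor):.4f}")
```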
MODULE 8: Searching for Causal Structure in Noise
By the time I reached this module, I had used predictive modeling, feature-based statistics, and several machine learning approaches. None of them uncovered meaningful structure in the lottery data. Those methods were designed to learn symmetric or correlational relationships. They did not test whether the outcome of one ball influenced another in time or through hidden interactions. I wanted to know if the system had any directional or asymmetric influence. If such influence existed, it might appear in nonlinear or delayed patterns rather than in simple statistical moments. This motivated a deeper investigation into causal inference and information flow.
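As one flavor of directional test, here is a Granger-style sketch (my illustrative example, not the module's exact test battery): does the drawn Powerball value carry any information about the white-ball sum of later drawings? Under genuine randomness the test should find nothing at any lag.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Simulated stand-in for the two series of interest
rng = np.random.default_rng(0)
n = 1550
white_sum = np.array([rng.choice(69, size=5, replace=False).sum() + 5 for _ in range(n)])
powerball = rng.integers(1, 27, size=n)

# Column order matters: the test asks whether column 2 helps predict column 1
data = np.column_stack([white_sum, powerball]).astype(float)
results = grangercausalitytests(data, maxlag=5)
for lag, (tests, _) in results.items():
    f_stat, p_value, _, _ = tests["ssr_ftest"]
    print(f"lag {lag}: F = {f_stat:.2f}, p = {p_value:.3f}")
```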
MODULE 9: Network Analysis of Lottery Numbers
After deep learning failed to extract predictive information from the engineered features, I shifted focus from forecasting individual outcomes to examining structural relationships. Instead of asking whether I could predict the next draw, I asked whether the numbers themselves could be studied as a network. I wanted to know if certain numbers tended to co-occur, if they formed cliques or hubs, or if some numbers played a distinctive role over time. Network science provided a different lens. If there were subtle, stable, or hierarchical patterns in how numbers appeared together, they might show up as structure in co-occurrence networks, transition graphs, centrality metrics, or spectral properties. If the lottery mechanism had any hidden bias, this framework was designed to give it a chance to surface.
I constructed two types of networks for both lotteries: co-occurrence graphs and transition graphs. In co-occurrence graphs, nodes represented individual balls and edges represented how often pairs of balls appeared together in the same drawing. In transition graphs, edges captured temporal relationships, connecting balls based on whether one tended to follow another across draws. These representations let me treat the lottery as a network, where each ball had a pattern of connections determined by co-occurrence and ordering.
If structure existed in the system, I expected to see evidence through community detection, modularity, heavy-tailed degree distributions, or persistent patterns in centrality. For transition graphs, directed edges could reveal asymmetric behavior, such as one ball consistently appearing after another. I built networks for Powerball and Mega Millions separately, then compared their properties to null models to see whether any observed structure was meaningful rather than incidental.
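A minimal sketch of the co-occurrence side, built on uniform stand-in draws (the real module uses the filtered histories and also builds directed transition graphs):

```python
import itertools
import numpy as np
import networkx as nx

def cooccurrence_graph(draws):
    """Weighted co-occurrence graph: nodes are ball numbers, edge weights
    count how often two balls appeared in the same drawing."""
    G = nx.Graph()
    for draw in draws:
        for a, b in itertools.combinations(sorted(int(x) for x in draw), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)
    return G

# Uniform stand-in draws for a Powerball-sized history
rng = np.random.default_rng(0)
draws = [rng.choice(np.arange(1, 70), size=5, replace=False) for _ in range(1550)]
G = cooccurrence_graph(draws)

degree = nx.degree_centrality(G)
communities = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
print(f"nodes={G.number_of_nodes()}, edges={G.number_of_edges()}")
print(f"degree centrality range: {min(degree.values()):.3f} to {max(degree.values()):.3f}")
print(f"communities found: {len(communities)}")
```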
MODULE 10: Manifold Analysis of Lottery Draws
Module 9 convinced me that lottery balls do not behave like nodes in a structured interaction network. There were no cliques, no influencers, no tightly-knit communities. But that negative result did not eliminate the possibility that structure might exist in a different form. Relationships can be absent while geometry remains. Perhaps the data lived on a low-dimensional manifold: a curved surface, hidden beneath engineered features, not visible to graph algorithms but detectable through geometry. In this module I tried to answer a focused question: do lottery draws lie on a structured geometric object, or do they occupy high-dimensional feature space in a way that is effectively unconstrained?
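One concrete way to pose that question is an intrinsic-dimension estimate. Here is a sketch using the TwoNN estimator of Facco et al. (2017) on a stand-in for the feature matrix; the choice of estimator is mine for illustration, not necessarily what the module itself uses.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(X):
    """Two-nearest-neighbour (TwoNN) estimate of intrinsic dimension:
    d_hat = N / sum(log(r2 / r1)), where r1 and r2 are the distances to
    each point's first and second nearest neighbours."""
    nn = NearestNeighbors(n_neighbors=3).fit(X)   # self + two neighbours
    dist, _ = nn.kneighbors(X)
    mu = dist[:, 2] / dist[:, 1]
    return len(X) / np.sum(np.log(mu))

# Stand-in for the 108-dimensional feature matrix: if the features are
# effectively unconstrained noise, the estimate stays far above a handful of
# dimensions; data on a low-dimensional manifold would pull it down sharply.
rng = np.random.default_rng(0)
X = rng.standard_normal((1550, 108))
print(f"estimated intrinsic dimension: {twonn_intrinsic_dimension(X):.1f}")
```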
MODULE 11: Summary and Lessons Learned
Module 11 was not an additional experiment. It was an opportunity to step back and interpret the project as a whole. I had spent months building models, running experiments, debugging failures, and documenting results. The final step was to understand what the work revealed about the lottery, what it revealed about empirical methods, and what it revealed about how I approached the problem. The project began with the premise that there might be hidden structure in lottery drawings. It ended with a stronger and more specific conclusion: the system behaved like a well-engineered randomization mechanism. Structure that appeared along the way was best explained by sampling variability, engineering choices, or temporal confounding rather than by a genuine predictive signal.
Research Conclusion
All 11 modules, using independent methodologies, converged on the same conclusion: lottery drawings (Powerball and Mega Millions) are genuinely random, with no exploitable patterns.
This research refutes common gambling fallacies (hot/cold numbers, due numbers, pattern prediction) and provides rigorous evidence for lottery integrity using Bayesian inference, deep learning, causal inference, network analysis, and topological methods.