
Can Machine Learning Beat the Base Rate? What Random Forest and XGBoost Found


I needed to answer one question.

The Algorithmic Suite framework produces a base rate. Across 18 years of ES futures data, when a Turning Points signal appears near a Midnight Grid level, a defined outcome follows at a measurable win rate. I have written about the research behind that number, about the Sharpe and Sortino ratios it produces, and about the signal quality at entry.

But there was a question I had not answered.

Can machine learning find something better?

The obvious expectation

If you have a dataset of 89,774 trade signals, each tagged with nine metadata features (time of day, day of week, signal direction, volume state, proximity to the level, level type, session position, volatility regime, market direction), the natural assumption is that a modern ML model will find patterns in that metadata: some combination of features that predicts which signals win and which do not.

That is what machine learning does. It finds structure in data.

So I ran the experiment.

What I tested

I trained two standard classification models on the full signal dataset.

Random Forest. The workhorse of tabular classification. An ensemble of decision trees, each trained on a random subset of features and observations. It handles nonlinear relationships, interaction effects, and noisy features without manual engineering. It is the first model any serious data scientist reaches for when the data lives in rows and columns.

XGBoost. Gradient-boosted decision trees. Typically more powerful than Random Forest on structured data. It learns sequentially, with each new tree correcting the errors of the ensemble built so far. It dominates machine learning competitions on tabular problems for a reason.

Both models were given every feature I could extract from the data:

  • Time of day (hour of signal)

  • Day of week

  • Signal direction (bullish or bearish)

  • Volume relative to the 20-period moving average

  • Proximity to the Midnight Grid level (distance in points)

  • Level type (which of the 14 MG levels)

  • Session position (pre-market, RTH open, midday, close, after-hours)

  • Volatility regime (5 categories from historically calm to extreme)

  • Market direction (trending up, trending down, flat)

Every feature that could plausibly predict signal outcome was included. Nothing was held back.

The target variable was binary. Win or loss, defined by the framework's target and stop parameters.
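As a minimal sketch of that setup, the experiment can be reproduced on synthetic, signal-free data. Everything below is illustrative, not the study's actual data: the 62% base rate is a placeholder (the framework's real number is not reproduced here), the feature distributions are invented, and scikit-learn's GradientBoostingClassifier stands in for XGBoost.

```python
# Sketch of the experimental setup on synthetic, signal-free data.
# The 62% base rate and all feature distributions are placeholders;
# GradientBoostingClassifier stands in for XGBoost.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Hypothetical metadata features matching the list above
X = np.column_stack([
    rng.integers(0, 24, n),    # hour of signal
    rng.integers(0, 5, n),     # day of week
    rng.integers(0, 2, n),     # direction (0 = bearish, 1 = bullish)
    rng.lognormal(0, 0.3, n),  # volume vs 20-period moving average
    rng.exponential(2.0, n),   # distance to level, in points
    rng.integers(0, 14, n),    # level type (one of 14 MG levels)
    rng.integers(0, 5, n),     # session position
    rng.integers(0, 5, n),     # volatility regime
    rng.integers(0, 3, n),     # market direction
])
# Labels drawn independently of X, so the metadata carries no signal by design
y = rng.random(n) < 0.62

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
base_rate = max(y_te.mean(), 1 - y_te.mean())  # always-predict-majority accuracy

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print(f"base rate:         {base_rate:.3f}")
print(f"random forest:     {rf.score(X_te, y_te):.3f}")
print(f"gradient boosting: {gb.score(X_te, y_te):.3f}")
```

When the labels are independent of the features, both models converge to roughly the majority-class accuracy, which is the shape of the result described below.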

The results

The framework has a base rate — the win rate you get by taking every qualifying signal without any filtering.

Random Forest accuracy: identical to the base rate. It could not improve on it by a single basis point.

XGBoost accuracy: identical to the base rate. Same result.

I ran 5-fold cross-validation to ensure this was not a fluke of the train-test split. Across all five folds, ML accuracy never exceeded the base rate by more than 2 percentage points. In most folds, it was within a fraction of a point.
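The cross-validation step itself is standard; a sketch on the same kind of synthetic, signal-free data (62% is again a placeholder base rate, not the framework's number) might look like:

```python
# Sketch: 5-fold cross-validation to rule out a lucky train/test split.
# Synthetic, signal-free data with a 62% placeholder base rate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.random((4000, 9))     # nine metadata features, purely illustrative
y = rng.random(4000) < 0.62   # labels independent of X

base_rate = max(y.mean(), 1 - y.mean())
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, scoring="accuracy",
)
for i, s in enumerate(scores, 1):
    print(f"fold {i}: {s:.3f} (base rate {base_rate:.3f})")
```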

There is no fold where the model suddenly found the pattern. There is no subset of the data where ML pulled ahead.

What SHAP values revealed

When a model cannot beat the base rate, the next question is why. SHAP (SHapley Additive exPlanations) values decompose each prediction into the contribution of every individual feature. They tell you which features are driving the model's decisions — and by how much.

The answer was unambiguous.

No single feature had a mean absolute SHAP value above 0.5. Not time of day. Not day of week. Not volume. Not proximity. Not level type. Not session position. Not volatility regime. Not market direction.

None of them.

Every feature that I tested — every dimension that a reasonable person would expect to matter — contributes almost nothing to predicting whether a given signal wins or loses.
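The spirit of that check can be sketched without the shap package by using permutation importance, a related model-agnostic measure: shuffle one feature at a time and see how much held-out accuracy drops. On signal-free synthetic data (feature names and the 62% rate are placeholders), every feature's importance sits near zero:

```python
# Sketch: permutation importance as a library-light proxy for the SHAP check.
# If shuffling a feature barely moves accuracy, that feature carries no signal.
# Data, feature names, and the 62% rate are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
names = ["hour", "weekday", "direction", "volume", "proximity",
         "level_type", "session", "vol_regime", "mkt_direction"]
X = rng.random((3000, len(names)))
y = rng.random(3000) < 0.62   # labels independent of every feature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, imp in zip(names, result.importances_mean):
    print(f"{name:>13}: {imp:+.4f}")
```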

What about feature combinations?

This is the natural follow-up. Maybe no single feature matters, but maybe pairs of features interact in ways that produce a meaningful edge. Friday afternoon sessions. High-volume mornings. Bearish signals near specific level types in low-volatility regimes.

I tested for interaction effects systematically.

The maximum synergy I found was Friday PM sessions: +3.6 percentage points above the base rate. Most interactions were near zero. There are no hidden high-impact combinations waiting to be discovered.
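One way to run that kind of scan systematically is to group the signals by every pair of features and measure how far each subgroup's win rate drifts from the base rate. A sketch on synthetic data (feature names, group sizes, and the 62% base rate are all placeholders):

```python
# Sketch: scanning feature pairs for subgroups whose win rate deviates from
# the base rate. All names, counts, and rates are illustrative placeholders.
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 20000
df = pd.DataFrame({
    "weekday":    rng.integers(0, 5, n),
    "session":    rng.integers(0, 5, n),
    "vol_regime": rng.integers(0, 5, n),
    "win":        rng.random(n) < 0.62,
})
base_rate = df["win"].mean()

best = 0.0
for a, b in combinations(["weekday", "session", "vol_regime"], 2):
    grp = df.groupby([a, b])["win"].agg(["mean", "size"])
    grp = grp[grp["size"] >= 200]   # skip tiny, noisy subgroups
    lift = (grp["mean"] - base_rate).abs().max()
    best = max(best, lift)
    print(f"{a} x {b}: max |lift| = {lift:.3f}")

print(f"largest subgroup deviation from base rate: {best:.3f}")
```

On data with no real interactions, the largest subgroup deviation is just sampling noise, which is why a minimum-group-size filter matters: small subgroups will always show spurious "synergies."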

As a final check, I ran logistic regression — the simplest possible model, designed to find linear confounders. It achieved base rate plus 1 percentage point. No hidden confounders. No suppressed signal.
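The logistic baseline is the simplest piece of the chain; a sketch on the same kind of signal-free synthetic data (62% is a placeholder):

```python
# Sketch: logistic regression as the simplest linear baseline. On data where
# labels are independent of features, accuracy collapses to the majority rate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.random((4000, 9))
y = rng.random(4000) < 0.62   # 62% is a placeholder base rate

base_rate = max(y.mean(), 1 - y.mean())
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"logistic regression: {acc:.3f} vs base rate {base_rate:.3f}")
```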

Why this is counterintuitive

This result goes against everything the "AI trading" marketing machine tells you.

The promise, everywhere, is that ML will find the edge. Feed it enough data, enough features, enough compute, and the pattern will emerge. The secret combination of time, volume, and market state that separates winners from losers.

I tested that promise with institutional-grade models on 18 years of data, using every feature available.

The promise is wrong.

Not because the models are bad. Random Forest and XGBoost are genuinely powerful tools. They are the standard for tabular prediction tasks across finance, medicine, and engineering. If there were a signal in the metadata, they would have found it.

There is no signal in the metadata.

Why this is actually the strongest possible result

This is where the logic inverts. And it is the part that matters most.

If machine learning could predict which signals would win based on time of day, or volume state, or level type — what would that actually mean?

It would mean the edge was not in the framework. It was in the filter. It would mean the signals themselves were unreliable, and the only way to profit was to apply a metadata screen that selected the good ones and discarded the rest.

That would be a fragile edge.

Filters can be arbitraged. If Friday afternoon signals are 10 points better than Monday morning signals, that information becomes known. Market participants adjust. The edge erodes. The filter that worked in the backtest stops working in live conditions. And you are left with a framework whose structural foundation was never the source of its returns.

The Algorithmic Suite result is the opposite of that.

The edge is not in any filter. It is not concentrated in a specific time of day, a specific level type, a specific volume condition, or a specific market regime. The edge is distributed across all of them. Uniformly. Consistently. Across 18 years.

The edge is structural. It lives in the interaction between price and the framework's levels. Not in the metadata surrounding that interaction.

That is not fragile. That is the definition of robust.

What this means in practice

When SHAP values show that no feature is important, it means something specific and practical.

There is no "best time" to use the framework. There is no "best level type" that carries the entire result. There is no volume condition you need to wait for. There is no day of the week you should avoid.

The framework works because the levels work. Not because of when, or how, or under what conditions the levels are engaged.

For a trader, this is simplifying. You do not need to memorize a matrix of optimal conditions. You do not need to check a dashboard of feature states before acting. The framework's decision support is the same at 9:30 AM on a Monday and at 2:45 PM on a Friday. The base rate holds across all of them.

The contrast with the industry

Most products in the "AI trading" space make the opposite claim. They promise that their machine learning model found the hidden pattern. The secret feature combination. The proprietary signal that only their neural network can detect.

Ask them one question: what is the base rate?

If they cannot tell you what happens without their ML filter — what the raw signal quality looks like before any metadata screening — they have not done the foundational work. They are optimizing a filter without knowing whether the thing being filtered has any edge to begin with.

The Algorithmic Suite answered the base rate question first. Then it tested whether ML could improve on it. The answer was no. And that answer is more valuable than a model that claims to predict with 85% accuracy on a feature set that was never validated against a structural baseline.

The full validation chain

This ML experiment is one link in a longer chain of quantitative validation.

The framework was also tested with Monte Carlo permutation analysis — 2,000 random shuffles, p-value below 0.0005. Walk-forward out-of-sample testing across every year used as a holdout period. Sharpe and Sortino ratios computed daily across the full 18-year dataset. Maximum Adverse Excursion analysis measuring entry precision at the tick level.
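The permutation logic behind a test like that can be sketched in a few lines: break the pairing between entry signals and outcomes by shuffling, and ask how often chance alone matches the observed win rate. The rates (62% vs. 50%) and sample sizes below are illustrative, not the study's actual data.

```python
# Sketch of a Monte Carlo permutation test: shuffle the entry flags to break
# the signal/outcome pairing, then ask how often chance matches the observed
# win rate. All rates and sample sizes here are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(5)
n = 4000
signal = rng.random(n) < 0.5   # hypothetical entry flags
# Outcomes built so flagged bars win more often (62% vs 50%) in this sketch
outcome = np.where(signal, rng.random(n) < 0.62, rng.random(n) < 0.50)

obs = outcome[signal].mean()   # observed win rate of the trades actually taken

# Null distribution: 2,000 random shuffles of the entry flags
null = np.array([outcome[rng.permutation(signal)].mean() for _ in range(2000)])
p_value = (null >= obs).mean()
print(f"observed win rate {obs:.3f}, permutation p-value {p_value:.4f}")
```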

Every test confirms the same conclusion from a different angle. The edge is real. It is structural. It is not an artifact of overfitting, selection bias, or metadata screening.

The ML experiment is the version of that conclusion that speaks most directly to the current moment. In an era where every trading product claims AI as its differentiator, the Algorithmic Suite ran the experiment honestly and published the result.

Machine learning could not beat the base rate. Because the base rate is the edge.

The Algorithmic Suite

Midnight Grid. Quantum Vision. Turning Points.

Three indicators. One framework. An edge that ML could not improve on because it is already structural.

Available on your TradingView charts today.

Start Your 7-Day Free Trial

Algorithmic is charting software for decision support on TradingView. It is not financial advice. Trading involves risk. Outcomes depend on your rules, risk management, and execution. Past performance does not guarantee future results.