Subsample Stability: What Happens When You Drop 40% of the Data
Overfitting is the silent killer of trading strategies.
It does not announce itself. It does not show up in the backtest results. It hides behind impressive numbers, behind smooth equity curves, behind the confidence that comes from seeing a large dataset agree with your thesis.
And then it destroys you in live trading.
I needed a way to find it before it found me.
The problem with backtests that look too good
Here is the uncomfortable truth about backtesting. A strategy that looks great on the full dataset might be held up by 5% of the data. A handful of outsized winners. A cluster of favorable sessions in one particular month. A brief regime where everything aligned perfectly.
Remove those trades and the whole thing collapses.
You would never know from the summary statistics. The win rate looks solid. The profit factor looks clean. The equity curve slopes upward. But underneath, the edge is fragile — concentrated in a thin slice of the data rather than distributed across the whole thing.
This is what overfitting looks like in practice. It is not a coding error. It is not a bad parameter choice. It is a structural dependency on a small number of favorable outcomes that happened to be included in the test.
The question I needed to answer was simple. Is the edge in the Algorithmic Suite distributed or concentrated?
The test: throw away 40% and measure what moves
Subsample stability testing is conceptually straightforward. Take the full dataset. Remove a large, random chunk of it. Measure the key metric. Repeat.
If the results barely change, the edge is distributed. It does not depend on any particular subset of trades or any particular cluster of sessions. The signal is everywhere.
If the results swing wildly, the edge is concentrated. It lives in a few critical trades or a few critical days. And concentrated edges do not survive contact with live markets.
I ran two versions of this test across the full 18-year ES futures dataset.
Signal dropout: randomly removing 40% of trades
The first test operates at the individual trade level.
Take the full set of 89,774 first-visit signals across 4,721 trading sessions. Randomly drop 40% of them. Measure the win rate on the remaining 60%. Repeat with a different random seed. And again. And again.
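The procedure is simple enough to sketch in a few lines. The real signal set is proprietary, so the code below uses synthetic Bernoulli outcomes with a made-up 55% win rate as a stand-in; `dropout_range` and all numbers in it are my own illustration, not the author's code.

```python
import numpy as np

def dropout_range(outcomes, drop_frac=0.40, n_trials=200, seed=0):
    """Win-rate range (max minus min, in percentage points) across
    random subsamples that each discard `drop_frac` of the trades."""
    rng = np.random.default_rng(seed)
    n = len(outcomes)
    keep = n - int(n * drop_frac)
    rates = []
    for _ in range(n_trials):
        idx = rng.choice(n, size=keep, replace=False)  # keep a random 60%
        rates.append(outcomes[idx].mean())
    return (max(rates) - min(rates)) * 100

# Synthetic stand-in: 89,774 independent trades at a uniform 55% win rate.
trades = (np.random.default_rng(42).random(89_774) < 0.55).astype(int)
print(f"Total range: {dropout_range(trades):.2f} pp")
```

With an edge that is uniform by construction, the range comes out well under one percentage point — which is the signature a distributed edge leaves under this test.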
The result: 0.89 percentage points of total range.
Less than one percentage point of variance when throwing away 40% of the data at random.
That number matters. It means the edge is not carried by a few lucky trades. It is not dependent on catching one particular signal on one particular day. You can remove nearly half the trades — any half — and the framework produces the same result.
For context, a strategy with a concentrated edge might show 5, 8, or 10 percentage points of variance under this same test. Strategies built on a handful of outsized winners can swing even wider. The tighter the range, the more uniformly the edge is distributed.
Less than one percentage point is rock solid.
Session dropout: randomly removing 40% of trading days
The second test is harder. Instead of removing individual trades, it removes entire trading sessions.
This is a more demanding test because it eliminates all signals from a given day simultaneously. If the edge is concentrated in certain types of trading days — high-volatility sessions, trend days, specific days of the week — session dropout will expose it.
Same procedure. Take all 4,721 sessions. Randomly drop 40% of them. Measure the win rate on the signals from the remaining sessions. Repeat.
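Session dropout changes only the sampling unit: draw sessions, not trades, and keep every signal from a kept session. A minimal sketch, again on synthetic stand-in data with hypothetical `session` and `win` columns (the real schema is not public):

```python
import numpy as np
import pandas as pd

def session_dropout_range(df, drop_frac=0.40, n_trials=100, seed=0):
    """Drop whole sessions at random; return the win-rate range in
    percentage points across subsamples."""
    rng = np.random.default_rng(seed)
    sessions = df["session"].unique()
    keep_n = len(sessions) - int(len(sessions) * drop_frac)
    rates = []
    for _ in range(n_trials):
        kept = rng.choice(sessions, size=keep_n, replace=False)
        rates.append(df.loc[df["session"].isin(kept), "win"].mean())
    return (max(rates) - min(rates)) * 100

# Synthetic stand-in: ~19 signals per session across 4,721 sessions.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "session": rng.integers(0, 4_721, size=89_774),
    "win": (rng.random(89_774) < 0.55).astype(int),
})
print(f"Total range: {session_dropout_range(df):.2f} pp")
```

Because removing a session removes a correlated block of trades, the range is expected to come out somewhat wider than trade-level dropout, exactly as the article's numbers show.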
The result: 1.29 percentage points of total range.
Still extremely tight. Slightly wider than the signal dropout test, which is expected — removing entire days is a coarser, more aggressive perturbation than removing individual trades. But 1.29 percentage points of variance when discarding 40% of all trading days confirms the same conclusion.
The edge is not day-dependent. It does not live in one type of session. It is distributed across the entire 18-year history.
Signal decay: do later visits get worse?
This is a different question but equally important for practical trading.
Many indicators and level-based frameworks show a decay pattern. The first time price reaches a level, the reaction is strong. The second time, it is weaker. By the third or fourth visit, the level is spent. The information has been consumed.
This matters operationally. If signal decay is real, you need to catch the first touch or the edge disappears. Miss it, and you are trading a degraded setup.
I needed to know whether the Algorithmic Suite exhibited this pattern.
The signal decay analysis measures win rate by visit number — first visit to a level, second visit, third visit, fourth and beyond. Same level, same session, sequential visits.
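That grouping is a one-liner in pandas. The sketch below assumes a hypothetical schema with a `visit` number and a binary `win` outcome per signal; the ten rows of data are invented purely to show the mechanics:

```python
import pandas as pd

# Hypothetical schema: one row per signal, tagged with its visit number.
signals = pd.DataFrame({
    "visit": [1, 1, 2, 1, 3, 2, 4, 1, 2, 5],
    "win":   [1, 0, 1, 1, 1, 0, 1, 0, 1, 1],
})
# Cap visit numbers at 4 so the last bucket reads as "4+", then group.
signals["bucket"] = signals["visit"].clip(upper=4)
win_by_visit = signals.groupby("bucket")["win"].agg(["mean", "count"])
print(win_by_visit)
```

A decaying framework would show `mean` falling as `bucket` rises; the article's finding is that it does not.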
The result:
| Visit | Win Rate |
|-------|----------|
| Visit 1 | Baseline |
| Visit 2 | Slightly above baseline |
| Visit 3 | Slightly above Visit 2 |
| Visit 4+ | Highest of all visits |
No decay. None.
Later signals are slightly stronger, not weaker. The win rate does not erode with repeated visits to the same level. It holds — and if anything, it improves marginally as the session develops.
This is unusual. Most level-based frameworks show measurable degradation after the first touch. The Algorithmic Suite does not.
Why signal decay matters for real trading
The absence of signal decay is not just a statistical curiosity. It has direct operational implications.
In live trading, you do not always catch the first touch. You might be away from the screen. You might be managing another position. You might see the setup develop but hesitate. By the time you are ready to act, it is the second or third visit to that level.
With a framework that decays, those later entries are objectively worse trades. You are trading a weaker version of the setup. You know it. And that knowledge erodes confidence, which erodes execution, which erodes results.
With a framework that does not decay, the third visit is the same trade as the first visit. You have not missed anything. The edge is still there. That changes how you manage your attention, your entries, and your session.
Knowing that visit four is as valid as visit one is not a small thing.
Half-year consistency: no structural breaks within years
Another way to test for hidden fragility is to split each year in half and compare the first six months to the second six months. If the framework is stable, the two halves should produce similar results. If you see large gaps, something is shifting within the year — a regime change, a seasonal dependency, a structural break.
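The half-year comparison reduces to a pivot on (year, half). A sketch on synthetic dated trades — the date range, win rate, and column names here are illustrative stand-ins, not the real dataset:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 89,774 dated trade outcomes spread over 18 years.
rng = np.random.default_rng(7)
dates = pd.to_datetime(rng.integers(
    pd.Timestamp("2007-01-01").value,   # nanosecond epoch bounds
    pd.Timestamp("2024-12-31").value,
    size=89_774))
df = pd.DataFrame({"date": dates,
                   "win": (rng.random(89_774) < 0.55).astype(int)})

df["year"] = df["date"].dt.year
df["half"] = np.where(df["date"].dt.month <= 6, "H1", "H2")
by_half = df.pivot_table(index="year", columns="half",
                         values="win", aggfunc="mean")
gap_pp = (by_half["H1"] - by_half["H2"]).abs() * 100
print(f"Max H1-vs-H2 gap: {gap_pp.max():.2f} pp")
```

The single number to watch is the worst-case gap across all years; a structural break within some year would show up as one large outlier in `gap_pp`.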
Across the full 18-year dataset, the maximum difference between first-half and second-half win rate in any single year was 6.4 percentage points.
That is the worst case. Most years showed much tighter convergence.
No structural breaks. No evidence that the framework works in the first half of a year and fails in the second, or vice versa. The edge is persistent within years as well as across them.
Directional balance: 50.4% bullish, 49.6% bearish
One more dimension of stability worth noting.
Across all 89,774 first-visit signals, the split between bullish and bearish signals is 50.4% to 49.6%.
Effectively balanced. No meaningful directional bias.
This matters because a framework with a directional skew — one that generates far more bullish signals than bearish, or vice versa — is implicitly making a market direction bet. In a bull market, the excess bullish signals inflate the win rate. When the market turns, the framework turns with it.
A balanced signal distribution means the framework is direction-neutral. It works in both directions equally. The edge is structural, not directional.
What all of this means together
Each of these tests answers a different question.
Signal dropout answers: Is the edge carried by a few lucky trades? No. Less than 1 percentage point of variance when removing 40%.
Session dropout answers: Is the edge carried by a few lucky days? No. 1.29 percentage points of variance when removing 40% of all sessions.
Signal decay answers: Do later signals at the same level degrade? No. Visit 4+ performs slightly better than visit 1.
Half-year consistency answers: Are there structural breaks within years? No. Maximum 6.4 percentage points between halves, and that is the worst case.
Directional balance answers: Is the framework secretly a directional bet? No. 50.4/49.6 split.
Taken individually, each result is reassuring. Taken together, they paint a clear picture.
The edge in the Algorithmic Suite is uniformly distributed across trades, across sessions, across visit sequences, across halves of years, and across market directions. It is not concentrated. It is not fragile. It is not dependent on catching a specific setup or being present on a specific day.
That is what subsample stability looks like when the underlying framework is doing real work.
Why most strategies fail this test
I want to be direct about this.
Most retail trading strategies cannot pass these tests. Not because the traders who built them are dishonest. Because most strategies were never tested this way.
Optimizing a strategy on the full dataset and then reporting the results on that same dataset is not validation. It is circular. The parameters were chosen because they worked on that data. Of course the results look good.
Real validation means attacking the results from every angle you can construct. Walk-forward testing. Monte Carlo simulation. Maximum adverse excursion. Subsample stability. Signal decay. Regime analysis.
And then publishing the process, not just the outcome.
The Algorithmic Suite
Midnight Grid. Quantum Vision. Turning Points.
Three indicators. One framework. An edge distributed across 89,774 signals and 4,721 sessions — stable when you remove 40% of either.
Algorithmic is charting software for decision support on TradingView. It is not financial advice. Trading involves risk. Outcomes depend on your rules, risk management, and execution. Past performance does not guarantee future results.


