18 Years, 18 Different Markets: Year-by-Year Performance of a Futures Framework

A backtest that starts in 2020 is not a backtest. It is a coincidence with a date range.
I say that without qualification. If a trading framework has only been tested on the last few years of data, you do not know what it does. You know what it did during one particular market environment. The moment conditions shift — and they always shift — you find out whether you had an edge or a lucky streak.
I needed to know the difference. So I tested every calendar year separately.
This post is the year-by-year breakdown. No cherry-picked start date. No selective time window. January 2008 through early 2026. 89,774 qualifying signals across 4,721 trading sessions. Every year laid out, including the one that lost money.
Why year-by-year matters
The Index Futures Research Behind the Algorithmic Suite describes the full scope of the research — the 6.3 million price bars, the three independent verification pipelines, the tens of billions of analytical permutations.
The walk-forward testing analysis shows how the framework holds up when tested on data it has never seen.
But there is a question that aggregate numbers cannot answer. A framework can show a positive expected value across 18 years and still contain three or four catastrophic years hidden inside an otherwise strong average. Aggregate performance can mask periods where the edge disappeared entirely. Where it inverted. Where it would have destroyed a live account.
The only way to know is to break it apart. Year by year. Every single one.
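The post does not publish its analysis code, but the decomposition it describes is simple to sketch. Here is a minimal, hypothetical version: given a list of dated trade results, sum P&L per calendar year so that no single era can hide inside an aggregate average. The trade records below are illustrative placeholders, not actual framework data.

```python
from collections import defaultdict

# Hypothetical trade records: (date_string, pnl_in_points)
trades = [
    ("2011-03-04", 1.5),
    ("2011-09-12", -0.75),
    ("2012-05-21", -0.5),
    ("2012-11-02", 0.25),
    ("2013-07-19", 2.0),
]

def yearly_pnl(trades):
    """Sum P&L per calendar year so weak years cannot hide in an average."""
    totals = defaultdict(float)
    for date, pnl in trades:
        year = date[:4]  # ISO dates sort by year in the first four chars
        totals[year] += pnl
    return dict(totals)

print(yearly_pnl(trades))
# {'2011': 0.75, '2012': -0.25, '2013': 2.0}
```

The point of the exercise is visible even in this toy example: the three-year aggregate is positive, but the breakdown still exposes 2012 as a losing year.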
Most vendors never do this. Not because they cannot. Because some years are embarrassing. And the honest thing to do when a year is embarrassing is to show it anyway.
The results: 19 calendar years, 18 profitable
The Algorithmic Suite framework was tested across every calendar year from 2008 through early 2026. That is 19 distinct calendar-year periods.
18 of them were profitable.
One was not.
That one year was 2012. The loss was small — well within normal variance for any probabilistic framework operating across thousands of sessions. It was not a blowup. It was not a structural failure. It was a slightly negative year in a low-volatility, range-bound market where the conditions were uniquely difficult for level-based decision support.
I could have excluded it. I could have started the backtest in 2013 and no one would have noticed. That is not how I operate. One slightly negative year out of 19 is expected variance. It is not failure. Hiding it would be.
Walking through the eras
Each year in this dataset represents a different market. That is not a metaphor. The ES futures contract in 2008 and the ES futures contract in 2017 might share a ticker symbol, but they share almost nothing else. Volatility regimes, participation profiles, correlation structures, overnight gap behavior — all different.
A framework that survives all of them is telling you something.
2008: The financial crisis. The most violent market conditions in a generation. VIX at 80. Circuit breakers. Multi-point gaps in minutes. The framework was profitable. This is the year that eliminates most strategies before they even reach the next decade.
2009: The recovery. A V-shaped reversal that caught the entire industry off guard. Massive directional moves followed by months of grinding consolidation. Profitable.
2010-2011: European contagion. Flash crashes, a US credit downgrade, repeated spikes of volatility inside a generally recovering market. Both years profitable. The kind of environment where patterns that look clean in a calm backtest start to crack — and these did not.
2012: The loss year. A compressed, low-volatility, range-bound market. The VIX averaged its lowest levels in years. The framework produced a small net loss. Not enough to be statistically meaningful. But enough to be honest about.
2013-2015: The QE era. Subdued volatility, drifting markets, compressed daily ranges. Tools built for reactive, level-based analysis struggle when price barely moves. The framework remained profitable through all three years, though returns were modest relative to higher-volatility periods. That is expected behavior.
2016: Brexit and the election. Two separate overnight gap events that moved ES by dozens of points in hours. If a framework cannot handle surprise gaps, this is where you find out. Profitable.
2017: Historic calm. The VIX hit all-time lows. This is the graveyard of strategies that depend on volatility. Profitable. The edge was smaller, but it was there.
2018: Volatility returns. The February vol spike, a rate-hike cycle, and a Q4 selloff of nearly 20%. A year that punished strategies on both sides — trend-following and mean-reversion. Profitable.
2019: Trade war whipsaws. Headline risk on tariff announcements, whipsaw moves, then an abrupt Fed pivot and year-end melt-up. Profitable.
2020: COVID. The fastest bear market in history — 35% in 23 trading days. Followed immediately by a V-shaped recovery. Circuit breakers triggered multiple times in a single week. Then new all-time highs by August. If any year should have broken a level-based framework, this was it. Profitable.
2021: The meme era. Retail participation at all-time highs. Gamma squeezes in individual names. Compressed realized volatility in the index despite extraordinary single-stock moves. Profitable.
2022: The rate hike cycle. The fastest pace of Fed tightening in 40 years. A 27% bear market. Sustained directional pressure unlike anything since 2008. Profitable.
2023-2026: The current era. Banking contagion, AI-driven rotation, a soft landing attempt, rate cuts, election-year volatility. Profitable in every period tested through early 2026.
The pattern worth noting
There is a structural observation in the year-by-year data that I want to present honestly and carefully.
The average win rate in the earlier era (2008 through 2017) was meaningfully lower than the more recent era (2018 through 2026).
The edge has strengthened in recent years.
I want to be transparent about what this could mean. It could indicate a genuine regime shift — increased algorithmic participation, tighter spreads, and higher liquidity making level-based interactions more precise and more predictable. It could also be an artifact of the specific market environments in the recent period. The only honest position is to note the pattern, acknowledge both explanations, and avoid claiming certainty about which one is correct.
What I can say is this: the framework was profitable in both eras. The lower win rate years were still positive. And the recent strengthening, whatever its cause, has been consistent across multiple years rather than concentrated in one or two outlier periods.
Half-year consistency
A framework can be profitable for a calendar year and still contain a catastrophic half. A strong first half can mask a collapsing second half. Or a slow start can be rescued by a single exceptional month.
I tested for this directly. Across all 19 calendar-year periods, the maximum difference between first-half and second-half win rate within any single year was 6.4 percentage points.
That is remarkably stable. No structural breaks. No year where the framework worked for six months and then stopped working. No year where the annual result was carried by a single quarter.
The edge is distributed across the calendar year. It does not cluster.
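The half-year check described above can be sketched in a few lines. This is an assumed reconstruction, not the framework's actual code: for one year's trades, compare the win rate of months 1-6 against months 7-12 and report the gap in percentage points. The sample data is hypothetical.

```python
def half_year_winrate_gap(trades):
    """Return |H1 win rate - H2 win rate| in percentage points for one
    year's trades. trades: list of (month, won) pairs, month in 1..12."""
    h1 = [won for month, won in trades if month <= 6]
    h2 = [won for month, won in trades if month > 6]
    if not h1 or not h2:
        return None  # cannot compare halves without trades in both

    def rate(xs):
        return 100.0 * sum(xs) / len(xs)

    return abs(rate(h1) - rate(h2))

# Hypothetical year: H1 wins 3 of 4 (75%), H2 wins 7 of 10 (70%)
sample = [(1, True), (2, True), (3, False), (5, True)] \
       + [(7, True)] * 7 + [(9, False)] * 3
print(half_year_winrate_gap(sample))  # 5.0
```

Running this per calendar year and taking the maximum across all years is what produces a single stability figure like the 6.4 percentage points reported above.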
Walk-forward rolling windows
The walk-forward testing analysis describes the methodology in detail. The short version: you train the framework on a portion of the data, then test it on a period it has never seen. If the edge survives out-of-sample, it is more likely real.
When I applied rolling walk-forward windows across the full 18-year dataset, every single test window was profitable. 14 out of 14 windows. The edge persisted in every rolling period regardless of which years were used for training and which were used for testing.
Combined with the year-by-year results, this tells a consistent story. The edge is not dependent on any particular year or era. It is not concentrated in a favorable stretch. It survives when you move the window.
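The rolling-window construction can be sketched as follows. The actual train and test lengths used in the study are not stated, so the 4-year train / 1-year test split below is an assumption; it happens to produce 14 windows over 18 calendar years, matching the count reported above, but the real methodology may differ.

```python
def rolling_windows(years, train_len, test_len):
    """Build (train_years, test_years) pairs for rolling walk-forward
    testing: fit on train_years, evaluate only on the unseen test_years
    that immediately follow, then slide forward by test_len."""
    windows = []
    start = 0
    while start + train_len + test_len <= len(years):
        train = years[start : start + train_len]
        test = years[start + train_len : start + train_len + test_len]
        windows.append((train, test))
        start += test_len
    return windows

years = list(range(2008, 2026))  # 18 calendar years
windows = rolling_windows(years, train_len=4, test_len=1)
print(len(windows))              # 14
print(windows[0])                # ([2008, 2009, 2010, 2011], [2012])
```

The defining property is that every test year sits strictly after its training years, so each window's result is out-of-sample by construction.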
What 2012 teaches you
The loss year is actually the most informative year in the dataset.
2012 was a compressed, range-bound, low-volatility market. Price often stayed within narrow bands for entire sessions. The framework, which depends on price interacting with computed levels, had fewer opportunities and lower-quality setups. The result was a small net loss.
That is exactly what you would expect from a framework that responds to real market structure. In a year where price barely moves relative to the levels, the framework should produce weaker results. If it somehow showed strong performance in a market where the underlying mechanism had fewer opportunities to operate, that would be suspicious. It would suggest overfitting.
The 2012 loss is not a blemish. It is evidence that the framework is doing what it claims to do — responding to actual price-level dynamics rather than producing fabricated consistency.
The standard most vendors avoid
Year-by-year transparency is the highest standard of backtesting credibility. It is also the one most vendors avoid.
The reason is obvious. When you publish aggregate results, you control the narrative. A single win rate number across many years sounds authoritative. But it hides the years that did not perform as well. It hides the drawdown periods. It hides the stretches where a live trader would have questioned whether the edge still existed.
When you publish year-by-year results, you give up that control. Every year stands on its own. Every difficult period is visible. Every loss is accounted for.
I chose to do it anyway. Not because every year looks spectacular. Because the story the data tells — 18 profitable years out of 19, consistent half-year stability, a strengthening trend in recent years, and a single small loss year that behaves exactly as expected — is more credible than any aggregate number could ever be.
The question to ask
When you evaluate any trading framework, any indicator, any decision support tool, ask for the year-by-year performance.
Not the aggregate. Not the best quarter. Not a screenshot from a day it worked perfectly.
The year-by-year results. Every year. Including the difficult ones.
If the vendor will not show you that, you already have your answer.
If they can — and the results hold across the financial crisis, the QE era, the COVID crash, the rate hike cycle, and every market environment in between — that tells you something different.
The Algorithmic Suite
Midnight Grid. Quantum Vision. Turning Points.
Three indicators. One framework. Tested across 19 calendar years and every market regime the modern era has produced.
18 profitable. One small loss. No cherry-picked dates. No hidden years.
Available on your TradingView charts today.
Algorithmic is charting software for decision support on TradingView. It is not financial advice. Trading involves risk. Outcomes depend on your rules, risk management, and execution. Past performance does not guarantee future results.


