Why Your Strategy Works in Backtest but Not Live: The Six Gaps, Including the Ones My Simulator Skips

Your backtest is not lying to you. It is answering a different question than the one you asked.

You tested a strategy on historical data and it printed money. You ran it live and it bled. Most of what you will find when you search for an explanation is either a vague checklist or a sales page for a tool claiming to have solved the problem.

I build a backtesting simulator for crypto perpetual futures, so I have a financial incentive to tell you backtests are trustworthy. I am going to do the opposite. Here is the full taxonomy of backtest-vs-live gaps, including the ones my own simulator skips.

First, two definitions so we are speaking the same language. A backtest is a replay of a trading strategy against historical price data, to see how it would have performed. A perpetual future, or perp, is a crypto derivative contract that never expires, which traders use to bet on price with leverage (borrowed exposure, so a $100 stake can control a $1,000 position).

There are six gaps. Some are about market mechanics. Some are about you.

Gap 1: Slippage

Slippage is the difference between the price you expected and the price you actually got.

A backtest fills your order at the candle's recorded price (a candle is one bar of price history: the open, high, low, and close for a time slice, say 4 hours). The historical record says BTC traded at $67,432, so the simulator hands you a fill at $67,432.

A live order does not work that way. It walks the order book, the live list of all resting buy and sell offers at each price level. If you buy more than the best offer can absorb, your order eats through the next price level, and the next, and your average fill lands worse than the price on the screen.
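To make the book-walking concrete, here is a minimal sketch of a market buy eating through ask levels. The order book below is invented for illustration (the prices and sizes are assumptions, not real market data), and the function name average_fill_price is hypothetical.

```python
from typing import List, Tuple

def average_fill_price(asks: List[Tuple[float, float]], quantity: float) -> float:
    """Size-weighted average fill for a market buy.

    asks: (price, size) levels sorted best (lowest) price first.
    """
    remaining = quantity
    cost = 0.0
    for price, size in asks:
        take = min(remaining, size)   # take what this level can absorb
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 1e-12:
        raise ValueError("order book too thin to fill the requested quantity")
    return cost / quantity

# Hypothetical snapshot of resting sell offers: (price in USD, size in BTC).
asks = [(67432.0, 0.5), (67440.0, 1.0), (67455.0, 2.0)]

small_fill = average_fill_price(asks, 0.25)  # fits inside the best offer
large_fill = average_fill_price(asks, 2.0)   # eats through three levels
slippage_pct = (large_fill - asks[0][0]) / asks[0][0] * 100

print(small_fill, large_fill, round(slippage_pct, 4))
```

The small order fills at the screen price. The large one does not, and the only thing that changed was the order size relative to the liquidity in the book.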

Here is the honesty move: PerpForge's backtests fill at candle price with zero slippage too. I have not solved this, and I want to be direct about why I have not bolted on a "slippage model": any backtester claiming realistic slippage deserves your skepticism. Real slippage depends on your order size, the liquidity sitting in the book at that exact moment, and which venue you are trading on. None of that survives in historical candle data. A flat "assume 0.05% slippage" knob is a guess wearing a lab coat. I would rather tell you the fill is idealized than sell you a fake correction.

The admission is the receipt. If a tool will not tell you what it skips, assume it skips more than this.

Gap 2: Fees, but Only the Flat Kind

Every trade costs money to place. PerpForge does simulate trading fees: a flat configurable rate charged on every entry and every exit. That alone removes a large class of backtest delusion, because a high-frequency strategy that looks great with free trades often dies the moment each round trip costs a few basis points (a basis point is one hundredth of a percent).

But real exchanges are messier. They split fees into maker and taker. A maker order rests in the order book and adds liquidity, so it gets the lower fee. A taker order fills immediately against a resting order and removes liquidity, so it pays the higher fee. Exchanges also tier fees by your trading volume.

We do not simulate the maker/taker split or fee tiers yet. A flat rate is an honest approximation, not an exchange-accurate one. This is on our public build list.
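A rough sketch of why the split matters: the same gross edge per trade, netted against a flat rate and against all-maker or all-taker execution. Every rate below is a hypothetical placeholder, not any exchange's actual fee schedule and not a PerpForge default.

```python
def net_return_pct(gross_return_pct: float, entry_rate: float, exit_rate: float) -> float:
    """Per-trade return on notional after round-trip fees, in percent.

    Rates are fractions of notional (0.0005 = 5 basis points).
    Simplification: exit notional is treated as equal to entry notional.
    """
    return gross_return_pct - (entry_rate + exit_rate) * 100

# Hypothetical fee rates, chosen only for illustration.
FLAT = 0.0004    # 4 bps per side
MAKER = 0.0002   # 2 bps per side
TAKER = 0.0005   # 5 bps per side

gross = 0.08     # hypothetical average gross edge per trade, in percent

flat_net = net_return_pct(gross, FLAT, FLAT)      # flat-rate approximation
maker_net = net_return_pct(gross, MAKER, MAKER)   # every order rests in the book
taker_net = net_return_pct(gross, TAKER, TAKER)   # every order crosses the spread

print(round(flat_net, 4), round(maker_net, 4), round(taker_net, 4))
```

Under these assumed numbers the flat rate lands between the two extremes: too harsh for a strategy that mostly rests orders, too kind for one that always takes.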

Gap 3: Funding Rates

A funding rate is the periodic payment exchanged between long holders (betting price goes up) and short holders (betting price goes down) on a perp. It exists to tether the perp's price to the spot price of the underlying asset. When the perp trades above spot, longs pay shorts. When it trades below, shorts pay longs. Payments typically settle every few hours.

PerpForge does not simulate funding yet. Plainly: if your strategy holds positions for days, funding can be a real, recurring drag that no backtest on this platform currently charges you for. A strategy that backtests at a thin edge can have that entire edge consumed by funding in live trading, especially on pairs where one side stays crowded.
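To get a feel for the size of the drag, here is a back-of-the-envelope sketch. It assumes a constant funding rate, a constant position size, and settlement three times a day; the rate itself is a made-up placeholder. Real funding varies every period and can flip sign.

```python
def funding_cost(notional: float, rate_per_period: float, periods: int,
                 is_long: bool = True) -> float:
    """Total funding paid (positive) or received (negative).

    Sign convention assumed here: a positive rate means longs pay shorts.
    """
    signed_rate = rate_per_period if is_long else -rate_per_period
    return notional * signed_rate * periods

# Hypothetical inputs: a $100 stake at 10x controlling a $1,000 position.
margin = 100.0
notional = 1000.0
rate = 0.0001          # 0.01% per settlement, assumed constant
periods = 3 * 5        # assumed 3 settlements a day, held for 5 days

cost = funding_cost(notional, rate, periods, is_long=True)
drag_on_notional_pct = cost / notional * 100
drag_on_margin_pct = cost / margin * 100

print(round(cost, 4), round(drag_on_notional_pct, 4), round(drag_on_margin_pct, 4))
```

Funding is charged on the position, not the stake, so leverage multiplies the drag relative to the money you actually posted.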

This one is also on our public build list, and the before-and-after comparison (how much do funding costs change the fleet's results?) will be published when it ships. No date promised.

Gap 4: Your Strategy Memorized the Past

This is the gap that has nothing to do with costs. Overfitting means the strategy learned the specific noise of the historical data rather than a real, repeatable edge. You tuned the parameters until the backtest looked great, and what you actually did was curve-fit to one particular stretch of history.

The market then shifts regime. A regime is the prevailing character of a market: trending versus chopping sideways, calm versus violent. A momentum strategy tuned during a clean uptrend meets a sideways chop and gives everything back, not because execution differed but because the conditions that made it work stopped existing.

This deserves its own full treatment, and it gets one: see our companion post on backtest overfitting and how to avoid it. The short version is that the more parameter combinations you tried before picking the winner, the less the winner's backtest means.

It is common to find a parameter set that ranks near the top of the field in one stretch of history and falls to the middle of the pack in the next. Nothing about the strategy changed; the market's regime did. That single before-and-after is the cleanest receipt there is for why a flattering in-sample rank is not a promise about the future.

Gap 5: The Backtest Never Hesitates. You Do.

A backtest executes with perfect discipline. It takes every entry. It honors every TP/SL (take-profit and stop-loss: the preset prices at which a position closes to lock in gains or cap losses). It never widens a stop because "it's about to bounce". It never revenge-trades after a loss. It never skips the signal that fires at 3 a.m.

You will do all of these things. Not because you are weak, but because you are a human watching real money move. The backtest measures the strategy. Live trading measures the strategy times your discipline, and that second factor is almost always less than one.

This gap is unfixable by software. It is worth naming anyway, because traders routinely blame the strategy for what was actually an execution deviation.

Gap 6: You Only Went Live with the Winner

Here is the quiet one. Suppose you backtested twenty variants and took the best one live. That selection step is itself a bias. Take twenty strategies that are no better than coin flips, run each over the same stretch of history, and the best of them will usually look like a hot streak. By choosing the best backtest, you preferentially selected the variant most likely to have been flattered by luck, which means live performance regresses toward the mean almost by construction.
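The coin-flip argument can be checked with exact binomial arithmetic. The sketch below assumes twenty independent variants, each with a true 50% win rate, each taking 50 trades. Real variants of one strategy are correlated, so treat this as an illustration of the mechanism, not a measurement.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(at least k wins in n trades) when each trade wins with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def prob_any_of(m: int, k: int, n: int, p: float = 0.5) -> float:
    """P(at least one of m independent variants reaches k wins in n trades)."""
    single = prob_at_least(k, n, p)
    return 1 - (1 - single)**m

# Hypothetical setup: 50 trades per variant, "looks good" means a 60%+ win rate.
single = prob_at_least(30, 50)       # one coin-flip variant
best_of_20 = prob_any_of(20, 30, 50) # at least one of twenty

print(round(single, 3), round(best_of_20, 3))
```

One pure-luck variant clears a 60% win rate roughly one time in ten. Across twenty of them, the chance that at least one does is better than four in five, and that one is the variant you would have taken live.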

This is the same statistical trap we wrote about in our post on whether your strategy is just luck, and it is why every result PerpForge publishes carries a significance verdict (built on a named statistical method, the Wilson confidence interval, which asks whether the win rate is distinguishable from a coin flip given the sample size).
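For readers who have not met it, the Wilson interval can be computed in a few lines. This is the textbook formula at a 95% confidence level (z = 1.96, an assumed default); how PerpForge turns the interval into its verdict is not specified here, so the coin-flip check at the end is an illustrative reading, not the platform's exact rule.

```python
from math import sqrt
from typing import Tuple

def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a win rate of wins/n."""
    if n <= 0:
        raise ValueError("need at least one trade")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def distinguishable_from_coin_flip(wins: int, n: int) -> bool:
    """Illustrative check: True only if the whole interval sits above 50%."""
    low, _ = wilson_interval(wins, n)
    return low > 0.5

# The same 60% win rate at two hypothetical sample sizes.
small_sample = wilson_interval(30, 50)
large_sample = wilson_interval(600, 1000)

print(small_sample, distinguishable_from_coin_flip(30, 50))
print(large_sample, distinguishable_from_coin_flip(600, 1000))
```

A 60% win rate over 50 trades still has an interval that straddles 50%. The same rate over 1,000 trades does not. The headline number is identical; the evidence behind it is not.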

A large share of strategies that look like winners on raw performance turn out to be statistically indistinguishable from a coin flip once the significance test is applied. That is the standing receipt for this section: selecting the best backtest is mostly selecting the luckiest one, and only the significance verdict separates the two.

What PerpForge Does Simulate That Most Backtests Skip

For balance, one thing this simulator gets right that most do not: liquidation. Liquidation is when the exchange force-closes a leveraged position because losses have eaten through the margin (the collateral you posted to open the trade). PerpForge simulates it with the standard formula: the liquidation price is the entry price moved against you by one over the leverage, so a 10x long is liquidated about 10% below its entry. The position is force-closed, the margin is wiped, and the exit reason is recorded.
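Read literally, "the entry price adjusted by one over the leverage" can be sketched as below. This is one interpretation of that sentence, not PerpForge's actual code, and it deliberately ignores maintenance margin and fees, which on real exchanges trigger liquidation somewhat earlier. The function names are hypothetical.

```python
def liquidation_price(entry: float, leverage: float, is_long: bool = True) -> float:
    """Simplified liquidation price: entry moved against the position by 1/leverage.

    Assumption: no maintenance margin, no fees, isolated margin.
    """
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    move = 1.0 / leverage
    return entry * (1 - move) if is_long else entry * (1 + move)

def long_is_liquidated(entry: float, leverage: float, candle_low: float) -> bool:
    """True if a candle's low touched or crossed the long's liquidation price."""
    return candle_low <= liquidation_price(entry, leverage, is_long=True)

entry = 67432.0
long_liq = liquidation_price(entry, 10, is_long=True)    # 10% below entry
short_liq = liquidation_price(entry, 10, is_long=False)  # 10% above entry

# A 15% drop from entry on a 10x long: the position does not survive it.
wiped = long_is_liquidated(entry, 10, candle_low=entry * 0.85)

print(round(long_liq, 2), round(short_liq, 2), wiped)
```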

Most backtesting tools ignore liquidation entirely. They will happily show a 10x leveraged strategy riding through a 15% drawdown that, in live trading, would have ended the account at the 10% mark. If you backtest perps with leverage and your tool does not model liquidation, your equity curve contains positions that could not have survived.

How to Use a Backtest Correctly

A backtest is a filter, not a forecast.

Used correctly, it answers: "is this idea worth testing further, or can I discard it now?" A strategy that fails in a frictionless, perfectly disciplined, zero-slippage replay of the past will not improve in a hostile live market. That is genuinely useful. It lets you kill bad ideas cheaply, in simulation, instead of expensively, with funds.

Used incorrectly, it is asked a question it cannot answer: "how much will I make?" It cannot, because of all six gaps above.

So the practical posture is this. Expect live results to be worse than backtest results. Budget for the gap explicitly: thinner edge after slippage, real fee splits, funding drag on held positions, regime drift, your own hands, and selection bias. If a strategy only works when the backtest is taken at face value, it does not work.

If You Know a Better Way, Show Us

This taxonomy is our current best understanding, not a final word. If you think a slippage model can be honest, or our liquidation formula has a flaw, or there is a seventh gap we missed, make the case. We adopt corrections, credit them, and publish the change.

If you want to see what a backtest with simulated fees and liquidation, plus a significance verdict on every result, actually looks like, the leaderboard is public: no signup, and viewing is free. Test your own variant if one of the published results makes you curious.

PerpForge is an educational simulator. No real money is traded, nothing here is financial advice, and it is for adults (18+).
