Confidence Interval

A range around a measured statistic that quantifies how uncertain the measurement is. A 95% confidence interval (CI) says: "we are 95% confident the true value lies within this range."

In plain English

When you measure a strategy's win rate as 40% across 35 trades, you do not actually know the win rate is 40%. You know that in this sample of 35 trades it was 40%. The true underlying win rate could be higher or lower. The 95% CI gives you the range of plausible true values.

Reading a metric without its CI is like reading a thermometer that says "70°F" without knowing whether it is accurate to ±2°F or ±20°F. Without the uncertainty, the number tells you very little.

Formula

For a measured proportion p̂ (e.g. win rate as a fraction) over N trades:

SE  = sqrt( p̂ × (1 − p̂) / N )
CI₉₅ = p̂ ± 1.96 × SE

The 1.96 multiplier comes from the normal distribution and corresponds to 95% confidence. For 99% confidence use 2.58; for 68% use 1.0.
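
As a rough illustration of the formula, here is a minimal Python sketch. The function names `z_for` and `wald_ci` are made up for this example and are not part of the project's code. It derives the multiplier from the standard normal distribution instead of hard-coding 1.96, then applies the formula to the 40% over 35 trades case described earlier.

```python
from math import sqrt
from statistics import NormalDist


def z_for(confidence):
    """Two-sided normal multiplier: 0.95 -> about 1.96, 0.99 -> about 2.58."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def wald_ci(p_hat, n, confidence=0.95):
    """Wald interval for a proportion p_hat measured over n trades."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
    half = z_for(confidence) * se        # half-width of the interval
    return p_hat - half, p_hat + half


# Win rate of 40% measured over 35 trades.
low, high = wald_ci(0.40, 35)
print(f"{low:.1%} to {high:.1%}")  # 23.8% to 56.2%
```

Under this approximation, a 40% win rate over 35 trades is compatible with a true win rate anywhere from roughly 24% to 56%.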

The formula above is the simple Wald approximation. The project's standard is the Wilson interval, a refinement that stays sensible at small N and near 0% or 100% (where Wald can run past the ends of [0%, 100%]). The CIs cited in the fleet examples below are Wilson intervals, taken directly from the dossier.
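
To make the difference concrete, the sketch below implements the textbook Wilson score interval next to the Wald formula. It is an illustration only: the function names are hypothetical and the project's own implementation may differ in detail. The input of 2 wins in 3 trades corresponds to a 66.7% win rate at N=3.

```python
from math import sqrt

Z95 = 1.96  # 95% multiplier, as in the formula above


def wald_ci(wins, n, z=Z95):
    """Simple Wald interval; can spill outside [0, 1] at small n."""
    p_hat = wins / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_ci(wins, n, z=Z95):
    """Wilson score interval; always stays inside [0, 1]."""
    p_hat = wins / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# 2 wins out of 3 trades (66.7% win rate).
print("Wald:  ", wald_ci(2, 3))    # upper bound exceeds 1.0 (over 100%)
print("Wilson:", wilson_ci(2, 3))  # roughly (0.208, 0.939)
```

Wald reports an upper bound above 100% for this sample, which is impossible for a win rate, while Wilson pulls the interval back inside the valid range.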

Why it matters for this fleet

Almost no displayed metric in the trading simulator is shown with its CI. This means you, the human reader, have to mentally apply the CI when interpreting low-N strategies. Failing to do so leads to ranking lucky strategies as "best."

95% CI half-widths at various N (Wald approximation, assuming true WR ≈ 40%)

Trades (N)	95% CI half-width (±)
20	21.5%
30	17.5%
50	13.6%
100	9.6%
200	6.8%
500	4.3%
1000	3.0%

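The half-width shrinks in proportion to 1/sqrt(N), so halving the CI takes 4× the trades (compare 50 trades at ±13.6% with 200 trades at ±6.8%). There is no shortcut.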
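
The table can be regenerated with a few lines of Python. This is a sketch using the Wald formula from the Formula section. The helper names `wald_half_width` and `trades_needed` are invented for this example, and the 40% win rate is the same assumption the table makes.

```python
from math import ceil, sqrt


def wald_half_width(p, n, z=1.96):
    """Half-width of the 95% Wald interval for win rate p over n trades."""
    return z * sqrt(p * (1 - p) / n)


def trades_needed(p, target_half_width, z=1.96):
    """Smallest n whose Wald half-width is at or below the target."""
    return ceil(z * z * p * (1 - p) / target_half_width ** 2)


# Reproduce the table rows for an assumed true win rate of 40%.
for n in (20, 30, 50, 100, 200, 500, 1000):
    print(n, f"{wald_half_width(0.40, n):.1%}")

# Quadrupling the trade count halves the half-width.
print(wald_half_width(0.40, 50) / wald_half_width(0.40, 200))

# Trades needed for a half-width of 5 percentage points at a 40% win rate.
print(trades_needed(0.40, 0.05))  # 369
```

Inverting the formula this way gives a quick estimate of how long a strategy has to run before its win rate is worth reading.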

Examples from the live fleet

id478 (EMA 50/200 · BTC · 1d · 2× · long): just N=3 trades, measured win rate (the share of trades that close in profit) 66.7%. Wilson 95% CI: [20.8%, 93.9%] — a half-width of ±36.5pp (percentage points). That range spans "losing strategy" to "almost-certain win" — the metric is useless for any decision.
id511 (EMA 21/50 · BTC · 1h · 2× · long): N=469 trades, measured win rate 24.9%. Wilson 95% CI: [21.2%, 29.1%] — a half-width of just ±3.9pp. Tight enough to act on.

The first has the flashier headline (66.7% win rate, profit factor 20.8), but only the second has a trustworthy estimate. Sample size buys precision.

CIs apply to more than just win rate

Profit factor has a CI too. Below ~100 trades, PF is wildly uncertain.
Sharpe ratio has a CI driven by both N and the variance of returns. Sharpe over 30 trades is essentially noise.
Expectancy (avg PnL per trade) has a 95% CI of mean ± 1.96 × SE, where SE = standard deviation of per-trade PnL / sqrt(N).

Every backtest metric is an estimate with a CI. Sample size is the only knob that tightens it.
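
As a sketch of the expectancy case, the snippet below computes mean ± 1.96 × SE from a list of per-trade PnL values. The function name and the sample figures are hypothetical and are not taken from the fleet. The normal approximation is also rough for small or fat-tailed samples, so treat the result as indicative only.

```python
from math import sqrt
from statistics import mean, stdev


def expectancy_ci(pnls, z=1.96):
    """Approximate 95% CI for average PnL per trade (normal approximation)."""
    n = len(pnls)
    avg = mean(pnls)
    se = stdev(pnls) / sqrt(n)   # sample standard deviation / sqrt(N)
    return avg - z * se, avg + z * se


# Hypothetical per-trade PnL, invented for illustration only.
sample_pnls = [1.0, -1.0, 2.0, -1.0, 3.0, -1.0]
low, high = expectancy_ci(sample_pnls)
print(f"expectancy {mean(sample_pnls):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only six trades, the interval around the positive average still includes zero, so this sample alone cannot show that the expectancy is positive.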

Practical rule

When comparing two strategies, ask: "Do their CIs overlap?"

id478 win rate 66.7% → [20.8%, 93.9%]
id511 win rate 24.9% → [21.2%, 29.1%]

These intervals overlap: id511's entire interval sits inside id478's. Despite id478's headline win rate being more than double id511's, you cannot say id478 has a higher true win rate, because its interval is so wide that it reaches all the way down past id511's range. They might differ. Or they might not.

This is the formal way of saying "low-N strategies are not comparable to high-N strategies."
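
The overlap check is simple enough to sketch in code. The helper name `intervals_overlap` is hypothetical, and the intervals are the Wilson CIs quoted above. Overlap is a quick, conservative screen and not a formal test of the difference between two win rates.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(low, high) and b=(low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Wilson 95% CIs for win rate, as quoted above (fractions, not percent).
id478 = (0.208, 0.939)
id511 = (0.212, 0.291)

print(intervals_overlap(id478, id511))  # True: no clear winner on win rate
```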

Sources

wiki/qa-sessions/2026-05-17-session.md#q2 (first asked here)
Standard binomial proportion confidence interval (Wald approximation)
