Statistical Independence

Two observations are statistically independent when knowing one tells you nothing about the other. Sample-size benefits — tighter confidence intervals, stronger significance — only apply to independent observations. Correlated observations look like more data but don't behave like it.

In plain English

Suppose you run the same 21/50-crossover-long signal on BTC and on ETH over the same window. You have two Sharpe numbers — but BTC and ETH largely move together, so both backtests saw the same broad market swings and produced linked outcomes. Knowing the BTC result tells you a lot about the ETH result. These two estimates are NOT fully independent.

By contrast, if you measured the same strategy on 2024 and then on 2018-2021, you would sample two genuinely different markets — a mostly-bull stretch and a multi-regime span. Knowing the first result tells you little about the second. Those ARE close to independent.

The difference matters because the standard statistical machinery (confidence intervals, significance tests, sample-size formulas) assumes independent observations. Plugging correlated data into formulas designed for independent data produces intervals that are too narrow, and therefore false confidence.
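A small simulation can make the over-tight-confidence point concrete. The sketch below is illustrative only: it draws two unit-variance estimates that share a common factor, with a hypothetical correlation of 0.8 (not a measured fleet value), and compares the empirical variance of their average with the value a formula that assumes independence would report.

```python
import math
import random


def simulate_variance_of_average(rho: float, trials: int = 20_000, seed: int = 0) -> float:
    """Empirical variance of (S_1 + S_2) / 2 for unit-variance estimates with correlation rho."""
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        # A shared "market" shock plus an idiosyncratic shock per estimate.
        common = rng.gauss(0.0, 1.0)
        s1 = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        s2 = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        averages.append((s1 + s2) / 2.0)
    mean = sum(averages) / trials
    return sum((a - mean) ** 2 for a in averages) / (trials - 1)


RHO = 0.8                      # hypothetical correlation, for illustration only
naive_variance = 1.0 / 2.0     # what an independence-based formula would claim
empirical_variance = simulate_variance_of_average(RHO)

print(f"naive (assumes independence): {naive_variance:.3f}")
print(f"empirical (rho = {RHO}):        {empirical_variance:.3f}")
```

Under these assumptions the empirical variance should land near (1 + 0.8) / 2 = 0.9, well above the 0.5 that the independence assumption implies, so an interval built from the naive figure would be too narrow.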

Why it matters for this fleet

This is the load-bearing reason the fleet's headline result is so humbling. The dossier holds 210 variants — but they are NOT 210 independent tests. They share the same 21/50-long signal family, run across correlated assets (BTC/ETH/SOL tend to move together), in one window. So they rise and fall together.

That is exactly why 5 edge-significant hits mean almost nothing. Run ~210 variants and roughly 11 would clear an edge test by pure chance. This is the multiple-comparisons problem: run enough tests and some pass on luck alone (the ~11 figure is consistent with a conventional 5% threshold, since 210 × 0.05 ≈ 10.5). Correlation between variants does not shrink that expected count; it makes the chance passes arrive in clusters. Five is below eleven. After that haircut, no edge in the fleet is distinguishable from luck. Independence is the precondition that lets sample-size and significance benefits apply at all, and this fleet badly lacks it.
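The chance-pass arithmetic can be sketched in a few lines. The 5% threshold below is an assumption (the source gives only the count of roughly 11, which is consistent with it); the variant count and hit count come from the dossier figures quoted above.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing by chance when no test has a real edge.

    By linearity of expectation this holds whether or not the tests are
    correlated; correlation changes the spread of the count, not its mean.
    """
    return n_tests * alpha


N_VARIANTS = 210    # variants in the dossier
ALPHA = 0.05        # ASSUMED significance threshold, not stated in the source
OBSERVED_HITS = 5   # edge-significant hits reported for the fleet

expected = expected_false_positives(N_VARIANTS, ALPHA)
print(f"expected chance passes: {expected:.1f}")
print(f"observed hits exceed chance level: {OBSERVED_HITS > expected}")
```

If the fleet's actual threshold differs from 5%, only `ALPHA` needs to change; the comparison logic stays the same.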

The most common confusion: thinking that running more variants automatically buys statistical confidence. It doesn't — what buys it is temporal or market-event diversification (genuinely new history the strategy has never seen).

The variance formula for correlated estimates

For two estimates S_1 and S_2, each with variance σ² and with correlation ρ between them:

Var( (S_1 + S_2) / 2 ) = (σ² / 2) × (1 + ρ)

ρ = 0 (independent): variance halves. SE shrinks by sqrt(2). Genuine 2× sample-size gain.
ρ = 0.5 (partially correlated): variance shrinks by 25%. Modest gain.
ρ = 0.95 (highly correlated): variance shrinks by 2.5%. Effectively zero gain.
ρ = 1 (identical): variance unchanged. Zero gain.
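The four cases above follow directly from the formula. As a quick check, the sketch below evaluates it with σ² normalised to 1 (a convenience choice, so the single-estimate variance is 1 and the reduction reads directly as a percentage).

```python
def variance_of_mean_of_two(sigma2: float, rho: float) -> float:
    """Var((S_1 + S_2) / 2) for two estimates with equal variance sigma2 and correlation rho."""
    return (sigma2 / 2.0) * (1.0 + rho)


SIGMA2 = 1.0  # normalised single-estimate variance

for rho in (0.0, 0.5, 0.95, 1.0):
    pooled = variance_of_mean_of_two(SIGMA2, rho)
    reduction_pct = (1.0 - pooled / SIGMA2) * 100.0
    print(f"rho = {rho:<4}  pooled variance = {pooled:.3f}  reduction = {reduction_pct:.1f}%")
```

The printed reductions are 50%, 25%, 2.5% and 0%, matching the list.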

For k estimates that share a common pairwise correlation ρ, the variance of their average is (σ² / k) × (1 + (k − 1)ρ), which never falls below ρσ² however large k gets. Adding more correlated venues hits diminishing returns very fast.
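One way to see the diminishing returns is through an effective sample size. The sketch below assumes every pair of estimates has the same correlation ρ (an equicorrelation simplification, not something measured in the fleet), under which the variance of the average of k estimates is (σ² / k) × (1 + (k − 1)ρ) and the effective number of independent estimates is k / (1 + (k − 1)ρ). The value ρ = 0.6 is purely illustrative.

```python
def effective_sample_size(k: int, rho: float) -> float:
    """Number of independent estimates that k equicorrelated estimates are worth.

    Assumes a single common pairwise correlation rho (a simplification).
    """
    return k / (1.0 + (k - 1) * rho)


RHO = 0.6  # hypothetical common correlation, for illustration only

for k in (1, 2, 3, 10, 210):
    n_eff = effective_sample_size(k, RHO)
    print(f"k = {k:>3}  effective independent estimates = {n_eff:.2f}")

# As k grows the effective count approaches 1 / rho and never exceeds it.
print(f"ceiling (1 / rho) = {1.0 / RHO:.2f}")
```

With k = 2 this reduces to the two-estimate formula. Under the stated assumption, even hundreds of correlated estimates are worth fewer than two independent ones when ρ = 0.6.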

Typical correlation values in this fleet's testing scenarios

This dossier is single-venue (Binance only) and single-window, so the diversification axes available inside it are symbol and leverage — neither of which buys real independence.

Scenario	Approx ρ	Independent?
Same signal, same symbol, different leverage (id523 2× vs id659 1×)	1.00 (identical trades)	No — literally the same trades
Same signal, same window, different symbol (BTC vs ETH vs SOL)	0.40–0.80	Partially (linked by macro events)
Same signal, different time periods (e.g. 2018-2021 vs 2024) — not in dossier	~0	Yes
Two distinct EMA pairs on the same symbol & window	0.0–0.5 (depends on overlap)	Variable

Apart from the leverage row, the ρ values above are estimates, not measurements; confidence is MEDIUM. The leverage row is exact: leverage scales PnL but never changes which trades fire.

Examples relevant to the fleet

id523 (SOL 1h 2×) vs id659 (SOL 1h 1×): ρ = 1.00 — they are the identical 436 trades (W=139 / L=297). Leverage only rescaled the equity curve. Pooling these two adds zero new information — the cleanest demonstration that "more variants" ≠ "more independent data."
The same 21/50-long signal across BTC, ETH, SOL (e.g. id511 / id517 / id523): ρ ≈ 0.4–0.8 — the three assets share broad market swings, so their win/loss patterns are linked. Pooling tightens the estimate somewhat, but far less than the raw trade-count sum suggests.
The full 210-variant fleet: a dense web of the above correlations. Treating it as 210 independent tests is exactly the error that makes 5 edge-significant hits look meaningful when ~11 would pass by chance.
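The leverage case in the first example can be checked numerically. The sketch below uses made-up per-trade returns (not the actual id523 or id659 trades) and assumes leverage acts as a pure multiplier on each trade's return, as the text describes.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-trade returns at 1x leverage (illustrative values only).
trade_returns_1x = [0.012, -0.004, -0.007, 0.021, -0.003, 0.009]
# Same trades at 2x: every return is simply doubled.
trade_returns_2x = [2.0 * r for r in trade_returns_1x]

rho = pearson(trade_returns_1x, trade_returns_2x)
print(f"correlation between 1x and 2x variants: {rho:.6f}")
```

Any positive rescaling of a series leaves its correlation with the original at 1 (up to floating-point rounding), which is why the second variant carries no new information.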

Practical guidance

For sample-size growth: use temporal extension (genuinely new history). That's where independence lives — and it is the reserved out-of-sample exam this fleet has not yet faced.
Beware the leverage mirage. Variants that differ only in leverage are the same trades; never count them as added evidence.
When in doubt: report unpooled per-stratum results. A reader who sees three numbers with three CIs is better-informed than one who sees a single pooled value with a falsely-tight CI.
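Unpooled reporting, as recommended in the last point, could look like the sketch below. The per-trade returns are invented for illustration, the symbol names are just labels for the strata, and the interval is a normal-approximation 95% CI that itself assumes the trades within one stratum are roughly independent.

```python
import math
import statistics


def mean_with_ci(values: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal-approximation confidence interval."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical per-trade returns per stratum (illustrative values only).
per_trade_returns = {
    "BTC": [0.010, -0.006, 0.004, -0.002, 0.008, -0.005],
    "ETH": [0.014, -0.009, 0.003, -0.004, 0.011, -0.007],
    "SOL": [0.022, -0.015, 0.006, -0.008, 0.017, -0.012],
}

# One line per stratum, each with its own interval, instead of one pooled number.
for symbol, values in per_trade_returns.items():
    mean, lower, upper = mean_with_ci(values)
    print(f"{symbol}: mean = {mean:+.4f}  95% CI = [{lower:+.4f}, {upper:+.4f}]")
```

Reporting the three intervals side by side lets the reader judge how much the strata agree, without implying a precision that a pooled figure cannot honestly claim.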

meta analysis
venue stratified testing
sample size
confidence interval
cross venue data

Sources

wiki/qa-sessions/2026-05-17-session.md#q8 (first asked here)
Standard correlation-of-averages formulas (any intro statistics text)
Meta-analysis methodology for dependent effect sizes (Borenstein et al., 2009)
