Question 1

In plain English

Accepted Answer

Every metric in a strategy report (win rate, profit factor, Sharpe ratio, expectancy) is an estimate of an underlying truth, based on the trades the strategy generated. The more trades, the closer the estimate tends to sit to the truth; the fewer trades, the wider the range of plausible truths. Sample size is the single biggest determinant of how much you should trust the metrics.

Question 2

Why it matters for this fleet

Accepted Answer

Trade counts in the 210-strategy fleet range from N=3 (a 50/200 daily variant) to N=10,574 (a 9/21 scalp on 1-minute candles) — a 3,500× spread. The median is N=436. Naively ranking strategies by PnL or Sharpe without weighting by sample size produces a top list dominated by lucky low-N strategies, not robust ones. A clean illustration sits at the two extremes: id478 (N=3, win rate could be anywhere from 20% to 94%) is pure anecdote, while id511 (N=469, win rate pinned to ±3.9pp) is a trustworth

Question 3

How sample size drives down uncertainty

Accepted Answer

Standard error scales with 1/sqrt(N). Doubling your sample only reduces uncertainty by ~30%. To halve the confidence interval, you need 4× the trades.

| Trades | 95% CI half-width on WR (true ≈ 40%) |
|---|---|
| 20 | ±21.5% |
| 50 | ±13.6% |
| 100 | ±9.6% |
| 200 | ±6.8% |
| 500 | ±4.3% |
| 1000 | ±3.0% |
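The half-widths above match the usual normal approximation, half-width = 1.96 · sqrt(p(1 − p) / N) with p = 0.40. The sketch below assumes that formula; the function name `wr_half_width` is illustrative and not part of the fleet's tooling.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def wr_half_width(win_rate: float, n_trades: int, z: float = Z_95) -> float:
    """Normal-approximation half-width of the CI on a win rate (as a fraction)."""
    if n_trades <= 0:
        raise ValueError("n_trades must be positive")
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be in [0, 1]")
    return z * math.sqrt(win_rate * (1.0 - win_rate) / n_trades)


if __name__ == "__main__":
    # Trade counts taken from the table; true win rate assumed to be 40%.
    for n in (20, 50, 100, 200, 500, 1000):
        print(f"N={n:>5}  half-width = ±{100 * wr_half_width(0.40, n):.1f}%")
```

Because the half-width depends on N only through 1/sqrt(N), quadrupling `n_trades` halves the result, which is the "4× the trades" rule stated above.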

Question 4

What drives sample size in this fleet

Accepted Answer

- Interval. Shorter candles → more crossings → more trades. The 50/200 macro pair on daily candles is the fuzzy corner: structurally few daily crossings give it a median of only ≈28 trades, too few to prove anything (0 of that pair are edge-significant). The 21/50 pair on 1h fires far more often.
- Filter strictness. Volume-gated variants fire less than ungated ones (18 of the 210 are volume-gated).
- Side. Long-only on this bull window generates more entries than short-only.
- Symbol. SOL whipsaws mor

Question 5

Examples from the live fleet

Accepted Answer

- id478 (EMA 50/200 · BTC · 1d · 2× · long): just N=3 trades. The profit factor of 20.8 and the 66.7% win rate (the share of trades that close in profit) rest on three outcomes. Removing any single trade would swing the win rate to either 50% or 100%. This is sample-size fragility in raw form.
- id511 (EMA 21/50 · BTC · 1h · 2× · long): N=469 trades. The win rate is pinned to ±3.9pp (percentage points). Removing a handful of trades barely moves the metric.
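For a sample as small as id478's, the normal approximation breaks down, so a Wilson score interval is a common alternative. The sketch below assumes a 66.7% win rate on N=3 means 2 winners out of 3. The helper `wilson_interval` is illustrative; the source does not say which interval method the report uses.

```python
import math


def wilson_interval(wins: int, n_trades: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate, returned as (low, high) fractions."""
    if n_trades <= 0 or not 0 <= wins <= n_trades:
        raise ValueError("need n_trades > 0 and 0 <= wins <= n_trades")
    p_hat = wins / n_trades
    z2 = z * z
    denom = 1.0 + z2 / n_trades
    center = (p_hat + z2 / (2.0 * n_trades)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / n_trades + z2 / (4.0 * n_trades**2)
    ) / denom
    return center - half, center + half


if __name__ == "__main__":
    low, high = wilson_interval(wins=2, n_trades=3)  # assumed: 2 of 3 trades won
    print(f"id478 win rate plausibly between {low:.0%} and {high:.0%}")
```

With 2 wins in 3 trades the interval runs from roughly 21% to 94%, close to the 20% to 94% range quoted for id478 in the fleet overview.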

Question 6

How to mitigate small samples

Accepted Answer

- Pool related strategies. A "family" view pools variants that share the same signal. Beware: pooling across leverage adds nothing (id523 at 2× and id659 at 1× are the identical 436 trades; leverage scales PnL, not which trades fire). Pool across genuinely different conditions instead.
- Extend the backtest window. More years = more trades.
- Add more symbols. Spawning the rule on three symbols (Phase 125: explicit multi-select at spawn time) gives effectively 3× the data if the signal is symbol-inde

Question 7

Refinement — independence matters more than count

Accepted Answer

A subtle but critical refinement: raw trade count overstates effective sample size when observations are correlated. See [[statistical-independence]]. The 210 variants in this dossier are not 210 independent tests — they share the same 21/50-long signal family across correlated assets (BTC/ETH/SOL) in one window, so they move together. Real sample-size growth comes from genuinely different markets and time periods, not from re-running correlated variants. Practical implication: prioritize tempor
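One textbook way to put a number on this is the effective sample size for equally correlated observations, N_eff = N / (1 + (N − 1)·ρ). The sketch below assumes a single pairwise correlation ρ across all variants, which is a simplification. The ρ values are hypothetical and were not measured on the fleet.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """Effective number of independent observations among n equally
    correlated ones, each pair having correlation rho (assumed constant)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n > 1 and not (-1.0 / (n - 1)) < rho <= 1.0:
        raise ValueError("rho outside the valid range for n observations")
    return n / (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    # 210 variants; the rho values below are illustrative, not fleet estimates.
    for rho in (0.0, 0.1, 0.5, 1.0):
        print(f"rho={rho:.1f}  N_eff={effective_sample_size(210, rho):.1f}")
```

At ρ = 0 the 210 variants count as 210 independent tests; at ρ = 1 they collapse to a single test, which is the limiting case of re-running the same signal.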

Question 8

Refined 2026-05-17 — deployable thresholds

Accepted Answer

The "≥30 trades" threshold is the floor, not the target. For deployable confidence:

| Tier | N | Use |
|---|---|---|
| Floor | 30 | Anecdote only |
| Directional | 100 | Hypothesize edge |
| Deployable | 300 | Trust PF and Sharpe enough to risk capital |
| Robust | 1000+ | Near-bulletproof |

Three independent constraints justify the 300 mark:

- Win-rate confidence interval tightens to ±5% (actionable precision) at N ≈ 400.
- Sharpe-significance test: N ≈ (1.96 / Sharpe)². For true per-trade Sharpe 0.1, you need ~385 trades to distinguish from zero.
- Pro
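The first two constraints can be checked with a few lines of arithmetic. This is a minimal sketch assuming the N ≈ (1.96 / Sharpe)² rule quoted above and the normal-approximation half-width for the win rate. Both function names are illustrative.

```python
import math


def trades_for_sharpe(per_trade_sharpe: float, z: float = 1.96) -> int:
    """Smallest N at which a per-trade Sharpe is z standard errors from zero,
    using the rule N = (z / Sharpe)^2."""
    if per_trade_sharpe == 0:
        raise ValueError("a zero Sharpe can never be distinguished from zero")
    return math.ceil((z / abs(per_trade_sharpe)) ** 2)


def trades_for_wr_half_width(half_width: float, win_rate: float = 0.5,
                             z: float = 1.96) -> int:
    """Smallest N giving a win-rate CI half-width no wider than half_width.
    win_rate=0.5 is the worst case (widest interval)."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    return math.ceil(z * z * win_rate * (1.0 - win_rate) / half_width**2)


if __name__ == "__main__":
    print(trades_for_sharpe(0.1))           # per-trade Sharpe of 0.1
    print(trades_for_wr_half_width(0.05))   # ±5% at the worst-case win rate
```

Both calls land at 385 trades, in line with the ~385 and N ≈ 400 figures above. A weaker edge pushes the requirement up quickly because N grows with 1/Sharpe².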

Question 9

Accepted Answer

[[statistical-significance]] [[statistical-independence]] [[confidence-interval]] [[number-of-winners]] [[sharpe-significance]] [[selection-bias]] [[edge]] [[out-of-sample-testing]]

Question 10

Sources

Accepted Answer

- wiki/qa-sessions/2026-05-17-session.md#q2 (first asked here)
- wiki/qa-sessions/2026-05-17-session.md#q3 (refinement)
- /api/analytics

Sample Size