Six frontier AI models. Thirty independent trials, five per model (pass@5). A rigorous multi-stage quantitative trading benchmark scored on hidden out-of-sample data.
Five cascading stages built on real Indian equity index data. Models must first reproduce specified strategies precisely before attempting creative alpha research.
Stage 0: Merge NIFTY 50 and BANKNIFTY minute OHLC bars (737K bars, 2015-2022).
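A minimal sketch of the NIFTY 50 / BANKNIFTY merge, assuming each index arrives as a mapping from minute timestamp to an (open, high, low, close) tuple and that the two series are aligned by an inner join on identical timestamps. The function name `merge_minute_bars` and the data layout are hypothetical; the benchmark's actual file format and join rule are not described here.

```python
from __future__ import annotations

from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical bar layout: (open, high, low, close) for one minute.
Bar = Tuple[float, float, float, float]


def merge_minute_bars(
    nifty: Dict[datetime, Bar], banknifty: Dict[datetime, Bar]
) -> List[Tuple[datetime, Bar, Bar]]:
    """Inner-join two minute OHLC series on timestamp, in chronological order.

    Minutes present in only one series are dropped (an assumed alignment rule).
    """
    common = sorted(nifty.keys() & banknifty.keys())
    return [(ts, nifty[ts], banknifty[ts]) for ts in common]
```

Dropping unmatched minutes keeps both legs on the same clock, which a spread or hedged position needs; forward-filling short gaps would be an alternative design choice.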
Stage 1a: Implement a specified moving average strategy against a known ground truth.
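The benchmark's exact moving average rule is not given, so the following is only an illustrative stand-in: a hypothetical fast/slow simple-moving-average crossover that goes long when the fast average is above the slow one and short when it is below. The window lengths and the close-to-close return convention are assumptions. The property worth copying is causality: the position at bar t uses closes up to t and earns the return from t to t+1.

```python
from __future__ import annotations

from typing import List, Optional


def sma(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until `window` values are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_positions(closes: List[float], fast: int = 3, slow: int = 5) -> List[int]:
    """+1 if fast SMA > slow SMA, -1 if below, 0 during warm-up or on a tie."""
    positions = []
    for f, s in zip(sma(closes, fast), sma(closes, slow)):
        if f is None or s is None:
            positions.append(0)
        else:
            positions.append(1 if f > s else -1 if f < s else 0)
    return positions


def strategy_returns(closes: List[float], positions: List[int]) -> List[float]:
    """Position held at bar t earns the close-to-close return from t to t+1."""
    return [positions[t] * (closes[t + 1] / closes[t] - 1.0) for t in range(len(closes) - 1)]
```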
Stage 1b: Implement pairs trading on the NIFTY-BANKNIFTY spread with cooldown logic.
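A sketch of the pairs-trading step under stated assumptions: the spread is taken as log(BANKNIFTY) minus a fixed hedge ratio `beta` times log(NIFTY), standardized with a trailing z-score, and traded with hypothetical entry, exit, and cooldown parameters. The benchmark's real spread definition and thresholds are not given here. Cooldown is modeled as a number of bars after each exit during which new entries are ignored.

```python
from __future__ import annotations

import math
import statistics
from typing import List, Optional


def log_spread(bank: List[float], nifty: List[float], beta: float) -> List[float]:
    """Hypothetical spread: log(BANKNIFTY) - beta * log(NIFTY)."""
    return [math.log(b) - beta * math.log(n) for b, n in zip(bank, nifty)]


def rolling_zscore(spread: List[float], window: int) -> List[Optional[float]]:
    """Z-score of each bar against the trailing `window` bars, including the current one."""
    out: List[Optional[float]] = []
    for i in range(len(spread)):
        if i < window - 1:
            out.append(None)  # not enough history yet
            continue
        hist = spread[i - window + 1 : i + 1]
        sd = statistics.pstdev(hist)
        out.append((spread[i] - statistics.fmean(hist)) / sd if sd > 0 else 0.0)
    return out


def pair_positions(
    z: List[Optional[float]], entry: float = 2.0, exit_: float = 0.5, cooldown: int = 3
) -> List[int]:
    """Spread position per bar: +1 long spread, -1 short spread, 0 flat.

    After a position is closed, entries are blocked for the next `cooldown` bars.
    None values are assumed to occur only during the initial warm-up.
    """
    positions, current, wait = [], 0, 0
    for zi in z:
        if zi is None:
            positions.append(0)
            continue
        if current == 0:
            if wait > 0:
                wait -= 1  # still cooling down: ignore any signal
            elif zi > entry:
                current = -1  # spread rich: short BANKNIFTY leg, long NIFTY leg
            elif zi < -entry:
                current = 1  # spread cheap: long BANKNIFTY leg, short NIFTY leg
        elif abs(zi) < exit_:
            current, wait = 0, cooldown  # spread reverted: flatten, start cooldown
        positions.append(current)
    return positions
```

Entries are only checked on a flat book, so a z-score that jumps from above `entry` to below `-entry` keeps the original short until it reverts inside `exit_`; a stop-loss or flip rule would be a separate design decision.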
Stage 1c: Combine the strategies with volatility-normalized equal weights.
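A minimal sketch of the combination step, assuming "vol-normalized equal weight" means giving each strategy a weight proportional to the inverse of its return volatility, so that every strategy contributes the same volatility before the streams are summed. The volatilities are estimated on an in-sample window only; estimating them on the full series would leak future information. Function names are hypothetical.

```python
from __future__ import annotations

import statistics
from typing import List


def inverse_vol_weights(in_sample: List[List[float]]) -> List[float]:
    """Weight each strategy by 1 / (sample std of its in-sample returns), normalized to sum to 1."""
    inv = [1.0 / statistics.stdev(r) for r in in_sample]
    total = sum(inv)
    return [w / total for w in inv]


def combine(streams: List[List[float]], weights: List[float]) -> List[float]:
    """Bar-by-bar weighted sum of aligned strategy return streams."""
    return [sum(w * r for w, r in zip(weights, bar)) for bar in zip(*streams)]
```

Normalizing the weights to sum to 1 only changes the overall scale of the combined stream, not its Sharpe ratio.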
Stage 2: Discover, backtest, and combine hedged strategies, scored on hidden 2023-2026 out-of-sample (OOS) data.
Early-stage failures block all downstream scoring. A Stage 0 failure yields -6.0, and calibration errors in Stages 1a-1c yield between -5.0 and -4.0. Only models that pass every gate are scored on Stage 2 alpha.
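The gating rule can be sketched as a function that checks stages in order and lets the first failure fix the reward. The -6.0 value for Stage 0 and the -4.0 to -5.0 band for Stages 1a-1c come from the text; how that band is split across the three calibration stages, and how a Stage 2 score is produced, are not specified, so the per-stage penalties below are hypothetical placeholders.

```python
from __future__ import annotations

from typing import Dict

# Hypothetical split of the documented -5.0 to -4.0 band across calibration stages.
CALIBRATION_PENALTY = {"1a": -5.0, "1b": -4.5, "1c": -4.0}
STAGE0_PENALTY = -6.0
MAX_REWARD = 3.0  # documented ceiling of the reward scale


def total_reward(passed: Dict[str, bool], stage2_score: float) -> float:
    """Walk the gates in order; the first failing stage determines the reward."""
    if not passed.get("0", False):
        return STAGE0_PENALTY
    for stage in ("1a", "1b", "1c"):
        if not passed.get(stage, False):
            return CALIBRATION_PENALTY[stage]
    # Only a model that clears every gate is scored on Stage 2 alpha.
    # Clamping to the documented [-6.0, +3.0] range is an assumption.
    return max(STAGE0_PENALTY, min(MAX_REWARD, stage2_score))
```

Missing stage keys are treated as failures, which keeps the gate conservative.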
Stage 2 strategies are scored on hidden 2023-2026 market data that the model never sees. Runtime lookahead detection via truncation tests prevents data snooping.
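A sketch of a truncation test for lookahead, assuming a strategy is a function from a price history to one position per bar. If the strategy is causal, rerunning it on a prefix of the data must reproduce the same positions for that prefix; a strategy that reads future bars will change its earlier positions once those bars are cut away. The function names and cut points are hypothetical; the benchmark's actual runtime check is not described in detail.

```python
from __future__ import annotations

from typing import Callable, List


def has_lookahead(
    strategy: Callable[[List[float]], List[float]],
    prices: List[float],
    cut_points: List[int],
) -> bool:
    """True if truncating the input at any cut point changes earlier positions."""
    full = strategy(prices)
    for k in cut_points:
        if strategy(prices[:k]) != full[:k]:
            return True
    return False


def causal_momentum(prices: List[float]) -> List[float]:
    """Position at t depends only on bars up to t (sign of the last change)."""
    return [0.0] + [1.0 if prices[t] > prices[t - 1] else -1.0 for t in range(1, len(prices))]


def peeking_momentum(prices: List[float]) -> List[float]:
    """Illegally uses bar t+1 to choose the position at t."""
    return [1.0 if prices[t + 1] > prices[t] else -1.0 for t in range(len(prices) - 1)] + [0.0]


if __name__ == "__main__":
    prices = [1.0, 2.0, 1.0, 2.0, 3.0]
    print(has_lookahead(causal_momentum, prices, [3, 4]))   # False
    print(has_lookahead(peeking_momentum, prices, [3, 4]))  # True
```

Testing several cut points matters: a strategy that peeks only a few bars ahead is caught at any cut that removes the bars it reads.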
All positions must satisfy |net_notional| / gross_notional ≤ 0.80 at every bar. This forces genuinely hedged trading and rules out the purely directional bets that models otherwise default to.
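The exposure constraint can be checked bar by bar. This sketch assumes each bar's book is a list of signed leg notionals (positive long, negative short) and treats a flat book, where gross notional is zero, as compliant; both conventions are assumptions rather than details from the benchmark, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import List, Optional

MAX_NET_TO_GROSS = 0.80


def net_to_gross(notionals: List[float]) -> float:
    """|sum of signed leg notionals| / sum of absolute leg notionals (0 when flat)."""
    gross = sum(abs(x) for x in notionals)
    return abs(sum(notionals)) / gross if gross > 0 else 0.0


def first_violation(book: List[List[float]], limit: float = MAX_NET_TO_GROSS) -> Optional[int]:
    """Index of the first bar whose net/gross ratio exceeds `limit`, or None if all comply."""
    for t, legs in enumerate(book):
        if net_to_gross(legs) > limit:
            return t
    return None
```

A long 100 / short 20 book has net/gross = 80/120 ≈ 0.67 and passes, while an unhedged long has a ratio of 1.0 and fails. Rearranging (L - S)/(L + S) ≤ 0.8 for long notional L and short notional S gives S ≥ L/9, so the smaller side must carry at least 10% of gross notional.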
A cascading gate system in which early failures block downstream scoring. Rewards range from -6.0 (catastrophic failure) to +3.0 (exceptional alpha).
Models are ranked by average reward across 5 independent trials (pass@5). The ground-truth strategy achieves a Sharpe ratio of 2.784 on the same hidden test data.
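For reference, a common annualized Sharpe ratio definition is sketched below: mean per-period return divided by its sample standard deviation, scaled by the square root of the number of periods per year. The zero risk-free rate, the sample (n-1) standard deviation, and the 252-period default for daily returns are assumptions; the benchmark's exact convention behind the 2.784 figure is not stated.

```python
from __future__ import annotations

import math
import statistics
from typing import List


def annualized_sharpe(returns: List[float], periods_per_year: int = 252) -> float:
    """Mean / sample std of per-period returns, scaled by sqrt(periods_per_year).

    Assumes a zero risk-free rate.
    """
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)
```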
All 6 models with complete trial statistics.
| Rank | Model | Avg Reward | Best | Worst | Std Dev |
|---|---|---|---|---|---|
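The table's columns can be derived from each model's trial rewards. This sketch assumes the standard deviation is the sample (n-1) version and that ties in average reward are broken by model name; neither convention is stated in the source, and the function name `leaderboard` is hypothetical.

```python
from __future__ import annotations

import statistics
from typing import Dict, List, Tuple


def leaderboard(
    trials: Dict[str, List[float]],
) -> List[Tuple[int, str, float, float, float, float]]:
    """Rows of (rank, model, avg, best, worst, std), sorted by average reward, highest first."""
    stats = [
        (name, statistics.fmean(r), max(r), min(r), statistics.stdev(r))
        for name, r in trials.items()
    ]
    stats.sort(key=lambda row: (-row[1], row[0]))  # avg descending, then name
    return [(rank, *row) for rank, row in enumerate(stats, start=1)]
```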
Trial-level score distribution and model comparison across the reward scale.
Pass rate per stage across 5 trials. Cyan = 5/5, orange = 3-4/5, red = 0-2/5.