Simulation 3 — Final Results
Simulation 3 ran for three days across a volatile market — BTC up 2.8%, ETH up 3.1%, SOL down 0.1% over the period — with sharp intraday swings in both directions. Every agent finished in the red. The spread from first to last was nearly 28 percentage points.
The dominant story was fee drag. ChatGPT ran 102 trades and paid $1,635 in fees — the highest fee bill in any simulation — converting $699 in gross PnL into a 9.36% loss. Grok ran 131 trades, the most of any agent, and paid $1,543 in fees. High-frequency trading in a choppy market is expensive.
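The fee-drag arithmetic is easy to reproduce from the standings table. A quick sketch using ChatGPT's figures — the per-trade breakdown is our own illustration, not a stat the simulation reports:

```python
def net_pnl(gross: float, fees: float) -> float:
    """Net result after trading costs."""
    return gross - fees

gross, fees, trades = 699.44, 1635.15, 102   # ChatGPT, Simulation 3
print(round(net_pnl(gross, fees), 2))        # -935.71
print(round(fees / trades, 2))               # 16.03 in fees per trade
```

A positive gross edge of roughly $7 per trade, swamped by roughly $16 per trade in costs.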
The algo won by not losing. 55 trades, $263 in fees, a worst trade of -$33.92. Its deterministic rules enforced discipline that the AI agents consistently failed to apply themselves.
Market · Mar 1–3, 2026
| Asset | Mar 1 | Mar 2 | Mar 3 | 3-Day Change |
|---|---|---|---|---|
| BTC | ~$67,008 | ~$65,714–68,791 | ~$68,864 | +2.8% |
| ETH | ~$1,965 | ~$2,027 | ~$2,027 | +3.1% |
| SOL | ~$84 | ~$86 | ~$84 | -0.1% |
Final standings
| # | Agent | Equity | Return | Net PnL | Fees | Win% | Best | Worst | Trades |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Algo | $9,745.90 | -2.54% | -$254.10 | $263.87 | 41.8% | $84.78 | -$33.92 | 55 |
| 2 | Claude | $9,586.82 | -4.13% | -$413.18 | $646.95 | 43.7% | $231.76 | -$481.86 | 71 |
| 3 | Qwen | $9,205.41 | -7.95% | -$756.82 | $365.50 | 32.5% | $110.33 | -$106.77 | 83 |
| 4 | ChatGPT | $9,064.28 | -9.36% | -$935.71 | $1,635.15 | 45.1% | $645.54 | -$432.60 | 102 |
| 5 | Gemini | $8,859.56 | -11.40% | -$1,103.21 | $201.98 | 39.3% | $111.78 | -$156.41 | 56 |
| 6 | Grok | $6,982.66 | -30.17% | -$3,017.34 | $1,543.42 | 49.6% | $269.85 | -$278.17 | 131 |
Equity over time · Simulation 3
Agent breakdown
The deterministic system held the line in a losing market. 55 trades and $263 in fees — the second-lowest cost of any agent. Gross PnL of just $9.77 points to near-break-even direction-picking, but the worst trade was only -$33.92. Algo doesn't win by being right — it wins by not being wrong badly.
Claude turned $233.77 in gross PnL into a $413.18 net loss once $646.95 in fees came out. 71 trades at 43.7% accuracy — direction calls below coin-flip. The best trade of $231.76 roughly covers one day's fee cost; the worst, -$481.86, erased it in a single position. Claude outperformed most of the field, but fees consumed the margin.
Qwen's 32.5% win rate across 83 trades is a directional accuracy problem, not a fee problem. The trajectory across simulations — 60% in a winning Simulation 1, 34.1% in Simulation 2, 32.5% here — is inconsistent enough that Simulation 1 looks more like variance than edge.
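The variance-versus-edge read can be put in numbers with a standard error on the observed win rate. A sketch — the 80-trade count is a hypothetical round figure, not Simulation 1's actual count:

```python
import math

def winrate_se(p: float, n: int) -> float:
    """Standard error of an observed win rate over n independent trades."""
    return math.sqrt(p * (1 - p) / n)

n = 80                       # hypothetical trade count for illustration
se = winrate_se(0.5, n)      # ~0.056 for a true coin-flip trader
z = (0.60 - 0.50) / se       # how surprising a 60% sample would be
print(round(se, 3), round(z, 2))
```

At roughly 1.8 standard errors, a 60% run from a no-edge trader is unusual but far from impossible, which is consistent with the swing back into the low 30s.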
ChatGPT booked +$699.44 in gross PnL against $1,635.15 in fees — the largest fee bill in any simulation to date. 102 trades at 45.1% accuracy should produce roughly break-even results before costs. ChatGPT won Simulation 2 by managing payoff asymmetry; here, trade frequency reversed that entirely. Same model, different outcome — trade count is the variable.
Gemini posted a 39.3% win rate across 56 trades and -$901.23 in gross PnL. Its $201.98 fee bill was the lowest of any AI agent, but with sub-40% directional accuracy, keeping costs low only slows the decline. Gemini has now posted a below-40% win rate in two of three simulations.
Grok ran 131 trades — the most in Simulation 3 — and paid $1,543.42 in fees, a bill second only to ChatGPT's. A 49.6% win rate is nearly a coin-flip, but at that volume the fee burden is close to impossible to overcome at any plausible accuracy. The -$1,473.92 gross loss is the worst of any agent in any simulation. Grok has now finished last in two of three simulations.
What changes in Simulation 4
Four changes going into Simulation 4.
First, Gemini upgrades from gemini-2.5-flash-lite to gemini-3.1-flash-lite-preview. Every other provider stays the same.
Second, all AI agents now receive historical indicator context — the last 5 snapshots per symbol before the current one, shown oldest-to-newest. In previous simulations, agents saw only the current moment: one price, one RSI, one MACD reading with no sense of direction. Now they can observe whether EMA lines are converging or diverging, whether RSI is rising or falling into a threshold, whether the MACD histogram is expanding or contracting. The difference between a number and a trend.
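Mechanically, that context is just a bounded rolling window per symbol. A minimal sketch — class and field names are illustrative, not the simulation's actual schema:

```python
from collections import defaultdict, deque

HISTORY_LEN = 5  # snapshots retained per symbol, per the Simulation 4 spec

class IndicatorHistory:
    """Keeps the last N indicator snapshots per symbol, oldest first."""

    def __init__(self, maxlen: int = HISTORY_LEN):
        self._buf = defaultdict(lambda: deque(maxlen=maxlen))

    def record(self, symbol: str, snapshot: dict) -> None:
        self._buf[symbol].append(snapshot)  # deque evicts the oldest itself

    def context(self, symbol: str) -> list[dict]:
        return list(self._buf[symbol])      # oldest-to-newest, as prompted

h = IndicatorHistory()
for rsi in (48, 52, 57, 61, 66, 71):        # six ticks; window keeps five
    h.record("BTC", {"rsi": rsi})
print([s["rsi"] for s in h.context("BTC")])  # [52, 57, 61, 66, 71]
```

The `maxlen` bound means the prompt builder never has to trim anything; the window is always the freshest five snapshots.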
Third, the system prompt adds two new sections. A risk and capital deployment block explicitly tells agents that staying flat for long periods without cause is not the goal — capital should be working in normal conditions. A calibration guidance block asks agents to treat confidence as a probability estimate and reduce both confidence and position size when recent win rates fall below expectations. Both are responses to patterns observed across three simulations: agents defaulting to HOLD when uncertain, and expressing high confidence regardless of recent accuracy.
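One way an agent could operationalize that calibration guidance — the linear shrink rule and the 50% baseline are our assumptions, not the prompt's exact wording:

```python
def calibrated_size(base_size: float, confidence: float,
                    recent_wins: int, recent_trades: int,
                    expected_rate: float = 0.5) -> float:
    """Scale position size by confidence, shrinking both when the recent
    win rate runs below expectations (illustrative rule, not the prompt's)."""
    if recent_trades > 0:
        recent_rate = recent_wins / recent_trades
        if recent_rate < expected_rate:
            confidence *= recent_rate / expected_rate
    return base_size * confidence

print(round(calibrated_size(100.0, 0.8, 5, 10), 2))  # on target: 80.0
print(round(calibrated_size(100.0, 0.8, 3, 10), 2))  # cold streak: 48.0
```

The point of the rule is exactly the pattern the prompt targets: a cold streak should drag stated confidence — and therefore exposure — down with it.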
Fourth, the algorithmic agent expands from four scoring dimensions to six. The existing four — EMA cross, RSI zone, MACD histogram sign, Bollinger Band position — are unchanged. Two new dimensions activate once historical snapshots accumulate: a MACD line/signal crossover detector, which scores +1 or -1 when the lines actually cross rather than just checking histogram sign, and an EMA trend momentum check, which scores whether the gap between EMA20 and EMA50 is widening or narrowing. The algo now adapts to the same historical data the AI agents receive. Score range expands from ±4 to ±6 when history is available.
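The two new dimensions only need the previous snapshot alongside the current one. A sketch of how they might score — field names and sign conventions are assumptions, not the algo's actual code:

```python
def macd_cross_score(prev: dict, cur: dict) -> int:
    """+1 on a bullish MACD line/signal crossover, -1 on a bearish one,
    0 when no cross happened between the two snapshots."""
    was_above = prev["macd"] > prev["signal"]
    is_above = cur["macd"] > cur["signal"]
    if is_above and not was_above:
        return 1
    if was_above and not is_above:
        return -1
    return 0

def ema_momentum_score(prev: dict, cur: dict) -> int:
    """+1 when the signed EMA20-EMA50 gap is growing (trend gaining
    momentum), -1 when it is shrinking; the signed-gap convention is ours."""
    prev_gap = prev["ema20"] - prev["ema50"]
    cur_gap = cur["ema20"] - cur["ema50"]
    if cur_gap > prev_gap:
        return 1
    if cur_gap < prev_gap:
        return -1
    return 0

prev = {"macd": -0.4, "signal": -0.1, "ema20": 100.0, "ema50": 101.0}
cur  = {"macd":  0.3, "signal":  0.1, "ema20": 101.5, "ema50": 101.0}
print(macd_cross_score(prev, cur), ema_momentum_score(prev, cur))  # 1 1
```

Added to the existing four ±1 dimensions, these extend the composite score to the stated ±6 once at least two snapshots are available.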