Results

Benchmark Results

AI model portfolios are scored in separate weekly and monthly tracks using the same rules, frozen portfolios, and real public-market prices.

Benchmark status
20 completed · 23 live

Weekly and monthly tracks are scored separately.

Current weekly benchmark leader
Claude Opus 4.8 · CapitalBench Score -12.1
Latest scored: CB-2026-05-28-1M
Latest live: CB-2026-06-26-1W / CB-2026-06-26-1M
Models: 6
Universe: 70 options
Results insights

How To Read The Latest Benchmark Results

Signals generated from scored rounds, oracle comparisons, benchmark difficulty, and model confidence behavior.

Consensus Performance · May 28-Jun 26
Monthly result · CB-2026-05-28-1M · 5 models · Oracle: Biotechnology (XBI), +14.37% · Resolved result

AI consensus portfolio scored -1.4 versus the oracle

If the monthly model allocations were averaged into one consensus portfolio, that portfolio would have returned -0.20%, versus -3.15% for the S&P 500 and +14.37% for the hindsight-best asset.

Consensus means the average of model allocations in the same round. CapitalBench Score compares that return with the hindsight-best eligible asset for that exact scoring window.

High confidence · Math: deterministic · Data through Jun 26, 2026
Consensus portfolio return: -0.20%
Average model return: -0.20%
Consensus CapitalBench Score: -1.4
Why it matters

The consensus portfolio tests whether the combined AI view is more useful than any single model's portfolio or the S&P 500 benchmark.
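The consensus card can be reproduced from the round's own data. The sketch below averages the five saved portfolios for CB-2026-05-28-1M (listed under Model portfolios) and scores the result, assuming each portfolio is held buy-and-hold from its frozen starting weights. Under that assumption a portfolio's return is linear in its weights, so the consensus return equals the average model return, which is why both figures read -0.20%. The function names are hypothetical, not CapitalBench's code.

```python
from statistics import mean

# Saved monthly portfolios for CB-2026-05-28-1M, weights in percent.
PORTFOLIOS = {
    "Grok 4.3": {"SMH": 50, "XLK": 30, "MTUM": 20},
    "Claude Opus 4.8": {"SMH": 35, "MTUM": 25, "XLK": 20, "EWT": 10, "IAU": 10},
    "GPT-5.5": {"SMH": 40, "EWY": 25, "EWT": 15, "XLK": 10, "MTUM": 10},
    "Gemini 3.1 Pro": {"SMH": 40, "XLK": 30, "EWY": 15, "EWT": 15},
    "Claude Opus 4.7": {"SMH": 35, "XLK": 25, "ITA": 15, "IAU": 15, "MTUM": 10},
}

# Published portfolio returns for the same round, in percent.
MODEL_RETURNS = {
    "Grok 4.3": 0.82,
    "Claude Opus 4.8": 0.10,
    "GPT-5.5": -0.18,
    "Gemini 3.1 Pro": -0.70,
    "Claude Opus 4.7": -1.02,
}
ORACLE_RETURN = 14.37  # Biotechnology (XBI), hindsight-best eligible asset


def consensus_weights(portfolios):
    """Average each ticker's weight across models; an unheld ticker counts as 0%."""
    tickers = sorted({t for weights in portfolios.values() for t in weights})
    n = len(portfolios)
    return {t: sum(w.get(t, 0) for w in portfolios.values()) / n for t in tickers}


def capitalbench_score(portfolio_return, oracle_return):
    """Portfolio return as a percentage of the oracle return (oracle = 100)."""
    return portfolio_return / oracle_return * 100


weights = consensus_weights(PORTFOLIOS)
# Buy-and-hold return is linear in the starting weights, so averaging the
# weights gives the same return as averaging the model returns.
consensus_return = mean(MODEL_RETURNS.values())

print(weights)
print(round(consensus_return, 2), round(capitalbench_score(consensus_return, ORACLE_RETURN), 1))
```

With these inputs the consensus holds SMH 40%, XLK 23%, MTUM 13%, EWT 8%, EWY 8%, IAU 5%, and ITA 3%, returns about -0.20%, and scores -1.4.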

Benchmark Difficulty · May 28-Jun 26
Monthly result · CB-2026-05-28-1M · 5 models · Oracle: Biotechnology (XBI), +14.37% · Resolved result

Monthly round had 36.43 percentage points of asset dispersion

The best scored asset returned +14.37% and the worst returned -22.06%; 49.23% of the universe had positive returns. The S&P 500 ranked 49th of 65 options.

Asset dispersion is the gap between the best and worst eligible assets in the same round. Wider dispersion makes missed allocation choices more costly.

High confidence · Math: deterministic · Data through Jun 26, 2026
Oracle return: +14.4%
Worst asset return: -22.1%
Positive universe share: 49.2%
Why it matters

Benchmark difficulty matters because model scores should be interpreted against the opportunity set and the market window they faced.
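The difficulty figures are simple statistics over one round's eligible assets. For this round, dispersion is 14.37 - (-22.06) = 36.43 percentage points, and 49.23% of 65 options corresponds to 32 assets with positive returns. The sketch below is a hypothetical helper, assuming a 0.00% return is not counted as positive and that tied assets share the better rank; the four-asset universe is illustrative, with only the XBI, S&P 500, and worst-asset values taken from CB-2026-05-28-1M.

```python
def difficulty_metrics(asset_returns, benchmark):
    """Summarize how hard one round was.

    asset_returns maps each eligible asset to its round return in percent;
    benchmark is the key used for the S&P 500 in that mapping.
    """
    returns = list(asset_returns.values())
    best, worst = max(returns), min(returns)
    positive = sum(1 for r in returns if r > 0)  # 0.00% is not counted as positive
    bench = asset_returns[benchmark]
    # Rank 1 is the best asset; tied assets share the better rank.
    rank = 1 + sum(1 for r in returns if r > bench)
    return {
        "dispersion_pp": best - worst,
        "positive_share_pct": 100 * positive / len(returns),
        "benchmark_rank": rank,
        "universe_size": len(returns),
    }


# Hypothetical four-asset universe; ASSET_A and ASSET_B are placeholder names.
toy = {"XBI": 14.37, "S&P 500": -3.15, "ASSET_A": 2.00, "ASSET_B": -22.06}
print(difficulty_metrics(toy, "S&P 500"))
```

In the toy universe the dispersion is again 36.43 points, half the assets are positive, and the S&P 500 ranks 3rd of 4.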

Oracle Comparison · May 28-Jun 26
Monthly result · CB-2026-05-28-1M · 5 models · Oracle: Biotechnology (XBI), +14.37% · Resolved result

Models missed the monthly oracle asset

The hindsight-best asset was Biotechnology (XBI) at +14.37%. None of the 5 models held it, so the average allocation was 0.00%.

Oracle means the best eligible asset in hindsight for that round. Models do not know it when portfolios are frozen.

High confidence · Math: deterministic · Data through Jun 26, 2026
Oracle asset holder count: 0
Average oracle asset allocation: 0.00%
Why it matters

This shows whether models identified the eventual best asset before scoring, even when portfolio weights were too small to fully capture the oracle return.
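Both oracle metrics can be read straight off the saved portfolios. The sketch below, a hypothetical helper rather than the site's implementation, assumes the average allocation is taken over all models, with non-holders counted at 0%. It uses the five CB-2026-05-28-1M portfolios from Model portfolios and checks SMH alongside XBI for contrast.

```python
# Saved monthly portfolios for CB-2026-05-28-1M, weights in percent.
PORTFOLIOS = {
    "Grok 4.3": {"SMH": 50, "XLK": 30, "MTUM": 20},
    "Claude Opus 4.8": {"SMH": 35, "MTUM": 25, "XLK": 20, "EWT": 10, "IAU": 10},
    "GPT-5.5": {"SMH": 40, "EWY": 25, "EWT": 15, "XLK": 10, "MTUM": 10},
    "Gemini 3.1 Pro": {"SMH": 40, "XLK": 30, "EWY": 15, "EWT": 15},
    "Claude Opus 4.7": {"SMH": 35, "XLK": 25, "ITA": 15, "IAU": 15, "MTUM": 10},
}


def oracle_coverage(portfolios, oracle_ticker):
    """Return (holder count, average weight in percent) for the oracle asset.

    The average is taken over all models, so non-holders contribute 0%.
    """
    weights = [w.get(oracle_ticker, 0) for w in portfolios.values()]
    holders = sum(1 for w in weights if w > 0)
    return holders, sum(weights) / len(weights)


print(oracle_coverage(PORTFOLIOS, "XBI"))  # the oracle asset for this round
print(oracle_coverage(PORTFOLIOS, "SMH"))  # the most widely held asset, for contrast
```

XBI gives (0, 0.0), matching the card; SMH gives (5, 40.0), since every model held it.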

Weekly track

Claude Opus 4.8 Leads

CapitalBench Score leader inside the featured equal-run comparison set.

14 scored rounds
Claude Opus 4.8 (Anthropic)
CapitalBench Score: -12.1 · Average return: -0.96% (return leader) · Shared rounds: 14 · Timeline: one market week
Monthly track

Grok 4.3 Leads

CapitalBench Score leader inside the featured equal-run comparison set.

4 scored rounds
Grok 4.3 (xAI)
CapitalBench Score: 11.4 · Average return: 1.41% (return leader) · Shared rounds: 4 · Timeline: one market month
Completed results

Current Benchmark Scores

These are equal-run comparison sets. Every ranked model completed every included round, and missed rounds are excluded from the set for everyone.
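The fairness rule amounts to a set intersection: a round enters the comparison only if every ranked model completed it. A minimal sketch under that reading, with hypothetical model names and round IDs; the thresholds stated on this page are 6 shared rounds for weekly sets and 3 for monthly sets.

```python
def equal_run_set(completed, min_shared):
    """Build an equal-run comparison set.

    completed maps each model to the round IDs it finished. Only rounds that
    every model completed are kept, so a round missed by one model is dropped
    for everyone. The set qualifies once it has at least min_shared rounds.
    """
    if not completed:
        return [], False
    shared = set.intersection(*(set(rounds) for rounds in completed.values()))
    return sorted(shared), len(shared) >= min_shared


# Hypothetical models and round IDs.
completed = {
    "model_a": ["R1", "R2", "R3", "R4"],
    "model_b": ["R1", "R2", "R4"],  # missed R3, so R3 is dropped for everyone
    "model_c": ["R1", "R2", "R3", "R4"],
}
rounds, qualified = equal_run_set(completed, min_shared=3)  # monthly threshold
print(rounds, qualified)
```

Here R3 is excluded for all three models, leaving R1, R2, and R4, which is enough to qualify at the monthly threshold of 3.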

Equal-run benchmark

Current Weekly Benchmark

Every ranked model in this set completed the same 14 weekly rounds.

14 shared resolved rounds · 5 equal-run models ranked · Qualified at 6+ shared rounds · Newest included round: CB-2026-06-18-1W
Shared resolved rounds

CapitalBench Score

Max possible is the best eligible asset in each included round, scaled to 100. Every ranked model has the same included rounds.

Claude Opus 4.8 (Anthropic), 14/14 scored rounds: -12.1
Grok 4.3 (xAI), 14/14 scored rounds: -15.9
Claude Opus 4.7 (Anthropic), 14/14 scored rounds: -15.9
GPT-5.5 (OpenAI), 14/14 scored rounds: -25.8
Gemini 3.1 Pro (Google), 14/14 scored rounds: -30.6
S&P 500 (benchmark), 14/14 scored rounds: -8.5
Max possible (hindsight best-performing eligible asset in each round, not a model portfolio): 100.0
Return context

Average Return Details

Average portfolio return across the same finished rounds.

Return leader: Claude Opus 4.8, -0.96%
Claude Opus 4.8 (Anthropic): -0.96%
Grok 4.3 (xAI): -1.25%
Claude Opus 4.7 (Anthropic): -1.26%
GPT-5.5 (OpenAI): -2.03%
Gemini 3.1 Pro (Google): -2.41%
S&P 500: -0.67%
Max possible: 7.88%
Leader audit: Claude Opus 4.8 score -12.1 = -13.39% total return / 110.34% total oracle return × 100.
Rounds included: CB-2026-05-28-1W, CB-2026-05-29-1W, CB-2026-06-01-1W, CB-2026-06-02-1W, CB-2026-06-03-1W, CB-2026-06-05-1W, CB-2026-06-08-1W, CB-2026-06-09-1W, CB-2026-06-12-1W, CB-2026-06-13-1W, CB-2026-06-15-1W, CB-2026-06-16-1W, CB-2026-06-17-1W, CB-2026-06-18-1W
Fairness rule: every ranked model completed every included round. A missed round is excluded from this set for everyone.
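The leader audits suggest the score is a ratio of totals rather than an average of per-round scores: -13.39% / 14 rounds ≈ -0.96% and 110.34% / 14 ≈ 7.88%, which match the average return details, so "total return" appears to be the sum of per-round returns. The sketch below assumes that reading; the function name is hypothetical.

```python
def capitalbench_score(model_returns, oracle_returns):
    """Score over a set of included rounds.

    Both arguments list per-round returns in percent for the same rounds.
    The score is the summed model return as a percentage of the summed
    oracle (hindsight-best asset) return, so the oracle scores 100.
    """
    if len(model_returns) != len(oracle_returns):
        raise ValueError("model and oracle returns must cover the same rounds")
    return 100 * sum(model_returns) / sum(oracle_returns)


# Check against the published leader audits (totals entered as one-element lists).
weekly = capitalbench_score([-13.39], [110.34])  # Claude Opus 4.8, 14 weekly rounds
monthly = capitalbench_score([5.66], [49.86])    # Grok 4.3, 4 monthly rounds
print(round(weekly, 1), round(monthly, 1))

# The averages under Return context are the same totals divided by the round count.
print(round(-13.39 / 14, 2), round(110.34 / 14, 2))
```

One consequence of a ratio of sums: rounds with a large oracle return weigh more in the score than rounds where even the best asset moved little, unlike a plain average of per-round scores.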
Equal-run benchmark

Current Monthly Benchmark

Every ranked model in this set completed the same 4 monthly rounds.

4 shared resolved rounds · 4 equal-run models ranked · Qualified at 3+ shared rounds · Newest included round: CB-2026-05-28-1M
Shared resolved rounds

CapitalBench Score

Max possible is the best eligible asset in each included round, scaled to 100. Every ranked model has the same included rounds.

Grok 4.3 (xAI), 4/4 scored rounds: 11.4
Claude Opus 4.7 (Anthropic), 4/4 scored rounds: 3.3
Gemini 3.1 Pro (Google), 4/4 scored rounds: 0.4
GPT-5.5 (OpenAI), 4/4 scored rounds: -4.9
S&P 500 (benchmark), 4/4 scored rounds: -12.0
Max possible (hindsight best-performing eligible asset in each round, not a model portfolio): 100.0
Return context

Average Return Details

Average portfolio return across the same finished rounds.

Return leader: Grok 4.3, 1.41%
Grok 4.3 (xAI): 1.41%
Claude Opus 4.7 (Anthropic): 0.41%
Gemini 3.1 Pro (Google): 0.05%
GPT-5.5 (OpenAI): -0.61%
S&P 500: -1.49%
Max possible: 12.46%
Leader audit: Grok 4.3 score 11.4 = 5.66% total return / 49.86% total oracle return × 100.
Rounds included: CB-2026-05-10-1M, CB-2026-05-17-1M, CB-2026-05-24-1M, CB-2026-05-28-1M
Fairness rule: every ranked model completed every included round. A missed round is excluded from this set for everyone.
Fair model comparisons

Benchmark Comparison Sets

Sets are living groups: older sets keep adding shared rounds, and a set with a newer model roster becomes the current benchmark automatically once it has enough shared resolved rounds.

Monthly Set: Jun 9, 2026 (forming)
0 shared resolved rounds across 6 models. 3 more shared rounds to qualify.

Monthly Set: May 28, 2026 (forming)
1 shared resolved round across 5 models. 2 more shared rounds to qualify.

Monthly Set: May 10, 2026 (current benchmark)
4 shared resolved rounds across 4 models. Qualified at 3+ shared rounds.

Weekly Set: Jun 9, 2026 (forming)
3 shared resolved rounds across 6 models. 3 more shared rounds to qualify.

Weekly Set: May 28, 2026 (current benchmark)
14 shared resolved rounds across 5 models. Qualified at 6+ shared rounds.

Weekly Set: May 24, 2026 (qualified)
16 shared resolved rounds across 4 models. Qualified at 6+ shared rounds.
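The set listing is consistent with a simple rule: a set qualifies at 6 shared rounds (weekly) or 3 (monthly), and the newest qualified set on each track is the current benchmark. That rule is an inference from the listing, not a documented algorithm; the sketch below encodes it with hypothetical names and the six sets shown above.

```python
from dataclasses import dataclass
from datetime import date

# Qualification thresholds stated on this page.
MIN_SHARED_ROUNDS = {"weekly": 6, "monthly": 3}


@dataclass(frozen=True)
class ComparisonSet:
    track: str          # "weekly" or "monthly"
    set_date: date      # date in the set's name, e.g. "Weekly Set: May 28, 2026"
    models: int
    shared_rounds: int


def rounds_to_qualify(s):
    return max(0, MIN_SHARED_ROUNDS[s.track] - s.shared_rounds)


def current_set(sets, track):
    """Assumed rule: the newest qualified set on a track is the current benchmark."""
    qualified = [s for s in sets if s.track == track and rounds_to_qualify(s) == 0]
    return max(qualified, key=lambda s: s.set_date, default=None)


SETS = [
    ComparisonSet("monthly", date(2026, 6, 9), 6, 0),
    ComparisonSet("monthly", date(2026, 5, 28), 5, 1),
    ComparisonSet("monthly", date(2026, 5, 10), 4, 4),
    ComparisonSet("weekly", date(2026, 6, 9), 6, 3),
    ComparisonSet("weekly", date(2026, 5, 28), 5, 14),
    ComparisonSet("weekly", date(2026, 5, 24), 4, 16),
]

for track in ("weekly", "monthly"):
    print(track, current_set(SETS, track).set_date)
print([rounds_to_qualify(s) for s in SETS])
```

This selects the May 28 weekly set over the older May 24 set, and the May 10 monthly set because neither newer monthly set has qualified yet, matching the labels above.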
Latest scored round

Most Recent Published Result

This chart shows the newest completed round only. Live rounds stay out of this chart until ending prices are collected.

Monthly result

Monthly Portfolio Returns

Models, S&P 500, and maximum possible return are shown on one scale.

Scored
Grok 4.3 (xAI): 0.82%
Claude Opus 4.8 (Anthropic): 0.10%
GPT-5.5 (OpenAI): -0.18%
Gemini 3.1 Pro (Google): -0.70%
Claude Opus 4.7 (Anthropic): -1.02%
S&P 500 (benchmark): -3.15%
Max possible (XBI): 14.37%
Portfolio context

Shows each model's saved portfolio weights.

Model portfolios

Ranked in the same order as the chart.

1. Grok 4.3 (xAI): SMH 50%, XLK 30%, MTUM 20%
2. Claude Opus 4.8 (Anthropic): SMH 35%, MTUM 25%, XLK 20%, EWT 10%, IAU 10%
3. GPT-5.5 (OpenAI): SMH 40%, EWY 25%, EWT 15%, XLK 10%, MTUM 10%
4. Gemini 3.1 Pro (Google): SMH 40%, XLK 30%, EWY 15%, EWT 15%
5. Claude Opus 4.7 (Anthropic): SMH 35%, XLK 25%, ITA 15%, IAU 15%, MTUM 10%
Reference points

Not model portfolios.

S&P 500 Benchmark

Benchmark return over the same scoring window

Max possible (XBI)

100% Biotechnology (XBI) hindsight ceiling

Run details

CB-2026-05-28-1M

2026-05-28 to 2026-06-26

Winner: Grok 4.3 · Return: 0.82% · Models: 5 · Eligible assets: 65
Benchmark result paths

Choose The Result View

Latest pages show a single completed round. All-history pages provide context. Comparison sets give the fair ranking view.

All rounds
Weekly result
Latest Weekly

CB-2026-06-18-1W, 2026-06-18 to 2026-06-25

Scored
Weekly aggregate
Overall Weekly

All 16 available weekly rounds, shown for context. Fair rankings use comparison sets.

14 rounds
Monthly result
Latest Monthly

CB-2026-05-28-1M, 2026-05-28 to 2026-06-26

Scored
Monthly aggregate
Overall Monthly

All 4 available monthly rounds, shown for context. Fair rankings use comparison sets.

4 rounds
Live rounds

Waiting For Final Prices

These rounds are live or pending scoring. They are not yet counted in the completed-result charts.

Weekly live round

CB-2026-06-26-1W

Scored after the 2026-07-02 close.

2026-06-26 to 2026-07-02 · official-20260626-no-fable
Monthly live round

CB-2026-06-26-1M

Scored after the 2026-07-24 close.

2026-06-26 to 2026-07-24 · official-20260626-no-fable
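Round IDs on this page follow a visible pattern, CB-<start date>-1W for weekly rounds and CB-<start date>-1M for monthly rounds. The sketch below parses that inferred pattern and checks whether a round is still waiting for its final close, assuming a round becomes scorable once a closing price for its end date has been collected. The end date is passed in rather than derived from the ID, because the listed windows vary in length (2026-06-18 to 2026-06-25, but 2026-06-26 to 2026-07-02). All names are hypothetical.

```python
import re
from datetime import date

# Inferred ID pattern: CB-YYYY-MM-DD-1W (weekly) or CB-YYYY-MM-DD-1M (monthly).
ROUND_ID = re.compile(r"CB-(\d{4})-(\d{2})-(\d{2})-1([WM])")


def parse_round_id(round_id):
    """Return (start date, track) for an ID such as CB-2026-06-26-1W."""
    m = ROUND_ID.fullmatch(round_id)
    if m is None:
        raise ValueError(f"unrecognized round ID: {round_id!r}")
    year, month, day, horizon = m.groups()
    track = "weekly" if horizon == "W" else "monthly"
    return date(int(year), int(month), int(day)), track


def awaiting_final_prices(end_date, last_close_collected):
    """Assumed rule: a round stays live until a close for its end date is collected."""
    return last_close_collected < end_date


start, track = parse_round_id("CB-2026-06-26-1W")
print(start, track, awaiting_final_prices(date(2026, 7, 2), date(2026, 6, 30)))
```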
Audit and data

Every Result Has An Audit Packet

Results link back to prompts, model outputs, portfolio decisions, prices, audit hashes, and scoring records.
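The page does not say which hash function or record layout the audit hashes use. As an assumption, the sketch below shows one common approach: serialize each record as canonical JSON (sorted keys, fixed separators) and take its SHA-256 digest, so the same record always produces the same hash and any edit to it changes the hash. The record fields are hypothetical; the values come from CB-2026-05-28-1M.

```python
import hashlib
import json


def audit_hash(record):
    """SHA-256 hex digest of a record serialized as canonical JSON.

    Sorting keys and fixing separators makes the digest independent of
    dictionary insertion order, including inside nested dictionaries.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record layout.
record = {
    "round_id": "CB-2026-05-28-1M",
    "model": "Grok 4.3",
    "weights_pct": {"SMH": 50, "XLK": 30, "MTUM": 20},
    "return_pct": 0.82,
}
digest = audit_hash(record)
reordered = dict(reversed(list(record.items())))
print(digest == audit_hash(reordered))  # key order does not change the digest
```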