Results

Benchmark Results

AI model portfolios are scored in separate weekly and monthly tracks using the same rules, frozen portfolios, and real public-market prices.


Benchmark status

20 completed 23 live

Weekly and monthly tracks are scored separately.

Current weekly benchmark leader

Claude Opus 4.8 -12.1 CapitalBench Score

Latest scored: CB-2026-05-28-1M · Latest live: CB-2026-06-26-1W / CB-2026-06-26-1M · Models: 6 · Universe: 70 options



Results insights

How To Read The Latest Benchmark Results

Signals generated from scored rounds, oracle comparisons, benchmark difficulty, and model confidence behavior.


Consensus Performance · May 28-Jun 26

Monthly result · CB-2026-05-28-1M · 5 models · Oracle: Biotechnology (XBI), +14.37% · Resolved result

AI consensus portfolio scored -1.4 versus the oracle

If the monthly model allocations were averaged into one consensus portfolio, it returned -0.20% versus -3.15% for the S&P 500 and +14.37% for the hindsight best asset.

Consensus means the average of model allocations in the same round. CapitalBench Score compares that return with the hindsight-best eligible asset for that exact scoring window.

High confidence · Math: deterministic · Data through Jun 26, 2026

Consensus Portfolio Return: -0.20%
Average Model Return: -0.20%
Consensus CapitalBench Score: -1.4
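The consensus figures above can be reproduced from numbers published on this page. The sketch below assumes that a single-round score is the portfolio return divided by the oracle return, times 100, which is consistent with the published -1.4 but is not confirmed here as the exact scoring code. The function name `oracle_relative_score` is hypothetical.

```python
def oracle_relative_score(portfolio_return_pct: float, oracle_return_pct: float) -> float:
    """Score a return against the hindsight-best asset; 100 means matching it.

    Assumption: score = portfolio return / oracle return * 100.
    """
    if oracle_return_pct == 0:
        raise ValueError("oracle return is zero; score is undefined")
    return portfolio_return_pct / oracle_return_pct * 100


# Per-model returns for CB-2026-05-28-1M, as listed under the latest scored round.
model_returns_pct = {
    "Grok 4.3": 0.82,
    "Claude Opus 4.8": 0.10,
    "GPT-5.5": -0.18,
    "Gemini 3.1 Pro": -0.70,
    "Claude Opus 4.7": -1.02,
}

average_model_return = sum(model_returns_pct.values()) / len(model_returns_pct)
consensus_score = oracle_relative_score(-0.20, 14.37)

print(round(average_model_return, 2))  # -0.2
print(round(consensus_score, 1))       # -1.4
```

The average model return and the consensus portfolio return agree because a portfolio's return is linear in its weights, so averaging allocations and averaging returns give the same result.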

Benchmark Difficulty · May 28-Jun 26

Monthly result · CB-2026-05-28-1M · 5 models · Oracle: Biotechnology (XBI), +14.37% · Resolved result

Monthly round had 36.43 percentage points of asset dispersion

The best scored asset returned +14.37%, the worst returned -22.06%, and 49.23% of the universe was positive. The S&P 500 ranked 49 out of 65 options.

Asset dispersion is the gap between the best and worst eligible assets in the same round. Wider dispersion makes missed allocation choices more costly.

High confidence · Math: deterministic · Data through Jun 26, 2026

Oracle Return: +14.4%
Worst Asset Return: -22.1%
Positive Universe Share: 49.2%
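As a rough illustration of the difficulty metrics above, the following sketch computes asset dispersion and the positive share from a list of asset returns. Only the best (+14.37%) and worst (-22.06%) returns come from this page. The two middle values are made up for the example, and the function name `round_difficulty` is hypothetical.

```python
def round_difficulty(asset_returns_pct):
    """Summarize how spread out a round's eligible-asset returns were."""
    if not asset_returns_pct:
        raise ValueError("need at least one asset return")
    best = max(asset_returns_pct)
    worst = min(asset_returns_pct)
    positive = sum(1 for r in asset_returns_pct if r > 0)
    return {
        "oracle_return": best,
        "worst_return": worst,
        "dispersion": best - worst,  # gap in percentage points
        "positive_share": 100 * positive / len(asset_returns_pct),
    }


# Published extremes plus two illustrative (made-up) middle values.
example = round_difficulty([14.37, 2.0, -1.5, -22.06])
print(round(example["dispersion"], 2))  # 36.43
print(example["positive_share"])        # 50.0
```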

Oracle Comparison · May 28-Jun 26

Monthly result · CB-2026-05-28-1M · 5 models · Oracle: Biotechnology (XBI), +14.37% · Resolved result

Models missed the monthly oracle asset

The hindsight best asset was Biotechnology (XBI) at +14.37%. 0 of 5 models held it, so the average allocation was 0.00%.

Oracle means the best eligible asset in hindsight for that round. Models do not know it when portfolios are frozen.

High confidence · Math: deterministic · Data through Jun 26, 2026

Oracle Asset Holder Count: 0
Average Oracle Asset Allocation: 0.00%

Weekly track

Claude Opus 4.8 Leads

CapitalBench Score leader inside the featured equal-run comparison set.

14 scored

Claude Opus 4.8 Anthropic

CapitalBench Score: -12.1 · Avg return leader: -0.96% · Shared rounds: 14 · Timeline: One market week


Monthly track

Grok 4.3 Leads

CapitalBench Score leader inside the featured equal-run comparison set.

4 scored

Grok 4.3 xAI

CapitalBench Score: 11.4 · Avg return leader: 1.41% · Shared rounds: 4 · Timeline: One market month


Completed results

Current Benchmark Scores

These are equal-run comparison sets. Every ranked model completed every included round, and missed rounds are excluded from the set for everyone.


Equal-run benchmark

Current Weekly Benchmark

Every ranked model in this set completed the same 14 weekly rounds.

14 shared resolved rounds · 5 equal-run models ranked · Qualified at 6+ shared rounds · Newest included round: CB-2026-06-18-1W

CapitalBench Score across shared resolved rounds

Max possible = best eligible asset in each included round. Every ranked model has the same included rounds.

Claude Opus 4.8 (Anthropic) · 14/14 scored rounds · -12.1
Grok 4.3 (xAI) · 14/14 scored rounds · -15.9
Claude Opus 4.7 (Anthropic) · 14/14 scored rounds · -15.9
GPT-5.5 (OpenAI) · 14/14 scored rounds · -25.8
Gemini 3.1 Pro (Google) · 14/14 scored rounds · -30.6
S&P 500 · 14/14 scored rounds · -8.5
Max possible (hindsight best-performing eligible asset in each round, not a model portfolio) · 100.0

Return context

Average Return Details

Average portfolio return across the same finished rounds.

Return leader Claude Opus 4.8 -0.96%

Claude Opus 4.8 · -0.96%
Grok 4.3 · -1.25%
Claude Opus 4.7 · -1.26%
GPT-5.5 · -2.03%
Gemini 3.1 Pro · -2.41%
S&P 500 · -0.67%
Max possible · 7.88%

Leader audit: Claude Opus 4.8 score -12.1 = -13.39% total return / 110.34% oracle return × 100.
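The leader audit arithmetic can be checked with a short sketch. It assumes the totals are simple sums of per-round returns, which this page does not state explicitly. Per-round values are not listed here, so the published totals are passed in as single-element lists. The function name `aggregate_score` is hypothetical.

```python
def aggregate_score(portfolio_returns_pct, oracle_returns_pct):
    """Oracle-relative score over a set of rounds.

    Assumption: totals are plain sums of per-round percentage returns.
    A score of 100 would mean matching the hindsight-best asset every round.
    """
    if len(portfolio_returns_pct) != len(oracle_returns_pct):
        raise ValueError("each round needs one portfolio return and one oracle return")
    total_return = sum(portfolio_returns_pct)
    total_oracle = sum(oracle_returns_pct)
    if total_oracle == 0:
        raise ValueError("oracle total is zero; score is undefined")
    return total_return / total_oracle * 100


# Totals taken from the weekly and monthly leader audits on this page.
weekly_leader = aggregate_score([-13.39], [110.34])
monthly_leader = aggregate_score([5.66], [49.86])
print(round(weekly_leader, 1), round(monthly_leader, 1))  # -12.1 11.4
```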

Rounds included: CB-2026-05-28-1W, CB-2026-05-29-1W, CB-2026-06-01-1W, CB-2026-06-02-1W, CB-2026-06-03-1W, CB-2026-06-05-1W, CB-2026-06-08-1W, CB-2026-06-09-1W, CB-2026-06-12-1W, CB-2026-06-13-1W, CB-2026-06-15-1W, CB-2026-06-16-1W, CB-2026-06-17-1W, CB-2026-06-18-1W

Fairness rule: every ranked model completed every included round. A missed round is excluded from this set for everyone.
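One way to read the fairness rule is as a set intersection: a round counts only if every ranked model completed it. The sketch below is a minimal illustration under that reading. The model names, round labels, and the function name `equal_run_rounds` are hypothetical, not taken from the benchmark's code or data.

```python
def equal_run_rounds(completed_by_model: dict[str, set[str]]) -> list[str]:
    """Return the rounds that every model in the roster completed."""
    if not completed_by_model:
        return []
    shared = set.intersection(*completed_by_model.values())
    return sorted(shared)


# Hypothetical roster: model_b missed round "R2".
completed = {
    "model_a": {"R1", "R2", "R3"},
    "model_b": {"R1", "R3"},
    "model_c": {"R1", "R2", "R3"},
}

print(equal_run_rounds(completed))  # ['R1', 'R3']
```

Because "R2" is dropped for all three models, each model is scored on the same rounds, which is what keeps the ranking comparable.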

Equal-run benchmark

Current Monthly Benchmark

Every ranked model in this set completed the same 4 monthly rounds.

4 shared resolved rounds · 4 equal-run models ranked · Qualified at 3+ shared rounds · Newest included round: CB-2026-05-28-1M

CapitalBench Score across shared resolved rounds

Max possible = best eligible asset in each included round. Every ranked model has the same included rounds.

Grok 4.3 (xAI) · 4/4 scored rounds · 11.4
Claude Opus 4.7 (Anthropic) · 4/4 scored rounds · 3.3
Gemini 3.1 Pro (Google) · 4/4 scored rounds · 0.4
GPT-5.5 (OpenAI) · 4/4 scored rounds · -4.9
S&P 500 · 4/4 scored rounds · -12.0
Max possible (hindsight best-performing eligible asset in each round, not a model portfolio) · 100.0

Return context

Average Return Details

Average portfolio return across the same finished rounds.

Return leader Grok 4.3 1.41%

Grok 4.3 · 1.41%
Claude Opus 4.7 · 0.41%
Gemini 3.1 Pro · 0.05%
GPT-5.5 · -0.61%
S&P 500 · -1.49%
Max possible · 12.46%

Leader audit: Grok 4.3 score 11.4 = 5.66% total return / 49.86% oracle return × 100.

Rounds included: CB-2026-05-10-1M, CB-2026-05-17-1M, CB-2026-05-24-1M, CB-2026-05-28-1M

Fairness rule: every ranked model completed every included round. A missed round is excluded from this set for everyone.

Evidence context

Evidence Behind The Current Benchmarks

Generated from completed rounds, benchmark-set rules, protocols, and available baselines so every score carries its own context.


Weekly benchmark: More established

Evidence level: More established. Weekly evidence has enough completed rounds for stronger pattern reads, while still needing ongoing live validation.

Weekly evidence: 16 resolved rounds / 81 model results. Current threshold met at 6+ rounds.

Equal-run comparison: 5 models on the same 14 rounds. Ranked models are compared only on rounds every model in the roster completed.

Protocol: Portfolio-only. Completed rounds use constrained multi-asset portfolios.

Score scale: Oracle-relative. 100 means matching the hindsight best asset in the same scored window.

Baselines shown: S&P 500, Cash, Oracle, AI consensus portfolio. Practical references are shown beside the impossible hindsight ceiling when available.

Use this as benchmark evidence, not an investable strategy result. More resolved rounds are needed before making strong performance claims.

Monthly benchmark: Qualified but still forming

Evidence level: Qualified but still forming. Monthly evidence has crossed the current benchmark threshold, but the sample is still early for strong performance claims.

Monthly evidence: 4 resolved rounds / 17 model results. Current threshold met at 3+ rounds.

Equal-run comparison: 4 models on the same 4 rounds. Ranked models are compared only on rounds every model in the roster completed.

Protocol: Mixed protocol. Completed history includes 3 portfolio, 1 single-pick, and 0 unlabelled rounds.

Score scale: Oracle-relative. 100 means matching the hindsight best asset in the same scored window.

Baselines shown: S&P 500, Cash, Oracle, AI consensus portfolio. Practical references are shown beside the impossible hindsight ceiling when available.

Use this as benchmark evidence, not an investable strategy result. More resolved rounds are needed before making strong performance claims.

Fair model comparisons

Benchmark Comparison Sets

Sets are living groups. Older sets keep adding shared rounds, while newer model rosters become current automatically after enough shared results.


Monthly Set: Jun 9, 2026 · Forming set · 0 shared resolved rounds across 6 models · 3 more shared rounds to qualify

Monthly Set: May 28, 2026 · Forming set · 1 shared resolved round across 5 models · 2 more shared rounds to qualify

Monthly Set: May 10, 2026 · Current benchmark · 4 shared resolved rounds across 4 models · Qualified at 3+ shared rounds

Weekly Set: Jun 9, 2026 · Forming set · 3 shared resolved rounds across 6 models · 3 more shared rounds to qualify

Weekly Set: May 28, 2026 · Current benchmark · 14 shared resolved rounds across 5 models · Qualified at 6+ shared rounds

Weekly Set: May 24, 2026 · Qualified set · 16 shared resolved rounds across 4 models · Qualified at 6+ shared rounds
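The qualification counts for the sets above follow from the thresholds stated on this page (6+ shared rounds for weekly sets, 3+ for monthly sets). The sketch below reproduces the "more shared rounds to qualify" figures. The names `THRESHOLDS` and `rounds_to_qualify` are hypothetical.

```python
# Thresholds as stated on this page.
THRESHOLDS = {"weekly": 6, "monthly": 3}


def rounds_to_qualify(track: str, shared_rounds: int) -> int:
    """Shared rounds still needed before a set qualifies (0 means qualified)."""
    if shared_rounds < 0:
        raise ValueError("shared_rounds cannot be negative")
    return max(0, THRESHOLDS[track] - shared_rounds)


print(rounds_to_qualify("monthly", 0))  # 3
print(rounds_to_qualify("monthly", 1))  # 2
print(rounds_to_qualify("monthly", 4))  # 0
print(rounds_to_qualify("weekly", 3))   # 3
print(rounds_to_qualify("weekly", 14))  # 0
```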

Latest scored round

Most Recent Published Result

This chart shows the newest completed round only. Live rounds stay out of this chart until ending prices are collected.

Monthly result

Monthly Portfolio Returns

Models, S&P 500, and maximum possible return are shown on one scale.

Status: Scored

Model portfolio returns, with the S&P 500 benchmark and the maximum possible return for reference:

Grok 4.3 (xAI) · 0.82%
Claude Opus 4.8 (Anthropic) · 0.10%
GPT-5.5 (OpenAI) · -0.18%
Gemini 3.1 Pro (Google) · -0.70%
Claude Opus 4.7 (Anthropic) · -1.02%
S&P 500 (benchmark) · -3.15%
Max possible (XBI) · 14.37%

Portfolio context

Shows each model's saved portfolio weights.

Model portfolios

Ranked in the same order as the chart.

Grok 4.3 xAI

SMH 50% XLK 30% MTUM 20%

Claude Opus 4.8 Anthropic

SMH 35% MTUM 25% XLK 20% EWT 10% IAU 10%

GPT-5.5 OpenAI

SMH 40% EWY 25% EWT 15% XLK 10% MTUM 10%

Gemini 3.1 Pro Google

SMH 40% XLK 30% EWY 15% EWT 15%

Claude Opus 4.7 Anthropic

SMH 35% XLK 25% ITA 15% IAU 15% MTUM 10%
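The saved weights above are enough to sketch the consensus portfolio for this round, taking "consensus" to mean an equal-weight average of the model allocations. Equal weighting across models is an assumption of this sketch, and the function name `consensus_weights` is hypothetical.

```python
from collections import defaultdict

# Saved portfolio weights (percent) for CB-2026-05-28-1M, as listed above.
portfolios = {
    "Grok 4.3": {"SMH": 50, "XLK": 30, "MTUM": 20},
    "Claude Opus 4.8": {"SMH": 35, "MTUM": 25, "XLK": 20, "EWT": 10, "IAU": 10},
    "GPT-5.5": {"SMH": 40, "EWY": 25, "EWT": 15, "XLK": 10, "MTUM": 10},
    "Gemini 3.1 Pro": {"SMH": 40, "XLK": 30, "EWY": 15, "EWT": 15},
    "Claude Opus 4.7": {"SMH": 35, "XLK": 25, "ITA": 15, "IAU": 15, "MTUM": 10},
}


def consensus_weights(portfolios):
    """Average each ticker's weight across models (a missing ticker counts as 0)."""
    totals = defaultdict(float)
    for model, weights in portfolios.items():
        if abs(sum(weights.values()) - 100) > 1e-9:
            raise ValueError(f"{model} weights do not sum to 100%")
        for ticker, weight in weights.items():
            totals[ticker] += weight
    return {ticker: total / len(portfolios) for ticker, total in totals.items()}


consensus = consensus_weights(portfolios)
print(consensus["SMH"], consensus["XLK"], consensus["MTUM"])  # 40.0 23.0 13.0
print(consensus.get("XBI", 0.0))  # 0.0, since no model held the oracle asset
```

Under these assumptions the consensus is 40% SMH, 23% XLK, 13% MTUM, 8% EWT, 8% EWY, 5% IAU, and 3% ITA, with no XBI exposure.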

Reference points

Not model portfolios.

S&P 500 Benchmark

Benchmark return over the same scoring window

Max possible XBI

100% Biotechnology (XBI) hindsight ceiling

Run details

CB-2026-05-28-1M

2026-05-28 to 2026-06-26


Winner: Grok 4.3 · Return: 0.82% · Models: 5 · Eligible assets: 65

Benchmark result paths

Choose The Result View

Latest pages show one completed round. All-history pages are context. Comparison sets are the fair ranking view.


Latest Weekly (weekly result) · CB-2026-06-18-1W, 2026-06-18 to 2026-06-25 · Scored

Overall Weekly (weekly aggregate) · 16 all-available weekly rounds for context. Fair rankings use comparison sets. · 14 rounds

Latest Monthly (monthly result) · CB-2026-05-28-1M, 2026-05-28 to 2026-06-26 · Scored

Overall Monthly (monthly aggregate) · 4 all-available monthly rounds for context. Fair rankings use comparison sets. · 4 rounds

Live rounds

Waiting For Final Prices

These rounds are live or pending score. They are not counted in completed result charts yet.

Weekly live round

CB-2026-06-26-1W

Scores after the 2026-07-02 close.

2026-06-26 to 2026-07-02 official-20260626-no-fable


Monthly live round

CB-2026-06-26-1M

Scores after the 2026-07-24 close.

2026-06-26 to 2026-07-24 official-20260626-no-fable


Audit and data

Every Result Has An Audit Packet

Results link back to prompts, model outputs, portfolio decisions, prices, audit hashes, and scoring records.

Round index: 43 public rounds with dates, prices, and audit packets.

Scoring rules: How model portfolios are compared with market returns.

API docs: Published data endpoints for results, rounds, models, and exposures.

GitHub repository: Source files, generated reports, and public benchmark artifacts.