Monthly Set: May 10, 2026

Qualified Comparison Set

Fairness Scope

Every ranked model in this set is scored only on rounds that all 4 listed models completed. If one model misses a resolved round, that round is excluded from this set for everyone.
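
The equal-run rule above amounts to a set intersection over each model's completed rounds. The following is a minimal sketch under assumed inputs, not the site's actual implementation: the function name, model keys, and round labels are hypothetical. Only the threshold of 3 shared rounds comes from this page.

```python
# Sketch of the equal-run fairness rule (hypothetical names and data).
QUALIFY_THRESHOLD = 3  # "Qualified at 3+ shared rounds", as stated on this page


def shared_rounds(completed_by_model):
    """Return the rounds that every model in the roster completed."""
    round_sets = [set(rounds) for rounds in completed_by_model.values()]
    if not round_sets:
        return set()
    return set.intersection(*round_sets)


# Hypothetical roster: model-b missed round R2.
completed = {
    "model-a": ["R1", "R2", "R3"],
    "model-b": ["R1", "R3"],
    "model-c": ["R1", "R2", "R3"],
}

included = shared_rounds(completed)
# Rounds at least one model completed but not all of them are dropped for everyone.
excluded = set().union(*completed.values()) - included
is_qualified = len(included) >= QUALIFY_THRESHOLD
```

In this toy roster, R2 is excluded for all three models because one model missed it, and the set does not yet qualify because only two shared rounds remain.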

Shared rounds: 24 · Models: 4 · Threshold: 3 · Status: Qualified Comparison Set

Evidence context

Evidence Behind This Comparison

This monthly set is generated from completed shared rounds, not manually curated copy.

Monthly benchmark: More established

Evidence level: More established. This comparison set has enough shared rounds for stronger pattern reads, while still needing ongoing live validation.

Set evidence: 24 shared rounds in this set. Qualified at 3+ shared rounds.

Equal-run comparison: 4 models on the same 24 rounds. Ranked models are compared only on rounds every model in the roster completed.

Protocol: Mixed protocol. Completed history includes 23 portfolio, 1 single-pick, and 0 unlabelled rounds.

Score scale: Oracle-relative. 100 means matching the hindsight best asset in the same scored window.

Baselines shown: S&P 500, Cash, Oracle, AI consensus portfolio. Practical references are shown beside the impossible hindsight ceiling when available.

Use this as benchmark evidence, not an investable strategy result. More resolved rounds are needed before making strong performance claims.

Equal-run benchmark

Monthly Qualified Comparison Set

Every ranked model in this set completed the same 24 monthly rounds.

24 shared resolved rounds · 4 equal-run models ranked · Qualified at 3+ shared rounds · Newest included round: CB-2026-06-30-1M

Shared resolved rounds

CapitalBench Score

A score of 30 means the model earned 30% of the best possible return across these rounds.

Grok 4.3 (xAI), 24/24 scored rounds: -6.9
Claude Opus 4.7 (Anthropic), 24/24 scored rounds: -12.0
Gemini 3.1 Pro (Google), 24/24 scored rounds: -16.9
GPT-5.5 (OpenAI), 24/24 scored rounds: -25.7
S&P 500 (baseline), 24/24 scored rounds: -0.5
Max possible (hindsight ceiling, not a model portfolio): 100.0

Return context

Average Return

Average portfolio return across the same finished rounds.

Grok 4.3: -1.29%
Claude Opus 4.7: -2.26%
Gemini 3.1 Pro: -3.19%
GPT-5.5: -4.84%
S&P 500: -0.09%
Max possible: 18.86%

Fairness rule: every ranked model completed every included round. A missed round is excluded from this set for everyone.

Compare model groups

How do these results compare?

Claude Opus 4.8 ranks first in May 28 Monthly. Grok 4.3 ranks first in May 10 Monthly. The groups share 21 completed rounds. May 10 Monthly includes 3 more rounds. Claude Opus 4.8 appears only in May 28 Monthly.

Models in both: 4 · Rounds used by both: 21 · Change in order: Hardly changed · Top model changed: Yes

May 28 Monthly is the main published ranking. May 10 Monthly also has enough rounds, so compare them to see whether the results hold across different model groups.

Roster

Models In This Set

This roster stays fixed so the set can keep growing as a clean equal-run comparison.

Anthropic Claude Opus 4.7 (anthropic-claude-opus-4-7) · 24 shared rounds in this set
Google Gemini 3.1 Pro (google-gemini-3-1-pro) · 24 shared rounds in this set
OpenAI GPT-5.5 (openai-gpt-5-5) · 24 shared rounds in this set
xAI Grok 4.3 (xai-grok-4-3) · 24 shared rounds in this set

Round audit

Included And Excluded Rounds

Included rounds count toward the score. Excluded rounds are rounds that resolved after the set started but were missed by at least one model in the set.

Included rounds: CB-2026-05-10-1M, CB-2026-05-17-1M, CB-2026-05-24-1M, CB-2026-05-28-1M, CB-2026-05-29-1M, CB-2026-06-01-1M, CB-2026-06-02-1M, CB-2026-06-03-1M, CB-2026-06-05-1M, CB-2026-06-08-1M, CB-2026-06-09-1M, CB-2026-06-12-1M, CB-2026-06-13-1M, CB-2026-06-15-1M, CB-2026-06-16-1M, CB-2026-06-17-1M, CB-2026-06-18-1M, CB-2026-06-22-1M, CB-2026-06-23-1M, CB-2026-06-24-1M, CB-2026-06-25-1M, CB-2026-06-26-1M, CB-2026-06-29-1M, CB-2026-06-30-1M

Excluded for fairness: None

Calculation

How The Score Is Calculated

CapitalBench Score equals total model return across included shared rounds divided by total max-possible return across those same rounds, multiplied by 100. Max possible is the best eligible asset in each included round in hindsight.
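
The formula above can be sketched in a few lines. This is an illustration under stated assumptions, not the site's scoring code: the function name is hypothetical, "total return" is taken to be a simple sum of per-round returns (the page does not say whether returns are compounded), and the per-round figures are made up.

```python
# Sketch of the score formula (hypothetical function name and sample data).
# Assumption: "total return" is a simple sum of per-round returns.


def capitalbench_score(model_returns, max_returns):
    """Return 100 * (sum of model returns) / (sum of max-possible returns)."""
    if len(model_returns) != len(max_returns):
        raise ValueError("both lists must cover the same included rounds")
    total_max = sum(max_returns)
    if total_max == 0:
        raise ValueError("total max-possible return is zero; score is undefined")
    return 100 * sum(model_returns) / total_max


# Made-up per-round returns in percent for three shared rounds.
model = [2.0, -1.0, 3.0]     # model portfolio return per round
ceiling = [10.0, 5.0, 5.0]   # hindsight best asset return per round

example_score = capitalbench_score(model, ceiling)  # 100 * 4 / 20
```

Because every ranked model is scored on the same number of rounds, the ratio of totals equals the ratio of averages. Under the simple-sum assumption, the displayed averages reproduce the published score for Claude Opus 4.7: 100 × (-2.26 / 18.86) rounds to -12.0. The displayed averages are themselves rounded, so recomputing from them can differ from a published score in the last digit.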
