Weekly Set: May 28, 2026

Qualified Comparison Set

Fairness Scope

Every ranked model in this set is scored only on rounds that all 5 listed models completed. If one model misses a resolved round, that round is excluded from this set for everyone.
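
The fairness rule above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the function names, the dictionary layout, and the toy round IDs are all assumptions made for the example. Only the rule itself (a round counts only if every roster model completed it) and the 6-round qualification threshold come from this page.

```python
from typing import Dict, List, Set

# Threshold taken from this page ("Qualified at 6+ shared rounds").
QUALIFY_THRESHOLD = 6


def shared_rounds(completed: Dict[str, Set[str]], resolved: List[str]) -> List[str]:
    """Return the resolved rounds that every roster model completed.

    `completed` maps a model id to the set of round ids it finished
    (a hypothetical data layout chosen for this sketch).
    """
    return [r for r in resolved if all(r in done for done in completed.values())]


def is_qualified(shared: List[str], threshold: int = QUALIFY_THRESHOLD) -> bool:
    """A set qualifies once it has at least `threshold` shared rounds."""
    return len(shared) >= threshold


# Toy data: model-b missed round R3, so R3 is dropped for both models.
toy_completed = {
    "model-a": {"R1", "R2", "R3"},
    "model-b": {"R1", "R2"},
}
toy_shared = shared_rounds(toy_completed, ["R1", "R2", "R3"])
```

With the toy data, `toy_shared` keeps only the rounds both models finished, and the set would not yet qualify because it has fewer than 6 shared rounds.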

Shared rounds: 33 · Models: 5 · Threshold: 6 · Status: Qualified Comparison Set

Evidence context

Evidence Behind This Comparison

This weekly set is generated from completed shared rounds, not manually curated copy.

Weekly benchmark: More established

Evidence level: More established. This comparison set has enough shared rounds for stronger pattern reads, while still needing ongoing live validation.

Set evidence: 33 shared rounds in this set. Qualified at 6+ shared rounds.

Equal-run comparison: 5 models on the same 33 rounds. Ranked models are compared only on rounds every model in the roster completed.

Protocol: Portfolio-only. Completed rounds use constrained multi-asset portfolios.

Score scale: Oracle-relative. 100 means matching the hindsight best asset in the same scored window.

Baselines shown: S&P 500, Cash, Oracle, AI consensus portfolio. Practical references are shown beside the impossible hindsight ceiling when available.

Use this as benchmark evidence, not an investable strategy result. More resolved rounds are needed before making strong performance claims.

Equal-run benchmark

Weekly Qualified Comparison Set

Every ranked model in this set completed the same 33 weekly rounds.

33 shared resolved rounds · 5 equal-run models ranked · Qualified at 6+ shared rounds · Newest included round: CB-2026-07-20-1W

Shared resolved rounds

CapitalBench Score

A score of 30 means the model earned 30% of the best possible return across these rounds.

Grok 4.3 (xAI) · 33/33 scored rounds · Score: -1.5

Claude Opus 4.8 (Anthropic) · 33/33 scored rounds · Score: -4.5

Claude Opus 4.7 (Anthropic) · 33/33 scored rounds · Score: -8.2

Gemini 3.1 Pro (Google) · 33/33 scored rounds · Score: -10.4

GPT-5.5 (OpenAI) · 33/33 scored rounds · Score: -10.6

S&P 500 (baseline) · 33/33 scored rounds · Score: -1.3

Max possible (hindsight ceiling, not a model portfolio) · Score: 100.0

Return context

Average Return Details

Average portfolio return across the same finished rounds.

Grok 4.3: -0.13%

Claude Opus 4.8: -0.40%

Claude Opus 4.7: -0.72%

Gemini 3.1 Pro: -0.91%

GPT-5.5: -0.93%

S&P 500: -0.12%

Max possible: 8.75%

Excluded for fairness: CB-2026-07-21-1W missing anthropic-claude-opus-4-7; CB-2026-07-22-1W missing anthropic-claude-opus-4-7; CB-2026-07-23-1W missing anthropic-claude-opus-4-7.

Fairness rule: every ranked model completed every included round. A missed round is excluded from this set for everyone.

Compare model groups

How do these results compare?

Grok 4.3 ranks first in both groups. The groups share 6 completed rounds. May 28 Weekly includes 27 more rounds. Grok 4.5, Claude Fable 5, and GPT-5.6 Sol appear only in Jul 10 Weekly.

5 models in both · 6 rounds used by both · Change in order: Changed a lot · No top model changed

Jul 10 Weekly is the main published ranking. May 28 Weekly also has enough rounds, so compare them to see whether the results hold across different model groups.

Roster

Models In This Set

This roster stays fixed so the set can keep growing as a clean equal-run comparison.

Anthropic Claude Opus 4.7 (anthropic-claude-opus-4-7) · 33 shared rounds in this set

Anthropic Claude Opus 4.8 (anthropic-claude-opus-4-8) · 33 shared rounds in this set

Google Gemini 3.1 Pro (google-gemini-3-1-pro) · 33 shared rounds in this set

OpenAI GPT-5.5 (openai-gpt-5-5) · 33 shared rounds in this set

xAI Grok 4.3 (xai-grok-4-3) · 33 shared rounds in this set

Round audit

Included And Excluded Rounds

Included rounds count toward the score. Excluded rounds are resolved rounds after the set started where at least one set model was missing.

Included rounds: CB-2026-05-28-1W, CB-2026-05-29-1W, CB-2026-06-01-1W, CB-2026-06-02-1W, CB-2026-06-03-1W, CB-2026-06-05-1W, CB-2026-06-08-1W, CB-2026-06-09-1W, CB-2026-06-12-1W, CB-2026-06-13-1W, CB-2026-06-15-1W, CB-2026-06-16-1W, CB-2026-06-17-1W, CB-2026-06-18-1W, CB-2026-06-22-1W, CB-2026-06-23-1W, CB-2026-06-24-1W, CB-2026-06-25-1W, CB-2026-06-26-1W, CB-2026-06-29-1W, CB-2026-06-30-1W, CB-2026-07-01-1W, CB-2026-07-02-1W, CB-2026-07-06-1W, CB-2026-07-07-1W, CB-2026-07-08-1W, CB-2026-07-09-1W, CB-2026-07-10-1W, CB-2026-07-13-1W, CB-2026-07-14-1W, CB-2026-07-15-1W, CB-2026-07-17-1W, CB-2026-07-20-1W

Excluded for fairness

3 resolved candidate rounds

CB-2026-07-21-1W missing anthropic-claude-opus-4-7; CB-2026-07-22-1W missing anthropic-claude-opus-4-7; CB-2026-07-23-1W missing anthropic-claude-opus-4-7
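
An audit line like the one above could be produced by recording, for each resolved round, which roster models are missing. The sketch below is illustrative only: `audit_rounds`, `format_excluded`, and the input layout are hypothetical names chosen for this example, and the two-model roster is a cut-down toy rather than the full set.

```python
from typing import Dict, List, Set, Tuple


def audit_rounds(
    completed: Dict[str, Set[str]], resolved: List[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split resolved rounds into included rounds and excluded rounds.

    Excluded rounds map to the sorted list of models that missed them.
    """
    included: List[str] = []
    excluded: Dict[str, List[str]] = {}
    for round_id in resolved:
        missing = sorted(m for m, done in completed.items() if round_id not in done)
        if missing:
            excluded[round_id] = missing
        else:
            included.append(round_id)
    return included, excluded


def format_excluded(excluded: Dict[str, List[str]]) -> str:
    """Render exclusions as 'ROUND missing model[, model]; ...'."""
    return "; ".join(
        f"{round_id} missing {', '.join(models)}" for round_id, models in excluded.items()
    )


# Toy roster of two models; only the second round is missing a model.
toy_completed = {
    "anthropic-claude-opus-4-7": {"CB-2026-07-20-1W"},
    "xai-grok-4-3": {"CB-2026-07-20-1W", "CB-2026-07-21-1W"},
}
included, excluded = audit_rounds(
    toy_completed, ["CB-2026-07-20-1W", "CB-2026-07-21-1W"]
)
audit_line = format_excluded(excluded)
```

Keeping the missing-model list per round, rather than a bare excluded flag, is what makes the exclusion auditable: a reader can see which model caused each round to drop out.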

Calculation

How The Score Is Calculated

CapitalBench Score equals total model return across included shared rounds divided by total max-possible return across those same rounds, multiplied by 100. Max possible is the best eligible asset in each included round in hindsight.
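
As a rough sketch of the formula above: the function below divides the total model return by the total max-possible return over the same rounds and multiplies by 100. It assumes per-round returns are simply added (the page says "total" without specifying arithmetic or compounded aggregation), and the function name and toy numbers are hypothetical.

```python
from typing import Sequence


def capitalbench_score(
    model_returns: Sequence[float], max_returns: Sequence[float]
) -> float:
    """Score = 100 * sum(model returns) / sum(max-possible returns).

    Both sequences must cover the same included rounds, in the same order.
    Assumption: totals are plain sums of per-round returns.
    """
    if len(model_returns) != len(max_returns):
        raise ValueError("model and max-possible returns must cover the same rounds")
    total_max = sum(max_returns)
    if total_max == 0:
        raise ValueError("total max-possible return is zero; score is undefined")
    return 100.0 * sum(model_returns) / total_max


# Toy example with three rounds (made-up numbers, as fractions):
# total model return 0.02, total max-possible return 0.25.
toy_score = capitalbench_score([0.02, -0.01, 0.01], [0.08, 0.10, 0.07])
```

With the toy numbers the score comes out at roughly 8. Because every model is scored on the same rounds, the ratio of totals equals the ratio of averages, so the displayed averages give a quick consistency check: -0.13% / 8.75% × 100 ≈ -1.5, which matches the score shown for Grok 4.3 up to rounding of the displayed values.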
