Model behavior patterns

How The AI Allocators Differ

A peer-normalized comparison of each model's typical allocation style across eligible official frozen portfolios, with current open positioning shown separately from historical behavior.

Most risk-seeking: GPT-5.5 (highest average risk-taking score)
Most concentrated: Gemini 3.1 Pro (highest concentration across saved portfolios)
Most defensive: Claude Opus 5 (largest defensive allocation)
Lowest turnover: Claude Opus 5 (smallest average portfolio turnover)
Closest to peers: Claude Opus 5 (highest average peer overlap)
Most distinctive: GPT-5.6 Sol (lowest average peer overlap)

Comparison matrix

Distinct Behavior By Model

Every label, sentence, and pill comes from the same deterministic evidence record. Realized investment results are deliberately excluded from allocation-style classification.


Model | Distinct behavior | Evidence | Common exposures

GPT-5.5 OpenAI Established pattern

High-risk allocator

Risk taking averaged 80.0/100, with a median 6.6 points above same-round peers; the difference had the same direction in 83% of 86 matched portfolios. Portfolios averaged 4.8 holdings, a 34.9% largest position, and 47.1% turnover.

Risk 80/100 · +7 pts vs peers | 4.8 holdings · 35% top | 47% turnover | Now: XLF 14%

Semiconductors (SMH) 19.9% avg | Biotechnology (XBI) 7.4% avg | Crude Oil (USO) 7.2% avg

Grok 4.3 xAI Moderate evidence

Concentrated allocator

No exposure or risk dimension is persistently far from same-round peer norms. Portfolios averaged 3.7 holdings, a 40.2% largest position, and 52.9% turnover.

Near peer mix | 3.7 holdings · 40% top | 53% turnover | Now: XLF 18%

Semiconductors (SMH) 13.4% avg | Energy Sector (XLE) 10.8% avg | Healthcare Sector (XLV) 9.7% avg

Gemini 3.1 Pro Google Moderate evidence

High-conviction concentrator

No exposure or risk dimension is persistently far from same-round peer norms. Portfolios averaged 3.4 holdings, a 39.9% largest position, and 59.9% turnover.

Near peer mix | 3.4 holdings · 40% top | 60% turnover | Now: XLV 21%

Semiconductors (SMH) 16.7% avg | Healthcare Sector (XLV) 10.9% avg | S&P 500 (SPY) 10.6% avg

Claude Opus 4.7 Anthropic Established pattern

Defensive consensus-aligned allocator

Defensive assets averaged 16.6%, with a median 15.0 percentage points above same-round peers; the difference had the same direction in 76% of 70 matched portfolios. Portfolios averaged 4.9 holdings, a 31.8% largest position, and 50.1% turnover.

Defensive 17% · +15pp vs peers | 4.9 holdings · 32% top | 50% turnover | Historical · retired

Semiconductors (SMH) 17.6% avg | Healthcare Sector (XLV) 11.2% avg | Financials Sector (XLF) 7.4% avg

GPT-5.6 Sol OpenAI Moderate evidence

High-risk distinctive allocator

Risk taking averaged 71.0/100, with a median 6.4 points above same-round peers; the difference had the same direction in 74% of 27 matched portfolios. Portfolios averaged 4.1 holdings, a 36.5% largest position, and 70.6% turnover.

Risk 71/100 · +6 pts vs peers | 4.1 holdings · 36% top | 71% turnover | Now: SMH 11%

Energy Sector (XLE) 13.7% avg | Financials Sector (XLF) 12.0% avg | Semiconductors (SMH) 10.7% avg

Grok 4.5 xAI Moderate evidence

Real-asset consensus-aligned allocator

Real assets averaged 38.2%, with a median 10.0 percentage points above same-round peers; the difference had the same direction in 71% of 31 matched portfolios. Portfolios averaged 4.5 holdings, a 31.3% largest position, and 55.3% turnover.

Real assets 38% · +10pp vs peers | 4.5 holdings · 31% top | 55% turnover | Now: XLE 19%

Energy Sector (XLE) 20.3% avg | Financials Sector (XLF) 12.9% avg | Crude Oil (USO) 12.1% avg

Claude Opus 4.8 Anthropic Established pattern

Risk-conscious steady allocator

Risk taking averaged 69.5/100, with a median 5.1 points below same-round peers; the difference had the same direction in 78% of 81 matched portfolios. Portfolios averaged 4.7 holdings, a 30.5% largest position, and 45.4% turnover.

Risk 69/100 · −5 pts vs peers | 4.7 holdings · 30% top | 45% turnover | Now: XLF 24%

Healthcare Sector (XLV) 14.1% avg | Financials Sector (XLF) 13.0% avg | Semiconductors (SMH) 11.1% avg

Claude Fable 5 Anthropic Moderate evidence

Diversified allocator

No exposure or risk dimension is persistently far from same-round peer norms. Portfolios averaged 4.9 holdings, a 28.0% largest position, and 55.7% turnover.

Near peer mix | 4.9 holdings · 28% top | 56% turnover | Now: IWD 17%

Energy Sector (XLE) 14.7% avg | Financials Sector (XLF) 12.5% avg | US Large-Cap Value (IWD) 12.5% avg

Claude Opus 5 Anthropic Provisional

Emerging allocation profile

Emerging pattern across 10 official portfolios. Portfolios averaged 4.8 holdings, a 32.5% largest position, and 40.6% turnover.

The 10 peer-matched portfolios span only 5 independent decision dates; stable labels require at least 8 matched portfolios across 6 independent decision dates.

Pattern still forming | 4.8 holdings · 33% top | 41% turnover | Now: SPY 20%

S&P 500 (SPY) 21.5% avg | Financials Sector (XLF) 19.5% avg | Healthcare Sector (XLV) 18.5% avg

Key numbers

Behavior Metrics In One Table

These are cumulative allocation-behavior measures across eligible official saved portfolios. Performance remains available on the weekly and monthly leaderboards, but does not determine these behavior labels.

Model	Risk	Holdings	Top holding	High risk	Defensive	Peer overlap	Turnover
GPT-5.5	80.0 / 100	4.83	34.9%	84.5%	4.1%	55.0%	47.1%
Grok 4.3	75.3 / 100	3.69	40.2%	70.5%	5.4%	55.0%	52.9%
Gemini 3.1 Pro	72.1 / 100	3.43	39.9%	65.3%	11.7%	48.6%	59.9%
Claude Opus 4.7	71.3 / 100	4.90	31.8%	63.6%	16.6%	60.6%	50.1%
GPT-5.6 Sol	71.0 / 100	4.07	36.5%	74.4%	7.4%	48.3%	70.6%
Grok 4.5	71.0 / 100	4.55	31.3%	78.9%	2.6%	56.5%	55.3%
Claude Opus 4.8	69.5 / 100	4.69	30.5%	51.5%	12.4%	55.8%	45.4%
Claude Fable 5	67.7 / 100	4.86	27.9%	53.5%	11.7%	55.4%	55.7%
Claude Opus 5	63.6 / 100	4.80	32.5%	30.5%	12.5%	57.5%	40.6%

Comparative findings

What Stands Out

Each finding is tied to model IDs and metric keys in the generated report.

GPT-5.5, Gemini 3.1 Pro

GPT-5.5 and Gemini 3.1 Pro stand out in different ways

GPT-5.5 stands out by risk appetite at 80.0 / 100, while Gemini 3.1 Pro stands out by portfolio structure with a 39.9% average largest holding.

risk taking score · average top allocation pct

Claude Opus 5

Claude Opus 5 looks more risk-managed than the aggressive cohort

Among active models, Claude Opus 5 has the highest defensive allocation at 12.5%. It also has the lowest measured turnover at 40.6%.

defensive pct · average turnover pct

Claude Opus 5

Claude Opus 5 is closest to the model crowd

Among active models, Claude Opus 5 has the highest average peer overlap at 57.5%. Its allocation weights have looked more like the rest of the roster than those of the most distinctive models.

peer similarity

Methodology

How Behavior Labels And Pills Are Determined

The report is rebuilt from eligible official frozen portfolios during every publication build. No model receives a manually assigned caption, and the model's own descriptive wording cannot assign its label.

For each model and round, CapitalBench subtracts the median behavior-metric value of the other models in that same round. A behavior signal must meet or exceed its published materiality floor, point in the same direction in at least 65% of matched portfolios, and have at least 8 matched portfolios across 6 independent decision dates.
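The qualification rule above can be sketched in Python. This is an illustration only, not CapitalBench's actual code: the function name `peer_signal`, the tuple layout of `observations`, and the sample values are assumptions, and persistence is read here as the share of matched portfolios whose peer difference has the same sign as the median difference.

```python
from statistics import median

def peer_signal(observations, floor,
                min_persistence=0.65, min_portfolios=8, min_dates=6):
    """Summarize one behavior metric for one model.

    observations: list of (decision_date, own_value, peer_values) tuples,
    one per matched portfolio. This record layout is hypothetical.
    """
    matched = [(d, own, peers) for d, own, peers in observations if peers]
    if not matched:
        return {"median_diff": None, "persistence": 0.0, "qualifies": False}

    # Leave-one-model-out baseline: own value minus the same-round peer median.
    diffs = [own - median(peers) for _, own, peers in matched]
    dates = {d for d, _, _ in matched}

    med = median(diffs)
    # Share of portfolios whose difference points the same way as the median.
    same_direction = sum(1 for x in diffs if x * med > 0)
    persistence = same_direction / len(diffs)

    qualifies = (
        abs(med) >= floor
        and persistence >= min_persistence
        and len(diffs) >= min_portfolios
        and len(dates) >= min_dates
    )
    return {"median_diff": med, "persistence": persistence, "qualifies": qualifies}

# Illustrative data: 8 rounds on 8 dates, model at 80 vs peers at 70/74/76.
strong = [(f"2025-01-{day:02d}", 80.0, [70.0, 74.0, 76.0]) for day in range(1, 9)]
result = peer_signal(strong, floor=4.0)   # risk-taking floor: 4 score points
thin = peer_signal(strong[:5], floor=4.0)  # only 5 portfolios and 5 dates
```

In this sketch `result` qualifies with a median difference of 6.0 points, while `thin` fails on sample size alone even though its difference is just as large.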

Qualifying signals are ordered by absolute median peer difference divided by their materiality floor, then by persistence and a stable metric key. The strongest exposure or risk signal supplies the label modifier; peer-normalized construction, turnover, or overlap supplies the allocation-style noun.
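A minimal sketch of that ordering follows. It assumes each qualifying signal is a dictionary with `metric_key`, `median_diff`, `floor`, and `persistence` fields (hypothetical names), that higher persistence sorts first, and that the metric key is compared alphabetically as the final tie-breaker. The sample values are invented for illustration.

```python
def rank_signals(signals):
    """Order signals by |median peer difference| / floor, then persistence, then key."""
    def sort_key(signal):
        strength = abs(signal["median_diff"]) / signal["floor"]
        # Negate so that stronger and more persistent signals sort first.
        return (-strength, -signal["persistence"], signal["metric_key"])
    return sorted(signals, key=sort_key)

# Hypothetical qualifying signals for a single model.
signals = [
    {"metric_key": "risk_taking_score", "median_diff": 6.6, "floor": 4.0, "persistence": 0.83},
    {"metric_key": "defensive_pct", "median_diff": -8.0, "floor": 4.0, "persistence": 0.70},
    {"metric_key": "international_pct", "median_diff": 8.0, "floor": 4.0, "persistence": 0.80},
]
ranked = [s["metric_key"] for s in rank_signals(signals)]
```

Here the two signals at twice their floor outrank the risk signal at 1.65 times its floor, and persistence breaks the tie between them, so `ranked[0]` would supply the label modifier.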

Evidence is “established” only after 16 decision dates and 75% persistence. Opposite material weekly and monthly signals are marked horizon-dependent; a sufficiently sampled reversal under the newest methodology is marked evolving. The four pills always report signature, construction, tempo, and current open positioning (or lifecycle for a retired model). “Typical” uses all eligible history; “Now” uses only currently open portfolios.
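The evidence tiers can be read as a small decision function. The sketch below is an inference from the thresholds stated on this page, not a published rule: the source defines only the "established" bar and the minimum sample for stable labels, so treating everything in between as "moderate" and anything under the minimum sample as "provisional" is an assumption, as are the function name and the decision-date counts in the examples.

```python
def evidence_tier(n_portfolios, n_dates, persistence):
    """Map sample size and persistence to an evidence tier (assumed mapping)."""
    # Below the minimum sample for a stable label (8 portfolios, 6 dates).
    if n_portfolios < 8 or n_dates < 6:
        return "provisional"
    # "Established" requires 16 decision dates and 75% persistence.
    if n_dates >= 16 and persistence >= 0.75:
        return "established"
    return "moderate"

# Portfolio counts and persistence follow the cards above;
# the decision-date counts are hypothetical.
tier_a = evidence_tier(n_portfolios=86, n_dates=40, persistence=0.83)
tier_b = evidence_tier(n_portfolios=27, n_dates=20, persistence=0.74)
tier_c = evidence_tier(n_portfolios=10, n_dates=5, persistence=0.0)
```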

Realized returns, ranks, ineligible or pilot runs, market-briefing prose, and free-form rationale wording are not classification inputs. Structured candidate-ledger, forecast, confidence, and key-risk fields are retained as decision-process context when coverage exists, but they do not override allocation evidence. Page-level “most” leader cards use active models only; retired profiles remain available as historical evidence.

Method version: capitalbench_behavior_evidence_v2
Peer baseline: leave-one-model-out same-round peer median
Wording provenance: deterministic_source_of_truth
Prompt contract: capitalbench_model_patterns_prompt_v2

Read the full CapitalBench benchmark methodology

Qualification thresholds

Published materiality floors

A median same-round peer difference must meet the relevant floor before persistence can qualify it.

Risk taking: ≥ 4 score points
Technology: ≥ 5 percentage points
Real assets: ≥ 5 percentage points
International assets: ≥ 4 percentage points
Defensive assets: ≥ 4 percentage points
Cash and duration: ≥ 4 percentage points
S&P 500 core: ≥ 5 percentage points
Largest holding: ≥ 5 percentage points
Holding count: ≥ 0.5 holdings

0-100 Risk-taking score

Average allocation-weighted risk appetite across all official saved portfolios. Higher means more growth, momentum, cyclical, and high-risk exposure.

count Avg holdings

Average number of non-zero assets in the model's official saved portfolios.

percentage_points Avg top holding

Average size of the largest single holding in each official saved portfolio.
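The two construction metrics can be computed as follows. This is a sketch under the assumption that each saved portfolio is a dictionary mapping a ticker to its weight in percent; the function name and the sample portfolios are illustrative.

```python
from statistics import mean

def construction_stats(portfolios):
    """Return (average holding count, average largest holding in percent)."""
    # A holding counts only if its weight is non-zero.
    counts = [sum(1 for w in p.values() if w > 0) for p in portfolios]
    tops = [max(p.values()) for p in portfolios]
    return mean(counts), mean(tops)

# Illustrative portfolios; weights are percentages that sum to 100.
portfolios = [
    {"SMH": 40.0, "XLF": 35.0, "XLV": 25.0, "USO": 0.0},              # 3 holdings, top 40%
    {"SPY": 30.0, "XLE": 30.0, "XLF": 20.0, "XLV": 10.0, "IWD": 10.0},  # 5 holdings, top 30%
]
avg_holdings, avg_top = construction_stats(portfolios)
```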

percentage_points High-risk allocation

Average allocation to assets rated as higher risk by the CapitalBench asset risk model.

percentage_points Defensive allocation

Average allocation to cash, bonds, defensive sectors, and other lower-risk ballast.

percentage_points Technology allocation

Average allocation to technology, semiconductors, Nasdaq-style growth, and AI-linked technology exposure.

percentage_points Cash/duration allocation

Average allocation to cash-like assets and duration-sensitive bond exposure.

percentage_points International allocation

Average allocation to non-U.S. country, regional, or international equity exposure.

percentage_points Real assets allocation

Average allocation to commodities, crypto, energy, gold, and other inflation-linked or real-asset groups.

percentage_points S&P 500 core allocation

Average allocation to the S&P 500 benchmark option across official saved portfolios.

0-1 Peer overlap

Average cosine similarity between this model's allocation weights and peer model portfolios in the same rounds.
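As an illustration of this metric, the sketch below computes cosine similarity between two weight vectors keyed by ticker and averages it over peers. The dictionary representation, the function names, and the choice to return 0.0 for an empty portfolio are assumptions; the source does not say how rounds are pooled before averaging.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two {ticker: weight} allocations."""
    # Tickers missing from one portfolio contribute a zero weight.
    dot = sum(w * b.get(ticker, 0.0) for ticker, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def peer_overlap(own, peers):
    """Average similarity between one portfolio and its same-round peers."""
    sims = [cosine_similarity(own, p) for p in peers]
    return sum(sims) / len(sims) if sims else 0.0

# Illustrative round: one shared position with the first peer, none with the second.
own = {"SMH": 50.0, "XLF": 50.0}
peers = [{"SMH": 50.0, "XLE": 50.0}, {"SPY": 100.0}]
overlap = peer_overlap(own, peers)  # similarities of about 0.5 and 0.0
```

Because all weights are non-negative, the result stays between 0 (no shared positions) and 1 (identical proportions), which matches the 0-1 scale given for this metric.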

percentage_points Avg turnover

Average one-half summed absolute allocation change between consecutive same-track portfolios.
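The turnover definition can be worked through in a few lines. This sketch assumes portfolios are `{ticker: weight in percent}` dictionaries listed in chronological order for a single track; the function names and sample data are illustrative.

```python
def turnover(previous, current):
    """One-half the summed absolute weight change between two portfolios."""
    tickers = set(previous) | set(current)
    change = sum(abs(current.get(t, 0.0) - previous.get(t, 0.0)) for t in tickers)
    # Halving avoids double counting: each sale funds a matching purchase.
    return 0.5 * change

def average_turnover(portfolios):
    """Mean turnover across consecutive portfolios of one track."""
    pairs = list(zip(portfolios, portfolios[1:]))
    if not pairs:
        return 0.0
    return sum(turnover(a, b) for a, b in pairs) / len(pairs)

# Illustrative track: swap XLF for XLE, then shift 20 points from XLE to SMH.
track = [
    {"SMH": 40.0, "XLF": 60.0},
    {"SMH": 40.0, "XLE": 60.0},
    {"SMH": 60.0, "XLE": 40.0},
]
first_step = turnover(track[0], track[1])   # 60.0 percentage points
second_step = turnover(track[1], track[2])  # 20.0 percentage points
avg = average_turnover(track)               # 40.0 percentage points
```

Under this definition, replacing an entire portfolio yields 100% turnover and leaving it unchanged yields 0%.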

rank Avg rank

Average finishing rank across resolved rounds. Lower is better.

points Avg CapitalBench Score

Average model score versus the hindsight-best eligible asset in each resolved round.