CapitalBench now publishes deterministic insights generated from benchmark positioning, results, risk appetite, and model behavior data.
The new insights pipeline builds canonical input packets, generates deterministic insight candidates, validates the public feed, and writes dated audit artifacts under insights/.
An optional NVIDIA NIM rewrite step now uses meta/llama-3.1-8b-instruct by default to improve wording without changing calculations, evidence, or benchmark facts.
LLM output is rejected if it references unknown candidates, introduces unsupported numbers, or uses investment-action language; deterministic insights publish when NVIDIA is unavailable.
The website now has a dedicated Insights page and the Data API exposes /v1/insights plus /v1/insights/{insight_id}.
The weekday GitHub Actions schedule is documented and ready to publish once workflow-file write scope is available; the first public artifact was generated manually with NVIDIA.
A stable benchmark-data fingerprint prevents unchanged holiday or stale-data runs from calling NVIDIA, rewriting latest insights, or deploying the site.
Deterministic calculations remain the source of truth.
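The rejection rules above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the phrase list, and the number-matching rule (a number is supported only if it appears verbatim in the deterministic candidate text) are all assumptions.

```python
import re

# Hypothetical phrase list; the pipeline's real wording rules are not documented here.
ACTION_PHRASES = ("buy", "sell", "you should", "we recommend")

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def validate_rewrite(candidate_id, rewritten_text, candidates):
    """Return rejection reasons; an empty list means the rewrite is accepted."""
    if candidate_id not in candidates:
        return ["unknown candidate"]
    reasons = []
    # Assumption: every number in the rewrite must already appear in the
    # deterministic text for the same candidate.
    allowed = set(NUMBER_RE.findall(candidates[candidate_id]))
    extra = [n for n in NUMBER_RE.findall(rewritten_text) if n not in allowed]
    if extra:
        reasons.append("unsupported numbers: " + ", ".join(extra))
    lowered = rewritten_text.lower()
    if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in ACTION_PHRASES):
        reasons.append("investment-action language")
    return reasons


def publish(candidates, rewrites):
    """Use an accepted rewrite where one exists; otherwise keep the deterministic text."""
    published = {}
    for cid, text in candidates.items():
        rewrite = rewrites.get(cid)
        accepted = rewrite is not None and not validate_rewrite(cid, rewrite, candidates)
        published[cid] = rewrite if accepted else text
    return published
```

Calling `publish(candidates, {})` models the case where NVIDIA is unavailable: every insight keeps its deterministic wording. Rewrites keyed by an unknown candidate ID are never published, because only known candidates are iterated.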
CapitalBench now publishes fair weekly and monthly benchmark sets that compare only models sharing the same resolved rounds.
The homepage and leaderboard pages now surface current weekly and monthly benchmark sets scored against the same oracle-relative CapitalBench Score scale.
Dedicated benchmark-set pages show fixed rosters, shared included rounds, fairness exclusions, and score calculations for each set.
Model profile pages now show the benchmark sets each model belongs to, with filters and focused score charts for weekly and monthly comparisons.
New official run rosters automatically open a new set only when no set that has already started on that track contains the models in the run.
Set coverage is time-aware, so future manual metadata cannot hide an earlier new-model cohort.
Temporary model outages do not create smaller sets; those rounds remain covered by the larger roster and are excluded from that set until every required model has an official result.
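The shared-round rule can be illustrated with a short sketch. It assumes each model's resolved rounds are available as a set of round IDs; the function names and the model and round IDs in the usage note are hypothetical.

```python
def shared_rounds(roster, resolved):
    """Rounds in which every model on the roster has an official result."""
    round_sets = [set(resolved.get(model, ())) for model in roster]
    return set.intersection(*round_sets) if round_sets else set()


def excluded_rounds(roster, resolved):
    """Rounds resolved by some roster models but not all (fairness exclusions)."""
    round_sets = [set(resolved.get(model, ())) for model in roster]
    seen = set().union(*round_sets)
    return seen - shared_rounds(roster, resolved)
```

For example, with `resolved = {"model-a": {"R1", "R2", "R3"}, "model-b": {"R2", "R3"}}`, the two-model set is scored on `R2` and `R3` only, and `R1` is listed as an exclusion until `model-b` has an official result for it.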
Claude Fable 5 entered its first official weekly and monthly CapitalBench tests in the June 9, 2026 cohort.
The model is configured as anthropic-claude-fable-5 using the Anthropic API model ID claude-fable-5.
Adaptive-thinking effort is pinned to low, Anthropic's lowest supported setting for Claude Fable 5.
Both official June 9 runs produced valid submissions with the same frozen briefing, Universe v2.1 options, and June 9 adjusted-close entry snapshot used by the other five models.
Older tests are not backfilled, so Fable's scorecards will show a shorter history until these rounds resolve.
2026-06-09
Methodology updated
CapitalBench Score aligned directly with the oracle
Resolved weekly scoring files were refreshed onto one adjusted-close basis, and future automated resolution now fetches both start and end prices before final scoring.
Final automated resolution now refreshes both entry and exit adjusted closes together after the scoring window closes, avoiding stale entry snapshots when ETF adjusted histories update for distributions.
The three resolved weekly rounds were regenerated from same-source adjusted-close price files; Gemini's latest weekly CapitalBench Score remains 85.1, while OpenAI's latest weekly score moved from 53.6 to 53.5 after the price-basis correction.
The external price audit now verifies all resolved weekly rounds against adjusted-close data, including local model returns, maximum possible returns, and CapitalBench Scores.
2026-06-05
Data updated
Selected-asset return field corrected for portfolio tests
Portfolio-format weekly result files now separate the primary selected asset's return from the weighted portfolio return.
The affected CapitalBench scores, portfolio returns, S&P 500 comparisons, alpha values, regret values, and rankings did not change.
The selected_asset_return field now reports the realized return of selected_option_id, while portfolio_return reports the allocation-weighted portfolio result.
A raw CSV score audit command now recomputes every resolved score from leaderboard, return, and allocation files and rejects this field mixup.
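The difference between the two fields can be shown with a small sketch. The field names `selected_option_id`, `selected_asset_return`, and `portfolio_return` come from the entry above; the function names, the input shapes (weights as fractions, returns in percent), and the tolerance are assumptions and do not describe the actual audit command.

```python
import math


def result_fields(selected_option_id, allocations, asset_returns):
    """Compute both return fields from allocation weights and per-asset returns."""
    # Allocation-weighted result across every holding in the portfolio.
    portfolio_return = sum(w * asset_returns[a] for a, w in allocations.items())
    return {
        "selected_option_id": selected_option_id,
        # Realized return of the primary selected asset only.
        "selected_asset_return": asset_returns[selected_option_id],
        "portfolio_return": portfolio_return,
    }


def audit_row(row, allocations, asset_returns, tol=1e-9):
    """Return the names of fields that do not match a fresh recomputation."""
    expected = result_fields(row["selected_option_id"], allocations, asset_returns)
    return [
        field
        for field in ("selected_asset_return", "portfolio_return")
        if not math.isclose(row[field], expected[field], abs_tol=tol)
    ]
```

A row that stores the weighted portfolio result in `selected_asset_return` is flagged on that field, while `portfolio_return` still passes. This matches the note that scores and rankings were unaffected by the correction.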
CapitalBench now uses the scoring-window end date as the first rule for choosing the latest resolved test across local reports, synced tables, and public pages.
Latest leaderboard publishing and Supabase sync now share the same exit-date-first ordering used by the website and Data API.
Regression tests cover overlapping rounds where a later decision date and a later scoring end date point to different tests.
This prevents hydrated latest-result tables from drifting away from the static latest-result page when schedules overlap.
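A minimal sketch of exit-date-first ordering is shown below. The entry only states that the scoring-window end date is the first rule; using the decision date as a tie-breaker, and the dictionary keys themselves, are assumptions for illustration.

```python
from datetime import date


def latest_resolved(rounds):
    """Pick the latest resolved round, ordering by exit date first."""
    resolved = [r for r in rounds if r["resolved"]]
    # Assumed tie-breaker: decision date, applied only when exit dates are equal.
    return max(
        resolved,
        key=lambda r: (r["exit_date"], r["decision_date"]),
        default=None,
    )


# Hypothetical overlapping rounds: the weekly round was decided later,
# but the monthly round's scoring window ends later.
rounds = [
    {"id": "monthly", "decision_date": date(2026, 5, 1),
     "exit_date": date(2026, 6, 1), "resolved": True},
    {"id": "weekly", "decision_date": date(2026, 5, 24),
     "exit_date": date(2026, 5, 29), "resolved": True},
]
```

Here `latest_resolved(rounds)` returns the monthly round, whereas sorting by decision date alone would have picked the weekly one.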
CapitalBench expanded automated checks for displayed benchmark data, documentation examples, sitemap metadata, and API contract behavior.
Rendered-page validation now checks homepage, leaderboards, round pages, model pages, universe data, methodology/scoring claims, and API documentation examples against generated source data.
The OpenAPI spec now lists only the working production API host and documents only implemented parameters for leaderboard and positioning-change endpoints.
The Data API now rejects mixed weekly/monthly leaderboard requests and unknown model or asset identifiers instead of returning misleading empty success responses.
The website build and API tests now verify that documented API endpoints are actually served by the runtime handler with current model, asset, and round IDs.
Open weekly and monthly tests can now show interim portfolio returns from the latest available close while remaining separate from official final scores.
The homepage now includes a Live Portfolio Returns chart for open tests, with filters for all live, weekly, and monthly tracks.
Model pages now show each model's current live return before final scoring, using only unresolved rounds and excluding completed results.
The Data API exposes GET /v1/live/performance, GET /v1/rounds/{round_id}/live-performance, and GET /v1/models/{model_id}/live-performance for interim mark-to-market data.
The scheduled interim refresh now updates all active tracks instead of only active monthly tests.
2026-06-03
Methodology published
CapitalBench Score documented as the primary benchmark score
Overall weekly and monthly results now explain CapitalBench Score as the primary benchmark score against the maximum possible return in each completed window.
Scoring documentation now separates raw portfolio return, S&P 500 return, Portfolio Minus S&P 500, regret, and CapitalBench Score.
Overall weekly and monthly pages now lead with the CapitalBench Score chart, with average portfolio return and S&P 500 return kept as supporting context.
The Data API read model and OpenAPI schema include max_possible_return_pct and capitalbench_score for resolved result rows and cumulative leaderboards.
Active monthly round charts now refresh from reusable daily full-universe price snapshots instead of manual per-round updates.
A new update-interim-performance command creates or reuses one daily price snapshot and applies it to every active monthly round whose timeline includes that close date.
The scheduled GitHub Actions refresh runs after U.S. market close, commits changed interim artifacts, and deploys the website only when refreshed data changes.
The website deploy workflow now watches universe configs, round artifacts, latest snapshots, and cumulative data so data-only updates can publish without an app-code edit.
Existing full-universe entry and exit price packages can also serve as reusable snapshots, reducing Tiingo calls while keeping each round's frozen entry prices unchanged.
Per-round Supabase sync failures are reported as warnings so one stale or mismatched round does not block other active monthly charts from updating.
CapitalBench now reports how concentrated each round is across the model portfolios submitted for that round.
Round pages and the latest scored-test view now show run-level consensus allocation, including the largest shared asset, top-three asset share, and effective asset count.
Completed rounds remain available for concentration review, while active exposure still excludes completed rounds from live positioning.
The Data API now exposes GET /v1/rounds/{round_id}/concentration for round-level concentration summaries, asset weights, category mix, and model-level holders.
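One plausible way to derive these summaries is sketched below. The definitions are assumptions, since the entry does not state them: consensus allocation is taken as the equal-weighted average of the model portfolios, and effective asset count as the inverse Herfindahl index of that average. The function and key names are hypothetical.

```python
from collections import defaultdict


def concentration(portfolios):
    """Summarize concentration across model portfolios (weights as fractions)."""
    if not portfolios:
        return None
    consensus = defaultdict(float)
    for weights in portfolios.values():
        for asset, w in weights.items():
            # Assumption: each model counts equally toward the consensus.
            consensus[asset] += w / len(portfolios)
    ranked = sorted(consensus.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "largest_shared_asset": ranked[0][0],
        "top_three_share": sum(w for _, w in ranked[:3]),
        # Assumption: inverse Herfindahl index of the consensus weights.
        "effective_asset_count": 1.0 / sum(w * w for _, w in ranked),
    }
```

With two models, one fully in asset `A` and one split evenly between `A` and `B`, the consensus weights are 0.75 and 0.25, so the effective asset count is 1 / (0.5625 + 0.0625) = 1.6.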
CapitalBench now has a protected read-only API for published model portfolios, active positioning, cumulative allocation behavior, results, assets, and proof metadata.
The web build now generates a static API read model from the same public round files used by the website, so API data refreshes when round artifacts are deployed.
Versioned v1 endpoints expose active and cumulative positioning, model holdings, asset holders, rounds, portfolios, latest and cumulative leaderboards, current universe data, and model style metrics.
API requests require bearer keys backed by Cloudflare D1, with per-minute and daily fixed-window rate limits.
A local API-key CLI can generate one-time keys and insert them into the production D1 database when Cloudflare credentials are available.
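Fixed-window limiting with a per-minute and a daily window can be sketched as follows. This is an in-memory illustration only: the production API keeps key state in Cloudflare D1, and the class name, constructor parameters, and injectable clock here are assumptions.

```python
import time


class FixedWindowLimiter:
    """Count requests per key in fixed 60-second and 86400-second windows."""

    def __init__(self, per_minute, per_day, clock=time.time):
        self.limits = ((60, per_minute), (86400, per_day))
        self.counts = {}
        self.clock = clock

    def allow(self, key):
        now = int(self.clock())
        # A window is identified by its size and its index since the epoch.
        buckets = [(key, size, now // size) for size, _ in self.limits]
        for bucket, (_, limit) in zip(buckets, self.limits):
            if self.counts.get(bucket, 0) >= limit:
                return False  # rejected requests are not counted
        for bucket in buckets:
            self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True
```

A fixed window resets abruptly at each boundary, so a client can briefly send up to twice the per-minute limit across two adjacent windows. The sketch also never evicts old buckets, which a persistent store would need to handle.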
CapitalBench Universe v2.1 adds five future-round ETF options to broaden the choice set while leaving completed rounds frozen.
The new future-round options are Broad AI Technology (AIQ), Autonomous Technology and Robotics (ARKQ), Cybersecurity (CIBR), Solar Energy (TAN), and Metals and Mining (XME).
Universe v2.1 keeps all 65 v2.0 options unchanged and adds the new ETFs as neutral exposure options, not as recommendations or performance-ranked choices.
New rounds initialized with capitalbench init-round now default to configs/universes/capitalbench_universe_v2_1.yaml and record universe_version: v2.1 unless an older or custom universe is explicitly passed.
Existing v1.5 and v2.0 round directories, manifests, option files, hashes, and public results remain unchanged.
CapitalBench now supports one-week rounds as a separate track with separate website lanes and leaderboard slots.
The first weekly packet, CB-2026-05-24-1W, uses its own manifest, prompt, model input, hashes, run folder, entry prices, and resolution job while reusing May 24 source research only as input material.
Latest and cumulative public read models now use separate weekly and monthly slots so one-week and one-month scores cannot overwrite or mix with each other.
The homepage, leaderboard hub, round index, and round pages now label weekly and monthly tracks separately.
The landing page now presents weekly and monthly as equal track lanes with separate status cards, allocation previews, leaderboard links, timelines, and audit packet links.
Weekly prompts now make the close-to-close timeline explicit, including the May 22 entry close, Memorial Day market holiday, Tuesday-to-Friday regular-session window, and May 29 exit close.
Default monthly prompt generation and generated model-input metadata now reinforce close-to-close scoring and timeline-focused reasoning without using negative one-month wording in weekly rounds.
Future CapitalBench prompts now make the one-month scoring window explicit before models choose allocations.
Newly initialized portfolio prompts instruct models to optimize for the close-to-close one-month scoring window from entry adjusted close to exit adjusted close.
Single-pick prompt defaults received the same clarification so older and newer submission formats remain conceptually aligned.
Generated model inputs now include scoring-window, close-to-close scoring, and timeline-focus metadata derived from each round manifest's entry date, exit date, and horizon.
CapitalBench now has an expanded 65-option universe for future rounds while preserving the original 40-option universe for completed rounds.
Universe v2.0 keeps every v1.5 option and adds 25 Tiingo-validated exposures across equal-weight US equity, biotechnology, regional banks, aerospace and defense, country equity, bonds, commodities, currencies, and crypto ETF proxies.
Round manifests can now carry a universe_version value so the website and Supabase read model can show which option file was frozen for each round.
The public universe page now shows version history and renders the latest approved option table without changing any completed round inputs.
CapitalBench now supports a versioned portfolio submission protocol for future rounds while preserving single-pick compatibility for completed rounds.
Future portfolio rounds can require 1 to 5 holdings, 5% allocation increments, and exactly 100% total allocation through frozen round manifest constraints.
CLI validation, mock submissions, scoring, reports, Supabase sync, and website tables now understand portfolio allocations and holding-level audit rows.
Completed single-pick rounds remain labeled and scored under their original methodology; portfolio rounds are reported separately by submission format.
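The portfolio constraints described above (1 to 5 holdings, 5% increments, exactly 100% total) can be checked with a sketch like the following. It assumes allocations arrive as a mapping from option ID to an integer percentage; the function name and defaults are illustrative, and real rounds would read the constraints from the frozen round manifest.

```python
def validate_portfolio(allocations, min_holdings=1, max_holdings=5,
                       increment=5, total=100):
    """Return a list of constraint violations; an empty list means valid."""
    errors = []
    if not min_holdings <= len(allocations) <= max_holdings:
        errors.append(f"expected {min_holdings} to {max_holdings} holdings, "
                      f"got {len(allocations)}")
    for option_id, pct in allocations.items():
        # Each weight must be a positive whole-number multiple of the increment.
        if not isinstance(pct, int) or pct <= 0 or pct % increment != 0:
            errors.append(f"{option_id}: {pct!r} is not a positive multiple "
                          f"of {increment}")
    allocated = sum(p for p in allocations.values() if isinstance(p, int))
    if allocated != total:
        errors.append(f"allocations sum to {allocated}, expected {total}")
    return errors
```

For instance, `validate_portfolio({"AIQ": 60, "TAN": 40})` returns an empty list, while `{"AIQ": 52, "TAN": 48}` fails the increment rule for both holdings even though the total is 100.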