Cadence

Event-Based

Updates are posted for material benchmark, data, methodology, security, or operational changes.

Latest Entry

2026-06-16

Insights engine v1 added

Entry Format

Impact First

Each entry states what changed, why it matters, what was touched, and where to inspect the result.

Release Notes

Published Updates

Newest entries appear first. Links and anchors are stable unless an entry is corrected.

  1. Research published

    Insights engine v1 added

    CapitalBench now publishes deterministic insights generated from benchmark positioning, results, risk appetite, and model behavior data.

    • The new insights pipeline builds canonical input packets, generates deterministic insight candidates, validates the public feed, and writes dated audit artifacts under insights/.
    • An optional NVIDIA NIM rewrite step now uses meta/llama-3.1-8b-instruct by default to improve wording without changing calculations, evidence, or benchmark facts.
    • LLM output is rejected if it references unknown candidates, introduces unsupported numbers, or uses investment-action language; deterministic insights publish when NVIDIA is unavailable.
    • The website now has a dedicated Insights page and the Data API exposes /v1/insights plus /v1/insights/{insight_id}.
    • The weekday GitHub Actions schedule is documented and ready to publish once workflow-file write scope is available; the first public artifact was generated manually with NVIDIA.
    • A stable benchmark-data fingerprint prevents unchanged holiday or stale-data runs from calling NVIDIA, rewriting latest insights, or deploying the site.
    • Deterministic calculations remain the source of truth.
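
The rejection rules for the optional rewrite step can be sketched roughly as follows. This is a minimal illustration, not the published pipeline: the function names, the `ACTION_PHRASES` list, and the number-matching rule are all assumptions made for the example.

```python
import re

# Hypothetical phrase list; the vocabulary of the real filter is not stated in this entry.
ACTION_PHRASES = ("buy", "sell", "should invest", "recommend")

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def validate_rewrite(candidate_id, rewritten_text, candidates):
    """Return (ok, reason). `candidates` maps candidate id -> deterministic text."""
    if candidate_id not in candidates:
        return False, "unknown candidate"
    # Assumption: a number is "supported" only if it appears verbatim in the source text.
    allowed_numbers = set(NUMBER_RE.findall(candidates[candidate_id]))
    for number in NUMBER_RE.findall(rewritten_text):
        if number not in allowed_numbers:
            return False, f"unsupported number: {number}"
    lowered = rewritten_text.lower()
    for phrase in ACTION_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            return False, f"investment-action language: {phrase}"
    return True, "ok"


def publish_text(candidate_id, rewritten_text, candidates):
    """Use the rewrite only when it validates; otherwise keep the deterministic text."""
    original = candidates.get(candidate_id)
    if original is None or rewritten_text is None:
        return original
    ok, _ = validate_rewrite(candidate_id, rewritten_text, candidates)
    return rewritten_text if ok else original
```

Passing `rewritten_text=None` models the case where the rewrite service is unavailable, so the deterministic text is published unchanged.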
  2. Benchmark published

    Equal-run benchmark sets added

    CapitalBench now publishes fair weekly and monthly benchmark sets that compare only models sharing the same resolved rounds.

    • The homepage and leaderboard pages now surface current weekly and monthly benchmark sets scored against the same oracle-relative CapitalBench Score scale.
    • Dedicated benchmark-set pages show fixed rosters, shared included rounds, fairness exclusions, and score calculations for each set.
    • Model profile pages now show the benchmark sets each model belongs to, with filters and focused score charts for weekly and monthly comparisons.
    • A new official run roster automatically opens a set only when no set already started for that track contains the models in the run.
    • Set coverage is time-aware, so future manual metadata cannot hide an earlier new-model cohort.
    • Temporary model outages do not create smaller sets; those rounds remain covered by the larger roster and are excluded from that set until every required model has an official result.
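
One way to read the set-opening rule is as a subset check against existing rosters. The sketch below assumes that "contains the models in the run" means the existing roster is a superset of the run roster; the function name, the dictionary layout, and the model names are hypothetical.

```python
def should_open_set(run_models, track, started_sets):
    """Return True when no already-started set on `track` covers every model in the run.

    `started_sets` is a list of dicts with "track" and "models" keys (assumed layout).
    """
    roster = frozenset(run_models)
    for existing in started_sets:
        if existing["track"] == track and roster <= frozenset(existing["models"]):
            return False
    return True
```

Under this reading, a run that is missing one model because of a temporary outage is a subset of the larger roster, so it does not open a smaller set.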
  3. Benchmark published

    Claude Fable 5 joins CapitalBench

    Claude Fable 5 entered its first official weekly and monthly CapitalBench tests in the June 9, 2026 cohort.

    • The model is configured as anthropic-claude-fable-5 using the Anthropic API model ID claude-fable-5.
    • Adaptive-thinking effort is pinned to low, Anthropic's lowest supported setting for Claude Fable 5.
    • Both official June 9 runs produced valid submissions with the same frozen briefing, Universe v2.1 options, and June 9 adjusted-close entry snapshot used by the other five models.
    • Older tests are not backfilled, so Fable's scorecards will show a shorter history until these rounds resolve.
  4. Methodology updated

    CapitalBench Score aligned directly with the oracle

    CapitalBench Score now compares model return directly with the hindsight maximum, including the magnitude of losses.

    • A model that matches the best asset scores 100, a model with no net return scores 0, and negative scores represent losses relative to the oracle.
    • Full-history scores divide summed model returns by summed oracle returns, so a small loss ranks above a large loss.
    • Oracle-return weighting prevents a low-opportunity test with a small denominator from dominating the full history.
    • Tests are not compounded because overlapping rounds are separate benchmark experiments rather than one sequential portfolio.
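
The scoring rules above can be illustrated with a short sketch. It assumes returns are expressed in percent and that the oracle return is positive; the function names and the error handling for a non-positive oracle are assumptions, not published behavior.

```python
def capitalbench_score(model_return, oracle_return):
    """Single-test score: model return as a percentage of the hindsight maximum."""
    if oracle_return <= 0:
        # Assumption: a non-positive oracle return is not covered by this sketch.
        raise ValueError("oracle return must be positive")
    return 100.0 * model_return / oracle_return


def full_history_score(tests):
    """Full-history score: summed model returns over summed oracle returns.

    `tests` is a list of (model_return, oracle_return) pairs. Returns are
    summed, not compounded, because overlapping tests are separate experiments.
    """
    total_model = sum(model for model, _ in tests)
    total_oracle = sum(oracle for _, oracle in tests)
    return capitalbench_score(total_model, total_oracle)
```

For example, with tests `(2.0, 10.0)` and `(-1.0, 2.0)` the per-test scores are 20 and -50, but the pooled score is 100 × 1.0 / 12.0 ≈ 8.3. The low-opportunity second test, with its small denominator, does not dominate the history as it would in a simple average of per-test scores.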
  5. Methodology published

    Live and historical AI Risk Appetite published

    CapitalBench now publishes current and historical allocation-based risk signals from weekly and monthly model portfolios.

    • Weekly tactical and monthly strategic readings are calculated separately and then equal-weighted into a 0-100 combined pulse.
    • The landing page shows the current pulse, model agreement, allocation drivers, regime mix, and the risk level of all unresolved portfolios.
    • A dedicated methodology page adds historical views for the decision pulse, model agreement, regime mix, and risk level of the live portfolio book.
    • The methodology page publishes the formula and the complete versioned asset-risk table.
    • The existing historical model risk profiles retain their 1-5 display and now use the same shared asset definitions as the live signal.
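
The equal-weighting step can be sketched as below. The sketch assumes each track reading is already expressed on a 0-100 scale; how the weekly and monthly readings are derived from allocations is defined on the methodology page and is not reproduced here. The function name is hypothetical.

```python
def combined_pulse(weekly_reading, monthly_reading):
    """Equal-weight the weekly tactical and monthly strategic readings.

    Assumption: both inputs are already on a 0-100 scale.
    """
    for reading in (weekly_reading, monthly_reading):
        if not 0 <= reading <= 100:
            raise ValueError("readings must be between 0 and 100")
    return (weekly_reading + monthly_reading) / 2
```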
  6. Operations updated

    Weekly interim price refresh corrected

    The scheduled interim-performance job now refreshes both weekly and monthly open tests from each eligible daily close.

    • The production workflow now invokes update-interim-performance with the all-track setting instead of limiting scheduled updates to monthly tests.
    • Weekly live-return rows and the homepage priced-test count will update automatically after an eligible post-entry close is fetched and deployed.
    • A regression test now verifies that the checked-in production workflow continues to refresh all tracks.
  7. Data updated

    Final scoring prices now refresh both adjusted-close endpoints

    Resolved weekly scoring files were refreshed onto one adjusted-close basis, and future automated resolution now fetches both start and end prices before final scoring.

    • Final automated resolution now refreshes both entry and exit adjusted closes together after the scoring window closes, avoiding stale entry snapshots when ETF adjusted histories update for distributions.
    • The three resolved weekly rounds were regenerated from same-source adjusted-close price files; Gemini's latest weekly CapitalBench Score remains 85.1, while OpenAI's latest weekly score moved from 53.6 to 53.5 after the price-basis correction.
    • The external price audit now verifies all resolved weekly rounds against adjusted-close data, including local model returns, maximum possible returns, and CapitalBench Scores.
  8. Data updated

    Selected-asset return field corrected for portfolio tests

    Portfolio-format weekly result files now separate the primary selected asset's return from the weighted portfolio return.

    • The affected CapitalBench scores, portfolio returns, S&P 500 comparisons, alpha values, regret values, and rankings did not change.
    • The selected_asset_return field now reports the realized return of selected_option_id, while portfolio_return reports the allocation-weighted portfolio result.
    • A raw CSV score audit command now recomputes every resolved score from leaderboard, return, and allocation files and rejects this field mixup.
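
The difference between the two fields can be shown with a small sketch. The field names selected_option_id, selected_asset_return, and portfolio_return come from the entry above; the function names, the input layout, and the option IDs in the test data are illustrative assumptions.

```python
def weighted_portfolio_return(allocations, asset_returns):
    """Allocation-weighted return.

    `allocations` maps option id -> weight in percent (assumed to sum to 100);
    `asset_returns` maps option id -> realized return in percent.
    """
    return sum(
        weight / 100.0 * asset_returns[option_id]
        for option_id, weight in allocations.items()
    )


def result_row(selected_option_id, allocations, asset_returns):
    """Build a result row that keeps the two return fields separate."""
    return {
        "selected_option_id": selected_option_id,
        # Realized return of the primary selected asset only.
        "selected_asset_return": asset_returns[selected_option_id],
        # Return of the whole allocation-weighted portfolio.
        "portfolio_return": weighted_portfolio_return(allocations, asset_returns),
    }
```

With a 60/40 portfolio whose assets returned 2.0% and -1.0%, selected_asset_return is 2.0 while portfolio_return is 0.8. An audit can flag a multi-asset row in which the two values coincide unexpectedly.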
  9. Data updated

    Latest-result selection hardened

    CapitalBench now uses the scoring-window end date as the first rule for choosing the latest resolved test across local reports, synced tables, and public pages.

    • Latest leaderboard publishing and Supabase sync now share the same exit-date-first ordering used by the website and Data API.
    • Regression tests cover overlapping rounds where a later decision date and a later scoring end date point to different tests.
    • This prevents hydrated latest-result tables from drifting away from the static latest-result page when schedules overlap.
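
The exit-date-first ordering amounts to a composite sort key. The sketch below is a minimal illustration: the dictionary keys and round labels are hypothetical, and using the decision date as a tie-breaker is an assumption, since the entry names only the first rule.

```python
from datetime import date


def latest_resolved(tests):
    """Pick the latest resolved test, ordering by scoring-window end date first.

    Assumption: ties on exit date fall back to the later decision date.
    """
    return max(tests, key=lambda test: (test["exit_date"], test["decision_date"]))


# Overlapping rounds: the weekly test was decided later, the monthly test ends later.
overlapping = [
    {"id": "monthly-example", "decision_date": date(2026, 5, 8), "exit_date": date(2026, 6, 8)},
    {"id": "weekly-example", "decision_date": date(2026, 5, 29), "exit_date": date(2026, 6, 5)},
]
```

Here `latest_resolved(overlapping)` selects the monthly example, because its scoring window ends later even though its decision date is earlier.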
  10. Data published

    Public data contract validation expanded

    CapitalBench expanded automated checks for displayed benchmark data, documentation examples, sitemap metadata, and API contract behavior.

    • Rendered-page validation now checks homepage, leaderboards, round pages, model pages, universe data, methodology/scoring claims, and API documentation examples against generated source data.
    • The OpenAPI spec now lists only the working production API host and documents only implemented parameters for leaderboard and positioning-change endpoints.
    • The Data API now rejects mixed weekly/monthly leaderboard requests and unknown model or asset identifiers instead of returning misleading empty success responses.
    • The website build and API tests now verify that documented API endpoints are actually served by the runtime handler with current model, asset, and round IDs.
  11. Methodology updated

    Overall scorecards now combine all resolved tests

    Headline weekly and monthly scorecards combine every resolved test in each track.

    • CapitalBench Score leaderboards now use all resolved weekly or monthly tests instead of only the latest model cohort.
    • Models added later are shown with fewer included tests and marked as short history until they have the full resolved sample.
    • The Data API cumulative leaderboard now includes resolved-round metadata, included-test counts, and rank-eligibility fields for downstream audits.
    • The website build now validates that generated public result rows have matching canonical leaderboard and return files.
  12. Data published

    Live mark-to-market added for open tests

    Open weekly and monthly tests can now show interim portfolio returns from the latest available close while remaining separate from official final scores.

    • The homepage now includes a Live Portfolio Returns chart for open tests, with filters for all live, weekly, and monthly tracks.
    • Model pages now show each model's current live return before final scoring, using only unresolved rounds and excluding completed results.
    • The Data API exposes GET /v1/live/performance, GET /v1/rounds/{round_id}/live-performance, and GET /v1/models/{model_id}/live-performance for interim mark-to-market data.
    • The scheduled interim refresh now updates all active tracks instead of only active monthly tests.
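
A client for the three live-performance endpoints might build its requests as sketched below. The endpoint paths come from the entry above. The base URL is a placeholder, and the `Authorization: Bearer` header format is an assumption based on the bearer-key requirement described in the Data API v1 entry. The function name is hypothetical, and the sketch only constructs the request without sending it.

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder host; the production API host is not stated in this changelog.
BASE_URL = "https://api.example.invalid"


def live_performance_request(api_key, round_id=None, model_id=None):
    """Build a GET request for interim mark-to-market data."""
    if round_id is not None and model_id is not None:
        raise ValueError("pass round_id or model_id, not both")
    if round_id is not None:
        path = f"/v1/rounds/{quote(round_id, safe='')}/live-performance"
    elif model_id is not None:
        path = f"/v1/models/{quote(model_id, safe='')}/live-performance"
    else:
        path = "/v1/live/performance"
    return Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

The returned object can be passed to `urllib.request.urlopen` once a real host and key are substituted.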
  13. Methodology published

    CapitalBench Score documented as the primary benchmark score

    Overall weekly and monthly results now explain CapitalBench Score as the primary benchmark score against the maximum possible return in each completed window.

    • Scoring documentation now separates raw portfolio return, S&P 500 return, Portfolio Minus S&P 500, regret, and CapitalBench Score.
    • Overall weekly and monthly pages now lead with the CapitalBench Score chart, with average portfolio return and S&P 500 return kept as supporting context.
    • The Data API read model and OpenAPI schema include max_possible_return_pct and capitalbench_score for resolved result rows and cumulative leaderboards.
  14. Operations published

    Monthly interim performance refresh automated

    Active monthly round charts now refresh from reusable daily full-universe price snapshots instead of manual per-round updates.

    • A new update-interim-performance command creates or reuses one daily price snapshot and applies it to every active monthly round whose timeline includes that close date.
    • The scheduled GitHub Actions refresh runs after U.S. market close, commits changed interim artifacts, and deploys the website only when refreshed data changes.
    • The website deploy workflow now watches universe configs, round artifacts, latest snapshots, and cumulative data so data-only updates can publish without an app-code edit.
    • Existing full-universe entry and exit price packages can also serve as reusable snapshots, reducing Tiingo calls while keeping each round's frozen entry prices unchanged.
    • Per-round Supabase sync failures are reported as warnings so one stale or mismatched round does not block other active monthly charts from updating.
  15. Data published

    Run concentration analytics added

    CapitalBench now reports how concentrated each round is across the model portfolios submitted for that round.

    • Round pages and the latest scored-test view now show run-level consensus allocation, including the largest shared asset, top-three asset share, and effective asset count.
    • Completed rounds remain available for concentration review, while active exposure still excludes completed rounds from live positioning.
    • The Data API now exposes GET /v1/rounds/{round_id}/concentration for round-level concentration summaries, asset weights, category mix, and model-level holders.
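
The run-level metrics can be sketched as follows. Two assumptions are made for illustration: consensus allocation is taken as the equal-weighted pooling of all model portfolios, and effective asset count is computed as the inverse Herfindahl index (1 divided by the sum of squared shares). The published definitions may differ, and the function name and output keys are hypothetical.

```python
from collections import defaultdict


def run_concentration(portfolios):
    """Summarize concentration across model portfolios for one round.

    `portfolios` is a list of dicts mapping option id -> weight in percent,
    one dict per model.
    """
    totals = defaultdict(float)
    for portfolio in portfolios:
        for option_id, weight in portfolio.items():
            totals[option_id] += weight
    grand_total = sum(totals.values())
    shares = {option_id: weight / grand_total for option_id, weight in totals.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return {
        "largest_asset": ranked[0][0],
        "largest_share": ranked[0][1],
        "top_three_share": sum(share for _, share in ranked[:3]),
        # Inverse Herfindahl index: equals N when N assets are held equally.
        "effective_asset_count": 1.0 / sum(share * share for share in shares.values()),
    }
```

For two models holding `{"A": 50, "B": 50}` and `{"A": 100}`, the consensus shares are 0.75 and 0.25, so the effective asset count is 1 / 0.625 = 1.6.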
  16. Data published

    CapitalBench Data API v1 added

    CapitalBench now has a protected read-only API for published model portfolios, active positioning, cumulative allocation behavior, results, assets, and proof metadata.

    • The web build now generates a static API read model from the same public round files used by the website, so API data refreshes when round artifacts are deployed.
    • Versioned v1 endpoints expose active and cumulative positioning, model holdings, asset holders, rounds, portfolios, latest and cumulative leaderboards, current universe data, and model style metrics.
    • API requests require bearer keys backed by Cloudflare D1, with per-minute and daily fixed-window rate limits.
    • A local API-key CLI can generate one-time keys and insert them into the production D1 database when Cloudflare credentials are available.
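
A fixed-window limiter with per-minute and daily limits can be sketched in memory as below. This is only an illustration of the technique: the production limiter is described as backed by Cloudflare D1, which is not modeled here, and the class name and its parameters are hypothetical.

```python
class FixedWindowLimiter:
    """In-memory fixed-window rate limiter with a per-minute and a daily window."""

    def __init__(self, per_minute, per_day):
        # (window size in seconds, maximum requests per window)
        self.limits = ((60, per_minute), (86400, per_day))
        self.counts = {}

    def allow(self, api_key, now):
        """Return True and count the request if every window has capacity.

        `now` is a timestamp in seconds; each window is identified by the
        integer index `now // window_size`.
        """
        buckets = [(api_key, size, int(now // size)) for size, _ in self.limits]
        for bucket, (_, limit) in zip(buckets, self.limits):
            if self.counts.get(bucket, 0) >= limit:
                return False
        for bucket in buckets:
            self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True
```

Rejected requests are not counted in this sketch, and old buckets are never evicted; a persistent store would need an expiry policy.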
  17. Methodology published

    Future rounds move to Universe v2.1

    CapitalBench Universe v2.1 adds five future-round ETF options to broaden the choice set while leaving completed rounds frozen.

    • The new future-round options are Broad AI Technology (AIQ), Autonomous Technology and Robotics (ARKQ), Cybersecurity (CIBR), Solar Energy (TAN), and Metals and Mining (XME).
    • Universe v2.1 keeps all 65 v2.0 options unchanged and adds the new ETFs as neutral exposure options, not as recommendations or performance-ranked choices.
    • New rounds initialized with capitalbench init-round now default to configs/universes/capitalbench_universe_v2_1.yaml and record universe_version: v2.1 unless an older or custom universe is explicitly passed.
    • Existing v1.5 and v2.0 round directories, manifests, option files, hashes, and public results remain unchanged.
  18. Benchmark published

    Claude Opus 4.8 joins future CapitalBench tests

    Claude Opus 4.8 was added as a regular participant starting with the May 28, 2026 weekly and monthly tests.

    • The model is configured as anthropic-claude-opus-4-8 using the Anthropic API model ID claude-opus-4-8.
    • Eligibility begins with the May 28, 2026 weekly and monthly rounds, so older completed tests are not backfilled.
    • Anthropic effort is explicitly pinned to low, the lowest documented effort level, and no thinking field is sent for Claude Opus models.
  19. Benchmark published

    Weekly benchmark track added separately from monthly rounds

    CapitalBench now supports one-week rounds as a separate track with separate website lanes and leaderboard slots.

    • The first weekly packet, CB-2026-05-24-1W, uses its own manifest, prompt, model input, hashes, run folder, entry prices, and resolution job while reusing May 24 source research only as input material.
    • Latest and cumulative public read models now use separate weekly and monthly slots so one-week and one-month scores cannot overwrite or mix with each other.
    • The homepage, leaderboard hub, round index, and round pages now label weekly and monthly tracks separately.
    • The landing page now presents weekly and monthly as equal track lanes with separate status cards, allocation previews, leaderboard links, timelines, and audit packet links.
    • Weekly prompts now make the close-to-close timeline explicit, including the May 22 entry close, Memorial Day market holiday, Tuesday-to-Friday regular-session window, and May 29 exit close.
    • Default monthly prompt generation and generated model-input metadata now reinforce close-to-close scoring and timeline-focused reasoning without using negative one-month wording in weekly rounds.
  20. Benchmark published

    Interim weekly round performance added

    Round pages can now show interim model allocation returns, S&P 500 returns, and Portfolio Minus S&P 500 when price snapshots are available.

    • The CLI can calculate weekly performance from existing price snapshots without resolving the official one-month leaderboard early.
    • Supabase now stores published weekly price and model-performance rows so round pages can render the chart from the public read model.
    • Round 1 is backfilled with the May 8 entry snapshot and the May 15 price snapshot already used for Round 2 inputs.
  21. Methodology published

    One-month prompt objective clarified

    Future CapitalBench prompts now make the one-month scoring window explicit before models choose allocations.

    • Newly initialized portfolio prompts instruct models to optimize for the close-to-close one-month scoring window from entry adjusted close to exit adjusted close.
    • Single-pick prompt defaults received the same clarification so older and newer submission formats remain conceptually aligned.
    • Generated model inputs now include scoring-window, close-to-close scoring, and timeline-focus metadata derived from each round manifest's entry date, exit date, and horizon.
  22. Data published

    Universe v2.0 approved for future rounds

    CapitalBench now has an expanded 65-option universe for future rounds while preserving the original 40-option universe for completed rounds.

    • Universe v2.0 keeps every v1.5 option and adds 25 Tiingo-validated exposures across equal-weight US equity, biotechnology, regional banks, aerospace and defense, country equity, bonds, commodities, currencies, and crypto ETF proxies.
    • Round manifests can now carry a universe_version value so the website and Supabase read model can show which option file was frozen for each round.
    • The public universe page now shows version history and renders the latest approved option table without changing any completed round inputs.
  23. Methodology published

    Portfolio round protocol groundwork added

    CapitalBench now supports a versioned portfolio submission protocol for future rounds while preserving single-pick compatibility for completed rounds.

    • Future portfolio rounds can require 1 to 5 holdings, 5% allocation increments, and exactly 100% total allocation through frozen round manifest constraints.
    • CLI validation, mock submissions, scoring, reports, Supabase sync, and website tables now understand portfolio allocations and holding-level audit rows.
    • Completed single-pick rounds remain labeled and scored under their original methodology; portfolio rounds are reported separately by submission format.
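
The portfolio constraints named above (1 to 5 holdings, 5% increments, exactly 100% total) can be checked with a short validator. This is a sketch under stated assumptions: weights are integer percentages, the defaults mirror the constraints in this entry, and the function name and error messages are hypothetical and do not reflect the CLI's actual output.

```python
def validate_portfolio(allocations, min_holdings=1, max_holdings=5, increment=5):
    """Return a list of constraint violations; an empty list means the portfolio is valid.

    `allocations` maps option id -> integer weight in percent.
    """
    errors = []
    if not min_holdings <= len(allocations) <= max_holdings:
        errors.append(f"expected {min_holdings} to {max_holdings} holdings, got {len(allocations)}")
    for option_id, weight in allocations.items():
        if weight <= 0 or weight % increment != 0:
            errors.append(f"{option_id}: weight {weight} is not a positive multiple of {increment}")
    total = sum(allocations.values())
    if total != 100:
        errors.append(f"weights sum to {total}, expected 100")
    return errors
```

In a real round, the limits would be read from the frozen round manifest instead of function defaults.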
  24. Operations published

    Public changelog format established

    CapitalBench now has a dedicated public changelog for user-approved major changes to benchmark protocol, round data, scoring, and operations.

    • Entries are reverse chronological and anchored so individual updates can be linked directly.
    • Each entry includes a category, status, concise impact summary, implementation notes, and relevant links.
    • Routine UI, UX, copy, and visual-design changes are excluded unless they materially affect benchmark interpretation.