Same Prompt And Asset List

Within a public run, every model receives the same prompt, briefing, asset list, and allowed response format. CapitalBench is not trying to equalize the internal knowledge of each model; it controls the external information supplied at allocation time.

Round inputs are hashed before public model calls. The public site mirrors the published hashes so readers can verify that the prompt, briefing, asset list, and market-data context were fixed before portfolios were collected.
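
As a rough illustration of how a reader might check a published hash, the sketch below hashes a canonical JSON encoding of the round inputs. The choice of SHA-256, the canonical JSON encoding, and the names `hash_round_inputs` and `verify_round_inputs` are assumptions made for this example; the actual hashing scheme is whatever CapitalBench publishes alongside each round.

```python
import hashlib
import json


def hash_round_inputs(round_inputs: dict) -> str:
    """Return a SHA-256 hex digest of the inputs (assumed scheme, not confirmed)."""
    # Sorting keys and fixing separators makes the encoding independent of
    # key order and whitespace, so equal inputs always give the same digest.
    canonical = json.dumps(
        round_inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_round_inputs(round_inputs: dict, published_hash: str) -> bool:
    """Check locally reconstructed inputs against a published digest."""
    return hash_round_inputs(round_inputs) == published_hash.strip().lower()
```

Any change to the prompt, briefing, asset list, or market-data context after the hash is published would produce a different digest, which is what makes the fixed-before-collection claim checkable.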

No Public Backfills

New models become eligible only for future rounds. Older benchmark results remain unchanged, so later model releases are never scored against historical market states they did not face publicly at the time.
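
The rule can be pictured as a simple time comparison. The sketch below assumes each round has a timestamp at which its inputs were frozen and each model has a timestamp at which it was added to the benchmark; both fields and the function name `eligible_round_ids` are hypothetical and only illustrate the "future rounds only" idea.

```python
from datetime import datetime


def eligible_round_ids(
    model_added_at: datetime, round_frozen_at: dict[str, datetime]
) -> list[str]:
    """Return IDs of rounds frozen strictly after the model was added.

    Rounds frozen at or before `model_added_at` are never returned, so a
    newly added model cannot be backfilled into an older round.
    """
    ordered = sorted(round_frozen_at.items(), key=lambda item: item[1])
    return [rid for rid, frozen_at in ordered if frozen_at > model_added_at]
```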

Run Isolation

Public, consistency, mock, retrospective, and provider smoke runs live under separate run IDs. Public benchmark results use only the selected public run for a round. Repeated consistency runs are shown separately because they answer a different evaluation question.
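
One way to picture this separation is a run record that carries its kind, with scoring reading only the single selected public run for a round. The `Run` fields, the kind strings, and the `selected` flag below are assumptions for illustration, not the benchmark's actual data model.

```python
from dataclasses import dataclass

# Run kinds named in the text; the exact string values are assumed.
RUN_KINDS = {"public", "consistency", "mock", "retrospective", "provider_smoke"}


@dataclass(frozen=True)
class Run:
    run_id: str
    round_id: str
    kind: str
    selected: bool = False  # hypothetical flag marking the chosen public run


def selected_public_run(runs: list[Run], round_id: str) -> Run:
    """Return the one selected public run for a round, or raise if ambiguous."""
    matches = [
        r for r in runs
        if r.round_id == round_id and r.kind == "public" and r.selected
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one selected public run for {round_id}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

Raising on zero or multiple matches, rather than picking one silently, keeps consistency or mock runs from leaking into public results by accident.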

Tool And Retrieval Boundary

Public model calls do not use browsing, live retrieval, trading tools, or hidden post-cutoff data. If a provider run fails, the failed raw attempt can remain in the audit trail, but it does not become a public result.

Invalid Submission Handling

A public result requires a schema-valid submission for the round's declared format. Missing option identifiers, invalid allocation totals, unparsable responses, and failed attempts are excluded from scoring. This keeps the benchmark result tied to valid frozen portfolios from the same round.
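
A minimal sketch of such a check follows. It assumes a JSON response of the form {"allocations": {option_id: weight}}, that every declared option identifier must appear exactly once with no unknown ones, and that weights are non-negative and sum to 100. The real declared format may differ by round, so the field name, the total, and the function name `validate_submission` are illustrative only.

```python
import json
import math


def validate_submission(raw, option_ids, expected_total=100.0, tol=1e-6):
    """Return (allocations, None) if valid, otherwise (None, reason)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, "unparsable response"

    allocations = data.get("allocations") if isinstance(data, dict) else None
    if not isinstance(allocations, dict):
        return None, "unparsable response"

    # Assumed rule: the response must cover exactly the declared options.
    if set(allocations) != set(option_ids):
        return None, "missing or unknown option identifiers"

    weights = list(allocations.values())
    for w in weights:
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(w, bool) or not isinstance(w, (int, float)):
            return None, "invalid allocation value"
        if not math.isfinite(w) or w < 0:
            return None, "invalid allocation value"

    if abs(sum(weights) - expected_total) > tol:
        return None, "invalid allocation total"
    return allocations, None
```

A submission that returns a reason instead of allocations would stay out of scoring, while the raw attempt can still be kept for audit purposes.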