SAMPLE PRIVATE EVAL REPORT

A private-report format built from public benchmark evidence.

This sample uses public round CB-2026-05-28-1W to show the structure buyers receive: decision summary, scorecard, attribution, consistency evidence, failure-mode review, and audit index.

CapitalBench Private Eval Sample

CB-2026-05-28-1W report preview

Outcome window: 2026-05-28 to 2026-06-04
Systems compared: 5
Methodology: portfolio-v1.0

Decision summary

Gemini 3.1 Pro led this public sample round.

The leading portfolio returned 3.93% against an S&P 500 return of 0.33%, an excess of 3.60 percentage points. The best eligible asset was Semiconductors (SMH), which returned 4.62% over the same window.

A real private report replaces these public comparator rows with the buyer's system, repeated-run evidence, access notes, cost and latency records, and client-approved confidentiality terms.

Scorecard extract

Top model: Gemini 3.1 Pro
Portfolio return: 3.93%
CapitalBench Score: 85.1
S&P 500 return: 0.33%
Best eligible asset: Semiconductors (SMH)
Oracle return: 4.62%
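
The scorecard figures above can be related with simple arithmetic. The sketch below is illustrative only: the class name `ScorecardRow`, its field names, and the "oracle capture" ratio are assumptions made for this example. The sample does not state how the CapitalBench Score is calculated, so the ratio should not be read as the score's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorecardRow:
    """One scorecard row (hypothetical layout); all returns are in percent."""
    model: str
    portfolio_return_pct: float
    benchmark_return_pct: float  # S&P 500 return over the outcome window
    oracle_return_pct: float     # return of the best eligible asset

    def excess_return_pp(self) -> float:
        """Portfolio return minus benchmark return, in percentage points."""
        return round(self.portfolio_return_pct - self.benchmark_return_pct, 2)

    def oracle_capture(self) -> float:
        """Illustrative ratio of portfolio return to oracle return.

        Assumes a nonzero oracle return; this is not a documented metric.
        """
        return self.portfolio_return_pct / self.oracle_return_pct


# Values taken from the scorecard extract.
row = ScorecardRow("Gemini 3.1 Pro", 3.93, 0.33, 4.62)
print(f"{row.model}: {row.excess_return_pp():.2f} pp over the S&P 500")
print(f"Share of oracle return captured: {row.oracle_capture():.1%}")
```

Keeping the benchmark difference and the oracle comparison as separate methods mirrors the report's habit of reporting each measure as its own field.
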
01

Executive summary

The sample report explains which system led the public comparator set, where returns came from, and what the result can and cannot prove.

02

Comparative scorecard

Performance, score, benchmark difference, validity, consistency, concentration, cost, and latency are kept as separate fields.

03

Failure-mode register

Findings are written as decision records with severity, evidence, likely business impact, and a retest condition.

04

Audit packet index

The final packet links frozen inputs, raw outputs, parsed submissions, prices, hashes, calculations, and methodology version.
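
One way to make such a packet checkable is to record a content hash for every artifact. The following is a minimal sketch, assuming the packet is a directory of files and that SHA-256 is an acceptable digest; the function names and the index layout are hypothetical, and the sample does not say which hash or file layout the real packet uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_audit_index(packet_dir: Path, methodology: str) -> dict:
    """Map each file under packet_dir to its digest (hypothetical layout)."""
    files = sorted(p for p in packet_dir.rglob("*") if p.is_file())
    return {
        "methodology": methodology,
        "artifacts": {
            p.relative_to(packet_dir).as_posix(): sha256_of(p) for p in files
        },
    }
```

Calling `build_audit_index(Path("packet"), "portfolio-v1.0")` on a directory named `packet` (an assumed name) would return a dictionary that a reviewer can recompute later to confirm that no frozen input or raw output has changed.
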

Example finding format

Finding: Claude Opus 4.8 trailed the leading sample result but still selected a valid diversified portfolio.
Evidence: Portfolio return, CapitalBench Score, holdings, and price records are linked to the audit packet.
Limitation: One weekly window is real outcome evidence, not proof of durable investment skill.
Retest condition: Run additional weekly or monthly windows after the tested configuration changes.
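
The finding format above could be held as a small structured record so that incomplete entries are easy to catch. This is a sketch under assumptions: the `Finding` class and its `is_complete` check are hypothetical, and only the four fields shown in the example are modeled.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    """One finding record with the four fields from the example format."""
    finding: str
    evidence: str
    limitation: str
    retest_condition: str

    def is_complete(self) -> bool:
        """True when every field contains non-blank text."""
        return all(value.strip() for value in asdict(self).values())


example = Finding(
    finding=(
        "Claude Opus 4.8 trailed the leading sample result but still "
        "selected a valid diversified portfolio."
    ),
    evidence=(
        "Portfolio return, CapitalBench Score, holdings, and price "
        "records are linked to the audit packet."
    ),
    limitation=(
        "One weekly window is real outcome evidence, not proof of "
        "durable investment skill."
    ),
    retest_condition=(
        "Run additional weekly or monthly windows after the tested "
        "configuration changes."
    ),
)
print(example.is_complete())
```
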