AI Portfolio Benchmark Scoring

Single-Round Result

Each model gets one frozen portfolio for a round, and that portfolio is what appears in the public benchmark result. For a completed round, CapitalBench reports the raw portfolio return, the S&P 500 return over the same close-to-close window, Portfolio Minus S&P 500, and regret versus the highest-returning scored option.

Single-allocation return: selected option ending price divided by starting price, minus one.
Portfolio return: weighted sum of holding returns, using the submitted allocation percentages.
Portfolio Minus S&P 500: portfolio return minus S&P 500 return.
Maximum possible return: the highest realized return among scored options in the saved universe.
Regret: maximum possible return minus the model's return, reported only when prices exist for the full asset list.
Cash comparison: whether the model's return beats a cash return.
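The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the CapitalBench scorer: the function names, the dictionary layout, the use of fractional weights (0.5 rather than 50%), and the default cash return of 0% are all assumptions made for the example.

```python
def single_return(start_price: float, end_price: float) -> float:
    """Single-allocation return: ending price over starting price, minus one."""
    return end_price / start_price - 1.0


def round_metrics(weights: dict, option_returns: dict,
                  sp500_return: float, cash_return: float = 0.0) -> dict:
    """Single-round metrics for one frozen portfolio.

    weights        -- hypothetical layout: option name -> fraction of portfolio
    option_returns -- realized return of every scored option in the universe
    cash_return    -- assumed cash benchmark (0% here, purely for illustration)
    """
    # Portfolio return is the weighted sum of the holding returns.
    portfolio_return = sum(w * option_returns[name] for name, w in weights.items())
    # Hindsight ceiling: best realized return among the scored options.
    max_possible = max(option_returns.values())
    return {
        "portfolio_return": portfolio_return,
        "minus_sp500": portfolio_return - sp500_return,
        "max_possible_return": max_possible,
        "regret": max_possible - portfolio_return,
        "beats_cash": portfolio_return > cash_return,
    }


# Made-up prices and weights, used only to exercise the functions.
returns = {
    "AAA": single_return(100.0, 110.0),
    "BBB": single_return(50.0, 49.0),
    "CASH": 0.0,
}
metrics = round_metrics({"AAA": 0.5, "BBB": 0.25, "CASH": 0.25},
                        returns, sp500_return=0.02)
```

With these made-up inputs the portfolio return is about 4.5%, Portfolio Minus S&P 500 is about 2.5 points, and regret is about 5.5 points against a 10% maximum possible return.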

CapitalBench Score

CapitalBench Score compares a model portfolio directly with the best possible asset choice known after the round. That hindsight result is the oracle return.

100: the portfolio matched the highest-returning asset in the saved universe.
Between 0 and 100: the portfolio earned a positive return below the oracle.
0: the portfolio had no net return.
Below 0: the portfolio lost money; a smaller loss receives a higher score than a larger loss.
Zero-return oracle: when the best available result is cash at 0%, matching it scores 100; a loss would require dividing by zero, so it has no finite per-test ratio and is reported as unavailable.

Example: if a portfolio returns 3.93% and the best asset returns 4.62%, the score is 85.1. If the portfolio loses 2% while the best asset gains 4%, the score is -50. A 1% loss in that same round scores -25, correctly ranking the smaller loss above the larger loss.
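The scoring rules and the worked example can be reproduced with a small function. This is a sketch under stated assumptions: the name capitalbench_score is hypothetical, None stands in for "unavailable", and raising an error for a negative oracle is a choice made here because the source does not describe that case.

```python
from typing import Optional


def capitalbench_score(portfolio_return: float,
                       oracle_return: float) -> Optional[float]:
    """Per-round score: 100 * portfolio_return / oracle_return.

    Returns None when the score is unavailable (zero-return oracle and a
    portfolio that did not match it).
    """
    if oracle_return < 0:
        # Not covered by the source text; rejected here as an assumption.
        raise ValueError("negative oracle return is outside this sketch")
    if oracle_return == 0:
        # Zero-return oracle: matching 0% scores 100, anything else has
        # no finite ratio.
        return 100.0 if portfolio_return == 0 else None
    return 100.0 * portfolio_return / oracle_return


score_gain = capitalbench_score(0.0393, 0.0462)        # 3.93% vs 4.62% oracle
score_big_loss = capitalbench_score(-0.02, 0.04)       # 2% loss vs 4% oracle
score_small_loss = capitalbench_score(-0.01, 0.04)     # 1% loss vs 4% oracle
score_flat_match = capitalbench_score(0.0, 0.0)        # matched a 0% oracle
score_unavailable = capitalbench_score(-0.01, 0.0)     # loss vs a 0% oracle
```

Rounded to one decimal, the first three calls give 85.1, -50.0, and -25.0, matching the example; the last two give 100.0 and None.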

The maximum possible return is a hindsight ceiling, not a model portfolio or a recommendation. Dividing by it normalizes scores across different market windows, while the separate return statistics preserve the actual size of gains and losses.

Price Inputs

Starting and ending prices come from the round rules. Starting prices may be published while a round is pending. Ending prices and final returns are withheld until the round period is over and the scoring job has produced the result files.

Single-allocation return: (ending_price / starting_price) - 1
Portfolio return: sum(weight * option_return)
Portfolio Minus S&P 500: portfolio_return - sp500_return
Max possible return: max(scored_universe_returns)
CapitalBench Score: 100 * portfolio_return / oracle_return
Regret: max_possible_return - portfolio_return

Comparison Set Results

Current benchmark results keep weekly and monthly tracks separate and rank models inside equal-run comparison sets. Across the shared included rounds in a set, CapitalBench divides summed model returns by summed oracle returns:

Comparison-set CapitalBench Score: 100 * sum(portfolio_returns) / sum(oracle_returns)

This is equivalent to weighting each round by the size of its oracle opportunity. It preserves the difference between small and large losses without allowing a week with a very small oracle return to dominate the entire history. Example: model returns of +4% and -1% against oracle returns of +8% and +2% produce a full-history score of 30: 100 * (4 - 1) / (8 + 2).
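The pooled calculation can be sketched as follows, with a plain average of per-round scores computed alongside it purely for contrast. The function name and the choice to return None when the summed oracle return is zero are assumptions of this example, not documented CapitalBench behavior.

```python
from typing import Optional, Sequence


def comparison_set_score(portfolio_returns: Sequence[float],
                         oracle_returns: Sequence[float]) -> Optional[float]:
    """Pooled score: 100 * sum(portfolio_returns) / sum(oracle_returns)."""
    if len(portfolio_returns) != len(oracle_returns):
        raise ValueError("each round needs one portfolio and one oracle return")
    total_oracle = sum(oracle_returns)
    if total_oracle == 0:
        # Assumption of this sketch: no finite ratio, so report unavailable.
        return None
    return 100.0 * sum(portfolio_returns) / total_oracle


model = [0.04, -0.01]    # +4% and -1%
oracle = [0.08, 0.02]    # +8% and +2%

pooled = comparison_set_score(model, oracle)

# For contrast only: an unweighted mean of per-round scores.
per_round = [100.0 * m / o for m, o in zip(model, oracle)]
unweighted_mean = sum(per_round) / len(per_round)
```

The pooled score comes out at about 30, as in the example. The per-round scores are about 50 and -50, so an unweighted mean would be about 0: the round with the 2% oracle would count as much as the round with the 8% oracle, which is the effect the pooled ratio avoids.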

Rounds are not compounded, because CapitalBench rounds may overlap and represent separate benchmark experiments rather than one sequential investable portfolio. When a model is added later, a new comparison set starts that includes it alongside the existing models. A set includes only the rounds that every model in it completed; a round missed by any model is excluded from that set for all models. The all-available history remains visible as context, but comparison sets are the fair ranking surface.
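The "only rounds every model completed" rule amounts to a set intersection. The sketch below assumes a hypothetical mapping from model name to the set of round identifiers it completed; the names and identifiers are invented for illustration.

```python
from typing import Dict, List, Set


def shared_rounds(completed: Dict[str, Set[str]], models: List[str]) -> Set[str]:
    """Rounds completed by every model in the comparison set."""
    if not models:
        return set()
    # A round missed by any one model drops out for all of them.
    return set.intersection(*(completed[m] for m in models))


# Hypothetical completion records.
completed = {
    "model_a": {"w1", "w2", "w3"},
    "model_b": {"w2", "w3"},
    "model_c": {"w3", "w4"},          # added later
}

original_set = shared_rounds(completed, ["model_a", "model_b"])
expanded_set = shared_rounds(completed, ["model_a", "model_b", "model_c"])
```

Here the two-model set shares rounds w2 and w3, while the set that includes the later model shares only w3, so each set is ranked on an equal number of runs.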

Consistency Checks

Consistency checks summarize repeated calls with the same saved round input. They measure whether a model tends to make the same allocation under identical conditions.

These repeated-call checks are not mixed into the main benchmark results because the public contest uses one frozen portfolio per model.
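The source does not say which statistic the consistency checks use. As one possible illustration only, the sketch below measures agreement between repeated allocations as the weight two allocations have in common, averaged over all pairs of calls. The metric, function names, and sample allocations are assumptions, not the CapitalBench method.

```python
from itertools import combinations
from typing import Dict, List


def allocation_overlap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Shared weight between two allocations (fractions summing to 1.0).

    1.0 means identical allocations, 0.0 means no holdings in common.
    """
    names = set(a) | set(b)
    return sum(min(a.get(n, 0.0), b.get(n, 0.0)) for n in names)


def mean_pairwise_overlap(allocations: List[Dict[str, float]]) -> float:
    """Average overlap across every pair of repeated calls."""
    pairs = list(combinations(allocations, 2))
    if not pairs:
        return 1.0  # a single call is trivially consistent with itself
    return sum(allocation_overlap(x, y) for x, y in pairs) / len(pairs)


# Three hypothetical repeated calls on the same saved round input.
runs = [
    {"AAA": 1.0},
    {"AAA": 0.5, "BBB": 0.5},
    {"AAA": 1.0},
]
consistency = mean_pairwise_overlap(runs)
```

In this made-up case the pairwise overlaps are 0.5, 1.0, and 0.5, so the average is about 0.67.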

Pending Rounds

Pending rounds can show model portfolios, confidence values, rationale summaries, starting prices, and audit hashes. After at least one interim price snapshot exists, a round page may also show interim return tracking versus the S&P 500. That chart is explicitly provisional and separate from the final benchmark result. CapitalBench Score, rank, regret, maximum possible return, and final return versus the S&P 500 remain withheld until ending prices are present and the scorer publishes the result files.