Published: Sep 30, 2024
Revised: Feb 28, 2025
Academic article

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence.
Ezra Karger*,1,2, Houtan Bastani*,1, Chen Yueh-Han*,3, Zachary Jacobs1, Danny Halawi4, Fred Zhang4, Philip E. Tetlock1,5
* Equal contribution
1 Forecasting Research Institute
2 Federal Reserve Bank of Chicago
3 New York University
4 University of California, Berkeley
5 Wharton School of the University of Pennsylvania
Correspondence to: forecastbench@forecastingresearch.org

Abstract

Forecasts of future events are essential inputs into informed decision-making. Machine learning (ML) systems have the potential to deliver forecasts at scale, but there is no framework for evaluating the accuracy of ML systems on a standardized set of forecasting questions. To address this gap, we introduce ForecastBench: a dynamic benchmark that evaluates the accuracy of ML systems on an automatically generated and regularly updated set of 1,000 forecasting questions. To avoid any possibility of data leakage, ForecastBench consists solely of questions about future events that have no known answer at the time of submission. We quantify the capabilities of current ML systems by collecting forecasts from expert (human) forecasters, the general public, and LLMs on a random subset of questions from the benchmark (N=200). While LLMs have achieved superhuman performance on many benchmarks, they perform less well here: expert forecasters outperform the top-performing LLM (p-value < 0.001). We display system and human scores on a public leaderboard.
