{"id":949,"date":"2024-09-30T12:00:00","date_gmt":"2024-09-30T12:00:00","guid":{"rendered":"https:\/\/forecastingresearch.org\/?post_type=research&#038;p=949"},"modified":"2026-05-06T14:07:12","modified_gmt":"2026-05-06T14:07:12","slug":"forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities","status":"publish","type":"research","link":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities","title":{"rendered":"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Abstract<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Forecasts of future events are essential inputs into informed decision-making. Machine learning (ML) systems have the potential to deliver forecasts at scale, but there is no framework for evaluating the accuracy of ML systems on a standardized set of forecasting questions. To address this gap, we introduce ForecastBench: a dynamic benchmark that evaluates the accuracy of ML systems on an automatically generated and regularly updated set of 1,000 forecasting questions. To avoid any possibility of data leakage, ForecastBench is comprised solely of questions about future events that have no known answer at the time of submission. We quantify the capabilities of current ML systems by collecting forecasts from expert (human) forecasters, the general public, and LLMs on a random subset of questions from the benchmark (N=200). While LLMs have achieved super-human performance on many benchmarks, they perform less well here: expert forecasters outperform the top-performing LLM (p-value&nbsp;&lt;0.001). We display system and human scores in a public leaderboard <a href=\"http:\/\/www.forecastbench.org\/\">here<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"btn orange\" href=\"https:\/\/doi.org\/10.48550\/arXiv.2409.19839\" target=\"_blank\" rel=\"noreferrer noopener\">Published as a conference paper at ICLR 2025 <svg width=\"7\" height=\"9\" viewBox=\"0 0 7 9\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.000156283 8.60806L4.22416 4.33606V4.24006L0.000156283 6.10352e-05H1.80816L6.06416 4.28806L1.80816 8.60806H0.000156283Z\" fill=\"#102B23\"\/>\n<\/svg>\n<svg width=\"8\" height=\"10\" viewBox=\"0 0 8 10\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.601719 8.85794L4.82572 4.58594V4.48994L0.601719 0.249939H2.40972L6.66572 4.53794L2.40972 8.85794H0.601719Z\" fill=\"#102B23\"\/>\n<\/svg><\/a><\/div>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence.","protected":false},"featured_media":1240,"template":"","meta":{"footnotes":""},"research_type":[5],"class_list":["post-949","research","type-research","status-publish","has-post-thumbnail","hentry","research_type-academic-article"],"acf":[],"yoast_head":"<title>ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities &#8211; Forecasting Research Institute<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities &#8211; Forecasting Research Institute\" \/>\n<meta property=\"og:description\" content=\"A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities\" \/>\n<meta property=\"og:site_name\" content=\"Forecasting Research Institute\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-06T14:07:12+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/09\/illustration_Midjourney_ForecastBench-02.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1607\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities\",\"name\":\"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities &#8211; Forecasting Research Institute\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2024\\\/09\\\/illustration_Midjourney_ForecastBench-02.jpg\",\"datePublished\":\"2024-09-30T12:00:00+00:00\",\"dateModified\":\"2026-05-06T14:07:12+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#primaryimage\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2024\\\/09\\\/illustration_Midjourney_ForecastBench-02.jpg\",\"contentUrl\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2024\\\/09\\\/illustration_Midjourney_ForecastBench-02.jpg\",\"width\":2560,\"height\":1607},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/forecastingresearch.org\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/#website\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/\",\"name\":\"Forecasting Research Institute\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/forecastingresearch.org\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>","yoast_head_json":{"title":"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities &#8211; Forecasting Research Institute","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities","og_locale":"en_US","og_type":"article","og_title":"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities &#8211; Forecasting Research Institute","og_description":"A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence.","og_url":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities","og_site_name":"Forecasting Research Institute","article_modified_time":"2026-05-06T14:07:12+00:00","og_image":[{"width":2560,"height":1607,"url":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/09\/illustration_Midjourney_ForecastBench-02.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities","url":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities","name":"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities &#8211; Forecasting Research Institute","isPartOf":{"@id":"https:\/\/forecastingresearch.org\/#website"},"primaryImageOfPage":{"@id":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#primaryimage"},"image":{"@id":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#primaryimage"},"thumbnailUrl":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/09\/illustration_Midjourney_ForecastBench-02.jpg","datePublished":"2024-09-30T12:00:00+00:00","dateModified":"2026-05-06T14:07:12+00:00","breadcrumb":{"@id":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#primaryimage","url":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/09\/illustration_Midjourney_ForecastBench-02.jpg","contentUrl":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/09\/illustration_Midjourney_ForecastBench-02.jpg","width":2560,"height":1607},{"@type":"BreadcrumbList","@id":"https:\/\/forecastingresearch.org\/research\/forecastbench-a-dynamic-benchmark-of-ai-forecasting-capabilities#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/forecastingresearch.org\/"},{"@type":"ListItem","position":2,"name":"ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities"}]},{"@type":"WebSite","@id":"https:\/\/forecastingresearch.org\/#website","url":"https:\/\/forecastingresearch.org\/","name":"Forecasting Research Institute","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/forecastingresearch.org\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/949","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research"}],"about":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/types\/research"}],"version-history":[{"count":15,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/949\/revisions"}],"predecessor-version":[{"id":2196,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/949\/revisions\/2196"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/media\/1240"}],"wp:attachment":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/media?parent=949"}],"wp:term":[{"taxonomy":"research_type","embeddable":true,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research_type?post=949"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}