Existential Risk Persuasion Tournament (XPT)
The XPT explores potential threats to humanity in this century, with a focus on artificial intelligence, biosecurity, climate, and nuclear arms. In the initial tournament, which ran from June to October 2022, more than 200 experts and highly skilled forecasters worked individually and in teams to craft forecasts and persuasive explanations.
Find more about the XPT, including the full policy report for the 2022 tournament, here.
Adversarial collaboration on AI risk
The adversarial collaboration on AI risk followed the Existential Risk Persuasion Tournament and featured more intensive interaction between participants who disagreed about the threat from AI in this century. Participants worked collaboratively to create “crux” forecasting questions designed to best elucidate the sources of their disagreement. This provided a richer understanding of the large differences in participants’ AI risk forecasts seen in the 2022 XPT.
Find more about the AI adversarial collaboration project, including the full report, here.
AI Conditional Trees
In “Conditional Trees: A Method for Generating Informative Questions about Complex Topics,” we tested a new process for generating high-value forecasting questions: the “conditional trees” method. Conditional trees are simplified Bayesian networks that can serve as a framework for both generating and judging high-value forecasting questions. We conducted specialized interviews with AI domain experts and highly skilled generalist forecasters to generate 75 new AI forecasting questions with resolution dates ranging from 2030 to 2070. We then assessed how much “Value of Information” (VOI) each question provides about a far-future outcome (whether AI will cause human extinction by 2100), using forecasts collected from a small sample of superforecasters.
This report provides initial evidence that the conditional trees process can generate higher-value questions on important topics than status quo methods. We recommend that future research directly test conditional trees against other question-generation techniques with a larger sample of forecasters to produce more robust estimates of VOI differences.
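As a loose illustration of the VOI idea, the sketch below treats a question's value as the expected absolute update to the 2100 forecast once the question resolves, given forecasters' conditional estimates. The function, numbers, and exact formula are illustrative assumptions, not the metric used in the report.

```python
# Illustrative sketch (not the report's exact VOI formula): treat a question's
# value of information as the expected absolute shift in the far-future
# forecast once the question resolves.

def expected_update_voi(p_yes: float,
                        p_outcome_given_yes: float,
                        p_outcome_given_no: float) -> float:
    """Expected absolute change in P(outcome) after the question resolves."""
    p_no = 1.0 - p_yes
    # Prior on the outcome, marginalizing over the question's resolution
    # (law of total probability).
    p_outcome = p_yes * p_outcome_given_yes + p_no * p_outcome_given_no
    # Expected size of the update across the two possible resolutions.
    return (p_yes * abs(p_outcome_given_yes - p_outcome)
            + p_no * abs(p_outcome_given_no - p_outcome))


if __name__ == "__main__":
    # Hypothetical conditional forecasts for a single crux question.
    voi = expected_update_voi(p_yes=0.5,
                              p_outcome_given_yes=0.10,
                              p_outcome_given_no=0.02)
    print(f"Expected update to the 2100 forecast: {voi:.3f}")  # prints 0.040
```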
Find out more about “Conditional Trees: A Method for Generating Informative Questions about Complex Topics,” including the full report, here.
Expert Forecasts of Nuclear Risk
The “Expert Forecasts of Nuclear Risk” study systematically assessed expert beliefs about the probability of a nuclear weapons catastrophe by 2045. In the study, 110 domain experts and 41 expert forecasters (“superforecasters”) predicted the likelihood of nuclear conflict, explained the mechanisms underlying their predictions, and forecast the impact of specific tractable policies on the likelihood of nuclear catastrophe.
Find more about the “Expert Forecasts of Nuclear Risk” study, including the full report, here.
ForecastBench
ForecastBench is a benchmark that measures the forecasting capabilities of LLMs, comparing their performance to that of both the general public and superforecasters. Because it is a dynamic, continually updated benchmark that asks questions about future events, it avoids the data leakage and overfitting problems associated with static benchmarks. The result is a leaderboard that is updated nightly and datasets that can be used for training forecasting LLMs.
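As a hedged sketch of how such a comparison could be scored (ForecastBench's actual pipeline and scoring rules may differ), the snippet below computes mean Brier scores for two sets of probability forecasts against resolved binary outcomes; lower scores are better. All numbers are hypothetical.

```python
# Illustrative only: compare two forecast sets against resolved outcomes using
# the Brier score (mean squared error of probabilities); lower is better.

def mean_brier(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean Brier score for probabilistic forecasts of binary outcomes."""
    assert len(forecasts) == len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # Hypothetical resolved questions (1 = event happened, 0 = it did not).
    outcomes = [1, 0, 0, 1, 0]
    llm_forecasts = [0.70, 0.40, 0.20, 0.55, 0.10]
    superforecaster_forecasts = [0.85, 0.20, 0.10, 0.70, 0.05]

    print(f"LLM mean Brier score:             {mean_brier(llm_forecasts, outcomes):.3f}")
    print(f"Superforecaster mean Brier score: {mean_brier(superforecaster_forecasts, outcomes):.3f}")
```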
For more about ForecastBench and to see the latest leaderboard, visit www.forecastbench.org.
More projects in the works
Forecasting Proficiency Test: We are designing a one-hour assessment to identify talented forecasters in a general population.
Intersubjective Metrics: To help us incentivize accuracy on long-run and unresolvable questions, we will explore, classify, and test intersubjective metrics such as reciprocal scoring (see the sketch after this list).
Epistemic Reviews: We’re working closely with policymakers and nonprofit organizations to assess how forecasting tools could help them reduce uncertainty, identify action-relevant disagreement, and guide their decision processes.
Project Improbable: Elicitation of forecasts in low-probability ranges (<10%) is relatively unstudied and may require very different strategies from typical elicitation contexts. We’re exploring ways of reducing noise in forecaster judgments of low-probability events.
Team Dynamics: Teams of forecasters produce more accurate predictions than individual forecasters, but at what point does enlarging a team stop improving performance? And what is the optimal amount of time to spend per forecasting question? We’re running a large-scale RCT to find out how best to allocate team resources for forecasting.
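For illustration only (the project may define its metrics differently), reciprocal scoring rewards forecasters for accurately predicting another group's forecast rather than the eventual outcome, which makes it usable on questions that never resolve. The sketch below scores each forecaster against the other group's realized median forecast; the function and numbers are hypothetical.

```python
# Illustrative sketch of reciprocal scoring: instead of waiting for the event
# to resolve, penalize each forecaster by the squared distance between their
# prediction of another group's median forecast and that realized median
# (lower is better).
from statistics import median

def reciprocal_scores(predictions_of_other_group: list[float],
                      other_group_forecasts: list[float]) -> list[float]:
    """Squared error of each prediction against the other group's median."""
    target = median(other_group_forecasts)
    return [(p - target) ** 2 for p in predictions_of_other_group]


if __name__ == "__main__":
    # Hypothetical: Group A predicts what Group B's median forecast will be.
    group_a_predictions_of_b = [0.30, 0.45, 0.25]
    group_b_forecasts = [0.35, 0.40, 0.30, 0.50]  # realized median = 0.375
    print(reciprocal_scores(group_a_predictions_of_b, group_b_forecasts))
```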
Highlighted publications
Improving Judgments of Existential Risk: Better Forecasts, Questions, Explanations, Policies
Karger, E., Atanasov, P., Tetlock, P. Available at SSRN (January 5, 2022).
Reciprocal Scoring: A Method for Forecasting Unanswerable Questions
Karger, E., Monrad, J., Mellers, B., Tetlock, P. Available at SSRN (2021).
A Better Crystal Ball: The Right Way to Think About the Future
Scoblic, P., Tetlock, P. Foreign Affairs (November/December 2020).
Find a list of all of our publications here.
Our work is supported by grants from Open Philanthropy and other philanthropic foundations.