Published: Oct 29, 2024
  • Working Paper #4

Can Humanity Achieve a Century of Nuclear Peace?

This study systematically assessed expert beliefs about the probability of a nuclear weapons catastrophe by 2045. Domain experts and superforecasters predicted the likelihood of nuclear conflict, explained the mechanisms underlying their predictions, and forecast the impact of specific tractable policies on the chance of nuclear catastrophe.
Bridget Williams*, Ezra Karger*†, Andreas Persbo, Kseniia Pirnavskaia, Karim Kamel, Victoria Schmidt*, Otto Kuusela*, Zach Jacobs*, Philip E. Tetlock
* = Forecasting Research Institute
† = Federal Reserve Bank of Chicago
‡ = Open Nuclear Network, a Programme of Pax Sapiens
§ = Wharton School of the University of Pennsylvania

Abstract

While the world has avoided large-scale nuclear war, questions remain about the role of chance versus policy choices in preventing such events. This study systematically assesses expert beliefs about the probability of a nuclear catastrophe by 2045, the centenary of the bombings of Hiroshima and Nagasaki. We define a nuclear catastrophe as an event where nuclear weapons cause the death of at least 10 million people. Through a combination of expert interviews and surveys, 110 domain experts and 41 expert forecasters (“superforecasters”) predicted the likelihood of nuclear conflict, explained the mechanisms underlying their predictions, and forecasted the impact of specific tractable policies on the likelihood of nuclear catastrophe. Experts assigned a median 5% probability of a nuclear catastrophe by 2045, while superforecasters put the probability at 1%. Factors contributing to higher risk estimates included ongoing geopolitical tensions, the proliferation of nuclear weapons, and technological vulnerabilities. Lower risk estimates highlighted the continued effectiveness of nuclear deterrence. Although Russia and NATO was the adversarial domain thought most likely to cause a nuclear catastrophe, experts believe that risks are dispersed roughly uniformly across regional conflict theaters (Russia and NATO, China and the USA, the Korean Peninsula, India and Pakistan, and Israel and Iran). Participants believe that the implementation of a bundle of six tractable policies, including the establishment of a crisis communications network and the implementation of failsafe reviews, would together halve the risk of a nuclear catastrophe.

Acknowledgments

This research would not have been possible without the generous support of Open Philanthropy.

We thank the following people, who kindly served as expert interviewees: James Acton, Catherine Dill, Robert Einhorn, Peter Hayes, Feroz Khan, Frank O’Donnell, David Santoro, Manpreet Sethi, Sir Graham Stacey, Dmitry Stefanovich, Tatsujiro Suzuki, Jon Wolfsthal, and Tong Zhao. Their expertise helped to inform the direction of the project, but we note that the content of the project and this report don’t necessarily reflect their views.

We also thank Peter Scoblic for helpful feedback on survey questions. We greatly appreciate the assistance of Josh Rosenberg, Kayla Gamin, Sam Glover, Harrison Durland, Nikitas Angeletos Chrysaitis, Catherine Wu, Amory Bennett, Kaitlyn Coffee, Coralie Consigny, Rhiannon Britt, and Rebecca Ceppas de Castro throughout the project.

Lastly, we extend our gratitude to our research participants for their invaluable contributions.

Executive Summary

This study is the largest systematic survey of subject matter experts on the risk posed by nuclear weapons. In addition to experts, we surveyed forecasters with a strong track record of accuracy (“superforecasters”). The study summarizes responses on two complementary surveys taken a month apart: the first survey focused on risk pathways and the second on policy responses.

Participants

A total of 151 participants (110 experts and 41 superforecasters) completed the full first survey, and 148 participants completed both surveys (109 experts and 39 superforecasters). Most respondents engaged deeply with the questions. The median expert or superforecaster participant reported spending nine hours completing the two surveys and wrote around 4,200 words to explain their forecasts.

Key results

Forecasts of nuclear catastrophe by 2045

We asked participants to estimate the probability that, before 2045, one or more incidents involving nuclear weapons will cause the death of at least 10 million people. The median expert forecast was 5% and the median superforecaster response was 1%. The median forecast from a sample of the US public was 10% (see Figure 1). Respondents thought that a nuclear conflict between Russia and NATO was the adversarial domain most likely to be the cause of a nuclear catastrophe of this scale; however, risk was dispersed relatively evenly among all adversarial domains we asked about, which also included China and the USA, the Korean Peninsula, India and Pakistan, and Israel and Iran.

Participants who were more concerned about nuclear risk often mentioned ongoing military conflicts between nuclear powers, the proliferation of nuclear weapons, the development of new military technologies, and the weakening of international arms control agreements. Participants less concerned about nuclear risk emphasized the long-standing effectiveness of deterrence, improvements in safety mechanisms, and the assumption that most nuclear states will act rationally.

Figure 1: Distribution of forecasts of the probability of nuclear catastrophe by 2045. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.

Factors influencing risk

Participants saw conflict between adversarial countries and new actors acquiring nuclear weapons as the major drivers of nuclear risk. We discuss a large set of events in this study, but to summarize three key events:

  1. The median participant in this study reported that violent conflict between Russia and NATO would triple the risk of a nuclear catastrophe. Both experts and superforecasters reported a low probability of this event occurring: median forecasts of 5% and 1.8%, respectively;
  2. Participants were more concerned about the likelihood of a Chinese invasion of Taiwan, with the median expert estimating a 25% chance of this occurring by 2030. The median expert reported that this event would roughly double their forecast of the risk of nuclear catastrophe;
  3. The median expert also put a 25% chance of Iran acquiring nuclear weapons by 2030. Should this event occur, their forecast of the risk of catastrophe would increase by 50%.

Many of the events we asked about did not influence the median respondent’s forecast of the risk of nuclear catastrophe. These events include: summits between adversarial countries, a nuclear weapons test by North Korea, ballistic missile submarines becoming more detectable, and an accidental non-test detonation of a nuclear weapon.

Effects of policies

We asked participants about their beliefs on the effectiveness of several policy options aimed at reducing the risk of a nuclear catastrophe. The distributions of relative risk scores for the six policies we included are shown in Figure 2. We also asked participants to rank the policies by how much they would like to see each policy implemented and how much they would support funding aimed at implementing the policies.

Figure 2: Violin plots showing distribution of relative risk associated with each policy, and all six of these policies implemented together. The relative risk is the relative change in probability of nuclear catastrophe conditional on policy implementation. The group median is shown in text. The thicker bar within each violin shows the interquartile range (25th to 75th percentile forecasts), and the thin line shows the range of forecasts minus outliers.1

Two policies emerged as clear favorites: a crisis communications network2 and nuclear-armed states implementing failsafe reviews.3 The median expert thought that a crisis communications network would reduce the risk of a nuclear catastrophe by 25%, and failsafe reviews would reduce it by 20%. The superforecasters were less optimistic about the effects of the policies, with median relative risk reductions of 15% and 10%, respectively. We also asked experts to say how their forecast of nuclear catastrophe by 2045 would change if all six policies we described were implemented. The median expert thought that the combined bundle of policies would halve the risk of a nuclear catastrophe and the median superforecaster thought it would reduce risk by 42%.
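
To make the relative-risk language concrete, the sketch below converts a stated percentage reduction into an absolute probability. It is an illustration only: the function name is ours, and pairing a group's median reduction with that group's median baseline forecast is a simplifying assumption, since the median of individual conditional forecasts need not equal this product.

```python
def apply_relative_reduction(baseline: float, reduction: float) -> float:
    """Probability remaining after cutting `baseline` by the fraction `reduction`."""
    if not 0.0 <= baseline <= 1.0:
        raise ValueError("baseline must be a probability between 0 and 1")
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return baseline * (1.0 - reduction)


# Median figures reported in this summary; pairing them this way is an assumption.
expert_with_bundle = apply_relative_reduction(0.05, 0.50)
superforecaster_with_bundle = apply_relative_reduction(0.01, 0.42)

print(f"Experts: 5.00% -> {expert_with_bundle:.2%}")
print(f"Superforecasters: 1.00% -> {superforecaster_with_bundle:.2%}")
```

Under these assumptions, the bundle of six policies would correspond to a risk of roughly 2.5% for the median expert and roughly 0.6% for the median superforecaster.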

Probability of policy implementation

We asked participants to forecast the probability that each policy would be implemented (on a time frame consistent with a decision to implement the policy within the next three years). The median expert forecasted a 15% probability of implementation for the crisis communications network and a 15% probability of implementation for the failsafe reviews policy. Superforecasters estimated probabilities of 10% and 7%, respectively. Both groups thought that funding could improve the probability of policy implementation. Conditional on $500 million of funding being put towards the goal of having the policy implemented, the median expert forecasted a 25% chance that a crisis communications network would be established and a 30% chance of the failsafe reviews policy being implemented. The median superforecaster’s forecast rose to 18% and 10%, respectively.

Limitations and next steps

While this study is the largest and most comprehensive survey of nuclear experts’ beliefs about nuclear risk, it has some important limitations that we hope to address in follow-up work. First, most participants were based either in the USA or Western Europe. The number of expert participants from South Asia was similar to the number from the USA, but there were very few participants from East Asia, Eastern Europe, or the Middle East. Second, although this was the largest survey of its kind, the sample size was still relatively small, limiting the statistical inferences we can make from the results. Third, the lists of policies we included may not have represented the full range of viewpoints on reducing nuclear weapons risk.

Despite these limitations, this study provides important insight into views on the risk of nuclear war. It clarifies experts’ views on how the world could best mitigate those risks and maintain nuclear peace until and beyond the centennial anniversary of the first and so far only use of nuclear weapons in warfare. We believe there is a role for quantitative forecasts to build upon this work, improving our understanding of beliefs about nuclear weapons use and how the associated risks change over time.

1. Background

Since 1945, the world has lived under the threat of nuclear weapons. So far, we have managed to avoid the disaster that would be nuclear war. Have we been lucky? Or was the probability of nuclear war always low? How confident should we be that humanity will make it to the hundred-year anniversary of the bombings of Hiroshima and Nagasaki without any further nuclear catastrophes?

These questions are difficult to answer. However, understanding views on the magnitude and nature of this risk can inform decisions about how to best prioritize resources to improve humanity’s prospects.

The goal of this study was to understand current views about the potential causes of a nuclear catastrophe. We systematically collected the views of subject matter experts, highly accurate forecasters (“superforecasters”),4 and members of the public on the threat of nuclear weapons. Specifically, we asked participants to forecast the probability that we will survive a century without another nuclear catastrophe, and the effects of events and policies that may alter this outlook. The result is the largest survey ever conducted of policy experts’ views on the magnitude of risks posed by nuclear weapons. A total of 110 experts collectively spent over 1,300 hours and wrote 520,000 words developing forecasts and explaining their reasoning.

1.1 Previous forecasts of nuclear weapons risks

While our study is the largest survey collecting forecasts from nuclear weapons policy experts, it is not the first. Over the past three decades, there have been at least four other surveys asking subject matter experts to make predictions about nuclear weapons risk:

  • As described in Expert Political Judgment (Tetlock 2005), 1988 and 1997 surveys asked non-proliferation experts and non-experts5 about the probability of nuclear war in 10, 25, 50, and 100 years.6 Among experts, the median participant forecasted a 29% chance of nuclear war occurring in the next 50 years, while the median non-expert forecasted a 30% probability.
  • In 2005, the Lugar Survey asked 79 non-proliferation and national security experts to predict the probability of a nuclear attack in the next five or 10 years.7 The median forecast was 10% for the next five years and 20% for the next 10 years.
  • A decade later, in 2015, a study by the Project for Study of the 21st Century (PS21) polled 50 national security experts on the probability of a major nuclear conflict in the next 25 years that causes more fatalities than World War II.8 The median forecast was 5%.
  • More recently, in 2022, Karger et al. ran a project called the Existential Risk Persuasion Tournament (XPT), where people with expertise in catastrophic risks (including nuclear weapons risk) and superforecasters forecasted the risk of various catastrophes and related events.9 This included a question on the probability that nuclear weapons reduce the human population by at least 10% by 2030, 2050, and 2100. The median forecast for this outcome by 2030 was 1% for experts and 0.5% for superforecasters.10

Public surveys conducted in the USA, Russia, and UK show how concerned citizens are about nuclear risk. According to online surveys by Statista and YouGov, the proportion of US adults who saw a nuclear war as “very likely” or “fairly likely” within the next 10 years nearly doubled from February 28, 2022,11 to February 2024, rising from 34% to 67%.12 A survey conducted in 2023 asked Russian citizens whether they thought there is a threat of military conflict involving nuclear weapons in the world today, to which 71% answered “there is” and 20% answered “there is not.”13 Results from these and other relevant surveys are provided in Appendix 1.

| Event | Probability (median) | Annualized | Estimator info/Estimator number | Date/Retrieval date [14] | Source |
| --- | --- | --- | --- | --- | --- |
| Nuclear weapons will be used in combat by 2047 [15] | 29% | 0.68% | 11 nuclear experts | 1997 | Expert Political Judgment studies |
| Nuclear weapons will be used in combat by 2047 [15] | 30% | 0.71% | 23 non-experts | 1997 | Expert Political Judgment studies |
| Nuclear attack occurs in the next 10 years | 20% | 2.2% | 79 non-proliferation and national security experts | 2005 | The Lugar Survey |
| Major nuclear conflict causes more fatalities than WWII (~80,000,000) by 2030 | 6.8% | 0.45% | 50 national security experts | 2015 | PS21 Great Power Conflict Report |
| Nuclear weapons reduce the human population by at least 10% by the end of (2030, 2050, 2100) | 1% (2030), 3.4% (2050), 8% (2100) | 0.11% (2030), 0.12% (2050), 0.11% (2100) | 12 domain experts | Jun – Oct 2022 | Existential Risk Persuasion Tournament (XPT) |
| Nuclear weapons reduce the human population by at least 10% by the end of (2030, 2050, 2100) | 0.5% (2030), 1.825% (2050), 4% (2100) | 0.056% (2030), 0.063% (2050), 0.051% (2100) | 88 superforecasters | Jun – Oct 2022 | Existential Risk Persuasion Tournament (XPT) |
Table 1: Existing surveys of experts eliciting forecasts of nuclear risk.
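
The report does not state how the Annualized column in Table 1 was computed. The figures for the first three surveys are consistent with assuming a constant annual probability over the forecast window, so the sketch below uses that conversion. Both the constant-risk assumption and the function name are ours, not the report's.

```python
def annualize(cumulative: float, years: float) -> float:
    """Constant annual probability implied by a cumulative probability over `years`.

    Solves 1 - (1 - p_annual) ** years == cumulative for p_annual.
    """
    if not 0.0 <= cumulative < 1.0:
        raise ValueError("cumulative must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - (1.0 - cumulative) ** (1.0 / years)


# Cumulative forecasts and horizons taken from Table 1 (1997 -> 2047 is 50 years).
epj_experts = annualize(0.29, 50)
epj_non_experts = annualize(0.30, 50)
lugar = annualize(0.20, 10)

for label, value in [
    ("EPJ experts", epj_experts),
    ("EPJ non-experts", epj_non_experts),
    ("Lugar Survey", lugar),
]:
    print(f"{label}: {value:.2%} per year")
```

Rounded, these come to about 0.68%, 0.71%, and 2.2% per year, matching the corresponding table entries. Horizons that do not begin and end on whole years, such as the XPT forecasts elicited in mid-2022, would need a fractional `years` value.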

1.2 Why develop quantitative forecasts?

Although these studies provide some evidence that nuclear experts have engaged with quantitative predictions, the community has been cautious in assigning probabilities to unlikely but catastrophic events. This is understandable given the high levels of uncertainty and the lack of historical precedents for using nuclear weapons since 1945. However, we believe that efforts to quantify these uncertain risks can serve important functions.

First, quantification can enable better understanding of viewpoints on a topic. This partly comes from providing clarity in expression. Famously, the US Joint Chiefs of Staff assessed the probability of success in the 1961 Bay of Pigs invasion at 30%. When advising President Kennedy, they communicated this as “a fair chance” of success. It was later reported that the president assumed this indicated favorable odds of success.16 More recently, a 2018 survey found wide variation in how people interpret qualitative probability terminology. For example, the quantitative probability readers assigned to the term “a real possibility” ranged from 20% to 80%.17 If nothing else, quantification ensures that people are speaking the same language when they share their views.

Developing more precise forecasts helps clarify areas of agreement, disagreement, and uncertainty, and assists in the comparison of potential threats and potential courses of action. As Bertrand Russell put it, being precise helps us realize and identify what is vague.18 The process of developing quantitative forecasts can prompt a person to think more carefully through their mental models of the world and critically analyze their assumptions. Some empirical research also suggests that greater precision can result in more accurate forecasts. A 2018 study found that precise numeric forecasts became less accurate when they were coarsened into forecast ranges that were more akin to qualitative statements of probability.19

A potential drawback of quantification is that it can imply greater certainty than is warranted, as most people associate numbers with more concrete predictions. Therefore, when communicating numeric forecasts, experts should clarify their degree of confidence in the results and how they developed the estimates. However, without clear metrics, discussions around nuclear risks can become clouded by ambiguity or shaped by dominant perspectives. Assigning probabilities not only enhances clarity but also enables policymakers to prioritize effectively, allowing them to focus on the most urgent threats and align actions with evidence-based insights. Translating abstract concerns into measurable probabilities makes informed, rational prioritization easier amid a chaotic and noisy political environment.

1.3 Judgmental forecasting

Assessing the probability of highly uncertain events, like the use of nuclear weapons, is challenging. Computational models that are used for prediction in other domains (such as climate change and epidemiology) cannot as readily capture the geopolitical and human factors that impact nuclear risk. Given these limitations in traditional forecasting techniques, judgmental forecasting emerges as a possible alternative.

Judgmental forecasting, which relies on individuals making considered predictions, has shown promise at producing more accurate forecasts in domains where other methods have failed. Aggregating predictions from the forecasters with the best track records has been effective in accurately predicting complex geopolitical events, economic trends, and technological developments that have eluded traditional forecasting models.20 This approach’s effectiveness was demonstrated in the Intelligence Advanced Research Projects Activity’s (IARPA) Aggregative Contingent Estimation (ACE) program from 2011 to 2015. In this series of geopolitical forecasting tournaments, the Good Judgment Project used judgmental forecasting techniques to consistently outperform competitors in predicting complex events ranging from pandemics to political leadership changes.21

That said, the application of judgmental forecasting to catastrophic risks suffers from two salient limitations. First, empirical evidence for sustained accuracy over extended time horizons is limited. Much of the evidence for the ability of select groups of forecasters to consistently outperform chance comes from forecasts with time horizons of one to six months.22 The longer-term forecasting that has been empirically studied has generally involved questions that are easier to predict due to large amounts of relevant data, such as forecasting medium-term GDP or defense spending. Second, previous efforts to forecast catastrophic events have resulted in a wide range of predictions, including some that differ by several orders of magnitude, underscoring the substantial uncertainty inherent in making these sorts of predictions.23

Despite these limitations, we believe that judgmental forecasting may be a useful tool in assessing the probability of highly important, but highly uncertain events.

2. Methods

The overarching aim of this project was to characterize views on the probability of a large-scale nuclear weapons disaster in the coming decades. We focused on understanding views about the likelihood that humanity makes it through a full century of nuclear weapons without repeat use of such weaponry. Our primary question was:

What is the probability that by 2045, one or more incidents involving nuclear weapons will cause the death of more than 10 million humans, within a 5-year time period?

Throughout this report, we use the term nuclear catastrophe to refer to the outcome specified in this question: one or more incidents involving nuclear weapons causing the death of more than 10 million humans within a five-year period.

We chose to focus this study on a very large-scale nuclear event for two main reasons. First, what makes nuclear weapons, of all deadly weapons, particularly horrific is their potential to cause death and destruction on a massive scale within minutes. All weapons can cause harm, but nuclear weapons are perhaps unique in their ability to cause such a catastrophic event so quickly. We wanted to focus analysis on this feature of nuclear weapons, which is a key reason why so much attention is given to nuclear weapons, relative to other weapons. Second, it seems there is a relative lack of discussion of how to prevent worst-case scenarios for nuclear weapons, compared to how to prevent any use of nuclear weapons.24 It seems possible that the strategies and interventions that are most effective at reducing the risk of any nuclear weapon use may be different from those aimed at reducing the risk of large-scale harms from nuclear weapons (although there is likely overlap). We therefore sought to address this relatively neglected aspect of the risk of nuclear weapons.

This project had three main components:

  • Interviews with a small number of highly experienced nuclear weapons policy experts
  • A survey asking about risk pathways for a nuclear catastrophe
  • A survey asking about policies that aim to mitigate the risk of a nuclear catastrophe

The interviews focused on identifying potential ideologically charged cruxes—that is, questions whose answers would influence views on the risk of nuclear catastrophe and where there is likely to be disagreement among experts. We used the results to develop the surveys. For a detailed discussion of the interviews and the process of developing the surveys, please see Appendix 2.

2.1 Survey content

2.1.1 Survey 1

The first survey focused on risk pathways to a nuclear catastrophe. It also asked for forecasts of the probability of nuclear catastrophe by 2045, participants’ beliefs about the probability of five adversarial domains causing a nuclear catastrophe, if one were to occur, and participants’ beliefs about four aspects of nuclear weapons policy: the strength of nuclear deterrence, the likelihood of nuclear escalation following a first strike, the merits of aiming for total disarmament, and the proliferation risk of nuclear energy.

The main body of the survey consisted of forecasting questions about the probability of certain events happening by 2030. These events were intended to capture ideologically charged cruxes: events with the potential to sway clashing camps’ forecasts of nuclear war. Examples of questions are shown in Box 1, and the full list of questions is provided in Appendix 3. Each question description linked to an information sheet, which in turn linked to information that we thought forecasters would likely seek out to inform their forecasts.

Box 1: Examples of 2030 Crux Questions

  • What is the probability of a [x] non-test detonation of a nuclear weapon occurring before the 1st of January, 2030?
    • a) Accidental
    • b) Inadvertent
    • c) Deliberate
  • What is the probability that [x] conducts a nuclear weapons test or comes into possession of nuclear weapons before the 1st of January 2030?
    • a) Iran
    • b) Any state other than Iran, that is not currently believed to have nuclear weapons
    • c) A non-state actor
  • What is the probability that, by the 1st of January 2030, the US will have formally announced its intention to withdraw from NATO?
  • What is the probability that, by January 1st 2030, there will have been more than 500 deaths in militarized conflict between [adversarial domain] in one calendar year?

Some of the questions were general, but some related to specific countries or adversarial domains. Participants were asked to choose one of four adversarial domains according to their expertise.

The four domains were:

  • China and the USA
  • India and Pakistan
  • Korean Peninsula
  • Russia and NATO

Participants were then randomly assigned a second domain. Due to the high number of participants choosing the Russia and NATO domain, we altered the survey settings soon after the survey began so that this domain would not be randomly assigned. Every participant answered questions on 14 general topics, plus questions on one additional topic per domain. The exact number of questions answered depended on chosen and allocated domains.

For each question, participants gave a forecast of the probability that the question would resolve positively (i.e. that the event would occur), and then provided a forecast of the probability of a nuclear weapons catastrophe conditional on the question resolving positively, and the probability conditional on the question resolving negatively.
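
The three forecasts collected for each question can be combined in simple ways. The sketch below shows two: the unconditional probability of catastrophe implied by the law of total probability, and the ratio between the two conditional forecasts. The function names and the input numbers are hypothetical, and the report does not say that it computed either quantity in exactly this way.

```python
def implied_unconditional(p_event: float,
                          p_cat_given_event: float,
                          p_cat_given_no_event: float) -> float:
    """Law of total probability: P(C) = P(E) * P(C|E) + (1 - P(E)) * P(C|not E)."""
    return p_event * p_cat_given_event + (1.0 - p_event) * p_cat_given_no_event


def conditional_ratio(p_cat_given_event: float,
                      p_cat_given_no_event: float) -> float:
    """How many times higher the forecast is if the event occurs than if it does not."""
    if p_cat_given_no_event <= 0.0:
        raise ValueError("p_cat_given_no_event must be positive")
    return p_cat_given_event / p_cat_given_no_event


# Hypothetical responses from one participant to one crux question.
p_event = 0.25               # probability the event occurs by 2030
p_cat_given_event = 0.10     # probability of catastrophe by 2045 if it occurs
p_cat_given_no_event = 0.04  # probability of catastrophe by 2045 if it does not

implied = implied_unconditional(p_event, p_cat_given_event, p_cat_given_no_event)
ratio = conditional_ratio(p_cat_given_event, p_cat_given_no_event)

print(f"Implied unconditional forecast: {implied:.2%}")
print(f"Ratio of conditional forecasts: {ratio:.1f}x")
```

Comparing the implied value with the participant's directly stated forecast of nuclear catastrophe by 2045 is one way to check a set of responses for internal consistency.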

2.1.2 Survey 2

The second survey focused on policy responses to nuclear weapons risk. Using the results from interviews, the first survey, and policy suggestions published by organizations working on nuclear issues,25 we developed a list of potential policies to ask about. Out of this list, policies for the second survey were selected with input from analysts from the Open Nuclear Network and other external advisors. Policy selection was based on the policies’ potential to influence nuclear catastrophic risk, their interest to the nuclear weapons policy community, their practicability, and likelihood of implementation.

As with the first survey, participants were asked to choose an adversarial domain according to their expertise. Participants were not allocated an additional domain for the second survey. We investigated views on 23 different policies, including six general policies (i.e., not specific to any adversarial domain) and some domain-specific policies. For most domains, we included three domain-specific policies. As we anticipated a greater number of respondents electing to answer questions on the Russia and NATO domain, we included eight policies specific to this domain, although each participant only provided answers for three of these eight policies. These policies are listed briefly in Box 2 and described fully in Appendix 4.

Box 2: Policies included in Survey 2

  • General (answered by all participants)
    • All nuclear-armed states sign and ratify the Comprehensive Test Ban Treaty
    • All nuclear-armed states conduct a failsafe review
    • A secure multilateral crisis communications network is established with all nuclear-armed states participating
    • A Fissile Material Cut-off Treaty is signed by all of the P5 countries and India and Pakistan
    • The USA removes the President of the United States’ sole authority to authorize the use of nuclear weapons
    • P5 states 1) jointly develop a risk assessment framework for the use of AI models in nuclear command, control and communication systems, and 2) agree to a moratorium on the use of high-risk AI models in NC3 systems
  • China and the USA domain
    • The USA implements a no-first-use policy
    • The USA and China sign a missile launch notification agreement
    • China and the USA establish regular, high-level nuclear dialogue
  • India and Pakistan domain
    • India and Pakistan formalize their low-alert status and agree to maintain their ground-based nuclear weapons in a de-mated state
    • India and Pakistan establish a mechanism to conduct regular exchanges of information on nuclear and military matters
    • An India-Pakistan Nuclear Risk Reduction Center is established
  • Korean Peninsula domain
    • The USA declares that it will not conduct left of launch attacks on North Korean nuclear command, control and communications systems
    • The United States establishes a liaison office in Pyongyang, North Korea, to facilitate communication, diplomacy, and engagement with the North Korean government
    • The USA and North Korea establish Track 1.5 diplomacy to facilitate regular dialogue and cooperation
  • Russia and NATO domain
    • Russia and the USA sign an arms control treaty succeeding New START
    • Russia and the USA agree on limits or bans for intermediate-range missiles
    • The USA eliminates its launch-on-warning posture
    • Russia eliminates its launch-on-warning posture
    • The USA decreases the role of nuclear weapons with a yield of less than 50 kt in its nuclear posture
    • Russia decreases the role of nuclear weapons with a yield of less than 50 kt in its nuclear posture
    • The USA increases the role of nuclear weapons with a yield of less than 50 kt in its nuclear posture
    • Russia increases the role of nuclear weapons with a yield of less than 50 kt in its nuclear posture

We asked participants to say whether (and how) the implementation of these policies would influence their forecast of the probability of nuclear catastrophe. We asked participants to only consider the causal effects of the policy, rather than what such a policy being implemented would say about the state of the world. For example, someone might reduce their forecast of nuclear catastrophe for two reasons if they knew an arms control agreement between the USA and Russia would be implemented: i) the limitations of the agreement might reduce the number of weapons available to use, or ii) the fact that the USA and Russia reached an agreement might signal improved relations between the countries. We asked participants to only include the first consideration, not the second.

In addition to forecasting the effect of the policies on the probability of nuclear catastrophe by 2045, we also asked participants to forecast the probability that, within the next three years, action would be taken to implement the policy. We asked participants to give an unconditional forecast, and to give their forecast of this probability conditional on a nonprofit team being given $500 million dedicated to getting the policy implemented.

Participants were asked to rank the policies in two ways: by how much they would like each policy to be implemented, and by how much they would like $500 million of funding to go toward efforts to implement it. We also asked about effects of the policies other than their effect on the probability of a nuclear catastrophe, both positive and negative.

2.1.3 Reciprocal scoring

Because most of the questions in this study will not resolve for many years, we included some questions that give an earlier indication of participant accuracy. These included questions that will resolve in 2026 (see Appendix 3) and questions that asked participants to predict the forecasts of other participants. Specifically, we asked all participants to predict what the median expert in the study would forecast for the probability of nuclear catastrophe by 2045, and to predict the median expert’s forecast on five of the crux questions that will resolve in 2030.26 We used these results to generate “reciprocal scores” for each participant. Forecasts elicited this way can be as accurate as forecasts incentivized using comparisons to the truth.27
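As a rough illustration of the idea, the sketch below scores each participant by how close their prediction of the median expert forecast comes to the realized median. The absolute-error rule, the function name `reciprocal_error`, and all numbers are assumptions made for this example; the scoring rule actually used in the study is not described in this section.

```python
from statistics import median


def reciprocal_error(predicted_median, group_forecasts):
    """Absolute gap between a prediction and the realized group median.

    Assumption: absolute error is used purely for illustration; it is not
    necessarily the scoring rule used in the study.
    """
    return abs(predicted_median - median(group_forecasts))


# Hypothetical expert forecasts (percent) of nuclear catastrophe by 2045.
expert_forecasts = [0.5, 1.0, 3.0, 5.0, 8.0, 20.0, 40.0]

# Hypothetical predictions of the median expert forecast.
predictions = {"participant_a": 4.0, "participant_b": 12.0}

# Lower error means the participant anticipated the group more accurately.
scores = {name: reciprocal_error(p, expert_forecasts) for name, p in predictions.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

A score of this kind can be computed as soon as the survey closes, which is what makes it useful when the underlying question will not resolve until 2045.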

2.2 Recruitment

The two main participant groups for the survey were people with expertise relevant to nuclear weapons policy (we use the term experts for this group), and people with a strong track record of accurate forecasting.

We recruited subject matter experts through three channels: advertisement via relevant professional organizations,28 review of staff pages of websites of relevant organizations and author lists of relevant reports,29 and snowball sampling that asked prospective participants to nominate other people who may be appropriate participants. In an effort to capture viewpoint diversity, we asked people to nominate two people who they thought would largely agree with their views on nuclear risks and two people who they thought would largely disagree with them.

Ultimately, we emailed 514 subject matter experts (who we thought might meet our requirements) directly about the survey, and likely reached more through the general approaches to advertisement described above.

To ensure that our sample of subject matter experts reflected the population that would generally be considered “expert,” we required expert participants to have a minimum of five years of experience relevant to nuclear weapons policy (or two years and a relevant graduate degree). To ensure that our subject matter experts met this bar, we invited interested experts to register their interest in participating through completing a form that asked for details of their relevant education and professional experience. 239 people registered their interest in the study. Of these, 171 met the required level of experience and were invited to complete the surveys. 110 experts completed the first survey and all but one of these completed the second.

We recruited accurate forecasters by directly inviting so-called “superforecasters”: people who have been shown to be highly accurate forecasters and who outperformed experts and intelligence analysts in large-scale forecasting tournaments held by the Good Judgment Project and in subsequent forecasting exercises run by Good Judgment, Inc.30 A total of 55 superforecasters initially expressed interest in the study and were invited to participate. Of these, 41 completed the first survey, and 39 of these completed the second.

We also recruited members of the public to complete a shortened version of the surveys. These were previous participants in studies run by the Forecasting Research Institute, recruited via Facebook ads targeting people interested in global news, geopolitics, and other topics. We report some key findings from this public survey for the purpose of comparison with the responses from the expert and superforecaster participants. However, this report focuses on the results of the expert and superforecaster surveys.

2.3 Participant compensation

We provided an honorarium of $250 to expert and superforecaster participants who completed both surveys to compensate them for their time. Participants who chose to spend more time on the surveys were paid an additional $50/hour for self-reported hours of work above five hours, up to a maximum of 10 additional hours. Participants who spent more than 10 hours on the first survey were awarded an additional $100 for completing the second survey. We incentivized high quality engagement in the survey by offering additional monetary prizes, which will be awarded on the basis of the quality of text rationales given for forecasts and accuracy on some forecasting questions. Prior to the awarding of these additional prizes, participants received an average of $525 to compensate them for the time spent completing the two surveys.

For the shortened survey of the public, we recruited people who had been participants in previous studies run by the Forecasting Research Institute by email invitation. Members of the public received a payment of $20 for each of the two surveys they completed.

2.4 Survey engagement

Participants completed Survey 1 between March and May 2024 and Survey 2 between June and August 2024. There was generally a high level of engagement with the surveys. The median expert or superforecaster participant reported spending nine hours completing the two surveys. Superforecasters spent a longer time on the surveys than did experts. The median superforecaster spent approximately 13 hours on the surveys and the median expert spent around 7.5 hours. Superforecasters also wrote more words in their rationales (a median of roughly 4,900, compared to an expert median of roughly 3,800). Collectively, expert and superforecaster participants wrote over 747,000 words in their rationales.

3. Participants

Key points

  • 151 participants (110 experts and 41 superforecasters) completed the first survey and 148 (109 experts and 39 superforecasters) completed the second.
  • Experts largely worked in think tanks and academia, and had a median of nine years of relevant experience.
  • Participants were from 37 different countries, although around a quarter of experts and half of superforecasters were born in the USA.
  • 36% of experts think that nuclear deterrence is robust, 33% think it is fragile, and 30% think it is fragile at present but could be robust in the future.
  • 56% of experts think that nuclear escalation is very likely following a nuclear first strike, 16% think that escalation can be prevented, and 27% are very uncertain about whether escalation would occur.

A total of 151 participants completed the full first survey. This included 110 expert participants and 41 superforecaster participants. Of these, three participants (one expert and two superforecasters) did not complete the second survey, so a total of 148 participants completed both surveys (109 experts and 39 superforecasters). Here we present details of the 151 participants who completed at least the first survey.

3.1 Demographics

3.1.1 Age and gender

The age and gender breakdowns of both participant groups are shown in Figure 3. The majority of participants identified as male, with 69% of expert participants and 93% of superforecasters identifying as male.31 Proportionally, the expert group was younger than the superforecasters, with 20–34 years old being the most common age category for experts, accounting for 40% of expert participants. The most common age category for superforecasters was 45–54 years old, which accounted for 29% of these participants.

Figure 3: Age and gender distribution of expert and superforecaster participants.

3.1.2 Geographic region

The USA was the most common country of birth of participants. This was particularly true for the superforecaster group, 49% of whom were born in the USA. The USA was still the most common country of birth for experts, but it accounted for only 25% of the participants. The next most common country of birth was Pakistan, where 15% of expert participants were born. Figure 4 shows the most common countries of birth for expert participants. For more detail on the country of origin and country of residence of participants, please see Appendix 5.

Figure 4: Count of experts born in the most common countries of birth for expert participants.

3.2 Expertise

Expert participants were required to have a minimum of five years of experience working in a field relevant to nuclear weapons policy, or to have two years of relevant work experience as well as a relevant graduate degree (master’s or doctorate). The distribution of years of experience is shown in Figure 5.

Figure 5: Distribution of years of experience of experts.

We asked experts to list the organizations they were affiliated with. We then classified these organizations into several types. Figure 6 shows the distribution of these organizational affiliations. Academic institutions and think tanks were the most common type of affiliation. Many experts were affiliated with more than one type of organization.

We also asked experts about postgraduate education relevant to nuclear weapons policy. The majority of expert participants (86%) had a relevant postgraduate degree (master’s or PhD). The most common field of study was international relations, with 36 experts holding at least one graduate degree in this field. This was followed by security studies and political science. Many experts combined two or more of these fields of study.

Figure 6: Type of organizational affiliations and fields of study of experts.

3.3 Beliefs about contentious issues

We asked participants about four issues in order to capture important ideological differences about nuclear weapons policy:

  • The fragility / robustness of nuclear deterrence
  • The likelihood that a nuclear strike would be met with nuclear retaliation
  • The proliferation risk posed by nuclear energy
  • The desirability of complete nuclear disarmament

For each of these issues we asked participants to rank three statements representing different viewpoints on the issue. Two of these statements were intended to represent two opposing views and one was intended as a “middle ground” between the opposing views. As an example, the statements for the “nuclear deterrence” issue are shown in Box 3, and the full list of statements is available in Appendix 6.

Box 3: Statements for assessing views on nuclear deterrence

  • Opposing view 1: Nuclear deterrence is inherently fragile (easily shattered by human irrationality and chance events—so not a reliable safeguard against nuclear war).
  • Opposing view 2: Nuclear deterrence can be robust with clear communications, tight command and control, and mutually assured destruction.
  • Middle-ground view: Nuclear deterrence could be effective, but the current state of global communication, command and control systems, and weapon deployment are easily fallible, and so deterrence is not a safe system at present.

Figure 7 shows the distribution of experts and superforecasters who selected each of the statements as most representative of their views. There were no statistically significant differences between the proportions of the two groups choosing each statement (see Appendix 5 for details).

Figure 7: Proportion of respondents selecting each statement as the closest match (of the three) to their own view on four nuclear weapons policy issues: nuclear deterrence, nuclear escalation risk, the goal of total disarmament, and the proliferation risk of nuclear energy programs. See Appendix 6 for full issue statements.

4. Forecasts of nuclear catastrophe risk

Key points

  • The median expert forecast for the probability of a nuclear weapons incident killing more than 10 million people before 2045 was 5%. The median superforecaster’s forecast was 1%, and the median member of the public’s forecast was 10%.
  • Conditional on a nuclear weapons catastrophe occurring by 2045, on average experts forecast a 26% probability that Russia and NATO would be the cause, roughly 20% for both the Korean Peninsula and India and Pakistan, and roughly 13% for both Israel and Iran, and China and the USA.
  • Violent conflict and new actors acquiring nuclear weapons were the events associated with the highest increase in risk.
  • Many participants thought that several of the events would not influence risk, including an accidental non-test detonation, no-first-use policies, and summits between adversarial countries, among others.

Here, we present key findings from the forecasting components of the surveys. For most questions we present the median response from the expert and superforecaster participants. This represents the mid-point of the group responses; half the group’s responses are higher than this value and half are lower than this value.32

There are many ways to aggregate forecasts, but we chose the median because it is straightforward to calculate, transparent, robust to extreme outlying observations, and easier to understand than most other methods. Reassuringly, in previous work we found that the median was never the highest or the lowest of the several aggregation methods considered.33
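The robustness point can be seen in a small sketch. The forecasts below are hypothetical, not data from the study, and the "inclusive" quartile convention is an assumption, since the text does not say which convention was used for the reported interquartile ranges.

```python
from statistics import mean, median, quantiles


def summarize(forecasts):
    """Return the median and interquartile range of forecasts given in percent."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different values.
    q1, _, q3 = quantiles(forecasts, n=4, method="inclusive")
    return {"median": median(forecasts), "iqr": (q1, q3)}


# Hypothetical forecasts in percent.
forecasts = [0.5, 1.0, 2.0, 5.0, 10.0]
# The same group with one extreme forecast added.
with_outlier = forecasts + [95.0]

print(summarize(forecasts))
print("median with outlier:", median(with_outlier))
print("mean without / with outlier:", mean(forecasts), mean(with_outlier))
```

In this example a single extreme forecast moves the median from 2% to 3.5%, while the mean jumps from under 4% to almost 19%.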

4.1 Probability of nuclear catastrophe

Participants were asked to answer the following question:

“What is the probability that by 2045, one or more incidents involving nuclear weapons will cause the death of more than 10 million humans, within a five-year time period?”

The median expert forecast was 5% (IQR: 1–18.5%) and the median superforecaster response was 1% (IQR: 0.15–2.3%). There was substantial variation in forecasts within both groups, although this was more pronounced for experts, where the standard deviation was 18.4%, compared to 5.3% for superforecasters. The median forecast from our survey of the public was 10%, with a standard deviation of 24.9%.

Table 2 and Figure 8 summarize the responses to this question from experts, superforecasters and members of the public. Figure 9 shows the distribution of responses from experts and superforecasters. Figure 10 shows the proportion of expert and superforecaster responses that fall within different ranges.

| Group | Number of respondents | Median forecast | Interquartile range (IQR) | Standard deviation (SD) |
|---|---|---|---|---|
| Experts | 110 | 5% | 1–18.5% | 18.4% |
| Superforecasters | 41 | 1% | 0.15–2.3% | 5.3% |
| The public | 401 | 10% | 1–35% | 24.9% |

Table 2: Summary of forecasts on the probability of nuclear catastrophe from experts, superforecasters, and the public.
Figure 8: Plots show forecasts of the probability of nuclear catastrophe by 2045. The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.
Figure 9: Density plot showing the distribution of forecasts of the probability of nuclear catastrophe for expert and superforecaster participants. The dashed line shows the median forecast for each group. The x-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.
Figure 10: Plot shows the proportion of respondents whose forecasts of the probability of nuclear catastrophe by 2045 fall into each of the ranges.

Participants were also asked to provide a rationale for their forecasts. Respondents who forecasted a higher probability for nuclear catastrophe pointed to increased tensions and ongoing conflicts between nuclear powers, especially between Russia and NATO, China and the USA, and India and Pakistan. Many suggested that nuclear weapons proliferation, new military technologies, and weakening of international arms control agreements heighten risk. These rationales also expressed concerns that disinformation and misunderstanding could lead to escalation.

Rationales for lower probability estimates emphasized that there has been no use of nuclear weapons since 1945. They also argued that the doctrine of mutually assured destruction disincentivizes using nuclear weapons even in times of conflict, and that most decision makers are rational actors who wish to avoid catastrophic outcomes. According to some participants, a death toll of 10 million would require an extensive nuclear exchange where major cities are targeted, which they deemed unlikely. They also cited improvements in safety mechanisms, which reduce the likelihood of inadvertent and accidental use. A more detailed summary of the arguments provided for different ranges of forecasts is provided in Appendix 7.

4.2 Risk from specific adversarial domains

Participants were asked which of the adversarial domains would be most likely to have been the primary cause of a nuclear catastrophe (that killed more than 10 million people), if such a catastrophe were to occur before 2045. They were asked to allocate probabilities among five adversarial domains—Russia and NATO, China and the USA, the Korean Peninsula, India and Pakistan, and Israel and Iran—and an “Other” category. The average probability allocated to each domain is shown in Figure 11.34

Figure 11: Plot shows the average probability placed on domains being the primary cause of a nuclear catastrophe by 2045.

We tested whether expert participants were more likely to forecast a higher probability that their chosen domain would be the primary cause of catastrophe. The results are shown in Table 3. Although there was a trend to give higher forecasts for three of the domains (all except India and Pakistan) when experts chose that domain, the only statistically significant difference was for experts choosing the Korean Peninsula domain.

| Adversarial domain | Median forecast from experts choosing domain | Median forecast from experts not choosing domain | P-value from Mann-Whitney U test* |
|---|---|---|---|
| China and the USA | 15% (n=12) | 10% (n=98) | 0.95 |
| Russia and NATO | 25% (n=56) | 20% (n=54) | 0.55 |
| India and Pakistan | 15% (n=27) | 15% (n=83) | 1 |
| Korean Peninsula | 28% (n=15) | 20% (n=95) | 0.01 |

Table 3: Views of the probability of domain being most likely cause of catastrophe disaggregated by whether experts chose the domain. *This test was performed with a Bonferroni correction.
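The text does not say whether the p-values in Table 3 are reported before or after the Bonferroni adjustment, nor which significance level was used. The sketch below checks both readings, assuming a conventional level of 0.05 and treating the four rows as the family of tests; both of those choices are assumptions for illustration.

```python
# P-values as reported in Table 3.
p_values = {
    "China and the USA": 0.95,
    "Russia and NATO": 0.55,
    "India and Pakistan": 1.0,
    "Korean Peninsula": 0.01,
}

ALPHA = 0.05  # assumption: conventional significance level, not stated in the text


def significant_if_unadjusted(p_values, alpha=ALPHA):
    """Reading 1: reported p-values are raw, so compare each to alpha / m."""
    threshold = alpha / len(p_values)
    return [domain for domain, p in p_values.items() if p < threshold]


def significant_if_adjusted(p_values, alpha=ALPHA):
    """Reading 2: reported p-values were already multiplied by m, so compare to alpha."""
    return [domain for domain, p in p_values.items() if p < alpha]


print(significant_if_unadjusted(p_values))
print(significant_if_adjusted(p_values))
```

Under either reading, only the Korean Peninsula row clears the threshold, which matches the conclusion stated in the text.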

4.3 Risk pathways

We asked participants questions about the probability of various events occurring by 2030. These questions were intended to represent potential ideological cruxes, by which we mean questions whose answer would influence participants’ assessment of the risk of nuclear catastrophe by 2045. For each question, participants were asked for their forecast on the likelihood of the event occurring and to describe how their forecast of the probability of nuclear catastrophe by 2045 would change conditional on the event occurring and conditional on the event not occurring.35 The full list of questions is available in Appendix 3.

In response to these questions, some participants gave forecasts that were incoherent. For example, if a participant’s forecast of catastrophe conditional on an event occurring is lower than their unconditional forecast of catastrophe, then their forecast of catastrophe conditional on the event not occurring cannot also be lower than their unconditional forecast of catastrophe. (Similarly, the forecast of catastrophe conditional on the event occurring and conditional on the event not occurring cannot both be higher than the unconditional forecast of catastrophe). When respondents gave forecasts that were incoherent in this way, or contradicted their written rationales, we dropped these responses from the analysis. For this reason, the number of respondents varies between the questions. Fewer than 5% of forecasts analyzed in this report were dropped from the dataset due to incoherence. While 50 of the 151 respondents had at least one of their forecasts dropped, this is not surprising given the length of the survey, which required that participants submit forecasts and rationales for a median of 9 hours. For more detail on how we managed incoherent responses, please see Appendix 8.
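The coherence requirement follows from the law of total probability: the unconditional forecast is a weighted average of the two conditional forecasts, so it must lie between them. The sketch below is one minimal way to check this, with hypothetical function names and example values; the procedure actually applied in the study is described in Appendix 8 and may differ.

```python
def is_coherent(p_uncond, p_if_event, p_if_no_event, tol=1e-9):
    """True when the unconditional forecast lies between the two conditional forecasts.

    P(C) = P(C|E) * P(E) + P(C|not E) * (1 - P(E)) is a weighted average,
    so P(C) cannot be above both conditionals or below both.
    """
    low, high = sorted((p_if_event, p_if_no_event))
    return low - tol <= p_uncond <= high + tol


def implied_event_probability(p_uncond, p_if_event, p_if_no_event):
    """Event probability that reconciles the three forecasts, or None if undefined."""
    if p_if_event == p_if_no_event:
        return None
    return (p_uncond - p_if_no_event) / (p_if_event - p_if_no_event)


# Hypothetical forecasts expressed as probabilities.
print(is_coherent(0.05, 0.20, 0.04))   # unconditional sits between the conditionals
print(is_coherent(0.05, 0.03, 0.04))   # both conditionals lower: incoherent
print(is_coherent(0.05, 0.10, 0.08))   # both conditionals higher: incoherent
print(implied_event_probability(0.05, 0.20, 0.04))
```

The second helper shows a stricter check that could be layered on top: the implied event probability can be compared with the participant's own forecast of the event occurring.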

Perhaps unsurprisingly, deliberate and inadvertent non-test nuclear weapons detonations were associated with a large increase in risk of nuclear catastrophe, as were violent conflicts between nuclear-armed states and horizontal proliferation of nuclear weapons to new actors. Here we discuss how forecasts of nuclear catastrophe by 2045 change conditional on these and other events that might influence risk.

4.3.1 Accidental, inadvertent, and deliberate non-test detonation

To understand views on different risk pathways, we asked participants how their forecast of nuclear catastrophe would change if they knew that an accidental, inadvertent, or deliberate non-test nuclear weapon detonation occurred by 2030. For these questions, we took our definition of the different types of non-test detonation from Barrett, Baum and Hostetler (2012).36 An accidental detonation is one where “system safeguards or procedures to maintain control over nuclear weapons fail in such a way that a nuclear weapon … explodes without direction from leaders.” An inadvertent detonation is one in which the attacking group “mistakenly concludes that it is under attack and launches nuclear weapons in what it believes is a counterattack.” A deliberate detonation is one in which “the attacking nation decides to attack based on accurate information about the state of affairs.” Figure 12 shows how forecasts on the probability of catastrophe change conditional on each type of detonation occurring by 2030.

Figure 12: Violin plots showing distribution of forecasts of the probability of nuclear catastrophe, unconditional and conditional on different types of non-test nuclear detonations occurring before 2030. The group median is shown in text. The thicker bar within each violin shows the interquartile range (25th to 75th percentile forecasts), and the thin line shows the range of forecasts minus outliers.

Deliberate and inadvertent non-test detonations were associated with a large increase in forecasts of catastrophe. The median expert would increase their forecast of catastrophe by four times, conditional on a deliberate non-test detonation occurring. The median superforecaster would increase theirs by 6.7 times. Conditional on an inadvertent detonation, the median expert and the median superforecaster would triple their forecast. Figures 13 to 15 show how participants’ forecasts of nuclear catastrophe change conditional on a deliberate non-test detonation occurring by 2030.
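To make the "times" figures concrete, the sketch below assumes that relative risk is each respondent's conditional forecast divided by their own unconditional forecast, with the group figure taken as the median of those individual ratios. The data are hypothetical, and dropping respondents with a zero baseline is an assumption; the text does not say how such cases were handled.

```python
from statistics import median


def relative_risk(baseline, conditional):
    """Ratio of conditional to baseline forecast; None when the baseline is zero."""
    if baseline == 0:
        return None
    return conditional / baseline


# Hypothetical (baseline, conditional) forecast pairs in percent.
pairs = [(5.0, 20.0), (1.0, 1.0), (2.0, 10.0), (10.0, 15.0), (0.0, 1.0)]

# Per-respondent ratios, skipping undefined ones (assumption: zero baselines are dropped).
ratios = [r for r in (relative_risk(b, c) for b, c in pairs) if r is not None]
median_ratio = median(ratios)

# For contrast: dividing the group medians is a different quantity.
ratio_of_medians = median(c for _, c in pairs) / median(b for b, _ in pairs)

print(median_ratio, ratio_of_medians)
```

A respondent moving from 5% to 20% has a relative risk of 4x. The median of individual ratios need not equal the ratio of the group medians, so the relative-risk figures cannot in general be recovered by dividing the median conditional forecast by the median unconditional forecast.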

Figure 13: Plot shows how individual respondents’ forecasts of nuclear catastrophe would change conditional on a deliberate non-test detonation before 2030. The blue and orange dots show the baseline forecasts, and the tips of the arrows show the forecast conditional on the event. Yellow dots indicate forecasts that would not change conditional on the event. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.
| Non-test detonation event by 2030 | Expert N* | Expert relative risk, median (IQR) | Expert probability of occurring, median (IQR) | Superforecaster N* | Superforecaster relative risk, median (IQR) | Superforecaster probability of occurring, median (IQR) |
|---|---|---|---|---|---|---|
| Deliberate | 87 | 4x (1.5x – 31.3x) | 1.0% (0.1% – 10%) | 37 | 6.7x (1.65x – 16.7x) | 0.5% (0.1% – 2%) |
| Inadvertent | 88 | 3x (1.3x – 18.5x) | 1.3% (0.1% – 10%) | 37 | 3x (1.5x – 10x) | 0.1% (0.01% – 0.5%) |
| Accidental | 87 | 1x (1x – 2x) | 1.0% (0.01% – 10%) | 37 | 1x (1x – 1.1x) | 0.05% (0.01% – 0.3%) |

Table 4: Relative risk of nuclear catastrophe conditional on different types of non-test nuclear detonations occurring before 2030 and the probability of this occurring. The median and interquartile ranges for experts and superforecasters are shown.
*N is the number of responses for relative risk. 110 experts and 39 superforecasters provided forecasts on the probability of the events occurring.
Figure 14: Plot shows the proportion of respondents who would increase, decrease, or not change their forecast of nuclear catastrophe, if they knew that different types of non-test detonations would occur by 2030.
Figure 15: Plot shows the proportion of respondents whose relative risk (relative change in risk of nuclear catastrophe) for a deliberate non-test detonation falls into each of the ranges.

Perhaps more surprisingly, participants were split on the effects of an accidental non-test detonation on the risk of nuclear catastrophe. Figures 16 and 17 show how forecasts would change if this were to occur. Around 60% of superforecasters and 49% of experts would increase their probability of a nuclear catastrophe, but roughly 15% of participants would decrease their forecast. In rationales, those who decreased their forecast suggested that an accidental nuclear detonation could serve as a wake-up call that prompts greater action to reduce nuclear weapons risk. (For more detail on arguments, see Appendix 9.)

Figure 16: Plot shows how individual respondents’ forecasts of nuclear catastrophe would change conditional on an accidental non-test detonation before 2030. The blue and orange dots show the baseline forecasts, and the tips of the arrows show the forecast conditional on the event. Yellow dots indicate forecasts that would not change conditional on the event. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.
Figure 17: Plot shows the proportion of respondents whose relative risk (relative change in risk of nuclear catastrophe) for an accidental non-test detonation falls into each of the ranges.

Most participants thought the probability of any type of non-test nuclear detonation before 2030 was quite low. The median expert forecasted a 1% chance of a deliberate non-test detonation occurring and a 1.3% chance of an inadvertent non-test detonation occurring. The median superforecaster thought these events were even less likely, forecasting 0.5% and 0.1% probabilities for deliberate and inadvertent non-test detonations occurring, respectively. Superforecasters also thought the probability of an accidental non-test detonation was very low, with a median probability of 0.05%. The median expert put the probability of an accidental detonation at 1%. Table 4 summarizes these results and views on the relative change in risk conditional on different types of detonations.

4.3.2 Key factors influencing risk: conflict and horizontal proliferation

The other events driving higher forecasts of nuclear catastrophe either involved conflict with nuclear-armed countries or the spread of nuclear weapons to new actors (horizontal proliferation).

In particular, participants believed conflict between Russia and NATO would increase risk. We asked about both conflict between Russia and the USA and conflict between Russia and a NATO member other than the USA. Conditional on conflict between Russia and the USA, the median participant of both groups would roughly triple their forecast of the risk of nuclear catastrophe. The median expert forecast a 5% probability of such conflict occurring before 2030, and the median superforecaster 1.8%. Both groups thought that conflict between Russia and a NATO member other than the USA was more likely. The median expert forecast a 10% chance of this happening before 2030, and the median superforecaster a 5.5% chance. The median expert thought this would triple the risk of nuclear catastrophe by 2045, and the median superforecaster that it would roughly double it.

Participants thought a Chinese invasion of Taiwan was more likely than other types of conflict, with the median expert forecasting a 25% probability of this happening before 2030, and the median superforecaster 19%. A Chinese invasion of Taiwan was also associated with a substantial increase in the probability of nuclear catastrophe: roughly 2.3 times for the median expert and roughly double for the median superforecaster.

The median expert also thought there was a 20% chance of violent conflict between India and Pakistan before 2030, which they thought would increase the risk by around 40%. Superforecasters were more skeptical of both the probability of this event occurring (median of 6.5%) and its importance for nuclear risk (with a median relative risk of 1, indicating no change in risk).

| Event by 2030 | Expert N* | Expert relative risk, median (IQR) | Expert probability of occurring, median (IQR) | Superforecaster N | Superforecaster relative risk, median (IQR) | Superforecaster probability of occurring, median (IQR) |
| --- | --- | --- | --- | --- | --- | --- |
| 500 militarized deaths between Russia and the USA | 50 | 3.1x (1.5–11.8x) | 5% (1–10%) | 36 | 2.8x (1.5–12.5x) | 1.8% (0.6–5%) |
| 500 militarized deaths between Russia and a different NATO country | 50 | 3x (1.3–7.5x) | 10% (2.8–32.5%) | 36 | 1.9x (1.2–3.2x) | 5.5% (1.2–15%) |
| China invades Taiwan | 36 | 2.3x (1–5.5x) | 25% (10–45%) | 11 | 1.9x (1.3–3.6x) | 19% (4–31.5%) |
| 500 militarized deaths between North Korea and USA | 48 | 1.7x (1–5x) | 4% (1–12.5%) | 15 | 2x (1.4–9x) | 2% (1–4%) |
| 500 militarized deaths between China and the USA | 36 | 1.8x (1–5x) | 10% (1.3–30%) | 11 | 2x (1.2–3.4x) | 6% (2.8–14%) |
| 500 militarized deaths between North Korea and South Korea | 49 | 1.6x (1–5x) | 8% (4.5–21.3%) | 15 | 1.4x (1.1–3x) | 3.3% (2–5.3%) |
| 500 militarized deaths between India and Pakistan | 42 | 1.4x (1–3.5x) | 20% (5–50%) | 14 | 1x (1–1.1x) | 6.5% (3.5–22.8%) |

Table 5: Relative risk of nuclear catastrophe conditional on different types of violent conflict occurring before 2030 and probability of event occurring by 2030.
*N is the number of responses for relative risk. 110 experts provided forecasts on the probability of the events occurring.
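To make the relative risk figures concrete, the following minimal sketch scales a baseline forecast by a relative risk. It is an illustration only: the function name `conditional_forecast` is hypothetical, and pairing the group median baselines from the abstract (5% and 1%) with the median relative risks in Table 5 is a simplifying assumption, since the median of individual relative risks need not equal the ratio of group medians.

```python
def conditional_forecast(baseline: float, relative_risk: float) -> float:
    """Scale a baseline probability by a relative risk, capped at 1."""
    if not 0.0 <= baseline <= 1.0:
        raise ValueError("baseline must be a probability in [0, 1]")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    return min(1.0, baseline * relative_risk)


# Illustrative inputs (an assumption, not the authors' calculation):
# median baselines from the abstract and the median relative risks for
# conflict between Russia and the USA from Table 5.
expert = conditional_forecast(0.05, 3.1)
superforecaster = conditional_forecast(0.01, 2.8)

print(f"expert: {expert:.1%}, superforecaster: {superforecaster:.1%}")
```

A relative risk of 1 leaves the baseline unchanged, which is how the median superforecaster treated conflict between India and Pakistan.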

The median expert thought that non-state actors acquiring nuclear weapons would double the probability of nuclear catastrophe. The median expert forecast a 1% probability of this occurring by 2030, and the median superforecaster a 0.3% probability. Both groups thought it more likely that Iran would acquire a nuclear weapon (with median forecasts of 25% and 30% for experts and superforecasters, respectively). The median expert’s forecast of nuclear catastrophe would increase by roughly 50% if this were to occur, and the median superforecaster’s by approximately 20%. For a more detailed discussion of the rationales participants gave for their responses to these crux questions, please see Appendix 10.

| Event by 2030 | Expert N* | Expert relative risk, median (IQR) | Expert probability of occurring, median (IQR) | Superforecaster N | Superforecaster relative risk, median (IQR) | Superforecaster probability of occurring, median (IQR) |
| --- | --- | --- | --- | --- | --- | --- |
| A non-state actor acquires nuclear weapons | 88 | 2x (1.2–10x) | 1% (0.002–5%) | 39 | 1.8x (1–5x) | 0.3% (0.1–1.4%) |
| Iran acquires nuclear weapons | 88 | 1.5x (1.1–3x) | 25% (15–50%) | 39 | 1.2x (1.1–1.5x) | 30% (10–50%) |
| Any state other than Iran acquires nuclear weapons | 89 | 1.3x (1–2.4x) | 1% (0.3–10%) | 39 | 1.2x (1.1–1.7x) | 5% (1–10%) |

Table 6: Relative risk of nuclear catastrophe conditional on different actors acquiring nuclear weapons before 2030 and probability of this occurring. The median and interquartile ranges for experts and superforecasters are shown.
*N is the number of responses for relative risk. 110 experts provided forecasts on the probability of the events occurring.

Several other cruxes (including the USA withdrawing from NATO or ROKUS, increasing entanglement of nuclear and non-nuclear forces, vertical proliferation, and states other than North Korea conducting weapons tests) were also associated with smaller increases in the risk of nuclear catastrophe. For more detail, see Appendix 11.

4.3.3 Factors that generally did not influence forecasts

Perhaps the most striking finding is that, for many of the cruxes, many participants wouldn’t change their forecast of nuclear risk if the event were to occur. As discussed earlier, these cruxes included an accidental non-test detonation. They also included whether nuclear-armed states do or do not have no-first-use policies. Figure 18 shows how participants’ forecasts of nuclear catastrophe changed conditional on the USA having a no-first-use policy by 2030. Most participants thought that this wouldn’t affect risk at all. Some thought it would reduce risk, and a smaller number thought it would increase risk.

Figure 18: Plot shows how individual respondents’ forecasts of nuclear catastrophe would change conditional on the USA having a no-first-use policy before 2030. The blue and orange dots show the baseline forecasts, and the tips of the arrows show the forecast conditional on the event. Yellow dots indicate forecasts that would not change conditional on the event. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.

Other potential cruxes that the median participant thought would not affect their forecast of catastrophe were:

  • Summits between adversarial countries
  • A terrorist attack in India that is blamed on Pakistan
  • A nuclear weapons test by North Korea
  • Ballistic missile submarines becoming more detectable
  • The US rejoining JCPOA or a similar agreement

As with the USA having a no-first-use policy, many participants did think that these cruxes would influence the risk of nuclear catastrophe, although there wasn’t consensus on the direction of the effect (see Figure 19). However, in most of these cases, a plurality of respondents thought that the event occurring would have no impact on the probability of nuclear catastrophe. Details on how participants responded to all of the cruxes are in Appendix 11.

Figure 19: Plot shows the proportion of respondents who would increase, decrease, or not change their forecast of nuclear catastrophe, if they knew that different events would occur by 2030. For each of the events in this plot, the median participant’s relative risk was one, indicating the event would not change their forecast of nuclear catastrophe.

5. Views on policies

Key points

  • Of the policies we asked about, the two most popular amongst experts and superforecasters were:
    • A secure multilateral crisis communications center
    • All nuclear-armed states conducting failsafe reviews
  • These two policies were thought to be the most effective in reducing the risk of a nuclear catastrophe, had the best average ranks, and had at least a 20% chance of being implemented, conditional on funding aimed at their implementation.
  • If six policies were to be implemented, the median expert thought that the risk of nuclear catastrophe could be halved, and the median superforecaster thought it could be reduced by 40%.

We investigated views on 23 different policies that have been suggested as mechanisms to reduce the risk of nuclear catastrophe. We included six general policies (i.e., not specific to any adversarial domain) and 17 domain-specific policies. The full description of the policies is in Appendix 4.

As different domain-specific policies were answered by different participants, and the sample size for each of these domain-specific policies was relatively small, we advise caution in interpreting the domain-specific policy results and particularly in making comparisons across the adversarial domains. For this reason, we present the results of the general policies and the domain-specific policies separately.

5.1 General policies

5.1.1 Policy impact on risk of nuclear catastrophe

The median expert thought each of these six policies would reduce the risk of a nuclear catastrophe by between 9% and 25% relative to their unconditional forecast. In general, compared to experts, superforecasters thought policies would make less of a difference to nuclear risk. The median superforecaster thought most policies would reduce risk by between 3% and 15%. Figures 20 and 21 show how the median expert and median superforecaster’s estimate of the probability of nuclear catastrophe changes when considering each of these six policies. We also asked participants how their forecast of the probability of nuclear catastrophe would change if all of these six policies were implemented. Although this scenario is very unlikely to occur, it gives an indication of how nuclear risk could change if drastic actions were taken. The median expert thought these six policies implemented together could halve the probability of nuclear catastrophe by 2045, and the median superforecaster thought they could reduce it by roughly 40%. Table 7 shows the median participants’ views on the effectiveness of the six general policies in terms of relative and absolute changes in risk.

Figure 20: Median relative change in probability of nuclear catastrophe conditional on policy implementation, for experts (blue) and superforecasters (orange). The difference between the height of the bar associated with the policy and the bar labeled “unconditional” represents relative reduction in risk of nuclear catastrophe associated with the policy by the median expert (left) and superforecaster (right). E.g., the median expert would reduce their forecast of nuclear catastrophe by 25% if a crisis communications network were to be established.
Figure 21: Violin plots showing distribution of relative risk associated with each policy. The relative risk is the relative reduction in probability of nuclear catastrophe conditional on policy implementation. The group median relative risk is shown in text. The thicker bar within each violin shows the interquartile range (25th to 75th percentile forecasts), and the thin line shows the range of forecasts minus outliers.

| Policy | Expert N* | Expert relative change in risk, median (IQR) | Expert absolute change in risk, median (IQR) | Superforecaster N | Superforecaster relative change in risk, median (IQR) | Superforecaster absolute change in risk, median (IQR) |
| --- | --- | --- | --- | --- | --- | --- |
| AI risk assessment | 94 | 0.8x (0.6–0.96x) | 0.5 p.p. (0–3 p.p.) | 37 | 0.93x (0.83–0.98x) | 0.04 p.p. (0.004–0.2 p.p.) |
| Crisis communications network | 96 | 0.75x (0.49–0.9x) | 0.85 p.p. (0.02–4.4 p.p.) | 37 | 0.85x (0.71–0.95x) | 0.1 p.p. (0.02–0.65 p.p.) |
| CTBT is ratified | 95 | 0.9x (0.67–1x) | 0.05 p.p. (0–3 p.p.) | 37 | 0.95x (0.8–1x) | 0.01 p.p. (0–0.25 p.p.) |
| Failsafe reviews | 96 | 0.8x (0.61–0.94x) | 0.5 p.p. (0–2 p.p.) | 37 | 0.9x (0.75–0.98x) | 0.1 p.p. (0.01–0.4 p.p.) |
| FMCT is signed | 95 | 0.91x (0.78–1x) | 0.01 p.p. (0–2 p.p.) | 37 | 0.97x (0.89–1x) | 0.01 p.p. (0–0.1 p.p.) |
| USA removes ‘sole authority’ | 96 | 0.9x (0.75–1x) | 0.1 p.p. (0–1 p.p.) | 37 | 0.97x (0.83–1x) | 0.02 p.p. (0–0.13 p.p.) |
| All six policies together | 96 | 0.5x (0.2–0.69x) | 2 p.p. (0.18–5 p.p.) | 37 | 0.58x (0.25–0.76x) | 0.3 p.p. (0.09–1.5 p.p.) |

Table 7: Views on effects of policies on probability of nuclear catastrophe. The median and interquartile range (25th percentile and 75th percentile estimates) are shown for the relative risk (or relative change in risk) and the absolute risk reduction (in percentage points (p.p.)).
*The values in this column show the number of experts whose data was used in the relative risk summary statistics. The number for the absolute change in risk is higher by two (as two experts gave an unconditional forecast of catastrophe of zero).
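The distinction between relative and absolute change in risk, and the reason the two summaries can have different sample sizes, can be sketched as follows. This is a minimal illustration with hypothetical respondents, not the authors' code: the function name `risk_changes` and the sample forecasts are assumptions. The relative risk divides by the baseline forecast, so it is undefined when a respondent's unconditional forecast is zero, while the absolute change is still defined.

```python
from typing import Optional, Tuple


def risk_changes(baseline: float, conditional: float) -> Tuple[Optional[float], float]:
    """Return (relative_risk, absolute_reduction_pp) for one respondent.

    Inputs are probabilities as fractions; the absolute reduction is
    returned in percentage points (p.p.).
    """
    absolute_pp = (baseline - conditional) * 100.0
    # The ratio is undefined for a zero baseline, so report None instead.
    relative = conditional / baseline if baseline > 0 else None
    return relative, absolute_pp


# Hypothetical respondents: (baseline forecast, forecast with the policy).
respondents = [(0.05, 0.04), (0.01, 0.0085), (0.0, 0.0)]
results = [risk_changes(b, c) for b, c in respondents]

n_relative = sum(1 for rel, _ in results if rel is not None)
n_absolute = len(results)
print(n_relative, n_absolute)
```

In this toy sample the respondent with a zero baseline contributes to the absolute summary but not to the relative one, which mirrors the difference in N described in the note to Table 7.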

Some respondents thought that some of the policies would increase risk. Of the general policies we asked about, the USA removing its “sole authority” policy that allows the President to launch a nuclear weapon without approval from others was most often thought to increase risk. 8% of experts and 11% of superforecasters thought this policy would increase risk. Rationales from these respondents argued that the policy would compromise the USA’s ability to act quickly in times of crisis, and would reduce the credibility of US nuclear deterrence (see Appendix 12 for more detail). Several experts (3%) believed that all six policies being implemented would increase risk, as this would be a destabilizing change; no superforecasters shared this view. Table 8 and Figure 22 show the proportion of respondents who thought that each policy would decrease, increase, or not affect risk.

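| Policy | Expert N | Expert: decrease risk | Expert: not affect risk | Expert: increase risk | Superforecaster N | Superforecaster: decrease risk | Superforecaster: not affect risk | Superforecaster: increase risk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI risk assessment | 94 | 78% | 22% | 0% | 37 | 86% | 14% | 0% |
| Crisis communications network | 96 | 89% | 11% | 0% | 37 | 95% | 5% | 0% |
| CTBT is ratified | 95 | 61% | 39% | 0% | 37 | 68% | 32% | 0% |
| Failsafe reviews | 96 | 78% | 22% | 0% | 37 | 95% | 5% | 0% |
| FMCT is signed | 95 | 58% | 42% | 0% | 37 | 65% | 35% | 0% |
| USA removes ‘sole authority’ | 96 | 64% | 28% | 8% | 37 | 62% | 27% | 11% |
| All six general policies | 96 | 93% | 4% | 3% | 37 | 100% | 0% | 0% |

Table 8: Proportion of respondents who thought the policy would decrease risk (relative risk < 1), not affect risk (relative risk = 1), and increase risk (relative risk > 1).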
Figure 22: Plot shows the proportion of respondents who would increase, decrease, or not change their forecast of nuclear catastrophe, conditional on policies being implemented.
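The three-way split reported in Table 8 follows directly from each respondent's relative risk. A minimal sketch of that classification is shown below. The function names, the tolerance parameter, and the sample values are all hypothetical and are not taken from the study's data or code.

```python
from collections import Counter
from typing import Dict, List


def classify(relative_risk: float, tol: float = 1e-9) -> str:
    """Label a relative risk as a decrease, no change, or an increase."""
    if relative_risk < 1.0 - tol:
        return "decrease"
    if relative_risk > 1.0 + tol:
        return "increase"
    return "no change"


def shares(relative_risks: List[float]) -> Dict[str, float]:
    """Share of respondents in each category."""
    counts = Counter(classify(rr) for rr in relative_risks)
    n = len(relative_risks)
    return {label: counts[label] / n for label in ("decrease", "no change", "increase")}


# Hypothetical relative risks for one policy (not study data).
sample = [0.5, 0.75, 0.9, 1.0, 1.0, 1.0, 0.97, 1.2, 0.8, 1.0]
result = shares(sample)
print(result)
```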

5.1.2 Probability of policy implementation and effects of funding

We asked participants to forecast the probability that each of these policies would be implemented. When participants were forecasting the effects of the policies, we asked them to make their forecast as if work to implement the policy would begin immediately, and the description of each policy included a date by which implementation would be complete. When we asked participants to forecast the probability that the policy would be implemented, we pushed this date back by three years. We asked for an unconditional forecast and a forecast conditional on a nonprofit team being given $500 million with the goal of getting the policy implemented. Table 9 and Figure 23 summarize these results.

| Policy | Expert N | Expert probability implemented, baseline, median (IQR) | Expert probability implemented, with funding, median (IQR) | Expert funding multiplier, median (IQR) | Superforecaster N | Superforecaster probability implemented, baseline, median (IQR) | Superforecaster probability implemented, with funding, median (IQR) | Superforecaster funding multiplier, median (IQR) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI risk assessment | 102 | 20% (10–40%) | 30% (10–53.8%) | 1.5x (1.1–1.9x) | 39 | 9% (3–21.5%) | 12% (4–31.5%) | 1.3x (1.1–1.5x) |
| Crisis comms. network | 104 | 15% (5–30%) | 25% (10–50%) | 1.4x (1.1–1.8x) | 39 | 10% (4–25%) | 18% (6–36.5%) | 1.3x (1.1–2x) |
| CTBT is ratified | 97 | 6.5% (1–15%) | 10% (2.6–28.8%) | 1.3x (1–2x) | 36 | 5% (1–9.5%) | 7% (2–10%) | 1.1x (1–1.4x) |
| Failsafe reviews | 103 | 15% (10–30%) | 30% (15–50%) | 1.5x (1.2–2x) | 38 | 7% (3.5–16%) | 10% (5.1–23%) | 1.4x (1.1–2x) |
| FMCT is signed | 97 | 5% (1–10%) | 6.5% (2–15.8%) | 1.2x (1–1.8x) | 38 | 5% (1–13.5%) | 9% (2–15%) | 1.1x (1–1.5x) |
| USA removes sole authority | 98 | 10% (3–20%) | 20% (6–35%) | 1.5x (1.2–2x) | 35 | 4% (0.5–10%) | 5% (1–16.5%) | 1.3x (1–1.9x) |

Table 9: Views on the probability of policies being implemented (with implementation on a time scale that suggests a decision to implement is made within the next three years). The table shows the unconditional forecast of policy implementation and the forecast conditional on $500 million of funding going to a nonprofit tasked with getting the policy implemented. The median and interquartile range (25th percentile and 75th percentile estimates) are shown for each value.
Figure 23: Violin plots showing distribution of forecasts of the probability of policies being implemented, unconditionally (top) and conditional on $500 million in funding being provided to a non-profit group with the goal of getting the policy implemented (bottom). The thick line next to each violin shows the interquartile range (25th to 75th percentile forecasts), the thin line shows the range of forecasts minus outliers, and the text shows the median forecast.

Beliefs about the probability that these policies would be implemented vary widely, both among participants and among policies. However, the median expert thought that, for the three policies believed to be most effective in reducing risk (crisis communications, failsafe reviews, and AI risk assessment), there was at least a 15% chance of the policy being implemented. The median superforecaster thought there was at least a 7% probability of each of these policies being implemented. Both groups thought that funding could make a meaningful difference to the probability that a policy is implemented. The median expert thought the probability could increase by 20–50% depending on the policy, and the median superforecaster thought it could increase by 11–41%.
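The funding multipliers in Table 9 do not always equal the ratio of the two median probabilities. For the crisis communications network, for example, the expert medians are 15% and 25%, but the median multiplier is 1.4x. One explanation consistent with the table is that the multiplier is computed for each respondent and then summarized. The sketch below illustrates this with hypothetical forecasts; treating the multiplier as a per-respondent quantity is an assumption rather than something the text states explicitly.

```python
from statistics import median

# Hypothetical (baseline, with-funding) implementation forecasts
# for three respondents (not study data).
forecasts = [(0.05, 0.06), (0.15, 0.30), (0.20, 0.25)]

# Per-respondent funding multiplier, skipping zero baselines.
multipliers = [funded / base for base, funded in forecasts if base > 0]
median_multiplier = median(multipliers)

# Ratio of the group medians, shown for comparison.
median_base = median([base for base, _ in forecasts])
median_funded = median([funded for _, funded in forecasts])
ratio_of_medians = median_funded / median_base

# A multiplier m corresponds to a relative increase of (m - 1) * 100 percent.
percent_increase = (median_multiplier - 1.0) * 100.0

print(round(median_multiplier, 2), round(ratio_of_medians, 2), round(percent_increase, 1))
```

In this toy sample the median multiplier is lower than the ratio of medians, because the respondent with the median baseline is not the respondent with the median multiplier.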

5.1.3 Policy ranking

We asked participants to rank the policies they were shown in two ways: first, by how much they would like the policy to be implemented, and second, by how much they would like $500 million to go to a hypothetical nonprofit aiming to have the policy implemented. When ranking for funding, participants were asked to consider the effects of the policy, probability the policy would be implemented, and the difference funding would make.

Table 10 shows the average rank and the proportion of participants who ranked each policy within the top three, for implementation and for funding. These values include the domain-specific policies in the ranks (rank is out of nine).

| Policy | Experts (N = 109): average rank, to implement | Experts: average rank, for funding | Experts: rank in top 3, to implement | Experts: rank in top 3, for funding | Superforecasters (N = 39): average rank, to implement | Superforecasters: average rank, for funding | Superforecasters: rank in top 3, to implement | Superforecasters: rank in top 3, for funding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI risk assessment | 4.6 | 3.6 | 39% | 52% | 4.7 | 3.9 | 33% | 46% |
| Crisis comms. network | 3.3 | 2.9 | 58% | 70% | 3.0 | 2.4 | 62% | 77% |
| CTBT is ratified | 4.3 | 4.7 | 41% | 28% | 4.7 | 4.8 | 33% | 26% |
| Failsafe reviews | 3.8 | 3.5 | 50% | 61% | 3.7 | 3.3 | 51% | 62% |
| FMCT is signed | 5.2 | 5.7 | 28% | 17% | 4.9 | 4.9 | 31% | 18% |
| USA removes sole authority | 6.7 | 6.3 | 13% | 11% | 6.0 | 6.4 | 28% | 18% |

Table 10: Results of ranking exercises. Participants were asked to rank the policies in two ways: by how much they would like the policy to be implemented and by how much they would like $500 million in funding to go to a nonprofit that had the goal of getting the policy implemented. For both types of ranking we show the average rank for experts and superforecasters and the proportion of each group who ranked the policy within the top 3.
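The two summary statistics in Table 10, average rank and share ranked in the top three, can be computed from individual rankings as sketched below. The rankings are hypothetical and the function name `rank_summary` is an illustrative assumption; the sketch also assumes that each respondent only ranks the policies they were shown, so a policy is averaged only over respondents who ranked it.

```python
from typing import Dict, List, Tuple


def rank_summary(rankings: List[Dict[str, int]], policy: str, top_k: int = 3) -> Tuple[float, float]:
    """Return (average rank, share ranking the policy within the top_k)."""
    ranks = [r[policy] for r in rankings if policy in r]
    average = sum(ranks) / len(ranks)
    top_share = sum(1 for rank in ranks if rank <= top_k) / len(ranks)
    return average, top_share


# Hypothetical rankings from four respondents (1 = most preferred).
rankings = [
    {"Crisis comms. network": 1, "Failsafe reviews": 2, "FMCT is signed": 5},
    {"Crisis comms. network": 3, "Failsafe reviews": 1, "FMCT is signed": 4},
    {"Crisis comms. network": 2, "Failsafe reviews": 4, "FMCT is signed": 6},
    {"Crisis comms. network": 4, "Failsafe reviews": 3, "FMCT is signed": 2},
]

crisis = rank_summary(rankings, "Crisis comms. network")
fmct = rank_summary(rankings, "FMCT is signed")
print(crisis, fmct)
```

Lower average ranks and higher top-three shares both indicate a stronger preference for the policy.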

Figures 24 and 25 show experts’ and superforecasters’ ranking of the six general policies by how much participants would like to see funding go towards having the policy implemented. Ranks closer to one indicate a stronger preference for that policy relative to others. Participants also ranked the three domain-specific policies they answered questions on. The ranks shown in Figures 24 and 25 exclude those domain-specific policies (and so ranks are shown out of six).

Figure 24: Experts’ ranking of the six general policies when considering where they would like $500 million funding to go to a nonprofit group who has the goal of getting the policy implemented. These are listed in order of average rank (most favored to least favored). The values inside the squares show the proportion of expert respondents who gave the policy that rank out of the six general policies.
Figure 25: Superforecasters’ ranking of the six general policies when considering where they would like $500 million funding to go to a nonprofit group who has the goal of getting the policy implemented. These are listed in order of average rank (most favored to least favored). The values inside the squares show the proportion of superforecaster respondents who gave the policy that rank out of the six general policies.

Of the six general policies, two were clearly favored by both groups of participants: establishing a crisis communications network, and all nuclear-armed countries conducting failsafe reviews. These policies were generally seen as more effective in reducing risk and more likely to be implemented than others. These policies were ranked within the top three by more than half of all participants. When asked about which policies they’d prefer to see funding go towards, these two policies were ranked within the top three by at least 60% of participants in both groups.

5.2.1 Crisis communications network

This policy would see a secure multilateral crisis communications network (such as the proposed CATALINK network)37 established, with all nuclear-armed states actively participating. The full details of the policy (as it was described to participants) are available in Appendix 4. In summary, the network would be encrypted and robust to threats, and would allow for direct leader-to-leader communication, with the ability to conduct bilateral or multilateral communications. This policy would build on existing “hotlines” between adversaries by providing a more secure connection that is more actively maintained and allows for multilateral communications.

Effect of policy

This policy was associated with the largest relative risk reduction for both the median expert and the median superforecaster. Figure 26 shows the distribution of relative risk reduction attributed to this policy. It shows that roughly 80% of superforecasters and 60% of experts thought that this policy would reduce risk by up to 50%. Figure 27 shows how each individual participant’s forecast would change conditional on the policy being implemented.

Figure 26: Plot shows the proportion of respondents whose relative risk (relative change in risk of nuclear catastrophe) for the crisis communications network policy falls into each of the ranges.
Figure 27: Plot shows how individual respondents’ forecasts of nuclear catastrophe would change conditional on a crisis communications network being established. The blue and orange dots show the baseline forecasts, and the tips of the arrows show the forecast conditional on the event. Yellow dots indicate forecasts that would not change conditional on the event. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.

When explaining the rationales for their forecasts, participants who thought this policy would likely reduce risk substantially cited the importance of secure and prompt communication to prevent misunderstandings and the benefits of building trust and transparency between states. Many participants pointed to the Cuban Missile Crisis as an example of the importance of effective communication during crises. Participants who were less optimistic about the policy’s effects suggested that some states may not use the network (as some states have not used bilateral hotlines), or may misuse it to spread misinformation or sow confusion. Some also noted that this policy doesn’t affect the underlying drivers of nuclear risk. For more details of arguments made in rationales, see Appendix 12.

Probability of policy implementation

The median superforecaster thought that, of the six general policies, a crisis communications network was the most likely to be implemented (with or without funding). They forecast a 10% probability of the policy being implemented, with that figure rising to 18% with dedicated funding. Experts were more optimistic about the chances of the policy being implemented, with a median unconditional forecast of 15%, which rose to 25% with funding. Figure 28 shows how forecasts of the probability of the policy being implemented change with funding.

Participants who gave higher forecasts for the probability of this policy being implemented suggested that this policy is a sensible and relatively easy step to take, especially as it builds on existing bilateral hotlines. Those who gave lower forecasts suggested that existing tensions would make cooperation difficult, especially as some states favor policies of strategic ambiguity. Some suggested that having all nuclear-armed states participate was a high bar.

Figure 28: Plots show forecasts of the probability of a crisis communications network being established, unconditionally and conditional on $500 million funding going to a hypothetical nonprofit with the goal of getting the policy implemented. The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose.

5.2.2 Failsafe reviews

This policy would see all national governments of nuclear-armed states establish a review mechanism to identify risks of inadvertent or accidental nuclear use and develop plans for mitigation of these risks. This would include, but is not limited to, false alarms, technical malfunctions, and human error. Each government would conduct an analysis to identify potential pathways to unintentional launches of nuclear weapons or false perceptions of being under attack, and would then consider areas of intervention to reduce the likelihood of these outcomes. The main focus should be on measures the government can take unilaterally to reduce risk from their systems. But if the reviews identify opportunities for risk mitigation that require multilateral action, these should be proposed to other nations’ governments. This policy was loosely based on the “independent review of the safety, security, and reliability of U.S. nuclear weapons, NC3, and integrated tactical warning/attack assessment systems” announced in the US 2022 Nuclear Posture Review.38 The description of this policy provided to respondents is available in Appendix 4.

Effect of policy

This policy was associated with the second largest relative risk reduction for both the median expert and the median superforecaster. Figure 29 shows how the distribution of forecasts on the probability of nuclear catastrophe shifts conditional on this policy being implemented. Figure 30 shows how each individual participant’s forecast would change conditional on the policy being implemented.

Figure 29: Plot shows the proportion of respondents whose relative risk (relative change in risk of nuclear catastrophe) for the Failsafe Reviews policy falls into each of the ranges.

Rationales for forecasts indicated that many participants thought that failsafe reviews could reduce the risk of accidental or inadvertent nuclear detonations and, by identifying ways to improve decision-making processes, reduce the risk of deliberate escalation during a crisis. Those who thought this policy would do little to reduce nuclear risk suggested that this policy primarily addresses accidental risks, which contribute little to nuclear risk, and does not affect risks of deliberate use. They also noted that nuclear-armed states regularly conduct their own reviews and that the quality of the reviews could vary between states. For more details of arguments made in rationales, see Appendix 12.

Figure 30: Plot shows how individual respondents’ forecasts of nuclear catastrophe would change conditional on the Failsafe Reviews policy being implemented before 2030. The blue and orange dots show the baseline forecasts, and the tips of the arrows show the forecast conditional on the event. Yellow dots indicate forecasts that would not change conditional on the event. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.

Probability of policy implementation

The median expert forecasted a 15% probability of this policy being implemented, with this increasing to 30% with funding. Superforecasters forecasted a 7% probability unconditionally and a 10% probability with funding. Figure 31 shows how the distribution of forecasts of the probability of the policy being implemented changes with funding.

Those who were more optimistic about this policy being implemented suggested that this is a relatively uncontroversial policy that is low-risk and provides a way for states to demonstrate responsibility. Many also suggested that wariness about cyber threats and the impact of AI might motivate states to conduct such a review. Those who were less optimistic suggested that countries like North Korea and Israel are unlikely to participate and that there is little incentive for states to agree to conduct a review.

Figure 31: Plots show forecasts of the probability of the Failsafe Reviews policy being implemented, unconditionally and conditional on $500 million funding going to a hypothetical nonprofit with the goal of getting the policy implemented. The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose.

5.3 Domain-specific policies

As different domain-specific policy questions were answered by different participants, and the sample size for each of these domain-specific policies was relatively small, we advise caution in interpreting these results, and particularly in making comparisons across the adversarial domains.

With that caution in mind, a few findings are worth noting. Participants generally ranked domain-specific policies less favorably than general policies. The average rank for general policies was 4.6, and the average rank for domain-specific policies was 5.8 (when ranking for funding, the values were 4.4 and 6.2). However, there were some exceptions to this.

Among the 25 experts who were shown the policy of Russia and the USA signing an arms control agreement similar to New START, the average rank was 3.5, slightly more favorable than the average rank for the failsafe review policy (3.8; lower ranks are more favorable). The median expert thought this policy would reduce the probability of catastrophe by 20%, and that it had a 20% probability of being implemented, with this increasing to 25% with funding.

Among the 10 experts who answered on the China and the USA domain, the policy that would see the establishment of regular high-level dialogue between the USA and China was notably popular. The median expert in this group thought the policy would reduce the risk of nuclear catastrophe by 25% and had a 45% probability of being implemented (rising to 55% with additional funding). These experts gave it an average rank of 3.4 (or 4 when ranking for funding).

More detail on the rationales provided for forecasts relating to these two policies is available in Appendix 13. Domain-specific policy results are provided in more detail in Appendix 14 and quantitative forecasts from all policies are provided in Appendix 15. We also asked participants which policies they would have liked to see included in the survey. The responses are in Appendix 16.

6. Factors influencing forecasts

Key points

  • Compared to experts, superforecasters generally gave lower forecasts for the probability of nuclear catastrophe and thought that policies would make less difference to the risk and were less likely to be implemented.
  • There was no significant difference in forecasts of nuclear catastrophe for different age groups or, for experts, years of experience in the nuclear weapons field. However, experts who had more experience were more likely to be skeptical about the impacts of policies.
  • Participants who were better at predicting the median expert forecast of catastrophe generally gave lower forecasts of catastrophe. However, participants who were better at predicting the median expert forecast of 2030 crux questions generally gave higher forecasts of catastrophe.

6.1 Demographics and expertise

As noted earlier, superforecaster participants put lower probabilities on nuclear catastrophe by 2045 than did subject matter experts. Their median forecast was 1%, compared to the median expert forecast of 5%.

Compared to superforecasters, experts also thought that policies would make a greater difference to the probability of nuclear catastrophe. To get a general indication of a participant’s beliefs about how much of a difference policies could make, we took the average of the relative risks assigned to the general policies (i.e., not including the domain-specific policies), except for the policy that would see the USA relinquish the US President’s sole authority to launch nuclear weapons (Sole Authority). Given that many participants thought this policy would increase risk, we excluded it from this calculation. When taking the average of the relative risk reduction for these five general policies, the median expert result is 0.75 (indicating a 25% reduction in risk), while the median superforecaster result is 0.82 (indicating an 18% reduction in risk).
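The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: it assumes that each relative risk is the ratio of a participant's conditional forecast to their baseline forecast, and the policy keys other than the crisis communications network, failsafe reviews, and Sole Authority (`policy_c`, `policy_d`, `policy_e`) are hypothetical placeholders, as are all the numbers.

```python
from statistics import mean, median

# Assumption: Sole Authority is the only general policy left out of the average.
EXCLUDED = {"sole_authority"}


def relative_risk(baseline, conditional):
    """Assumed definition: conditional forecast divided by baseline forecast."""
    if baseline <= 0:
        raise ValueError("baseline forecast must be positive")
    return conditional / baseline


def average_relative_risk(relative_risks, excluded=EXCLUDED):
    """Average one participant's relative risks over the non-excluded policies."""
    kept = [rr for policy, rr in relative_risks.items() if policy not in excluded]
    if not kept:
        raise ValueError("no policies left after exclusion")
    return mean(kept)


def group_median(participants):
    """Median of the per-participant averages within a group."""
    return median(average_relative_risk(p) for p in participants)


def implied_reduction(avg_relative_risk):
    """A relative risk of 0.75 corresponds to a 25% reduction in risk."""
    return 1.0 - avg_relative_risk


# Hypothetical participant: five general policies plus Sole Authority.
example = {
    "crisis_comms": 0.7,
    "failsafe_reviews": 0.8,
    "policy_c": 0.75,
    "policy_d": 0.8,
    "policy_e": 0.7,
    "sole_authority": 1.2,  # thought to increase risk, so excluded
}
print(round(average_relative_risk(example), 2))
```

Leaving Sole Authority out keeps a single policy that many participants expected to raise risk from pulling the average above the values given for the other five.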

There was no significant difference in forecasts of nuclear catastrophe for different age groups (including when superforecasters are excluded from the sample) or, for experts, years of experience in the nuclear weapons field (see Figures 32 and 33). However, experts who had more experience were more likely to be skeptical about the impacts of policies, with higher average relative risk for the general policies (excluding Sole Authority) (see Figure 33).

Figure 32: Forecasts on the probability of catastrophe disaggregated by age group (experts only). The median forecast is provided in text. The 18–25 age group for experts is not displayed in the chart due to insufficient sample size (n=1), which does not meet the minimum threshold (n=3) for inclusion in the boxplot visualization. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.
Figure 33: Correlation between years of relevant experience and forecast of probability of catastrophe (left) and average relative risk for five general policies (minus Sole Authority) (right). This only shows data from experts.

There was a trend toward participants who reported affiliation with government giving lower forecasts of the probability of nuclear catastrophe, and those working in advocacy organizations giving higher responses. However, given the small sample sizes, these differences were not statistically significant (see Figure 34, and Appendix 17 for further detail). There was no difference in average relative risk for the general policies, and no clear trend across organizations (see Appendix 17).

Figure 34: Forecasts on the probability of catastrophe disaggregated by type of affiliated organization (experts only). The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.

6.2 Beliefs about contentious issues

We analyzed how participants’ stated beliefs about contentious issues in nuclear weapons policy interacted with their forecast of the probability of nuclear catastrophe by 2045. Experts who thought that deterrence is inherently fragile and that nuclear escalation is very likely had higher median forecasts of nuclear catastrophe, but the difference was not statistically significant. Superforecasters showed the opposite pattern, but this was also not statistically significant. Figures 35 and 36 show the distribution of forecasts disaggregated by views on nuclear deterrence and the likelihood of nuclear escalation.

Figure 35: Forecasts on the probability of catastrophe disaggregated by beliefs about the fragility / robustness of nuclear deterrence. The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.
Figure 36: Forecasts on the probability of catastrophe disaggregated by beliefs about the likelihood of nuclear escalation after an initial nuclear strike. The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.

Table 11 compares the mean (average) rank given to the two most popular policies by participants who had selected the opposing statements for the contentious policy issues. Superforecasters who agreed with the statement that deterrence is inherently fragile ranked the crisis communications network significantly more favorably than did superforecasters who agreed with the statement that deterrence is robust. There was no significant difference (or any clear trend) in the average relative risk of the five general policies (minus Sole Authority) across the different views on the contentious policy issues (see Appendix 17 for details).

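| Statement selected as closest match to views39 | Crisis communications network, Expert: mean rank | Crisis communications network, Expert: p-value | Crisis communications network, Superforecaster: mean rank | Crisis communications network, Superforecaster: p-value | Failsafe reviews, Expert: mean rank | Failsafe reviews, Expert: p-value | Failsafe reviews, Superforecaster: mean rank | Failsafe reviews, Superforecaster: p-value |
|---|---|---|---|---|---|---|---|---|
| Deterrence | | | | | | | | |
| Deterrence is inherently fragile | 3.42 | 0.75 | 1.67 | 0.005 | 4.33 | 0.05 | 3.33 | 0.31 |
| Deterrence is robust | 3.26 | | 3.47 | | 3.41 | | 4.12 | |
| Escalation | | | | | | | | |
| Nuclear escalation is very likely | 3.53 | 0.15 | 2.71 | 0.61 | 4.07 | 0.05 | 3.77 | 0.85 |
| Nuclear escalation can be prevented | 2.83 | | 3.09 | | 3.11 | | 3.91 | |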
Table 11: Mean rank (out of 9) for the two most popular policies, disaggregated by beliefs on contentious issues.
*The p-values compare the responses of participants who selected each of the two opposing statements as being closest to their views (they do not compare experts and superforecasters). Each p-value is derived from Welch’s t-test, which evaluates the difference in the distribution of rank between respondents choosing each statement, within the expert and superforecaster groups.
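For readers unfamiliar with Welch’s t-test, the sketch below computes its test statistic and the Welch–Satterthwaite degrees of freedom using only the Python standard library. It is an illustration under stated assumptions, not the study's code: the function name `welch_t` and the rank values are hypothetical. Turning the statistic into a p-value additionally requires the t-distribution, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical policy ranks (1 = most favored) from two groups of respondents.
ranks_fragile = [1, 2, 3]
ranks_robust = [2, 4, 6, 8]
t_stat, dof = welch_t(ranks_fragile, ranks_robust)
print(round(t_stat, 3), round(dof, 3))
```

Unlike Student's t-test, this version does not assume the two groups have equal variances, which suits comparisons between groups of different sizes.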

6.3 Reciprocal scores

As mentioned, participants were asked to predict what the median expert in the study would forecast for the probability of nuclear catastrophe by 2045, and to predict the median expert’s forecast on five of the crux questions that will resolve in 2030. We used these results to generate “reciprocal scores” for each participant.

We found that, compared to experts, superforecasters had better reciprocal scores when predicting the expert median forecast of nuclear catastrophe, but that experts had better reciprocal scores when predicting the expert median forecast of the crux questions. In other words, superforecasters were generally better at predicting what experts would say about the probability of a nuclear catastrophe by 2045, while experts were better at predicting experts’ views on whether different events related to nuclear risk will or won’t occur by 2030. To calculate each participant’s accuracy on this “reciprocal scoring” exercise, we rank each participant’s accuracy on each question and then average their accuracy ranks across questions. Figure 37 shows the ranking of reciprocal scores for forecasts of catastrophe and the ranking of reciprocal scores for forecasts of the crux questions. Ranks closer to one indicate greater accuracy on these questions. Because 151 participants filled out Survey 1, the worst possible rank a participant could receive is 151: this would mean that on all questions, they were less accurate than all other participants at predicting the group’s beliefs.
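The rank-then-average procedure can be sketched as follows. This is a minimal illustration rather than the study's implementation: it assumes accuracy on a question is measured by the absolute distance between a participant's prediction and the actual expert median, and that tied participants share the average of the ranks they span. All names and numbers are hypothetical.

```python
from statistics import mean


def rank_errors(errors):
    """Return 1-based ranks (1 = smallest error); ties share the average rank."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ranks = [0.0] * len(errors)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next participant has the same error.
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def reciprocal_rank_scores(predictions, actual_medians):
    """Average each participant's accuracy rank across questions.

    predictions: {participant: {question: predicted expert median}}
    actual_medians: {question: actual expert median}
    """
    participants = list(predictions)
    per_participant = {p: [] for p in participants}
    for question, actual in actual_medians.items():
        errors = [abs(predictions[p][question] - actual) for p in participants]
        for p, rank in zip(participants, rank_errors(errors)):
            per_participant[p].append(rank)
    return {p: mean(ranks) for p, ranks in per_participant.items()}


# Hypothetical predictions (in %) of the expert median on two questions.
predictions = {
    "a": {"q1": 5.0, "q2": 10.0},
    "b": {"q1": 1.0, "q2": 20.0},
    "c": {"q1": 8.0, "q2": 30.0},
}
actual_medians = {"q1": 5.0, "q2": 25.0}
scores = reciprocal_rank_scores(predictions, actual_medians)
print(scores)
```

Averaging ranks rather than raw errors keeps questions with very different probability scales from dominating a participant's score.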

Figure 37: Reciprocal scores for predicting the expert median forecast of the probability of nuclear catastrophe by 2045 (left) and for predicting the expert median forecasts for the resolution of crux questions (right). Lower scores indicate greater accuracy. The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose.

Figure 38 shows how forecasts of nuclear catastrophe vary according to reciprocal scores. Expert participants who were within the bottom third, in terms of ability to predict the expert median forecast of nuclear catastrophe, gave much higher forecasts of nuclear catastrophe than did others (a median of 25%, compared to 2% for the top third of performers and 0.5% for the middle third). However, experts who were better at predicting expert forecasts of crux resolution generally gave higher forecasts for the probability of nuclear catastrophe by 2045. Among superforecasters, there was a slight trend in the opposite direction: the superforecasters who were better at predicting experts’ crux forecasts generally gave slightly lower forecasts for the probability of catastrophe (see Figure 38).

Figure 38: Probability of catastrophe predictions from the top, middle, and bottom third of expert and superforecaster reciprocal scoring performers (based on both probability of catastrophe reciprocal scoring accuracy and crux question reciprocal scoring accuracy). Each group’s top, middle, and bottom thirds are determined within-group (i.e., the top third of experts is composed of the best performers from the Expert camp, even if some of the same individuals wouldn’t rank in the top third of overall performers). The median forecast is provided in text. The boxes show the 25th–75th percentile forecasts, and the lines the range of forecasts minus outliers. We jitter the data points horizontally to allow for better visualization of the distribution of forecasts. Horizontal variation within each group serves no other empirical purpose. The y-axis uses a logarithmic scale to informatively show variation in forecasts in the 0–10% range.

We also found that expert participants who performed better at predicting experts’ forecasts of crux resolution tended to forecast a higher probability that policies would be implemented. Conversely, experts who performed better at predicting forecasts of nuclear catastrophe tended to forecast a lower probability that policies would be implemented (Figure 39). There was no significant correlation between superforecasters’ reciprocal scores and their views on the likelihood of policy implementation. There was also no significant correlation between reciprocal scoring ranks (of either group) and views on the effectiveness of policies (see Appendix 17 for more detail).

Figure 39: Correlation between average forecasts of policy implementation (five general policies, excluding “Sole Authority”) and reciprocal scoring ranks. Reciprocal scoring rank for predicting the median expert’s forecast of nuclear catastrophe is shown on the left and reciprocal scoring rank for predicting the median expert’s forecasts of crux resolution is shown on the right. This only shows data from experts.

While we do have some evidence that accuracy in predicting the aggregate forecast of a group is predictive of actual accuracy on other geopolitical forecasting questions,40 this evidence base is limited and may not extend to longer-run questions like the ones in this study. As the questions from this study resolve, the results will inform our understanding of this method of assessing forecasting accuracy.

7. Limitations

This study has several important limitations. Despite active efforts to recruit a diverse group of expert participants, the final sample was disproportionately from the USA, and to a lesser extent western Europe. Although we had some success with efforts to recruit participants from South Asia, there were very few participants from eastern Europe and eastern Asia. Given the importance of Russia, China, and North and South Korea to the global nuclear weapons situation, it is disappointing that we have few participants with direct experience within these countries represented in the survey. If we were to conduct a similar survey in the future, we would consider partnering with organizations with connections in these countries, and we would consider translating the survey. The survey was only available in English, which was likely an important factor limiting participation from some regions.

Our sample also largely represented experts from academia and think tanks. Although some participants had experience in government, they were in the minority. Future surveys could be strengthened by increased efforts to engage experts within government and the military to understand the perspectives of these groups.

More generally, the number of participants limited some of the conclusions we could draw from this data. Although this was the largest forecasting study of nuclear weapons experts, we would need a bigger sample to determine whether some of the trends we identified are statistically significant (rather than due to chance). This was most notably an issue for the questions on the effects of policies. As we allowed participants to choose a domain to answer questions on, there were some questions about some domains that few participants answered.

This study investigated a global perspective on nuclear weapons. This broad perspective is valuable in providing a holistic assessment of nuclear weapons risk. However, it does mean that the depth to which we could explore aspects of nuclear weapons policy was limited. For example, a study of similar size to this one could be conducted specifically on any one of the adversarial domains we investigated. Limiting the scope of future studies could allow specific topics to be explored in greater depth. We believe there is value in both broader studies such as this one and more focused studies.

While our study included a large number of questions, covering over 20 different topics in the crux questions and over 20 different policies, there are many more questions we could have included. While we tried to achieve a balanced viewpoint in the questions, it’s possible that the ultimate list of policies leans towards policies aimed at reducing nuclear capacities, rather than strengthening nuclear arsenals, which some experts believe would reduce the odds of a nuclear catastrophe through increased deterrence. We did include policies involving Russia and the USA increasing the role within their arsenals of low-yield nuclear weapons, but generally, policies leaned “dove”-ish in their approach to nuclear weapons policy.

The focus on a single type of nuclear weapons event was another constraint. We did not explore the probability of smaller, more probable incidents or, on the other end of the scale, catastrophes of an even greater magnitude than our main outcome. Given the research on nuclear winters, it is possible that a nuclear war could kill many more than 10 million people. When considering the potential benefits of policies aiming at reducing the risk of nuclear weapons, low-probability but high-consequence events will likely account for much of the expected value of interventions. Therefore, we caution against using these results to estimate the cost-effectiveness of the policies we explored in the survey.

8. Next steps

Every January the Bulletin of the Atomic Scientists updates the “Doomsday Clock” to indicate how close the world is to a nuclear catastrophe.41 Some experts might pose the question, “So you’re recreating the ‘Doomsday Clock’?” And our answer is… not quite.

Our attempt to quantify risk does not solely aim to sound the alarm on nuclear weapons risk. Decision-makers often face a range of threats, each with varying degrees of probability and impact. A quantified risk framework helps clarify which threats are more immediate or severe. It also introduces a systematic approach that minimizes biases, promotes objectivity, and mitigates the influence of noise in the decision-making process. Without a structured approach to quantifying risks, decision-makers may disproportionately emphasize some risks while underestimating or overlooking others. To this end, we hope to provide an illuminating tool to help governments and other decision-makers with competing priorities allocate attention and resources.

The Doomsday Clock update is based on a survey of around 20 experts. This study, in a sense, provides a more molecular reading of the status of nuclear risk by incorporating more experts and enabling them to systematically assess and express risks using probabilities. However, nuclear weapons risk is a large and complicated topic. Our study offers a broad overview of the risk landscape, but much more could be done to investigate specific aspects of this landscape in greater depth. For example, future work could explore how a multilateral crisis communications network should operate. How should the center attempt to overcome problems that have plagued bilateral communications between adversaries, such as a lack of engagement and a lack of trust between leaders? How should leaders respond when such crises arise? Future studies could focus on these and other more detailed questions. Rather than providing all the answers, our study serves as a basis to build upon with future work.

More engagement with policymakers could also be highly beneficial. It would be ideal to include policymakers in the process of designing studies—particularly the forecasting questions and policies to be evaluated—as well as have them participate in surveys. Improved, future versions of these studies could help track views on nuclear weapons with enough granularity to inform policy choices.

There is scope for a wider range of activities that bring a systematic approach to assessing nuclear risks. For example, researchers could use foresight exercises and scenario planning sessions to further interrogate significant findings from the study. There is also potential to combine insights from open-source information to understand and monitor early warning indicators for nuclear escalation.

9. Conclusion

This year, 2024, has witnessed the intensification of two major geopolitical conflicts: Russia-Ukraine on the one hand, and Israel-Gaza (and now Lebanon) on the other. Policymakers and diplomats are seeking to broker ceasefire, de-escalation, and détente, but the biggest challenge they face is the degradation of communication channels. As we write this in October 2024, the recent rounds of escalation in the Middle East have demonstrated the difficulty of exercising restraint. This highlights the importance of one of the most popular policies in this study, a crisis communications network. The other most popular policy, failsafe reviews, suggests a proactive approach to identifying areas where misunderstandings or miscommunication might arise. While nuclear weapons are a complex problem, the participants in this survey were optimistic that steps can be taken to reduce the risk they pose.

It’s difficult to compare highly complex and abstract threats like nuclear war to other existential risks without a common metric. Using probabilities allows for a more nuanced understanding of how nuclear risks stack up against other potentially catastrophic events. Quantifying the risk of a nuclear catastrophe alongside other existential risks provides a clearer framework for understanding and communicating which threats require immediate action, sustained attention, or strategic monitoring. By translating abstract concerns into measurable probabilities, it becomes easier to engage in informed, rational prioritization amid a chaotic and noisy political environment.

Notes

  1. We define an outlier as any observation that falls below (Q1 – 1.5×IQR) or above (Q3 + 1.5×IQR), where Q1 and Q3 are the first and third quartiles and IQR is the interquartile range. ↩︎
  2. This policy would see all nuclear-armed states participating in a secure (to damage to physical infrastructure and cyberattacks) communications network that allowed bilateral and multilateral communication between country leaders. See Appendix 4 for more detail. ↩︎
  3. This policy would see all nuclear-armed states establish a review mechanism based on national defense guidelines to identify and develop plans to mitigate risks of inadvertent or accidental nuclear use. See Appendix 4 for more detail. ↩︎
  4. “Superforecasters” outperformed experts and intelligence analysts in forecasting tournaments held by the Good Judgment Project, or had equivalent forecasting skill according to follow-up work by Good Judgment Inc. See Tetlock, Philip E., Barbara A. Mellers, Nick Rohrbaugh, and Eva Chen. “Forecasting Tournaments: Tools for Increasing Transparency and Improving the Quality of Debate.” Current Directions in Psychological Science 23, no. 4 (2014): 290–295. https://doi.org/10.1177/0963721414534257. ↩︎
  5. These were non-experts or laypeople who had some interest in current affairs but lacked specialized training. ↩︎
  6. Tetlock, Philip E., Christopher Karvetski, Ville A. Satopää, and Kevin Chen. “Long-Range Subjective-Probability Forecasts of Slow-Motion Variables in World Politics: Exploring Limits on Expert Judgment.” Futures & Foresight Science 6 (2024): e157. https://doi.org/10.1002/ffo2.157. ↩︎
  7. Lugar, Richard G. “The Lugar Survey on Proliferation Threats and Responses.” Washington, D.C.: United States Senate Foreign Relations Committee, 2005. https://irp.fas.org/threat/lugar_survey.pdf. ↩︎
  8. Project for the Study of the 21st Century. “Great Power Conflict Report.” November 12, 2015. https://projects21.org/2015/11/12/ps21-survey-experts-see-increased-risk-of-nuclear-war/. ↩︎
  9. Karger, Ezra, Josh Rosenberg, Zach Jacobs, et al. “Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament.” FRI Working Paper #1. Forecasting Research Institute, 2023. https://forecastingresearch.org/research/existential-risk-persuasion-tournament. ↩︎
  10. Karger et al., XPT, 44. ↩︎
  11. The Russian invasion of Ukraine began on February 24th, 2022. ↩︎
  12. YouGov. “How Likely Do You Think We Are to Get into a Nuclear War within the Next Ten Years?” Survey, February 28, 2022. https://today.yougov.com/topics/politics/survey-results/daily/2022/02/28/c6993/3. Statista. “How Likely Do You Think We Are to Get into a Nuclear War within the Next Ten Years?” Survey, February 1–7, 2024. https://www.statista.com/statistics/1308926/us-opinion-likelihood-nuclear-war/. ↩︎
  13. FOMnibus. “Ядерное оружие.” Survey conducted October 27–29, 2023. Fond Obshchestvennoe Mnenie (Public Opinion Foundation), November 10, 2023. https://fom.ru/Bezopasnost-i-pravo/14942. ↩︎
  14. For crowd forecasts. Retrieval date for ongoing forecasts. ↩︎
  15. “Not just a warning shot – a targeting of military or civilian targets.” ↩︎
  16. Wyden, Peter H. Bay of Pigs: The Untold Story. Simon and Schuster, 1979. ↩︎
  17. Mauboussin, Andrew, and Michael J. Mauboussin. “If You Say Something Is ‘Likely,’ How Likely Do People Think It Is?” Harvard Business Review, July 3, 2018. https://hbr.org/2018/07/if-you-say-something-is-likely-how-likely-do-people-think-it-is. ↩︎
  18. Cited in Gleditsch, Kristian Skrede. “One without the Other? Prediction and Policy in International Studies.” International Studies Quarterly 66, no. 3 (2022): sqac036. https://doi.org/10.1093/isq/sqac036. ↩︎
  19. Friedman, Jeffrey A., Joshua D. Baker, Barbara A. Mellers, Philip E. Tetlock, and Richard Zeckhauser. “The Value of Precision in Probability Assessment: Evidence from a Large-Scale Geopolitical Forecasting Tournament.” International Studies Quarterly 62, no. 2 (2018): 410–422. https://doi.org/10.1093/isq/sqx078. ↩︎
  20. Tetlock, Philip E. Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press, 2005. ↩︎
  21. Mellers, Barbara A., Philip E. Tetlock, Joshua D. Baker, Jeffrey A. Friedman, and Richard Zeckhauser. “Chapter 12. Improving the Accuracy of Geopolitical Risk Assessments.” In The Future of Risk Management, edited by Howard Kunreuther, Robert J. Meyer, and Erwann O. Michel-Kerjan. University of Pennsylvania Press, 2019. https://doi.org/10.9783/9780812296228-013. ↩︎
  22. Muehlhauser, Luke. “How Feasible Is Long-Range Forecasting?” Open Philanthropy, October 10, 2019. https://www.openphilanthropy.org/research/how-feasible-is-long-range-forecasting/. ↩︎
  23. Karger et al., XPT. ↩︎
  24. Ruhl, Christian. “Philanthropy to the Right of Boom.” Founders Pledge, February 13, 2023. https://www.founderspledge.com/research/philanthropy-to-the-right-of-boom. ↩︎
  25. These organizations included but were not limited to Nuclear Threat Initiative, Arms Control Association, Founders Pledge, Open Philanthropy, Carnegie Endowment, Chatham House, and the Union of Concerned Scientists. ↩︎
  26. Each participant was allocated five of the general (i.e. not domain-specific) crux questions. ↩︎
  27. Karger, Ezra, Joshua Monrad, Barbara Mellers, and Philip Tetlock. “Reciprocal Scoring: A Method for Forecasting Unanswerable Questions.” SSRN Working Paper, 2021. https://dx.doi.org/10.2139/ssrn.3954498. ↩︎
  28. Organizations that kindly helped with distribution of the survey were: Emerging Voices Network at BASIC, European Leadership Network, Asia Pacific Leadership Network, Younger Generation Leaders Network, Project on Nuclear Issues at CSIS, and the Pacific Forum. ↩︎
  29. A full list of the organizations and reports we reviewed is available in Appendix 2. ↩︎
  30. Tetlock, Philip E., Barbara A. Mellers, Nick Rohrbaugh, and Eva Chen. “Forecasting Tournaments: Tools for Increasing Transparency and Improving the Quality of Debate.” Current Directions in Psychological Science 23, no. 4 (2014): 290–295. https://doi.org/10.1177/0963721414534257. ↩︎
  31. We believe that this very low proportion of female superforecaster participants is largely a function of recruitment from the study that originally identified superforecasters among a broad population of participants (Mellers, Barbara, Lyle Ungar, Jonathan Baron, et al. “Psychological Strategies for Winning a Geopolitical Forecasting Tournament.” Psychological Science 25, no. 5 (2014): 1106–1115. https://doi.org/10.1177/0956797614524255.) Of the participants in that study, 83% were male. ↩︎
  32. In some places, we report the average or mean. These are instances where participants are asked to distribute votes, ranks, or probabilities (i.e., saying how likely several mutually exclusive but collectively exhaustive events are, such that the total probabilities sum to 100%). ↩︎
  33. Karger et al., XPT, 20-22. ↩︎
  34. Here we report averages, rather than medians, as the aggregate group measure. This is done so that the totals across the domains sum to 100%. ↩︎
  35. Participants were also asked how their probability of nuclear catastrophe by 2045 would change if they knew that this event would occur. Therefore, these changes in forecasts shouldn’t be taken as representing the causal effect of the event. For example, it’s possible that a participant who would reduce their probability of catastrophe if an arms control agreement occurred might not think that the agreement itself would cause any change in risk. Instead, they might think that an arms control agreement would indicate that the relationship between the countries has improved, and they might reduce their predicted risk of catastrophe for that reason. ↩︎
  36. Barrett, Anthony M., Seth D. Baum, and Kelly Hostetler. “Analyzing and Reducing the Risks of Inadvertent Nuclear War Between the United States and Russia.” Science & Global Security 21, no. 2 (2013): 106–133. https://doi.org/10.1080/08929882.2013.798984. ↩︎
  37. Institute for Security and Technology. “CATALINK.” Accessed October 17, 2024. https://securityandtechnology.org/catalink/. ↩︎
  38. U.S. Department of Defense. “2022 National Defense Strategy of the United States of America.” U.S. Department of Defense, October 27, 2022. https://media.defense.gov/2022/Oct/27/2003103845/-1/-1/1/2022-NATIONAL-DEFENSE-STRATEGY-NPR-MDR.pdf. ↩︎
  39. Summaries of the full statements are presented here. We suggest reading the full statements, which can be found in Appendix 4. ↩︎
  40. Atanasov, Pavel D., Ezra Karger, and Philip Tetlock. “Full Accuracy Scoring Accelerates the Discovery of Skilled Forecasters.” SSRN Working Paper, February 13, 2023. https://dx.doi.org/10.2139/ssrn.4357367. ↩︎
  41. Bulletin of the Atomic Scientists. “Doomsday Clock.” Accessed October 17, 2024. https://thebulletin.org/doomsday-clock/current-time/. ↩︎
