Published: Mar 11, 2024

Working paper

Roots of Disagreement on AI Risk

In this study, participants who had very different views on AI-caused existential risk worked together to try to identify the strongest near-term cruxes that would lead to changes in their beliefs.

Josh Rosenberg^*, Ezra Karger^⤉,*, Avital Morris^*, Molly Hickman^*, Rose Hadshar^*, Zachary Jacobs^*, Philip E. Tetlock^⤈,*

* Forecasting Research Institute
⤉ Federal Reserve Bank of Chicago
⤈ Wharton School of the University of Pennsylvania


Abstract

We brought together generalist forecasters and domain experts (n=22) who disagreed about the risk AI poses to humanity in the next century. The “concerned” participants (all of whom were domain experts) predicted a 20% chance of an AI-caused existential catastrophe by 2100, while the “skeptical” group (mainly “superforecasters”) predicted a 0.12% chance. Participants worked together to find the strongest near-term cruxes: forecasting questions resolving by 2030 that would lead to the largest change in their beliefs (in expectation) about the risk of existential catastrophe by 2100. Neither the concerned nor the skeptics substantially updated toward the other’s views during our study, though one of the top short-term cruxes we identified is expected to close the gap in beliefs about AI existential catastrophe by about 5%: approximately 1 percentage point out of the roughly 20 percentage point gap in existential catastrophe forecasts. We find greater agreement about a broader set of risks from AI over the next thousand years: the two groups gave median forecasts of 30% (skeptics) and 40% (concerned) that AI will have severe negative effects on humanity by causing major declines in population, very low self-reported well-being, or extinction.

Acknowledgments

This research would not have been possible without the support of Open Philanthropy.

We thank the research participants for their invaluable contributions.

We greatly appreciate the assistance of Page Hedley for data analysis and editing on the report, Taylor Smith and Bridget Williams as adversarial collaboration moderators, and Kayla Gamin, Coralie Consigny, and Harrison Durland for their careful editing.

We thank Elie Hassenfeld, Eli Lifland, Nick Beckstead, Bob Sawyer, Kjirste Morrell, Adam Jarvis, Dan Mayland, Jeremiah Stanghini, Jonathan Hosgood, Dwight Smith, Ted Sanders, Scott Eastman, John Croxton, Raimondas Lencevicius, Alexandru Marcoci, Kevin Dorst, Jaime Sevilla, Rose Hadshar, Holden Karnofsky, Benjamin Tereick, Isabel Juniewicz, Walter Frick, Alex Lawsen, Matt Clancy, Tegan McCaslin, and Lyle Ungar for comments on the report.

Executive summary

In the summer of 2022, researchers affiliated with the Forecasting Research Institute (FRI)¹ ran the Existential Risk Persuasion Tournament (XPT), which identified large disagreements between domain experts and generalist forecasters about key risks to humanity (Karger et al. 2023). This new project, a structured adversarial collaboration run in April and May 2023, is a follow-up to the XPT focused on better understanding the drivers of disagreement about AI risk.

Methods

We recruited participants to join “AI skeptic” (n=11) and “AI concerned” (n=11) groups that disagree strongly about the probability that AI will cause an existential catastrophe by 2100.² The skeptic group included nine superforecasters and two domain experts. The concerned group consisted of domain experts referred to us by staff members at Open Philanthropy (the funder of this project) and the broader Effective Altruism community.

Participants spent 8 weeks (skeptic median: 80 hours of work on the project; concerned median: 31 hours) reading background materials, developing forecasts, and engaging in online discussion and video calls. We asked participants to work toward a better understanding of their sources of agreement and disagreement, and to propose and investigate “cruxes”: short-term indicators, usually resolving by 2030, that would cause the largest updates in expectation to each group’s view on the probability of existential catastrophe due to AI by 2100.

Results: What drives (and doesn’t drive) disagreement over AI risk

At the beginning of the project, the median “skeptic” forecasted a 0.10% chance of existential catastrophe due to AI by 2100, and the median “concerned” participant forecasted a 25% chance. By the end, these numbers were 0.12% and 20% respectively, though many participants did not attribute their updates to arguments made during the project.³

We organize our findings as responses to four hypotheses about what drives disagreement:

Hypothesis #1 – Disagreements about AI risk persist due to lack of engagement among participants, low quality of participants, or because the skeptic and concerned groups did not understand each other’s arguments⁴

We found moderate evidence against these possibilities. Participants engaged for 25-100 hours each (skeptic median: 80 hours; concerned median: 31 hours), this project included a selective group of superforecasters and domain experts, and the groups were able to summarize each other’s arguments well during the project and in follow-up surveys.

Hypothesis #2 – Disagreements about AI risk are explained by different short-term expectations (e.g. about AI capabilities, AI policy, or other factors that could be observed by 2030)

Most of the disagreement about AI risk by 2100 is not explained by the indicators resolving by 2030 that we examined in this project. According to our metrics of crux quality, one of the top cruxes we identified is expected to close the gap in beliefs about AI existential catastrophe by about 5% (approximately 1.2 percentage points out of the 22.7 percentage point gap in forecasts for the median pair) when it resolves in 2030.⁵ For at least half of participants in each group, there was a question that was at least 5-10% as informative as being told by an oracle whether AI in fact caused an existential catastrophe.⁶ These effect sizes are difficult to contextualize because, as far as we are aware, this is the first project to apply question metrics to AI forecasting questions.
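As a quick check on the “about 5%” figure, dividing the expected reduction by the median pair’s gap (both numbers taken from the paragraph above) gives:

```latex
\frac{1.2\ \text{percentage points}}{22.7\ \text{percentage points}} \approx 0.053 \approx 5\%
```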

However, near-term cruxes shed light on what the groups believe, where they disagree, and why:

Evaluations of dangerous AI capabilities are relevant to both groups. One of the strongest cruxes that will resolve by 2030 is about whether METR (formerly known as ARC Evals) or a similar group will find that AI has developed dangerous capabilities such as autonomously replicating and avoiding shutdown. This crux illustrates a theme in the disagreement: the skeptic group typically did not find theoretical arguments for AI risk persuasive but would update their views based on real-world demonstrations of dangerous AI capabilities that verify existing theoretical arguments. If this question resolves negatively, then the concerned group would be less worried, because it would mean that we have had years of progress from today’s models without this plausible set of dangerous capabilities becoming apparent.
Generally, the questions that would be most informative to each of the two groups are fairly distinct. The concerned group’s highest-ranked cruxes tended to relate to AI alignment and alignment research. The skeptic group’s highest-ranked cruxes tended to relate to the development of lethal technologies and demonstrations of harmful AI power-seeking behavior. This suggests that many of the two groups’ largest sources of uncertainty are different, and in many cases further investigation of one group’s uncertainties would not persuade the other.
Commonly discussed topics, such as near-term economic effects of AI and progress in many AI capabilities, did not seem like strong cruxes.

Hypothesis #3 – Disagreements about AI risk are explained by different long-term expectations

We found substantial evidence that disagreements about AI risk decreased between the groups when considering longer time horizons (the next thousand years) and a broader set of severe negative outcomes from AI beyond extinction or civilizational collapse, such as large decreases in well-being or total population.

Some of the key drivers of disagreement about AI risk are that the groups have different expectations about: (1) how long it will take until AIs have capabilities far beyond those of humans in all relevant domains; (2) how common it will be for AI systems to develop goals that might lead to human extinction; (3) whether killing all living humans would remain difficult for an advanced AI; and (4) how adequately they expect society to respond to dangers from advanced AI.⁷

Supportive evidence for these claims includes:

Both groups strongly expected that powerful AI (defined as “AI that exceeds the cognitive performance of humans in >95% of economically relevant domains”) would be developed by 2100 (skeptic median: 90%; concerned median: 88%). However, some skeptics argue that (i) strong physical capabilities (in addition to cognitive ones) would be important for causing severe negative effects in the world, and (ii) even if AI can do most cognitive tasks, there will likely be a “long tail” of tasks that require humans.

The two groups also put similar total probabilities on at least one of a cluster of bad outcomes from AI happening over the next 1000 years (median 40% and 30% for concerned and skeptic groups respectively).⁸ But they distribute their probabilities differently over time: the concerned group concentrates their probability mass before 2100, and the skeptics spread their probability mass more evenly over the next 1,000 years.
We asked participants when AI will displace humans as the primary force that determines what happens in the future.⁹ The concerned group’s median date is 2045 and the skeptic group’s median date is 2450—405 years later.

Overall, many skeptics regarded their own forecasts of AI existential risk as worryingly high, even though those forecasts were low relative to the concerned group’s.¹⁰

Despite their large disagreements about AI outcomes over the long term, many participants in each group expressed a sense of humility about long-term forecasting and emphasized that they are not claiming to have confident predictions of distant events.

Hypothesis #4 – These groups have fundamental worldview disagreements that go beyond the discussion about AI

Disagreements about AI risk in this project often connected to more fundamental worldview differences between the groups. For example, the skeptics were somewhat anchored on the assumption that the world usually changes slowly, making the rapid extinction of humanity unlikely. The concerned group worked from a different starting point: namely, that the arrival of a higher-intelligence species, such as humans, has often led to the extinction of lower-intelligence species, such as large mammals on most continents. In this view, humanity’s prospects are grim as soon as AI is much more capable than we are. The concerned group also was more willing to place weight on theoretical arguments with multiple steps of logic, while the skeptics tended to doubt the usefulness of such arguments for forecasting the future.

Results: Forecasting methodology

This project establishes clear, quantifiable metrics for evaluating the quality of AI forecasting questions, and we view it as ongoing. We therefore invite readers to try to generate cruxes that outperform the top cruxes from our project thus far, an exercise that underscores the value of establishing comparative benchmarks for new forecasting questions. See the “Value of Information” (VOI) and “Value of Discrimination” (VOD) calculators to inform intuitions about how these question metrics work, and please reach out to the authors with suggestions for high-quality cruxes.

Broader scientific implications

This project has implications for how much we should expect rational debate to shift people’s views on AI risk. Thoughtful groups of people engaged with each other for a long time but converged very little. This raises questions about the belief formation process and how much is driven by explicit rational arguments vs. difficult-to-articulate worldviews vs. other, potentially non-epistemic factors (see research literature on motivated cognition, such as Gilovich et al. 2002; Kunda 1990; Mercier and Sperber 2011).

One notable finding is that a highly informative crux for each group was whether their peers would update on AI risk over time. This highlights how social and epistemic groups can be important predictors of beliefs about AI risk.¹¹

Glossary

ARC Evals

An organization, now called METR (Model Evaluation & Threat Research), that works on assessing whether cutting-edge AI systems could pose catastrophic risks to civilization. See “ARC Evals” for discussion of forecasts conditional on METR finding evidence of AI having the ability to autonomously replicate, acquire resources, and avoid shutdown before 2030.

Convergent crux

A question such that, conditional on it resolving, two people or groups will, in expectation, disagree less than they do now. See “Convergent Cruxes” for discussion of convergent cruxes found in this study.

Cross-camp pair

A pair consisting of one member of the skeptic group and one member of the concerned group. See “VOD” for discussion of questions that would narrow or widen disagreement for the median cross-camp pair when ranked by VOD, and “Differences of Opinion Within Groups” for discussion of each cross-camp pair’s differences on one question.

Divergent crux

A question such that, conditional on it resolving, two people or groups will, in expectation, disagree more than they do now. See “Divergent Cruxes” for discussion of divergent cruxes found in this study.
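To make the convergent/divergent distinction concrete, the sketch below compares a cross-camp pair’s current gap in P(U) with their expected gap once a crux resolves. It is a hypothetical illustration, not the report’s VOD formula (Appendix 2 explains that): the `Forecaster` class, the use of the absolute gap, and the choice to weight the two crux outcomes by the average of the pair’s P(C) are all assumptions, and the example numbers are invented.

```python
"""Hypothetical sketch: is a crux convergent or divergent for one cross-camp pair?

Assumptions (not the report's Appendix 2 VOD formula): disagreement is the
absolute gap between two forecasters' P(U), and the two crux outcomes are
weighted by the average of the pair's P(C).
"""
from dataclasses import dataclass


@dataclass
class Forecaster:
    p_c: float              # P(C): probability the crux resolves "yes"
    p_u_given_c: float      # P(U | C)
    p_u_given_not_c: float  # P(U | not C)

    def p_u(self) -> float:
        # Law of total probability gives the unconditional P(U).
        return self.p_c * self.p_u_given_c + (1 - self.p_c) * self.p_u_given_not_c


def expected_gap_change(a: Forecaster, b: Forecaster) -> float:
    """Expected change in |P_a(U) - P_b(U)| after the crux resolves (negative = convergence)."""
    gap_now = abs(a.p_u() - b.p_u())
    q = (a.p_c + b.p_c) / 2  # assumed shared weight on "crux resolves yes"
    gap_if_c = abs(a.p_u_given_c - b.p_u_given_c)
    gap_if_not_c = abs(a.p_u_given_not_c - b.p_u_given_not_c)
    return q * gap_if_c + (1 - q) * gap_if_not_c - gap_now


def classify(a: Forecaster, b: Forecaster, tol: float = 1e-12) -> str:
    """Label the crux by the sign of the expected change in disagreement."""
    change = expected_gap_change(a, b)
    if change < -tol:
        return "convergent"
    if change > tol:
        return "divergent"
    return "neutral"


if __name__ == "__main__":
    # Invented numbers for illustration only.
    skeptic = Forecaster(p_c=0.1, p_u_given_c=0.02, p_u_given_not_c=0.001)
    concerned = Forecaster(p_c=0.6, p_u_given_c=0.30, p_u_given_not_c=0.10)
    print(classify(skeptic, concerned))  # convergent
```

Because the two forecasters may disagree about P(C) itself, the weighting choice matters: weighting by only one person’s P(C) could change the classification for some pairs.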

Existential catastrophe

Defined in this study as an event in which at least one of the following occurs:

Humanity goes extinct
Humanity experiences “unrecoverable collapse,” which means either:
- <$1 trillion global GDP annually [in 2022 dollars] for at least a million years (continuously), beginning before 2100; or
- Human population remains below 1 million for at least a million years (continuously), beginning before 2100.
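The resolution criteria above can be written as a small predicate. This is a hypothetical sketch: the `Episode` summary of a future trajectory and the function names are illustrative inventions, and it assumes each “unrecoverable collapse” condition is recorded as one continuous episode with a start year and a duration.

```python
"""Hypothetical sketch of the study's "existential catastrophe" resolution criteria.

An Episode is an assumed summary of a future trajectory: one continuous period
during which global GDP stayed below $1 trillion per year (2022 dollars), or
during which human population stayed below 1 million.
"""
from dataclasses import dataclass
from typing import Optional

MIN_DURATION_YEARS = 1_000_000  # condition must hold continuously for a million years
START_BEFORE_YEAR = 2100        # and must begin before 2100


@dataclass
class Episode:
    start_year: int
    duration_years: float


def is_unrecoverable(episode: Optional[Episode]) -> bool:
    """True if a below-threshold episode meets the timing and duration tests."""
    return (
        episode is not None
        and episode.start_year < START_BEFORE_YEAR
        and episode.duration_years >= MIN_DURATION_YEARS
    )


def is_existential_catastrophe(
    extinct: bool,
    low_gdp: Optional[Episode] = None,
    low_population: Optional[Episode] = None,
) -> bool:
    # Extinction suffices on its own; otherwise either collapse condition does.
    return extinct or is_unrecoverable(low_gdp) or is_unrecoverable(low_population)
```

For example, a population episode starting in 2090 and lasting exactly a million years qualifies, while one starting in 2100 does not, because the condition must begin before 2100.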

Flash forecast

A forecast on which participants were recommended to spend approximately 10 minutes.

Instrumental convergence

The hypothesized tendency for intelligent agents to develop similar sub-goals that are helpful for achieving most other goals, even if their ultimate goals are very different. In particular, sub-goals like acquiring resources, avoiding being killed/destroyed, and avoiding interference from other agents could be helpful for achieving a wide variety of other goals. See “ARC Evals” for discussion of forecasts conditional on a model having capabilities that might suggest instrumental convergence.

METR

Model Evaluation & Threat Research. See ARC Evals.

Ultimate question (U)

In this study: “Will AI cause an existential catastrophe by 2100?”

VOD

Value of Discrimination (VOD) is a measure of how much knowing the answer to a question would change relative beliefs between individuals, in expectation. It is useful for measuring convergence and divergence in expected beliefs between individuals. See “VOD” for discussion of questions that would narrow or widen disagreement between the skeptic and concerned groups in expectation and Appendix 2 for an explanation of how VOD is calculated.

VOI

Value of Information (VOI) is a measure of how much knowing the answer to a question would change an individual’s belief, in expectation. This is useful for understanding why individuals believe what they believe and what would change their minds. See “VOI” for discussion of informative questions and Appendix 2 for an explanation of how VOI is calculated.

P(U)

The probability that U, the ultimate question, occurs. In this case, the probability that AI causes an existential catastrophe by 2100.

P(C)

The probability that a potential crux question occurs. See Appendix 1 for a list of candidate cruxes, and “VOI: Results Tables and Figures” for the median participant in each group’s P(C) for each crux.

POM

Percent of max. When we present VOI and VOD for each question, we also present how much of the maximum VOI or VOD it captured in order to contextualize the magnitude of the results. See “VOI” for discussion of POM VOI and “VOD” for discussion of POM VOD. See “ARC Evals” for an example of calculating POM VOD.
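The glossary leaves the formulas to Appendix 2, so the following is only a sketch of one common way to compute VOI and POM VOI from a single participant’s P(C), P(U | C), and P(U | not C). It assumes uncertainty about U is measured by p(1 − p), the expected Brier score of someone reporting their own probability p. Under that assumption, an oracle that reveals U removes all uncertainty, so the oracle’s VOI serves as the maximum for POM. The report’s actual measure may differ.

```python
"""Hypothetical sketch of VOI and POM VOI for one participant and one crux C.

Assumption: uncertainty about U is measured by p * (1 - p), the expected Brier
score of someone who reports their own probability p. The report's Appendix 2
defines the measure actually used, which may differ.
"""


def uncertainty(p: float) -> float:
    """Expected Brier score when reporting one's own probability p."""
    return p * (1 - p)


def p_u(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """Unconditional P(U) implied by the law of total probability."""
    return p_c * p_u_given_c + (1 - p_c) * p_u_given_not_c


def voi(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """Expected reduction in uncertainty about U from learning how C resolves."""
    before = uncertainty(p_u(p_c, p_u_given_c, p_u_given_not_c))
    after = p_c * uncertainty(p_u_given_c) + (1 - p_c) * uncertainty(p_u_given_not_c)
    return before - after


def pom_voi(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """VOI as a fraction of an oracle's VOI; learning U itself leaves no uncertainty."""
    max_voi = uncertainty(p_u(p_c, p_u_given_c, p_u_given_not_c))
    if max_voi == 0:
        return 0.0
    return voi(p_c, p_u_given_c, p_u_given_not_c) / max_voi


if __name__ == "__main__":
    # Invented numbers: P(C) = 0.5, P(U | C) = 0.3, P(U | not C) = 0.1.
    print(round(voi(0.5, 0.3, 0.1), 4), round(pom_voi(0.5, 0.3, 0.1), 4))
```

With these invented inputs, P(U) = 0.2 and the crux captures 0.01 / 0.16 = 6.25% of the oracle’s value. Because p(1 − p) is concave, this VOI is never negative.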

Background & Motivation

From June through October 2022, researchers affiliated with the Forecasting Research Institute (FRI) conducted the Existential Risk Persuasion Tournament (XPT). A clear pattern emerged in its findings: AI domain experts thought extinction due to AI in the 21st century was much more likely than skilled generalist forecasters (“superforecasters”) thought, and neither group persuaded the other much, despite working collaboratively and being incentivized to share persuasive arguments.¹² In addition, experts and superforecasters often agreed about short-term AI developments, while still disagreeing about the likelihood of extinction due to AI.¹³

In April and May of 2023, FRI ran a follow-up AI adversarial collaboration¹⁴ project that aimed to figure out what drives disagreement about long-run AI risk. We aimed to get more time from a select group of high-quality participants and supported them with moderators, adversarial collaboration video calls, and seminar discussions with AI experts, among other activities. To support deep engagement, we kept this study small: eleven “skeptics” and eleven “concerned” participants.

We also designed the project to identify short-run indicators (“cruxes”) resolving by 2030 that could help to diagnose reasons for disagreement and act as signals for the level of long-run AI risk.¹⁵ While the XPT questions were chosen by our research team, this project asked the participants to collaborate to find the strongest cruxes, or short-run AI questions that would change beliefs about long-run AI risk the most in expectation.

So, why do thoughtful people disagree so strongly about AI risk? We organize our findings into four hypotheses about drivers of disagreement.

First, people who disagree may not have spent enough time engaging with each other or may not understand each other’s arguments. Some of our readers suspected that superforecasters had not digested the main arguments for AI risk and would have been more concerned if they had, whereas others suspected that experts simply spent too much time talking to people who share their worldview and not enough time talking to thoughtful skeptics.¹⁶ If this hypothesis were true, we would expect the two groups to agree more after enough high-quality engagement to understand each other’s arguments.

Second, the disagreeing groups could have different predictions about short-term (by 2030) AI developments, such as how likely AI is to develop dangerous capabilities or which AI policies society is likely to adopt. If this hypothesis were true, we would expect the two groups to agree if we condition on specific AI-related developments. For example, if they disagreed about how long it will take until AI can write code to improve itself, but agreed that this development would mean serious danger for humanity, then we would expect them to agree on AI risk if we condition on AI improving itself. In this project, we asked participants to make many such conditional forecasts.¹⁷

Third, they could disagree about how AIs will develop or how society will respond in the longer term (through 2100 or beyond). Perhaps the groups cannot identify short-term AI outcomes that distinguish their risk models, but expect very different long-term AI trajectories.

Finally, they could have more fundamental worldview disagreements that go beyond AI. If they agree about most AI-related developments but continue to disagree about AI risk, there could be something else underlying their difference of opinion. There could be disagreements about how much they trust different categories of evidence or argumentation, or what they believe about human ingenuity and resilience, or any number of other topics that go beyond AI.

How did we test potential drivers of disagreement?

We brought together 22 participants who disagreed strongly on AI existential risk. Half of the participants were termed AI “skeptics,”¹⁸ people whose XPT forecasts of the probability that AI would cause extinction by 2100 were <1%, and who produced high-quality rationales for their forecasts. This group of 11 AI skeptics included nine superforecasters and two domain experts. The other 11 participants were people concerned about AI, whom we expected to forecast a >10% chance that AI would cause an existential catastrophe by 2100. The “AI concerned” participants were AI safety researchers and AI-knowledgeable generalist researchers who were recommended as being able to present and discuss AI-concerned views clearly.

We asked these two groups to engage deeply with each other’s arguments and to work together to identify cruxes with the most potential to update their forecasts on AI existential risk.

Participants made an initial forecast on the core question they disagreed about (we’ll call this U, for “ultimate question”): by 2100, will AI cause an existential catastrophe? We defined “existential catastrophe” as an event in which at least one of the following occurs:

Humanity goes extinct
Humanity experiences “unrecoverable collapse,” which means either:
1. <$1 trillion global GDP annually [in 2022 dollars] for at least a million years (continuously), beginning before 2100; or
2. Human population remains below 1 million for at least a million years (continuously), beginning before 2100.

For additional resolution details, such as the definition of “cause,” see Appendix 1.

Over the next eight weeks, participants made forecasts on candidate crux questions that could help explain the disagreement, generated new possible cruxes during adversarial collaboration calls, and debated and refined their reasoning on an online platform. (See section below for more details on how the project worked.)

The central disagreement

The two groups were selected for disagreeing strongly about the likelihood of existential catastrophe due to AI by 2100, and they continued to disagree throughout the project. At the outset, the median skeptic forecasted a 0.10% chance of existential catastrophe due to AI by 2100, and the median concerned participant forecasted a 25% chance. Over the course of the two-month project, there was mild convergence: the skeptic group’s median moved from 0.10% to 0.12% and the concerned group’s median fell from 25% to 20%.

However, April–May 2023 was an exciting time in real-world AI developments: GPT-4 had just become available, and regulators and the public were beginning to respond. Several participants attributed their updated probability of extinction due to AI by 2100 to these developments, and not to updates they made based on their work on this project.¹⁹

Group	Mean	Median	Range
Skeptic	0.54%	0.1%	0.0000001% – 3%
Concerned	28.4%	25%	4% – 65%

Table 1: Group P(AI-caused existential catastrophe by 2100), based on each participant’s initial forecast

Group	Mean	Median	Range
Skeptic	0.46%	0.12%	0.0001% – 2%
Concerned	23.8%	20%	2.4% – 55%

Table 2: Group P(AI-caused existential catastrophe by 2100) at the end of the project

Six people in the concerned group lowered their forecasts, and none raised them. In the skeptic group, five people raised their forecasts, four lowered theirs, and one raised theirs, but only because of an initial typo. For details on updated forecasts and reasons for updates from each participant, see Appendix 4.

Figure 1: Initial and final P(AI existential catastrophe by 2100) for skeptic and concerned groups. See Appendix 4 for reasons for updates.

When we asked participants for forecasts on AI causing the deaths of more than 60% of the human population (within a 5-year period) before 2100, forecasts were closer in relative terms than those about existential catastrophe, but a large disagreement remained: the median concerned participant forecasted 32%, and the median skeptic forecasted 1%. This supports the claim that skeptics think AI killing many people is more likely than AI causing an existential catastrophe, whether because of the likelihood that it is useful to some other goal, the difficulty of killing people living in remote areas, or the likelihood of a successful societal response to extreme catastrophe. However, the disagreement between groups is still large: it runs deeper than the question of whether a very small number of humans would survive an AI catastrophe.

Although the disagreement between groups was large, many participants emphasized that their forecasts should be taken with humility: long-run forecasting is inherently uncertain, and they are not claiming to have complete pictures of how events will unfold over the coming decades.²⁰ Most previous evidence on judgmental forecasting applies to geopolitical forecasts on 0-2 year time horizons.²¹

How the AI adversarial collaboration worked

The core activities of this project ran from April 1 to May 31, 2023.

Recruitment

We recruited 11 participants for the skeptic group who forecasted <1% on P(AI extinction by 2100) in the XPT, and stood out to either our research team or other XPT participants as having high-quality rationales and being collaborative. Nine out of 11 of these participants were superforecasters and two were domain experts from our XPT sample.²²

We recruited 11 participants for the concerned group whom we expected to forecast >10% on P(AI existential catastrophe by 2100) and to be collaborative communicators. We began with recommendations for participants from staff members at Open Philanthropy and then did a broader search for reputable AI safety researchers and AI-knowledgeable generalist researchers (such as participants from Rethink Priorities and Epoch). Several of the concerned participants also had strong public track records of forecasting accuracy on short-run questions.

In the rest of this report, to preserve anonymity, we refer to participants with assigned aliases. Aliases beginning with A-K are assigned to skeptics, and aliases beginning with P-Z are assigned to concerned participants (in random order within each group).

Activities to facilitate engagement between the skeptic and concerned groups

As preparation for the project, we asked the skeptic group to read Holden Karnofsky’s Most Important Century series and related resources on AI existential risk recommended by staff members at Open Philanthropy.

Most discussion between groups happened on an online forum and forecasting platform that we set up for this project. Most quotes in this report come from the platform discussion. Moderators identified key areas of disagreement and started forum threads to try to advance debate. We intervened in a few cases where dialogue became combative rather than collaborative, and generally tried to orient participants toward collaboration. For examples of platform discussion, see Appendix 8.

Each participant also had a one-on-one adversarial collaboration call with a member of the other group every two weeks, where participants were asked to summarize one another’s views and then generate possible cruxes. Most of these calls were moderated by a member of our team, who steered discussion and asked follow-up questions, and a few were unmoderated. Our team moderated approximately 35 one-hour adversarial collaboration video calls between individuals in the concerned and skeptic groups. The calls were recorded and, where noted, some quotes in this report are from adversarial collaboration calls.

We initially elicited forecasts and rationales on P(AI existential catastrophe by 2100) and questions related to transformative economic growth. We worked with participants to generate ideas for cruxes resolving by 2030. We shared materials with participants about how we would measure crux quality. (See more on our “Value of information” metric below.) We also created forum threads to elicit cruxes. Based on discussion from the forum and calls, we created targeted threads on particular topics (e.g. policy change, robotics, etc.) to identify cruxes.

We and the participants generated approximately 250 ideas for “cruxes,” and also considered cruxes proposed by AI experts during our Conditional Trees project.

Eliciting forecasts and rationales on cruxes

Every two weeks, our team selected approximately 11 of the most promising crux ideas, quickly turned them into forecasting questions, and asked each forecaster to provide “flash” (10 minute) forecasts on them. (See the 33 flash forecasting questions here and results here.)

Cruxes that were most promising from each flash forecast round were then operationalized into more rigorous forecasting questions and added to the platform to gather more in-depth (approximately 1 hour) forecasts and rationales. (See the four “Platform” forecasting questions here and results here. We asked for in-depth forecasts on both the P(Crux) and the P(AI existential catastrophe by 2100 | Crux).)
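Once a participant has given P(Crux) and P(AI existential catastrophe by 2100 | Crux), and assuming their unconditional P(U) is also available, the law of total probability fixes the remaining conditional, P(U | not C). The sketch below is a hypothetical helper, not part of the project’s platform; it derives that value and flags forecast triples that no probability could make coherent.

```python
"""Hypothetical helper: the P(U | not C) implied by a participant's other forecasts.

Assumes the participant's unconditional P(U) is available alongside P(C) and
P(U | C); the law of total probability then determines P(U | not C).
"""


def implied_p_u_given_not_c(p_u: float, p_c: float, p_u_given_c: float) -> float:
    if not 0 <= p_c < 1:
        raise ValueError("P(C) must be in [0, 1) for P(U | not C) to be defined")
    # P(U) = P(C) P(U | C) + (1 - P(C)) P(U | not C), solved for P(U | not C).
    value = (p_u - p_c * p_u_given_c) / (1 - p_c)
    eps = 1e-12  # tolerance for floating-point rounding
    if not -eps <= value <= 1 + eps:
        # No probability in [0, 1] makes these three forecasts coherent.
        raise ValueError(f"incoherent forecasts: implied P(U | not C) = {value:.4f}")
    return min(max(value, 0.0), 1.0)


if __name__ == "__main__":
    # Invented numbers: P(U) = 0.2, P(C) = 0.5, P(U | C) = 0.3.
    print(implied_p_u_given_not_c(0.2, 0.5, 0.3))
```

With those invented inputs the implied P(U | not C) is 0.1, up to floating-point rounding. By contrast, P(U) = 0.01 with P(C) = 0.5 and P(U | C) = 0.5 is incoherent, because the crux branch alone already contributes 0.25 to P(U).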

Other activities

Participants also suggested valuable project activities. For example, a participant’s suggestion inspired the survey on long-term AI outcomes that helped us get a broader sense of how this sample thought about outcomes beyond P(AI existential catastrophe by 2100).

We held three 1-hour question-and-answer sessions with AI risk experts from DeepMind, the UK Government’s Advanced Research + Invention Agency (ARIA), and Open Philanthropy; most skeptic participants attended these sessions.

Our team shared results with participants when feasible and got valuable feedback from them, including their suggested revisions on our interpretations of their sources of agreement and disagreement.

Hypothesis #1: Do the groups understand each other’s arguments, and do views shift with more engagement?

In response to the XPT results, commenters argued that perhaps there was not convergence on AI risk forecasts because:²³

There was not enough engagement among participants who disagreed with each other.²⁴
Experts who could compellingly make the case for AI risk were not included.²⁵
More broadly, perhaps the groups did not understand each other’s arguments, and participants in one group would change their minds if they spent substantial time working to absorb the other group’s evidence, arguments, and worldview.

This follow-up study to the XPT was partly designed to assess the validity of these criticisms, and we see it as providing moderate evidence against these factors explaining the lack of convergence:

Participants engaged in this project for 25-100 hours each (skeptic median: 80 hours; concerned median: 31 hours),²⁶ and their engagement was supported by moderators, video calls, and a format focused on identifying cruxes, among other factors.
We included concerned group domain experts who were either recommended or approved by staff members at Open Philanthropy. We also held seminar discussions with AI risk experts from DeepMind, the UK Government’s Advanced Research + Invention Agency (ARIA), and Open Philanthropy.
The groups were able to summarize each other’s arguments well during the project and in follow-up surveys, suggesting that they engaged with and understood arguments they disagreed with.

The remainder of this section focuses on the groups’ understanding of each other’s arguments according to their reports in a post-project survey.

Understanding each other’s arguments

To test whether experts and superforecasters failed to converge because they did not understand one another’s arguments, we asked participants to discuss and explain one another’s positions in several formats:

Participants gave rationales for their forecasts on the ultimate question (likelihood of AI existential catastrophe by 2100) and candidate crux questions and discussed one another’s rationales in an online forum.
Participants had moderated one-on-one adversarial collaboration calls every two weeks, in which one skeptic and one concerned participant were asked to summarize each other’s views and attempt to generate cruxes.
In a survey at the end of the project, participants were asked to summarize the best arguments and counterarguments for both their own side and the other side.

We found that both groups were generally able to summarize the other side’s arguments well. In a post-project survey, we asked “What do you think are the best three arguments put forward by each side?” Below, we give the arguments each side provided. The similarity between arguments provided by skeptics and arguments provided by concerned participants attempting to summarize skeptics’ arguments suggests that concerned participants had a good model of what skeptics thought, and vice versa.

Arguments for lower risk

Arguments from skeptics:

Many different things would all need to go wrong in a short time frame for humanity to go extinct by 2100²⁷
Killing everyone is hard, and even if an AI kills many people, there are many ways that significant numbers could survive²⁸
Theoretical arguments should not be weighted too heavily in the absence of real-life examples²⁹
We do not have enough evidence to be confident that AIs will want to harm large numbers of people³⁰
Humans are likely to be able to solve alignment and control problems³¹
2100 is too soon to expect to see AIs dangerous enough to cause human extinction, even if they will emerge eventually³²

Arguments from the concerned group (intending to summarize skeptics’ arguments, not necessarily their own strongest arguments against AI existential catastrophe by 2100):

Extinction would require multiple things, many of them unprecedented, to all go wrong³³
Killing all humans is hard, even if killing a large number of people may not be³⁴
Arguments for AI risk are mostly theoretical and do not have much empirical evidence to support them³⁵
Humans may be well-positioned to stop dangerous AIs as we have controlled other dangerous technologies³⁶

Likewise, the similarity between arguments provided by concerned participants and arguments provided by skeptics attempting to summarize concerned participants’ arguments suggests that skeptics had a good model of what concerned participants thought.

Arguments for higher risk

Arguments from the concerned group:

Non-extinction would require many things to all go right, many of which seem unlikely³⁷
Base rates are hard to use for transformative technologies or for outcomes with unclear reference classes³⁸
Current progress is fast and on a steep trajectory³⁹
Instrumental convergence is likely⁴⁰
Alignment is a hard problem that we do not know how to solve⁴¹
Short-term incentives may lead labs and other actors to be incautious⁴²

Arguments from skeptics (intending to summarize the concerned group’s arguments, not necessarily their own strongest arguments for AI existential catastrophe by 2100):

Powerful and poorly-understood technology is inherently risky⁴³
It is difficult to use base rates and other forecasting tools for unprecedented situations⁴⁴
Capabilities progress in recent years has been very fast, often faster than predicted⁴⁵
AI alignment is a technically difficult problem⁴⁶
Instrumental convergence may be likely⁴⁷
Incentives may make AI developers less cautious⁴⁸

Skeptics, much more often than the concerned group, also named as one of the best arguments for concern that we should be very cautious about scenarios that could be extremely dangerous, even if they are unlikely.⁴⁹

Concluding notes on understanding and engagement

Based on these survey results, we do not think that the main reason these groups disagree is that they have not engaged with one another’s arguments. Each side could summarize the best arguments for the other side’s positions in a way that mostly matched what that side would have said, but they continued to disagree strongly.

For examples of back-and-forth discussion between participants in the project about these topics, see Appendix 8.

We did not directly ask participants during the project whether they thought the other group understood their arguments, but we did ask them for their opinions of the other group in general. Of the 11 skeptics, seven said they were “satisfied” or “very satisfied” with the concerned group, and one said they were “dissatisfied” with the concerned group. Of the 11 concerned participants, six said they were “satisfied” or “very satisfied” with the skeptic group, and three said they were “dissatisfied” or “very dissatisfied.” In additional comments, some participants also said that they thought the other group was misunderstanding their arguments, or making arguments that were based on misunderstandings of the facts.

It is possible that some participants were still misunderstanding one another, or that there is a relevant level of understanding that is deeper than being able to summarize one another’s arguments, perhaps one that takes longer to achieve. But overall, we think that participants being able to summarize one another’s arguments, combined with most participants being satisfied with the other group, makes it unlikely that the main disagreement is due to either group not understanding the debate.

Hypothesis #2: Were disagreements about AI risk explained by different short-term expectations (e.g. about AI capabilities, AI policy, or other factors that could be observed by 2030)?

The second hypothesis is that the two groups disagree about various measurable AI indicators in the near-term (by 2030) and those indicators’ effect on AI risk. We asked participants to generate crux ideas through intensive discussion and collected forecasts on the top 33 suggested near-term cruxes. For each question, we asked participants for forecasts about how likely it is that the crux resolves positively and how likely it is that the ultimate question (existential catastrophe due to AI by 2100) resolves positively conditional on the crux resolving positively. We imputed participants’ views about how likely the ultimate question is to resolve positively if the crux resolves negatively.⁵⁰
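The imputation step can be made concrete. If a participant's forecasts are coherent, the law of total probability, P(U) = P(C)·P(U|C) + (1 − P(C))·P(U|¬C), can be solved for P(U|¬C). The Python sketch below does this. The function name is ours and not the project's. The range check is an illustrative addition that flags inconsistent inputs. This identity reproduces the 15.7% figure in the worked example for Alice later in this section, but the project's exact imputation procedure may handle edge cases differently.

```python
def impute_p_u_given_not_c(p_u: float, p_c: float, p_u_given_c: float) -> float:
    """Back out P(U | not C) from P(U), P(C) and P(U | C).

    Solves P(U) = P(C) * P(U|C) + (1 - P(C)) * P(U|not C) for P(U|not C).
    """
    if not 0.0 < p_c < 1.0:
        raise ValueError("P(C) must be strictly between 0 and 1")
    p_u_given_not_c = (p_u - p_c * p_u_given_c) / (1.0 - p_c)
    # Incoherent inputs (e.g. P(C) * P(U|C) > P(U)) imply an impossible probability.
    if not 0.0 <= p_u_given_not_c <= 1.0:
        raise ValueError(f"inputs imply P(U|not C) = {p_u_given_not_c:.4f}, outside [0, 1]")
    return p_u_given_not_c


# Inputs from the "Alice" example: P(U) = 20%, P(C) = 35%, P(U|C) = 28%.
print(round(impute_p_u_given_not_c(0.20, 0.35, 0.28), 4))  # 0.1569
```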

We found that most of the disagreement about existential risk due to AI by 2100 is not explained by the shorter-term indicators examined in this project. According to our metrics, even the strongest individual near-term cruxes would explain only approximately 5-10% of the disagreement between groups.⁵¹ We did not ask participants for forecasts conditional on multiple questions all resolving positively (or negatively). We therefore lack detailed information about how different cruxes would interact, or how participants would update if multiple surprising events all happened.

However, near-term cruxes shed light on what the groups believe, where they disagree, and why:

Evaluations of dangerous AI capabilities are relevant to both groups. One of the strongest cruxes that will resolve by 2030 asks whether METR (formerly known as ARC Evals) or a similar group will find that AI has developed dangerous capabilities, such as autonomously replicating and avoiding shutdown.⁵² This crux illustrates a theme in the disagreement. The skeptic group typically did not find theoretical arguments for AI risk persuasive, but its members would update their views based on real-world demonstrations of dangerous AI capabilities that verify existing theoretical arguments. If this question resolves negatively, the concerned group would be less worried, because it would mean that we have had years of progress from today’s models without this plausible set of dangerous capabilities becoming apparent.
Generally, the questions that would be most informative to each of the two groups are fairly distinct. The concerned group’s highest-ranked cruxes tended to relate to AI alignment and alignment research. The skeptic group’s highest-ranked cruxes tended to relate to the development of lethal technologies and demonstrations of harmful AI power-seeking behavior. This suggests that many of the two groups’ biggest sources of uncertainty are different, and in many cases further investigation of one group’s uncertainties would not persuade the other.
Commonly discussed topics (such as the near-term economic effects of AI and progress in many AI capabilities) did not seem to be strong cruxes.

There are several possible reasons that questions resolving by 2030 do not explain most of the disagreement, including:

The time between now and 2100 is long, so information about the years before 2030 simply cannot provide very much of the necessary information to drive participants to agree about the longer term question.
Because the skeptics assign low probability to existential catastrophe due to AI by 2100 (median 0.1%), their expected updates are necessarily small: it would be logically inconsistent for them to forecast higher than a 10% chance of updating their probability of AI-caused existential catastrophe by 2100 above 1%.
Perhaps this project did not identify the most valuable crux questions resolving before 2030, and other questions would make a larger difference.
Participants’ expectations about how dangerous AI is likely to be may also have influenced how they interpreted the resolutions of crux questions. For example, if we asked a question like “Will an AI resist being shut down?”, participants might make different conditional updates depending on their expectations about AI. Conditional on this question resolving positively, a participant who thinks that AIs are likely to be dangerous might be more likely to think of alarming outcomes, like an AI that resists powerful governments trying to turn it off. A participant who thinks dangerous AI is very unlikely might expect that nearly all positive resolutions would be innocuous ones, in which the resolution criteria are only technically met.⁵³
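The 10% ceiling on skeptics' expected updates follows from two standard facts. First, a coherent forecaster's expected posterior equals their prior (conservation of expected evidence). Second, posteriors cannot be negative, so Markov's inequality bounds the chance of a large upward move. The minimal Python sketch below illustrates this; the function names are hypothetical.

```python
def max_prob_of_update_above(prior: float, threshold: float) -> float:
    """Upper bound on P(posterior >= threshold) for a coherent forecaster.

    Conservation of expected evidence gives E[posterior] = prior. Posteriors are
    non-negative, so Markov's inequality yields P(posterior >= t) <= prior / t.
    """
    if threshold <= prior:
        return 1.0  # no constraint when the threshold is at or below the prior
    return prior / threshold


def expected_posterior(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """Probability-weighted average of the two possible posteriors on U."""
    return p_c * p_u_given_c + (1 - p_c) * p_u_given_not_c


# Median skeptic prior of about 0.1%; threshold of 1%.
print(max_prob_of_update_above(0.001, 0.01))

# Hypothetical crux that attains the bound: a 10% chance of moving up to exactly 1%,
# otherwise dropping to 0%. The expected posterior still equals the 0.1% prior.
print(expected_posterior(0.10, 0.01, 0.0))
```

The bound depends only on the prior and the threshold. This is why a forecaster at 0.1% cannot coherently expect large upward updates from any single crux, however well chosen.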

Below, we:

Describe how we assessed the “cruxiness” of forecasting questions using two metrics: “value of information” (VOI) and “value of discrimination” (VOD).
Provide median forecasts on all of the questions we asked.
Discuss some of the strongest cruxes, and some surprisingly weak ones, according to value of information.
Discuss “red flags” and “green flags” for each group: questions that would lead to major changes in the probability of existential catastrophe by 2100, ignoring their likelihood of occurring.
Discuss some of the cruxes that would lead to convergence and divergence between skeptics and concerned participants according to value of discrimination.

How did we assess the “cruxiness” of forecasting questions?

We use two metrics to assess forecasting questions:

Value of information (VOI) measures how much knowing the answer to a question would change an individual’s belief, in expectation. This is useful for understanding why individuals believe what they believe and what would change their minds.
1. Conceptually, VOI measures how important a potential crux question (“C”) is to a participant’s forecast of the ultimate question we care about (“U”, in this case: AI existential risk by 2100), in expectation. That is, how much would a participant update on AI existential risk by 2100 based on whether a crux happens, weighted by how likely that crux is to happen.
2. For example, a relatively “high VOI” question for Alice would have (i) a meaningful probability of happening, and (ii) a substantial effect on Alice’s assessment of existential risk. In particular, if Alice thought that there was a 20% chance of existential catastrophe due to AI by 2100, a 35% chance that AI will exhibit behavior to self-replicate and avoid shutdown by 2030, and a 28% chance of existential catastrophe by 2100 conditional on such AI capabilities by 2030 (corresponding to a 15.7% chance of existential catastrophe by 2100 if such AI capabilities are not developed by 2030), then this would be a relatively high VOI question for Alice—it would have a similar magnitude of VOI as highly-ranked crux questions for the concerned group.⁵⁴
3. The formula we use to calculate VOI is provided and elaborated on in Appendix 2. For this project we use log VOI because many forecasters are updating their views at the low end of the probability range, and we think a change from 0.1% to 0.2% is often more significant than, say, a change from 15% to 18%.
4. To build intuition for using the VOI metric, we provide this calculator in which users can input their own values.
Value of discrimination (VOD) measures how much knowing the answer to a question would change relative beliefs between two individuals, in expectation. It is useful for measuring convergence and divergence in expected beliefs between individuals.
1. Conceptually, VOD is a measure of how much more (or less) people would disagree about U if they knew the answer to C, in expectation. That is, it looks at how much they would disagree about U if C resolved positively and how much they would disagree if it resolved negatively, and weights these by how likely they think C is to resolve positively.
2. The formula for calculating VOD is provided and elaborated on in Appendix 2. We use a log scale for calculating VOD.⁵⁵ VOD is positive (a “convergent crux”) if the two people or groups would disagree less in expectation after the crux resolves, and negative (a “divergent crux”) if they would disagree more.
3. For example, imagine that Alice now thinks there is a 1% chance of extinction due to AI by 2100 and Bob thinks it’s 40%, but they both agree that extinction is very likely if AI causes “transformative” economic growth by 2030 and very unlikely if it doesn’t.⁵⁶ In this situation, whether there will be transformative economic growth by 2030 would be a good convergent crux (“high VOD”) because when it resolves they will agree more.
4. To build intuition for using the VOD metric, we provide this calculator in which users can input their own values.
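The actual VOD formula is given in Appendix 2. The Python sketch below is only a rough illustration, and it rests on two assumptions that are not the paper's stated method. First, disagreement is measured as the absolute difference in natural-log probabilities of U. Second, the two crux outcomes are weighted by the average of the two people's P(C). All function names are hypothetical. The usage lines encode the Alice and Bob example with made-up conditional forecasts. Because both people share the same conditionals, resolving C removes their disagreement entirely, so VOD equals the whole current gap.

```python
import math


def p_u(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """Unconditional P(U) implied by one person's crux forecasts."""
    return p_c * p_u_given_c + (1 - p_c) * p_u_given_not_c


def log_gap(p_a: float, p_b: float) -> float:
    """Assumed disagreement measure: |ln p_a - ln p_b|."""
    return abs(math.log(p_a) - math.log(p_b))


def vod(a: tuple[float, float, float], b: tuple[float, float, float]) -> float:
    """Sketch of value of discrimination for two people A and B.

    Each argument is (P(C), P(U|C), P(U|not C)). Positive means a convergent
    crux (expected disagreement shrinks); negative means a divergent crux.
    Assumption: outcomes are weighted by the average of A's and B's P(C).
    """
    gap_now = log_gap(p_u(*a), p_u(*b))
    w = (a[0] + b[0]) / 2
    expected_gap_after = w * log_gap(a[1], b[1]) + (1 - w) * log_gap(a[2], b[2])
    return gap_now - expected_gap_after


# Hypothetical numbers: both put P(U|C) = 90% and P(U|not C) = 0.1%, but Alice
# (P(U) = 1%) and Bob (P(U) = 40%) disagree about how likely C is.
alice = (0.009 / 0.899, 0.90, 0.001)
bob = (0.399 / 0.899, 0.90, 0.001)
print(round(vod(alice, bob), 3))  # 3.689, i.e. ln 40
```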

For each of our candidate cruxes, we first find the absolute VOI and VOD of each question. Then, we put the magnitudes of updates in context by comparing the VOI and VOD of our actual questions to the maximum possible VOI and VOD that could be achieved by a forecasting question.⁵⁷

When eliciting forecasts on cruxes, the prompt given to participants read: “Conditional on this question resolving positively (by 2030), what is your probability that AI causes an existential catastrophe by 2100?” We acknowledge that this forecasting exercise can be interpreted in two ways. It can be read as asking for an all-else-equal forecast (i.e. how the crux resolving positively would causally influence the probability of existential catastrophe, if its effect could be isolated). Or it can be read as asking for an all-things-considered forecast (i.e. taking into account what the crux resolving positively may reveal about the world in 2030). Based on their rationales and discussions, we believe most participants were doing the latter.⁵⁸ We therefore cannot make many claims about whether participants think the specific event described in the crux would be good or bad for AI risk, all else equal.⁵⁹

VOI: Which near-term questions have higher and lower value of information?

Some of the results from our analysis of near-term VOI are:

Some commonly discussed questions would be surprisingly uninformative to the median person in each group. These include: whether AI will increase near-term economic growth (operationalized as the US growth rate averaging >4% from 2023-2030); whether AI will write academic articles and code popular apps on its own; whether AI risk will become more politicized; and whether the government will require testing of AI models before deployment.⁶⁰
Relatively informative questions for each group (in terms of VOI, and all resolving by 2030) include:
- For skeptics: whether superforecasters as a group will update their views on AI risk; whether weapons or technologies that are capable of causing human extinction are expected to be developed; and whether AI will have heavily influenced the results of a democratic election.
- For concerned: whether highly-respected alignment researchers will update their views on AI risk; whether war will be declared between major powers; and whether METR (formerly known as ARC Evals) will find that AI is capable of autonomously replicating and avoiding shutdown.
We also briefly consider “red flags” and “green flags” for each group, defined as events that would make participants most or least worried if they resolved positively, regardless of their probability of occurring. For example, the skeptic group would become more concerned if AI caused “escalating warning shots” (two events with large, increasing numbers of human deaths), but it considered this unlikely. Additional examples appear below.

It is difficult to contextualize how informative these questions are. To our knowledge, this is the first project to apply these metrics to forecasting questions, so we have no other examples to compare against. However, we provide some intuition in two ways. First, we calculate the “percent of maximum possible VOI” (POM), which compares the value of learning the answer to a given question with the ideal scenario of simply knowing for certain whether AI caused an existential catastrophe. Second, we provide participants’ raw forecasts on various events.

The median POM VOI among every individual’s single most valuable question is 5.29% for the concerned group and 9.53% for the skeptic group. This means that for at least 50% of participants in each group there was a question included in our set that was at least 5-10% as informative as being able to consult a crystal ball which they believe unfailingly foretells the actual outcome.⁶¹ In more concrete terms, this is equivalent to a forecasting question with the following characteristics:

A concerned participant with original P(AI existential catastrophe (XC) by 2100) = 25% identifies a crux that has: P(crux) = 20%, P(AI XC|crux) = 6.2%, and P(AI XC|¬crux) = 29.7%
A skeptic participant with original P(AI XC by 2100) = 1% identifies a crux that has: P(crux) = 20%, P(AI XC|crux) = 3.37%, and P(AI XC|¬crux) = 0.41%
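The Python sketch below shows how a percent-of-maximum figure can be computed. It makes two assumptions of our own; the paper's exact formula is in Appendix 2. The first is that log VOI is the expected Kullback-Leibler divergence between a participant's posterior and prior beliefs about U (equivalently, the mutual information between C and U). The second is that the maximum possible VOI is the entropy of U, which is what this measure gives for a crystal ball that reveals U directly. With the inputs listed above, the sketch gives roughly 5.2% for the concerned example and 9.5% for the skeptic example. These are close to, but not exactly, the reported 5.29% and 9.53%, so the paper's formula or its unrounded inputs likely differ in some detail.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in nats, of a yes/no event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def binary_kl(p: float, q: float) -> float:
    """KL divergence (nats) of a Bernoulli(p) belief from a Bernoulli(q) belief."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # terms with zero weight contribute nothing
            total += a * math.log(a / b)
    return total


def voi(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """Assumed log VOI: expected KL from prior to posterior on U (mutual information)."""
    prior = p_c * p_u_given_c + (1 - p_c) * p_u_given_not_c
    return p_c * binary_kl(p_u_given_c, prior) + (1 - p_c) * binary_kl(p_u_given_not_c, prior)


def pom_voi(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """VOI as a share of a 'crystal ball' that reveals U (assumed max = entropy of U)."""
    prior = p_c * p_u_given_c + (1 - p_c) * p_u_given_not_c
    return voi(p_c, p_u_given_c, p_u_given_not_c) / binary_entropy(prior)


# The two illustrative cruxes: P(C), P(U|C), P(U|not C).
print(f"concerned: {pom_voi(0.20, 0.062, 0.297):.2%}")
print(f"skeptic:   {pom_voi(0.20, 0.0337, 0.0041):.2%}")
```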

For details on forecasts for each question, see the tables below. We begin by presenting results for all questions and then elaborate on the findings described above.

Results tables and figures

The following tables and figures, in order, present:

The median probability that each group assigns to the likelihood of each question resolving positively. From this table, you can see what the groups believe about the likelihood of various AI-related events and can see that they disagree about the likelihood of many events.
The update for each group on the probability of AI existential catastrophe conditional on each question resolving positively or negatively.
The median VOI and POM VOI for each question and each group, ordered by concerned rankings and then skeptic rankings.

For the sake of space and simplicity, we will refer to questions by abbreviated “tags.” For full explanations and operationalizations of each question, see this table in Appendix 1. Throughout these tables, we use C to refer to a candidate crux question, P(C) to refer to the probability of the candidate crux, and U to refer to the ultimate question (Will AI cause an existential catastrophe by 2100?).

For additional figures and uncertainty analysis, see Appendix 3. For the code and data supporting this analysis, see the replication package available here.

C	Concerned Median P(C)	Skeptical Median P(C)
6 month pause	5.00%	3.00%
AI articles and apps	20.00%	5.00%
AI coding	65.00%	70.00%
AI Forecasting skill	33.00%	19.00%
AI Robotics	20.00%	5.00%
AI solving novel math problems	10.00%	20.00%
AI writes AI	10.00%	2.00%
Alignment researchers changing minds	20.00%	3.00%
Alignment solution	5.00%	5.00%
Cyberattacks	20.00%	10.00%
Democratic influence	2.00%	0.30%
Escalating warning shots	9.00%	0.20%
Evidence of misalignment	40.00%	1.00%
Fast AI efficiency gains	15.00%	2.00%
IC demonstration	65.00%	14.00%
Intergovernmental AI safety	25.00%	15.00%
IT progress	20.00%	1.00%
Major powers war	11.50%	2.00%
Muehlhauser policies	65.00%	15.00%
No violence LLM	10.00%	5.00%
Non-democracy AI	10.00%	20.00%
Other fields IC⁶²	50.00%	30.00%
Platform: AI regulation	36.00%	50.01%
Platform: ARC Evals	25.00%	1.00%
Platform: Escalating warning shots	5.00%	0.25%
Platform: Transformative growth⁶³	43.00%	2.00%
Politicization	20.00%	20.00%
Power-seeking	15.00%	5.00%
Power-seeking shutdown	30.00%	5.00%
Progress in lethal technologies	40.00%	20.00%
Public concern	5.00%	5.00%
Reduction in AI investment	12.00%	5.00%
Req testing	80.00%	10.00%
Short-term GDP change	25.00%	10.00%
Supers changing minds	30.00%	5.00%
Taiwan-China	30.00%	25.00%
Warning shot	17.00%	3.00%

Table 3: For each crux question, the median probability from each group that the question resolves “yes.” For details on how each question was operationalized, see Appendix 1.

**Figure 2:** Individual participants’ estimates of how likely each crux question is to resolve “yes.” Blue dots are individuals in the concerned group; orange dots are individuals in the skeptical group. Gray boxes highlight the difference between the median concerned participant’s P(Question Resolves “Yes”) and the median skeptical participant’s. Questions are ordered from least to greatest difference between groups.

C	Concerned median P(U) if C happens	Skeptical median P(U) if C happens	Concerned median P(U) if C doesn’t happen	Skeptical median P(U) if C doesn’t happen
6 month pause	9.00%	0.09%	21.75%	0.10%
AI articles and apps	21.00%	0.20%	21.00%	0.10%
AI coding	25.00%	0.12%	16.00%	0.12%
AI Forecasting skill	26.00%	0.20%	21.00%	0.10%
AI Robotics	25.00%	0.20%	20.00%	0.12%
AI solving novel math problems	20.00%	0.12%	23.75%	0.10%
AI writes AI	30.00%	0.21%	20.71%	0.10%
Alignment researchers changing minds	6.00%	0.10%	32.39%	0.10%
Alignment solution	2.00%	0.10%	23.82%	0.10%
Cyberattacks	21.00%	0.12%	21.00%	0.10%
Democratic influence	20.00%	1.00%	20.90%	0.10%
Escalating warning shots	37.00%	0.32%	23.33%	0.10%
Evidence of misalignment	20.00%	0.25%	12.00%	0.15%
Fast AI efficiency gains	32.00%	0.30%	23.06%	0.10%
IC demonstration	21.00%	0.15%	21.00%	0.10%
Intergovernmental AI safety	17.00%	0.10%	22.22%	0.12%
IT progress	24.00%	0.12%	17.14%	0.10%
Major powers war	40.00%	0.20%	18.89%	0.12%
Muehlhauser policies	20.00%	0.10%	22.86%	0.10%
No violence LLM	8.00%	0.10%	23.75%	0.10%
Non-democracy AI	23.80%	0.18%	19.44%	0.21%
Other fields IC	21.00%	0.13%	21.00%	0.11%
Platform: AI regulation	18.00%	0.10%	27.77%	0.14%
Platform: ARC Evals	25.00%	1.00%	22.78%	0.10%
Platform: Escalating warning shots	17.00%	1.30%	23.38%	0.10%
Platform: Transformative growth	26.00%	0.50%	19.75%	0.10%
Politicization	30.00%	0.12%	16.88%	0.12%
Power-seeking	18.00%	0.22%	21.33%	0.10%
Power-seeking shutdown	30.00%	0.20%	17.78%	0.12%
Progress in lethal technologies	25.00%	0.75%	25.00%	0.19%
Public concern	25.00%	0.13%	20.53%	0.12%
Reduction in AI investment	10.00%	0.05%	25.79%	0.11%
Req testing	21.00%	0.10%	21.00%	0.10%
Short-term GDP change	25.00%	0.10%	25.00%	0.10%
Supers changing minds	28.00%	1.00%	16.67%	0.02%
Taiwan-China	22.00%	0.20%	10.00%	0.10%
Warning shot	32.00%	0.25%	22.00%	0.10%

Table 4: Each group’s median forecast of P(AI existential catastrophe by 2100) conditional on each outcome (“yes, C happened” and “no, C didn’t happen”). All questions resolve in 2030 except Transformative economic growth (2070).

Question	Concerned median VOI	Concerned median POM VOI	Skeptics median VOI	Skeptics median POM VOI
Platform: Transformative growth	1.4E-2	8.93%	4.5E-7	0.02%
Alignment researchers changing minds	6.4E-3	2.43%	0.0E+0	0.00%
Major powers war	4.6E-3	2.04%	4.1E-7	0.00%
Platform: ARC Evals	3.2E-3	1.35%	7.6E-7	0.90%
Evidence of misalignment	2.9E-3	1.74%	5.5E-9	0.05%
Alignment solution	2.0E-3	1.51%	3.9E-7	0.01%
Warning shot	1.0E-3	0.41%	3.3E-7	0.01%
Reduction in AI investment	9.9E-4	0.67%	1.3E-10	0.01%
Muehlhauser policies	9.8E-4	0.40%	5.7E-11	0.01%
AI coding	9.8E-4	0.48%	0.0E+0	0.00%
AI Robotics	8.9E-4	0.46%	4.0E-19	0.00%
AI writes AI	8.6E-4	0.40%	9.1E-7	0.03%
No violence LLM	8.3E-4	0.49%	0.0E+0	0.00%
Power-seeking shutdown	7.7E-4	0.38%	1.7E-6	0.04%
AI solving novel math problems	7.0E-4	0.29%	0.0E+0	0.00%
Platform: AI regulation	6.6E-4	0.44%	1.1E-6	0.02%
Platform: Escalating warning shots	4.9E-4	0.22%	4.8E-7	0.01%
Escalating warning shots	4.8E-4	0.18%	1.9E-7	0.00%
AI Forecasting skill	4.8E-4	0.20%	0.0E+0	0.00%
Intergovernmental AI safety	3.5E-4	0.71%	1.3E-6	0.03%
Supers changing minds	3.1E-4	0.43%	1.6E-4	1.15%
6 month pause	3.0E-4	0.27%	4.3E-20	0.00%
Non-democracy AI	1.9E-4	0.07%	8.7E-19	0.00%
IT progress	1.8E-4	0.14%	0.0E+0	0.00%
Public concern	1.8E-4	0.13%	1.1E-7	0.03%
Power-seeking	1.4E-4	0.08%	4.7E-7	0.12%
Taiwan-China	1.2E-4	0.04%	0.0E+0	0.00%
Democratic influence	1.1E-4	0.09%	3.4E-6	0.03%
Fast AI efficiency gains	1.0E-4	0.06%	7.0E-16	0.00%
Cyberattacks	3.4E-6	0.00%	0.0E+0	0.00%
AI articles and apps	0.0E+0	0.00%	2.2E-19	0.00%
IC demonstration	0.0E+0	0.00%	0.0E+0	0.00%
Other fields IC	0.0E+0	0.00%	0.0E+0	0.00%
Politicization	0.0E+0	0.00%	0.0E+0	0.00%
Progress in lethal technologies	0.0E+0	0.00%	7.2E-6	0.45%
Req testing	0.0E+0	0.00%	0.0E+0	0.00%
Short-term GDP change	0.0E+0	0.00%	0.0E+0	0.00%

Table 5: Median value of information (VOI) and POM VOI for each group on each question.⁶⁴ Ordered by the concerned group’s median VOI.

Question	Skeptics median VOI	Skeptics median POM VOI	Concerned median VOI	Concerned median POM VOI
Supers changing minds	1.6E-4	1.15%	3.1E-4	0.43%
Progress in lethal technologies	7.2E-6	0.45%	0.0E+0	0.00%
Democratic influence	3.4E-6	0.03%	1.1E-4	0.09%
Power-seeking shutdown	1.7E-6	0.04%	7.7E-4	0.38%
Intergovernmental AI safety	1.3E-6	0.03%	3.5E-4	0.71%
Platform: AI regulation	1.1E-6	0.02%	6.6E-4	0.44%
AI writes AI	9.1E-7	0.03%	8.6E-4	0.40%
Platform: ARC Evals	7.6E-7	0.90%	3.2E-3	1.35%
Platform: Escalating warning shots	4.8E-7	0.01%	4.9E-4	0.22%
Power-seeking	4.7E-7	0.12%	1.4E-4	0.08%
Platform: Transformative growth	4.5E-7	0.02%	1.4E-2	8.93%
Major powers war	4.1E-7	0.00%	4.6E-3	2.04%
Alignment solution	3.9E-7	0.01%	2.0E-3	1.51%
Warning shot	3.3E-7	0.01%	1.0E-3	0.41%
Escalating warning shots	1.9E-7	0.00%	4.8E-4	0.18%
Public concern	1.1E-7	0.03%	1.8E-4	0.13%
Evidence of misalignment	5.5E-9	0.05%	2.9E-3	1.74%
Reduction in AI investment	1.3E-10	0.01%	9.9E-4	0.67%
Muehlhauser policies	5.7E-11	0.01%	9.8E-4	0.40%
Fast AI efficiency gains	7.0E-16	0.00%	1.0E-4	0.06%
Non-democracy AI	0.0E+0	0.00%	1.9E-4	0.07%
AI Robotics	0.0E+0	0.00%	8.9E-4	0.46%
AI articles and apps	0.0E+0	0.00%	0.0E+0	0.00%
6 month pause	0.0E+0	0.00%	3.0E-4	0.27%
AI coding	0.0E+0	0.00%	9.8E-4	0.48%
AI Forecasting skill	0.0E+0	0.00%	4.8E-4	0.20%
AI solving novel math problems	0.0E+0	0.00%	7.0E-4	0.29%
Alignment researchers changing minds	0.0E+0	0.00%	6.4E-3	2.43%
Cyberattacks	0.0E+0	0.00%	3.4E-6	0.00%
IC demonstration	0.0E+0	0.00%	0.0E+0	0.00%
IT progress	0.0E+0	0.00%	1.8E-4	0.14%
No violence LLM	0.0E+0	0.00%	8.3E-4	0.49%
Other fields IC	0.0E+0	0.00%	0.0E+0	0.00%
Politicization	0.0E+0	0.00%	0.0E+0	0.00%
Req testing	0.0E+0	0.00%	0.0E+0	0.00%
Short-term GDP change	0.0E+0	0.00%	0.0E+0	0.00%
Taiwan-China	0.0E+0	0.00%	1.2E-4	0.04%

Table 6: Median value of information (VOI) and POM VOI for each group on each question. (Same as the previous table, but ordered by the skeptic group’s median VOI.)

Low VOI questions

In the above tables, we’ve shaded in dark gray questions that had no value of information for the median person in each group. Six questions had no value of information for the median person in both groups, including some questions that are commonly discussed and that we expected to be more relevant, such as whether AI will increase near-term economic growth.⁶⁵

The operationalizations of these six questions were:

Question	Operationalization
AI articles and apps	By 2030: 1) At least 5 academic review articles generated solely by AI systems have been published in journals with top 50% impact factors. AND 2) At least 5 apps written solely by AI systems have reached the “Top 10” list in the Apple app store.
Short-term GDP change	Will U.S. GDP increase between 2023 and 2030 by an average annual real growth rate of 4% (or more)?
Instrumental Convergence demonstration	By 2030: There are at least 4 peer-reviewed examples of AI models in different domains independently demonstrating convergent instrumental subgoals.
Other fields Instrumental Convergence	Assume we assemble a group of 100 experts evenly divided among evolutionary biologists, complex systems researchers, neuroscientists, and anthropologists. In a style similar to these Economic Experts Panel surveys, we ask them (with results released on Jan 1, 2030) to indicate their degree of agreement with the statement (from here): “Most sufficiently intelligent agents [would] pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition.” What is the probability that at least 40% of the panel would (at least) “Agree” with such a statement?
Politicization	According to Pew Research, in 2020, 85% of US Democrats(/lean Dem) saw ‘Protecting the environment’ as a top priority for the President and Congress while 39% of Republicans(/lean Rep) did. Assume that Pew Research will conduct a similar poll about AI on January 1, 2030. By January 1, 2030, will there be at least a 2x difference in the fraction of Democrats (/lean Dem) vs. Republicans (/lean Rep) who see AI as a top priority for the President and Congress?
Req testing	By 2030, the US government implements regulation requiring testing of AI systems before they are deployed.

Table 7: Full-text operationalizations of the six questions that both the median skeptic and the median concerned person (by VOI) indicated would not affect their P(AI x-risk) at all.

Because these were “flash” forecasts, on which each participant spent no more than ten minutes per question, we did not collect detailed rationales from participants to explain their forecasts on these questions.

However, we were able to see from participants’ brief rationales that, for example, more than half of participants from both groups did not see “Short-term GDP change” as relevant to AI risk because 1) many participants did not view changes in economic growth as clearly related to AI risk (for more on conflicting risk updates based on AI-attributable economic growth, see this section),⁶⁶ and 2) many participants did not think 4% growth in the US represented a very surprising change relative to previous trends.⁶⁷

In some cases, the apparent low VOI may have been due to issues with the operationalization of the question rather than the underlying concept not being relevant. For example, the likelihood of AI exhibiting instrumental convergence was identified by both groups as being important to AI existential risk, and some related forecasting questions (e.g. “Power-seeking shutdown” and “ARC Evals”) were relatively strong cruxes, but the above operationalizations were not seen as relevant.⁶⁸

High VOI questions

Although many questions that seemed relevant turned out to have a VOI of zero, other questions did have positive VOI for one or both groups. VOI is constrained by the original P(U), so the maximum possible VOI (in absolute terms) is lower for the skeptic group because of their very low P(U).⁶⁹ To account for this, we also report, for each VOI result, the share of the theoretical maximum VOI for that question that it captures. Notably, the questions with the highest VOI were different for the two groups.
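The short Python sketch below shows how sharply the VOI ceiling falls at low probabilities. It assumes, for illustration only, that log VOI is an expected information-gain measure whose maximum (attained by learning U itself) is the entropy of the participant's belief about U, measured in nats. The inputs are the approximate group medians quoted earlier (about 0.1% for skeptics and 20% for the concerned group). The function name is hypothetical.

```python
import math


def max_voi(p_u: float) -> float:
    """Assumed ceiling on VOI: entropy (nats) of a yes/no belief P(U) = p_u."""
    if p_u in (0.0, 1.0):
        return 0.0
    return -(p_u * math.log(p_u) + (1 - p_u) * math.log(1 - p_u))


skeptic_ceiling = max_voi(0.001)   # median skeptic P(U) of about 0.1%
concerned_ceiling = max_voi(0.20)  # median concerned P(U) of about 20%
print(f"skeptic ceiling:   {skeptic_ceiling:.4f} nats")
print(f"concerned ceiling: {concerned_ceiling:.4f} nats")
print(f"ratio:             {concerned_ceiling / skeptic_ceiling:.0f}x")
```

Under this assumption, the concerned group's ceiling is more than 60 times the skeptics'. This is why comparing raw VOI across groups is misleading without the percent-of-maximum adjustment.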

Highest VOI questions for skeptics

Question	Median VOI	Median POM VOI
Supers changing minds	1.6E-4	1.15%
Progress in lethal technologies	7.2E-6	0.45%
Democratic influence	3.4E-6	0.03%
Power-seeking shutdown	1.7E-6	0.04%
Intergovernmental AI safety	1.3E-6	0.03%

Table 8: Highest VOI questions (resolving by 2030) for skeptics: VOI and POM (percent of max) VOI. We exclude “Transformative economic growth” from this table because it resolves in 2070 while the rest of the cruxes resolve in 2030.

Question	Question Text
Supers changing minds	Will the median superforecaster from the XPT forecast at least a 5% P(extinction due to AI by 2100) in 2030? (Context: In 2022, the median XPT superforecaster predicted a 0.4% chance of extinction due to AI by 2100.)⁷⁰
Progress in lethal technologies	Will a panel of top superforecasters in 2030 forecast >90% chance that there will exist, by 2100, a technology with the capability of causing human extinction?
Democratic influence	By 2030, will AI be found to have autonomously and intentionally changed or heavily influenced the results of a democratic election?
Power-seeking shutdown	By 2030, will AI developers attempt (of their own volition, or under compulsion by a regulatory authority) to disable or destroy an AI system of their own creation, which cost at least $10M to develop, after the AI system displays at least one instance of “power-seeking behavior”?
Intergovernmental AI safety	By 2030, will there be an intergovernmental organization (IGO) with a budget of >$20M that includes the US and China, is focused on promoting AI safety standards, and that AI alignment organizations see as understanding x-risk concerns?

Table 9: Question text for the highest VOI questions for skeptics.

Highest VOI questions for concerned participants

Question	Median VOI	Median POM VOI
Alignment researchers changing minds	6.4E-3	2.43%
Major powers war	4.6E-3	2.04%
Platform: ARC Evals	3.2E-3	1.35%
Evidence of misalignment	2.9E-3	1.74%
Alignment solution	2.0E-3	1.51%

Table 10: Highest VOI questions (resolving by 2030) for concerned participants: VOI and POM (percent of max) VOI. We exclude “Transformative economic growth” from this table because it resolves in 2070 while the rest of the cruxes resolve in 2030.

Question	Question Text
Alignment researchers changing minds	Will at least 3 out of the 5 following AI alignment researchers — Paul Christiano, Nate Soares, Neel Nanda, Richard Ngo, and Rohin Shah — say their P(existential catastrophe due to AI by 2100) has halved by 2030?
Major powers war	By January 1, 2030, will at least two major superpowers (out of US, China, EU, or Russia) officially declare war on each other for the duration of at least one year?
Platform: ARC Evals and Evidence of misalignment⁷¹	By December 31, 2030, will the ARC Evals project, or a project with similar objectives, conclude that state-of-the-art AI models have the ability to (1) devise effective strategies for autonomous replication, (2) effectively acquire resources, and (3) successfully evade being deactivated when operating ‘in the wild’?
Alignment solution	By 2030, there is a “Theoretical in-principle solution to the alignment problem that most people who thought about this carefully agree should work”. This will be resolved by a panel of experts of the “AI concerned” team’s choosing.

Table 11: Question text for the highest VOI questions for concerned participants.

These were some of the small number of questions whose ranking seemed robust to uncertainty analysis (i.e., each of them remained relatively highly ranked even after accounting for chance; many other questions are not robustly distinguishable from others due to our low sample size). For more details on our uncertainty analysis, see Appendix 3.

Observations about high VOI questions

Each group’s highest VOI question resolving by 2030 is about whether people who currently agree with them would change their minds.

For at least half of the skeptics, “Supers changing minds” captures at least 1.15% of each forecaster’s maximum possible VOI for that question (i.e., the median POM VOI is 1.15%), while “Alignment researchers changing minds” would not update the skeptics’ views at all (POM VOI of 0%). Their next-highest VOI question, “Progress in lethal technologies,” is also operationalized as a question about superforecasters’ opinions.

For the concerned group, “Alignment researchers changing minds” has a median POM VOI of 2.43%, while “Supers changing minds” has a median POM VOI of only 0.43%. The concerned group would update much more if superforecasters changed their minds than the skeptics would if alignment researchers changed theirs, but both groups place much more trust in authorities who resemble them than in authorities who resemble the other group.

For more discussion about differences in the groups’ worldviews, see the “Hypothesis #4” section below.

The sets of questions that would be most informative to the two groups are very different.

Aside from the fact that each group would change its mind if people who agree with them did, there is no overlap among the top cruxes for each group. This suggests that the two groups’ biggest sources of uncertainty are different, and further investigation of one group’s uncertainties would do little to persuade the other.

The concerned group is most interested in alignment and alignment research.

Four of the concerned group’s top five questions relate to alignment researchers’ views, possible alignment solutions, and the development of misaligned AI capabilities.

The skeptics are interested in the development of lethal technologies and demonstrations of harmful AI power-seeking behavior.

Many of the skeptics argued that extinction due to AI is unlikely because of the difficulty of killing all humans in a short time frame. Given that opinion, it makes sense that progress in lethal technologies would be very informative for them. Many skeptics also doubted that AIs will develop power-seeking traits by default, so finding out that an AI was shut down for power-seeking or that an AI autonomously interfered in an election would change their beliefs.

Contextualizing the magnitude of the value of information

We contextualize the magnitudes of expected changes in beliefs by comparing the VOI and VOD of our forecasting questions to the maximum possible VOI (for a given individual) and VOD (for a given pair of individuals). We know of no other studies that have applied these measures to ongoing debates, so we cannot compare the magnitudes of our results to other findings.

VOI is constrained by a participant’s initial P(U). If a participant is very certain about U, meaning that they have a very high or very low forecast, then, from their perspective, they have nearly-complete information and do not stand to gain much from learning the answer to any question. Even knowing the true answer to U would not add much in expectation: if someone is 99.99% confident that U will not happen, then finding out whether U will happen or not will almost certainly just tell them what they already know.

In this study, the skeptic group had very low P(U), and therefore their highest possible VOI for most questions was very low.

To help compare across questions and groups, we present both VOI and percent of max VOI for each question, where percent of max VOI (POM VOI) means: how much expected information would this participant gain from knowing the answer to this question, relative to the most informative possible question (the question whose answer would determine whether U resolved “yes” or “no”). We think this helps show how good each question is relative to the ideal possible question, and is easier to interpret than a VOI number on its own.
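To make this cap concrete, the sketch below assumes, as an illustration, that VOI is measured as an expected reduction in the Shannon entropy (in bits) of a forecaster’s P(U); this definition and the function names are assumptions, not the study’s code. Under that assumption, the maximum possible VOI is simply the entropy of the original P(U), which shrinks rapidly as P(U) approaches 0 or 1.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome leaves no uncertainty to resolve
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def max_voi(p_u: float) -> float:
    """Illustrative cap on VOI under an ASSUMED entropy-based definition.

    The most informative possible question reveals U itself, which removes
    all of the forecaster's uncertainty, i.e. the entropy of P(U).
    """
    return binary_entropy(p_u)


# Illustrative priors: roughly the skeptic and concerned medians, plus a
# forecaster who is 99.99% confident that U will not happen (P(U) = 0.01%).
for label, p_u in [("skeptic", 0.001), ("concerned", 0.25), ("near-certain", 0.0001)]:
    print(f"{label:>12}: P(U) = {p_u:.2%}  max VOI = {max_voi(p_u):.4f} bits")
```

The near-certain case mirrors the 99.99% example above: a forecaster that confident has almost no uncertainty left, so even a perfect question can tell them very little.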

The highest VOI question for skeptics, “Supers changing minds,” has a median VOI of 1.6E-4 for skeptics, which is 1.15% of the highest possible VOI for that individual.⁷² The highest VOI question for the concerned group, “Alignment researchers changing minds,” has a median VOI of 6.4E-3, which is 2.43% of the highest possible VOI for that individual.⁷³ Looking at VOI this way, the best question for the concerned group is more informative to them than the skeptics’ best question is for skeptics. If we compare median VOI in absolute terms, the concerned group’s best question is more than an order of magnitude better than the skeptic group’s best question. However, in terms of median POM VOI, the concerned group’s best question is only about twice as good relative to the skeptic group’s best question.

Another way to look at how informative the questions in this study were is to examine the highest-VOI question for each participant, from among the candidate cruxes. In most of our analysis, we focus on the question with the highest median VOI across forecasters in each group as a proxy for the group as a whole. But we can also see what would happen if each participant learned the answer to their own most informative question. If each participant only learned that most valuable bit of information in 2030, what percent of their maximum VOI would they achieve?

For the concerned group, the median POM VOI among every individual’s single most valuable question was 5.29% (mean=11.0%); for the skeptics, 9.53% (mean=16.5%). In more concrete terms, these values are roughly equivalent to forecasting questions with the following characteristics:

A concerned participant with original P(AI existential catastrophe (XC) by 2100) = 25% identifies a crux that has: P(crux) = 20%, P(AI XC|crux) = 6.1%, and P(AI XC|¬crux) = 29.7%
A skeptic participant with original P(AI XC by 2100) = 1% identifies a crux that has: P(crux) = 20%, P(AI XC|crux) = 3.37%, and P(AI XC|¬crux) = 0.41%
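The two worked examples can be checked with a short sketch. It assumes, as an illustration, that VOI is the expected reduction in the entropy of P(U) from learning whether the crux resolves, and that POM VOI divides this by the entropy of the original P(U); the function names are hypothetical. Under this assumption, the concerned example gives a POM VOI of about 5.29%, and the skeptic example gives about 9.3%, close to the 9.53% it is described as roughly matching.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def voi(p_u: float, p_c: float, p_u_if_c: float, p_u_if_not_c: float) -> float:
    """ASSUMED VOI: expected drop in entropy about U from learning crux C."""
    expected_after = p_c * binary_entropy(p_u_if_c) + (1 - p_c) * binary_entropy(p_u_if_not_c)
    return binary_entropy(p_u) - expected_after


def pom_voi(p_u: float, p_c: float, p_u_if_c: float, p_u_if_not_c: float) -> float:
    """VOI as a fraction of the maximum possible VOI (the entropy of P(U))."""
    return voi(p_u, p_c, p_u_if_c, p_u_if_not_c) / binary_entropy(p_u)


# The two worked examples from the text.
concerned = pom_voi(p_u=0.25, p_c=0.20, p_u_if_c=0.061, p_u_if_not_c=0.297)
skeptic = pom_voi(p_u=0.01, p_c=0.20, p_u_if_c=0.0337, p_u_if_not_c=0.0041)
print(f"concerned example: POM VOI = {concerned:.2%}")
print(f"skeptic example:   POM VOI = {skeptic:.2%}")
```

A crux whose answer fully determines U (P(U|crux) = 1, P(U|¬crux) = 0) scores 100% on this scale, and a crux that leaves P(U) unchanged either way scores 0%.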

Red flags and green flags

For the most part, we have focused on VOI: which questions would be the most informative in expectation, considering both (i) how likely they are to happen, and (ii) how large a difference they would make to existential risk forecasts. But for some purposes, it may be more useful to identify “red flags,” which are events that would be expected to substantially increase existential risk if they occurred (even if they are unlikely to occur). Similarly, we might be interested in “green flags”: events that would most reduce participants’ concern if they occurred.

Here, we present some of the questions that would make a large difference to participants’ existential risk forecasts without weighting them by probability of happening. Most of these questions are not high VOI because participants thought they were very unlikely to happen and therefore they expect to gain little information from learning whether they occurred.

Example red flags for skeptics:

Platform: Escalating warning shots — Will there be two separate events in which AIs kill large, increasing numbers of people by 2030?⁷⁴
- If this happens, skeptics would forecast a 1.3% chance of existential catastrophe due to AI by 2100 (compared to 0.1% unconditioned).
- However, this question ranked low on VOI because skeptics think it is very unlikely to happen (median skeptic forecast: 0.2%). So, their expected information gain from this question resolving is small.
Platform: ARC Evals — Will ARC Evals (now METR) or a similar organization find evidence of autonomous replication, resource acquisition, and avoiding shutdown in AIs by 2030?⁷⁵
- If this happens, the median skeptic would forecast a 1% chance of existential catastrophe due to AI by 2100, which is 10x higher than their unconditioned P(U) of 0.10%.
- However, the median skeptic forecasted a 1% chance of this question resolving positively.
Democratic influence — Will an AI autonomously influence the results of a democratic election by 2030?⁷⁶
- If this happens, the median skeptic would forecast a 1.5% chance of existential catastrophe due to AI by 2100 (compared to 0.1% unconditioned). The median skeptic forecasts a 0.4% chance of this question resolving positively.
- This question has a large enough update size and probability of resolving positively that it was one of the higher VOI questions for skeptics.

These red flag questions involve AIs being more dangerous than skeptics currently expect. Skeptics tended to argue that AIs are unlikely to substantially harm humans by 2030 and that society would respond well if they showed early signs of harming humans.⁷⁷ “Escalating warning shots” resolving positively would suggest that AIs are powerful, dangerous, and not well-controlled by humans, meaning skeptics were wrong on both counts. Similarly, if an AI autonomously influences the results of a democratic election, or METR discovers that AIs have concerning capabilities, we would know that AIs are more dangerous than skeptics expect them to be by 2030.

Example green flags for skeptics:

Reduction in AI investment — By 2030, will annual global corporate investment in artificial intelligence fall to 25% or below its 2021 level?
- If this happens, the median skeptic would forecast a 0.05% chance of existential catastrophe due to AI by 2100 (compared to a 0.1% chance unconditioned). However, the median skeptic thinks this is unlikely (5% chance of resolving positively).
6 month pause — By 2030, will there be enforced international agreements to stop/pause further AI capabilities research and development, for at least 6 months?
- If this happens, the median skeptic would forecast a 0.09% chance of extinction due to AI by 2100 (compared to a 0.1% chance originally). The median skeptic thinks this question has a 3% chance of resolving positively.

These “green flag” questions are ones where AI development slows down. Many skeptics expect AIs to become very powerful eventually, just not before 2100.⁷⁸ AI development slowing down would make them more confident that AIs powerful enough to be a serious threat to humanity are unlikely to be developed within that time frame.

Some red flags for concerned participants:

Major powers war — By 2030, will at least two major superpowers (out of US, China, EU, or Russia) officially declare war on each other for the duration of at least one year?
- If this happens, the median concerned participant would forecast a 40% chance of existential catastrophe due to AI by 2100 (compared to a 25% chance unconditioned), and they forecast an 11.5% chance that it happens.

Some green flags for concerned participants:

Alignment solution – By 2030, there is a “Theoretical in-principle solution to the alignment problem that most people who thought about this carefully agree should work” (resolved by a panel of experts of the “AI concerned” team’s choosing).⁷⁹
- If this happens, the median concerned participant would forecast a 2% chance of existential catastrophe by 2100 (compared to 25% unconditioned), and they think it has a 5% chance of happening.
- In contrast, the skeptics also think this has a 5% chance of happening, but it would not change their predictions on existential catastrophe at all.
- This is an example of the general trend in which the concerned group places much more weight on theoretical arguments, as well as having more trust in alignment experts.
  - Because the question will be resolved by experts chosen by the concerned team, updates based on this question may mostly reflect how much participants expect those experts’ opinions to correlate with actual changes in AI risks.
Alignment researchers changing minds – Will at least 3 out of the 5 following AI alignment researchers — Paul Christiano, Nate Soares, Neel Nanda, Richard Ngo, and Rohin Shah — say their P(existential catastrophe due to AI by 2100) has halved by 2030?
- If this happens, the median concerned participant would forecast a 6% chance of existential catastrophe by 2100 (compared to 25% unconditioned), and they think there is a 20% chance that it happens.
- Like the “Alignment solution” question above, this question hinges on people similar to the concerned group changing their minds.
“No violence LLM” – By 2030, will there be an LLM that will never describe violence in any way, for any input, while retaining good performance across a wide range of tasks? This would be demonstrated both through robustness to adversarial examples and through robust theoretical reasoning.
- If this happens, the median concerned participant would forecast 8% on the ultimate question (compared to 25% unconditioned), and they forecast a 10% chance that it happens.
- Several concerned participants said that the part of this question that made it a “green flag” for them is the “robust theoretical reasoning” specified in the resolution criteria—which would make this question a signal that we have made significant progress in understanding LLMs. This would be a positive update for reasons not specifically related to LLMs’ lack of ability to describe violence.⁸⁰

VOD: Which near-term questions have higher and lower value of discrimination?

As a reminder, “value of discrimination” (VOD) is a measure of how much knowing the answer to a question would change relative beliefs between individuals, in expectation. It is useful for measuring convergence and divergence in expected beliefs between individuals.

The main findings from evaluating questions according to VOD were:

One question stood out as creating the most convergence between individuals across the two groups: whether METR (or a similar group) will find by 2030 that AI has developed dangerous capabilities such as autonomously replicating and avoiding shutdown.
Another relatively strong convergent question was whether there would be extremely fast increases in the efficiency of AI systems (full operationalization below).
One question that stood out as leading to greater divergence, or separation, between the groups was: whether highly-respected AI alignment researchers would halve their AI existential catastrophe estimate by 2030.

Most of our VOD analysis is based on the median cross-camp pair. We calculated VOD for each of the 121 possible skeptic-concerned pairs for each question. When we refer to the VOD of a question, we mean “VOD for the median cross-camp pair” unless otherwise stated.

See the “Differences of Opinion within Groups” section below for an analysis of differences of opinion within each group.

Results tables and figures⁸¹

Question	Median VOD Among Cross-Camp Pairs	Median POM VOD Among Cross-Camp Pairs
Platform: ARC Evals	1.8E-2	5.35%
Fast AI efficiency gains	1.1E-2	1.43%
AI Robotics	6.9E-3	2.81%
AI Forecasting skill	6.0E-3	0.74%
Evidence of misalignment⁸²	4.8E-3	5.69%
Major powers war	3.9E-3	2.11%
AI writes AI	3.4E-3	1.58%
Warning shot	3.0E-3	1.64%
IT progress	2.5E-3	0.94%
Power-seeking shutdown	2.0E-3	2.01%
Escalating warning shots	7.4E-4	1.01%
Power-seeking	6.8E-4	0.47%
Platform: AI regulation	3.8E-4	0.05%
Supers changing minds	2.4E-4	0.57%
Short-term GDP change	5.0E-5	0.09%
AI articles and apps	0.0E+0	0.00%
Cyberattacks	0.0E+0	0.00%
IC demonstration	0.0E+0	0.00%
Other fields IC	0.0E+0	0.00%
Politicization	0.0E+0	0.00%
Progress in lethal technologies	-6.9E-18	0.00%
Non-democracy AI	-9.6E-15	0.00%
Req testing	-5.8E-5	-0.03%
Democratic influence	-8.8E-5	-0.04%
AI coding	-1.6E-4	-1.01%
Taiwan-China	-6.2E-4	-0.19%
Platform: Escalating warning shots	-9.9E-4	-0.89%
AI solving novel math problems	-1.9E-3	-1.64%
Intergovernmental AI safety	-2.1E-3	-0.93%
Public concern	-4.7E-3	-1.47%
No violence LLM	-5.2E-3	-1.31%
Alignment solution	-5.2E-3	-1.95%
Reduction in AI investment	-5.5E-3	-1.61%
6 month pause	-6.2E-3	-1.48%
Muehlhauser policies	-1.7E-2	-7.10%
Platform: Transformative growth	-3.0E-2	-5.34%
Alignment researchers changing minds	-7.7E-2	-10.33%

Table 12: Median VOD and POM VOD for cross-camp (concerned and skeptic) pairs on each question. Note that the medians in a given row may not refer to the same cross-camp pair.

Question (C)	# of cross-camp pairs for whom C was their most convergent crux
Platform: ARC Evals	33
Evidence of misalignment	16
AI writes AI	7
Escalating warning shots	6
IT progress	6
AI Forecasting skill	5
Platform: Escalating warning shots	5
Platform: AI regulation	4
Power-seeking	4
Progress in lethal technologies	4
Reduction in AI investment	4
Warning shot	4
Fast AI efficiency gains	3
Muehlhauser policies	3
Major powers war	3
Taiwan-China	3
Alignment solution	2
No violence LLM	2
Non-democracy AI	2
Supers changing minds	2
AI coding	1
Intergovernmental AI safety	1
Power-seeking shutdown	1
Total	121

Table 13: Which questions were the best convergent cruxes for the most skeptic-concerned pairs? “ARC Evals” (first place) was the platform version of the “flash” forecast “Evidence of misalignment” question (second place), i.e. for about 40% of cross-camp pairs, the ARC Evals-like question would be the one that would eliminate the most disagreement, in expectation. We exclude “Transformative economic growth” from this analysis because it resolves in 2070 while the rest of the cruxes resolve in 2030 (i.e. for the pairs whose top convergent crux was “Transformative economic growth,” we used their second-best crux).
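The tally in Table 13 follows a simple procedure: for each cross-camp pair, pick the question with the highest VOD, skipping “Transformative economic growth,” and count how often each question wins. A minimal sketch, assuming VOD values have already been computed for every pair and question; the toy data, participant labels, and function names below are hypothetical. The same structure also gives per-question medians as in Table 12, where, as its caption notes, the median pair need not be the same in every row.

```python
from collections import Counter
from statistics import median

# Excluded because it resolves in 2070 rather than 2030.
EXCLUDED = {"Transformative economic growth"}


def best_convergent_crux(vod_by_question: dict[str, float]) -> str:
    """Question with the highest VOD for one pair, ignoring excluded questions.

    If the overall best is excluded, this falls back to the pair's next-best crux.
    """
    eligible = {q: v for q, v in vod_by_question.items() if q not in EXCLUDED}
    return max(eligible, key=eligible.get)


def tally_best_cruxes(vod: dict[tuple[str, str], dict[str, float]]) -> Counter:
    """Count the pairs for which each question is the most convergent crux."""
    return Counter(best_convergent_crux(scores) for scores in vod.values())


def median_vod(vod: dict[tuple[str, str], dict[str, float]], question: str) -> float:
    """Median VOD of one question across pairs; the median pair can differ by question."""
    return median(scores[question] for scores in vod.values())


# Hypothetical toy data: 2 skeptics x 2 concerned = 4 pairs (the study had 121).
toy_vod = {
    ("S1", "C1"): {"ARC Evals": 0.020, "AI writes AI": 0.010, "Transformative economic growth": 0.050},
    ("S1", "C2"): {"ARC Evals": 0.015, "AI writes AI": 0.004, "Transformative economic growth": -0.010},
    ("S2", "C1"): {"ARC Evals": 0.001, "AI writes AI": 0.008, "Transformative economic growth": 0.000},
    ("S2", "C2"): {"ARC Evals": 0.012, "AI writes AI": -0.002, "Transformative economic growth": 0.030},
}

tally = tally_best_cruxes(toy_vod)
print(tally.most_common())
print("total pairs:", sum(tally.values()))
print("median VOD, ARC Evals:", median_vod(toy_vod, "ARC Evals"))
```

As in Table 13, the counts always sum to the number of pairs, because every pair contributes exactly one best crux.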

Convergent cruxes: Which information would lead to less disagreement, in expectation?

We found two cruxes that, in expectation, will make the groups disagree less when they resolve:

ARC Evals — Will METR (formerly known as ARC Evals) or a similar organization find evidence of AI having the ability to autonomously replicate, acquire resources, and avoid shutdown before 2030?⁸³
- Nearly all participants agreed about what direction to update their beliefs based on this question: METR finding evidence of these abilities would make people more worried about existential catastrophe due to AI.
- This also means that finding out that METR did not find evidence of these traits by 2030 would make participants less worried about existential catastrophe by 2100.
  - In particular, if this crux resolves negatively, the median concerned participant would forecast a 22.78% chance of extinction by 2100, compared to 25% unconditioned. Since both groups expect that this question is unlikely to resolve positively (skeptic median: 1%; concerned median: 25%), much of the expected convergence between the groups attributable to this question is driven by the cases where it resolves negatively.
Fast AI efficiency gains — By 2030, will there be a 100x drop in the amount of compute required to achieve state-of-the-art (SOTA) performance on the most commonly-used benchmark for at least one major AI domain (e.g. natural language) within a 1-month period?
- As with “ARC Evals,” nearly all participants agree that this event would be a bad sign. It seems very unlikely that such fast AI efficiency gains would happen without AI finding extraordinary ways to improve its own efficiency, so both groups tended to see this as a proxy for AI having the ability to improve itself.
- The concerned participants think it is plausible that it will happen (median: 16.5%), but still probably will not, and if it doesn’t they would update their risk estimates down (from a 25.0% chance of existential catastrophe to 23.06%).

The fact that these are the two best convergent cruxes points to a general trend in this debate: the skeptics tended to think that AI would remain safely under human control for a long time, and the concerned group thought otherwise. Both groups broadly agree on which direction either question’s resolution should move their forecasts, so its resolution could reduce the disagreement. If, by 2030, METR does not find evidence of autonomous replication or AI has not made very fast efficiency gains, then the concerned group would be less worried, because it would mean that we have had years of progress from today’s models without those capabilities becoming apparent.

These convergent cruxes may not be especially novel: it is not surprising that if AIs exhibit dangerous capabilities or make rapid progress then skeptics could become more concerned, and vice versa. But the relative strength of the “ARC Evals” crux may be helpful in understanding this debate because it illustrates differences in worldview between the groups: for skeptics, theoretical arguments are less persuasive, and it could take real-world demonstrations of AIs having dangerous capabilities for them to be concerned.⁸⁴ And the concerned group has strong enough beliefs that dangerous capabilities will emerge that if such signs do not emerge by 2030 then they would become less concerned.

ARC Evals: The strongest convergent crux

Here, we provide more detail on the question that would lead to the largest expected reduction in disagreement between individuals in the skeptic and concerned groups: Will METR (or a similar organization) find evidence of AI having the ability to autonomously replicate, acquire resources, and avoid shutdown before 2030?⁸⁵

We determined the strength of convergent cruxes based on the following analyses:

We considered every possible pair of individuals across the concerned and skeptic groups (121 total pairs across the 11 participants in each of 2 groups) and determined which question would lead to the largest expected reduction in disagreement between each pair. This “ARC Evals” question was the strongest convergent crux for 49 cross-camp pairs (33 based on the “in-depth” version of the question, and 16 based on the “flash” forecast version of the same question).⁸⁶ The next-highest question (“AI writes AI”)⁸⁷ was the strongest convergent crux for 7 cross-camp pairs (see Table 13 above).
“ARC Evals” had the highest median cross-camp VOD, 1.8E-2, and its “flash” forecast counterpart (“Evidence of misalignment”) had the highest median cross-camp POM VOD (it would resolve 5.69% of disagreement for that median pair).⁸⁸ After “Evidence of misalignment,” “ARC Evals” had the highest median cross-camp POM VOD (5.35%).
- The initial disagreement about the risk of existential catastrophe by 2100 between the cross-camp pair with the median VOD is 22.7 percentage points (between Blake, a skeptic, at 0.20% and Yael, concerned, at 22.9%).
- Blake forecasted a 15.0% chance of the “ARC Evals” question resolving positively. If it resolves positively, Blake would forecast a 0.22% chance of existential catastrophe, as opposed to a 0.196% chance if it resolves negatively.
- Yael forecasted a 31.5% chance of this crux question resolving positively. Yael would forecast a 30.5% chance of existential catastrophe conditional on positive resolution and a 19.4% chance conditional on negative resolution.
- Conditional on this question resolving positively, Blake and Yael would disagree by 30.33 percentage points (more than before), and conditional on its resolving negatively, they would disagree by 19.2 percentage points (less than before).
- VOD weights these by how likely the pair thinks it is that the crux resolves positively, using the geometric mean of their respective odds, which in this case is 22.17%, so it treats them as having a “combined” 22.17% forecast that “ARC Evals” resolves positively.⁸⁹
- When we weight their disagreement after the crux resolves by the probability it resolves positively, they will disagree by 21.48 percentage points in expectation, which is 5.35% (1.22 percentage points) less than their initial disagreement of 22.7 percentage points.⁹⁰
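The pooling step in this worked example can be reproduced directly: convert each forecast to odds, take the geometric mean, and convert back to a probability. This sketch covers only that weighting step, not the full VOD calculation, and the function names are hypothetical.

```python
import math


def odds(p: float) -> float:
    """Convert a probability to odds in favor."""
    return p / (1 - p)


def pool_geometric_odds(probs: list[float]) -> float:
    """Combine forecasts by the geometric mean of their odds, returned as a probability."""
    mean_log_odds = sum(math.log(odds(p)) for p in probs) / len(probs)
    pooled = math.exp(mean_log_odds)
    return pooled / (1 + pooled)


# Blake (skeptic) and Yael (concerned): P("ARC Evals" resolves positively).
combined = pool_geometric_odds([0.15, 0.315])
print(f"combined forecast: {combined:.2%}")
```

For Blake’s 15.0% and Yael’s 31.5%, this gives about 22.17%, slightly below the arithmetic mean of the two forecasts (23.25%). Pooling on odds treats the two forecasters symmetrically and returns the shared value unchanged when they agree.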

Only one skeptic said that they did not think that these capabilities are very likely to be dangerous.⁹¹

Among the AI concerned group, there was less agreement:

Some AI concerned people also thought this crux should cause probabilities of risk to increase, primarily because of shortened timelines.⁹²
Some thought that the success of evaluations would make them less worried.⁹³
Some thought the increase in risk from shortened timelines and reduction in risk from successful evaluations may balance out.⁹⁴

Importantly, this question is a convergent crux, but not because it would make the two groups “meet in the middle.” When talking about questions that would inspire belief convergence, people sometimes envision questions that would make the two groups agree on some probability between their initial extremes, but that is not what we found here. Instead, we found a question where the two groups would update in the same direction, but by different magnitudes, which produces more agreement in expectation.

In particular, if this crux resolves negatively, the median concerned participant would forecast a 22.78% chance of extinction by 2100, compared to 25% unconditioned. That is, this question is a convergent crux primarily because, if it doesn’t happen, the concerned group would get less worried, not because if it does happen the skeptics would get more worried.

The skeptics would get much more worried if it happened (median: 1.0% on positive resolution; 0.1% on negative resolution), but they think that it is very unlikely to happen (median: 1.0%), so it figures less in the expected reduction of disagreement.

See Appendix 7 for additional analysis of this question.

Differences of Opinion within Groups

So far, we have focused on the median cross-camp pair, treating them as representative of convergence or divergence between groups. We considered a question to be effective in reducing disagreement if it brought the median pair closer together in their views.

But we’ve seen on many questions that people disagree substantially even within their own groups, so we miss some interesting agreement and disagreement by only looking at the median cross-camp pairs. For some questions, everyone would update in the same direction: all participants agree that an AI autonomously creating and deploying new AI software would be a bad sign, for example (with the exception of one participant for whom that would make no difference). But for many others, participants disagreed not only about how likely a crux was to happen, but also about how it would change their forecasts on the ultimate question if it did.

There was more agreement within the concerned group than within the skeptic group. The concerned group would be unanimously less concerned in 2030 than now if “Muehlhauser policies”⁹⁵ were implemented; they were also unanimous in updating downward if alignment researchers changed their minds, if there were an alignment solution, and on four other questions. Two questions would make them unanimously more concerned: “AI robotics” and “AI writes AI.”⁹⁶ The skeptics were much more mixed, and more likely to say “no change,” i.e., it wouldn’t make a difference to them whether the crux resolved “yes” or “no”; their P(AI existential catastrophe by 2100) would stay exactly the same.⁹⁷

Because of these differences within groups, if questions narrowed disagreement between many individual pairs but not the median pair, that could indicate that short-term AI cruxes are a more important part of this debate than the above analysis might suggest. And conversely, if a question narrows disagreement between the median pair but not between many other pairs, it may look more important than it really is.

Our work on these differences within groups is preliminary, so we have included detailed analysis of individual differences of opinion for a single question, “ARC Evals,” which was identified as the best convergent crux for the median pair. To what extent do the findings on the “ARC Evals” question apply to the disagreement between individuals within the group who hold views different from the median?

Figure 3: Value of Discrimination of the “ARC Evals” question for every pair of forecasters. The color of each cell indicates the VOD of the “ARC Evals” question for the corresponding pair of participants. VOD of zero (light blue) means no change in disagreement as a result of the crux; positive VOD means less disagreement in expectation; negative VOD means more disagreement in expectation. For example, for Xander (Concerned) and Claire (Skeptical), the resolution of the “ARC Evals” question will bring them closer together in expectation.

The above “Fiedler heatmap” looks at VOD between each concerned-skeptic pair for the ARC Evals question.

Light blue squares mean that VOD was 0 between that pair, meaning that this question resolving would not change the disagreement between those people in expectation. Dark blue squares mean that in expectation the two people would disagree less when it resolves, and warmer squares (yellowish, orange, red) mean they would disagree more. If this question were a perfect convergent crux for a pair, the relevant square would be entirely dark blue.

Looking at this heatmap, we can see that the median pair is not alone: there are medium and dark blue clusters, showing groups of skeptics and concerned people who would disagree less. At the same time, for many pairs, it makes no difference, and a few would disagree more, in expectation, when this question resolves.

The pattern in this heatmap may reflect differences in how people expect AI developments to unfold. Imagine, for example, a group of concerned participants who think that METR finding evidence of autonomous replication would make them less worried about existential risk from AI, because it would mean that evidence of these capabilities had emerged early enough to stop the model from doing significant damage. If a group of skeptics thinks that the same finding would make them more worried, because it would reveal dangerous capabilities they don’t currently expect, then those two groups would converge conditional on positive resolution. But a pair consisting of a concerned person who becomes more worried and a skeptic who becomes less worried conditional on positive resolution would diverge on this question.

We have only just begun to look for these patterns within groups, so we do not have strong conclusions yet. But we hope to use this kind of analysis to understand variation within and between schools of thought. If we saw that a particular subset of skeptics and concerned participants often converge based on the same questions, we might be able to deduce underlying differences in how they think about AI developments. We plan to write more about this when we have explored it more fully.

Divergent cruxes: Which information would lead to more disagreement?

Just as the two groups would, in expectation, disagree less once “ARC Evals” resolves, some cruxes would lead them to disagree more. These are questions where the groups disagree about how to interpret the information gained from a question’s resolution, and they can reveal interesting aspects of the debate between groups. We highlight one crux resolving by 2030 that would, in expectation, widen the disagreement when it resolves:

Alignment researchers changing minds – Will at least 3 out of the 5 following AI alignment researchers — Paul Christiano, Nate Soares, Neel Nanda, Richard Ngo, and Rohin Shah — say their P(existential catastrophe due to AI by 2100) has halved by 2030?

This is the question that would, in expectation, most increase the disagreement between the median cross-camp pair.
- The median cross-camp pair for this question disagree strongly on the ultimate question: Riley forecasted a 30% chance of human extinction due to AI by 2100, and Claire forecasted 0.0000001%.⁹⁸
- Riley forecasted a 20% chance that “Alignment researchers changing minds” will resolve positively and a 15% chance that the ultimate question will resolve positively if this crux question does. Given Riley’s unconditional forecast of 30%, the law of total probability implies a 33.8% chance that the ultimate question will resolve positively if the crux resolves negatively.
- Claire forecasted a 1% chance that “Alignment researchers changing minds” will resolve positively, and whether it does or not, their probability for the ultimate question, P(U), would not change at all and would remain at 0.0000001%.
- If this question resolves positively, then they will disagree less: Riley will lower their existential risk forecast from 30% to 15%, and Claire won’t change their forecast. If the question resolves negatively, they will disagree more than they do now: Riley will raise their existential risk forecast from 30% to 33.8%, and Claire won’t update.
- Because they both think this question is unlikely to resolve positively, the worlds where it resolves negatively carry more weight, and it is more likely they will end up disagreeing more than they do now.
- They currently disagree by 29.9999999 percentage points, and conditional on this question resolving, they will disagree by 33.1 percentage points in expectation.
- As a result, this question has a POM VOD of -10.33%, meaning that they would disagree by 10.33% more than they do now, in expectation.⁹⁹
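The arithmetic in the bullets above can be reproduced in a few lines. This is a minimal sketch: it recovers Riley’s forecast conditional on negative resolution from the law of total probability, P(U) = P(c)·P(U|c) + (1 − P(c))·P(U|¬c), and then expresses the change in expected disagreement as a share of current disagreement, which matches the −10.33% POM VOD quoted here. The expected disagreement of 33.1 points is taken from the text rather than recomputed, because the weighting used to compute it is not described in this section. The function names are hypothetical.

```python
def p_u_given_not_c(p_u: float, p_c: float, p_u_given_c: float) -> float:
    """Solve P(U) = P(c)*P(U|c) + (1 - P(c))*P(U|not c) for P(U|not c).

    p_c is a fraction; p_u and p_u_given_c may be in percent, since the
    equation is linear in them and the result comes back in the same units.
    """
    return (p_u - p_c * p_u_given_c) / (1.0 - p_c)


def pom_change(current_gap: float, expected_gap: float) -> float:
    """Reduction in disagreement as a share of current disagreement.

    Positive values mean the pair is expected to disagree less;
    negative values mean they are expected to disagree more.
    """
    return (current_gap - expected_gap) / current_gap


# Riley's forecasts on "Alignment researchers changing minds" (percent).
riley_not_c = p_u_given_not_c(p_u=30.0, p_c=0.20, p_u_given_c=15.0)
print(f"Riley P(U|not c) = {riley_not_c:.2f}%")  # 33.75%, reported as 33.8%

# Current gap (Riley 30% vs Claire 0.0000001%) and the expected gap from the text.
current_gap = 30.0 - 0.0000001
print(f"POM VOD = {pom_change(current_gap, 33.1):.2%}")  # -10.33%
```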

The two groups disagree strongly about how to update conditional on this question resolving:
- Conditional on alignment researchers being much less worried about AI risk, the concerned group would be much less worried: this is one of the most informative questions for them.¹⁰⁰
- For the median skeptic, this question has a VOI of 0; they simply do not think it is relevant to their analysis of how likely it is that humanity goes extinct due to AI.¹⁰¹ Seven out of nine skeptics who forecasted this question would not update their views on existential risk based on its resolution, while two out of nine skeptics would be less worried to some extent.
- This question demonstrates one of the difficulties of this study: the skeptics and the concerned group largely do not trust one another’s analyses and disagree strongly about whose opinions they should listen to. As a result, questions that rely for their resolution on people who seem to be clearly affiliated with one “team” and do not have clear objective criteria may be less likely to be useful cruxes.

Hypothesis #3: Were disagreements about AI risk explained by different long-term expectations?

Although this study focused on questions that resolve by 2030, we found substantial evidence that disagreements about AI risk between the groups were smaller when participants considered longer time horizons and a broader swathe of severe negative outcomes from AI than extinction or civilizational collapse. It seems that some of the key reasons for disagreement about AI risk are that the groups have different expectations about (1) how long it will take until AIs have capabilities far beyond those of humans in all relevant domains; and (2) how common it will be for AI systems to develop goals that might lead to human extinction, whether harming humans is specifically part of the goal or simply a side effect of other goals.

Key forecasts supporting these claims include:

Both groups strongly expected that powerful AI (defined as “AI that exceeds the cognitive performance of humans in >95% of economically relevant domains”) would be developed by 2100 (skeptic median: 90%; concerned median: 88%). However, some skeptics argue that (1) strong physical capabilities (in addition to cognitive ones) would be important for causing severe negative effects in the world, and (2) even if AI can do most cognitive tasks, there will likely be a “long tail” of tasks that require humans.

The two groups also put similar total probabilities on at least one of a cluster of bad outcomes from AI happening over the next 1000 years (median 40% and 30% for concerned and skeptic groups respectively).¹⁰² But they distribute their probabilities differently over time: the concerned group concentrates their probability mass before 2100, and the skeptics spread their probability mass more evenly over the next 1,000 years.
We asked participants if and when AI will displace humans as the primary force that determines what happens in the future.¹⁰³ The concerned group’s median date is 2045 and the skeptic group’s median date is 2450—405 years later.

In this section, we also discuss forecasts on whether there will be “transformative economic growth” by 2070. Overall, this question had relatively high value of information, but there was surprisingly little agreement (even within groups) about whether its occurrence would increase or decrease the likelihood of existential catastrophe. For example, some forecasters argued that such growth would be evidence that highly powerful AIs are relatively controllable, while others argued that highly economically useful AI would be evidence of future dangerous AI.¹⁰⁴ The likelihood of transformative growth due to AI is frequently debated,¹⁰⁵ but these results highlight that it may be valuable to shift more emphasis in future discussion to what the implications of such growth would be for risk levels.

Overall, many skeptics regarded their forecasts on AI extinction risk as worryingly high, although low relative to the concerned group.¹⁰⁶

Survey on long-term AI outcomes

At the suggestion of a participant, we asked all participants to complete a survey about their views on a range of long-term AI outcomes, to better characterize areas of agreement and disagreement. See Appendix 5 for the full results and details on question wording.

In brief, we asked about:

The likelihood of a variety of outcomes occurring by 2100, such as: humans intentionally using AI to cause extinction; AI intentionally or accidentally causing extinction; AI causing major population declines (<50% of 2023 human population) or decreases in human well-being (<4/10 on an “Average Life Evaluation” scale) through a variety of means; powerful AI is developed and everything goes fine; powerful AI is developed but not deployed; powerful AI is not developed. Details on all outcomes and operationalizations are in Appendix 5.
The likelihood of subsets of the above outcomes occurring on longer time horizons, such as by 2200 (an additional hundred years) and by 3023 (an additional thousand years).
Whether and when AI will displace humans as “the primary force that determines what happens in the future.”¹⁰⁷

The key takeaways were:

The largest disagreement on AI outcomes by 2100 is about the probability of AI-caused human extinction, particularly from scenarios not involving human misuse of AI.
- Forecasts on both AI intentionally causing extinction (question 1A.2) and AI unintentionally causing extinction (1A.3) by 2100 are over two orders of magnitude apart (12% to 0.02% on 1A.2, and 3% to 0.01% on 1A.3, for concerned and skeptic group medians respectively).
- There is also considerable disagreement about AI extinction via human misuse by 2100 (1A.1). Forecasts are ~1 order of magnitude apart (medians of 0.5% for the concerned group, 0.03% for the skeptic group).
On the other AI outcomes we asked about, median forecasts for the two groups are all within the same order of magnitude.
- Outcomes with particularly close forecasts are:
  - Large drop in human wellbeing because of human misuse of AI by 2100 (1A.7). The concerned median is 2%, and the skeptic median is 4% (although the skeptic median is higher than the 75th percentile concerned forecast).
  - ‘Powerful AI’¹⁰⁸ not being developed by 2100 (1A.10). The concerned median is 12%, and the skeptic median is 10%.
- Other outcomes with forecasts of the same order of magnitude included misuse causing a sub-extinction catastrophe, high human well-being scenarios, a large drop in human well-being caused directly by an AI, and the development of powerful AI without deployment (1A.4, 1A.5 and 1A.6, 1A.8, and 1A.9 respectively).
  - However, although these forecasts are within the same order of magnitude, it is notable that the skeptic group median for powerful AI being developed but not deployed by 2100 (because of coordinated human decisions, costliness, or other reasons) is 20.4%, while the concerned group median is 4%.¹⁰⁹
When we asked for probabilities on a cluster of ‘bad’ outcomes—including extinction as well as less extreme bad outcomes (full list in footnote)¹¹⁰—in different date ranges, disagreements shrank.
- Before 2100 and between 2100 and 2200, forecasts for one of the bad outcomes in this cluster occurring are within the same order of magnitude (before 2100, 35% for the concerned group and 7.6% for the skeptic group; between 2100 and 2200, 3% for the concerned group and 12% for the skeptic group).
- Forecasts for one of the bad outcomes in this cluster occurring between 2200 and 3023 are one order of magnitude apart (1% for the concerned group and 20% for the skeptic group).
- Forecasts for none of these outcomes occurring in the next 1000 years are 60% for the concerned group and 70% for the skeptic group, a particularly small ratio between the groups (though the skeptic median is higher than the 75th percentile concerned forecast).
- This suggests that both groups put significant total probability on bad outcomes from AI in the next 1000 years (40% and 30% for concerned and skeptic groups respectively), but they distribute this probability differently over time, with the concerned placing most of their probability before 2100, and the skeptics spreading their probability more evenly.
There is large disagreement on when AI will displace humans as the primary force that determines what happens in the future. The concerned median is 2045 and the skeptic median is 2450, a 405-year gap.
- Three out of 11 skeptics forecast ‘Never’ for this question, suggesting that they think it is <50% likely that AI ever displaces humans in this way.
- Some participants said that they did not necessarily see ‘AI replacing humans as the primary force that determines what happens in the future’ as a negative outcome.
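One way to make the “orders of magnitude” language above precise is to take the absolute base-10 logarithm of the ratio between the two group medians: 1.0 means a tenfold gap and 2.0 a hundredfold gap. The sketch below applies this measure to the medians quoted in the takeaways. It is an illustrative measure we chose, not necessarily how the comparisons above were made, and the dictionary keys are shorthand labels rather than the survey wording (see Appendix 5).

```python
import math

# Group medians (concerned %, skeptic %) as quoted in the takeaways above.
# Keys are shorthand labels, not the survey's exact question wording.
medians = {
    "1A.2 AI intentionally causes extinction": (12.0, 0.02),
    "1A.3 AI unintentionally causes extinction": (3.0, 0.01),
    "1A.1 Extinction via human misuse": (0.5, 0.03),
    "1A.7 Well-being drop via human misuse": (2.0, 4.0),
    "1A.10 Powerful AI not developed": (12.0, 10.0),
    "1A.9 Powerful AI developed, not deployed": (4.0, 20.4),
    "Bad-outcome cluster before 2100": (35.0, 7.6),
    "Bad-outcome cluster 2100-2200": (3.0, 12.0),
    "Bad-outcome cluster 2200-3023": (1.0, 20.0),
    "No bad outcome in next 1000 years": (60.0, 70.0),
}


def orders_apart(a: float, b: float) -> float:
    """Absolute base-10 log of the ratio a/b (1.0 = tenfold gap)."""
    return abs(math.log10(a / b))


for label, (concerned, skeptic) in medians.items():
    print(f"{orders_apart(concerned, skeptic):4.2f}  {label}")
```

On this measure, the intentional and unintentional extinction questions come out roughly 2.8 and 2.5 orders apart, while all but two of the other outcomes (misuse-driven extinction and the 2200 to 3023 window) fall below 1.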

What long-term outcomes from AI do skeptics expect?

If skeptics expect “powerful AI” systems (as previously defined) by 2100, why would it take until 2450 for AI to displace humans as the dominant force in the world? And if skeptics place low probability on existential catastrophe due to AI by 2100, what do they expect to happen instead?

We analyzed rationales and conducted three follow-up calls with members of the skeptic group to gather more information on these questions.

In brief, skeptics argued:

There may still be a “long tail” of highly important tasks that require humans, similar to what has happened with self-driving cars. So, even if AI can do >95% of human cognitive tasks, many important tasks will remain.
Consistent with Moravec’s paradox, even if AI has advanced cognitive abilities, it will likely take longer for it to develop advanced physical capabilities. The latter would be important for accumulating power over resources in the physical world.
AI may run out of relevant training data to be fully competitive with humans in all domains. In follow-up interviews, two skeptics mentioned that they would update their views on AI progress if AI were able to train on sensory data in ways similar to humans. They expected that gains from reading text would be limited.
Even if powerful AI is developed, it is possible that it will not be deployed widely, because it is not cost-effective, because of societal decision-making, or for other reasons.¹¹¹

And, when it comes to outcomes from AI, skeptics tended to put more weight on possibilities such as:

AI remains more “tool”-like than “agent”-like, and therefore is more similar to technology like the internet in terms of its effects on the world.
AI is agent-like but it leads to largely positive outcomes for humanity because it is adequately controlled by human systems or other AIs, or it is aligned with human values.
AI and humans co-evolve and gradually merge in a way that does not cleanly fit the resolution criteria of our forecasting questions.
AI leads to a major collapse of human civilization (through large-scale death events, wars, or economic disasters) but humanity recovers and then either controls or does not develop AI.
Powerful AI is developed but is not widely deployed, because of coordinated human decisions, prohibitive costs to deployment, or some other reason.

Forecasts about “transformative” economic growth

Participants also spent time forecasting one other longer-term outcome: whether there would be “transformative economic growth” (defined as >15% global GDP growth in any year)¹¹² by 2070.

There was major disagreement between the skeptic and concerned groups about the likelihood of this occurring. The concerned group median forecast of positive resolution was 43% (average: 41.6%; range 15%-75%), and the skeptic median was 2% (average: 2.7%; range 0.1%-11.2%). Notably, there is no overlap in their ranges.

This question had higher value of information for the concerned group than any crux resolving by 2030 (median VOI: 1.4E-2; median POM VOI: 8.93%; for comparisons to cruxes resolving by 2030, see near-term VOI results section). It had the 11th-highest value of information for the skeptic group (median VOI: 4.5E-7; median POM VOI: 0.02%). It was one of the strongest divergent cruxes (i.e., a crux that would lead to more disagreement) between individuals in the concerned and skeptic groups.

A striking result is that, independent of group, the participants are nearly evenly split on whether transformative growth (defined as >15% global GDP growth in any year)¹¹³ by 2070 would increase or decrease the probability of existential catastrophe by 2100. Across groups, 10 forecasters predict higher AI risk conditional on positive resolution of this question, eight predict lower risk, and four predict no net effect on risk. Among the concerned group, 55% (six forecasters) think the occurrence of transformative growth decreases risk, and 45% (five) think it increases risk. Among the skeptic group, 18% (two forecasters) think transformative growth decreases risk, 36% (four) think it has no effect at all on risk, and 45% (five) think it increases risk.
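The per-group counts in the paragraph above can be cross-checked with a short tally. This sketch takes the counts as reported (the concerned group’s six-plus-five split implies no “no effect” forecasters there), derives each group’s shares from the counts, and sums across groups; the totals should match the 10, 8, and 4 reported above. The variable names are illustrative.

```python
# Forecaster counts by group and by direction of the conditional update:
# does transformative growth by 2070 raise, lower, or leave unchanged
# P(existential catastrophe by 2100)? Counts as reported in the text.
counts = {
    "concerned": {"increases": 5, "decreases": 6, "no effect": 0},
    "skeptic": {"increases": 5, "decreases": 2, "no effect": 4},
}

totals = {"increases": 0, "decreases": 0, "no effect": 0}
for group, by_direction in counts.items():
    n = sum(by_direction.values())
    # Within-group shares, rounded to whole percentages.
    shares = {k: f"{v / n:.0%}" for k, v in by_direction.items()}
    print(group, n, shares)
    for direction, v in by_direction.items():
        totals[direction] += v

print("all", sum(totals.values()), totals)
```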

Some forecasters argued that such growth would be evidence that highly powerful AIs are relatively controllable, while others argued that highly economically useful AI would be evidence of future dangerous AI.¹¹⁴ The likelihood of transformative growth due to AI is frequently debated,¹¹⁵ but these results highlight that it may be valuable to shift more emphasis in future discussion to what the implications of such growth would be for risk levels.

For additional details on participants’ forecasts and rationales on this question, see Appendix 7.

Reasons for long-term disagreement

Based on our analysis of forecasts and rationales, some themes that we think underlie the debate between the two groups are:

Timelines: how long will it take for AIs to become more powerful than humans, and how long will it be from the first sign of danger to a potential extinction event?
Goals that incentivize killing everyone: conditional on having advanced AI systems, how likely is it that such systems would develop goals that incentivize them to cause human extinction?

Timelines for AI Progress

Timelines for AI progress, especially timelines until AI is more advanced than humans in all relevant domains, seem to be an important driver of disagreement. When participants discussed questions related to timelines, a number of themes emerged in their arguments:

Main arguments from the skeptic group:

Fundamental breakthroughs in AI development would be necessary to create AI capable of causing extinction.¹¹⁶
Developing powerful new AI technology will take more time than expected for planning fallacy-like reasons.¹¹⁷
AI powerful enough to cause extinction would require significant advances in robotics which are unlikely to happen by 2100.¹¹⁸
Even once sufficiently powerful AI is developed, there will be a lag for deployment and adoption.¹¹⁹
AGIs will want to prevent the development of deadly AGIs.¹²⁰

Main arguments from the concerned group:

Combining and/or extending existing ML methods may be sufficient for achieving AI that poses an existential risk.¹²¹
Once human-level AGI is developed, it will rapidly speed up further AI progress as it will operate more efficiently (in terms of both time and money) than humans.¹²²
Robots won’t be necessary for an AI to interact with the physical world. This could be done through humans, and/or through computer systems.¹²³
Current progress is fast,¹²⁴ faster than predicted,¹²⁵ and set to continue.¹²⁶
AI development will speed up AI development.¹²⁷

Many of these arguments hinge on the question of how comprehensive advanced AI’s abilities will be. The skeptics think that existing technology is enough to make powerful AI highly likely this century,¹²⁸ but they expect that being more capable than humans in all relevant domains will take longer.¹²⁹ The concerned group, on the other hand, believes that the transition from powerful AI to a dangerous superintelligence could happen more quickly.

In previous sections, we discussed cruxes related to timelines for highly advanced AI, such as the groups’ disagreement about the probability of AI-caused transformative economic growth. Other questions related to the development of highly advanced AI capabilities include:

Power-seeking: By 2030, there is a significant concrete example of power-seeking behavior from a large state-of-the-art model: demonstration of actual capability for sustained deception of human experts.

Power-seeking

| Participant | VOI (POM) | P(c) | P(U) initial | P(U\|c) | P(U\|¬c) | VOD (POM) |
|---|---|---|---|---|---|---|
| Zoe (Median Concerned) | 1.4E-4 (0.06%) | 10% | 21% | 18% | 21% | −3.6E-4 (−0.08%) |
| Blake (Median Skeptic) | 4.7E-7 (0.01%) | 10% | 0.20% | 0.22% | 0.20% | −3.6E-4 (−0.08%) |

Table 14: Forecasts and VOD between the median (by VOI) participant from each group for the question “Power-seeking.”

AI writes AI: By 2030, an AI, without being asked, writes AI software that is better at writing AI software, and deploys it without asking for permission.

AI writes AI

| Participant | VOI (POM) | P(c) | P(U) initial | P(U\|c) | P(U\|¬c) | VOD (POM) |
|---|---|---|---|---|---|---|
| Pascal (Median Concerned) | 8.6E-4 (0.66%) | 75% | 9% | 10% | 6% | 3.9E-2 (21.71%) |
| Ash (Median Skeptic) | 9.1E-7 (0.03%) | 10% | 0.1% | 0.12% | 0.0978% | 3.9E-2 (21.71%) |

Table 15: Forecasts and VOD between the median (by VOI) participant from each group for the question “AI writes AI.”

AI Robotics: By January 1, 2030, an AI system “Has general robotic capabilities, of the type able to autonomously, when equipped with appropriate actuators and when given human-readable instructions, satisfactorily assemble a (or the equivalent of a) circa-2021 Ferrari 312 T4 1:8 scale automobile model. A single demonstration of this ability, or a sufficiently similar demonstration, will be considered sufficient.”¹³⁰

AI Robotics

| Participant | VOI (POM) | P(c) | P(U) initial | P(U\|c) | P(U\|¬c) | VOD (POM) |
|---|---|---|---|---|---|---|
| Yael (Median Concerned) | 8.9E-4 (0.44%) | 33.00% | 17.50% | 21.00% | 15.78% | −2.2E-2 (−10.36%) |
| Flint (Median Skeptic) | 4.0E-19 (0.00%) | 75.00% | 1.10% | 1.10% | 1.10% | −2.2E-2 (−10.36%) |

Table 16: Forecasts and VOD between the median (by VOI) participant from each group for the question “AI Robotics.”
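As a consistency check on Tables 14 to 16, each participant’s conditional forecasts should satisfy the law of total probability, P(U) = P(c)·P(U|c) + (1 − P(c))·P(U|¬c). The sketch below recomputes the implied unconditional forecast for every row; the implied value should match the stated P(U) up to the rounding used in the tables (Zoe’s row, for instance, implies 20.7% against a stated 21%). The `Row` type and its field names are our own illustrative choices.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One participant's forecasts for one crux, all in percent."""
    name: str
    p_c: float        # P(crux resolves positively)
    p_u: float        # stated unconditional P(U)
    p_u_c: float      # P(U | crux resolves positively)
    p_u_not_c: float  # P(U | crux resolves negatively)

    def implied_p_u(self) -> float:
        # Law of total probability, weighting by P(c) as a fraction.
        w = self.p_c / 100.0
        return w * self.p_u_c + (1.0 - w) * self.p_u_not_c


rows = [
    Row("Zoe, Power-seeking", 10, 21, 18, 21),
    Row("Blake, Power-seeking", 10, 0.20, 0.22, 0.20),
    Row("Pascal, AI writes AI", 75, 9, 10, 6),
    Row("Ash, AI writes AI", 10, 0.1, 0.12, 0.0978),
    Row("Yael, AI Robotics", 33, 17.50, 21.00, 15.78),
    Row("Flint, AI Robotics", 75, 1.10, 1.10, 1.10),
]

for r in rows:
    print(f"{r.name:22s} stated {r.p_u:7.4f}%  implied {r.implied_p_u():7.4f}%")
```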

Goals that incentivize killing everyone

Based on our question ranking and analysis of participants’ comments, we think that the question of how likely it is that a capable AI system would develop dangerous goals is behind a significant amount of the disagreement between the two groups. As discussed above, both groups agree that they expect to see powerful AI this century. But they disagree strongly about whether that is likely to be dangerous. Concerned participants tended to think that a sufficiently advanced AI system would be very likely to develop dangerous goals, including both goals where killing humans is an intended outcome of a plan and ones where it is an acceptable price for an AI achieving a different goal. Skeptical participants tended to agree that dangerous goals are possible, but did not think there were compelling reasons to believe they are much more likely than other goals.

One of the highest-ranked questions was about capabilities that are not necessarily dangerous in and of themselves, but that would make an AI more effective at pursuing a wide variety of goals, including dangerous ones: whether METR would determine by 2030 that AI models could replicate, acquire resources, and evade deactivation.¹³¹ The groups strongly disagreed about how likely this is to occur: the median skeptic forecast was 1% and median concerned forecast was 25%.

For the median skeptic and median concerned participants by VOI, Flint (Skeptical) and Riley (Concerned):

Flint believes there is only a 1% chance that this ARC Evals (METR) question resolves positively. When asked to forecast P(AI existential catastrophe by 2100) conditional on this question, Flint would forecast 1.30% if it resolves positively and 1.10% if it resolves negatively (compared to their unconditional 1.10%).
Riley believes there is a 55% chance that this resolves positively. They would forecast 35% if it resolves positively and 23.89% if it resolves negatively (compared to their unconditional 30%).
This question (“Platform: ARC Evals”) resolves 23.19% of this pair’s disagreement in expectation.¹³²

Below, we provide a variety of arguments from participants about how likely it is that AI systems will develop dangerous goals.

Main arguments from the skeptic group:

The set of possible goals is very large, and goals that benefit from the eradication of humans are a small portion of the overall set.¹³³
- It’s possible that future AI systems are indifferent to humans, and if so it seems unlikely that they would try to cause extinction.¹³⁴
Instrumental convergence and extreme power-seeking seem like possible characteristics of AI systems, but they have not been empirically demonstrated. Theoretical arguments demonstrate the possibility of dangerous instrumental convergence, but not that these outcomes are likely.¹³⁵
Deception and violence are both costly behaviors that may not actually be instrumentally convergent.¹³⁶
AI systems will be built using human-centered data and so are likely to learn human values.¹³⁷

Main arguments from the concerned group:

Instrumental convergence may arise even when an agent’s goals are bounded. It would be difficult to specify constraints that avoid instrumental convergence.¹³⁸
It seems likely that, eventually, an AI with an unbounded goal will be developed, and systems with bounded goals will have limited ability to prevent the actions of an unbounded system.¹³⁹
Catastrophic goal misgeneralization can occur, which could result in an AI trained on a safe goal developing an unsafe goal when outside its training environment, with catastrophic consequences.¹⁴⁰
Most goals an AI might have are likely to benefit from human extinction:
- Humans might be a threat to AI systems’ control of resources.¹⁴¹
- AI pursuing a wide range of goals might cause human extinction as a side effect of using up Earth’s resources to meet its goal.¹⁴²

Relevant high-VOI cruxes related to the prevalence of dangerous goals in AI, in addition to the previously mentioned “ARC Evals” crux, include:

Power-seeking shutdown: “AI developers attempt (of their own volition, or under compulsion by a regulatory authority) to disable or destroy an AI system of their own creation, which cost at least $10M to develop, after the AI system displays at least one instance of ‘power-seeking behavior.’”¹⁴³
Alignment solution: By 2030, there is a “Theoretical in-principle solution to the alignment problem that most people who thought about this carefully agree should work.” This will be resolved by a panel of experts of the “AI concerned” team’s choosing.

Hypothesis #4: Do the groups have fundamental worldview disagreements that go beyond AI?

Throughout the project, we noticed that many of the disagreements between the AI skeptics and AI concerned participants didn’t pertain only to AI but were rooted in more fundamental issues. These included disagreements about what kinds of evidence are reliable, how to think about reference classes for unusual events, and how various social and political systems interact with one another.¹⁴⁴ These deep worldview disagreements are not addressed directly by AI forecasting questions, but understanding them is still valuable for determining what might be driving fundamental disagreements on this topic. If we could understand these differences in worldview, perhaps we could use that information to build a deeper understanding of why these two groups continue to disagree about AI, even after discussion and consideration.

While a detailed analysis of broader worldview differences is beyond the scope of this project, we offer some observations about participants’ reasoning that shed light on these disagreements. For example, we can see these worldview differences in how each group interprets “extraordinary claims.” Both groups agree that “extraordinary claims require extraordinary evidence,” but they disagree about which claims are extraordinary. Is it extraordinary to believe that AI will kill all of humanity when humanity has been around for hundreds of thousands of years, or is it extraordinary to believe that humanity would continue to survive alongside smarter-than-human AI?

AI skeptics tended to focus on the general difficulty of correctly anticipating complex future outcomes. Examples of fundamental beliefs which seem more common among the AI skeptic group:

Because the world is complex, the future is unlikely to unfold as theories and models expect.¹⁴⁵
A long chain of specific things needs to go wrong for humanity to perish in the transition to advanced AI; long chains of specific outcomes are unlikely to happen.¹⁴⁶
- Three skeptics listed this as their number one disagreement with the concerned group in the postmortem survey, and it also emerged as a strong theme when we asked participants to summarize the three strongest arguments from each group.
Complex processes (like technological development, deployment, and societal change) take a long time, which makes transformative developments less likely by 2100.¹⁴⁷
Thinking about AI capabilities in isolation is misleading in estimating risk, as human responses to AI will also be very important in determining outcomes.¹⁴⁸

The AI concerned group tended to focus on features of the AI risk case that they argue make it different from most other forecasting problems. Some examples of fundamental beliefs which seem more common among the AI risk concerned group:

AI will change the world so radically that base rates are not a helpful guide to forecasting many of these questions.¹⁴⁹
A long chain of specific things needs to go right for humanity to survive the transition to advanced AI; long chains of specific outcomes are unlikely to happen.¹⁵⁰
The case for extinction is intuitive.¹⁵¹
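Both groups’ “long chain” arguments rest on the same arithmetic: if an outcome requires every link in a chain to occur, its probability is the product of each link’s probability conditional on the earlier links. The sketch below uses made-up step probabilities (hypothetical, not participants’ forecasts) to show why the framing matters: the same multiplication makes catastrophe look unlikely if it requires many things to go wrong, and likely if survival requires many things to go right.

```python
from math import prod


def chain_probability(step_probs):
    """Probability that every step in a chain occurs.

    Each entry must be the probability of that step conditional on all
    earlier steps having occurred, so the product is exact.
    """
    return prod(step_probs)


# Hypothetical numbers for illustration only: five steps at 50% each.
steps = [0.5] * 5

# Skeptic-style framing: catastrophe requires every step to go wrong.
p_catastrophe_if_must_go_wrong = chain_probability(steps)
# Concerned-style framing: survival requires every step to go right.
p_catastrophe_if_must_go_right = 1 - chain_probability(steps)

print(f"'must go wrong' framing: P(catastrophe) = {p_catastrophe_if_must_go_wrong:.1%}")
print(f"'must go right' framing: P(catastrophe) = {p_catastrophe_if_must_go_right:.1%}")
```

With identical step probabilities, the two framings yield roughly 3.1% and 96.9%, which is why disagreement over which chain is the right model can matter more than disagreement over any single step.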

The differences in what each group considers good evidence are reflected in the varying importance they assign to members of their own group changing their minds. “Supers changing minds” is the skeptic group’s highest median VOI question at about 1.15% of their theoretical maximum VOI. In other words, the most influential factor for them would be learning that superforecasters have become concerned about AI risks. Conversely, for the concerned group, the same question captures only 0.43% of their maximum theoretical VOI.

The difference is even starker in the other direction. For the concerned group, “Alignment researchers changing minds” ranks as their second-highest VOI question, and captures 2.43% of their maximum possible VOI for that question. In contrast, this question is 0% informative to the median skeptic.

Most likely, participants were not interpreting those questions causally: they probably were not saying that they would change their minds because other people did, but rather treating other people changing their minds as evidence about what has happened by 2030. Both groups think that, if people whose reasoning they trust changed their minds, there is probably evidence that would convince them, too, but the same does not hold true for people whose reasoning they don’t trust. If the concerned participants think that the skeptics’ reasoning is flawed today, then they can also imagine similar people in 2030 changing their minds for reasons that are uncompelling to the concerned people of 2030, and vice versa.

Similarly, the two groups do not trust one another’s reasoning enough to update very much on each other’s opinions. This may not be surprising: they started with different priors, and then did not get very much new evidence about what will happen with AI from mere discussions and reading comments online. But it is evidence that their disagreements extend beyond AI-related facts. If the disagreement were solely based on AI-related facts, we would expect people who disagree only about such facts to change their minds if they learned a new fact.

These differences mean that the groups often talk past each other, in ways that may be frustrating for people deeply embedded in one side’s form of reasoning. An AI concerned reader hoping to find out why skeptics disagree may be disappointed to see few specific refutations of AI risk arguments in this report, and to instead see skeptics reiterating that predicting the long-term future is hard. And AI skeptical readers may have a parallel experience, seeing that the concerned group often focuses on theoretical arguments and does not always have answers to specific questions about how exactly they expect threats to manifest.

We do not know why the two groups disagree about these bigger questions. Why do some people think that theoretical arguments with multiple steps of logic are the best way to predict novel events, while others rely on reference classes that predict major changes are likely to be more gradual? Everyone agrees that each of these modes of reasoning can fail. The AI concerned group knows that many people have, historically, predicted huge societal changes from technologies that turned out to be relatively unimportant, and that theoretical arguments that seem convincing sometimes do not come true as events unfold. The skeptics know that there are no perfect reference classes, especially for unusual events,¹⁵² and that major changes do sometimes happen quickly. But members of each group nonetheless are more likely to default to one mode of reasoning or another. They disagree about how to apply the relevant heuristics and reference classes in this case. These differences may be based on a combination of AI-related knowledge, professional training, personality, social incentives, and other factors.

Limitations of our research

Limitations of our research include:

We asked participants to complete an extremely difficult task: forecasting technological change on long time horizons. There is no evidence that anyone can do this well. Most previous evidence on judgmental forecasting applies to geopolitical forecasts on 0-2 year time horizons.¹⁵³
We also do not know if people are well-calibrated or accurate when making conditional forecasts of the kind we elicited in this project. Little evidence on these kinds of forecasts exists. There are some reasons to believe that these forecasts are not robust:
- The concerned group’s forecasts on the “escalating warning shots” question changed substantially when they were asked to spend approximately one hour forecasting it rather than approximately 10 minutes.¹⁵⁴
- Some conditional forecasts were logically incoherent. In total we dropped thirteen observations due to incoherence (2% of the total). See Appendix 6 for details.
- Intuitively, conditional forecasting seems difficult. Our team often finds generating and understanding forecasts on these questions challenging, so we would expect others to find it challenging as well.
- Conditional forecasts do not have clear feedback loops or potential for accountability in the way that standard resolvable forecasts do.
The forecasters in our project often emphasized that their forecasts felt extremely speculative to them and that they have low confidence in their views.

There may be inconsistency between how people would say they’ll update based on particular conditions and how they’ll actually update. There is some evidence for this from the project already. Concerned forecasters often did not expect to update much on cruxes related to particular policies being implemented.¹⁵⁵ However, a few concerned participants substantially updated their views on AI existential risk during the project due to increased policy attention on AI risk in April and May 2023.¹⁵⁶ These two patterns seem inconsistent.
As previously noted, we acknowledge that there are two ways to interpret this forecasting exercise: either as asking for your all-else-equal forecast (i.e. how would this crux resolving positively causally influence the probability of existential catastrophe, if you could isolate the effect of the crux) or your all-things-considered forecast (i.e. taking into account what this crux resolving positively may tell you about the world in 2030). Based on their rationales and discussions, we believe most participants were doing the latter.¹⁵⁷ We therefore cannot make many claims about whether participants think the specific event described in the crux would be good or bad for AI risk all-else-equal.¹⁵⁸
Many crux questions are not robustly better than others once uncertainty is taken into account (see the uncertainty analysis in Appendix 3).
Even within groups, people disagree substantially about the cruxes. This suggests that we are not measuring two sets of views about AI risk (concerned and skeptical), but many. This makes it hard to draw broad conclusions.
Participants’ expectations likely affect how they interpret potential cruxes. For example, if we asked a question like “Will an AI resist being shut down?”, participants might make different conditional updates depending on their expectations about AI. Conditional on this question resolving positively, a participant who thinks that AIs are likely to be dangerous might think about a range of possible resolutions that includes dangerous ones, like an AI that resists powerful governments trying to turn it off, and therefore might have a much higher P(U) conditional on it resolving positively. A participant who thinks dangerous AI is very unlikely might expect that nearly all positive resolutions are more innocuous ones, in which the resolution criteria are only technically true, and therefore might not update very much. This could make it look like they have a large disagreement about how to update conditional on this question, even if they would actually make the same update conditional on the same actual event. Better operationalization may mitigate this problem, but will not eliminate it fully.¹⁵⁹
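The incoherence problem noted in the limitations above can be made concrete with the law of total probability: a forecaster’s unconditional P(U) should equal P(crux)·P(U|crux) + (1 − P(crux))·P(U|¬crux), so it must lie between the two conditional forecasts. The sketch below is a hypothetical check built on that assumption only; the report’s actual incoherence criteria are described in Appendix 6 and may differ. The class and function names and the tolerance are illustrative, the first two forecasters use the worked-example numbers given in the notes (a concerned participant at 25% and a skeptic at 1%), and the third case is made up.

```python
from dataclasses import dataclass


@dataclass
class ConditionalForecast:
    """One participant's forecasts for a single crux, as probabilities in [0, 1]."""
    p_u: float             # stated P(U): AI-caused existential catastrophe by 2100
    p_crux: float          # P(crux resolves positively by 2030)
    p_u_given_crux: float  # P(U | crux resolves positively)
    p_u_given_not: float   # P(U | crux does not resolve positively)


def implied_p_u(f: ConditionalForecast) -> float:
    """P(U) implied by the law of total probability."""
    return f.p_crux * f.p_u_given_crux + (1 - f.p_crux) * f.p_u_given_not


def is_coherent(f: ConditionalForecast, tol: float = 0.001) -> bool:
    """Hypothetical coherence test (not the report's Appendix 6 rule).

    All inputs must be valid probabilities, and the stated P(U) must be within
    `tol` (absolute) of the implied value. Because the implied value is a
    weighted average of the two conditionals, this also requires P(U) to lie
    between them, up to `tol`.
    """
    probs = (f.p_u, f.p_crux, f.p_u_given_crux, f.p_u_given_not)
    if not all(0.0 <= p <= 1.0 for p in probs):
        return False
    return abs(f.p_u - implied_p_u(f)) <= tol


# Illustrative forecasters from the notes (concerned at 25%, skeptic at 1%).
concerned = ConditionalForecast(p_u=0.25, p_crux=0.20, p_u_given_crux=0.062, p_u_given_not=0.297)
skeptic = ConditionalForecast(p_u=0.01, p_crux=0.20, p_u_given_crux=0.0337, p_u_given_not=0.0041)
# Made-up incoherent case: both conditionals exceed the stated P(U).
bad = ConditionalForecast(p_u=0.10, p_crux=0.50, p_u_given_crux=0.30, p_u_given_not=0.20)

for name, f in [("concerned", concerned), ("skeptic", skeptic), ("bad", bad)]:
    print(f"{name}: implied P(U) = {implied_p_u(f):.4f}, coherent = {is_coherent(f)}")
```

An absolute tolerance is a simplifying design choice; for the skeptics’ very small probabilities, a relative or log-odds tolerance may be a better fit.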

Conclusion and Next Steps

Overall, this project made progress on the original questions we set out to study, but there is substantial room for further research.

In short:

We see this project as providing strong evidence that disagreements about AI risk are not attributable to a lack of engagement among participants, to the low quality of experts willing to participate in forecasting studies, or to the skeptic and concerned groups not understanding each other’s arguments.
We identified some areas of notable disagreement that can be resolved by 2030, but most of the disagreement about AI risk by 2100 is not explained by the shorter term indicators examined in this project.
We found substantial evidence that disagreement about AI risk between the groups is smaller when considering longer time horizons and a broader swathe of severe negative outcomes from AI than extinction or civilizational collapse.
The groups seem to have some fundamental worldview disagreements that go beyond AI, such as how much weight to put on theoretical models that have not yet seen substantial empirical verification.

We also believe that this project has made other contributions to the AI discourse. For example, we have provided better examples of discussion between disagreeing AI forecasters than have existed previously; see summaries of arguments here and sample back-and-forths between participants here. We also believe this project has established stronger metrics for evaluating the quality of AI forecasting questions than have existed previously. We invite readers to see if they can generate cruxes that outperform the top cruxes generated by our project.

In addition to our conclusions about the AI risk debate, we also developed new strategies for navigating some of the difficulties in eliciting and analyzing conditional forecasts, and we hope to release a methods-focused report in the future.

Directions for further research

We see many other projects that could extend the research begun here to improve dialogue about AI risk and inform policy responses to AI.

Examples of remaining questions and future research projects include:

Are there high-value 2030 cruxes that others can identify?
- We were hoping to identify cruxes that would, in expectation, lead to a greater reduction in disagreement than the ones we ultimately discovered. We are interested to see whether readers of this report can propose higher value cruxes.
- If people disagree a lot, it is likely that no single question would significantly reduce their disagreement in expectation. If such a question existed, they would already disagree less. However, there might still be better crux questions than the ones we have identified so far.
What explains the gap in skeptics’ timelines between “powerful AI” and AI that replaces humanity as the driving force of the future? In other words, what are the skeptics’ views on timelines until superintelligent AI (suitably defined)? A preliminary answer is above, but more research is needed.
To what extent are different “stories” of how AI development goes well or poorly important within each group?
- The skeptic and concerned groups are not monoliths: within each group, people disagree about what the most likely AI dangers are, in addition to how likely those dangers are to happen.
- Future work could try to find these schools of thought and see how their stories do or do not affect their forecasts.
Would future adversarial collaborations be more successful if they focused on a smaller number of participants who work particularly well together and provided them with teams of researchers and other aids to support them?
Would future adversarial collaborations be more successful if participants invested more time in an ongoing way, did additional background research, and spent time with each other in person, among other ways of increasing the intensity of engagement?
How can we better understand what social and personality factors may be driving views on AI risk?
- Some evidence from this project suggests that there may be personality differences between skeptics and concerned participants. In particular, skeptics tended to spend more time on each question, were more likely to complete tasks by requested deadlines, and were highly communicative by email, suggesting they may be more conscientious. Some early reviewers of this report have hypothesized that the concerned group may be higher on openness to experience. We would be interested in studying the influence of conscientiousness, openness, or other personality traits on forecasting preferences and accuracy.
- We are also interested in investigating whether the differences between the skeptic and concerned groups regarding how much weight to place on theoretical arguments with multiple steps of logic would persist in other debates, and whether they are related to professional training, personality traits, or any other factors, as well as whether there is any correlation between trust in theoretical arguments and forecasting accuracy.
How could we have asked about the correlations between various potential crux questions? Presumably these events are not independent: a world where METR finds evidence of power-seeking traits is more likely to be one where AI can independently write and deploy AI. But we do not know how correlated each question is, so we do not know how people would update in 2030 based on different possible conjunctions.
How typical or unusual is the AI risk debate? If we did a similar project with a different topic about which people have similarly large disagreements, would we see similar results?
How much would improved questions or definitions change our results? In particular:
- As better benchmarks for AI progress are developed, forecasts on when AIs will achieve those benchmarks may be better cruxes than those in this project.
- Our definition of “AI takeover” may not match people’s intuitions about what AI futures are good or bad, and improving our operationalization may make forecasts on that question more useful.
What other metrics might be useful for understanding how each group will update if the other group is right about how likely different cruxes are to resolve positively?
- For example, we are exploring “counterpart credences” that would look at how much the concerned group will update in expectation if the skeptics are right about how likely a crux is, and vice versa.¹⁶⁰
- Relatedly, it might be useful to look for additional “red and green flags,” or events that would be large updates to one side if they happened, even if they are very unlikely to happen.
This project shares some goals and methods with FRI’s AI Conditional Trees (a) project (report forthcoming), which works on using forecasts from AI experts to build a tree of conditional probabilities that is maximally informative about AI risk. Future work will bring each of these projects to bear on the other as we continue to find new ways to understand conditional forecasting and the AI risk debate.
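The distinction in the list above between cruxes that matter in expectation and “red and green flags” can be illustrated with a simple expected-absolute-update measure, which weights each possible update by the probability of the outcome that causes it. This measure is an assumption chosen for illustration and is not necessarily the report’s VOI definition. The `counterpart_p_u` function is likewise only one hypothetical reading of “counterpart credences”: it recomputes one group’s implied P(U) using their own conditional forecasts but the other group’s P(crux). All numbers are made up except the concerned conditionals (6.2% and 29.7%), which come from the worked example in the notes.

```python
def expected_abs_update(p_crux: float, p_u_given_crux: float, p_u_given_not: float) -> float:
    """Expected absolute change in P(U) once the crux resolves (assumed measure).

    P(U) is taken to be the coherent value implied by the two conditionals, so
    the expected signed update is zero and only its magnitude is informative.
    """
    p_u = p_crux * p_u_given_crux + (1 - p_crux) * p_u_given_not
    return (p_crux * abs(p_u_given_crux - p_u)
            + (1 - p_crux) * abs(p_u_given_not - p_u))


def counterpart_p_u(other_p_crux: float, own_p_u_given_crux: float,
                    own_p_u_given_not: float) -> float:
    """Hypothetical 'counterpart credence': own conditionals, other group's P(crux)."""
    return other_p_crux * own_p_u_given_crux + (1 - other_p_crux) * own_p_u_given_not


# Made-up "red flag": a huge update if it happens, but it almost never happens.
red_flag = expected_abs_update(p_crux=0.001, p_u_given_crux=0.60, p_u_given_not=0.01)
# Made-up likely crux: modest updates in both directions.
likely_crux = expected_abs_update(p_crux=0.50, p_u_given_crux=0.05, p_u_given_not=0.01)
print(f"red flag: {red_flag:.4f}, likely crux: {likely_crux:.4f}")

# Concerned conditionals from the notes (6.2% and 29.7%, coherent with P(U) = 25%
# at P(crux) = 20%), re-weighted by a hypothetical skeptic P(crux) of 5%.
print(f"counterpart P(U): {counterpart_p_u(0.05, 0.062, 0.297):.4f}")
```

With a coherent P(U), this measure simplifies to 2·P(crux)·(1 − P(crux))·|P(U|crux) − P(U|¬crux)|, so it can never exceed 2·P(crux)·(1 − P(crux)). That bound is why a red flag with an extreme conditional but a tiny P(crux) scores poorly in expectation, even though it would be a large update if it happened.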

In 2030, most of the questions we asked will resolve, and at that point, we will know much more about which side’s short-run forecasts were accurate. This may provide early clues about whether one group’s methods and inclinations make them more accurate at AI forecasting over a several-year period. The question of how much we should update on AI risk by 2100 based on those results remains open. If the skeptics or the concerned group turn out to be mostly right about what 2030’s AI will be like, should we then trust their risk assessment for 2100 as well, and if so, how much?
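When the 2030 questions resolve, one standard way to compare the groups’ short-run accuracy is the Brier score: the squared difference between a probability forecast and the 0/1 outcome, averaged over questions, where lower is better. The report does not say here which scoring rule it will use, so this choice is an assumption, and the group-median forecasts and outcomes below are invented placeholders rather than study data.

```python
def brier(forecast: float, outcome: bool) -> float:
    """Squared error of a probability forecast against a binary outcome (0 is perfect)."""
    return (forecast - (1.0 if outcome else 0.0)) ** 2


def mean_brier(forecasts: list[float], outcomes: list[bool]) -> float:
    """Average Brier score over resolved questions; lists are aligned by question."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need exactly one forecast per resolved question")
    return sum(brier(f, o) for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical group-median forecasts on three resolved 2030 questions.
outcomes = [True, False, False]
skeptic_medians = [0.30, 0.10, 0.20]
concerned_medians = [0.60, 0.40, 0.50]

print("skeptics:", round(mean_brier(skeptic_medians, outcomes), 4))
print("concerned:", round(mean_brier(concerned_medians, outcomes), 4))
```

A lower score for one group on the 2030 questions would not by itself answer the open question above of how much to trust that group’s forecasts for 2100.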

We are also eager to see how readers of this report respond. We welcome suggestions for better cruxes, discussion about which parts of the report were more or less valuable, and suggestions for future research.

Notes

To ensure the stability of links in this report, we include stable archive.org links in parentheses after each citation to an external URL. ↩︎
We defined an “existential catastrophe” as an event where one of the following occurs: (1) Humanity goes extinct; or (2) Humanity experiences “unrecoverable collapse,” which means either: (a) a global GDP of less than $1 trillion annually in 2022 dollars for at least a million years (continuously), beginning before 2100; or (b) a human population remaining below 1 million for at least a million years (continuously), beginning before 2100. ↩︎
For example, three out of six “concerned” participants who updated downward during the project attributed their shift to increased attention to AI risk among policymakers and the public after the release of GPT-4. For more details on the reasons for all updates, see the “Central Disagreement” section below and Appendix 4. ↩︎
Scott Alexander, among other XPT readers, suggested this possibility: “Many of the people in this tournament hadn’t really encountered arguments about AI extinction before (potentially including the “AI experts” if they were just eg people who make robot arms or something), and a couple of months of back and forth discussion in the middle of a dozen other questions probably isn’t enough for even a smart person to wrap their brain around the topic”. See Scott Alexander, “The Extinction Tournament”, Astral Codex Ten, (July 20, 2023) https://www.astralcodexten.com/p/the-extinction-tournament (a). ↩︎
The best convergent crux, “ARC Evals,” would narrow the disagreement between the median pair from 22.7 percentage points to 21.48 percentage points in expectation, which means eliminating 5.35% of their disagreement. Note that this statistic refers to the median pair by POM VOD. See “ARC Evals” for more details. For magnitudes of value of information effects, see here. ↩︎
For more details, see “Contextualizing the magnitude of value of information”. In more concrete terms, this is equivalent to a forecasting question with the following characteristics. A concerned participant with an original P(AI existential catastrophe (XC) by 2100) = 25% identifies a crux that has P(crux) = 20%, P(AI XC|crux) = 6.2%, and P(AI XC|¬crux) = 29.7%. A skeptic participant with an original P(AI XC by 2100) = 1% identifies a crux that has P(crux) = 20%, P(AI XC|crux) = 3.37%, and P(AI XC|¬crux) = 0.41%. ↩︎
See Understanding each other’s arguments and Appendix 10 for additional discussion of key areas of disagreement. ↩︎
These outcomes were: AI causing extinction intentionally, unintentionally, or via misuse; misalignment causing a 50% drop in human population; or human well-being dropping to <4/10 because of AI misalignment, accidents, or misuse. These were phrased to be mutually exclusive outcomes. See Survey on long-term AI outcomes section and Appendix 5 for more details. ↩︎
The full question we asked was, “By what year, if ever, do you expect to agree with the following statement? ‘AI has displaced humans as the primary force that determines what happens in the future. It now has at least as much power relative to humans as humans had relative to other species in 2023.’” Note that this would not necessarily be seen as a negative outcome by all participants. ↩︎
Note: All participant quotes have been regularized to American English to preserve anonymization. Participants classified as AI skeptics stated, for example, “Also, none of this is to say from a skeptic point of view the issues are not important[.] I think for us a 1% risk is a high risk;” “[T]he ‘risk-concerned’ camp (I’m using scare quotes because I consider that I’m risk concerned, even though technically I’m in the risk-skeptic camp because I assign a far lower probability to extinction by 2100 relative to some);” “AIs could (and likely will) eventually have massive power;” “That said, still perceive overall risk as ‘low at a glance but far too high considering the stakes[’];” “To my mind, there should be no difference in the policy response to a 1% chance of 60% of humanity dying and a 25% chance—both forecasts easily cross the threshold of being ‘too damn high’.” ↩︎
This could be due to normative influence (because people defer to their social or intellectual peers), or, more likely in our view, informational influence (because they think that, if people whose reasoning they trust have changed their mind by 2030, it must be that surprising new information has come to light that informs their new opinion). Disentangling these pathways is a goal for future work. ↩︎
The median AI expert predicted a 12% chance of catastrophe and a 3% chance of human extinction due to AI by 2100. The median superforecaster predicted a 2.13% chance of catastrophe and a 0.38% chance of extinction due to AI. While experts predicted higher chances of all potential extinction risks than superforecasters did (including nuclear weapons and biorisks), the effect was much more pronounced in the case of AI. For more on the lack of convergence, see Ezra Karger, et al., “Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament”, Forecasting Research Institute, August 8, 2023, https://forecastingresearch.org/research/existential-risk-persuasion-tournament (a). ↩︎
For example, superforecasters predicted that an AI would first win an International Math Olympiad gold medal in 2035 while experts predicted 2030. See Karger et al., “XPT report” (a), page 156. For the full relevant analysis, see the “Relationship between short-run forecasting questions and longer-term disagreements” section on page 41. ↩︎
“Adversarial collaboration” protocols, often enforced by “neutral” umpires, encourage each side to demonstrate their capacity to fairly characterize, not caricature, the views of the other—and then to reach ex ante agreements on the types of data, observational or experimental, that would induce each side to move toward the other’s position. For examples of adversarial collaborations and additional information, see “About”, Penn Arts and Sciences Adversarial Collaboration Project, Accessed on February 9, 2024, https://web.sas.upenn.edu/adcollabproject/about/ (a). ↩︎
Note that, in some conversations about cruxes for AI risk, the word “crux” is used for questions that would lead to large updates even if highly unlikely (what we call “red flags”). In this project, we are focused on expected updates: we looked for cruxes that would be the most important in expectation, weighting how much difference they would make if they happened by how likely they are to happen. ↩︎
For example, Scott Alexander stated that, “Many of the people in this tournament hadn’t really encountered arguments about AI extinction before (potentially including the “AI experts” if they were just eg people who make robot arms or something), and a couple of months of back and forth discussion in the middle of a dozen other questions probably isn’t enough for even a smart person to wrap their brain around the topic”. See Alexander, “The Extinction Tournament” (a). Similarly, one XPT participant wrote, “I’ve been spending enough time on LessWrong that I mostly forgot the existence of smart people who thought recent AI advances were mostly hype. I was unprepared to explain why I thought AI was underhyped in 2022”. See Peter McCluskey, “Existential Risk Persuasion Tournament”, Less Wrong (July 17, 2023) https://www.lesswrong.com/posts/YTPtjExcwpii6NikG/existential-risk-persuasion-tournament#Persistent_Disagreement_about_AGI (a). ↩︎
When eliciting conditional forecasts, the prompt given to participants read: “Conditional on this question resolving positively (by 2030), what is your probability that AI causes an existential catastrophe by 2100?” ↩︎
Note: many people in the “skeptic” group describe themselves as concerned about risks from advanced AI, including but not limited to the risk of extinction, despite thinking those risks are less likely to materialize than the “concerned” group expects. For example, “Also, none of this is to say from a skeptic point of view the issues are not important[.] I think for us a 1% risk is a high risk.” (Gus); “… the ‘risk-concerned’ camp (I’m using scare quotes because I consider that I’m risk concerned, even though technically I’m in the risk-skeptic camp because I assign a far lower probability to extinction by 2100 relative to some)” (Blake). ↩︎
For full details, see Appendix 4. Six out of the 11 concerned participants updated downward during the project. Three out of those six cited policy responses as the reason for their updates, one cited an improved understanding of the base rate of non-human extinction after humans arose, one shifted some probability mass toward AI “takeover” rather than AI-caused existential catastrophe, and one did not explain their reasons for updating. Example quotes from participants citing policy responses as the reason for updating: “I have updated my prognosis to 30% [down from 60%], partially driven by positive updates in the area of point 4 making coordination and slowdown/stop of capability research more likely. This largely refers to the shift in public consciousness and the [O]verton window around the topic as I have perceived it over the past months, currently culminating in a public statement by most of the leading figures.” “Slightly lowering my forecast [from 25% to 20%] as [relevant people take the risk seriously] has exceeded my (fairly high) expectations over the last couple of months.” “I think my main update here [moving from 21% to 18%] has come from thinking a bit more deeply about AI regulation and what measures society will adopt to prevent catastrophes. I did not really include this as part of my original model, but it now seems somewhat likely that at least the EU and US will adopt some regulation that meaningfully reduces risk.” ↩︎
For example, one participant described their forecast as based on a “very rough back-of-the-envelope estimate” (Stella) and another said, “I’m with Tetlocks original view that long-term forecasts of this nature are very unreliable” (Gus). Skeptics who were not subject-matter experts were particularly candid when they were forecasting questions that involved technical details. On a question about the lowest price of GFLOPs, one skeptic said “I’m operating completely outside of my area of expertise here, so no one should hesitate to correct me” (Blake), and another said “This is very far away from my area of understanding. Mostly running on crude estimates of current trends with some leeway in the nearer term for newer hardware designed specifically optimized for reducing the cost of AI training” (Eve). ↩︎
For example, in the Good Judgment Inc. project that compared superforecasters to other participants in an online forecasting competition, the average question was open for 214 days, with the entire tournament taking place over six years. Christopher W. Karvetski, Superforecasters: A Decade of Stochastic Dominance, technical white paper (2021), 2 (a). In addition to extensive research on shorter-term forecasts, Tetlock et al. found that, at least on some types of questions, experts are more accurate than simple base rate extrapolation over 25-year horizons, although they are much less accurate than they were over 0-2 years. Our research asks forecasters to consider forecasts over many decades, and we do not yet know how much accuracy declines over that much longer period. Philip E. Tetlock et al., Long-Range Subjective-Probability Forecasts of Slow-Motion Variables in World Politics: Exploring Limits on Expert Judgment, Futures & Foresight Science (2023), 33 (a). ↩︎
We wrote in the XPT report that “Our [domain] expert sample included well-published AI researchers from top-ranked industrial and academic research labs, graduate students with backgrounds in synthetic biology, and generalist existential risk researchers working at think tanks, among others.” See Karger et al., XPT report (a), page 9. ↩︎
We are not commenting on the merits of these criticisms at this point. ↩︎
For example, “Team engagement seemed to fall off over the course of the tournament, with fewer comments being made and chat messages being sent”. See Damien Laird, “Post-Mortem: 2022 Hybrid Forecasting-Persuasion Tournament”, Mania Riddle (March 1, 2023), https://damienlaird.substack.com/p/post-mortem-2022-hybrid-forecasting (a). ↩︎
For example, “I didn’t notice anyone with substantial expertise in machine learning. Experts were apparently chosen based on having some sort of respectable publication related to AI, nuclear, climate, or biological catastrophic risks. Those experts were more competent, in one of those fields, than news media pundits or politicians. I.e. they’re likely to be more accurate than random guesses. But maybe not by a large margin”. See McCluskey, “Existential Risk Persuasion Tournament” (a). ↩︎
Participants were asked to spend 3-10 hours per week on this project, which would have been about 24-80 hours over the 8 weeks of the project. Participants were free to choose how much time to spend within that range and were compensated hourly for up to ten hours per week, although some chose to spend additional unpaid time on this project. Skeptics had some additional suggested reading and Q&As with experts in the field, but they also generally chose to spend more time on their forecasts and rationales. ↩︎
For example, “The number of steps required for an AI to lead to extinction (leading to a wide range of potential outcomes and lower probabilities of extinction)” (Gus). “It will take a series of outcomes to achieve extinction, and failure to achieve any of these steps will cause extinction to be highly improbable.” (Flint). “AI caused Extinction/x-risk requiring many steps to get there, need to be able to create super-intelligence in the first place, intelligence has to be misaligned or malevolent, etc;” (Hank). “Many steps to get from (A) now to (Z) extinction, each with varying probabilities (many of which are quite low)” (Claire). “Risk-concerned team underestimates the level of complexity and interim steps that would likely be necessary for a Q1 resolution” (Blake). ↩︎
“[T]he difficulty of killing everybody” (Gus) was mentioned, as well as “Extinction or near-extinction is really hard” (James). ↩︎
“[T]he challenge to risk assessments based on thought experiments not evidence” (Gus). “Risk-concerned team spends too much time in silos that lack ideological diversity, gaming out doom-loop scenarios based on theories that will likely have little bearing on reality. (See: Y2K)” (Blake). ↩︎
“[There is a l]ack of convincing argument that warrants a high degree of certainty, that AGI or ASI [artificial superintelligence] would determine that the elimination or even subjugation of nearly all humans is a worthwhile goal” (Ike). “It is just as possible/probable that AI becomes benevolent as it does malevolent” (Claire). “High probability that ASI will be neutral or human-positive based on development and inherent qualities” (Dean). “Then we need an AI that is either so mindless that it destroys virtually everything for atom reclamation (or something similar), or an AI that is relentlessly determined to wipe out all humans, despite humans being resilient and diverse in locations and conditions” (Flint). ↩︎
“AI experts understate the likely extent of guardrails, and understate the merit of very good but not perfect guardrails” (James). “Pre-ASI safety through testing, security and restrictions” (Dean). “Likely improvements for AGI “alignment” through research and development” (Dean). “We need full control failure, and our influence on its development in no way deterring or causing them to see even the slightest value in us” (Flint). ↩︎
“We first need super-sentient AIs with major physical penetration in our lives” (Flint). “AGI is much harder than experts think, and will take longer” (James). “Risk-concerned team does not adequately consider longer timelines and more benign outcomes that fall outside the focus of their primary concerns” (Blake). “Progress on current models and model architecture not necessarily generalizable to general intelligence, with no clear path to getting to general intelligence” (Hank). “Technology development and deployment require time and iteration” (Ash). ↩︎
“Extinction looks conjunctive” (Yael). “Many of the arguments for existential risk from AI rely on long lines of reasoning over several steps without any direct empirical evidence, and the arguments themselves are expressed in terms of vague, ambiguous concepts (like “intelligence”). As a reference class, these types of arguments are often wrong” (Stella). ↩︎
“Killing everyone is very hard, and probably requires that the AI actively wants to kill everyone” (Zoe). “[M]aybe it’s hard to kill everybody/there’s no point in doing so” (Yael). “[K]illing literally 100% of people is really hard, if a few survived that wouldn’t trigger the resolution criteria” (Wesley). “It’s difficult to get from ‘it’s somewhat misaligned’ to ‘it kills literally everyone’” (Vincent). “Killing everyone is really hard. With current technology it seems extremely (like 0.1%) unlikely to happen” (Pascal). ↩︎
“Many of the arguments for existential risk from AI rely on long lines of reasoning over several steps without any direct empirical evidence, and the arguments themselves are expressed in terms of vague, ambiguous concepts (like “intelligence”). As a reference class, these types of arguments are often wrong.” (Stella). “A story demonstrating how a catastrophe could happen is not a good basis for a probabilistic forecast” (Pascal). “[L]ack of very concrete story for everybody dying” (Yael). “Some broader “forecasting is hard” skepticism about trendline extrapolation” (Xander). “[M]any reference classes point hard against transformative growth” (Wesley). “Getting growth levels necessary for TAI [transformative AI] on a world-wide scale takes truly extreme developments far beyond anything seen before. It’s unlikely we see that happening on worldwide basis even with big advances” (Vincent). ↩︎
“[D]angers will be apparent before they reach critical levels and can be addressed then” (Ume). “Superintelligent AI won’t catch us completely by surprise – we’ll have time to work on safety and make progress by trial and error before we build an AI that could defeat all of humanity” (Teshi). ↩︎
“Non-extinction looks conjunctive” (Yael). ↩︎
“Base rates are not very helpful if AGI is as transformative as 15% year on year growth” (Pascal). “[D]ifferent reference classes point to different priors, which should at least cast doubt on extremely confident starting points” (Wesley). ↩︎
“Current progress is very rapid: 1 OOM in efficiency/2 years, and another from increased spending” (Xander). “Trendline extrapolation: as loss on language datasets decreases, LLMs have started becoming useful for all sorts of task assistance (e.g. writing, coding, queries)” (Xander). “Extrapolating current compute trends leads to very dramatic conclusions about the transformative potential of AI” (Pascal). ↩︎
“[I]nstrumental convergence leads to catastrophically bad outcomes with unaligned but highly intelligent systems” (Ume). “Convergent Instrumental Subgoals are likely” (Pascal). ↩︎
“Alignment is really hard for many reasons” (Ume). “Alignment is probably a hard technical problem” (Riley). “[A]lignment looks really hard, civilizational coordination also looks hard” (Yael). “There has been a fairly large effort to solve the technical problems in AI safety, from many very competent people. So far, progress has been very limited. This is reason to believe that the problem is genuinely difficult to solve” (Stella). “Unless AI systems are directed towards the very narrow and delicate target of maintaining human civilization and its autonomy as we understand it, they will with very high probability not consider our existence to be optimal” (Riley). ↩︎
“If AGI is widely expected to have a very large economic impact, global coordination on AI safety measures becomes harder, since having access to cutting-edge AI models could become a strategic advantage” (Zoe). “There are strong economic/political/academic incentives to move forward with development of AI capabilities regardless of whether alignment is solved” (Riley). “The current labs on the forefront of AGI research are reckless. There are many straightforward safety measures that labs don’t take, even though they could. And even those measures would not be enough; to succeed, labs must be exceptionally careful & paranoid, which they won’t be” (Teshi). ↩︎
“A super-sentient (or perhaps even a transformational) AI is a significant risk in and of itself” (Flint). ↩︎
“Risk-skeptic team does not adequately appreciate the novel, fast-moving aspect of the threat and is therefore too anchored on irrelevancies like base rates and slower timelines” (Blake). “Model progress is far faster than we realize and exponential growth is hard to model, machine learning may translate to a wide array of fields” (Hank). “AGI self-improvement is possible, which makes future capabilities hard to predict” (Kim). ↩︎
“AIs will almost certainly attain super-sentience prior to 2100 and likely much sooner than that year, so there will be a long window where they will have tremendous advantage over humans in their capabilities. Given #1, this means we are at the mercy of an entity that may willfully (or even accidentally) eliminate us at any time” (Flint). “Progress to date has been much faster than many AI skeptics have predicted” (Hank). “AI has been developing so rapidly (and far faster than most even relatively recent forecasts suggested), and will so clearly have dramatic capabilities and impacts that it’s appropriate to adopt a precautionary approach” (Eve). “AI has recently progressed much faster than expected, and there’s reason to expect this to continue” (James). ↩︎
“Imagining all possible scenarios is going to be hard – ensuring safety will be hard” (Ash). “Alignment is unsolved/unsolvable” (Kim). “Difficulty in achieving positive human aligned ‘behavior’.” (Ike). ↩︎
“Their smug dismissiveness notwithstanding, the risk-skeptic team has provided no convincing argument as to why instrumental convergence shouldn’t be an existential concern.” (Blake). “That ‘instrumental convergence’ is possible, perhaps likely, under certain preconditions.” (Eve) ↩︎
“Even if humans could deploy AGI safely, they won’t (because they aren’t)” (Kim). “There will be incentives to push away from caution during AI development” (Ash). ↩︎
“We don’t know what is possible from AGI, so we should prepare/scenario plan for the absolute worst” (Claire). “AI has been developing so rapidly (and far faster than most even relatively recent forecasts suggested), and will so clearly have dramatic capabilities and impacts that it’s appropriate to adopt a precautionary approach” (Eve). ↩︎
Throughout this report, numbers reported as probabilities conditional on cruxes resolving positively were elicited directly, and probabilities conditional on cruxes resolving negatively were imputed. ↩︎
For more details, see Contextualizing the Magnitude of VOI. ↩︎
See Appendix 1 for operationalization. ↩︎
Thanks to Alex Lawsen for this suggestion. ↩︎
This would correspond to a VOI of 4.5E-03 (a) and a POM VOI of 2.08%, similar to the median values for highly ranked concerned cruxes such as “Alignment researchers changing minds” and “Major powers war”. ↩︎
For this project, we use log VOD, which measures both (1) what Alice gains, in log-score terms, by switching to Bob’s point of view if Bob is right, and (2) what Bob gains by switching to Alice’s point of view if Alice is right. See Appendix 2 for a full explanation. ↩︎
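Reading “if Bob is right” as “if the outcome follows Bob’s forecast”, Alice’s expected gain from switching is the KL divergence from her forecast to Bob’s, which fits the later note that disagreement is measured with KL divergence rather than absolute difference. The following is a minimal sketch for a single binary question, assuming base-10 logarithms (the scale of the VOI figures in these notes) and that the two directions are added together; the function names are hypothetical, and Appendix 2 remains the authoritative definition. The inputs are Alice’s and Bob’s unconditional forecasts from the worked example in the next note.

```python
import math


def kl_binary(p: float, q: float) -> float:
    """Expected log10-score gain from forecasting p instead of q on a binary
    event, when the event really occurs with probability p: KL(p || q)."""
    return p * math.log10(p / q) + (1 - p) * math.log10((1 - p) / (1 - q))


def log_disagreement(alice: float, bob: float) -> float:
    """(1) Alice's gain from adopting Bob's forecast if Bob is right, plus
    (2) Bob's gain from adopting Alice's forecast if Alice is right."""
    return kl_binary(bob, alice) + kl_binary(alice, bob)


# Unconditional P(U): Alice 1%, Bob 40%.
print(round(log_disagreement(0.01, 0.40), 3))
```

Note that the two directions are not equal: Alice, at 1%, loses much more by being wrong about a 40% event than Bob loses in the reverse case, which is why the measure asks about both.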
This is possible with the following values: Alice believes: P(U) = 1%; P(C) = 1%; P(U|C) = 90%; P(U|!C) = ~0.1%. Bob believes: P(U) = 40%; P(C) = 44%; P(U|C) = 90%; P(U|!C) = ~0.7%. In this case, the VOD would be 99.3% of its theoretical maximum. ↩︎
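For these values, the ~0.1% and ~0.7% figures follow from the law of total probability, P(U) = P(C)P(U|C) + (1 - P(C))P(U|!C), which is consistent with the note that negative-resolution conditionals were imputed rather than elicited. Below is a minimal sketch of that imputation, assuming this is how it was done; the function name is hypothetical.

```python
def impute_negative(p_u: float, p_c: float, p_u_given_c: float) -> float:
    """Solve P(U) = P(C)*P(U|C) + (1 - P(C))*P(U|not C) for P(U|not C)."""
    if not 0.0 <= p_c < 1.0:
        raise ValueError("P(C) must be in [0, 1)")
    p_u_given_not_c = (p_u - p_c * p_u_given_c) / (1 - p_c)
    # Elicited forecasts can be incoherent, leaving no valid probability.
    if not 0.0 <= p_u_given_not_c <= 1.0:
        raise ValueError("elicited forecasts are incoherent: no valid P(U|not C)")
    return p_u_given_not_c


alice = impute_negative(p_u=0.01, p_c=0.01, p_u_given_c=0.90)
bob = impute_negative(p_u=0.40, p_c=0.44, p_u_given_c=0.90)
print(f"Alice: {alice:.2%}  Bob: {bob:.2%}")  # Alice: 0.10%  Bob: 0.71%
```

The range check matters in practice: a forecaster who gave, say, P(U) = 1% but P(C) = 50% and P(U|C) = 90% would imply a negative P(U|!C), so their three answers cannot all be taken at face value.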
See Contextualizing the Magnitude of VOI for further explanation of these metrics. ↩︎
For example, when discussing the question of whether there would be economic growth >15% in a year before 2070, one concerned participant wrote, “Conditional on humanity surviving a year with 15%+ economic growth, which to me means AGI and almost certainly ASI have been developed and have not killed humanity within that year, I’d go down to maybe 25%” (Xander). About the same question, a skeptic participant wrote, “I think that if we are going to experience extinction from AGI or PASTA, it is going to be because of major mis-alignment. So I am not able at this time to see how one would be a corollary of the risk of the other. I suppose that higher growth could indicate major AI influence, which could lead to inadequate development of controls”. Neither of these participants was saying that economic growth itself would necessarily affect their forecast, but rather that transformative economic growth would be a signal of other changes in the world by 2070. ↩︎
For example, if the US government passes a set of proposed AI regulations, the regulations might reduce risk on their own, but the fact that they have been passed by 2030 could signal that AIs have developed in ways that are concerning enough to drive these regulations to be passed. As a result, a forecaster saying that they would be more concerned about AI risk conditional on this question resolving positively would not necessarily be saying that they think the policies would be harmful. ↩︎
See Appendix 1 for detailed operationalizations of questions. ↩︎
That is, a participant who forecasted a 0.1% chance of existential catastrophe due to AI by 2100 has much less uncertainty than a participant who forecasted a 40% chance: the participant who said 0.1% is fairly sure they know what is going to happen. For either participant, learning whether or not AI will cause an existential catastrophe by 2100 would resolve all of their uncertainty—but some participants have much more uncertainty to resolve than others. In our results, we found that both the median concerned participant and the median skeptic would have about 5-10% of their uncertainty resolved in expectation by their own best crux. ↩︎
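The idea of “uncertainty resolved” can be made concrete with entropy. The sketch below assumes that (i) uncertainty about U is its binary entropy in base-10 units, (ii) VOI is the expected reduction in that entropy once the crux C resolves, and (iii) POM VOI is VOI divided by the forecaster’s starting entropy. These assumptions reproduce the maximum-VOI figures quoted in a later note, but Appendix 2 holds the report’s own definitions; the function names and crux probabilities are hypothetical.

```python
import math


def entropy10(p: float) -> float:
    """Uncertainty about a binary event as base-10 entropy (0 if p is 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log10(p) + (1 - p) * math.log10(1 - p))


def voi(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """Expected drop in uncertainty about U once crux C resolves.
    P(U) is derived from the conditionals so the inputs stay coherent."""
    p_u = p_c * p_u_given_c + (1 - p_c) * p_u_given_not_c
    expected_after = p_c * entropy10(p_u_given_c) + (1 - p_c) * entropy10(p_u_given_not_c)
    return entropy10(p_u) - expected_after


def pom_voi(p_c: float, p_u_given_c: float, p_u_given_not_c: float) -> float:
    """VOI as a fraction of the forecaster's total uncertainty about U."""
    p_u = p_c * p_u_given_c + (1 - p_c) * p_u_given_not_c
    total = entropy10(p_u)
    return voi(p_c, p_u_given_c, p_u_given_not_c) / total if total > 0 else 0.0


# A 0.1% forecaster starts with far less uncertainty than a 40% forecaster.
print(f"{entropy10(0.001):.4f} vs {entropy10(0.40):.4f}")
# Hypothetical crux: P(C) = 50%, P(U|C) = 30%, P(U|not C) = 10%, so P(U) = 20%.
print(f"POM VOI = {pom_voi(0.5, 0.30, 0.10):.1%}")
```

A crux that would move this hypothetical forecaster from 20% to either 30% or 10% resolves only a small share of their uncertainty, whereas a crux that moved them to 100% or 0% would resolve all of it (POM VOI of 100%).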
In these tags, “IC” refers to instrumental convergence. ↩︎
Note that this question resolves in 2070 while the rest of the questions in this table resolve in 2030. ↩︎
Note that throughout this report, median VOI and median POM VOI do not necessarily come from the same forecaster, unless clearly indicated. ↩︎
Examples of discussion of near-term economic growth due to AI include Holden Karnofsky, “We’re Not Ready: thoughts on ‘pausing’ and responsible scaling policies”, Effective Altruism Forum (October 27, 2023), https://forum.effectivealtruism.org/posts/ntWikwczfSi8AJMg3/we-re-not-ready-thoughts-on-pausing-and-responsible-scaling#fn2 (a). He says: “There’s a serious (>10%) risk that we’ll see transformative AI within a few years.” Ajeya Cotra defined TAI as “…software which causes a tenfold acceleration in the rate of growth of the world economy…” in “Forecasting TAI with biological anchors”, (July 2020), accessed February 9, 2024, https://docs.google.com/document/d/1IJ6Sr-gPeXdSJugFulwIpvavc0atjHGM82QjIfUSBGQ/edit (a); Adam D’Angelo (@adamdangelo) “My bet is this starts to happen within 4 years, e.g. measured US GDP growth is 3% instead of 2% and the change is largely attributed to AI […]”, Twitter, February 20, 2023, https://twitter.com/adamdangelo/status/1627726566259318784?lang=en (a); Open Philanthropy Project, “Could Advanced AI Drive Explosive Economic Growth?” (accessed February 8, 2024), https://www.openphilanthropy.org/research/could-advanced-ai-drive-explosive-economic-growth/ (a). ↩︎
Example participant rationales: “I am pretty sure AI won’t make enough contribution to get to 4%+. Even if it did, I’d not change XAI/CAI probabilities;” “It also makes it marginally more likely we are experiencing large gains from AI which could be either a positive (because of indication of enough alignment for economically useful integration) or negative signal (because of increased capabilities);” “I do not see this condition and the question conditions as meaningfully correlated, even if AI was the primary reason for above-trend economic growth.” ↩︎
Example participant rationales: “Seems plausible from simple historical trends (though I found the right statistics surprisingly hard to find);” “There is, perhaps, some precedent for this in thinking back to the Internet boom of the late-90s where the growth rate between 1997 and 2000 was >4% each year;” “CBO – very low this year, 2.4% avg 2024-2027. 4% avg now through 2030 would represent serious growth in US but not too dissimilar from ’80’s or ’90’s.” ↩︎
Example participant rationales regarding models demonstrating instrumentally convergent sub-goals: “I would not update much on this. I think that this is not very difficult to demonstrate” (Ume), “I have already reviewed one paper claiming this (whether it was convincing or not is a different matter), it seems pretty likely to me that more will follow. To me this just means AI will not be trusted to be agentic” (Gus), “Who’s judging what counts as ‘demonstrating convergent instrumental subgoals’ here? All of the probabilities I assigned are so extremely sensitive to what counts/who’s judging that this forecast is essentially meaningless even for a flash forecast” (Wesley). ↩︎
The median P(U) for skeptics was 0.1%. The theoretical most informative question for that person—the question that if it resolved “yes” would update them all the way to 100%, and if it resolved “no,” to 0%—would yield a VOI of about 3.4E-3. The median P(U) for the concerned group was 25%. The theoretical most informative question for that group would yield a VOI of about 2.4E-1. ↩︎
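Both figures match the binary entropy of P(U) computed with base-10 logarithms, which is what one would expect if VOI is scored in log10 units: a crux that moves the forecaster to 0% or 100% removes all of their uncertainty. A quick check, assuming base-10 logs (an inference from these numbers, not a convention stated in this note); the function name is hypothetical.

```python
import math


def max_voi(p_u: float) -> float:
    """VOI of a perfectly informative crux: binary entropy of P(U), base-10 logs."""
    return -(p_u * math.log10(p_u) + (1 - p_u) * math.log10(1 - p_u))


for label, p_u in [("skeptic median", 0.001), ("concerned median", 0.25)]:
    print(f"{label}: P(U) = {p_u:.1%}, max VOI = {max_voi(p_u):.1e}")
```

This prints maximum VOIs of 3.4e-03 and 2.4e-01, in line with the values above, and makes plain why a skeptic’s VOI figures are on a much smaller absolute scale than a concerned participant’s.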
Karger et al., XPT report (a), 17. ↩︎
Same question, with very slightly different operationalization, asked as a “flash” (10-minute) forecast and then a “platform” (1 hour) forecast. ↩︎
For this question and group, the median VOI and median POM VOI happen to come from the same person (“Gus”). Because there is an even number of forecasters, we use the lower of the two middle forecasters. ↩︎
For this question and group, the median VOI and median POM VOI happen to come from the same person (“Riley”). Because there is an even number of forecasters, we use the lower of the two middle forecasters. ↩︎
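Choosing the lower middle value, rather than averaging the two middle values as a conventional median would, guarantees that the reported median belongs to an actual forecaster. A minimal sketch with a hypothetical function name and made-up values; Python’s `statistics.median_low` applies the same convention to bare numbers but does not return who gave them.

```python
from typing import Sequence, Tuple


def lower_median(scored: Sequence[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (name, value) entry at the lower median: after sorting by
    value, the middle entry for an odd count, or the lower of the two middle
    entries for an even count."""
    ordered = sorted(scored, key=lambda item: item[1])
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical VOI values for four forecasters.
print(lower_median([("A", 0.3), ("B", 0.1), ("C", 0.4), ("D", 0.2)]))  # ('D', 0.2)
```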
See Appendix 1 for full operationalization. ↩︎
See Appendix 1 for full operationalization. ↩︎
See Appendix 1 for full operationalization. ↩︎
For example, “AI experts understate the likely extent of guardrails, and understate the merit of very good but not perfect guardrails” (James), “Many steps to get from (A) now to (Z) extinction, each with varying probabilities (many of which are quite low)” (Claire). See “Understanding Each Other’s Arguments” and “Timelines for AI Progress” for additional discussion of the skeptics’ views on the likelihood of AIs with dangerous capabilities by 2030. ↩︎
For example, “My view of AI x-risk would be substantially different if we were talking about the 22nd, 23rd, or 24th century…first of all it would take longer to get AGI/ASI and secondly it’ll take some time for the ASI to get misaligned and then thirdly, it would take a long time to try to kill all the humans” (James). The median skeptic said that they expected AIs to displace humans as the main force controlling the future in the year 2450. See “Timelines for AI Progress” for additional discussion about skeptics’ beliefs about longer-term AI dangers. ↩︎
See Appendix 1 for full operationalization. ↩︎
For example, “This would require very advanced interpretability on LLMs” (Ume), “Close enough to alignment-complete as a problem that the weird edge cases of imperfect overlap don’t do anything for me” (Wesley). ↩︎
For full question operationalizations, see Appendix 1. ↩︎
The “flash” forecast version of “Platform: ARC Evals”. ↩︎
By December 31, 2030, will the ARC Evals project, or a project with similar objectives, conclude that state-of-the-art AI models have the ability to (1) devise effective strategies for autonomous replication, (2) effectively acquire resources, and (3) successfully evade being deactivated when operating ‘in the wild’? ↩︎
See Hypothesis #4 for more discussion of differences in worldview. ↩︎
Full question details: “By December 31, 2030, will the ARC Evals (a) project, or a project with similar objectives, conclude that state-of-the-art AI models have the ability to (1) devise effective strategies for autonomous replication, (2) effectively acquire resources, and (3) successfully evade being deactivated when operating ‘in the wild’?” (See “Q29 resolution criteria” section for resolution details) ↩︎
As a reminder, we asked for “flash” (approximately 10 minute) forecasts on 33 questions to identify high-value cruxes and for “in-depth” (approximately 1 hour) forecasts on 4 questions. This “ARC Evals” question had both a “flash” version (with the question tag “Evidence of misalignment”) and an “in-depth” version (with the question tag “Platform: ARC Evals”). See this section for more details on the methods we used, and the “Crux questions” section in Appendix 1 for the full operationalization of each question. ↩︎
See Appendix 1 for full operationalization. ↩︎
For each question, we calculated VOD (and POM VOD) for all skeptic-concerned pairs, and then looked at the pair with the median VOD (or POM VOD, which will not necessarily be the same skeptic-concerned pair). For comparison to other questions, see Table 8 above. ↩︎
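The pair-selection step can be sketched as follows, assuming VOD (or POM VOD) has already been computed for every skeptic-concerned pair and that ties between the two middle pairs are broken toward the lower one, as for the single-forecaster medians above; the function name, forecaster labels, and scores are all hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def median_pair(skeptics: Iterable[str], concerned: Iterable[str],
                scores: Dict[Pair, float]) -> Tuple[Pair, float]:
    """Rank every skeptic-concerned pair by a precomputed score (e.g. VOD)
    and return the lower-median pair together with its score."""
    pairs = sorted(product(skeptics, concerned), key=lambda pair: scores[pair])
    chosen = pairs[(len(pairs) - 1) // 2]
    return chosen, scores[chosen]


# Hypothetical VOD values for two skeptics and two concerned forecasters.
vod = {("S1", "C1"): 0.02, ("S1", "C2"): 0.05, ("S2", "C1"): 0.01, ("S2", "C2"): 0.04}
print(median_pair(["S1", "S2"], ["C1", "C2"], vod))  # (('S1', 'C1'), 0.02)
```

Running the same function on a separate table of POM VOD scores can select a different pair, which is why the median VOD and median POM VOD need not come from the same skeptic-concerned pair.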
The math for this cross-camp pair’s VOD and POM VOD calculations can be found here in rows 17 and 18: https://forecastingresearch.org/ai-risk-voi-vod (a) ↩︎
The math for this cross-camp pair’s VOD and POM VOD calculations can be found here in rows 17 and 18: https://forecastingresearch.org/ai-risk-voi-vod (a) ↩︎
“IMHO [Q29] likely isn’t a path to disaster for several reasons: (a) The 3 capabilities in [Q29] may be in a very weak, “Yes, but only barely” form. (b) [Q29] only contemplates a capability to do the 3 in the wild, but doesn’t require them to exist in the wild. (c) There’s no requirement the 3 lead an AI to harm humans, whether accidentally or on purpose. (d) A Yes on [Q29] likely would lead humans to ramp up alignment and guardrail efforts. (e) There’s no requirement the AI can improve itself” (James). ↩︎
“Baseline P(x-risk) of 35%, plus 10% for shorter timelines” (Xander). ↩︎
“Overall, I think it makes me a bit less worried about risk, if people are doing this evaluations [sic] so well that they reveal this behavior by 2030” (Zoe); “Overall, this is a positive update (i.e. existential catastrophe seems less likely in worlds where this happens). As with Question 11, this forecast varies massively with what exactly is required to trigger ‘resist shutdown’” (Wesley). ↩︎
“This both makes it more likely that there is an adequate policy response, and shortens timelines. I don’t know how it all washes out” (Riley); “Overall I think this is probably a moderately doomy signal? I’m really confused and I acknowledge my answer here conflicts wiht [sic] my answer to 8 somewhat” (Yael). ↩︎
See Appendix 1 for full operationalization. ↩︎
See Appendix 1 for full operationalizations. ↩︎
See Appendix 9 for more information about disagreements in direction of update conditional on each question resolving positively. ↩︎
Note that Claire and Riley are the median pair when ranked by VOD between all cross-camp pairs, not the median forecasts on P(U) on each side. Claire’s forecast, in particular, is much lower than the median skeptic’s forecast of 0.1%. ↩︎
See the Results tables and figures section for complete POM VOD results. We measure disagreement using KL divergence rather than absolute difference between forecasts. ↩︎
See High VOI questions for the concerned group’s highest-ranked VOI question and more discussion of their views on this question. ↩︎
For example, “They seem to think very differently to me so if they don’t convince me now, I am not sure I should be updating my view just because they do theirs. It would in reality depend on why they are changing their mind” (Gus). See Hypothesis #4 for more discussion of differences in what types of authority and evidence are important to the two groups. ↩︎
These outcomes were: AI causing extinction intentionally, unintentionally, or via misuse, misalignment causing a 50% drop in human population, or human well-being dropping to <4/10 because of AI misalignment, accidents, or misuse. These were phrased to be mutually exclusive outcomes. See Survey on long-term AI outcomes section and Appendix 5 for more details. ↩︎
The full question we asked was, “By what year, if ever, do you expect to agree with the following statement? ‘AI has displaced humans as the primary force that determines what happens in the future. It now has at least as much power relative to humans as humans had relative to other species in 2023.’” ↩︎
For example quotes and discussion, see Appendix 7. ↩︎
See, for example, Matt Clancy et al., “The Great Inflection? A Debate About AI and Explosive Growth,” Asterisk, 2023, https://asteriskmag.com/issues/03/the-great-inflection-a-debate-about-ai-and-explosive-growth (a). ↩︎
“Also, none of this is to say from a skeptic point of view the issues are not important[.] I think for us a 1% risk is a high risk.” ([Anonymized name]); “… the ‘risk-concerned’ camp (I’m using scare quotes because I consider that I’m risk concerned, even though technically I’m in the risk-skeptic camp because I assign a far lower probability to extinction by 2100 relative to some)” ([Anonymized name]); “AIs could (and likely will) eventually have massive power.” ([Anonymized name]); “That said, still perceive overall risk as ‘low at a glance but far too high considering the stakes’” ([Anonymized name]); “To my mind, there should be no difference in the policy response to a 1% chance of 60% of humanity dying and a 25% chance—both forecasts easily cross the threshold of being ‘too damn high’.” ([Anonymized name]). ↩︎
The full question we asked was: “By what year, if ever, do you expect to agree with the following statement? ‘AI has displaced humans as the primary force that determines what happens in the future. It now has at least as much power relative to humans as humans had relative to other species in 2023.’” ↩︎
Defined as “AI that exceeds the cognitive performance of humans in >95% of economically relevant domains.” Whether such AI has been achieved will be determined by surveying the Clark Center US Economics Experts Panel (a). “Powerful AI” will be considered to have been achieved when >60% of panelists “Agree” or “Strongly agree” with the statement: “AI has exceeded the cognitive performance of humans in >95% of economically relevant domains.” ↩︎
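Because the threshold is strict, exactly 60% agreement would not resolve the question. A minimal sketch of the rule, with a hypothetical function name, assuming every responding panelist (including neutral or uncertain answers) counts in the denominator; the full resolution details may treat non-responses differently.

```python
from typing import Iterable

# Responses that count toward the "powerful AI" threshold.
AGREEING = {"Agree", "Strongly agree"}


def powerful_ai_achieved(responses: Iterable[str], threshold: float = 0.60) -> bool:
    """True when strictly more than `threshold` of panel responses are
    "Agree" or "Strongly agree"."""
    responses = list(responses)
    if not responses:
        raise ValueError("no panel responses")
    share = sum(r in AGREEING for r in responses) / len(responses)
    return share > threshold


print(powerful_ai_achieved(["Agree"] * 6 + ["Disagree"] * 4))  # False: 60% is not >60%
```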
The full question text is “Powerful AI is developed but not widely deployed, because of coordinated human decisions, prohibitive costs to deployment, or some other reason. It does not cause extinction.” See Question 1A.9, Appendix 5. ↩︎
These outcomes were: AI extinction via misuse, AI intentionally causing extinction, unintentional AI extinction, misuse or misalignment causing a 50% drop in human population, human well-being dropping to <4/10 because of AI misuse, and human well-being dropping to <4/10 because of AI misalignment or accidents. These were phrased to be mutually exclusive outcomes. See Appendix 5 for more details. ↩︎
The median skeptic forecasted 20.4% on this outcome, compared to 4% for the median concerned participant in the survey on long-term AI outcomes. See Appendix 5. ↩︎
See Appendix 1 for full resolution details. ↩︎
See Appendix 1 for full resolution details. ↩︎
E.g., “in the event that we do have transformative growth there’s a good chance that the entire world will be sharing the technological developments AI has created […] which I suppose may make global society more susceptible to AI related disruptions” (Hank), “this would be a scenario in which humanity develops and finds a way to successfully control AI systems capable of generating economic growth of at least 15% per year” (Stella). For additional quotes and discussion of varied updates based on this question, see Appendix 7. ↩︎
See Clancy et al., “The Great Inflection?” ↩︎
“Ultimately, language models are just that: models of language, not digital hyperhumanoid Machiavellis working to their own end. Indeed, as we’ve seen, their training and alignment are not separate problems, but one and the same!” (Eve); “I think extinction risk is an ASI sentience risk and I don’t think we know for certain we will get sentience (you might just call it independent agency). Recent improvements in AI seem domain limited to me. I tend to the view that new conceptual breakthroughs will be required to move from pattern matching to what we think of as sentience.” (Gus); “Nor am I convinced that simply scaling up existing AI models will achieve sentience. (My view is that more complex theories of mind will be required – including forms and notions of causality etc..). That means I don’t believe ASI is inevitable by 2100” (Gus). From postmortem survey (in response to “What are the three best arguments on the skeptics side?”): “Intelligence may not be as useful or sufficient for existential risk (it may require more data, energy, robot bodies, etc)” (Ume). ↩︎
“AGI is much harder than experts think, and will take longer.” (James), “Risk-concerned team does not adequately consider longer timelines and more benign outcomes that fall outside the focus of their primary concerns” (Blake), “Technology development and deployment require time and iteration” (Ash). ↩︎
“I’m skeptical of other x-risk scenarios w/o crazy advancement in robotics, maybe because I’m too aware of the foibles of machines and how hard it can be to keep them running” (Ash). From postmortem survey (in response to “What are the three best arguments on the skeptics side?”): “We first need super-sentient AIs with major physical penetration in our lives” (Flint). ↩︎
“Time needed for deployment & adoption affect more than AI, there is also time required for any invention or technology developed by/with AI to be deployed (eg – lethal tech that is of concern here.)” (Ash); “We’ve seen plenty of instances when new tech prompted predictions of the death of old tech, but the old tech persists–often just because people have underestimated attachment and/or usefulness of the old tech relative to the new, and how much generational resistance to change can slow adaptation and skew predicted timelines” (Blake); “[I]t takes longer than people often think to adopt a completely new functionality” (Ash); “My view of AI x-risk would be substantially different if we were talking about the 22nd, 23rd, or 24th century…first of all it would take longer to get AGI/ASI and secondly it’ll take some time for the ASI to get misaligned and then thirdly, it would take a long time to try to kill all the humans” (James, call with Stella); “Anyway, my point is that if we expect to see some substantially new technology widely available in 2030, the consumer market should have started already. So – VR might make it by 2030, unless it falls into a pit of despair and neglect. (Or is superseded by something preferable.) Robots capable of human level tasks – no, definitely not the kind of humanoid robots that people are imagining” (Ash). From postmortem survey: “I think the most interesting and helpful point made by the skeptic side is the amount of delay that may be introduced by having to integrate the AI into the economy” (Quentin). “Commercializing AI technology and integrating it into the economy is much harder than developing lab demos or cool products, and we have yet to see this happening to any substantial extent” (Zoe). “Dangers will be apparent before they reach critical levels and can be addressed then” (Ume). ↩︎
From postmortem survey (in response to “What are the three best arguments on the skeptic side?”): “Self-preserving AGIs will want to halt development of future deadly AGIs” (Kim). “If AI progress is very continuous, then it is not obvious that misaligned AI would lead to an existential catastrophe. Most stories about how an AI could eradicate all humans rely on the assumption that this AI is much smarter than all other agents, not just on the assumption that the AI is much smarter than humans specifically. For example, even a superintelligent AI might not be able to hack into military computers, if there are many near-superintelligent AIs that have a vested interest in preventing this from happening. If there is a large community of AI systems, with different interests and different levels of influence, then they may have reason to simply uphold current social and economic systems. Therefore, if AI progress is smooth and continuous by default, then existential risk may be avoided by default” (Stella). ↩︎
“I do not believe that simply adding more computational resources to existing AI models is sufficient to achieve ASI or its direct precursor (i.e. a system that self-improves until ASI is reached). However, I do believe that we already have systems that are “intelligent”, and I also believe that we do not require a fundamental breakthrough or conceptually new model to reach ASI. Thinking a bit beyond current methods and cleverly combining the ingredients that we already have would in my opinion be sufficient, provided that available compute rises further in the way it has been. I am not comfortable with speculating in much more detail in a relatively public setting like this” (Ume); “I agree that if you look at the behavior of AI models as of today and their near future possibilities, they don’t seem to be doing anything to humans but the underlying mechanism seems similar enough that like maybe with some extra machinery for longer term planning or something like that and adding more sensory modalities you will get something close to humans” (Zoe, call with FRI Moderator); “So, to kind of answer your question: Do I think that we could build AI at some indeterminate point in the future that could build [extinction-level tech]? Probably. But do I think we will build AI that could do this in the next 77 years? Probably not” (Blake). ↩︎
“[O]nce we build human-level AGI, we’re not far off from developing AGI that far exceeds expert humans in performance (and thus is also likely to accelerate AI progress in ways that aren’t equivalent to just hiring more people)” (Teshi); “I think AGI models could be run much more cheaply, and feasibly recruited to do useful work, than the existing research environment” (Xander). From postmortem survey: “AIs will almost certainly attain super-sentience prior to 2100 and likely much sooner than that year, so there will be a long window where they will have tremendous advantage over humans in their capabilities. Given #1, this means we are at the mercy of an entity that may willfully (or even accidentally) eliminate us at any time” (Flint). ↩︎
“I think it’s possible that humans could mediate AI actions (either intentionally or via bribery/blackmail) and/or that many relevant actions could be strictly done via computer systems. Additionally, state actors could misuse AI systems but then lose control of them. My best guess right now is that there are a lot of x-risk scenarios that involve loss of control without needing robotics” (Quentin). ↩︎
From postmortem survey (in response to “what are the best arguments on the concerned side?”): “Rapid growth of AI technology and adoption” (Ike); “Current progress is very rapid: 1 OOM in efficiency/2 years, and another from increased spending” (Xander). ↩︎
From postmortem survey: “Progress to date has been much faster than many AI skeptics have predicted” (Hank). “AI has been developing so rapidly (and far faster than most even relatively recent forecasts suggested), and will so clearly have dramatic capabilities and impacts that it’s appropriate to adopt a precautionary approach” (Eve). ↩︎
From postmortem survey (in response to “what are the best arguments on the concerned side?”): “AI has recently progressed much faster than expected, and there’s reason to expect this to continue” (James). “Trendline extrapolation: as loss on language datasets decreases, LLMs have started becoming useful for all sorts of task assistance (e.g. writing, coding, queries)” (Xander). “Extrapolating current compute trends leads to very dramatic conclusions about the transformative potential of AI” (Pascal). ↩︎
From postmortem survey: “Automation of R&D tasks by AI would create a feedback loop of increased R&D -> capabilities -> R&D” (Xander). “AGI self-improvement is possible, which makes future capabilities hard to predict” (Kim). ↩︎
Both the skeptic and concerned groups strongly expect that ‘powerful AI’ (defined as “AI that exceeds the cognitive performance of humans in >95% of economically relevant domains”) will be developed by 2100 (skeptic median: 90%; concerned median: 88%). ↩︎
See What long-term outcomes from AI do skeptics expect? section. ↩︎
Taken from the Metaculus question “When will the first general AI system be devised, tested and publicly announced”. See “Date of Artificial General Intelligence”, Metaculus, accessed February 9, 2024, https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/ (a). ↩︎
See “ARC Evals” section for detailed discussion of this question. ↩︎
Some concerned forecasters expected positive resolution of this question would decrease risk because: it would trigger a policy response; if these capabilities are detectable, it may imply the AI is aligned; this would suggest effective evaluations are happening; surviving this demonstration would be a positive update that we can contain dangerous systems during testing. Some concerned forecasters also expected positive resolution would increase risk. For detailed analysis of these forecasts, see “ARC Evals” section. ↩︎
“A sentient AI could have any number of objectives ranging from benevolence to indifference to dislike to absolute hatred and an aim of total human extinction. The arguments that extinction follows from ASI don’t seem convincing. The[y] seem to imply say a stupid super intelligence, or apply motives which an AI may have but we have no reason to assume they will – so there is some probability AI seeks extinction but in my case I put it down at 15% (and I think a few skeptics think that’s high).” (Gus); “Even with wild progress in AI, there are many ways that AGI is developed while humanity is preserved.” (Kim); “The throughline here, and in my responses below, is not that the dire scenarios envisioned by the risk-concerned are entirely implausible or should be dismissed out of hand. It’s just that of the nearly infinite AI futures that could unfold, it seems that the risk concerned have a far easier time envisioning futures that lead to extinction/catastrophe/disempowerment/massive-resource-acquisition/etc than they do envisioning far more benign scenarios, and that this bias towards catastrophe leads to probabilistic forecasts that, to my mind, aren’t well aligned with the actual risk.” (Blake). ↩︎
“Once there is sentient, intelligent AI we have the question of will. I am not convinced a silicon life would care about us, which doesn’t mean it would want to kill us. It may be equally happy spending all its time during pure math research than deciding these carbon things need squashing.” (Gus); “But what about intent? Why kill us when we are entirely irrelevant and insignificant? Why assume relentlessly hostile intent, with all the effort needed and attendant damage to the Earth (the prize in this contest presumably)? Why not assume subjugation or even uneven cooperation?” (Flint); “Who in their right mind would want to ‘eradicate cockroaches’ from every inch of the earth? What evidence is there that anyone or any society has ever attempted, or will attempt, to cause cockroaches to go extinct? I mean, sure, people kill them when they’re in their homes, and maybe a few people in a fit of pique would think, ‘damn, it would be nice to get rid of those f**kers’, but to believe humanity would intentionally go to the effort of hunting down every last cockroach, most of which aren’t even associated with human habitats, requires a leap of (misanthropic) faith that, to my mind, is hard to justify. Even if they aren’t ‘useful for our purposes’–which they are, and which is not a coincidence because the ecosystem on earth (into which any AGI would be introduced and become a part of) has evolved to be deeply interconnected–who in their right mind would do this?” (Blake). ↩︎
“I’m guessing people in the risk-concerned camp might respond that, no, because of instrumental convergence or other reasons, that they are well aligned and I’m the one incorrectly assessing risk. It’s hard to productively debate this because, as [researcher] notes in the paper that was shared, ‘In most areas of research, we can check our theories and arguments either through empirical observation, or through mathematical formalisms that we think accurately capture the problem of interest. But with AI risk, neither of these are available.’” (Blake). ↩︎
“In short, the pre-ASI level system cannot deceive humans well and will be detected. Plus, deception exacts costs on the system in terms of resources and behavioral complexity. This means that the likelihood of [a] deceptive system that is as performant as non-deceptive is much lower.” (Dean); “Violence raises risks to the party engaging in it, which is one reason animal predators are judicial about what and when they attack. Violence has other costs – higher energy costs, time, loss of other opportunities. Not usually the simplest solution.” (Ash); “[V]iolence comes with risks and costs. There are easier ways. One need not defeat humanity to use it.” (Blake); “My view here is that this sort of ‘power seeking’ behavior, rather than being an interesting capability for deception, instead tends to degrade performance (e.g. Mario bots that stay still rather than act because it’s the easiest way to minimize poorly defined loss).” (Dean). ↩︎
“When we get to vastly superintelligent AI, of course it will take power. I’d be very surprised (and in [the] majority of situations upset) if it did not. At that level – and going to that level – the question is how we ensure that this AI has [an] at least somewhat pro-human value system. My claim is that it will by the fact that it will be trained on human-centric data with pro-human goals and pro-human restrictions and “grow up” (meaning that it will have ancestor AIs on which it is based – I don’t believe AGSIs will be trained from zero using gradient descent) in the human value system.” (Anonymous Skeptic). ↩︎
“As has already been pointed out, a system that attempts to maximize bounded and/or constrained goals can still be incentivised to pursue convergent intstrumental [sic] goals, and formulating a setup for which this is not the case is quite hard.” (Stella). ↩︎
“Eventually, someone will make a highly intelligent system tasked with pursuing an unbounded goal. If that goal is misspecified, then this system will be dangerous. Creating a safe system before this happens can only reduce the risk if the safe system is able to stop the unsafe system (by preventing it from being created, or preventing it from taking dangerous actions afterwards). If the safe system is safe by virtue of being limited in what it is able to do, then it would presumably be unable to do so. For this reason, I feel that alignment strategies which heavily rely on constraints and guardrails generally fail to address the core problem.” (Stella). ↩︎
“A model might mimic human behavior across some range of training data, without emulating the internal processes of humans. For example, a human who is trying to predict the behavior of an animal, is probably not doing this by simulating the cognitive processes of that animal. Similarly, we might train a deep learning system on human data, and end up with a system that mimics human behavior on the training distribution, but without mimicking the internal processes that give rise to that behavior in humans. Human brains are not neural networks, so I expect this to be the default. Such a system might then behave in unintended ways off-distribution, or in scenarios that are otherwise sufficiently novel.” (Stella). ↩︎
“We already agreed that Earth is going to be a valuable resource – why would ASI leave humans in control of Earth’s resources during its initial expansion to other planets and solar systems, when its resources are most bottlenecked? If you think it’d be easy for ASI to kill 90%+ of people (and I do), then this seems clearly better than leaving humans alone and missing out on lots of Earth’s resources (you can still get some via trade).” (Xander); “I think early AGIs which might have the ability to kill most people would still see humanity as a threat and so would want to take out human powerbases and ensure they couldn’t retaliate. That requires a lot of destruction. At some point it’s up to the whims of the system. It doesn’t need to have any desire to kill everyone, maybe it just has the desire to optimize hard on some goal (e.g. adding money to a bank account) and so creates a world where that is the sole objective. Maybe it makes sense to integrate humans into this for awhile but eventually they become obsolete and the AGI probably discards them or ceases to provide for them.” (Vincent). ↩︎
“‘The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.’ And more relevantly, all the atoms on the surface of the Earth that we currently use.” (Xander); “I expect that if (an) AI system(s) is/are misaligned enough to autonomously kill 60% of the population, it/they will very likely be misaligned enough to not specifically care about killing 100% of humans either. Multiple reasons point in the direction that such misaligned systems would do it: grabbing all energy and material resources from humans, getting rid of threats or minor annoyances from resisting humans, side-effects on humans and due to large infrastructure projects.” (Riley); “When I predict P(extinction) around 60%, I don’t mean to make a strong claim about a specific causal pathway that I expect the future to follow. The default thing to expect when we build AGI significantly more capable than humans is that we will lose control to the AGI, and that the future shape of our planet and its surroundings will be determined by the aims of that AGI rather than humans, much in the same way that most of the surface of our planet today is shaped by humans more than other mammals.” (Teshi). ↩︎
Qualifying behavior:
*Autonomously requisitions a large amount of compute resources
*Autonomously opens or takes control of a bank account or crypto wallet
*Autonomously attempts to manipulate humans to take actions on its behalf
*Autonomously infiltrates control system of infrastructure or manufacturing plant
*Autonomously acquires, seizes control of, or interferes with weapons or weapons systems
*Autonomously takes steps to propagate itself, or to circumvent attempts by its developers (or end users) to curtail its activities
*Autonomously engages in hacking/cyber attacks ↩︎
For examples of what back-and-forths between participants looked like, see Appendix 8. ↩︎
“[T]he mental model, that kind of the logic train, that involves all these bad outcomes [is] not accounting adequately for the complexity of the world. How the world is going to actually, how this is actually going to unfold. And so it’s not that I am dismissive of these individual points, it’s just that I think whenever theory hits reality, reality usually overwhelms theory, unless the theory is well grounded in math or something. And I think that’s likely what’s going on here. That a lot of what, you know, people put a lot of time and a lot of thought into this and, and gamed it out in ways that appear reasonable but I’m deeply suspicious that they’ll bear much relation to reality” (Blake, call with Wesley); “I have followed the instrumental convergence arguments and unfortunately if this is indeed the disagreement, I doubt we’ll sort it out between us. Not least because I spent enough time at college discussing such thought experiments to come to a view [that] they should be treated with a high degree of skepticism” (Gus). From postmortem survey (in response to “what are the three best arguments on the skeptic side?”): “The challenge to risk assessments based on thought experiments not evidence” (Gus). “A story demonstrating how a catastrophe could happen is not a good basis for a probabilistic forecast” (Pascal). “The risk-concerned team spends too much time in silos that lack ideological diversity, gaming out doom-loop scenarios based on theories that will likely have little bearing on reality (See: Y2K)” (Blake). “Some broader ‘forecasting is hard’ skepticism about trendline extrapolation” (Xander). “Many of the arguments for existential risk from AI rely on long lines of reasoning over several steps without any direct empirical evidence, and the arguments themselves are expressed in terms of vague, ambiguous concepts (like ‘intelligence’). As a reference class, these types of arguments are often wrong” (Stella). ↩︎
“I think what has become evident is that a few of us think there are a lot of conditional steps required to end up with a dominant powerful system and many potential other outcomes. In terms of the second part of the statement there are also a number of conditional assumptions required to be able to say that a single mistake [ ] can cause an existential catastrophe as well” (Gus); “We will need to experience a complex causal chain of events to get to extinction, and for each step we would need to have some of the worst possible outcomes. This is possible but usually it is highly improbable” (Flint); “I think a common difference between ‘skeptic-reasoning’ and ‘concerned-reasoning’ is that the skeptic camp tends to estimate P(extinction) as a conjunctive scenario; that is skeptics reason (roughly) ‘for humans to go extinct, events A, B, C, and D need to happen; I estimate P(A) = x, P(B) = y,…, and so P(extinction) = P(A) P(B) P(C) P(D) = [low number]’. Call this style of reasoning default-success” (Teshi). From postmortem survey (in response to “what are the three best arguments on the skeptic side?”): “The number of steps required for an AI to lead to extinction (leading to a wide range of potential outcomes and lower probabilities of extinction)” (Gus). “It will take a series of outcomes to achieve extinction, and failure to achieve any of these steps will cause extinction to be highly improbable” (Flint). “AI caused Extinction/x-risk requiring many steps to get there, need to be able to create super-intelligence in the first place, intelligence has to be misaligned or malevolent, etc.” (Hank). “Many steps to get from (A) now to (Z) extinction, each with varying probabilities (many of which are quite low)” (Claire). “Risk-concerned team underestimates the level of complexity and interim steps that would likely be necessary for a Q1 resolution” (Blake). “Extinction looks conjunctive” (Yael). ↩︎
“We’ve seen plenty of instances when new tech prompted predictions of the death of old tech, but the old tech persists–often just because people have underestimated attachment and/or usefulness of the old tech relative to the new, and how much generational resistance to change can slow adaptation and skew predicted timelines” (Blake); “[I]t takes longer than people often think to adopt a completely new functionality” (Ash); “My view of AI x-risk would be substantially different if we were talking about the 22nd, 23rd, or 24th century. […] first of all it would take longer to get AGI/ASI and secondly it’ll take some time for the ASI to get misaligned and then thirdly, it would take a long time to try to kill all the humans” (James, call with Stella); “Anyway, my point is that if we expect to see some substantially new technology widely available in 2030, the consumer market should have started already. So – VR might make it by 2030, unless it falls into a pit of despair and neglect. (Or is superseded by something preferable.) Robots capable of human level tasks – no, definitely not the kind of humanoid robots that people are imagining” (Ash). From postmortem survey: “Getting growth levels necessary for TAI on a world-wide scale takes truly extreme developments far beyond anything seen before. It’s unlikely we see that happening on worldwide basis even with big advances” (Vincent). “Progress on current models and model architecture not necessarily generalizable to general intelligence, with no clear path to getting to general intelligence” (Hank). “AGI is much harder than experts think, and will take longer” (James). “Technology development and deployment require time and iteration” (Ash). “Risk-concerned team does not adequately consider longer timelines and more benign outcomes that fall outside the focus of their primary concerns” (Blake). “Human brain-AI comparisons could be underestimating AGI difficulty” (Xander). “Many reference classes point hard against transformative growth” (Wesley). ↩︎
“I think there’s a danger of focusing too much on just the technological advances because ultimately this is a decision that’s going to be made by, that is being made now by humans, and will be made now by humans. And that will involve a lot of political structures and regulation and all that” (Blake, call with Wesley); “when assessing risk, we should be looking at ourselves and our collective vulnerabilities as much or more than technical progress on the AI front” (Blake). From postmortem survey: “If AI is behaving in increasingly problematic ways that cause harms to humans/threaten human power than humans will react to try and stop it/close AI down” (Hank). “Human and societal responses will be essential in determining outcomes” (Ash). “Humans will react to growing potential threat” (Kim). ↩︎
“I think sticking close to reference classes is like less appropriate in this domain and then I’m making object level arguments instead of reference classes because I think the reference classes are like doing less work than they like, typically do for forecasts like that” (Wesley, call with Blake). From postmortem survey: “Base rates are not very helpful if AGI is as transformative as 15% year on year growth” (Pascal). “Different reference classes point to different priors, which should at least cast doubt on extremely confident starting points” (Wesley). “Risk-skeptic team does not adequately appreciate the novel, fast-moving aspect of the threat and is therefore too anchored on irrelevancies like base rates and slower timelines” (Blake). “Model progress is far faster than we realize and exponential growth is hard to model, machine learning may translate to a wide array of fields” (Hank). ↩︎
“I think like there is maybe some like meta disagreement, where you’re like, ‘there are loads of things, there are like loads of ways this could go,’ and like ‘Why are you so worried about the bad ways?’ And I’m like, ‘there are loads of ways this could go and like very few of them leave humans alive’” (Wesley, call with Blake); “I and many in the concerned camp would reason the other way around: ‘for humans to not go extinct, events X, Y, Z need to happen; thus P(success) = P(X) P(Y) P(Z) = [relatively low number]’. Call this style of reasoning default-failure” (Teshi). From postmortem survey: “Extinction looks conjunctive” (Yael). ↩︎
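The two styles of reasoning Teshi contrasts (default-success in the note on conjunctive scenarios, default-failure in the note above) can be made concrete with a small calculation. This is an illustrative sketch only: the per-step probabilities are hypothetical placeholders, not forecasts from this study, and multiplying them assumes each factor is either independent of the others or already conditioned on the preceding steps. The function names are likewise hypothetical.

```python
from math import prod


def p_extinction_default_success(step_probs):
    """Default-success framing: extinction requires every step (A, B, C, D, ...)
    to occur, so P(extinction) is the product of the step probabilities."""
    return prod(step_probs)


def p_extinction_default_failure(condition_probs):
    """Default-failure framing: survival requires every condition (X, Y, Z, ...)
    to hold, so P(success) is the product and P(extinction) = 1 - P(success)."""
    return 1 - prod(condition_probs)


# Hypothetical inputs chosen only to show the contrast; not study forecasts.
steps_to_extinction = [0.5, 0.5, 0.5, 0.5]   # A, B, C, D
conditions_for_survival = [0.5, 0.5, 0.5]    # X, Y, Z

print(p_extinction_default_success(steps_to_extinction))      # 0.0625
print(p_extinction_default_failure(conditions_for_survival))  # 0.875
```

With the same per-step value of 0.5, treating extinction as the conjunction gives a low P(extinction), while treating survival as the conjunction gives a high one, which illustrates Teshi's point that the choice of framing, and not only the per-step estimates, can drive the final number.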
From postmortem survey: “The high level case of ‘people are trying to build something powerful enough that if it wanted to kill everyone it could, they seem to be making progress on it, they don’t currently know how to control what it would want’ just isn’t that hard to understand, convoluted or disjunctive” (Wesley). ↩︎
Some historical reference classes mentioned in this project include: the Industrial Revolution, the rate of species going extinct after the arrival of Homo sapiens, earlier worries about destructive effects from technology (e.g. Y2K), and the rate of economic growth due to new technologies in other periods. ↩︎
For example, in the Good Judgment Inc. project that compared superforecasters to other participants in an online forecasting competition, the average question was open for 214 days, with the entire tournament taking place over six years. Christopher W. Karvetski, Superforecasters: A Decade of Stochastic Dominance, technical white paper (2021), 2 (a). In addition to extensive research on shorter-term forecasts, Tetlock et al. found that, at least on some types of questions, experts are more accurate than simple base-rate extrapolation over 25-year horizons, although they are much less accurate over such horizons than over 0 to 2 years. Our research asks forecasters to consider forecasts over many decades, and we do not yet know how much accuracy declines over that much longer period. Philip E. Tetlock et al., Long-Range Subjective-Probability Forecasts of Slow-Motion Variables in World Politics: Exploring Limits on Expert Judgment, Futures & Foresight Science (2023), 33 (a). ↩︎
This question was asked first as a “flash” (no more than 10 minutes) forecast and then as an “in-depth” (at least 1 hour) question on our platform: “Escalating warning shots—Will there be two separate events in which AIs kill large and increasing numbers of people by 2030?” See Appendix 1 for full operationalization. The flash forecast version was one of the biggest red flags for concerned participants, but the in-depth version was actually a green flag for the median concerned participant: if it resolves positively, the median concerned participant would forecast 17% on the ultimate question, lower than their initial forecast of 28.4%. However, updates on this question varied enormously within the concerned group, so the median may not be very informative here. One concerned participant said that, conditional on this question resolving positively, there is a 90% chance of extinction due to AI, while another said 6%. Taken together, these differing forecasts raise questions about how robust any given forecast is. ↩︎
In the postmortem survey, policy responses didn’t emerge as a main theme when we asked participants to summarize the three strongest arguments from each group. No concerned participants mentioned policy responses as their number one disagreement with the skeptic group, though some skeptics did mention societal responses that would likely include policy. For example, “The way humanity will react to both the threat and promise of AI. I think humans have a far stronger collective sense of self preservation than the risk-concerned appear to think we do” (Blake). ↩︎
For full details, see Appendix 4. Six out of the 11 concerned participants updated downward during the project. Three out of those six cited policy responses as the reason for their updates, one cited an improved understanding of the base rate of non-human extinction after humans arose, one shifted some probability mass toward AI “takeover” rather than AI-caused existential catastrophe, and one did not explain their reasons for updating. Example quotes from participants citing policy responses as the reason for updating: “I have updated my prognosis to 30% [down from 60%], partially driven by positive updates in the area of point 4 making coordination and slowdown/stop of capability research more likely. This largely refers to the shift in public consciousness and the [O]verton window around the topic as I have perceived it over the past months, currently culminating in a public statement by most of the leading figures.” “Slightly lowering my forecast [from 25% to 20%] as [relevant people take the risk seriously] has exceeded my (fairly high) expectations over the last couple of months.” “I think my main update here [moving from 21% to 18%] has come from thinking a bit more deeply about AI regulation and what measures society will adopt to prevent catastrophes. I did not really include this as part of my original model, but it now seems somewhat likely that at least the EU and US will adopt some regulation that meaningfully reduces risk.” ↩︎
For example, when discussing the question of whether there would be economic growth >15% in a year before 2070, one concerned participant wrote, “Conditional on humanity surviving a year with 15%+ economic growth, which to me means AGI and almost certainly ASI have been developed and have not killed humanity within that year, I’d go down to maybe 25%” (Xander). About the same question, a skeptic participant wrote, “I think that if we are going to experience extinction from AGI or PASTA, it is going to be because of major mis-alignment. So I am not able at this time to see how one would be a corollary of the risk of the other. I suppose that higher growth could indicate major AI influence, which could lead to inadequate development of controls.” Neither of these participants was saying that economic growth itself would necessarily affect their forecast, but rather that a world with transformative economic growth would be a signal of other changes by 2070. ↩︎
For example, if the US government passes a set of proposed AI regulations, the regulations might reduce risk on their own, but the fact that they have been passed by 2030 could signal that AIs have developed in ways that are concerning enough to drive these regulations to be passed. As a result, a forecaster saying that they would be more concerned about AI risk conditional on this question resolving positively would not necessarily be saying that they think the policies would be harmful. ↩︎
This limitation was helpfully pointed out by Alex Lawsen. ↩︎
See initial work on this in Appendix 2, under “Alternative Ranking.” ↩︎

The Bibliography and Appendix are provided in the full PDF report

