Adversarial Collaboration on AI Risk

Roots of Disagreement on AI Risk

In the 2022 Existential Risk Persuasion Tournament (XPT), we found that experts are much more concerned about AI risk than superforecasters (forecasters with a track record of accuracy on short-run questions). We wanted to understand why thoughtful people disagree so sharply about the risks of AI. Is one group simply misunderstanding the other, or are there genuine disagreements between reasonable people? What information could the two groups learn that would bring them closer to agreement?

In “Roots of Disagreement on AI Risk: Exploring the Potential and Pitfalls of Adversarial Collaboration,” we describe the results of a project that brought together participants who disagreed about the risk AI poses to humanity for an adversarial collaboration. We asked participants in an “AI skeptic” group (mostly superforecasters) and an “AI concerned” group (entirely domain experts) to work together to find the strongest near-term cruxes: forecasting questions resolving by 2030 that would lead to the largest expected changes in beliefs about the risk of existential catastrophe by 2100. Participants made conditional forecasts on the cruxes they generated and discussed their reasoning in an online forum and in moderated one-on-one video calls.
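To make the crux-selection criterion concrete, here is a minimal sketch of one natural way to score a candidate crux: the expected absolute shift in a forecaster’s probability of catastrophe by 2100 once the 2030 question resolves. The function name, the scoring rule, and the numbers below are our own illustration, not the report’s exact elicitation or data.

```python
def expected_update(q_yes: float, p_if_yes: float, p_if_no: float) -> float:
    """Expected absolute shift in a forecaster's catastrophe probability
    once a crux question resolves (an illustrative value-of-information proxy).

    q_yes    -- forecaster's probability the crux resolves YES by 2030
    p_if_yes -- P(existential catastrophe by 2100 | crux resolves YES)
    p_if_no  -- P(existential catastrophe by 2100 | crux resolves NO)
    """
    # Law of total probability gives the current, unconditional forecast.
    p_now = q_yes * p_if_yes + (1 - q_yes) * p_if_no
    # Weight each possible post-resolution belief shift by its probability.
    return q_yes * abs(p_if_yes - p_now) + (1 - q_yes) * abs(p_if_no - p_now)

# Hypothetical example: a crux judged 40% likely to resolve YES, which would
# move this forecaster to 3% risk if YES and 0.8% if NO.
print(expected_update(0.40, 0.03, 0.008))  # ~0.0106 expected shift
```

Under this kind of scoring, the strongest cruxes are the questions that maximize the expected shift, so a question can score highly either by being close to a coin flip or by moving beliefs a long way when it resolves.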

We found that participants were generally able to explain one another’s arguments, so the disagreement probably does not stem mainly from misunderstanding. We also found some cruxes that would be somewhat informative for each group’s forecast of existential risk from AI, as well as some areas of agreement on longer-term AI outcomes. But overall, neither the concerned group nor the skeptics updated substantially toward the other’s views, and the two groups continue to disagree. This project helped us answer some of the questions we had about disagreement between XPT participants, and it provides a basis for future work on understanding the stark disagreements about AI risk.

See the full report here.