Results of an Adversarial Collaboration on AI Risk
Today, we’ve released “Roots of Disagreement on AI Risk: Exploring the Potential and Pitfalls of Adversarial Collaboration,” which describes the results of an adversarial collaboration forecasting project that brought together participants who disagreed about the risk AI poses to humanity in the next century.
The project, which ran in April and May 2023, asked participants in “AI skeptic” and “AI concerned” groups to work collaboratively to find the strongest near-term cruxes: forecasting questions resolving by 2030 that would lead to the largest changes in beliefs (in expectation) about the risk of existential catastrophe by 2100. The median skeptic forecast a 0.12% chance that AI will cause an existential catastrophe by 2100, and the median concerned participant forecast a 20% chance. This project set out to understand the causes of that disagreement and what information could help resolve it.
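To make the crux definition concrete: a question’s value as a crux can be scored as the expected absolute change in a forecaster’s probability of catastrophe once the question resolves. The sketch below illustrates that idea in Python with hypothetical numbers; it is an illustration of the concept, not the paper’s exact methodology.

```python
# Illustrative crux scoring: the expected absolute change in a forecaster's
# P(existential catastrophe by 2100) once a 2030 question resolves.
# All numbers are hypothetical; the paper reports participants' actual forecasts.

def crux_score(p_yes: float, p_cat_if_yes: float, p_cat_if_no: float) -> float:
    """Expected absolute belief change from learning the question's resolution."""
    # A coherent prior is the probability-weighted average of the conditionals.
    p_cat_now = p_yes * p_cat_if_yes + (1 - p_yes) * p_cat_if_no
    return (p_yes * abs(p_cat_if_yes - p_cat_now)
            + (1 - p_yes) * abs(p_cat_if_no - p_cat_now))

# A forecaster thinks the question is 40% likely to resolve yes, and would
# move to 8% (yes) or 3% (no) from an implied prior of 5%:
score = crux_score(p_yes=0.4, p_cat_if_yes=0.08, p_cat_if_no=0.03)
print(f"Expected belief change: {score:.3f}")  # 0.4*0.03 + 0.6*0.02 = 0.024
```

Under this framing, the strongest cruxes are the questions that maximize this expected movement in beliefs.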
Some major takeaways from the project include:
Neither the concerned group nor the skeptics substantially updated toward the other’s views during our study, though one of the top short-term cruxes we identified is expected to close about 5% of the gap in beliefs about AI existential catastrophe: roughly 1 percentage point of the approximately 20-percentage-point gap between the two groups’ forecasts (see the arithmetic sketch below).
Skeptics were generally able to explain the concerned group’s arguments and vice versa, suggesting that disagreements are not primarily due to misunderstanding one another.
We found greater agreement about a broader set of risks from AI over the next thousand years: the two groups gave median forecasts of 30% (skeptics) and 40% (concerned) that AI will have severe negative effects on humanity by causing major declines in population, very low self-reported well-being, or extinction.
One of the strongest cruxes that will resolve by 2030 is whether METR (formerly known as ARC Evals) or a similar group will find that AI has developed dangerous capabilities such as autonomously replicating and avoiding shutdown.
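For the gap-closure figure in the first takeaway, the arithmetic is straightforward; this sketch uses the round numbers quoted above rather than the paper’s precise values:

```python
# Share of the belief gap the top crux is expected to close, using the
# round numbers quoted in this post (the paper reports precise values).
gap = 0.20 - 0.0012    # gap between median forecasts: ~19.9 percentage points
expected_move = 0.01   # top crux expected to close ~1 percentage point
print(f"Share of gap closed: {expected_move / gap:.1%}")  # ~5.0%
```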
See the full working paper here.