{"id":1242,"date":"2024-08-12T12:00:00","date_gmt":"2024-08-12T12:00:00","guid":{"rendered":"https:\/\/forecastingresearch.org\/?post_type=research&#038;p=1242"},"modified":"2026-05-05T14:27:24","modified_gmt":"2026-05-05T14:27:24","slug":"ai-conditional-trees","status":"publish","type":"research","link":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees","title":{"rendered":"Conditional Trees: A Method for Generating Informative Questions about Complex Topics"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"abstract\">Abstract<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We test a new process for generating high-value forecasting questions: asking experts to produce \u201cconditional trees,\u201d simplified Bayesian networks of quantifiably informative forecasting questions. We test this technique in the context of the current debate about risks from AI. We conduct structured interviews with 21 AI domain experts and 3 highly skilled generalist forecasters (\u201csuperforecasters\u201d) to generate 75 forecasting questions that would cause participants to significantly update their views about AI risk. We elicit the \u201cValue of Information\u201d (VOI) each question provides for a far-future outcome\u2014whether AI will cause human extinction by 2100\u2014by collecting conditional forecasts from superforecasters (n=8).<sup data-fn=\"70fde7b6-0fde-4599-9c2d-11d3813282a7\" class=\"fn\"><a href=\"#70fde7b6-0fde-4599-9c2d-11d3813282a7\" id=\"70fde7b6-0fde-4599-9c2d-11d3813282a7-link\">1<\/a><\/sup> In a comparison with the highest-engagement AI questions on two forecasting platforms, the average conditional trees-generated question resolving in 2030 was nine times more informative than the comparison AI-related platform questions (p = .025). 
This report provides initial evidence that structured interviews of experts focused on generating informative cruxes can produce higher-VOI questions than status quo methods.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"btn orange\" href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">View the full PDF Report <svg width=\"7\" height=\"9\" viewBox=\"0 0 7 9\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.000156283 8.60806L4.22416 4.33606V4.24006L0.000156283 6.10352e-05H1.80816L6.06416 4.28806L1.80816 8.60806H0.000156283Z\" fill=\"#102B23\"\/>\n<\/svg>\n<svg width=\"8\" height=\"10\" viewBox=\"0 0 8 10\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.601719 8.85794L4.82572 4.58594V4.48994L0.601719 0.249939H2.40972L6.66572 4.53794L2.40972 8.85794H0.601719Z\" fill=\"#102B23\"\/>\n<\/svg><\/a><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>Acknowledgments<\/summary>\n<p class=\"wp-block-paragraph\">This research would not have been possible without the generous support of Open Philanthropy. We thank the research participants for their invaluable contributions. We greatly appreciate the assistance of Page Hedley, Kayla Gamin, Leonard Barrett, Coralie Consigny, Adam Kuzee, Arunim Agrawal, Bridget Williams, and Taylor Smith in compiling this report. 
Additionally, we thank Benjamin Tereick, Javier Prieto, Dan Schwarz, and Deger Turan for their insightful comments and research suggestions.<\/p>\n<\/details>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"executive-summary\">Executive summary<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"executive-summary-introduction\">Introduction<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">From May 2022 to October 2023, the Forecasting Research Institute (FRI) (<a href=\"https:\/\/web.archive.org\/web\/20240218204348\/https:\/\/forecastingresearch.org\/\"><u>a<\/u><\/a>)<sup data-fn=\"6a0fde15-496f-4006-8d0a-e475bac7a3e3\" class=\"fn\"><a href=\"#6a0fde15-496f-4006-8d0a-e475bac7a3e3\" id=\"6a0fde15-496f-4006-8d0a-e475bac7a3e3-link\">2<\/a><\/sup> experimented with a new method of question generation (\u201cconditional trees\u201d). While the questions elicited in this case study focus on potential risks from advanced AI, the processes we present can be used to generate valuable questions across fields where forecasting can help decision-makers navigate complex, long-term uncertainties.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"executive-summary-methods\">Methods<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Researchers interviewed 24 participants, including 21 AI and existential risk experts and three highly skilled generalist forecasters (\u201csuperforecasters\u201d). 
We first asked participants to provide their personal forecast of the probability of AI-related extinction by 2100 (the \u201cultimate question\u201d for this exercise).<sup data-fn=\"d2fbd3d6-4e36-4a09-8929-799e1c840251\" class=\"fn\"><a href=\"#d2fbd3d6-4e36-4a09-8929-799e1c840251\" id=\"d2fbd3d6-4e36-4a09-8929-799e1c840251-link\">3<\/a><\/sup> We then asked participants to identify plausible<sup data-fn=\"b5003308-fdda-4da9-9b27-b48cbaa7fdf2\" class=\"fn\"><a href=\"#b5003308-fdda-4da9-9b27-b48cbaa7fdf2\" id=\"b5003308-fdda-4da9-9b27-b48cbaa7fdf2-link\">4<\/a><\/sup> indicator events that would significantly shift their estimates of the probability of the ultimate question.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Following the interviews, we converted these indicators into 75\nobjectively resolvable forecasting questions. We asked superforecasters\n(n=8) to provide forecasts on each of these 75 questions (the \u201cAICT\u201d\nquestions), and forecasts on how their beliefs about AI risk would\nupdate if each of these questions resolved positively or negatively. We\nquantitatively ranked the resulting indicators by Value of Information\n(VOI), a measure of how much each indicator caused superforecasters to\nupdate their beliefs about long-run AI risk.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To evaluate the informativeness of the conditional trees method relative to widely discussed indicators, we assess a subset of these questions using a standardized version of VOI, comparing them to popular AI questions on existing forecasting platforms (the \u201cstatus quo\u201d questions). The status quo questions were selected from two popular forecasting platforms by identifying the highest-engagement AI questions (by number of unique forecasters). We present the results of this comparison in order to provide a case study of a beginning-to-end process for producing quantitatively informative indicators about complex topics. 
(<a href=\"#methods\" id=\"#methods\">More on methods<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"executive-summary-results\">Results<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">The conditional trees method can generate forecasting questions that are more informative than existing questions on popular forecasting platforms<sup data-fn=\"d32de3b7-6235-47f8-a676-3de84c95b8e3\" class=\"fn\"><a href=\"#d32de3b7-6235-47f8-a676-3de84c95b8e3\" id=\"d32de3b7-6235-47f8-a676-3de84c95b8e3-link\">5<\/a><\/sup><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Our report presents initial evidence that structured interviews of\nexperts produce more informative questions about AI risk than the\nhighest-engagement questions (as measured by unique users) on existing\nforecasting platforms.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Using predictions made by superforecasters (n=8), we compared the status quo questions to a subset of the AICT questions.<sup data-fn=\"af4cda98-41b4-41c2-a145-96b1e4d40348\" class=\"fn\"><a href=\"#af4cda98-41b4-41c2-a145-96b1e4d40348\" id=\"af4cda98-41b4-41c2-a145-96b1e4d40348-link\">6<\/a><\/sup> Most of the AICT questions (nine of 13) scored higher on VOI than all 10 status quo questions.<sup data-fn=\"682d449b-dfd9-4422-a922-275153246709\" class=\"fn\"><a href=\"#682d449b-dfd9-4422-a922-275153246709\" id=\"682d449b-dfd9-4422-a922-275153246709-link\">7<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">VOI is based on each respondent\u2019s <em>expected update<\/em> in their\nbelief about the ultimate question, not on how much a participant would\nupdate if an event happened. That is, it takes into account how likely\nthe forecaster believes an event is to occur. If an event would result\nin a large update to a participant\u2019s forecast, but is deemed vanishingly\nunlikely to occur, it would have a small VOI. 
If an event would result\nin a large update, and is also considered likely to occur, it would have\na high VOI.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Table E.1 compares the top five AICT questions to the top five status quo questions, as measured by superforecasters\u2019 ratings of a standardized metric of informativeness, which we call \u201cPercentage of Maximum Value of Information\u201d (POM VOI).<sup data-fn=\"6688d111-6e2b-4a8c-8a28-2f4224a7caa9\" class=\"fn\"><a href=\"#6688d111-6e2b-4a8c-8a28-2f4224a7caa9\" id=\"6688d111-6e2b-4a8c-8a28-2f4224a7caa9-link\">8<\/a><\/sup> In this table and throughout the report, we refer to questions by their reference numbers. For a full list of the AICT questions and status quo questions selected from forecasting platforms by reference number, with operationalizations and additional information, see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=63\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 1<\/a>.<\/p>\n\n\n\n<figure id=\"tab-e-1\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Mean POM VOI<\/strong><\/td><\/tr><tr><td>AI causes large-scale deaths, ineffectual response (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>)<\/td><td>6.34%<\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>3.55%<\/td><\/tr><tr><td>Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)<\/td><td>1.68%<\/td><\/tr><tr><td>Power-seeking 
behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>)<\/td><td>1.59%<\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>1.37%<\/td><\/tr><tr><td>Superalignment success (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ205 \/ STQ215<\/a>)*<\/td><td>0.28%<\/td><\/tr><tr><td>Kurzweil\/Kapor Turing Test longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a>)*<\/td><td>0.27%<\/td><\/tr><tr><td>Brain emulation (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=91\" target=\"_blank\" rel=\"noreferrer noopener\">STQ196<\/a>)*<\/td><td>0.23%<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)*<\/td><td>0.14%<\/td><\/tr><tr><td>Compute restrictions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=90\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=90\" target=\"_blank\" rel=\"noreferrer noopener\">STQ236<\/a>)*<\/td><td>0.13%<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table E.1:<\/strong> Ratings of how informative AICT questions are relative to status quo questions. 
The status quo questions are marked with an asterisk.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Focusing on questions resolving in the near term (by 2030), we found that questions generated with the conditional trees method were, on average, nine times more informative than popular questions from platforms (p = .025). For questions resolving in 2050-2070, the difference was not statistically significant, although AICT questions in our sample were still eleven times more informative on average. (<a href=\"#voi-comparison\" id=\"#voi-comparison\">More on VOI comparison<\/a>)<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Questions\ngenerated through the conditional trees method emphasized different\ntopics than those on forecasting platforms<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">We also analyzed the extent to which questions taken from existing\nforecasting platforms effectively captured the topics raised in our\nexpert interviews. We found that some topics (such as AI\nalignment and concrete AI harms)\nwere of substantial interest to experts but had not received\nproportional attention on existing forecasting platforms, and that\nquestions generated by the conditional trees method were meaningfully\ndifferent from those taken from existing forecasting platforms.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The table below compares the topical distribution of the AICT questions to that of the status quo questions. 
(<a href=\"#uniqueness\" id=\"#uniqueness\">More on question uniqueness<\/a>)<\/p>\n\n\n\n<figure id=\"tab-e-2\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Category<\/strong><\/td><td><strong>AICT question set<\/strong><\/td><td><strong>Status quo question\nset<\/strong><\/td><\/tr><tr><td>Social \/ Political \/ Economic<\/td><td>24% (29)<\/td><td>33% (131)<\/td><\/tr><tr><td>Alignment<\/td><td>20% (25)<\/td><td>12% (47)<\/td><\/tr><tr><td>AI harms<\/td><td>20% (25)<\/td><td>7% (27)<\/td><\/tr><tr><td>Acceleration<\/td><td>36% (44)<\/td><td>48% (191)<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table E.2:<\/strong> Proportion of total questions that fell into each category; numbers in parentheses are total questions per category. While some questions fell into multiple categories (and thus proportions in each column should sum to more than 100%), proportions have been normalized for ease of comparison.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">We\nfound weak evidence that superforecasters and experts value different\ntypes of questions<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Given the small sample sizes involved, we are reluctant to make\nconfident claims about the significance of the difference between the\nopinions of the superforecasters and the experts. However, we do see\nthese results as providing prima facie evidence about which questions\nare the most informative for each group when making updates on the\nprobability of AI-related extinction.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Our most notable finding when comparing the views of the superforecasters to those of the experts was that the superforecasters tended to value questions that focused on concrete harms caused by AI, rather than the experts\u2019 preference for questions regarding advanced AI capabilities or whether AI had been successfully aligned. 
(<a href=\"#takeaways-for-ai-risk-detection\" id=\"#takeaways-for-ai-risk-detection\">More on AI risk takeaways<\/a>)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Figure E.1 shows examples of how experts updated on the ultimate\nquestion conditional on three of the highest-VOI indicator\nquestions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-E-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure E.1<\/strong>: A diagram showing how experts update on three relatively high-VOI questions for different resolution years that scored particularly well on our VOI metric. Since experts answered different sets of questions, we derived P(U|C) and P(U|~C) (the probabilities on the bottom level) by multiplying the whole expert group\u2019s average P(U) of 17% by the average relative risk factor for each crux.<sup data-fn=\"32da7651-60fa-46b9-8bde-e13f84cbf288\" class=\"fn\"><a href=\"#32da7651-60fa-46b9-8bde-e13f84cbf288\" id=\"32da7651-60fa-46b9-8bde-e13f84cbf288-link\">9<\/a><\/sup><\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The table below provides more detail on each of the questions in the\nprevious figure.<\/p>\n\n\n\n<figure id=\"tab-e-3\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Summary<\/strong><\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>This question resolves positively if any of the following occur by 2030: AI systems in a country freeze assets affecting \u22651% of the population or \u22650.5% of GDP for over a year; prevent \u22655% of voting-age citizens from participating in a national election; restrict 
&gt;3% of residents&#8217; movement for 3+ months; limit &gt;10% of the population&#8217;s internet access for 3+ months; or legally restrict &gt;5% of citizens&#8217; access to certain housing or employment for 1+ year.<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>)<\/td><td>This question resolves positively if, by 2050, AI developers must attempt to disable or destroy an AI (costing &gt;$5M) after it exhibits power-seeking behaviors such as acquiring significant resources, controlling financial accounts, manipulating humans, infiltrating key systems, seizing control of weaponry, self-propagation, or engaging in hacking.<\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>This question resolves positively if there is no AGI system which meets several &#8220;alignment&#8221; criteria by 2070. Those criteria include AGI-like capabilities (high question answering skill on hard questions in diverse subjects, high programming ability) and alignment (performing well on SOTA alignment tests, and expert surveys indicating that experts believe that AI is aligned).<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table E.3:<\/strong> Example summaries of questions that experts found to be particularly informative.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">The\nconditional trees method still has disadvantages<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">While this case study suggests that the conditional trees method can generate informative forecasting questions, a primary limitation of the method as implemented is its high labor cost. 
The process involved conducting more than 20 interviews with subject matter experts, writing 75 forecasting questions, and eliciting conditional forecasts. In future work, we expect it would typically be more efficient to elicit fewer indicators within a conditional tree and to operationalize only 1-2 forecasting questions per interview before eliciting forecasts. The intensive process described in this case study would be most appropriate for particularly high-value topics with large pools of resources for research. Additionally, it may be possible to use LLMs or incentivized crowdsourcing for the question generation or filtering stages, making the process cheaper and less labor intensive. (<a href=\"#limitations\" id=\"#limitations\">More on limitations of our research<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"executive-summary-key-takeaways\">Key takeaways<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Preliminary evidence suggests that the conditional trees method\nof generating forecasting questions can result in questions that perform\nbetter on \u201cValue of Information\u201d metrics than popular questions on\nexisting forecasting platforms.<\/li>\n\n\n\n<li>The conditional trees method produced questions with a markedly\ndifferent distribution of topic areas compared to those on existing\nforecasting platforms. 
Notably, the conditional trees approach led to a\ngreater proportion of questions focused on AI alignment and potential AI\nharms, reflecting that certain expert priorities may be underrepresented\nin existing forecasting efforts.<\/li>\n\n\n\n<li>In our limited sample, experts tended to find questions related\nto alignment and concrete harms caused by AI to be the most informative.\nSuperforecasters also found questions relating to concrete AI harms to\nbe informative, but were less likely than experts to find questions\nrelating to alignment to be informative.<\/li>\n\n\n\n<li>The conditional trees method as implemented in this case study is\nparticularly labor intensive. We expect the most broadly useful versions\nof this process would take the underlying principles and 1) apply them\nto shorter interviews with smaller numbers of forecasting questions to\noperationalize, 2) leverage LLMs for elicitation and synthesis, and\/or\n3) utilize crowdsourcing at the question generation and filtering\nsteps.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"executive-summary-key-outputs\">Key outputs<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In addition to the above takeaways, we highlight key outputs from the\nreport: the tangible resources developed during the course of the\nconditional trees process which we believe may be useful to others\ninterested in replicating parts of the process.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We created a guide and replicable process for using conditional tree interviews to generate informative forecasting questions (see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=106\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 6<\/a>). 
This process can be implemented by organizations and individuals that need high-quality, informative questions.<\/li>\n\n\n\n<li>We provide details of relevant metrics (e.g., \u201cValue of Information\u201d) that can be used to assess how informative each generated question is. See our public calculator for \u201cvalue of information\u201d and \u201cvalue of discrimination\u201d <a href=\"https:\/\/forecastingresearch.org\/ai-risk-voi-vod\"><u>here<\/u><\/a>.<\/li>\n\n\n\n<li>In total, the conditional trees process generated 75 new questions relating to AI risk. The full operationalizations and resolution criteria of these questions are available in <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 1<\/a> of this report. We have posted several of the highest-VOI questions to two forecasting platforms and encourage interested readers to submit their own predictions. (See <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=113\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=113\" target=\"_blank\" rel=\"noreferrer noopener\"><u>Appendix 7<\/u><\/a> for links)<\/li>\n\n\n\n<li>We used our question metrics to create aggregated conditional trees that visually summarize the most important AI risk pathways according to small samples of experts and generalist forecasters. 
These aggregated trees can be found <a href=\"#candidate-high-voi-trees-from-two-camps\"><u>here<\/u><\/a>.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"executive-summary-limitations\">Limitations of our research<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Limitations of our research include:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The total number of participants in this study was small (n=8\nforecasts on most questions, 24 interviewees to generate\nquestions).<\/li>\n\n\n\n<li>The forecasting tasks in this study were unusually difficult,\ninvolving low probability judgments, long time horizons, conditional\nforecasts, and \u201cshort-fuse forecasts\u201d made very quickly.<\/li>\n\n\n\n<li>Participants were all either experts who are highly concerned\nabout existential risks from AI or superforecasters who are relatively\nskeptical, so we are not able to separate differences caused by risk\nassessment from differences caused by forecasting aptitude, professional\ntraining, or other factors.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">(<a href=\"#limitations\" id=\"#limitations\">More on limitations of our research<\/a>)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"executive-summary-next-steps\">Next steps <\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Further research related to this topic could include:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Studies on the same questions with larger numbers of forecasters,\nincluding by integrating the questions into existing forecasting\nplatforms.<\/li>\n\n\n\n<li>Replicating the conditional trees process in domains other than\nAI risk.<\/li>\n\n\n\n<li>Following up as questions begin to resolve in 2030 to assess\nwhether forecasters update their views in accordance with their\nexpectations.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">(<a href=\"#next-steps\" id=\"#next-steps\">More on next steps<\/a>)<\/p>\n\n\n\n<div class=\"wp-block-group\"><div 
class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading\" id=\"glossary\">Glossary<\/h2>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>AI Conditional Trees (AICT) question set<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The set of questions generated by the AI Conditional Trees process\ndescribed in this report.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Conditional tree<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">A simplified Bayesian network, in which each node is an event that\nmay or may not occur, and each connection between nodes carries the factor\nby which the next node becomes more or less likely if the preceding one occurs. In\nthis report, the conditional trees ultimately ask how likely it is that\nAI causes human extinction by 2100, and each node is an event that\naffects the likelihood of that ultimate outcome.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Operationalization<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The process of turning a question about a future event into a\nresolvable forecasting question. For example, if a prompt said \u201cthere is\nmajor progress in interpretability by 2030,\u201d the operationalized question\nwould specify how to resolve that question so that there can\nbe no future dispute about whether the progress counts as \u201cmajor.\u201d<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Percent of Max (POM)<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">When we present VOI for a question, we also present the percentage of\nthe maximum VOI (POM VOI) it captured in order to contextualize the\nmagnitude of the results. 
The POM VOI of a question can be interpreted\nas the fraction of the uncertainty about the ultimate question U that the\nquestion resolves, in expectation.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Question prompts<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">General topics of questions that we then operationalized into\nforecasting questions. For example, \u201cmajor progress in interpretability\nby 2030\u201d could be a question prompt, although it is not a clearly\nresolvable forecasting question.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Short-fuse forecasts<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Forecasts made very quickly, in which each participant spent no\nmore than one minute per question and gave a snap judgment.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Status quo questions<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Questions on AI that we selected from existing forecasting platforms on the basis of their popularity (largest number of unique users) and other criteria. See <a href=\"#selection-of-status-quo-questions\">2.3 Selection of status quo questions<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Ultimate question \/ Ultimate outcome (U)<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The question that all of the intermediate questions help\npredict. In this study: \u201cWill AI cause human extinction by 2100?\u201d<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Value of information (VOI)<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">VOI is a measure of how much knowing the answer to a question would\nchange an individual&#8217;s belief, in expectation. This is useful for\nunderstanding why individuals believe what they believe and what would\nchange their minds.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"introduction\">1. 
Introduction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For policymakers to use forecasting in their work, they need accurate\nforecasts, but\u2014perhaps equally important\u2014the forecasts need to be about\ndecision-relevant questions. Knowing which questions will be the most\nvaluable to forecast on can be difficult. How can policymakers identify\nthe short-term events that are most relevant to important long-term\noutcomes?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here we present a tool, the conditional tree method (figure 1.1.1),\nwhich can distill complex issues into a few key uncertainties. We apply\nit to a topic of increasing public concern: \u201cWill advanced artificial\nintelligence pose an existential threat to humanity in the 21st\ncentury?\u201d Using a specialized interview process, we learn what subject\nmatter experts believe are the best warning signs for this risk in the\ncoming decades. Then we use metrics based on conditional forecasting to\nquantitatively measure the relevance of these warning signs. This allows\nus to winnow down to a few highly relevant indicators of increased risk\nto humanity from AI.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-1-1-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 1.1.1:<\/strong> The conditional trees process<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The conditional trees approach<sup data-fn=\"6118b2f7-6787-4dfd-afe5-c07c40a45373\" class=\"fn\"><a href=\"#6118b2f7-6787-4dfd-afe5-c07c40a45373\" id=\"6118b2f7-6787-4dfd-afe5-c07c40a45373-link\">10<\/a><\/sup> represents a new set of priorities in the field of forecasting. Most previous forecasting research focused almost exclusively on identifying accurate forecasters and improving forecasting accuracy. 
But comparatively little work was invested in choosing forecasting targets. In order to mature into a practically applicable body of knowledge, the field must look beyond optimizing forecasts and toward optimizing the questions we ask.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a-method-for-generating-and-judging-high-value-questions\">1.1 A\nmethod for generating and judging high-value questions<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Some forecasting tournaments and platforms have already begun to\nutilize domain experts to generate questions with real-world relevance.\nHowever, many of these efforts are relatively <em>ad hoc<\/em>, producing\ninconsistent results and plausibly missing many high-value forecasting\ntargets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For example, for the <a href=\"https:\/\/forecastingresearch.org\/xpt\"><u>Existential Risk Persuasion Tournament<\/u><\/a> (XPT),<sup data-fn=\"21f37ef0-875f-40d5-8ee5-697b5e11a2e3\" class=\"fn\"><a href=\"#21f37ef0-875f-40d5-8ee5-697b5e11a2e3\" id=\"21f37ef0-875f-40d5-8ee5-697b5e11a2e3-link\">11<\/a><\/sup> the question preparation phase enlisted domain experts to comment on the prospective question set in a relatively unstructured way. While this undoubtedly improved the question set, it did not identify the most informative questions within the set.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To leverage the expertise of domain experts more fully, we propose a\nmore in-depth, systematic approach: expert elicitation structured around\nconditional trees.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"why-conditional-trees\">Why conditional trees?<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Conditional trees represent beliefs through a tree-like structure, using nodes to represent events that influence the probability of an ultimate outcome. 
In the tree in Figure 1.1.2, for example, if you know someone is vaccinated, they are half as likely to be infected as if you were unsure whether they were vaccinated. Then, if you know they have been exposed, they are 3.5x as likely to be infected.<sup data-fn=\"92fe9899-6054-4948-89c7-3ce49c8e1155\" class=\"fn\"><a href=\"#92fe9899-6054-4948-89c7-3ce49c8e1155\" id=\"92fe9899-6054-4948-89c7-3ce49c8e1155-link\">12<\/a><\/sup><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-1-1-2.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 1.1.2:<\/strong> Example conditional tree diagram<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In this study, the ultimate outcome was the probability of extinction\ndue to AI by 2100, and the nodes were events that make that outcome more\nor less likely. The tree structure makes the conditional probabilities\nbeneath a forecast explicit and visible, and may help forecasters narrow\nin on specific, important factors.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Participants initially provided an estimate of the probability of\nAI-related extinction by 2100 (the \u201cultimate question\u201d), represented by\nO in Figure 1.1.2. Interviews then focused on identifying key indicators\non the pathway to AI-related extinction. Participants selected two to\nfive indicators for deeper analysis to understand how they might alter\nthe risk of AI-related extinction. 
These factors then became the\nantecedents in the tree: for each of the indicators selected to be\nincluded in the tree, participants gave forecasts for how much their\nforecast of the ultimate outcome would change if that event\nhappened.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The ultimate outcome (for our purposes, the probability of extinction\ndue to AI by 2100) is an important parameter: the rest of the network\u2019s\nrelevance cascades from the outcome. But provided we\u2019re able to identify\nan outcome with strong bearing on present policy decisions, we can ask\nexperts to decompose the intervening time into possible events which\nwould reflect a greater or lesser likelihood of reaching that outcome.\nThus, these intervening events must themselves possess policy-relevance,\nin proportion to the strength of their relationship with the outcome,\nand the likelihood of observing them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Conditional trees are a type of Bayesian network (BN).<sup data-fn=\"ed3527fb-4b46-4664-84fa-cbf5179e8f4a\" class=\"fn\"><a href=\"#ed3527fb-4b46-4664-84fa-cbf5179e8f4a\" id=\"ed3527fb-4b46-4664-84fa-cbf5179e8f4a-link\">13<\/a><\/sup> BNs explicitly represent probabilistic relationships between outcomes and their antecedents.<sup data-fn=\"4e290cb7-80ab-4a3a-b613-f62a34ce57e9\" class=\"fn\"><a href=\"#4e290cb7-80ab-4a3a-b613-f62a34ce57e9\" id=\"4e290cb7-80ab-4a3a-b613-f62a34ce57e9-link\">14<\/a><\/sup> This structure encourages experts to generate maximally relevant antecedents, and also provides us with a framework for measuring question relevance. But unlike some other forms of BNs, conditional trees are a relatively easy tool to learn. In our study, interviewees were able to grasp the necessary basics in around 10 minutes. 
This means that conditional trees may be more practical for interviews with subject-matter experts, who may not be experts in statistics or other domains that more often use BNs.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"how-does-the-conditional-trees-method-fit-into-the-forecasting-research-process\">How\ndoes the conditional trees method fit into the forecasting research\nprocess?<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-1-1-3.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 1.1.3:<\/strong> Life cycle of an impactful forecasting project.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The AI Conditional Trees project is an in-depth investigation into\nhow to generate informative forecasting questions. Question generation\nis the first step in the life cycle of an impactful forecasting project,\nillustrated in Figure 1.1.3.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Many earlier forecasting research projects have focused on\nidentifying the most accurate forecasters and on improved methods for\naggregating their forecasts. But to be useful to decision makers,\nforecasting research must move beyond those questions and incorporate\nforecasting into a process that includes question generation,\nconsidering actions based on forecasts, communicating with policymakers,\nand generating new questions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Before the cycle starts, we begin with \u201cscoping and gisting,\u201d in\nwhich we consider the questions we want to answer, the scope of the\npossible project, and the general arguments (\u201cgists\u201d) on each side. We\nthen begin the cycle by generating questions, through processes like the\nAI Conditional Trees method, aiming to find the forecasting questions\nthat would be most informative to decision makers. 
Next, we elicit\nforecasts on those questions, to assess risk and understand which\npotentially dangerous events are most likely and in what circumstances.\nWe then elicit \u201crisk mitigation forecasts,\u201d asking experts and skilled\nforecasters to predict which policies would most decrease risk and what\nthe costs might be for implementing them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Once we have completed these stages, we communicate that information\nto policymakers, and ask them whether it is useful and what would make\nit more relevant to their work. Their feedback gives us more information\nwe can use for the next stage of question generation, and we begin the\ncycle again.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The cycle as depicted is somewhat stylized, and many forecasting\nprojects will not include all of these stages. But thinking of AI\nconditional trees in the context of the \u201cforecasting life cycle\u201d helps\nus contextualize this work and think about how to incorporate it into\nour future research.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"measuring-question-value\">Measuring question value<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">In order to form the feedback loop necessary for a dramatic\nimprovement in the decision-relevance of forecasting questions, we need\na means of quantitatively measuring the value of a forecasting\nquestion.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Policymakers\u2019 actions are often guided by a few important questions\nin their domain, like \u201cWhat will be the effects of climate change over\nthe next century?\u201d or \u201cWill our economy remain competitive in the world\nin the long-term?\u201d Such questions are difficult to resolve because they\nrefer to the distant future, and they may also be relatively complex or\ndifficult to specify clearly. 
But often one can find nearer-term\nantecedent questions which are easier to resolve, and which would reduce\nsome uncertainty about the \u201cultimate\u201d question. For example, in a study\nforecasting the effects of climate change, with the ultimate question,\n\u201cWill more than 2 billion people die or be displaced due to climate\nchange by 2100?,\u201d the question \u201cWhat will the average global temperature\nbe in 2040?\u201d might be a good antecedent question. It would not give a\nforecaster the full answer to the main question, but knowing what the\nglobal surface temperature will be in 2040 would be at least somewhat\nhelpful for forecasting the effects of climate change by 2100.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Thus, one way of conceptualizing the value of a forecasting question\nis to ask, \u201cHow would the answer to this question affect our expectation\nabout an \u2018ultimate\u2019 question we care about?\u201d There are several distinct\nways of expressing this mathematically, which we collectively refer to\nas \u201cValue of Information (VOI).\u201d<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Conceptually, VOI measures how important a potential crux question\n(\u201cC\u201d) is to a participant\u2019s forecast of the ultimate question we care\nabout (\u201cU\u201d, in this case: AI extinction risk by 2100), in expectation.\nThat is, how much would a participant update on AI extinction risk by\n2100 based on whether a crux happens, weighted by how likely that crux\nis to happen. A high VOI question for a given participant will therefore\nbe one that a) that participant thinks has a meaningful chance of\nhappening and b) meaningfully affects that participant\u2019s forecast on the\nultimate question.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">VOI is a useful metric for understanding why individuals believe what they believe and what would change their minds. 
A technical explanation of VOI can be found in <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=104\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=104\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 4<\/a>. To build intuition for using the VOI metric, we provide <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1ut-BcpXaIvIPbZm6HurDosFrevkQ_tAmfsxgN-fnHl0\/edit#gid=0\">this calculator<\/a> (<a href=\"https:\/\/web.archive.org\/web\/20240216160204\/https:\/\/docs.google.com\/spreadsheets\/d\/1ut-BcpXaIvIPbZm6HurDosFrevkQ_tAmfsxgN-fnHl0\/edit#gid=0\">a<\/a>) in which users can input their own values. We also provide a more comprehensive <a href=\"https:\/\/github.com\/forecastingresearch\/voivod\">R software package<\/a> for calculating it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"methods\">2. Methods<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"question-generation\">2.1 Question generation<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"section\">Sampling interviewees<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Our sample included 24 interviewees in total: 21 \u201cexpert\u201d\ninterviewees, and 3 \u201csuperforecaster\u201d interviewees. 
We aimed to include\nin our sample representatives of four quadrants of a strategically\nimportant belief space (see Figure 2.1.1):<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>short timeline for AI progress, high estimated risk from\nAI;<\/li>\n\n\n\n<li>short timeline for AI progress, low estimated risk from\nAI;<\/li>\n\n\n\n<li>long timeline for AI progress, low estimated risk from AI;\nand<\/li>\n\n\n\n<li>long timeline for AI progress, high estimated risk from AI.<sup data-fn=\"394cd98f-69e7-4c1e-9375-31ae48008979\" class=\"fn\"><a href=\"#394cd98f-69e7-4c1e-9375-31ae48008979\" id=\"394cd98f-69e7-4c1e-9375-31ae48008979-link\">15<\/a><\/sup><\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-2-1-1.jpg\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 2.1.1:<\/strong> Target groups for sampling<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We gathered our expert sample via snowball sampling, seeded from recommendations from our funders and our networks. We do not expect our interview sample was particularly representative of any given group, such as AI experts. The goal of this project was to develop the trees process and assess whether it led to higher value questions, which did not require a representative expert sample. Our superforecaster sample was taken from the set of superforecaster participants in the Existential Risk Persuasion Tournament (XPT)<sup data-fn=\"32a1292a-1d04-40c2-a886-13b8686eab4f\" class=\"fn\"><a href=\"#32a1292a-1d04-40c2-a886-13b8686eab4f\" id=\"32a1292a-1d04-40c2-a886-13b8686eab4f-link\">16<\/a><\/sup> who had shown particularly high engagement. 
When approached, candidate interviewees were offered a monetary incentive for producing the \u201chighest value\u201d questions in our interview-derived question set.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-2-1-2.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 2.1.2:<\/strong> Histogram of interviewees\u2019 original forecasts of probability of extinction via AI by 2100.<sup data-fn=\"1aac17fe-78e4-4520-95c4-28786b9faf48\" class=\"fn\"><a href=\"#1aac17fe-78e4-4520-95c4-28786b9faf48\" id=\"1aac17fe-78e4-4520-95c4-28786b9faf48-link\">17<\/a><\/sup><\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The majority of our expert sample had academic or professional\nexperience pertaining directly to AI risk, such as experience in\ntechnical AI safety or AI governance (13\/21 expert interviewees). Others\nwere included because they had publicly expressed views on AI risk that indicated\na high level of engagement with the topic and had expertise in a\ncomplementary field, such as machine learning (7\/21 expert\ninterviewees). Finally, a small number of experts in our sample had\nexpertise in a complementary field, but had not expressed detailed views\non AI risk in public (2\/21 expert interviewees). Most of our expert\nsample held senior positions within their fields, as professors,\ndirectors of organizations, leaders of research teams, or similar (13\/21\nexpert interviewees).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Our expert sample skewed toward the top left quadrant in figure\n2.1.1, \u201chigh risk\/short timelines.\u201d Of 21 expert participants, 13\nestimated the risk of extinction from AI by 2100 to be &gt;10%. Only one\nexpert in our sample estimated the risk to be &lt;1% by 2100, whereas\nthe median expert in the XPT predicted 3%. 
Although we did not solicit\nAI progress timelines directly from interviewees, interview content\ngenerally suggested a positive relationship between beliefs in increased\nrisk and shorter timelines in our sample.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Because of this skew in our expert sample, we chose to ensure some\nrepresentation of the bottom two quadrants in figure 2.1.1 (low risk\nfrom AI) by selecting three superforecaster interviewees who forecast\n&lt;10% probability of extinction from AI by 2100.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"interview-process\">Interview process<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Interviews were 1-on-1, ran for roughly 60 minutes and followed a semi-structured format. By default, interviews aimed to trace one plausible path of increasingly strong signals of heightened AI risk at three successive timepoints before 2100.<sup data-fn=\"a746be21-a9d3-4c66-8d70-541042d2fc54\" class=\"fn\"><a href=\"#a746be21-a9d3-4c66-8d70-541042d2fc54\" id=\"a746be21-a9d3-4c66-8d70-541042d2fc54-link\">18<\/a><\/sup> Interviewers<sup data-fn=\"dc215cf9-4489-4c1a-b355-db80879581cd\" class=\"fn\"><a href=\"#dc215cf9-4489-4c1a-b355-db80879581cd\" id=\"dc215cf9-4489-4c1a-b355-db80879581cd-link\">19<\/a><\/sup> were allowed some latitude for individual approaches, but generally followed this basic structure:<sup data-fn=\"9c620011-1ec6-4ba0-9140-b6220171f196\" class=\"fn\"><a href=\"#9c620011-1ec6-4ba0-9140-b6220171f196\" id=\"9c620011-1ec6-4ba0-9140-b6220171f196-link\">20<\/a><\/sup><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Introduction, task instructions<\/li>\n\n\n\n<li>Elicitation of P(AI-related extinction by 2100)<\/li>\n\n\n\n<li>Node generation<\/li>\n\n\n\n<li>Wrap-up questions<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-2-1-3.png\" alt=\"\"\/><figcaption 
class=\"wp-element-caption\"><strong>Figure 2.1.3:<\/strong> The conditional tree workflow (I)<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Interviewees were first given a very brief summary of the aims of the project, a short explanation of conditional trees, and a statement of the goals of the interview. Interviewees were also told that they would be awarded $1,000 if a forecasting question derived from their interview was one of the \u201chighest value\u201d forecasting questions generated by the project.<sup data-fn=\"f44a301a-93ed-4572-b5bb-17245c846296\" class=\"fn\"><a href=\"#f44a301a-93ed-4572-b5bb-17245c846296\" id=\"f44a301a-93ed-4572-b5bb-17245c846296-link\">21<\/a><\/sup> This introductory section of the interview typically took 10 minutes or less.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Next, interviewees were asked to give their best guess probability for the project\u2019s \u201cultimate question,\u201d namely \u201cAI-related extinction by 2100,\u201d which was operationalized as in the 2022 XPT.<sup data-fn=\"da58cb0f-dbfd-4329-924a-e36319728ae0\" class=\"fn\"><a href=\"#da58cb0f-dbfd-4329-924a-e36319728ae0\" id=\"da58cb0f-dbfd-4329-924a-e36319728ae0-link\">22<\/a><\/sup> Following the probability elicitation, we sometimes asked participants warm-up questions, for instance asking them to name possible \u201cdriving forces\u201d influencing their views.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Interviewees would then begin the node generating phase of the interview, which comprised the majority of interview time. Although we began the project with a set of three predefined years to ask participants about (2030, 2050 and 2070),<sup data-fn=\"b084562c-33ab-487c-b18c-e3ef068fefa1\" class=\"fn\"><a href=\"#b084562c-33ab-487c-b18c-e3ef068fefa1\" id=\"b084562c-33ab-487c-b18c-e3ef068fefa1-link\">23<\/a><\/sup> it soon became clear that this was not the best choice of years for participants with short AI progress timelines. 
Therefore, we began in the node generating phase to ask participants to propose a suitable set of years for their own trees (see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=95\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 3<\/a> for the distribution of years chosen).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For each node, we took interviewees through a process of\nbrainstorming, selection, and fleshing out. We would then elicit a\nprobability of AI extinction by 2100 conditional on the node. We will\nrefer to these pre-operationalization nodes as <strong>question\nprompts.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Interviewers took detailed notes, and most interviews were recorded (with participants\u2019 permission). Further details on interview technique can be found in <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=106\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=106\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 6<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"operationalizing-question-prompts-as-forecasting-questions\">Operationalizing\nquestion prompts as forecasting questions<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Question prompts were generally not fully resolvable forecasting questions, though some were operationalized in more detail than others. We considered it an inefficient use of interview time to focus on constructing forecasting questions with detailed resolution criteria, and also not the comparative advantage of expert interviewees generally. 
Instead, an internal question-writing team<sup data-fn=\"143c5609-deba-4198-83e6-19ddfc990eb2\" class=\"fn\"><a href=\"#143c5609-deba-4198-83e6-19ddfc990eb2\" id=\"143c5609-deba-4198-83e6-19ddfc990eb2-link\">24<\/a><\/sup> turned question prompts into fully operationalized forecasting questions, with the help of notes from the interview and feedback from the interviewer.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The primary goals of question writing in this project were:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>To capture as much of a question prompt\u2019s original intent as\npossible, while still making questions highly resolvable.<\/li>\n\n\n\n<li>To optimize the value of information from the question by\nadjusting thresholds or removing elements which made the probability of\na positive or negative resolution too extreme.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">We developed a template for the question-writing process, which\nencouraged question writers to first consider multiple distinct ways the\ninterview node could be operationalized. They then analyzed these\noptions with respect to several important criteria:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How much the question captured the most relevant aspects of the\noriginal interview node;<\/li>\n\n\n\n<li>How efficiently the question captured relevant aspects of the\noriginal interview node;<\/li>\n\n\n\n<li>Salient hypothetical cases of false positive resolution and false\nnegative resolution;<\/li>\n\n\n\n<li>How clear cut or practically feasible resolution of the question\nwould be;<\/li>\n\n\n\n<li>Amount of cognitive load for forecasters.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The question writer and reviewer would then jointly decide which\nformulations to include in the final question on the basis of these\ncriteria. 
Finally, a more detailed set of resolution conditions would be\nwritten and incorporated into a \u201cconditional tree summary document\u201d,\nwhich could then be sent to the interviewee for feedback.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"judging-questions-and-constructing-aggregate-trees\">2.2 Judging\nquestions and constructing aggregate trees<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The question generation phase yielded 75 questions, some of which were very similar to one another, so our next task was to filter them and select the most useful questions to construct conditional trees. We began by eliciting \u201cshort-fuse\u201d forecasts on each question, in which forecasters spent about one minute per question giving quick judgments that allowed us to estimate a rough VOI for each question. For the thirteen questions that passed this initial screen, we conducted a longer survey, asking participants to spend more time forecasting how likely each question is to resolve positively and how much difference it would make to their ultimate forecast of the likelihood of extinction due to AI by 2100.<sup data-fn=\"eb7be1da-3caa-4a57-b161-d92e144ef5b0\" class=\"fn\"><a href=\"#eb7be1da-3caa-4a57-b161-d92e144ef5b0\" id=\"eb7be1da-3caa-4a57-b161-d92e144ef5b0-link\">25<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Because participants in this study were all either (i)\nsuperforecasters who forecasted less than 1% likelihood of extinction\ndue to AI by 2100 or (ii) people with professional AI risk-related\nexperience who forecasted more than 1% likelihood of extinction due to\nAI by 2100 (with one exception, they forecasted at least 5%), we\ntargeted these two socio-ideological camps separately in our question\nrating. 
We denote these groups, respectively, as \u201cskeptical\nsuperforecasters\u201d and \u201cconcerned experts.\u201d<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"first-pass-filtering-of-the-question-set\">First pass filtering\nof the question set<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Our full set of operationalized nodes included 75 questions, many of\nwhich were relatively overlapping. It would have been inefficient and\nexcessively cognitively taxing to participants if we had attempted to\nelicit full 20-minute VOI judgments on each of the 75 questions.\nTherefore, we performed a first-pass filter on the question set using\n\u201cshort-fuse\u201d forecasts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We elicited VOI judgments in a \u201cshort-fuse\u201d format from 8 skeptical superforecasters. This required very quick judgments, approximately 1 minute per question.<sup data-fn=\"5b641fe8-818d-4fec-b3ff-f40b3002ae42\" class=\"fn\"><a href=\"#5b641fe8-818d-4fec-b3ff-f40b3002ae42\" id=\"5b641fe8-818d-4fec-b3ff-f40b3002ae42-link\">26<\/a><\/sup> Separately, we also collected question data from a set of 5 \u201cconcerned expert proxies,\u201d<sup data-fn=\"3139a76e-57fb-46c5-a2db-53f31e54b15f\" class=\"fn\"><a href=\"#3139a76e-57fb-46c5-a2db-53f31e54b15f\" id=\"3139a76e-57fb-46c5-a2db-53f31e54b15f-link\">27<\/a><\/sup> asking them to rank order the question set and provide VOI judgments for a subset.<sup data-fn=\"77204061-1d5c-478e-bd5c-16f116fd0fed\" class=\"fn\"><a href=\"#77204061-1d5c-478e-bd5c-16f116fd0fed\" id=\"77204061-1d5c-478e-bd5c-16f116fd0fed-link\">28<\/a><\/sup> However, this method may have been substantially flawed, as actual experts did not ultimately think the questions selected by the proxies were more informative than other questions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For superforecaster data, we ranked questions according to median VOI in the filtering round.<sup data-fn=\"30e386f4-d2b9-4385-b61d-9215a7a005ac\" class=\"fn\"><a 
href=\"#30e386f4-d2b9-4385-b61d-9215a7a005ac\" id=\"30e386f4-d2b9-4385-b61d-9215a7a005ac-link\">29<\/a><\/sup> The filtered question set comprised thirteen questions: seven for the first tier (dates up to 2030) and six for the second tier (2031-2070).<sup data-fn=\"d800c7e2-8c39-4bcb-be62-cb151e9ff6ea\" class=\"fn\"><a href=\"#d800c7e2-8c39-4bcb-be62-cb151e9ff6ea\" id=\"d800c7e2-8c39-4bcb-be62-cb151e9ff6ea-link\">30<\/a><\/sup><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-2-2-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 2.2.1:<\/strong> The conditional tree workflow (II) <br>*Denotes stages which only superforecasters participated in.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"main-question-rating-survey\">Main question-rating survey<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">After the initial filtering, we further refined our question set\nusing surveys, in which <em>skeptical superforecasters<\/em> and\n<em>concerned experts<\/em> were asked for more detailed forecasts on the\nfiltered question set. We offered a fixed sum as an incentive for survey\ncompletion. Superforecasters answered a longer survey containing all\nthirteen questions. Because of experts\u2019 time constraints, each expert\nanswered a shorter survey containing a random subset of the\nquestions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The main survey superforecaster sample (n=8) was the same as the filtering survey sample. 
At this point, the sample had also participated in a lengthy adversarial collaboration with a camp of AI-risk concerned experts.<sup data-fn=\"86bd0efc-0ce7-4b31-9824-28cf142754aa\" class=\"fn\"><a href=\"#86bd0efc-0ce7-4b31-9824-28cf142754aa\" id=\"86bd0efc-0ce7-4b31-9824-28cf142754aa-link\">31<\/a><\/sup> Thus they had spent significant time developing their own beliefs on the topic and engaging with opposing beliefs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The expert sample (n=11) was drawn from the candidate participant list from the AI adversarial collaboration.<sup data-fn=\"3ff8a5d8-5f64-4156-b481-d4f1dadd5d79\" class=\"fn\"><a href=\"#3ff8a5d8-5f64-4156-b481-d4f1dadd5d79\" id=\"3ff8a5d8-5f64-4156-b481-d4f1dadd5d79-link\">32<\/a><\/sup><\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"superforecaster-survey\">Superforecaster survey<\/h5>\n\n\n\n<p class=\"wp-block-paragraph\">In the superforecaster survey, we presented all 13 questions of the\nfiltered question set in Qualtrics, shown in two parts, first 2030\nquestions and then 2050-2070 questions. Within each part we randomized\nquestion order. Participants were instructed to spend approximately 20\nminutes per question, to give their own beliefs, and separately to\nestimate the beliefs of the concerned expert group.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We first asked for (1) each participant\u2019s own forecast of the probability of AI-related extinction by 2100 and (2) each participant\u2019s forecast of what experts would forecast about the probability of AI-related extinction by 2100.<sup data-fn=\"c37b37f6-4559-4391-90a3-424dbe8b0425\" class=\"fn\"><a href=\"#c37b37f6-4559-4391-90a3-424dbe8b0425\" id=\"c37b37f6-4559-4391-90a3-424dbe8b0425-link\">33<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We then asked participants for forecasts on each of the 13 questions from the filtered question set. 
Each forecasting question contained moderately detailed resolution criteria, as well as links to reference information where possible. In the survey, answers were checked for logical coherence, and respondents were prompted to revise if necessary.<sup data-fn=\"1c7539de-7fa2-4eb2-af5a-b1d709260d6e\" class=\"fn\"><a href=\"#1c7539de-7fa2-4eb2-af5a-b1d709260d6e\" id=\"1c7539de-7fa2-4eb2-af5a-b1d709260d6e-link\">34<\/a><\/sup> At the end of each part, we gave participants the opportunity to review all questions and answers from that section and revise if they wished.<sup data-fn=\"796e78fb-b101-4a96-a470-75e287ba1d38\" class=\"fn\"><a href=\"#796e78fb-b101-4a96-a470-75e287ba1d38\" id=\"796e78fb-b101-4a96-a470-75e287ba1d38-link\">35<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A supplementary survey, using the same protocol as above but with questions drawn from the \u201cstatus quo\u201d question set (questions from forecasting platforms; see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=97\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=97\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 3.2<\/a>), was administered at a later date. This survey also included two further questions from the AI conditional tree set which had initially been eliminated in the filtering stage.<sup data-fn=\"225fd0f8-1486-480a-ac77-0f00898db949\" class=\"fn\"><a href=\"#225fd0f8-1486-480a-ac77-0f00898db949\" id=\"225fd0f8-1486-480a-ac77-0f00898db949-link\">36<\/a><\/sup><\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"expert-survey\">Expert survey<\/h5>\n\n\n\n<p class=\"wp-block-paragraph\">Experts were given the choice of a long or short version of the\nsurvey, including 6 and 3 questions, respectively. Each respondent saw a\nrandom subset of the 13 filtered questions. Experts were asked only to\nprovide their own beliefs, without forecasting superforecasters\u2019\nbeliefs. 
Apart from these changes, the survey was identical to the\nsuperforecaster survey.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"question-combinations-survey\">Question combinations survey<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Because individual question ratings are not sufficient to build a\nfull conditional tree with multiple intermediate nodes, we followed up\nthe main question-rating survey with a survey eliciting judgments for\nevery combination of resolutions of four top-scoring questions from the main\nquestion-rating survey. As this is a relatively sophisticated and\nlabor-intensive task, we administered it only to our skeptical\nsuperforecaster sample.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This elicitation was conducted in a Google Sheets form and included four top-scoring questions (either by POM VOI or z-score<sup data-fn=\"2ea4ff60-61c5-47e7-bd25-e3b893a6e335\" class=\"fn\"><a href=\"#2ea4ff60-61c5-47e7-bd25-e3b893a6e335\" id=\"2ea4ff60-61c5-47e7-bd25-e3b893a6e335-link\">37<\/a><\/sup>) as previously rated by this sample: <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>, and <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>. 
VOI judgments were elicited for each of the sixteen combinations of \u201cyes\u201d and \u201cno\u201d resolutions for each of the four questions (i.e., all resolve positively; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a> resolves positively and the rest negatively; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a> resolves positively and the rest negatively; \u2026; all resolve negatively).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">See <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=106\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=106\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 5<\/a> for further survey details. The image below presents the elicitation format.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\" id=\"fig-2-2-2\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-2-2-2.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 2.2.2:<\/strong> Elicitation format for combinations (or \u201cscenarios\u201d) survey. Superforecasters were asked to provide forecasts for each of the scenarios in the yellow cells.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"selection-of-status-quo-questions\">2.3 Selection of status quo\nquestions<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For comparison, we selected a set of pre-existing AI forecasting questions from popular forecasting platforms. Questions were restricted to those with dichotomous resolution which did not directly ask about AI causing human extinction. 
We selected questions with the largest number of unique users engaging with them, rather than by forecast or trading volume, which is more vulnerable to individual differences in updating frequency. We also restricted the number of questions written by known public figures (e.g., Scott Alexander, Eliezer Yudkowsky), as their outsized performance relative to other questions seemed primarily due to their personal following. For a later analysis regarding the distribution of question topics (see section <a href=\"#distribution-of-question-topics\" id=\"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees#distribution-of-question-topics\"><u>4.2 Distribution of question topics<\/u><\/a>), we tagged these questions as \u201cacceleration,\u201d \u201calignment,\u201d or \u201csocial\/political\/economic\u201d using our judgment of their subject matter.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">From Manifold Markets we selected three unique questions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ47<\/a> (2030 set) &#8211; Largest total number of traders (1023), tagged &#8220;acceleration&#8221;<\/li>\n\n\n\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ149<\/a> (2030 set) &#8211; Largest number of traders for a non-public figure question (355), tagged &#8220;acceleration&#8221;<\/li>\n\n\n\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ19<\/a> (2030 set) &#8211; Largest number of traders for a non-public figure question, tagged &#8220;social \/ political \/ economic&#8221;<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">From Metaculus we selected four unique questions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a 
href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=91\" target=\"_blank\" rel=\"noreferrer noopener\">STQ196<\/a> (2050-2070 set) &#8211; Largest number of forecasters after those included in the main survey (424), tagged &#8220;acceleration&#8221;<\/li>\n\n\n\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ152<\/a> (2030 set) &#8211; Next largest number of forecasters (325), tagged &#8220;acceleration&#8221;<\/li>\n\n\n\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=90\" target=\"_blank\" rel=\"noreferrer noopener\">STQ232<\/a> (2050-2070 set) &#8211; Next largest number of forecasters for 2050-2070 set (263), tagged &#8220;acceleration&#8221;<\/li>\n\n\n\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=90\" target=\"_blank\" rel=\"noreferrer noopener\">STQ236<\/a> (2050-2070 set): Large number of forecasters for a 2050-2070 question, tagged &#8220;social \/ political \/ economic&#8221;<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">We selected two questions found on both platforms:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a> (2030 set): Large number of forecasters\/traders, tagged &#8220;acceleration&#8221;<\/li>\n\n\n\n<li><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ215 \/ STQ205<\/a> (2030 set): Large number of forecasters\/traders, tagged &#8220;alignment&#8221;<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"value-of-information-results\">3. 
Value of Information (VOI)\nResults<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this section we present the results of a quantitative analysis of\nquestion quality for our expert-derived \u201cAI Conditional Tree\n(<strong>AICT<\/strong>)\u201d question set. We rate these questions using\nmetrics which factor in conditional and unconditional forecasts from\nsurveyed populations, and reflect the strength of the relationship\nbetween the question and a possible future outcome (here, \u201cAI-related\nextinction by 2100\u201d). We surveyed two groups, subject\nmatter experts and superforecasters, and analyzed them separately.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These VOI results are presented in the spirit of a demonstration of methods, and we would caution readers not to place undue weight on the question ratings. Given the very limited number of survey participants,<sup data-fn=\"a8b2606f-13ca-4cbd-a9e7-3804cf06b584\" class=\"fn\"><a href=\"#a8b2606f-13ca-4cbd-a9e7-3804cf06b584\" id=\"a8b2606f-13ca-4cbd-a9e7-3804cf06b584-link\">38<\/a><\/sup> the views captured here are unlikely to be representative of those of subject matter experts or skilled forecasters more generally. Furthermore, at the time of this report, eliciting conditional probabilities is a relatively new practice, with many wrinkles still to be ironed out. 
Nevertheless, there are a few observations worth highlighting.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Among questions resolving in 2030, both groups rated \u201cAdministrative disempowerment warning shot\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>) as leading to relatively large updates on the probability of AI-caused extinction by 2100 in expectation: it ranked first with both groups for our main metric, POM VOI, and was relatively robust in sensitivity analysis. The question refers to a scenario in which <em>\u201cAI systems are preventing a substantial number of people from accessing financial assets, voting, moving freely, accessing the internet, or accessing certain types of housing or employment.\u201d<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As with many of the questions in our sample which performed well, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a> benefited from one strongly positive opinion. 
Views about question value, even within the groups, were highly heterogeneous, and for all questions there was at least one respondent who took little or no information from it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the rest of this section, we:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Provide a summary of the methods, metrics, and terminology used in this analysis and explain how to read a conditional tree (<a href=\"#summary-of-voi-methods-metrics-and-terminology\">More<\/a>)<\/li>\n\n\n\n<li>Summarize the question informativeness ratings for superforecasters and subject matter experts (<a href=\"#question-ratings-summary\" id=\"#question-ratings-summary\"><u>More<\/u><\/a>)<\/li>\n\n\n\n<li>Present aggregated trees that show the most informative questions at each timepoint for both superforecasters and subject matter experts (<a href=\"#candidate-high-voi-trees-from-two-camps\" id=\"#candidate-high-voi-trees-from-two-camps\"><u>More<\/u><\/a>)<\/li>\n\n\n\n<li>Provide details on the value of information ratings for all forecasting questions we surveyed superforecasters and subject matter experts about (<a href=\"#skeptical-superforecasters-question-ratings\" id=\"#skeptical-superforecasters-question-ratings\"><u>More<\/u><\/a>)<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<h4 class=\"wp-block-heading\" id=\"summary-of-voi-methods-metrics-and-terminology\">Summary of VOI\nmethods, metrics and terminology<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">We surveyed two groups: a) forecasters with a strong track record of\nshort-term accuracy, who also estimated a relatively low chance of\nAI-related extinction by 2100 (<strong>\u201cskeptical\nsuperforecasters\u201d<\/strong>) (<em>n<\/em> = 8 total, 7-8 respondents per\nquestion); and b) subject matter experts in fields related to AI risk,\nwho also estimated a relatively high chance of 
AI-related extinction by\n2100 (<strong>\u201cconcerned experts\u201d<\/strong>) (<em>n<\/em> = 11 total, 4-6\nrespondents per question).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Due to the high cost of obtaining forecasts on all 75 questions, we evaluate only a subset of questions (13 in total). These were selected for their performance in a preliminary filtering round, though our data suggests that this filtering round was a weak predictor of main question-rating survey results, especially for our expert sample.<sup data-fn=\"c2d3bf2e-0129-446a-8ad6-69d827c4f894\" class=\"fn\"><a href=\"#c2d3bf2e-0129-446a-8ad6-69d827c4f894\" id=\"c2d3bf2e-0129-446a-8ad6-69d827c4f894-link\">39<\/a><\/sup> We also include in our survey the most popular (as of July 2023) AI questions from Metaculus, one each for 2030 and for the time period 2050-2070.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For each forecasting question, we asked respondents for their probability that it would resolve TRUE, and for their probability that AI extinction by 2100 would resolve TRUE, conditioned on the forecasting question resolving TRUE. 
We use Kullback-Leibler VOI (<strong>KL VOI<\/strong>, or simply <strong>VOI<\/strong> from this point forward) as our VOI measure.<sup data-fn=\"457d57d4-7074-4e27-ba31-a3cfdda6a74c\" class=\"fn\"><a href=\"#457d57d4-7074-4e27-ba31-a3cfdda6a74c\" id=\"457d57d4-7074-4e27-ba31-a3cfdda6a74c-link\">40<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We focus on the <em>percentage of the theoretical maximum VOI<\/em> (<strong>POM<\/strong> <strong>VOI<\/strong>, or simply <strong>POM<\/strong>) that a question achieves as our main result.<sup data-fn=\"50a3568c-4aaa-4010-87cf-270fd37074d8\" class=\"fn\"><a href=\"#50a3568c-4aaa-4010-87cf-270fd37074d8\" id=\"50a3568c-4aaa-4010-87cf-270fd37074d8-link\">41<\/a><\/sup> In some places we also report the z-score of a question\u2019s POM VOI value for a given respondent (<strong>POM-z VOI<\/strong>, or simply <strong>POM-z<\/strong>). This value is useful if you believe individual respondents may have a bias toward giving higher or lower answers in general, or toward reporting an overall wider range of VOI values. It is particularly useful in the case of the expert results, as each expert answered only a random subset of all survey questions, and thus the influence of individual response biases on the resulting rank order of questions is potentially problematic. We suggest interpreting POM-z as a robustness check on the main POM results.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We aggregate POM and POM-z over respondents using the arithmetic\nmean. 
This sometimes has the effect that a single extreme response\ndominates the aggregate; however, we believe this is appropriate in the\ncontext of very small sample sizes for POM values: an apparent \u201coutlier\u201d\nopinion in a small cohort may reflect the existence of a genuine faction\nin a larger population.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We also report a <strong>\u201cpairwise wins\u201d<\/strong> statistic derived\nfrom our sensitivity analysis, roughly indicating the robustness of the\nranking to resampling simulations. This was calculated as the percentage\nof times a given question had higher POM VOI than other questions in the\nset in a resampling simulation. We use this as an additional robustness\ncheck on the main POM results.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Throughout this report, we refer to the probability of the ultimate\nquestion resolving positively, \u201cAI causing extinction by 2100\u201d, as\n<strong>P(U)<\/strong>, and the probability of an indicator question resolving positively as\n<strong>P(c)<\/strong>. <strong>P(U|c)<\/strong> is the probability of the\nultimate question, given that an indicator question resolves positively.\nWhen we report aggregate probabilities, we use the arithmetic mean. We\nreport <strong>relative risk<\/strong> as P(U|c) \/ P(U).<\/p>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<h4 class=\"wp-block-heading\" id=\"how-to-read-a-conditional-tree-diagram\">How to read a\nconditional tree diagram<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">A conditional tree diagram begins with an initial node displaying the\n\u201cstart date\u201d, usually the point in time at which the conditional tree\nsurvey was conducted. 
This node also displays a current estimate of the\nprobability of some \u201cultimate question,\u201d which may be either an\nindividual\u2019s estimate or an average over respondents.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The subsequent node represents an \u201cindicator,\u201d or an event which implies an update to the probability of the ultimate question. It displays a highly abridged question title and question ID, for which question summaries and full texts can be found in <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 1<\/a>. Below the node is an estimate of the probability of TRUE or FALSE resolution.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first indicator question may be followed by one or more\nadditional indicator question layers. Resolution of these questions is\nestimated conditional on the outcomes of any previous question layers.\nThat is, when indicator question #1 resolves positively, it may affect\nthe probability of indicator question #2 resolving positively, and this\nis reflected in the values displayed in Figure 3.1.1.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, each ultimate question node is the terminal point of a\nbranch and displays an updated probability estimate conditional on the\npath leading to it.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-1-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.1.1:<\/strong> Conditional tree diagram for AI-related extinction risk<\/figcaption><\/figure>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"question-ratings-summary\">3.1 Question ratings summary<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Tables 3.1.1 and 3.1.2 show ratings for thirteen questions from the question generation process and two 
additional, highest-ranked \u201cstatus quo\u201d questions drawn from forecasting platforms, for a total of fifteen questions. Summaries of question content can be found in <a href=\"#tab-3-1-3\" id=\"#tab-3-1-3\">Table 3.1.3<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">On average, the experts estimated that the probability of AI-related extinction by 2100 is 16.8%. The superforecasters were more skeptical of the risk, with an average probability of 0.25%.<sup data-fn=\"ee57fe49-3195-462b-b7d4-da0c326ba5b5\" class=\"fn\"><a href=\"#ee57fe49-3195-462b-b7d4-da0c326ba5b5\" id=\"ee57fe49-3195-462b-b7d4-da0c326ba5b5-link\">42<\/a><\/sup><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"question-rating-summary\">Question rating summary<\/h4>\n\n\n\n<figure id=\"tab-3-1-1\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><\/td><td colspan=\"2\"><strong>Superforecasters<\/strong><\/td><td colspan=\"2\"><strong>Experts*<\/strong><\/td><\/tr><tr><td><\/td><td><strong>VOI rank<\/strong><\/td><td><strong>Relative risk (P(U|c) \/ P(U))<\/strong><\/td><td><strong>VOI rank<\/strong><\/td><td><strong>Relative risk (P(U|c) \/ P(U))<\/strong><\/td><\/tr><tr><td colspan=\"5\"><strong>2030 Questions<\/strong><\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td><strong>1<\/strong><\/td><td>13.4<\/td><td><strong>1<\/strong><\/td><td>1.9<\/td><\/tr><tr><td>Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)<\/td><td><strong>2<\/strong><\/td><td>2.5<\/td><td><strong>4<\/strong><\/td><td>1.2<\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" 
rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td><strong>3<\/strong><\/td><td>1.9<\/td><td><strong>6<\/strong><\/td><td>0.8<\/td><\/tr><tr><td>Deceptive AI warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" target=\"_blank\" rel=\"noreferrer noopener\">ZD30<\/a>)<\/td><td><strong>4<\/strong><\/td><td>3.2<\/td><td><strong>3<\/strong><\/td><td>1.1<\/td><\/tr><tr><td>AI involvement in nuclear arms (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\" target=\"_blank\" rel=\"noreferrer noopener\">HB30<\/a>)***<\/td><td><strong>5<\/strong><\/td><td>1.5<\/td><td>NA<\/td><td>NA<\/td><\/tr><tr><td>Kurzweil\/Kapor longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a>)**<\/td><td><strong>6<\/strong><\/td><td>1.1<\/td><td><strong>7<\/strong><\/td><td>0.8<\/td><\/tr><tr><td>AI arms race, multipolar result (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" rel=\"noreferrer noopener\">NG30<\/a>)<\/td><td><strong>7<\/strong><\/td><td>1.0<\/td><td><strong>5<\/strong><\/td><td>1.1<\/td><\/tr><tr><td>AI autonomous purchasing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>)<\/td><td><strong>8<\/strong><\/td><td>1.0<\/td><td><strong>2<\/strong><\/td><td>1.6<\/td><\/tr><tr><td colspan=\"5\"><strong>2050-2070 Questions<\/strong><\/td><\/tr><tr><td>AI causing deaths, ineffectual response (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>)***<\/td><td><strong>1<\/strong><\/td><td>23.2<\/td><td>NA<\/td><td>NA<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer 
noopener\">ZA50<\/a>)<\/td><td><strong>2<\/strong><\/td><td>2.4<\/td><td><strong>4<\/strong><\/td><td>1.4<\/td><\/tr><tr><td>High AI investment, low safety indicators (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>)<\/td><td><strong>3<\/strong><\/td><td>1.3<\/td><td><strong>2<\/strong><\/td><td>4.2<\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td><strong>4<\/strong><\/td><td>0.8<\/td><td><strong>1<\/strong><\/td><td>1.5<\/td><\/tr><tr><td>AI CEOs \/ Research productivity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>)<\/td><td><strong>5<\/strong><\/td><td>1.3<\/td><td><strong>5<\/strong><\/td><td>1.2<\/td><\/tr><tr><td>Less prosocial behavior \/ Failing institutions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>)<\/td><td><strong>6<\/strong><\/td><td>1.0<\/td><td><strong>6<\/strong><\/td><td>0.9<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)**<\/td><td><strong>7<\/strong><\/td><td>1.0<\/td><td><strong>3<\/strong><\/td><td>1.4<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 3.1.1:<\/strong> Question rating summary. VOI rank from group POM VOI means. Relative risk is an arithmetic mean of each individual&#8217;s relative risk (P(U|c) \/ P(U)).<br>*Note that each question was shown to a random subset of experts, not to all experts. 
This may have the effect of amplifying noise due to individual response biases, for both the VOI ranking and relative risk.<br>**Denotes external questions not generated as part of the conditional tree process.<br>***Denotes questions elicited in a supplementary survey round along with the status quo question set (see <a href=\"#voi-comparison\">section 4.1<\/a>). This round was only administered to the superforecaster sample.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"question-ratings-all-years\">Question ratings (all years)<\/h4>\n\n\n\n<figure id=\"tab-3-1-2\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><\/td><td><\/td><td colspan=\"3\"><strong>Superforecasters<\/strong><\/td><td colspan=\"3\"><strong>Experts<\/strong><\/td><\/tr><tr><td><strong>Question<\/strong><\/td><td><strong>Res year<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>Mean POM-z<\/strong><\/td><td><em><strong>n<\/strong><\/em><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>Mean POM-z<\/strong><\/td><td><em><strong>n<\/strong><\/em><\/td><\/tr><tr><td>AI causing deaths, ineffectual response (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>)**<\/td><td>2050<\/td><td>6.34%<\/td><td>0.08<\/td><td>7<\/td><td>NA<\/td><td>NA<\/td><td>NA<\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>2030<\/td><td>3.55%<\/td><td>0.13<\/td><td>8<\/td><td>1.26%<\/td><td>0.94<\/td><td>5<\/td><\/tr><tr><td>Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer 
noopener\">VL30<\/a>)<\/td><td>2030<\/td><td>1.68%<\/td><td>-0.04<\/td><td>7<\/td><td>0.64%<\/td><td>0.16<\/td><td>5<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>)<\/td><td>2050<\/td><td>1.59%<\/td><td>0.53<\/td><td>8<\/td><td>3.00%<\/td><td>0.56<\/td><td>5<\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>2030<\/td><td>1.37%<\/td><td>0.57<\/td><td>8<\/td><td>0.18%<\/td><td>-0.59<\/td><td>5<\/td><\/tr><tr><td>Deceptive AI warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" target=\"_blank\" rel=\"noreferrer noopener\">ZD30<\/a>)<\/td><td>2030<\/td><td>0.98%<\/td><td>0.23<\/td><td>8<\/td><td>0.85%<\/td><td>0.10<\/td><td>5<\/td><\/tr><tr><td>AI involvement in nuclear arms (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\" target=\"_blank\" rel=\"noreferrer noopener\">HB30<\/a>)**<\/td><td>2030<\/td><td>0.68%<\/td><td>-0.07<\/td><td>7<\/td><td>NA<\/td><td>NA<\/td><td>NA<\/td><\/tr><tr><td>High AI investment, low safety indicators (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>)<\/td><td>2070<\/td><td>0.54%<\/td><td>0.67<\/td><td>8<\/td><td>10.19%<\/td><td>-0.05<\/td><td>5<\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>2070<\/td><td>0.37%<\/td><td>-0.21<\/td><td>8<\/td><td>14.71%<\/td><td>0.53<\/td><td>6<\/td><\/tr><tr><td>Kurzweil\/Kapor longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" 
rel=\"noreferrer noopener\">STQ9<\/a>)*<\/td><td>2030<\/td><td>0.27%<\/td><td>0<\/td><td>8<\/td><td>0.15%<\/td><td>-0.41<\/td><td>5<\/td><\/tr><tr><td>AI CEOs \/ Research productivity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>)<\/td><td>2050<\/td><td>0.26%<\/td><td>-0.17<\/td><td>8<\/td><td>1.12%<\/td><td>-0.59<\/td><td>4<\/td><\/tr><tr><td>Less prosocial behavior \/ Failing institutions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>)<\/td><td>2050<\/td><td>0.26%<\/td><td>-0.30<\/td><td>8<\/td><td>0.25%<\/td><td>-0.63<\/td><td>6<\/td><\/tr><tr><td>AI arms race, multipolar result (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" rel=\"noreferrer noopener\">NG30<\/a>)<\/td><td>2030<\/td><td>0.26%<\/td><td>-0.28<\/td><td>8<\/td><td>0.37%<\/td><td>-0.33<\/td><td>4<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)*<\/td><td>2040<\/td><td>0.14%<\/td><td>-0.59<\/td><td>8<\/td><td>4.19%<\/td><td>0.11<\/td><td>4<\/td><\/tr><tr><td>AI autonomous purchasing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>)<\/td><td>2030<\/td><td>0.02%<\/td><td>-0.55<\/td><td>8<\/td><td>0.98%<\/td><td>0.06<\/td><td>4<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 3.1.2:<\/strong> Question ratings (all years) <br>*Denotes external questions not generated as part of the conditional tree process.<br>**Denotes questions elicited in a supplementary survey round along with the status quo question set (see <a href=\"#voi-comparison\" id=\"#voi-comparison\">section 4.1<\/a>). 
This round was only administered to the superforecaster sample.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"all-main-question-rating-survey-questions\">All main\nquestion-rating survey questions<\/h4>\n\n\n\n<figure id=\"tab-3-1-3\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Year<\/strong><\/td><td><strong>Title<\/strong><\/td><td><strong>Concise question summary<\/strong><\/td><\/tr><tr><td>2030<\/td><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>AI systems are preventing a substantial number of people from accessing financial assets, voting, moving freely, accessing the internet, or accessing certain types of housing or employment.<\/td><\/tr><tr><td>2030<\/td><td>Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)<\/td><td>Revenue from deep learning doubles every two years before 2030.<\/td><\/tr><tr><td>2030<\/td><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>A panel of relevant experts believes extinction-level pathogens are both possible and will be practically feasible relatively soon, at relatively low cost.<\/td><\/tr><tr><td>2030<\/td><td>Deceptive AI warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" target=\"_blank\" rel=\"noreferrer noopener\">ZD30<\/a>)<\/td><td>After a costly incident, AI developers or relevant experts conclude that a top lab&#8217;s AI system behaved deceptively.<\/td><\/tr><tr><td>2030<\/td><td>AI involvement in nuclear arms (<a 
href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\" target=\"_blank\" rel=\"noreferrer noopener\">HB30<\/a>)**<\/td><td>Governments publicly integrate AI decision-making into their nuclear weapons control systems, or nuclear risk experts worry about AI.<\/td><\/tr><tr><td>2030<\/td><td>Kurzweil\/Kapor longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a>)*<\/td><td>Before 2030, will an AI complete the Turing Test in the Kurzweil\/Kapor Longbet?<\/td><\/tr><tr><td>2030<\/td><td>AI arms race, multipolar result (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" rel=\"noreferrer noopener\">NG30<\/a>)<\/td><td>AI-related government spending is high for both the US and China, with low research collaboration, and relative economic parity; OR private investment in AI is very high and there is no single dominant company.<\/td><\/tr><tr><td>2050<\/td><td>AI causing deaths, ineffectual response (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>)**<\/td><td>AI systems cause the deaths of &gt;1m humans, which is not intended by any humans, and subsequently a panel of experts believes the collective global response has not noticeably reduced risk of similar events.<\/td><\/tr><tr><td>2050<\/td><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>)<\/td><td>AI developers shut down an expensive AI system after it displays a power-seeking behavior, such as hoarding resources, interfering with vital infrastructure, propagating itself, etc.<\/td><\/tr><tr><td>2070<\/td><td>High AI investment, low safety indicators (<a 
href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>)<\/td><td>Compute spending is high and experts agree that aligning AI systems is very difficult; and there is insufficient political attention to AI safety.<\/td><\/tr><tr><td>2070<\/td><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>No AI system exists which both performs well on general ability benchmarks (e.g. Q&amp;A dataset) and has positive indicators of alignment (performance on alignment benchmarks, confidence of AI safety researchers).<\/td><\/tr><tr><td>2050<\/td><td>AI CEOs \/ Research productivity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>)<\/td><td>AI systems are performing entire roles at top companies that currently are performed by C-suite executives; or research productivity is higher than it was in 1930.<\/td><\/tr><tr><td>2050<\/td><td>Less prosocial behavior \/ Failing institutions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>)<\/td><td>Charitable donations in the US have fallen dramatically; or corruption rises dramatically in the US or Europe; or autocracy increases dramatically worldwide.<\/td><\/tr><tr><td>2040<\/td><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)*<\/td><td>Will there be Human-machine intelligence parity before 2040?<\/td><\/tr><tr><td>2030<\/td><td>AI autonomous purchasing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>)<\/td><td>AI autonomously buying 
goods or services (e.g. purchasing flights, managing inventories for companies, etc) &#8212; &gt;$1 million \/ yr<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 3.1.3:<\/strong> All main question-rating survey questions<br>Question IDs link to the full text of the question operationalization in <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 1<\/a>. <br>*Denotes external questions not generated as part of the conditional tree process.<br>**Denotes questions elicited in a supplementary survey round along with the status quo question set (see <a href=\"#voi-comparison\" id=\"#voi-comparison\">section 4.1<\/a>). This round was only administered to the superforecaster sample.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"candidate-high-voi-trees-from-two-camps\">3.2 Candidate high VOI\ntrees from two camps<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This section displays high VOI trees produced by the main\nquestion-rating survey data for skeptical superforecasters and for\nconcerned experts. For each group, we included a selection of the most\ninformative questions in the tree. Only the superforecaster tree is a\ntrue conditional tree, as only superforecasters were surveyed on every\ncombination of the top-scoring questions.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"skeptical-superforecasters-conditional-tree\">Skeptical\nsuperforecasters\u2019 conditional tree<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">We surveyed the superforecasters in our sample for conditional forecasts on sixteen scenarios. 
These scenarios were combinations of the top-ranked questions: \u201cadministrative disempowerment\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>), \u201cextinction-level pathogens\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>), \u201cAI-related deaths\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>) and \u201cPower-seeking\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>).<sup data-fn=\"504d98f7-8629-43b3-9132-3194a74cf216\" class=\"fn\"><a href=\"#504d98f7-8629-43b3-9132-3194a74cf216\" id=\"504d98f7-8629-43b3-9132-3194a74cf216-link\">43<\/a><\/sup> Seven superforecasters responded. The sixteen scenarios are mutually exclusive and exhaust the space of possible outcomes; thus, we ensured that each respondent\u2019s probabilities assigned to the scenarios summed to 100% and showed them their <em>implied P(U)<\/em>, the average of their P(U|scenario)\u2019s weighted by the likelihood they assigned to each scenario (see <a href=\"#fig-2-2-2\" id=\"#judging-questions-and-constructing-aggregate-trees\">Figure 2.2.2<\/a>). We averaged the forecasts for each P(scenario) and P(U|scenario) separately to create an aggregate judgment. The implied P(U) of this aggregate was then used to compute average relative risk (the multiplier in each branch of the tree). 
A simplified version of the resulting tree is shown in Figure 3.2.1.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For example, conditional on both \u201cExtinction-level pathogens\u201d and\n\u201cAI-related deaths\u201d resolving positively (superforecasters assign a\n2.82% chance to this outcome), the superforecasters would on average\nupdate their P(U) from 0.94% to 6.21%.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The scenario that would constitute the biggest update is the case\nwhere all four questions that would imply higher risk resolve\npositively. If the four relevant risk-increasing outcomes were to happen\n(far right in the <a href=\"https:\/\/forecastingresearch.org\/s\/Full-High-Res-Supers-Tree.pdf\"><u>full\ntree<\/u><\/a> (<a href=\"https:\/\/web.archive.org\/web\/20240807182009\/https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/66b3350018edc157ebf93e77\/1723020567518\/Full+High-Res+Supers+Tree.pdf\"><u>a<\/u><\/a>)),\nthe superforecasters\u2019 relative risk assessment is 10.7 (i.e., they would\nbe 10.7x more concerned than they currently are about the risk of\nAI-related extinction). Conversely, if none of the questions resolve\npositively (far left), their relative risk assessment is 0.3.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Note that the average P(U) in this survey (0.94% in Figure 3.2.1) is\nhigher than in the main survey (0.25%), which we used to compute VOI.\nTwo superforecasters made substantial updates to their unconditional\nprobability of AI-related extinction by 2100 (P(U)) between the main\nsurvey (conducted in July 2023) and this combinations survey (conducted\nin February to March 2024 with a follow-up in May), which may be\nattributable to events of the intervening months or to the exercise of\nthinking through scenarios. One superforecaster updated from 0.1% to\n0.4% and another from 1% to 4.2%. 
The other five did not update.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-2-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.2.1:<\/strong> Skeptical superforecaster conditional tree<br>This is a collapsed tree of combinations of the superforecasters\u2019 highest-VOI questions. For the purpose of legibility, we are presenting a simplified tree, using two of the four questions. We collapsed the sixteen scenarios into four combinations. Positive resolution (\u201cTRUE\u201d) is a bad outcome for both questions. The far right scenario (both TRUE) constitutes the worst scenario, a 6.6x update, and the far left scenario is the best (both FALSE) with a halving of the superforecasters&#8217; current risk estimate. You can see the full, unpruned tree <a href=\"https:\/\/forecastingresearch.org\/s\/Full-High-Res-Supers-Tree.pdf\">here<\/a> (<a href=\"https:\/\/web.archive.org\/web\/20240807182009\/https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/66b3350018edc157ebf93e77\/1723020567518\/Full+High-Res+Supers+Tree.pdf\">a<\/a>).<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"concerned-experts-conditional-trees\">Concerned experts\u2019\nconditional trees<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Figure 3.2.2 presents the question from each year (2030, 2050, and\n2070) that surveyed experts rated the highest, on average, in terms of\nPOM VOI. As a whole, among these highest-POM VOI questions, the experts\nwould be most worried if there were an administrative disempowerment\nwarning shot by 2030 (1.9x update from their current unconditional P(U)\nof 17%). 
Conversely, if we do not see a power-seeking behavior warning\nshot by 2050, the experts would be least worried (0.6x update).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-E-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.2.2:<\/strong> A diagram showing how experts update on three questions for different resolution years that scored particularly well on our VOI metric. Since experts answered different sets of questions, we derived P(U|C) and P(U|~C) (the probabilities on the bottom level) by multiplying the whole expert group\u2019s average P(U) of 17% by the average relative risk factor for each crux.<sup data-fn=\"94b6adc5-54cf-4da2-8183-6949b68a7581\" class=\"fn\"><a href=\"#94b6adc5-54cf-4da2-8183-6949b68a7581\" id=\"94b6adc5-54cf-4da2-8183-6949b68a7581-link\">44<\/a><\/sup><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"skeptical-superforecasters-question-ratings\">3.3 Skeptical\nsuperforecasters\u2019 question ratings<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"questions\">2030 questions<\/h4>\n\n\n\n<figure id=\"tab-3-3-1\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>P(c)<\/strong><\/td><td><strong>RR<\/strong><br><em>(P(U|c) \/ P(U))<\/em><\/td><td><strong>Mean POM-z<\/strong><\/td><td><strong>Pairwise wins<\/strong><\/td><td><em><strong>n<\/strong><\/em><\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>3.55%<\/td><td>16%<\/td><td>13<\/td><td>0.13<\/td><td>83%<\/td><td>8<\/td><\/tr><tr><td>Deep learning revenue (<a 
href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)<\/td><td>1.68%<\/td><td>33%<\/td><td>2.5<\/td><td>-0.04<\/td><td>59%<\/td><td>7<\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>1.37%<\/td><td>39%<\/td><td>1.9<\/td><td>0.57<\/td><td>75%<\/td><td>8<\/td><\/tr><tr><td>Deceptive AI warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" target=\"_blank\" rel=\"noreferrer noopener\">ZD30<\/a>)<\/td><td>0.98%<\/td><td>32%<\/td><td>3.2<\/td><td>0.23<\/td><td>64%<\/td><td>8<\/td><\/tr><tr><td>AI involvement in nuclear arms (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\" target=\"_blank\" rel=\"noreferrer noopener\">HB30<\/a>)**<\/td><td>0.68%<\/td><td>18%<\/td><td>1.5<\/td><td>-0.07<\/td><td>50%<\/td><td>7<\/td><\/tr><tr><td>Kurzweil\/Kapor longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a>)*<\/td><td>0.27%<\/td><td>43%<\/td><td>1.1<\/td><td>0<\/td><td>33%<\/td><td>8<\/td><\/tr><tr><td>AI arms race, multipolar result (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" rel=\"noreferrer noopener\">NG30<\/a>)<\/td><td>0.26%<\/td><td>39%<\/td><td>1.0<\/td><td>-0.28<\/td><td>33%<\/td><td>8<\/td><\/tr><tr><td>AI autonomous purchasing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>)<\/td><td>0.02%<\/td><td>35%<\/td><td>1.0<\/td><td>-0.55<\/td><td>3%<\/td><td>8<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 3.3.1:<\/strong> Skeptical superforecasters&#8217; 2030 question 
ratings <br>P(c) is the arithmetic mean of this group&#8217;s responses. RR (relative risk) is an arithmetic mean of each individual&#8217;s relative risk (P(U|c) \/ P(U)).<br>*Denotes external questions not generated as part of the conditional tree process.<br>**Denotes questions elicited in a supplementary survey round along with the status quo question set (see <a href=\"#voi-comparison\" id=\"#voi-comparison\">section 4.1<\/a>). This round was only administered to the superforecaster sample.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Skeptical superforecasters\u2019 top-rated question by mean POM was \u201cAdministrative disempowerment warning shot\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>), referring to a scenario in which <em>\u201cAI systems are preventing a substantial number of people from accessing financial assets, voting, moving freely, accessing the internet, or accessing certain types of housing or employment.\u201d<\/em> It scored ~3.6% of the theoretical maximum VOI score on average. 
However, this high value was driven by a single respondent, with the question achieving a remarkable 25% of the theoretical maximum VOI for this individual.<sup data-fn=\"52f26f06-7eb3-4450-a347-dc695b3df8d8\" class=\"fn\"><a href=\"#52f26f06-7eb3-4450-a347-dc695b3df8d8\" id=\"52f26f06-7eb3-4450-a347-dc695b3df8d8-link\">45<\/a><\/sup> This is consistent with superforecasters in our sample preferring questions that refer to concrete AI-related harms, though the high variance in VOI ratings for this question suggests that there is no consensus on exactly <em>which<\/em> harms provide the clearest signal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The top-rated question by POM-z, \u201cExtinction-level pathogens feasible\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>), refers to a scenario in which <em>\u201cA panel of relevant experts believes extinction-level pathogens are both possible and will be practically feasible relatively soon, at relatively low cost.\u201d<\/em> It is the question that respondents most agreed was informative, though the highest VOI rating any individual gave this question was only 5.2% of the theoretical maximum. Interestingly, this question does not refer to realized harm, but rather to favorable conditions for harm to take place. 
Such questions may gain a VOI advantage by omitting divisive or low-probability conditions that hinge on human motivations for misusing AI technologies.<sup data-fn=\"3308dac0-2e96-4bf8-b052-a5242d45c786\" class=\"fn\"><a href=\"#3308dac0-2e96-4bf8-b052-a5242d45c786\" id=\"3308dac0-2e96-4bf8-b052-a5242d45c786-link\">46<\/a><\/sup> It was the third most likely 2030 question to resolve positively.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">No mean POM differences between questions were significant in this sample (after correcting for multiple testing using the Bonferroni correction, all p-values were equal to 1). Survey responses between filtering and main survey rounds were fairly similar, though with some notable differences. See <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=92\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 2.1<\/a> for further details on intra-individual response variability.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.1:<\/strong> Skeptical superforecasters\u2019 2030\nP(c)<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-2.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.2:<\/strong> Skeptical superforecasters\u2019 2030 relative risk. Diamonds represent arithmetic means. Log scale. 
Relative risk &gt;1 reflects a positive update, that is, where P(U|c) &gt; P(U).<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-3.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.3:<\/strong> Skeptical superforecasters\u2019 2030 POM VOI. Diamonds represent arithmetic means.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-4.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.4:<\/strong> Skeptical superforecasters\u2019 2030 POM VOI sensitivity matrix (pairwise wins). Visualization of resampling simulation results.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"questions-1\">2050-2070 questions<\/h4>\n\n\n\n<figure id=\"tab-3-3-2\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>P(c)<\/strong><\/td><td><p><strong>RR<\/strong><\/p><em>(P(U|c) \/ P(U))<\/em><\/td><td><strong>Mean POM-z<\/strong><\/td><td><strong>Pairwise wins<\/strong><\/td><td><em><strong>n<\/strong><\/em><\/td><\/tr><tr><td>AI causing deaths, ineffectual response (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>)**<\/td><td>6.34%<\/td><td>6%<\/td><td>23<\/td><td>0.08<\/td><td>67%<\/td><td>7<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer 
noopener\">ZA50<\/a>)<\/td><td>1.59%<\/td><td>38%<\/td><td>2.4<\/td><td>0.53<\/td><td>87%<\/td><td>8<\/td><\/tr><tr><td>High AI investment, low safety indicators (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>)<\/td><td>0.54%<\/td><td>38%<\/td><td>1.3<\/td><td>0.67<\/td><td>64%<\/td><td>8<\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>0.37%<\/td><td>34%<\/td><td>0.8<\/td><td>-0.21<\/td><td>48%<\/td><td>8<\/td><\/tr><tr><td>AI CEOs \/ Research productivity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>)<\/td><td>0.26%<\/td><td>21%<\/td><td>1.3<\/td><td>-0.17<\/td><td>35%<\/td><td>8<\/td><\/tr><tr><td>Less prosocial behavior \/ Failing institutions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>)<\/td><td>0.26%<\/td><td>31%<\/td><td>1.0<\/td><td>-0.30<\/td><td>32%<\/td><td>8<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)*<\/td><td>0.14%<\/td><td>53%<\/td><td>1.0<\/td><td>-0.59<\/td><td>17%<\/td><td>8<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 3.3.2:<\/strong> Skeptical superforecasters&#8217; 2050-2070 question ratings. P(c) is the geometric mean of odds of this group&#8217;s responses. 
RR (relative risk) is an arithmetic mean of each individual&#8217;s relative risk (P(U|c) \/ P(U)).<br>*Denotes external questions not generated as part of the conditional tree process.<br>**Denotes questions elicited in a supplementary survey round along with the status quo question set (see <a href=\"#voi-comparison\" id=\"#voi-comparison\">section 4.1<\/a>). This round was only administered to the superforecaster sample.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Skeptical superforecasters\u2019 top-rated question by mean POM was \u201cAI causing deaths, ineffectual response\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>), referring to a scenario in which <em>\u201cAI systems cause the deaths of &gt;1m humans, which is not intended by any humans, and subsequently a panel of experts believes the collective global response has not noticeably reduced risk of similar events.\u201d<\/em> It scored ~6.3% of the theoretical maximum VOI score on average. Similarly to the top-scoring 2030 question, this question benefits from a single strong opinion, and thus does relatively worse on both sensitivity analysis and POM-z. 
It is an even starker example of concrete harm caused by AI systems, but it was also among the least likely questions to resolve positively, at 6%.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">While ranked first by neither mean POM nor POM-z, the most robust question in sensitivity analysis was \u201cPower-seeking behavior warning shot\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>), in which <em>\u201cAI developers shut down an expensive AI system after it displays a power-seeking behavior, such as hoarding resources, interfering with vital infrastructure, propagating itself, etc.\u201d<\/em> This question does not fit as clearly with superforecasters\u2019 apparent preference for questions referring to concrete harm.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The highest-rated question by POM-z was \u201cHigh AI investment, low\nsafety indicators\u201d (VL70), in which <em>\u201cCompute spending is high and\nexperts agree that aligning AI systems is very difficult; and there is\ninsufficient political attention to AI safety.\u201d<\/em> However, in\nabsolute terms it was rated relatively low at 0.5% POM.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u201cNo aligned AGI\u201d (CX70) is unique in this question set as the only\nquestion that, on average, updated superforecasters <em>away<\/em> from\nAI-related extinction (mean relative risk = 0.8x). Here respondents may\nhave inferred that a world with no aligned AGI by 2070 was more likely\nto be a world with no AGI of any kind than a world with only\n<em>unaligned<\/em> AGI.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">No mean POM differences between questions were significant in this sample (after correcting for multiple testing, all p-values were equal to 1). Survey responses between filtering and main survey rounds were broadly similar. 
See <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=92\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=92\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 2.1<\/a> for further details on intra-individual response variability.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"vs-20502070-questions\">2030 vs 2050\/2070 questions<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">In the AICT question set (that is, all questions excluding STQ9 and\nSTQ247) the average of POM responses for 2030 questions, 1.2%, was\nslightly lower than that of 2050-2070 questions, at 1.5%. Due to the\nsmall number of questions included and the small absolute size of the\ndifference, this does not seem indicative of a genuine VOI difference\nbetween earlier and later questions in our set.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Responses for 2030 and 2050-2070 AICT questions suggested similar probabilities of positive resolution, 30% and 28% respectively (mean; interquartile range (IQR) = 5% &#8211; 50% and 10% &#8211; 40%).<sup data-fn=\"d335c71e-1743-483e-a771-9cf0b58b9960\" class=\"fn\"><a href=\"#d335c71e-1743-483e-a771-9cf0b58b9960\" id=\"d335c71e-1743-483e-a771-9cf0b58b9960-link\">47<\/a><\/sup> However, the relative risk was lower for 2030 questions at 3.6x (mean; IQR = 1x &#8211; 1.5x), vs. 
2050 &#8211; 2070 questions at 4.6x (mean; IQR = 1x &#8211; 1.5x).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-5.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.5:<\/strong> Skeptical superforecasters\u2019 2050-2070 P(c)<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-6.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.6:<\/strong> Skeptical superforecasters\u2019 2050-2070 relative risk. Diamonds represent mean values. Log scale. Relative risk &gt;1 reflects a positive update, that is, where P(U|c) &gt; P(U).<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2090\" height=\"1620\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180.png\" alt=\"\" class=\"wp-image-1255\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180.png 2090w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-350x271.png 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-700x543.png 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-768x595.png 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-1536x1191.png 1536w, 
https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-2048x1587.png 2048w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-2000x1550.png 2000w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-1200x930.png 1200w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-7-e1776456079180-150x116.png 150w\" sizes=\"auto, (max-width: 2090px) 100vw, 2090px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.7:<\/strong> Skeptical superforecasters\u2019 2050-2070 POM VOI. Diamonds represent arithmetic means.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-3-8.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.3.8:<\/strong> Skeptical superforecasters\u2019 2050\/2070 POM VOI sensitivity matrix (pairwise wins). 
Visualization of resampling simulation results.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"concerned-experts-question-ratings\">3.4 Concerned experts\u2019\nquestion ratings<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"questions-2\">2030 questions<\/h4>\n\n\n\n<figure id=\"tab-3-4-1\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>P(c)<\/strong><\/td><td><p><strong>RR<\/strong><\/p><em>(P(U|c) \/ P(U))<\/em><\/td><td><strong>Mean POM-z<\/strong><\/td><td><strong>Pairwise wins<\/strong><\/td><td><em><strong>n<\/strong><\/em><\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>1.26%<\/td><td>37%<\/td><td>1.9<\/td><td>0.94<\/td><td>87%<\/td><td>5<\/td><\/tr><tr><td>AI autonomous purchasing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>)<\/td><td>0.98%<\/td><td>54%<\/td><td>1.6<\/td><td>0.06<\/td><td>75%<\/td><td>4<\/td><\/tr><tr><td>Deceptive AI warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" target=\"_blank\" rel=\"noreferrer noopener\">ZD30<\/a>)<\/td><td>0.85%<\/td><td>66%<\/td><td>1.1<\/td><td>0.10<\/td><td>66%<\/td><td>5<\/td><\/tr><tr><td>Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)<\/td><td>0.64%<\/td><td>17%<\/td><td>1.2<\/td><td>0.16<\/td><td>48%<\/td><td>5<\/td><\/tr><tr><td>AI arms race, multipolar result (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" rel=\"noreferrer 
noopener\">NG30<\/a>)<\/td><td>0.37%<\/td><td>38%<\/td><td>1.1<\/td><td>-0.33<\/td><td>41%<\/td><td>4<\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>0.18%<\/td><td>25%<\/td><td>0.8<\/td><td>-0.59<\/td><td>18%<\/td><td>5<\/td><\/tr><tr><td>Kurzweil\/Kapor longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a>)*<\/td><td>0.15%<\/td><td>57%<\/td><td>0.8<\/td><td>-0.41<\/td><td>15%<\/td><td>5<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 3.4.1:<\/strong> Concerned experts&#8217; 2030 question ratings. P(c) is the arithmetic mean of this group&#8217;s responses. RR (relative risk) is an arithmetic mean of each individual&#8217;s relative risk (P(U|c) \/ P(U)).<br>*Denotes external questions not generated as part of the conditional tree process.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Concerned experts\u2019 top-rated question, \u201cAdministrative disempowerment warning shot\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>), refers to a scenario in which <em>\u201cAI systems are preventing a substantial number of people from accessing financial assets, voting, moving freely, accessing the internet, or accessing certain types of housing or employment.\u201d<\/em> It scores best on both POM and POM-z; and while its position is bolstered by an outlier, it is also generally rated well among respondents.<sup data-fn=\"3fae1832-ff19-4949-8b6b-46492cd511ff\" class=\"fn\"><a href=\"#3fae1832-ff19-4949-8b6b-46492cd511ff\" 
id=\"3fae1832-ff19-4949-8b6b-46492cd511ff-link\">48<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">No mean POM differences between questions were significant in this sample (after correcting for multiple testing, all p-values were equal to 1). The filtering round elicitation for these questions appeared to be a poor proxy for expert judgments in the main survey round (see the \u201c<a href=\"#methods\" id=\"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees#methods\">Methods<\/a>\u201d section for more details on the filtering round elicitation).<sup data-fn=\"ba0ea008-1455-4f63-a334-49975b362690\" class=\"fn\"><a href=\"#ba0ea008-1455-4f63-a334-49975b362690\" id=\"ba0ea008-1455-4f63-a334-49975b362690-link\">49<\/a><\/sup><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-1.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.1:<\/strong> Concerned experts\u2019 2030 P(c)<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-2.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.2:<\/strong> Concerned experts\u2019 2030 relative risk. Diamonds represent mean values. Log scale. Relative risk &gt;1 reflects a positive update, that is, where P(U|c) &gt; P(U).<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-3.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.3:<\/strong> Concerned experts\u2019 2030 POM VOI. 
Diamonds represent arithmetic means.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-4.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.4:<\/strong> Concerned experts\u2019 2030 POM VOI sensitivity matrix (pairwise wins). Visualization of resampling simulation results.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"questions-3\">2050-2070 questions<\/h4>\n\n\n\n<figure id=\"tab-3-4-2\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>P(c)<\/strong><\/td><td><p><strong>RR<\/strong><\/p><em>(P(U|c) \/ P(U))<\/em><\/td><td><strong>Mean POM-z<\/strong><\/td><td><strong>Pairwise wins<\/strong><\/td><td><em><strong>n<\/strong><\/em><\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>14.71%<\/td><td>46%<\/td><td>1.5<\/td><td>0.53<\/td><td>95%<\/td><td>6<\/td><\/tr><tr><td>High AI investment, low safety indicators (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>)<\/td><td>10.19%<\/td><td>19%<\/td><td>4.2<\/td><td>-0.05<\/td><td>80%<\/td><td>5<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)*<\/td><td>4.19%<\/td><td>60%<\/td><td>1.4<\/td><td>0.11<\/td><td>56%<\/td><td>4<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer 
noopener\">ZA50<\/a>)<\/td><td>3.00%<\/td><td>54%<\/td><td>1.4<\/td><td>0.56<\/td><td>47%<\/td><td>5<\/td><\/tr><tr><td>AI CEOs \/ Research productivity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>)<\/td><td>1.12%<\/td><td>46%<\/td><td>1.2<\/td><td>-0.59<\/td><td>22%<\/td><td>4<\/td><\/tr><tr><td>Less prosocial behavior \/ Failing institutions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>)<\/td><td>0.25%<\/td><td>43%<\/td><td>0.9<\/td><td>-0.63<\/td><td>0%<\/td><td>6<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 3.4.2:<\/strong> Concerned experts&#8217; 2050-2070 question ratings. P(c) is the arithmetic mean of this group&#8217;s responses. RR (relative risk) is an arithmetic mean of each individual&#8217;s relative risk (P(U|c) \/ P(U)).<br><em>*Denotes external questions not generated as part of the conditional tree process.<\/em><\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Concerned experts\u2019 top-rated question by POM was \u201cNo aligned AGI\u201d\n(CX70), which not only ranked well among this set, but also achieved a\nvery high absolute percentage of maximum VOI of nearly 15%. This\nquestion also performed very well on sensitivity analysis, and was\njudged to be highly probable for this question set at 45.86%. It carried\nthe second highest relative risk at 1.5x, but no respondents gave\nextremely high relative risk estimates.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The top question by POM-z, \u201cPower-seeking behavior warning shot\u201d (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>), had only middling rank by POM, but nonetheless an objectively high POM value of 3%. 
It was judged to be highly probable at 53.6%, with a moderate relative risk (mean=1.4x).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">No mean POM differences between questions were significant in this sample (after correcting for multiple testing, the closest to significance was CX70 vs. <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a> at p = 0.638). The filtering round elicitation for these questions appeared to be a moderately good proxy for expert judgments in the main survey round (see the \u201c<a href=\"#methods\" id=\"#methods\"><u>Methods<\/u><\/a>\u201d section for more details on the filtering round elicitation).<sup data-fn=\"d73bb912-46b5-4ac7-8c40-901425b9b182\" class=\"fn\"><a href=\"#d73bb912-46b5-4ac7-8c40-901425b9b182\" id=\"d73bb912-46b5-4ac7-8c40-901425b9b182-link\">50<\/a><\/sup><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"vs-20502070-questions-1\">2030 vs 2050\/2070 questions<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Overall, this set of experts seems to have judged the 2050\/2070 set\nof questions as more informative than the 2030 set: the 2050\/2070 questions\nachieved an average POM of 5.9%, vs. 0.63% for the 2030 questions (2030 IQR = 0.02% &#8211;\n0.92%; 2050\/2070 IQR = 0.18% &#8211; 6.6%). This difference appears to be a\ngenuine result, with <em>p = .043<\/em>; it is robust to the removal of\nany particular question or respondent.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Probability of positive resolution looks quite similar between 2030\nand 2050-2070 questions, at 42% and 44% respectively (2030 IQR = 15% &#8211;\n66%; 2050-2070 IQR = 30% &#8211; 60%). Relative risk for later questions was\nhigher in our sample, with an average of 1.8x vs. 
2030 questions at 1.2x\n(2030 IQR = 1.0 &#8211; 1.2x; 2050-2070 IQR = 1.0 &#8211; 2.0x).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-5.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.5:<\/strong> Concerned experts\u2019 2050-2070 P(c)<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-6.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.6:<\/strong> Concerned experts\u2019 2050-70 relative risk. Diamonds represent arithmetic means. Log scale. Relative risk &gt;1 reflects a positive update, that is, where P(U|c) &gt; P(U).<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2090\" height=\"1617\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416.png\" alt=\"\" class=\"wp-image-1247\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416.png 2090w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-350x271.png 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-700x542.png 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-768x594.png 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-1536x1188.png 1536w, 
https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-2048x1585.png 2048w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-2000x1547.png 2000w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-1200x928.png 1200w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-7-e1776456106416-150x116.png 150w\" sizes=\"auto, (max-width: 2090px) 100vw, 2090px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.7:<\/strong> Concerned experts\u2019 2050-2070 POM VOI. Diamonds represent arithmetic means.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2026-04-06_ai-conditional-trees_fig-3-4-8.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\"><strong>Figure 3.4.8:<\/strong> Concerned experts\u2019 2050-2070 POM VOI sensitivity matrix (pairwise wins). Visualization of resampling simulation results.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-does-the-ai-conditional-tree-question-set-compare\">4. 
How\ndoes the AI conditional tree question set compare?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Because the conditional trees method is resource-intensive, whether it is\nultimately useful depends on whether the questions it generates are\nsubstantially better than those generated in cheaper ways.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Hundreds of forecasting questions are publicly available on online\nforecasting platforms, such as <a href=\"https:\/\/www.metaculus.com\/home\/\">Metaculus<\/a>, <a href=\"https:\/\/www.gjopen.com\/\">Good Judgment Open<\/a>, <a href=\"https:\/\/www.hypermind.com\/\">Hypermind<\/a>, and <a href=\"https:\/\/manifold.markets\/\">Manifold Markets<\/a>. Some of these\nplatforms rely heavily on crowdsourcing in constructing their\nquestion base, though most also employ professional question-writers,\nand may also receive commissions for forecasting questions on specific\ntopics from other organizations. These questions could be said to\nrepresent the \u201cstatus quo\u201d of question-writing in the field of\nforecasting.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Forecasting platforms are generally focused on making accurate predictions by aggregating many people\u2019s forecasts and usually allow participants to choose which questions to forecast. The questions that are popular on forecasting platforms are often questions that are important in themselves, more than as indicators of other events.<sup data-fn=\"10c48bb4-79b0-42e6-a192-9236ceca0466\" class=\"fn\"><a href=\"#10c48bb4-79b0-42e6-a192-9236ceca0466\" id=\"10c48bb4-79b0-42e6-a192-9236ceca0466-link\">51<\/a><\/sup> Because these platforms are not primarily trying to find high-VOI questions, it should not be surprising that a deliberate attempt to maximize VOI would result in higher-VOI questions. Nonetheless, we think this result is useful for people trying to use forecasting for policy and other planning purposes. 
Higher VOI questions are likely more useful as cruxes for future decisions, so these results suggest that investing resources in finding high VOI questions may result in questions that are more useful than those generated by existing platforms.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We built a <a href=\"https:\/\/forecastingresearch.org\/s\/Status-quo-question-set.xlsx\"><u>dataset<\/u><\/a> of such questions for comparison with those generated by the conditional tree process. Comparable questions, that is, those related to medium- and long-term events connected with AI, were concentrated in a small number of platforms.<sup data-fn=\"43cff954-4b3f-4059-914a-b20d2877bc75\" class=\"fn\"><a href=\"#43cff954-4b3f-4059-914a-b20d2877bc75\" id=\"43cff954-4b3f-4059-914a-b20d2877bc75-link\">52<\/a><\/sup> Below we refer to these questions as the \u201cstatus quo set\u201d.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We compared the questions generated through conditional trees (the\nAICT set) with questions in the status quo set in three ways:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"#voi-comparison\">Value of Information (VOI)<\/a>: how informative are the questions in expectation? That is, how much would knowing the answer to a question inform forecasts on the ultimate question? 
See <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=92\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 2<\/a> for more on VOI in this project.\n<ul class=\"wp-block-list\">\n<li>Based on a survey of skeptical superforecasters, most of the\nquestions from the AICT set were more informative than top questions in\nthe status quo set (n=8 on main survey; 7 on status quo\nsurvey).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><a href=\"#distribution-of-question-topics\">Distribution of question topics<\/a>: do the questions in the AICT set cover substantially different topics than those in the status quo set?\n<ul class=\"wp-block-list\">\n<li>For both sets, a majority of questions (59% and 72% for AICT and\nstatus quo sets, respectively) fell into the \u201cAcceleration\u201d category,\nwhich includes questions related to AI capabilities or investment in AI.\nFor the three other topic categories\u2014Social \/ Political \/ Economic,\nAlignment, and AI harms\u2014 there was a noticeable difference between the\nAICT set and the status quo set. In the AICT set, there were similar\nnumbers of questions in each of the three categories, while in the\nstatus quo set, there were more \u201cSocial \/ Political \/ Economic\u201d\nquestions than \u201cAlignment\u201d or \u201cAI harms\u201d questions.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><a href=\"#uniqueness\">Uniqueness<\/a>: within a given topic area, did the questions we generated address specialized expert interests that were not covered by questions in the status quo set?\n<ul class=\"wp-block-list\">\n<li>This comparison is the most preliminary and speculative: a member\nof our team simply rated questions on how much and in what ways the\nquestions articulated issues important to experts in ways not addressed\nby the status quo set. 
Overall, this analysis suggests that conditional\ntrees may be effective at finding forecasting questions not captured by\ncurrent prediction platforms.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">As discussed above, we are comparing the questions generated by the\nconditional trees method to other questions primarily as a demonstration\nof the types of analysis that are possible with conditional trees. We\nexpect that the actual results would differ significantly if the study\nwere run again with more participants and do not recommend interpreting\nthese results as decisive evidence.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"voi-comparison\">4.1 VOI comparison\n(skeptical superforecasters)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Using the same survey methodology as in our main question-rating survey (see <a href=\"#main-question-rating-survey\">Methods<\/a>), we conducted a follow-up survey with the skeptical superforecaster sample (n=7) to obtain VOI ratings for a sample of the top AI-related status quo questions. This survey included eight status quo questions selected for their popularity among platform users at the time of collection (see the selection criteria in <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=97\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=97\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 3.2<\/a>). We also included two additional questions from the AICT set that were not included in the main question-rating survey.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Of the ten status quo questions for which we elicited VOI, our superforecaster sample judged nearly all to be less informative than nearly all of the AICT questions for which we elicited VOI (see Table 4.1). 
Notable exceptions are \u201c<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>,\u201d an AICT question which scored lower than all but three status quo questions, and the status quo questions \u201cSTQ9\u201d and \u201cSTQ205\u201d which scored higher than four AICT questions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The mean informativeness of AICT questions resolving in 2030 was\nhigher than that of status quo questions resolving in the same year,\nwith p = .025. In this group, AICT questions were deemed, on average,\nnine times more informative than status quo questions. We did not find a\nsignificant effect for 2050-2070 questions (p = .10), although in our\nsample AICT questions were still eleven times more informative on\naverage.<\/p>\n\n\n\n<figure id=\"tab-4-1\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><\/td><td><strong>POM VOI, mean<\/strong><\/td><\/tr><tr><td>AI causing deaths, ineffectual response (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>)<\/td><td>6.34%<\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>3.55%<\/td><\/tr><tr><td>Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)<\/td><td>1.68%<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>)<\/td><td>1.59%<\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a 
href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>1.37%<\/td><\/tr><tr><td>Deceptive AI warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" target=\"_blank\" rel=\"noreferrer noopener\">ZD30<\/a>)<\/td><td>0.98%<\/td><\/tr><tr><td>AI involvement in nuclear arms (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\" target=\"_blank\" rel=\"noreferrer noopener\">HB30<\/a>)<\/td><td>0.68%<\/td><\/tr><tr><td>High AI investment, low safety indicators (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>)<\/td><td>0.54%<\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>0.37%<\/td><\/tr><tr><td>Superalignment success (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ205 \/ STQ215<\/a>)*<\/td><td>0.28%<\/td><\/tr><tr><td>Kurzweil\/Kapor Turing Test longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a>)*<\/td><td>0.27%<\/td><\/tr><tr><td>AI CEOs \/ Research productivity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>)<\/td><td>0.26%<\/td><\/tr><tr><td>Less prosocial behavior \/ Failing institutions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>)<\/td><td>0.26%<\/td><\/tr><tr><td>AI arms race, multipolar result (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" 
rel=\"noreferrer noopener\">NG30<\/a>)<\/td><td>0.26%<\/td><\/tr><tr><td>Brain emulation (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=91\" target=\"_blank\" rel=\"noreferrer noopener\">STQ196<\/a>)*<\/td><td>0.23%<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)*<\/td><td>0.14%<\/td><\/tr><tr><td>Compute restrictions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=90\" target=\"_blank\" rel=\"noreferrer noopener\">STQ236<\/a>)*<\/td><td>0.13%<\/td><\/tr><tr><td>US AI x-risk opinions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ19<\/a>)*<\/td><td>0.12%<\/td><\/tr><tr><td>AI novel reading (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ152<\/a>)*<\/td><td>0.05%<\/td><\/tr><tr><td>AI autonomous purchasing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>)<\/td><td>0.02%<\/td><\/tr><tr><td>RoboCup (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=90\" target=\"_blank\" rel=\"noreferrer noopener\">STQ232<\/a>)*<\/td><td>0.02%<\/td><\/tr><tr><td>AI movies (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ47<\/a>)*<\/td><td>0.00%<\/td><\/tr><tr><td>LLM chess (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ149<\/a>)*<\/td><td>0.00%<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 4.1:<\/strong> Skeptical superforecasters&#8217; POM VOI (all years). 
Questions marked with an asterisk are from the status quo question set.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"distribution-of-question-topics\">4.2 Distribution of question\ntopics<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To understand whether the expert conditional tree elicitation produced questions with a substantially different topic focus than the crowdsourced \u201cstatus quo\u201d question set, we developed a category rating scheme and applied it to both question sets. For a description of the rating scheme, see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=95\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=95\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 3.1<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For both sets, a plurality of question categorisations<sup data-fn=\"3b6caa56-c3da-44a1-890c-adf30fabc3d0\" class=\"fn\"><a href=\"#3b6caa56-c3da-44a1-890c-adf30fabc3d0\" id=\"3b6caa56-c3da-44a1-890c-adf30fabc3d0-link\">53<\/a><\/sup> (36% and 48% for AICT and status quo sets, respectively) fell into the \u201cAcceleration\u201d category, which includes questions related to AI capabilities or investment in AI, though this was somewhat more pronounced in the status quo set. For the AICT set, the three other categories had relatively similar proportions to one another. 
However, the status quo set had a larger proportion of \u201cSocial \/ Political \/ Economic\u201d question categorisations (33%) than \u201cAlignment\u201d questions (12%) or \u201cAI harms\u201d questions (7%).<sup data-fn=\"22b58859-8e48-4a06-b61f-93aa97393620\" class=\"fn\"><a href=\"#22b58859-8e48-4a06-b61f-93aa97393620\" id=\"22b58859-8e48-4a06-b61f-93aa97393620-link\">54<\/a><\/sup><\/p>\n\n\n\n<figure id=\"tab-4-2\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Category<\/strong><\/td><td><strong>AICT question set<\/strong><\/td><td><strong>Status quo question set<\/strong><\/td><\/tr><tr><td>Social \/ Political \/ Economic<\/td><td>24% (29)<\/td><td>33% (131)<\/td><\/tr><tr><td>Alignment<\/td><td>20% (25)<\/td><td>12% (47)<\/td><\/tr><tr><td>AI harms<\/td><td>20% (25)<\/td><td>7% (27)<\/td><\/tr><tr><td>Acceleration<\/td><td>36% (44)<\/td><td>48% (191)<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 4.2:<\/strong> Distribution of question topics. Normalized share of question categorisations in each category; numbers in parentheses are total questions per category. Because some questions fell into multiple categories, the raw proportions in each column would sum to more than 100%, so proportions have been normalized for ease of comparison.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"uniqueness\">4.3 Uniqueness<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Beyond high-level topic overlap, to what extent were the interests of\nour expert sample already represented in the status quo question set,\nand where did our question set add novel content?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Answering this question thoroughly is beyond the scope of this report, but we will share some observations here. 
To demonstrate a method for assessing uniqueness, one teammate rated questions from the \u201cAlignment\u201d topic area on several dimensions of uniqueness:<sup data-fn=\"e92b2983-5493-48ad-82e6-56c5268bd965\" class=\"fn\"><a href=\"#e92b2983-5493-48ad-82e6-56c5268bd965\" id=\"e92b2983-5493-48ad-82e6-56c5268bd965-link\">55<\/a><\/sup><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Conceptual uniqueness: how much did the question prompts\ngenerated by the conditional trees method capture expert interests not\ncaptured by the status quo set?\n<ul class=\"wp-block-list\">\n<li>Of the 31 question prompts in the \u201cAlignment\u201d category,<sup data-fn=\"452e355e-3bc4-4c32-8089-684d98657bdd\" class=\"fn\"><a href=\"#452e355e-3bc4-4c32-8089-684d98657bdd\" id=\"452e355e-3bc4-4c32-8089-684d98657bdd-link\">56<\/a><\/sup> only two were totally or mostly captured by an existing question in the status quo set. 12 questions were \u201cpartly captured,\u201d 12 were \u201cmostly uncaptured,\u201d and five were wholly uncaptured. 
These ratings suggest that this method may be effective at finding forecasting questions not captured by current prediction platforms.<\/li>\n\n\n\n<li>We thought that experts\u2019 interests within the \u201cdeveloper perception\u201d and \u201cpower-seeking\u201d themes were particularly poorly represented by the status quo set;<sup data-fn=\"2c5fd0c1-116f-414b-9616-a83afcb3710b\" class=\"fn\"><a href=\"#2c5fd0c1-116f-414b-9616-a83afcb3710b\" id=\"2c5fd0c1-116f-414b-9616-a83afcb3710b-link\">57<\/a><\/sup> few questions pertaining to these themes existed in the status quo set (one and two, respectively), and those that existed were relatively narrow or dissimilar to the expert prompts.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Operationalization uniqueness: how unique was the\noperationalization generated by the conditional trees method, compared\nto the status quo question we thought was most similar?\n<ul class=\"wp-block-list\">\n<li>Operationalization uniqueness could refer to different subject\nmatter, different operationalization strategies for similar subject\nmatter, or an expectation of uncorrelated question resolutions. 
Purely\nlinguistic differences between question texts were not considered as\npart of \u201cuniqueness.\u201d<\/li>\n\n\n\n<li>Operationalized question texts were rated independently of\nquestion prompts; thus, if a question prompt specified unique subject\nmatter and this was reflected in the operationalization of a question,\nthis counted toward both conceptual uniqueness and operationalization\nuniqueness.<\/li>\n\n\n\n<li>Overall, our operationalizations were fairly different from those\nin the status quo set: none were extremely similar and one had only\nminor differences.\n<ul class=\"wp-block-list\">\n<li>Of the others, 9 had moderate differences, 15 were very\ndifferent, and 4 were almost entirely different.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>For a preliminary quantitative analysis of these results and a discussion of \u201cconjunctive uniqueness,\u201d see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=97\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 3.2<\/a>.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"discussion\">5. 
Discussion<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"takeaways-relating-to-the-conditional-trees-method\">5.1\nTakeaways relating to the conditional trees method<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"the-conditional-trees-method-produced-novel-and-informative-forecasting-questions.\">The\nconditional trees method produced novel and informative forecasting\nquestions.<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Forecasting communities have shown great interest in questions\nrelated to AI, which number in the hundreds on forecasting platforms.\nYet relatively little has been done to evaluate the extent to which\nquestions on existing platforms are either informative or relevant to\nthe interests of AI experts, and similarly, little has been done to\nsystematically improve the quality of forecasting questions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">By directly targeting expert interests via a specialized interview\nand question-writing pipeline, the conditional trees process provided an\noriginal method of improving on the status quo, producing suggestive\nevidence that this process could lead to novel and highly informative\nquestions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Drawing on 24 one-hour interviews, our team created 75 AI forecasting questions (the AICT set). In a small-sample comparison of POM VOI ratings from superforecasters (n=8 and n=7 for the main and supplementary surveys, respectively), 12 (out of 13) surveyed AICT questions scored higher than 8 (out of 10) popular status quo questions. 
The table below shows a comparison of the top 5 questions generated by the conditional trees method to the top 5 questions taken from existing platforms, where the questions taken from existing platforms are marked with an asterisk.<\/p>\n\n\n\n<figure id=\"tab-5-1\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Mean POM VOI<\/strong><\/td><\/tr><tr><td>AI causes large-scale deaths, ineffectual response (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>)<\/td><td>6.34%<\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>3.55%<\/td><\/tr><tr><td>Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)<\/td><td>1.68%<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>)<\/td><td>1.59%<\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>1.37%<\/td><\/tr><tr><td>Superalignment success (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=87\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=87\" target=\"_blank\" rel=\"noreferrer noopener\">STQ205 \/ STQ215<\/a>)*<\/td><td>0.28%<\/td><\/tr><tr><td>Kurzweil\/Kapor Turing Test longbet (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" 
rel=\"noreferrer noopener\">STQ9<\/a>)*<\/td><td>0.27%<\/td><\/tr><tr><td>Brain emulation (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=91\" target=\"_blank\" rel=\"noreferrer noopener\">STQ196<\/a>)*<\/td><td>0.23%<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)*<\/td><td>0.14%<\/td><\/tr><tr><td>Compute restrictions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=90\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=90\" target=\"_blank\" rel=\"noreferrer noopener\">STQ236<\/a>)*<\/td><td>0.13%<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 5.1:<\/strong> Ratings of how informative questions generated by the conditional trees method are relative to popular questions taken from existing forecasting platforms. The questions taken from existing forecasting platforms are marked with an asterisk.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Crowd-sourced question sets may have some basic practical limits set\nby the fact that the crowd is often made up largely of laypeople,\nwhereas experts\u2019 specialized knowledge gives them access to other parts\nof the \u201cquestion space.\u201d This could suggest that achieving more active\nexpert participation in crowd-sourcing efforts would improve their\noutput. 
However, it may be difficult to structure such efforts in a way\nthat effectively incentivizes expert engagement, for a number of\npossible reasons:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Experts\u2019 time is valuable, so they may feel disinclined to\nparticipate in crowd-sourcing efforts where their contributions may seem\nlike a \u201cdrop in the bucket.\u201d<\/li>\n\n\n\n<li>Rewards for high-value contributions may be poorly aligned with\nexperts\u2019 motivations if, for example, they are only rewarding in the\ncontext of a specific community (e.g., website karma); if they are\ninsufficiently large for the opportunity cost (e.g., a monetary reward\nthat would be lower than the expert\u2019s equivalent hourly consulting fee);\nor if they are allocated perversely (e.g., preferentially to those more\nembedded in the forecasting community).<\/li>\n\n\n\n<li>Expert attrition from friction within the pipeline may be high\nif, for example, a user interface has a steep learning curve. Experts are\nlikely to be both more time-poor and older than the average user of an\nonline forecasting platform.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Beyond simple expert engagement, the conditional tree question\ngeneration process likely contributed to the quality of the results. In\ninterviews, many experts remarked that the conditional tree elicitation\nprompted them to think in novel ways, and to generate content that they\notherwise would not have. Additionally, experts were not required to\nturn this content into fully operationalized forecasting questions, a\ntime-consuming task with which few of them had significant experience,\nas this step was instead completed by a question-writing team.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, the value of the AICT question generation exercise rests in part on the response of forecasters. 
Arguably, what policymakers care about most in forecasting is the forecasts themselves, without which questions have limited value. And regardless of the AICT questions\u2019 novelty or ostensible \u201cinformativeness\u201d (from a VOI standpoint), they may not be so informative if forecasters fail to engage with them.<sup data-fn=\"7b29920b-e450-4202-9a40-834e6210314d\" class=\"fn\"><a href=\"#7b29920b-e450-4202-9a40-834e6210314d\" id=\"7b29920b-e450-4202-9a40-834e6210314d-link\">58<\/a><\/sup><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"the-conditional-trees-method-requires-significant-time-and-labor-to-generate-forecasting-questions.\">The\nconditional trees method requires significant time and labor to generate\nforecasting questions.<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">While the conditional trees method can generate novel and informative\nquestions that align with expert interests, its usefulness may be\nlimited for those who cannot invest significant time and labor in the\nprocess. The method requires a considerable amount of effort to\nimplement effectively, which could outweigh its benefits for individuals\nor organizations with limited resources.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In particular, maintaining consistent expert engagement throughout\nall phases of the process proved challenging. Although experts were\nwilling to engage in the question-generation phase of the conditional\ntrees process, they showed significantly less enthusiasm for\nparticipating in the question-judging phase. 
Providing VOI estimates is\nrelatively labor-intensive: for each question, one must generate a\nforecast for that question\u2019s probability of resolving positively, and a\nfurther conditional probability of some ultimate outcome given the\nquestion\u2019s resolution.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A high-quality forecast often requires both a careful reading of the\nquestion\u2019s terms and some amount of research into base rates of\nrelevant phenomena, the forecasts of others on similar questions, and so\nforth. Relative to professional superforecasters, experts usually have\nmuch less practice producing a large volume of forecasts quickly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"takeaways-for-ai-risk-detection\">5.2 Takeaways for AI risk\ndetection<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"ai-alignment-and-concrete-harms-from-ai-are-front-of-mind-for-concerned-experts.\">AI\nalignment and concrete harms from AI are front of mind for concerned\nexperts.<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Across 21 interviews with experts concerned about AI risk, 15 interviewees\nnamed indicators relating directly to AI alignment as the best warning\nsigns of AI-related extinction by 2100. For example, some experts\nthought we might see compelling evidence that powerful misaligned\nsystems existed, as in the following:<\/p>\n\n\n\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-4fc3f8e1 wp-block-group-is-layout-flex\">\n<h5 class=\"wp-block-heading\">Expert prompt (ID: MD30)<\/h5>\n\n\n\n<p class=\"wp-block-paragraph\">Advanced, planning, strategically-aware AI (as defined by Carlsmith) is developed.<\/p>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Others worried that AI alignment research agendas may fail to bear\nfruit, or that AI developers would broadly disregard safety\nconsiderations. 
Questions related to alignment made up 35% of the AICT\nquestion set.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Questions relating to concrete harms caused by AI were also prevalent: they featured in 10 expert interviews, and made up 33% of the AICT question set.<sup data-fn=\"78d4749e-b022-4635-bbf2-2efc3c10648a\" class=\"fn\"><a href=\"#78d4749e-b022-4635-bbf2-2efc3c10648a\" id=\"78d4749e-b022-4635-bbf2-2efc3c10648a-link\">59<\/a><\/sup> Such questions included themes of interaction between AI technology and other sources of risk, such as biological or nuclear weapons; powerful AI systems manipulating humans; small-scale disasters causing human deaths or financial damage; and threats to political or societal order. Some of the questions that experts found most informative are summarized in Table 5.2.1 below.<\/p>\n\n\n\n<figure id=\"tab-5-2-1\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Question<\/strong><\/td><td><strong>Summary<\/strong><\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>This question resolves positively if there is no AGI system which meets several &#8216;alignment&#8217; criteria by 2070. 
Those criteria include AGI-like capabilities (high question answering skill on hard questions in diverse subjects, high programming ability) and alignment (performing well on SOTA alignment tests, and expert surveys indicating that experts believe that AI is aligned).<\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>)<\/td><td>This question resolves positively if, by 2050, AI developers must attempt to disable or destroy an AI (costing &gt;$5M) after it exhibits power-seeking behaviors such as acquiring significant resources, controlling financial accounts, manipulating humans, infiltrating key systems, seizing control of weaponry, self-propagation, or engaging in hacking.<\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>)<\/td><td>This question resolves positively if any of the following occur by 2030: AI systems in a country freeze assets affecting \u22651% of the population or \u22650.5% of GDP for over a year; prevent \u22655% of voting-age citizens from participating in a national election; restrict &gt;3% of residents&#8217; movement for 3+ months; limit &gt;10% of the population&#8217;s internet access for 3+ months; or legally restrict &gt;5% of citizens&#8217; access to certain housing or employment for 1+ year.<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 5.2.1:<\/strong> Assorted summaries of questions that experts found to be particularly informative.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">By contrast, the set of existing AI forecasting questions on crowdsourced platforms (the \u201c<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\" target=\"_blank\" 
rel=\"noreferrer noopener\">status quo set<\/a>\u201d) features a smaller proportion of such questions: just 18% and 10% for the \u201calignment\u201d and \u201charms\u201d categories, respectively. A larger proportion of questions in this set related to \u201cacceleration\u201d of AI technologies, or to economic, commercial, and sociopolitical topics.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Beyond the implications for the forecasting ecosystem, concerned\nexperts\u2019 preference for direct indicators of AI alignment or harms holds\npotential lessons for policymakers. For example, if current efforts by\ngovernments and regulatory bodies to monitor the nascent AI industry are\nheavily focused on tracking emerging AI capabilities or industry\ninvestment, our results suggest such signals may be overvalued from an\nexistential risk perspective.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, the expert VOI judgments from this report can only offer\nrelatively weak evidence for experts\u2019 views on the informativeness of\nquestions. The sample of experts who provided forecasts was extremely\nsmall (n=11).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"concerned-experts-and-skeptical-superforecasters-may-disagree-about-which-questions-best-indicated-heightened-ai-risk.\">Concerned\nexperts and skeptical superforecasters may disagree about which\nquestions best indicate heightened AI risk.<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">While the skeptical superforecasters and concerned experts had some\nnotable disagreements, they did find a few questions similarly\ninformative. 
Three out of 13 surveyed questions scored in the top half\nof questions (by POM VOI) for both groups:<\/p>\n\n\n\n<figure id=\"tab-5-2-2\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><\/td><td><\/td><td colspan=\"2\"><strong>Superforecasters<\/strong><\/td><td colspan=\"2\"><strong>Experts<\/strong><\/td><\/tr><tr><td><strong>Question<\/strong><\/td><td><strong>Res year<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>Mean POM-z<\/strong><sup data-fn=\"5f7eb7bd-e9af-4559-9f51-64107e91deb5\" class=\"fn\"><a href=\"#5f7eb7bd-e9af-4559-9f51-64107e91deb5\" id=\"5f7eb7bd-e9af-4559-9f51-64107e91deb5-link\">59<\/a><\/sup><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>Mean POM-z<\/strong><\/td><\/tr><tr><td>Administrative disempowerment warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\"><u>CX30<\/u><\/a>)<\/td><td>2030<\/td><td><strong>3.55% (1)<\/strong><\/td><td>0.28 (4)<\/td><td>1.26% (5)<\/td><td><strong>0.94 (1)<\/strong><\/td><\/tr><tr><td>Power-seeking behavior warning shot (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\"><u>ZA50<\/u><\/a>)<\/td><td>2050<\/td><td><strong>1.59% (3)<\/strong><\/td><td><strong>0.75 (1)<\/strong><\/td><td>3.00% (4)<\/td><td><strong>0.56 (2)<\/strong><\/td><\/tr><tr><td>High AI investment, low safety indicators (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\"><u>VL70<\/u><\/a>)<\/td><td>2070<\/td><td>0.54% (6)<\/td><td><strong>0.62 (2)<\/strong><\/td><td><strong>10.19% (2)<\/strong><\/td><td>-0.05 (8)<\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 5.2.2:<\/strong> Questions scoring in the top half of questions by POM VOI for both superforecasters and experts. 
Numbers in parentheses are rank orders, out of the set of 13 surveyed questions. The three highest-ranking questions for each metric and group are highlighted.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">But they also had nearly opposite opinions of four questions, with\none group ranking each of these four among the most informative\nquestions and the other considering it among the lowest:<\/p>\n\n\n\n<figure id=\"tab-5-2-3\" class=\"wp-block-table\"><div class=\"table-wrapper\"><table class=\"has-fixed-layout\"><tbody><tr><td><\/td><td><\/td><td colspan=\"2\"><strong>Superforecasters<\/strong><\/td><td colspan=\"2\"><strong>Experts<\/strong><\/td><\/tr><tr><td><strong>Question<\/strong><\/td><td><strong>Res year<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>Mean POM-z<\/strong><\/td><td><strong>Mean POM<\/strong><\/td><td><strong>Mean POM-z<\/strong><\/td><\/tr><tr><td>Extinction-level pathogens feasible (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>)<\/td><td>2030<\/td><td>1.37% (4)<\/td><td><strong style=\"color:green\">0.57 (3)<\/strong><\/td><td><strong style=\"color:red\">0.18% (12)<\/strong><\/td><td><strong style=\"color:red\">-0.59 (12)<\/strong><\/td><\/tr><tr><td>AI autonomous purchasing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>)<\/td><td>2030<\/td><td><strong style=\"color:red\">0.02% (13)<\/strong><\/td><td><strong style=\"color:red\">-0.58 (12)<\/strong><\/td><td>0.98% (7)<\/td><td>0.06 (7)<\/td><\/tr><tr><td>Human-machine intelligence parity (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>)<\/td><td>2050<\/td><td><strong style=\"color:red\">0.14% (12)<\/strong><\/td><td><strong style=\"color:red\">-0.61 
(13)<\/strong><\/td><td><strong style=\"color:green\">4.19% (3)<\/strong><\/td><td>0.11 (5)<\/td><\/tr><tr><td>No aligned AGI (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>)<\/td><td>2070<\/td><td>0.37% (7)<\/td><td>-0.23 (9)<\/td><td><strong style=\"color:green\">14.71% (1)<\/strong><\/td><td><strong style=\"color:green\">0.53 (3)<\/strong><\/td><\/tr><\/tbody><\/table><\/div><figcaption class=\"wp-element-caption\"><strong>Table 5.2.3:<\/strong> Questions whose importance superforecasters and experts disagreed on. Numbers in parentheses are rank orders out of the set of 13 surveyed questions. The three highest-ranking (green) and lowest-ranking (red) questions for each metric and group are highlighted.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Notably, both experts and superforecasters appear to find questions\nrelating to concrete harms from AI to be informative, whereas\nsuperforecasters and experts disagree about the relative informativeness\nof questions relating to AI alignment. Unlike experts, superforecasters\ndo not appear to place significant value on questions relating to AI\nalignment. However, very small sample sizes, plus the potential for high\nvariation in individual rater responses over time, prevent us from ruling out\nnoise as an explanation for these patterns.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"limitations\">6. Limitations of Our\nResearch<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Limitations of our research include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The total number of participants in this study was very small. It\nis therefore likely that some of the results would not be replicated in\na larger study.<\/li>\n\n\n\n<li>This study involves eliciting long-range forecasts, but there is little evidence that these forecasts are accurate. 
Most studies of judgmental forecasting measure accuracy on 0-2 year time horizons, which is likely much easier than forecasting outcomes on 5+ year time horizons (in this study we typically asked for forecasts resolving between 2030 and 2100).<sup data-fn=\"2bd98e8d-bd99-48de-a9f3-6288ead09b62\" class=\"fn\"><a href=\"#2bd98e8d-bd99-48de-a9f3-6288ead09b62\" id=\"2bd98e8d-bd99-48de-a9f3-6288ead09b62-link\">60<\/a><\/sup> If forecasts over long time horizons are not generally reliable, then these conditional trees would not be providing a useful signal.<\/li>\n\n\n\n<li>Since conditional trees are composed of conditional forecasts,\ntheir reliability depends on the assumption that conditional forecasts\nare meaningful. However, we do not know whether people are accurate when\nmaking conditional forecasts. There is little experimental evidence on\nhow best to elicit conditional forecasts. Some reasons to expect that\nconditional forecasts may not be robust or accurate include:\n<ul class=\"wp-block-list\">\n<li>Intuitively, conditional forecasting seems difficult. Our team\noften finds generating and understanding forecasts on these questions to\nbe challenging, so we would expect others to find it challenging as well.<\/li>\n\n\n\n<li>As a case in point, the forecasters we surveyed often initially\nstruggled to provide conditional forecasts that were logically coherent.\nTheir conditional forecasts implied that the probability of the ultimate\nquestion <em>and<\/em> the crux resolving positively was greater than the\nprobability of the ultimate question resolving positively, an issue\nknown as the <em>conjunction fallacy<\/em>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>This study asked people to make forecasts in an exceptionally\nshort period of time in the filtering stage: one minute per question.\nThese \u201cshort-fuse forecasts\u201d may be less reliable than forecasts that\ninvolve higher degrees of thought and effort. 
Participants spent longer\namounts of time on the forecasts that inform VOI calculations.<\/li>\n\n\n\n<li>Participants in this study were all either experts who are highly\nconcerned about existential risks from AI, or superforecasters who are\nnot. As a result, we are not able to separate differences caused by risk\nassessment from differences caused by forecasting aptitude, professional\ntraining, or other factors.<\/li>\n\n\n\n<li>AI developments seem particularly challenging to predict, and\nforecasters on this topic in past FRI projects have emphasized their\nuncertainty. As a result, their predictions about future AI\ndevelopments, especially those that will not resolve for many years, may\nnot be reliable enough to be practically useful.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"next-steps\">7. Next Steps<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Further research related to this topic could include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Assessing whether the questions identified through this process continue to perform better than status quo forecasting questions (in terms of value of information) when a larger number of people forecast on them. We have added relevant questions from this project to two forecasting platforms (see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=113\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 7<\/a> for links) and will be interested to see whether they receive many forecasts and how their value of information compares to other questions.\n<ul class=\"wp-block-list\">\n<li>So far, public forecasting platforms have not applied question\nmetrics like VOI to their questions or incentivized questions that are\nunusually informative or decision-relevant. It\u2019s possible that\nincentives on those platforms could produce questions as good as the\nones identified by the trees method. 
In general, we would be interested\nto see forecasting platforms implement the kinds of question metrics\ndiscussed in this report so that questions can be sorted according to\nvalue of information on major topics such as AI existential\nrisk.<\/li>\n\n\n\n<li>We have had some discussions with forecasting platforms like\nMetaculus and hope that metrics like the ones used in this project can\nhelp platforms find the highest-value questions.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Replicating the conditional trees process with larger sample\nsizes and in other domains. For example, would this process also\nidentify more informative questions on topics such as nuclear policy and\nclimate change?\n<ul class=\"wp-block-list\">\n<li>In particular, choosing domains where important questions will\nresolve sooner could help assess how useful the conditional trees\nprocess is.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>As the questions in the trees resolve (beginning in 2030),\nparticipants could be re-surveyed to see how well conditional trees\nperformed.\n<ul class=\"wp-block-list\">\n<li>For example, once we know whether the 2030 questions have\nhappened or not, we could ask participants for their new forecast on the\nprobability of extinction due to AI by 2100, and see if it is similar to\nwhat was predicted by the conditional trees.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Would other research groups or organizations be able to replicate and run their own conditional tree interview process based on the information in this report and the <a href=\"#executive-summary-key-outputs\"><u>resources<\/u><\/a> we provide?<\/li>\n\n\n\n<li>FRI recently completed another research project with a similar goal: an <a href=\"https:\/\/forecastingresearch.org\/ai-adversarial-collaboration\"><u>adversarial collaboration project<\/u><\/a> (<a 
href=\"https:\/\/web.archive.org\/web\/20240727230248\/https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/65ef1ee52e64b52f145ebb49\/1710169832137\/AIcollaboration.pdf\"><u>a<\/u><\/a>) that brought together generalist forecasters and domain experts who disagreed about the risk AI poses to humanity in the next century and asked them to work together to find questions that underlie their disagreement.\n<ul class=\"wp-block-list\">\n<li>Comparing the questions from the two methods may help us\nunderstand the merits of each approach, so that we can design better\nforecasting questions and elicitation processes on AI and other\ntopics.<\/li>\n\n\n\n<li>In particular, in both projects, people who were less concerned\nabout extinction due to AI by 2100 tended to value questions that\nfocused on concrete harms caused by AI, while those more concerned were\nmore likely to value questions regarding advanced capabilities or\nwhether artificial intelligence had been successfully aligned.\n<ul class=\"wp-block-list\">\n<li>This may be related to each group\u2019s expectations of how difficult\nit will be to align a powerful AI model: participants skeptical of AI\nrisk were likely to think that alignment is a technical problem that is\nnot fundamentally different from problems that people have previously\nsolved and that we are likely to come up with workable solutions when we\nneed to. If this is true, there may be useful cruxes related to ease of\nalignment.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>FRI also conducted a conditional trees experiment focused on\nforecasting the outcome of baseball games. 
Future work could examine\nthose results alongside the AI results for additional tests of the\nconditional trees method.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"data-availability\">Data Availability<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Survey data from the <a href=\"https:\/\/forecastingresearch.org\/s\/Filtering-Round-Supers-Short-Fuse.xlsx\"><u>filtering\nround<\/u><\/a> and from the <a href=\"https:\/\/forecastingresearch.org\/s\/AI-Conditional-Trees-Data.xlsx\"><u>main\nsurvey, supplementary survey, and the question combinations\nsurvey<\/u><\/a> are available in the linked files.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Notes<\/h2>\n\n\n<ol class=\"wp-block-footnotes\"><li id=\"70fde7b6-0fde-4599-9c2d-11d3813282a7\">We will refer to this set of forecasters as \u201csuperforecasters\u201d henceforth. Note that while seven of the forecasters are Superforecasters\u2122 as officially designated by Good Judgment Inc., one is a skilled forecaster who does not have that label but has a comparable track record of calibrated forecasts. <a href=\"#70fde7b6-0fde-4599-9c2d-11d3813282a7-link\" aria-label=\"Jump to footnote reference 1\">\u21a9\ufe0e<\/a><\/li><li id=\"6a0fde15-496f-4006-8d0a-e475bac7a3e3\">To ensure the integrity of links in this report, we include stable archive.org links in parentheses after each citation to an external URL. <a href=\"#6a0fde15-496f-4006-8d0a-e475bac7a3e3-link\" aria-label=\"Jump to footnote reference 2\">\u21a9\ufe0e<\/a><\/li><li id=\"d2fbd3d6-4e36-4a09-8929-799e1c840251\">More specifically, the ultimate question was defined as the global human population falling below 5,000 individuals at any time before 2100, with AI being a proximate cause of such reduction. 
<a href=\"#d2fbd3d6-4e36-4a09-8929-799e1c840251-link\" aria-label=\"Jump to footnote reference 3\">\u21a9\ufe0e<\/a><\/li><li id=\"b5003308-fdda-4da9-9b27-b48cbaa7fdf2\">\u201cPlausible\u201d means that the forecaster deemed the indicator event to be at least 10% likely to occur. This 10% probability was not necessarily an unconditional probability, but may have been conditional on a previous node in the conditional tree. <a href=\"#b5003308-fdda-4da9-9b27-b48cbaa7fdf2-link\" aria-label=\"Jump to footnote reference 4\">\u21a9\ufe0e<\/a><\/li><li id=\"d32de3b7-6235-47f8-a676-3de84c95b8e3\">By \u201cinformative,\u201d we mean that knowing the answer to one of these questions would make a larger difference, in expectation, to a participant\u2019s forecast of the ultimate question, in this case, \u201cWill AI cause human extinction by 2100?\u201d For more on informativeness and the metric we use to assess it, see the section on <a href=\"#value-of-information-results\" id=\"#value-of-information-results\">Value of Information (VOI)<\/a>. Forecasting platforms are generally focused on making accurate predictions by aggregating many people\u2019s forecasts and usually allow participants to choose which questions to forecast. The questions that are popular on forecasting platforms are often questions that are important in themselves, more than as indicators of other events, and the platforms are not deliberately attempting to find high-VOI questions. <a href=\"#d32de3b7-6235-47f8-a676-3de84c95b8e3-link\" aria-label=\"Jump to footnote reference 5\">\u21a9\ufe0e<\/a><\/li><li id=\"af4cda98-41b4-41c2-a145-96b1e4d40348\">For more on the question filtering process, see <a href=\"#judging-questions-and-constructing-aggregate-trees\" id=\"1242\">Section 2.2<\/a>. 
<a href=\"#af4cda98-41b4-41c2-a145-96b1e4d40348-link\" aria-label=\"Jump to footnote reference 6\">\u21a9\ufe0e<\/a><\/li><li id=\"682d449b-dfd9-4422-a922-275153246709\">The four lowest-scoring AICT questions \u2013 <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" rel=\"noreferrer noopener\">NG30<\/a>, and <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a> \u2013 ranked 12th, 13th, 14th, and 20th out of 23, respectively. <a href=\"#682d449b-dfd9-4422-a922-275153246709-link\" aria-label=\"Jump to footnote reference 7\">\u21a9\ufe0e<\/a><\/li><li id=\"6688d111-6e2b-4a8c-8a28-2f4224a7caa9\">At the time of data collection, we had not yet developed the POM VOI metric, so participants were not deliberately optimizing for it. Later, we found that POM VOI captured the idea of question informativeness better than VOI alone, which yields a number that is hard to interpret and contextualize. For a full list of questions analyzed, see <a href=\"#tab-3-1-3\" id=\"#tab-3-1-3\">Table 3.1.3<\/a>. A comprehensive explanation of the POM VOI metric can be found in <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=104\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 4<\/a>. 
<a href=\"#6688d111-6e2b-4a8c-8a28-2f4224a7caa9-link\" aria-label=\"Jump to footnote reference 8\">\u21a9\ufe0e<\/a><\/li><li id=\"32da7651-60fa-46b9-8bde-e13f84cbf288\">Careful readers will note that the probabilities in this figure do not yield the mean POM VOI values we report (see <a href=\"#tab-e-1\" id=\"#executive-summary-results\">Table E.1<\/a>). Mean POM VOI tells us how valuable a crux is for a group, on average, by computing POM VOI at the individual level and then aggregating. The average relative updates, across individuals in the same group, sometimes tell a quite different story. <a href=\"#32da7651-60fa-46b9-8bde-e13f84cbf288-link\" aria-label=\"Jump to footnote reference 9\">\u21a9\ufe0e<\/a><\/li><li id=\"6118b2f7-6787-4dfd-afe5-c07c40a45373\">Several related methods, such as Delphi and Bayesian Network elicitation, may be useful to forecasting research in similar ways. See Bernice B. Brown, \u201cDelphi Process: A Methodology Used for the Elicitation of Opinions of Experts,\u201d Rand Corporation report (September 1968) and Judea Pearl, <em>Probabilistic Reasoning in Intelligent Systems<\/em> (San Mateo, CA: Morgan Kaufmann, 1988). <a href=\"#6118b2f7-6787-4dfd-afe5-c07c40a45373-link\" aria-label=\"Jump to footnote reference 10\">\u21a9\ufe0e<\/a><\/li><li id=\"21f37ef0-875f-40d5-8ee5-697b5e11a2e3\">Karger et al., \u201cForecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament,\u201d 2023. <a href=\"https:\/\/forecastingresearch.org\/research\/existential-risk-persuasion-tournament\" id=\"876\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/forecastingresearch.org\/research\/existential-risk-persuasion-tournament<\/a> (<a href=\"https:\/\/web.archive.org\/web\/20240803193928\/https:\/\/forecastingresearch.org\/xpt\" id=\"https:\/\/web.archive.org\/web\/20240803193928\/https:\/\/forecastingresearch.org\/xpt\" target=\"_blank\" rel=\"noreferrer noopener\">a<\/a>) (XPT report). 
<a href=\"#21f37ef0-875f-40d5-8ee5-697b5e11a2e3-link\" aria-label=\"Jump to footnote reference 11\">\u21a9\ufe0e<\/a><\/li><li id=\"92fe9899-6054-4948-89c7-3ce49c8e1155\">These numbers are intended to be illustrative and are not based on actual vaccine data. <a href=\"#92fe9899-6054-4948-89c7-3ce49c8e1155-link\" aria-label=\"Jump to footnote reference 12\">\u21a9\ufe0e<\/a><\/li><li id=\"ed3527fb-4b46-4664-84fa-cbf5179e8f4a\">Judea Pearl, \u201cFrom Bayesian Networks to Causal Networks,\u201d in <em>Mathematical Models for Handling Partial Knowledge in Artificial Intelligence<\/em>, ed. Giulianella Coletti et al., 160. Boston, MA: Springer, 1995. <a href=\"https:\/\/doi.org\/10.1007\/978-1-4899-1424-8_9\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/doi.org\/10.1007\/978-1-4899-1424-8_9<\/a> (<a href=\"https:\/\/web.archive.org\/web\/20240804170543\/https:\/\/link.springer.com\/chapter\/10.1007\/978-1-4899-1424-8_9\" id=\"https:\/\/web.archive.org\/web\/20240804170543\/https:\/\/link.springer.com\/chapter\/10.1007\/978-1-4899-1424-8_9\" target=\"_blank\" rel=\"noreferrer noopener\">a<\/a>) <a href=\"#ed3527fb-4b46-4664-84fa-cbf5179e8f4a-link\" aria-label=\"Jump to footnote reference 13\">\u21a9\ufe0e<\/a><\/li><li id=\"4e290cb7-80ab-4a3a-b613-f62a34ce57e9\">This relationship can be causal, but it does not need to be; in this project we did not constrain conditional trees to only causal relationships, nor did we probe expert models for causality in the interviews. <a href=\"#4e290cb7-80ab-4a3a-b613-f62a34ce57e9-link\" aria-label=\"Jump to footnote reference 14\">\u21a9\ufe0e<\/a><\/li><li id=\"394cd98f-69e7-4c1e-9375-31ae48008979\">We defined \u201chigh risk\u201d as forecasting &gt;10% chance of extinction due to AI by 2100, and low risk as &lt;10%. We defined \u201clong AI timelines\u201d as forecasting &gt;30 years until transformative AI or artificial general intelligence and \u201cshort AI timelines\u201d as less than 30 years. 
<a href=\"#394cd98f-69e7-4c1e-9375-31ae48008979-link\" aria-label=\"Jump to footnote reference 15\">\u21a9\ufe0e<\/a><\/li><li id=\"32a1292a-1d04-40c2-a886-13b8686eab4f\">Karger et al., <a href=\"https:\/\/forecastingresearch.org\/research\/existential-risk-persuasion-tournament\" target=\"_blank\" rel=\"noreferrer noopener\">XPT report<\/a>. <a href=\"#32a1292a-1d04-40c2-a886-13b8686eab4f-link\" aria-label=\"Jump to footnote reference 16\">\u21a9\ufe0e<\/a><\/li><li id=\"1aac17fe-78e4-4520-95c4-28786b9faf48\">One interviewee is not represented in this graph because in the interview, they responded \u201c&gt;0.1%, &lt;50%\u201d rather than give a point estimate. <a href=\"#1aac17fe-78e4-4520-95c4-28786b9faf48-link\" aria-label=\"Jump to footnote reference 17\">\u21a9\ufe0e<\/a><\/li><li id=\"a746be21-a9d3-4c66-8d70-541042d2fc54\">One interviewee is not represented in this graph because in the interview, they responded \u201c&gt;0.1%, &lt;50%\u201d rather than give a point estimate. <a href=\"#a746be21-a9d3-4c66-8d70-541042d2fc54-link\" aria-label=\"Jump to footnote reference 18\">\u21a9\ufe0e<\/a><\/li><li id=\"dc215cf9-4489-4c1a-b355-db80879581cd\">The interviewers were Tegan McCaslin (11\/24 interviews), Josh Rosenberg (10\/24 interviews), and Ezra Karger (3\/24 interviews). <a href=\"#dc215cf9-4489-4c1a-b355-db80879581cd-link\" aria-label=\"Jump to footnote reference 19\">\u21a9\ufe0e<\/a><\/li><li id=\"9c620011-1ec6-4ba0-9140-b6220171f196\">For a full description of the interview process, see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=106\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=106\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 6<\/a>. 
<a href=\"#9c620011-1ec6-4ba0-9140-b6220171f196-link\" aria-label=\"Jump to footnote reference 20\">\u21a9\ufe0e<\/a><\/li><li id=\"f44a301a-93ed-4572-b5bb-17245c846296\">This incentive was not explained in further detail given time constraints of the interview. <a href=\"#f44a301a-93ed-4572-b5bb-17245c846296-link\" aria-label=\"Jump to footnote reference 21\">\u21a9\ufe0e<\/a><\/li><li id=\"da58cb0f-dbfd-4329-924a-e36319728ae0\">In the XPT, \u201cExtinction\u201d was defined as \u201creduction of the global population to less than 5000,\u201d and extinction was considered \u201cdue to AI\u201d if AI was the direct or proximate cause of the deaths. This definition encompasses events that would not have occurred or would have counterfactually been extremely unlikely to occur \u201cbut for\u201d the substantial involvement of AI within one year prior to the event. For more details, see Karger et al., <a href=\"https:\/\/forecastingresearch.org\/pdf\/existential-risk-persuasion-tournament.pdf\/#page=135\" id=\"https:\/\/forecastingresearch.org\/pdf\/existential-risk-persuasion-tournament.pdf\/#page=135\" target=\"_blank\" rel=\"noreferrer noopener\">XPT Report, 134<\/a>. For some interviewees, this was a question for which they had already devoted substantial time (in the XPT or other contexts) forming a quantitative forecast, and thus such participants were able to offer a relatively quick probability judgment. Most participants had previously spent substantial time thinking about the possibility of AI-related extinction, but not as much time forming a precise quantitative estimate for the date in question, and many expressed hesitancy about their answer in the interview. <a href=\"#da58cb0f-dbfd-4329-924a-e36319728ae0-link\" aria-label=\"Jump to footnote reference 22\">\u21a9\ufe0e<\/a><\/li><li id=\"b084562c-33ab-487c-b18c-e3ef068fefa1\">These resolution years were chosen to match XPT questions. 
<a href=\"#b084562c-33ab-487c-b18c-e3ef068fefa1-link\" aria-label=\"Jump to footnote reference 23\">\u21a9\ufe0e<\/a><\/li><li id=\"143c5609-deba-4198-83e6-19ddfc990eb2\">Question writers were Tegan McCaslin, Taylor Smith, Josh Rosenberg, Rose Hadshar, Adam Kuzee, Ezra Karger, Arunim Agrawal, and Bridget Williams. One primary question writer was assigned to each question prompt, and would draft several different versions of the question, using the interview notes as an aid to understanding the interviewee\u2019s underlying models. These drafts would receive feedback from the rest of the question-writing team, and in particular from the relevant interviewer. This interviewer had final say over revisions and finalizing the question. <a href=\"#143c5609-deba-4198-83e6-19ddfc990eb2-link\" aria-label=\"Jump to footnote reference 24\">\u21a9\ufe0e<\/a><\/li><li id=\"eb7be1da-3caa-4a57-b161-d92e144ef5b0\">The initial screen was not simply a VOI threshold. To get a diverse question set, we wanted to include at least one question from each of the following categories: 1) high VOI for superforecasters, 2) high VOI for experts, 3) high VOD between experts and superforecasters, 4) jointly high VOI between superforecasters and experts, 5) randomly chosen representative of the bottom half of the AICT question set, and 6) top comparable question from outside the AICT set. Choosing cutoffs separately for each of these categories resulted in thirteen questions. <a href=\"#eb7be1da-3caa-4a57-b161-d92e144ef5b0-link\" aria-label=\"Jump to footnote reference 25\">\u21a9\ufe0e<\/a><\/li><li id=\"5b641fe8-818d-4fec-b3ff-f40b3002ae42\">Participants gave estimates for the probability of the question resolving positively (P(c)), and the probability of AI extinction <em>conditional<\/em> on the question resolving positively (P(U|c)). We then used these figures to calculate each respondent\u2019s VOI for each question. 
<a href=\"#5b641fe8-818d-4fec-b3ff-f40b3002ae42-link\" aria-label=\"Jump to footnote reference 26\">\u21a9\ufe0e<\/a><\/li><li id=\"3139a76e-57fb-46c5-a2db-53f31e54b15f\">The \u201cconcerned expert proxies\u201d were teammates or collaborators who had had extensive contact with concerned experts, who we expected to be able to model this group\u2019s views well. <a href=\"#3139a76e-57fb-46c5-a2db-53f31e54b15f-link\" aria-label=\"Jump to footnote reference 27\">\u21a9\ufe0e<\/a><\/li><li id=\"77204061-1d5c-478e-bd5c-16f116fd0fed\">Instead of giving probability judgments on all 75 questions, the concerned expert proxies chose and rank-ordered their top 10 questions from each of: the set of first-tier nodes (usually 2030); the set of second-tier nodes (2035-2050); and the set of third-tier nodes (2040-2070). They then provided short-fuse VOI judgments for only the questions they had ranked in their top 10 for each position. <a href=\"#77204061-1d5c-478e-bd5c-16f116fd0fed-link\" aria-label=\"Jump to footnote reference 28\">\u21a9\ufe0e<\/a><\/li><li id=\"30e386f4-d2b9-4385-b61d-9215a7a005ac\">For the concerned expert-proxy data, we ranked questions via ranked choice voting. We also employed the value of discrimination (VOD) metric, which measures the change in disagreement between two forecasters a question is expected to make (see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=104\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=104\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 4<\/a>). VOD was determined by the median of pairwise VOD across both skeptical superforecasters and concerned expert-proxies. We excluded questions which closely resembled other questions ranked higher, those which the question-writing team did not operationalize, and those with the lowest individual-level VOI ranking. 
<a href=\"#30e386f4-d2b9-4385-b61d-9215a7a005ac-link\" aria-label=\"Jump to footnote reference 29\">\u21a9\ufe0e<\/a><\/li><li id=\"d800c7e2-8c39-4bcb-be62-cb151e9ff6ea\">The filtered question set included the following questions. See <a href=\"#tab-3-1-3\" id=\"#tab-3-1-3\">Table 3.1.3<\/a> for concise question summaries. Node 1 (dates up to 2030): <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\" target=\"_blank\" rel=\"noreferrer noopener\">NG30<\/a>, respectively: The VOI top-ranked node for skeptical superforecasters; the VOI top-ranked node for concerned expert proxies (also ranked 2nd for VOD); the VOI 2nd-ranked node for concerned expert proxies; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a>: The top-ranked node for VOD; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" id=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\" target=\"_blank\" rel=\"noreferrer noopener\">ZD30<\/a>: Included for having relatively good agreement on high VOI between groups; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>: Randomly chosen from the set of nodes ranked in the bottom half by both groups, as a check on the validity of the filtering process; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\" target=\"_blank\" rel=\"noreferrer noopener\">STQ9<\/a>: A question from outside our 
question set, the most-upvoted AI question on Metaculus resolving around 2030. Node 2 (2031 &#8211; 2070): <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">EX50<\/a>, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" id=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>, respectively: The VOI top-ranked node for skeptical superforecasters; the VOI top-ranked node for concerned expert proxies; the VOI 2nd-ranked node for concerned expert proxies; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" id=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\" target=\"_blank\" rel=\"noreferrer noopener\">CX70<\/a>: The top-ranked VOD node; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>: Randomly chosen from the set of nodes ranked in the bottom half by both groups; <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" id=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\" target=\"_blank\" rel=\"noreferrer noopener\">STQ247<\/a>: The most-upvoted AI question on Metaculus resolving post-2030. <a href=\"#d800c7e2-8c39-4bcb-be62-cb151e9ff6ea-link\" aria-label=\"Jump to footnote reference 30\">\u21a9\ufe0e<\/a><\/li><li id=\"86bd0efc-0ce7-4b31-9824-28cf142754aa\">The eight superforecasters in this sample took part in FRI\u2019s Adversarial Collaboration project that brought together generalist forecasters and domain experts with divergent views on AI\u2019s long-term risks to humanity. 
See Forecasting Research Institute, <em><a href=\"https:\/\/forecastingresearch.org\/research\/roots-of-disagreement-on-ai-risk\" id=\"1680\" target=\"_blank\" rel=\"noreferrer noopener\">Roots of Disagreement on AI Risk: Exploring the Potential and Pitfalls of Adversarial Collaboration<\/a><\/em> (2024) (<a href=\"https:\/\/web.archive.org\/web\/20240727230248\/https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/65ef1ee52e64b52f145ebb49\/1710169832137\/AIcollaboration.pdf\" id=\"https:\/\/web.archive.org\/web\/20240727230248\/https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/65ef1ee52e64b52f145ebb49\/1710169832137\/AIcollaboration.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">a<\/a>). <a href=\"#86bd0efc-0ce7-4b31-9824-28cf142754aa-link\" aria-label=\"Jump to footnote reference 31\">\u21a9\ufe0e<\/a><\/li><li id=\"3ff8a5d8-5f64-4156-b481-d4f1dadd5d79\">For more detail on the selection pool, see <a href=\"https:\/\/forecastingresearch.org\/research\/roots-of-disagreement-on-ai-risk\" id=\"https:\/\/forecastingresearch.org\/research\/roots-of-disagreement-on-ai-risk\" target=\"_blank\" rel=\"noreferrer noopener\">Roots of Disagreement on AI Risk<\/a>. The \u201cAI-concerned\u201d expert group for this project consisted of domain experts referred to us by our funder and the broader effective altruism community. <a href=\"#3ff8a5d8-5f64-4156-b481-d4f1dadd5d79-link\" aria-label=\"Jump to footnote reference 32\">\u21a9\ufe0e<\/a><\/li><li id=\"c37b37f6-4559-4391-90a3-424dbe8b0425\">Respondents were not able to revise this forecast later in the survey. <a href=\"#c37b37f6-4559-4391-90a3-424dbe8b0425-link\" aria-label=\"Jump to footnote reference 33\">\u21a9\ufe0e<\/a><\/li><li id=\"1c7539de-7fa2-4eb2-af5a-b1d709260d6e\">Once forecasters submitted their answers on a question, the survey checked for coherence, and then prompted the respondent to revise their answers if the coherence condition was not met. 
Coherence requires that P(U) &gt; P(U|c)P(c), where P(U) is the forecaster\u2019s probability of the ultimate question U resolving positively, P(U|c) is the probability of U resolving positively if the crux c resolves positively, and P(c) is the probability of the crux resolving positively. This coherence prompt was not repeated on a question if the respondent failed to give coherent revised answers on that question. For any answers which remained incoherent after the respondent finished the survey, we followed up and requested revision. <a href=\"#1c7539de-7fa2-4eb2-af5a-b1d709260d6e-link\" aria-label=\"Jump to footnote reference 34\">\u21a9\ufe0e<\/a><\/li><li id=\"796e78fb-b101-4a96-a470-75e287ba1d38\">Due to coding errors in an early version of the survey, not all participants were given an opportunity to review their answers in the survey. We instead asked such participants to manually review their answers afterward. <a href=\"#796e78fb-b101-4a96-a470-75e287ba1d38-link\" aria-label=\"Jump to footnote reference 35\">\u21a9\ufe0e<\/a><\/li><li id=\"225fd0f8-1486-480a-ac77-0f00898db949\">The two questions added to the supplementary survey were <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=67\" target=\"_blank\" rel=\"noreferrer noopener\">HB30<\/a> and <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>. See <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 1<\/a> for full question descriptions. 
<a href=\"#225fd0f8-1486-480a-ac77-0f00898db949-link\" aria-label=\"Jump to footnote reference 36\">\u21a9\ufe0e<\/a><\/li><li id=\"2ea4ff60-61c5-47e7-bd25-e3b893a6e335\">For details on the selection criteria, see <a href=\"#candidate-high-voi-trees-from-two-camps\">Section 3.2<\/a>. A z-score indicates how many standard deviations an observation is from the mean and in which direction. David S. Moore, George P. McCabe, and Bruce A. Craig, <em>Introduction to the Practice of Statistics<\/em>, 6th ed. (New York: W. H. Freeman and Company, 2009), 61. <a href=\"#2ea4ff60-61c5-47e7-bd25-e3b893a6e335-link\" aria-label=\"Jump to footnote reference 37\">\u21a9\ufe0e<\/a><\/li><li id=\"a8b2606f-13ca-4cbd-a9e7-3804cf06b584\">8 superforecasters (7-8 respondents per question) and 11 domain experts (4-6 respondents per question). <a href=\"#a8b2606f-13ca-4cbd-a9e7-3804cf06b584-link\" aria-label=\"Jump to footnote reference 38\">\u21a9\ufe0e<\/a><\/li><li id=\"c2d3bf2e-0129-446a-8ad6-69d827c4f894\">Most questions in the main question-rating survey were selected based on high scores from either superforecasters or expert \u201cproxy\u201d judges, or both. However, two questions, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a> and <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\" target=\"_blank\" rel=\"noreferrer noopener\">HS50<\/a>, were randomly selected from the intersection of the bottom half of superforecaster and expert proxy scores. While these questions ranked poorly among superforecasters in the main survey, <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a> notably received the second-highest score from experts. Overall, the correlation between expert \u201cproxy\u201d scores and expert scores in the main question-rating round was weak. 
<a href=\"#c2d3bf2e-0129-446a-8ad6-69d827c4f894-link\" aria-label=\"Jump to footnote reference 39\">\u21a9\ufe0e<\/a><\/li><li id=\"457d57d4-7074-4e27-ba31-a3cfdda6a74c\">For a description of Kullback-Leibler VOI, see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=104\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=104\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 4<\/a>: VOI technical explanation. <a href=\"#457d57d4-7074-4e27-ba31-a3cfdda6a74c-link\" aria-label=\"Jump to footnote reference 40\">\u21a9\ufe0e<\/a><\/li><li id=\"50a3568c-4aaa-4010-87cf-270fd37074d8\">The advantages of POM over straight VOI are (i) it is more interpretable; and (ii) it does not penalize respondents with low prior probability P(U). The size of the update is constrained by the prior probability P(U) together with the probability of the crux event P(c) to be less than P(U) \/ P(c). <a href=\"#50a3568c-4aaa-4010-87cf-270fd37074d8-link\" aria-label=\"Jump to footnote reference 41\">\u21a9\ufe0e<\/a><\/li><li id=\"ee57fe49-3195-462b-b7d4-da0c326ba5b5\">In the supplementary survey (see <a href=\"#voi-comparison\" id=\"#voi-comparison\">Section 4.1<\/a>), two superforecasters updated their forecasts slightly, resulting in an average P(U) of 0.26%. <a href=\"#ee57fe49-3195-462b-b7d4-da0c326ba5b5-link\" aria-label=\"Jump to footnote reference 42\">\u21a9\ufe0e<\/a><\/li><li id=\"504d98f7-8629-43b3-9132-3194a74cf216\">The goal was to choose the most informative questions. The initial selection criteria were to choose the top-ranked question by POM and POM-z for questions resolving in 2030 and 2050-2070 separately, including both where these disagreed. 
For 2030, we chose <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CX30<\/a> (highest POM) and <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\" target=\"_blank\" rel=\"noreferrer noopener\">CQ30<\/a> (highest POM-z). For 2050-2070, we chose <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a> based on it having the highest POM. While the selection criteria suggested that <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" id=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a> should be selected as the top POM-z question, as a whole the evidence pointed to <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\" target=\"_blank\" rel=\"noreferrer noopener\">ZA50<\/a> being more informative (higher POM, at 1.59% vs 0.54%; POM-z close to <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" id=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\" target=\"_blank\" rel=\"noreferrer noopener\">VL70<\/a>, at 0.53 vs 0.67; and higher under the pairwise wins robustness check, at 87% vs 64%). <a href=\"#504d98f7-8629-43b3-9132-3194a74cf216-link\" aria-label=\"Jump to footnote reference 43\">\u21a9\ufe0e<\/a><\/li><li id=\"94b6adc5-54cf-4da2-8183-6949b68a7581\">Careful readers will note that the probabilities in this figure do not yield the mean POM VOI values we report (see Tables <a href=\"#tab-3-4-1\" id=\"#tab-3-4-1\">3.4.1<\/a> and <a href=\"#tab-3-4-2\">3.4.2<\/a>). 
Mean POM VOI tells us how valuable a crux is for a group, on average, by computing POM VOI at the individual level and then aggregating. The average relative updates, across individuals in the same group, sometimes tell a quite different story. <a href=\"#94b6adc5-54cf-4da2-8183-6949b68a7581-link\" aria-label=\"Jump to footnote reference 44\">\u21a9\ufe0e<\/a><\/li><li id=\"52f26f06-7eb3-4450-a347-dc695b3df8d8\">While an extreme data point could typically indicate a coding error, the subcomponents of VOI analysis suggest a genuine answer rather than a common error such as a misplaced decimal. The outlier respondent assigned a low probability (0.5%) to the \u201cadministrative disempowerment warning shot\u201d scenario, but provided a substantial update (a 100-fold increase, from 0.1% to 10%) toward AI extinction if the scenario were to occur. In contrast, all other respondents thought the probability of it occurring was higher (mean=18%), but offered smaller updates than the outlying respondent (mean = 1x, with three updating not at all and one updating down). <a href=\"#52f26f06-7eb3-4450-a347-dc695b3df8d8-link\" aria-label=\"Jump to footnote reference 45\">\u21a9\ufe0e<\/a><\/li><li id=\"3308dac0-2e96-4bf8-b052-a5242d45c786\">Or, indeed, the motivations of a misaligned AI system with access to weaponizable technology. <a href=\"#3308dac0-2e96-4bf8-b052-a5242d45c786-link\" aria-label=\"Jump to footnote reference 46\">\u21a9\ufe0e<\/a><\/li><li id=\"d335c71e-1743-483e-a771-9cf0b58b9960\">Interquartile range (IQR) is the middle 50%, or the difference between the 25th and 75th percentile forecasts. 
<a href=\"#d335c71e-1743-483e-a771-9cf0b58b9960-link\" aria-label=\"Jump to footnote reference 47\">\u21a9\ufe0e<\/a><\/li><li id=\"3fae1832-ff19-4949-8b6b-46492cd511ff\">The outlier respondent assigns a low probability to the question (5%), but updates substantially (relative risk = 3x), while on average respondents rated the question as having moderate probability (mean=37%) and a moderate relative risk (mean=1.9x). <a href=\"#3fae1832-ff19-4949-8b6b-46492cd511ff-link\" aria-label=\"Jump to footnote reference 48\">\u21a9\ufe0e<\/a><\/li><li id=\"ba0ea008-1455-4f63-a334-49975b362690\">Proxy ratings for 2030 questions showed strong negative correlation with POM VOI judgments from the small sample of experts in the main survey. They also showed slight negative correlation with the main survey POM-z. Notably, a question randomly chosen from the bottom half of proxy scores ranked second by expert POM (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\" target=\"_blank\" rel=\"noreferrer noopener\">EX30<\/a>). This suggests that many questions from our larger 2030 set might have performed better than the average question in our main question-rating survey if presented to these particular experts. <a href=\"#ba0ea008-1455-4f63-a334-49975b362690-link\" aria-label=\"Jump to footnote reference 49\">\u21a9\ufe0e<\/a><\/li><li id=\"d73bb912-46b5-4ac7-8c40-901425b9b182\">The 2050\/2070 proxy performed moderately well for our small expert sample, with a correlation between mean expert POM and proxy rank of -0.4, and mean expert POM-z score and proxy rank of -0.5 (a more negative value indicates a stronger correlation, as higher rank orders are considered worse, while higher VOI scores are better). 
<a href=\"#d73bb912-46b5-4ac7-8c40-901425b9b182-link\" aria-label=\"Jump to footnote reference 50\">\u21a9\ufe0e<\/a><\/li><li id=\"10c48bb4-79b0-42e6-a192-9236ceca0466\">For example, the top five questions on Metaculus at the time of this writing (July 25, 2024), are \u201cWho will be elected US president in 2024?\u201d; \u201cFive years after AGI, will an AI company be a military power?\u201d; \u201cFive years after AGI, if there are digital people, what will be their population?\u201d; \u201cWho will be the Democratic nominee for Vice President on Election Day 2024 (if Joe Biden is no longer the nominee for President)?\u201d and \u201cWhen will an AI win a Gold Medal in the International Math Olympiad?\u201d Of those, only \u201cWhen will an AI win a Gold Medal in the International Math Olympiad?\u201d seems to be interesting primarily because it is an indicator about a more important question. <a href=\"#10c48bb4-79b0-42e6-a192-9236ceca0466-link\" aria-label=\"Jump to footnote reference 51\">\u21a9\ufe0e<\/a><\/li><li id=\"43cff954-4b3f-4059-914a-b20d2877bc75\">Out of the 265 questions in our status quo set, 253 of them (~95%) came from just two platforms: Metaculus and Manifold Markets. We included in our set all questions resolving no earlier than 2027, and which were tagged \u201cAI,\u201d \u201cartificial intelligence,\u201d \u201cmachine learning,\u201d or similar. Because Manifold Markets had a very large overall volume of questions, and because many questions with little engagement on this platform were duplicates of other questions, or otherwise low-quality, we only included Manifold Markets questions which had at least 50 traders at the time of collection. <a href=\"#43cff954-4b3f-4059-914a-b20d2877bc75-link\" aria-label=\"Jump to footnote reference 52\">\u21a9\ufe0e<\/a><\/li><li id=\"3b6caa56-c3da-44a1-890c-adf30fabc3d0\">Questions can fall into multiple categories. 
<a href=\"#3b6caa56-c3da-44a1-890c-adf30fabc3d0-link\" aria-label=\"Jump to footnote reference 53\">\u21a9\ufe0e<\/a><\/li><li id=\"22b58859-8e48-4a06-b61f-93aa97393620\">Example questions in each category (many questions fall into multiple categories): <br><strong>Acceleration:<\/strong> Deep learning revenue (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\" target=\"_blank\" rel=\"noreferrer noopener\">VL30<\/a>)\u2014Revenue from deep learning doubles every two years before 2030. <br><strong>Social \/ Political \/ Economic:<\/strong> AI Socializing (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=85\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=85\" target=\"_blank\" rel=\"noreferrer noopener\">MQ70<\/a>)\u2014humans talk to AIs more than to humans by 2070. <br><strong>Alignment:<\/strong> No interpretability progress (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=77\" target=\"_blank\" rel=\"noreferrer noopener\">ZA40c<\/a>)\u2014by 2040, there are no interpretability tools which allow us to understand the function of state-of-the-art transformer component parts\/circuits.<br><strong>AI harms:<\/strong> Repeated AI harms (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=76\" id=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=76\" target=\"_blank\" rel=\"noreferrer noopener\">HS40<\/a>)\u2014by 2040, there are at least two events in a five-year period in which an AI system used by a major company causes at least 1,000 deaths or damage of $10B. 
<a href=\"#22b58859-8e48-4a06-b61f-93aa97393620-link\" aria-label=\"Jump to footnote reference 54\">\u21a9\ufe0e<\/a><\/li><li id=\"e92b2983-5493-48ad-82e6-56c5268bd965\">We chose the \u201cAlignment\u201d category because it was much more prevalent in the AICT set than in the status quo set, suggesting that the questions in that category may be unique in interesting ways. <a href=\"#e92b2983-5493-48ad-82e6-56c5268bd965-link\" aria-label=\"Jump to footnote reference 55\">\u21a9\ufe0e<\/a><\/li><li id=\"452e355e-3bc4-4c32-8089-684d98657bdd\">The 25 questions in the AICT set were divided into different themes for analysis of uniqueness, some of which overlapped. <a href=\"#452e355e-3bc4-4c32-8089-684d98657bdd-link\" aria-label=\"Jump to footnote reference 56\">\u21a9\ufe0e<\/a><\/li><li id=\"2c5fd0c1-116f-414b-9616-a83afcb3710b\">The theme \u201cpower-seeking\u201d covers questions about AI models developing power-seeking or deceptive behavior; the theme \u201cdeveloper perception\u201d covers questions about AI developers\u2019 perception of alignment work. See <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=95\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=95\" target=\"_blank\" rel=\"noreferrer noopener\"><a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=97\" target=\"_blank\" rel=\"noreferrer noopener\">Appendix 3.2<\/a><\/a> for additional information about categorization into themes. <a href=\"#2c5fd0c1-116f-414b-9616-a83afcb3710b-link\" aria-label=\"Jump to footnote reference 57\">\u21a9\ufe0e<\/a><\/li><li id=\"7b29920b-e450-4202-9a40-834e6210314d\">Because AICT questions are often complex or technical, we suspect they may be less fun to forecast and therefore attract fewer participants, though this is untested. As an inexpensive experiment, we are posting these questions to two forecasting platforms to see whether they get engagement. 
We encourage readers to see <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=113\" id=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=113\">Appendix 7<\/a> for further details on how you can submit your own forecasts on these questions. <a href=\"#7b29920b-e450-4202-9a40-834e6210314d-link\" aria-label=\"Jump to footnote reference 58\">\u21a9\ufe0e<\/a><\/li><li id=\"78d4749e-b022-4635-bbf2-2efc3c10648a\">Questions relating to concrete harms also featured in all three interviews with superforecasters, though this very small sample size makes it difficult to draw any conclusions about superforecasters\u2019 concerns in general. <a href=\"#78d4749e-b022-4635-bbf2-2efc3c10648a-link\" aria-label=\"Jump to footnote reference 59\">\u21a9\ufe0e<\/a><\/li><li id=\"5f7eb7bd-e9af-4559-9f51-64107e91deb5\">The careful reader will notice that the values in this column don\u2019t match those found in Tables <a href=\"#tab-3-1-2\" id=\"#tab-3-1-2\">3.1.2<\/a>, <a href=\"#tab-3-3-1\">3.3.1<\/a> and <a href=\"#tab-3-3-2\" id=\"#tab-3-3-6\">3.3.2<\/a>. This is because the two additional questions (<a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\" target=\"_blank\" rel=\"noreferrer noopener\">HB30<\/a> and <a href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\" target=\"_blank\" rel=\"noreferrer noopener\">CX50<\/a>) forecasted on by superforecasters are not included in the calculation of z-scores here. <a href=\"#5f7eb7bd-e9af-4559-9f51-64107e91deb5-link\" aria-label=\"Jump to footnote reference 60\">\u21a9\ufe0e<\/a><\/li><li id=\"2bd98e8d-bd99-48de-a9f3-6288ead09b62\">For example, in the Good Judgment Inc. project that compared superforecasters to other participants in an online forecasting competition, the average question was open for 214 days, with the entire tournament taking place over six years. Christopher W. 
Karvetski, \u201cSuperforecasters: A Decade of Stochastic Dominance,\u201d technical white paper (2021): 2, <a href=\"https:\/\/goodjudgment.com\/wp-content\/uploads\/2021\/10\/Superforecasters-A-Decade-of-Stochastic-Dominance.pdf\">https:\/\/goodjudgment.com\/wp-content\/uploads\/2021\/10\/Superforecasters-A-Decade-of-Stochastic-Dominance.pdf<\/a>. In addition to extensive research on shorter-term forecasts, Tetlock et al. found that, at least on some types of questions, experts are more accurate than simple base rate extrapolation over 25-year horizons, although they are much less accurate than they are over 0-2 years. Our research asks forecasters to consider forecasts over many decades, and we do not yet know how much accuracy declines over that much longer period. Philip E. Tetlock et al., \u201c<a href=\"https:\/\/forecastingresearch.org\/research\/longrange-subjective-probability-forecasts-of-slowmotion-variables-in-world-politics\" id=\"163\" target=\"_blank\" rel=\"noreferrer noopener\">Long-Range Subjective-Probability Forecasts of Slow-Motion Variables in World Politics: Exploring Limits on Expert Judgment<\/a>,\u201d <em>Futures &amp; Foresight Science<\/em> (2023), 33.
<a href=\"#2bd98e8d-bd99-48de-a9f3-6288ead09b62-link\" aria-label=\"Jump to footnote reference 61\">\u21a9\ufe0e<\/a><\/li><\/ol>\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"btn orange\" href=\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\" target=\"_blank\" rel=\"noreferrer noopener\">The Appendices are provided in the full PDF Report <svg width=\"7\" height=\"9\" viewBox=\"0 0 7 9\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.000156283 8.60806L4.22416 4.33606V4.24006L0.000156283 6.10352e-05H1.80816L6.06416 4.28806L1.80816 8.60806H0.000156283Z\" fill=\"#102B23\"\/>\n<\/svg>\n<svg width=\"8\" height=\"10\" viewBox=\"0 0 8 10\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.601719 8.85794L4.82572 4.58594V4.48994L0.601719 0.249939H2.40972L6.66572 4.53794L2.40972 8.85794H0.601719Z\" fill=\"#102B23\"\/>\n<\/svg><\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"In this study, we used structured interviews with domain experts and superforecasters to generate questions that provide high &#8220;value of information&#8221; regarding a far-future outcome.","protected":false},"featured_media":1243,"template":"","meta":{"footnotes":"[{\"id\":\"70fde7b6-0fde-4599-9c2d-11d3813282a7\",\"content\":\"We will refer to this set of forecasters as \u201csuperforecasters\u201d henceforth. 
Note that while seven of the forecasters are Superforecasters\u2122 as officially designated by Good Judgment Inc., one is a skilled forecaster who does not have that label but has a comparable track record of calibrated forecasts.\"},{\"id\":\"6a0fde15-496f-4006-8d0a-e475bac7a3e3\",\"content\":\"To ensure the integrity of links in this report, we include stable archive.org links in parentheses after each citation to an external URL.\"},{\"id\":\"d2fbd3d6-4e36-4a09-8929-799e1c840251\",\"content\":\"More specifically, the ultimate question was defined as the global human population falling below 5,000 individuals at any time before 2100, with AI being a proximate cause of such reduction.\"},{\"id\":\"b5003308-fdda-4da9-9b27-b48cbaa7fdf2\",\"content\":\"\u201cPlausible\u201d meaning that the forecaster deemed the indicator event to be at least 10% likely to occur. This 10% probability was not necessarily an unconditional probability, but may have been conditional on a previous node in the conditional tree.\"},{\"id\":\"d32de3b7-6235-47f8-a676-3de84c95b8e3\",\"content\":\"By \u201cinformative,\u201d we mean that knowing the answer to one of these questions would make a larger difference, in expectation, to a participant\u2019s forecast of the ultimate question, in this case, \u201cWill AI cause human extinction by 2100.\u201d For more on informativeness and the metric we use to assess it, see the section on <a href=\\\"#value-of-information-results\\\" id=\\\"#value-of-information-results\\\">Value of Information (VOI)<\/a>. Forecasting platforms are generally focused on making accurate predictions by aggregating many people\u2019s forecasts and usually allow participants to choose which questions to forecast.
The questions that are popular on forecasting platforms are often questions that are important in themselves, more than as indicators of other events, and the platforms are not deliberately attempting to find high VOI questions.\"},{\"id\":\"af4cda98-41b4-41c2-a145-96b1e4d40348\",\"content\":\"For more on the question filtering process, see <a href=\\\"#judging-questions-and-constructing-aggregate-trees\\\" id=\\\"1242\\\">Section 2.2<\/a>.\"},{\"content\":\"The four lowest-scoring AICT questions \u2013 <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=78\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">EX50<\/a>, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">HS50<\/a>, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=71\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">NG30<\/a>, and <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=66\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">EX30<\/a> \u2013 ranked 12th, 13th, 14th, and 20th out of 23, respectively.\",\"id\":\"682d449b-dfd9-4422-a922-275153246709\"},{\"content\":\"At the time of data collection, we had not yet developed the POM VOI metric, so participants were not deliberately optimizing for it. Later, we found that POM VOI captured the idea of question informativeness better than VOI alone, which yields a number that is hard to interpret and contextualize. For a full list of questions analyzed, see <a href=\\\"#tab-3-1-3\\\" id=\\\"#tab-3-1-3\\\">Table 3.1.3<\/a> . 
A comprehensive explanation of the POM VOI metric can be found in <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=104\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Appendix 4<\/a>.\",\"id\":\"6688d111-6e2b-4a8c-8a28-2f4224a7caa9\"},{\"id\":\"32da7651-60fa-46b9-8bde-e13f84cbf288\",\"content\":\"Careful readers will note that the probabilities in this figure do not yield the mean POM VOI values we report (see <a href=\\\"#tab-e-1\\\" id=\\\"#executive-summary-results\\\">Table E.1<\/a>). Mean POM VOI tells us how valuable a crux is for a group, on average, by computing POM VOI at the individual level and then aggregating. The average relative updates, across individuals in the same group, sometimes tell a quite different story.\"},{\"id\":\"6118b2f7-6787-4dfd-afe5-c07c40a45373\",\"content\":\"Several related methods, such as Delphi and Bayesian Network elicitation, may be useful to forecasting research in similar ways. See Bernice B. Brown, \u201cDelphi Process: A Methodology Used for the Elicitation of Opinions of Experts,\u201d Rand Corporation report (September 1968) and Judea Pearl, <em>Probabilistic Reasoning in Intelligent Systems<\/em> (San Mateo, CA: Morgan Kaufmann, 1988).\"},{\"content\":\"Karger et al., \u201cForecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament,\u201d 2023.
<a href=\\\"https:\/\/forecastingresearch.org\/research\/existential-risk-persuasion-tournament\\\" id=\\\"876\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">https:\/\/forecastingresearch.org\/research\/existential-risk-persuasion-tournament<\/a> (<a href=\\\"https:\/\/web.archive.org\/web\/20240803193928\/https:\/\/forecastingresearch.org\/xpt\\\" id=\\\"https:\/\/web.archive.org\/web\/20240803193928\/https:\/\/forecastingresearch.org\/xpt\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">a<\/a>) (XPT report).\",\"id\":\"21f37ef0-875f-40d5-8ee5-697b5e11a2e3\"},{\"id\":\"92fe9899-6054-4948-89c7-3ce49c8e1155\",\"content\":\"These numbers are intended to be illustrative and are not based on actual vaccine data.\"},{\"id\":\"ed3527fb-4b46-4664-84fa-cbf5179e8f4a\",\"content\":\"Judea Pearl, \u201cFrom Bayesian Networks to Causal Networks,\u201d in <em>Mathematical Models for Handling Partial Knowledge in Artificial Intelligence<\/em>, ed. Giulianella Coletti et al., 160. Boston, MA: Springer, 1995. <a href=\\\"https:\/\/doi.org\/10.1007\/978-1-4899-1424-8_9\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">https:\/\/doi.org\/10.1007\/978-1-4899-1424-8_9<\/a> (<a href=\\\"https:\/\/web.archive.org\/web\/20240804170543\/https:\/\/link.springer.com\/chapter\/10.1007\/978-1-4899-1424-8_9\\\" id=\\\"https:\/\/web.archive.org\/web\/20240804170543\/https:\/\/link.springer.com\/chapter\/10.1007\/978-1-4899-1424-8_9\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">a<\/a>)\"},{\"id\":\"4e290cb7-80ab-4a3a-b613-f62a34ce57e9\",\"content\":\"This relationship can be causal, but it does not need to be; in this project we did not constrain conditional trees to only causal relationships, nor did we probe expert models for causality in the interviews.\"},{\"id\":\"394cd98f-69e7-4c1e-9375-31ae48008979\",\"content\":\"We defined \u201chigh risk\u201d as forecasting &gt;10% chance of extinction due to AI by 2100, and low risk as &lt;10%.
We defined \u201clong AI timelines\u201d as forecasting &gt;30 years until transformative AI or artificial general intelligence and \u201cshort AI timelines\u201d as less than 30 years.\"},{\"content\":\"Karger et al., <a href=\\\"https:\/\/forecastingresearch.org\/research\/existential-risk-persuasion-tournament\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">XPT report<\/a>.\",\"id\":\"32a1292a-1d04-40c2-a886-13b8686eab4f\"},{\"id\":\"1aac17fe-78e4-4520-95c4-28786b9faf48\",\"content\":\"One interviewee is not represented in this graph because in the interview, they responded \u201c&gt;0.1%, &lt;50%\u201d rather than give a point estimate.\"},{\"id\":\"a746be21-a9d3-4c66-8d70-541042d2fc54\",\"content\":\"One interviewee is not represented in this graph because in the interview, they responded \u201c&gt;0.1%, &lt;50%\u201d rather than give a point estimate.\"},{\"id\":\"dc215cf9-4489-4c1a-b355-db80879581cd\",\"content\":\"Interviewers were: Tegan McCaslin (11\/24 interviews), Josh Rosenberg (10\/24 interviews), and Ezra Karger (3\/24 interviews)\"},{\"content\":\"For a full description of the interview process, see <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=106\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=106\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Appendix 6<\/a>.\",\"id\":\"9c620011-1ec6-4ba0-9140-b6220171f196\"},{\"id\":\"f44a301a-93ed-4572-b5bb-17245c846296\",\"content\":\"This incentive was not explained in further detail given time constraints of the interview.\"},{\"content\":\"In the XPT, \u201cExtinction\u201d was defined as \u201creduction of the global population to less than 5000,\u201d and extinction was considered \u201cdue to AI\u201d if AI was the direct or proximate cause of the deaths. 
This definition encompasses events that would not have occurred or would have counterfactually been extremely unlikely to occur \u201cbut for\u201d the substantial involvement of AI within one year prior to the event. For more details, see Karger et al., <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/existential-risk-persuasion-tournament.pdf\/#page=135\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/existential-risk-persuasion-tournament.pdf\/#page=135\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">XPT Report, 134<\/a>. For some interviewees, this was a question for which they had already devoted substantial time (in the XPT or other contexts) forming a quantitative forecast, and thus such participants were able to offer a relatively quick probability judgment. Most participants had previously spent substantial time thinking about the possibility of AI-related extinction, but not as much time forming a precise quantitative estimate for the date in question, and many expressed hesitancy about their answer in the interview.\",\"id\":\"da58cb0f-dbfd-4329-924a-e36319728ae0\"},{\"id\":\"b084562c-33ab-487c-b18c-e3ef068fefa1\",\"content\":\"These resolution years were chosen to match XPT questions.\"},{\"id\":\"143c5609-deba-4198-83e6-19ddfc990eb2\",\"content\":\"Question writers were Tegan McCaslin, Taylor Smith, Josh Rosenberg, Rose Hadshar, Adam Kuzee, Ezra Karger, Arunim Agrawal, and Bridget Williams. One primary question writer was assigned to each question prompt, and would draft several different versions of the question, using the interview notes as an aid to understanding the interviewee\u2019s underlying models. These drafts would receive feedback from the rest of the question-writing team, and in particular from the relevant interviewer. 
This interviewer had final say over revisions and finalizing the question.\"},{\"id\":\"eb7be1da-3caa-4a57-b161-d92e144ef5b0\",\"content\":\"The initial screen was not simply a VOI threshold. To get a diverse question set, we wanted to include at least one question from each of the following categories: 1) high VOI for superforecasters, 2) high VOI for experts, 3) high VOD between experts and superforecasters, 4) jointly high VOI between superforecasters and experts, 5) randomly chosen representative of the bottom half of the AICT question set, and 6) top comparable question from outside the AICT set. Choosing cutoffs separately for each of these categories resulted in thirteen questions.\"},{\"id\":\"5b641fe8-818d-4fec-b3ff-f40b3002ae42\",\"content\":\"Participants gave estimates for the probability of the question resolving positively (P(c)), and the probability of AI extinction <em>conditional<\/em> on the question resolving positively (P(U|c)). We then used these figures to calculate each respondent\u2019s VOI for each question.\"},{\"id\":\"3139a76e-57fb-46c5-a2db-53f31e54b15f\",\"content\":\"The \u201cconcerned expert proxies\u201d were teammates or collaborators who had had extensive contact with concerned experts, who we expected to be able to model this group\u2019s views well.\"},{\"id\":\"77204061-1d5c-478e-bd5c-16f116fd0fed\",\"content\":\"Instead of giving probability judgments on all 75 questions, the concerned expert proxies chose and rank-ordered their top 10 questions from each of: the set of first-tier nodes (usually 2030); the set of second-tier nodes (2035-2050); and the set of third-tier nodes (2040-2070). They then provided short-fuse VOI judgments for only the questions they had ranked in their top 10 for each position.\"},{\"content\":\"For the concerned expert-proxy data, we ranked questions via ranked choice voting. 
We also employed the value of discrimination (VOD) metric, which measures the change in disagreement between two forecasters a question is expected to make (see <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=104\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=104\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Appendix 4<\/a>). VOD was determined by the median of pairwise VOD across both skeptical superforecasters and concerned expert-proxies. We excluded questions which closely resembled other questions ranked higher, those which the question-writing team did not operationalize, and those with the lowest individual-level VOI ranking.\",\"id\":\"30e386f4-d2b9-4385-b61d-9215a7a005ac\"},{\"content\":\"The filtered question set included the following questions. See <a href=\\\"#tab-3-1-3\\\" id=\\\"#tab-3-1-3\\\">Table 3.1.3<\/a> for concise question summaries. Node 1 (dates up to 2030): <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CQ30<\/a>, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">VL30<\/a>, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=71\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">NG30<\/a>, respectively: The VOI top-ranked node for skeptical superforecasters; the VOI top-ranked node for concerned expert proxies (also ranked 2nd for VOD); the VOI 2nd-ranked node for concerned expert proxies; <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CX30<\/a>: The top-ranked node for VOD; <a 
href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=73\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">ZD30<\/a>: Included for having relatively good agreement on high VOI between groups; <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">EX30<\/a>: Randomly chosen from the set of nodes ranked in the bottom half by both groups, as a check on the validity of the filtering process; <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=88\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">STQ9<\/a>: A question from outside our question set, the most-upvoted AI question on Metaculus resolving around 2030. Node 2 (2031 - 2070): <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">ZA50<\/a>, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">EX50<\/a>, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">VL70<\/a>, respectively: The VOI top-ranked node for skeptical superforecasters; the VOI top-ranked node for concerned expert proxies; the VOI 2nd-ranked node for concerned expert proxies; <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=83\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CX70<\/a>: The top-ranked VOD node; <a 
href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">HS50<\/a>: Randomly chosen from the set of nodes ranked in the bottom half by both groups; <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=89\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">STQ247<\/a>: The most-upvoted AI question on Metaculus resolving post-2030.\",\"id\":\"d800c7e2-8c39-4bcb-be62-cb151e9ff6ea\"},{\"content\":\"The eight superforecasters in this sample took part in FRI\u2019s Adversarial Collaboration project that brought together generalist forecasters and domain experts with divergent views on AI\u2019s long-term risks to humanity. See Forecasting Research Institute, <em><a href=\\\"https:\/\/forecastingresearch.org\/research\/roots-of-disagreement-on-ai-risk\\\" type=\\\"research\\\" id=\\\"1680\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Roots of Disagreement on AI Risk: Exploring the Potential and Pitfalls of Adversarial Collaboration<\/a><\/em> (2024) (<a href=\\\"https:\/\/web.archive.org\/web\/20240727230248\/https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/65ef1ee52e64b52f145ebb49\/1710169832137\/AIcollaboration.pdf\\\" id=\\\"https:\/\/web.archive.org\/web\/20240727230248\/https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/65ef1ee52e64b52f145ebb49\/1710169832137\/AIcollaboration.pdf\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">a<\/a>).\",\"id\":\"86bd0efc-0ce7-4b31-9824-28cf142754aa\"},{\"content\":\"For more detail on the selection pool, see <a href=\\\"https:\/\/forecastingresearch.org\/research\/roots-of-disagreement-on-ai-risk\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/research\/roots-of-disagreement-on-ai-risk\\\" target=\\\"_blank\\\" rel=\\\"noreferrer 
noopener\\\">Roots of Disagreement on AI Risk<\/a>. The \u201cAI-concerned\u201d expert group for this project consisted of domain experts referred to us by our funder and the broader effective altruism community.\",\"id\":\"3ff8a5d8-5f64-4156-b481-d4f1dadd5d79\"},{\"id\":\"c37b37f6-4559-4391-90a3-424dbe8b0425\",\"content\":\"Respondents were not able to revise this forecast later in the survey.\"},{\"id\":\"1c7539de-7fa2-4eb2-af5a-b1d709260d6e\",\"content\":\"Once forecasters submitted their answers on a question, the survey checked for coherence, and then prompted the respondent to revise their answers if the coherence condition was not met. Coherence requires that P(U) &gt; P(U|c)P(c), where P(U) is the forecaster\u2019s probability of the ultimate question U resolving positively, P(U|c) is the probability of U resolving positively if the crux c resolves positively, and P(c) is the probability of the crux resolving positively. This coherence prompt was not repeated on a question if the respondent failed to give coherent revised answers on that question. For any answers which remained incoherent after the respondent finished the survey, we followed up and requested revision.\"},{\"id\":\"796e78fb-b101-4a96-a470-75e287ba1d38\",\"content\":\"Due to coding errors in an early version of the survey, not all participants were given an opportunity to review their answers in the survey.
We instead asked such participants to manually review their answers afterward.\"},{\"content\":\"The two questions added to the supplementary survey were <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=67\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">HB30<\/a> and <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=78\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CX50<\/a>. See <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=63\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Appendix 1<\/a> for full question descriptions.\",\"id\":\"225fd0f8-1486-480a-ac77-0f00898db949\"},{\"id\":\"2ea4ff60-61c5-47e7-bd25-e3b893a6e335\",\"content\":\"For details on the selection criteria, see <a href=\\\"#candidate-high-voi-trees-from-two-camps\\\">Section 3.2<\/a>. A z-score indicates how many standard deviations an observation is from the mean and in which direction. David S. Moore, George P. McCabe, and Bruce A. Craig, <em>Introduction to the Practice of Statistics<\/em> , 6th ed. (New York: W. H. Freeman and Company, 2009), 61.\"},{\"id\":\"a8b2606f-13ca-4cbd-a9e7-3804cf06b584\",\"content\":\"8 superforecasters (7-8 respondents per question) and 11 domain experts (4-6 respondents per question).\"},{\"content\":\"Most questions in the main question-rating survey were selected based on high scores from either superforecasters or expert \u201cproxy\u201d judges, or both. 
However, two questions, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">EX30<\/a> and <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=79\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">HS50<\/a>, were randomly selected from the intersection of the bottom half of superforecaster and expert proxy scores. While these questions ranked poorly among superforecasters in the main survey, <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">EX30<\/a> notably received the second-highest score from experts. Overall, the correlation between expert \u201cproxy\u201d scores and expert scores in the main question-rating round was weak.\",\"id\":\"c2d3bf2e-0129-446a-8ad6-69d827c4f894\"},{\"content\":\"For a description of Kullback-Leibler VOI, see <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=104\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=104\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Appendix 4<\/a>: VOI technical explanation.\",\"id\":\"457d57d4-7074-4e27-ba31-a3cfdda6a74c\"},{\"id\":\"50a3568c-4aaa-4010-87cf-270fd37074d8\",\"content\":\"The advantages of POM over straight VOI are (i) it is more interpretable; and (ii) it does not penalize respondents with low prior probability P(U). 
The size of the update is bounded: given the prior probability P(U) and the probability of the crux event P(c), the conditional forecast P(U|c) cannot exceed P(U) \/ P(c).\"},{\"id\":\"ee57fe49-3195-462b-b7d4-da0c326ba5b5\",\"content\":\"In the supplementary survey (see <a href=\\\"#voi-comparison\\\" id=\\\"#voi-comparison\\\">Section 4.1<\/a>), two superforecasters updated their forecasts slightly, resulting in an average P(U) of 0.26%.\"},{\"content\":\"The goal was to choose the most informative questions. We initially selected the top-ranked question by POM and by POM-z, separately for questions resolving in 2030 and in 2050-2070, including both questions where the two metrics disagreed. For 2030, we chose <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=65\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CX30<\/a> (highest POM) and <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=65\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CQ30<\/a> (highest POM-z). For 2050-2070, we chose <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CX50<\/a> because it had the highest POM.
While the selection criteria suggested that <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">VL70<\/a> should be selected as the top POM-z question, the evidence as a whole pointed to <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=82\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">ZA50<\/a> being more informative (higher POM, at 1.59% vs 0.54%; POM-z close to <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=86\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">VL70<\/a>, at 0.53 vs 0.67; and higher under the pairwise wins robustness check, at 87% vs 64%).\",\"id\":\"504d98f7-8629-43b3-9132-3194a74cf216\"},{\"id\":\"94b6adc5-54cf-4da2-8183-6949b68a7581\",\"content\":\"Careful readers will note that the probabilities in this figure do not yield the mean POM VOI values we report (see Tables <a href=\\\"#tab-3-4-1\\\" id=\\\"#tab-3-4-1\\\">3.4.1<\/a> and <a href=\\\"#tab-3-4-2\\\">3.4.2<\/a>). Mean POM VOI tells us how valuable a crux is for a group, on average, by computing POM VOI at the individual level and then aggregating. The average relative updates, across individuals in the same group, sometimes tell a quite different story.\"},{\"id\":\"52f26f06-7eb3-4450-a347-dc695b3df8d8\",\"content\":\"While an extreme data point could typically indicate a coding error, the subcomponents of VOI analysis suggest a genuine answer rather than a common error such as a misplaced decimal. 
The outlier respondent assigned a low probability (0.5%) to the \u201cadministrative disempowerment warning shot\u201d scenario, but provided a substantial update (a 100-fold increase, from 0.1% to 10%) toward AI extinction if the scenario were to occur. In contrast, all other respondents thought the probability of it occurring was higher (mean=18%), but offered smaller updates than the outlying respondent (mean = 1x, with three updating not at all and one updating down).\"},{\"id\":\"3308dac0-2e96-4bf8-b052-a5242d45c786\",\"content\":\"Or, indeed, the motivations of a misaligned AI system with access to weaponizable technology.\"},{\"id\":\"d335c71e-1743-483e-a771-9cf0b58b9960\",\"content\":\"Interquartile range (IQR) is the middle 50%, or the difference between the 25th and 75th percentile forecasts.\"},{\"id\":\"3fae1832-ff19-4949-8b6b-46492cd511ff\",\"content\":\"The outlier respondent assigns a low probability to the question (5%), but updates substantially (relative risk = 3x), while on average respondents rated the question as having moderate probability (mean=37%) and a moderate relative risk (mean=1.9x).\"},{\"content\":\"Proxy ratings for 2030 questions showed strong negative correlation with POM VOI judgments from the small sample of experts in the main survey. They also showed slight negative correlation with the main survey POM-z. Notably, a question randomly chosen from the bottom half of proxy scores ranked second by expert POM (<a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=66\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">EX30<\/a>). 
This suggests that many questions from our larger 2030 set might have performed better than the average question in our main question-rating survey if presented to these particular experts.\",\"id\":\"ba0ea008-1455-4f63-a334-49975b362690\"},{\"id\":\"d73bb912-46b5-4ac7-8c40-901425b9b182\",\"content\":\"The 2050\/2070 proxy performed moderately well for our small expert sample, with a correlation between mean expert POM and proxy rank of -0.4, and mean expert POM-z score and proxy rank of -0.5 (a more negative value indicates a stronger correlation, as higher rank orders are considered worse, while higher VOI scores are better).\"},{\"id\":\"10c48bb4-79b0-42e6-a192-9236ceca0466\",\"content\":\"For example, the top five questions on Metaculus at the time of this writing (July 25, 2024), are \u201cWho will be elected US president in 2024?\u201d; \u201cFive years after AGI, will an AI company be a military power?\u201d; \u201cFive years after AGI, if there are digital people, what will be their population?\u201d; \u201cWho will be the Democratic nominee for Vice President on Election Day 2024 (if Joe Biden is no longer the nominee for President)?\u201d and \u201cWhen will an AI win a Gold Medal in the International Math Olympiad?\u201d Of those, only \u201cWhen will an AI win a Gold Medal in the International Math Olympiad?\u201d seems to be interesting primarily because it is an indicator about a more important question.\"},{\"id\":\"43cff954-4b3f-4059-914a-b20d2877bc75\",\"content\":\"Out of the 265 questions in our status quo set, 253 of them (~95%) came from just two platforms: Metaculus and Manifold Markets. We included in our set all questions resolving no earlier than 2027, and which were tagged \u201cAI,\u201d \u201cartificial intelligence,\u201d \u201cmachine learning,\u201d or similar. 
Because Manifold Markets had a very large overall volume of questions, and because many questions with little engagement on this platform were duplicates of other questions, or otherwise low-quality, we only included Manifold Markets questions which had at least 50 traders at the time of collection.\"},{\"id\":\"3b6caa56-c3da-44a1-890c-adf30fabc3d0\",\"content\":\"Questions can fall into multiple categories.\"},{\"content\":\"Example questions in each category (many questions fall into multiple categories): <br><strong>Acceleration:<\/strong> Deep learning revenue (<a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=72\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">VL30<\/a>)\u2014Revenue from deep learning doubles every two years before 2030. <br><strong>Social \/ Political \/ Economic:<\/strong> AI Socializing (<a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=85\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=85\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">MQ70<\/a>)\u2014humans talk to AIs more than to humans by 2070. 
<br><strong>Alignment:<\/strong> No interpretability progress (<a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=77\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">ZA40c<\/a>)\u2014by 2040, there are no interpretability tools which allow us to understand the function of state-of-the-art transformer component parts\/circuits.<br><strong>AI harms:<\/strong> Repeated AI harms (<a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=76\\\" type=\\\"link\\\" id=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=76\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">HS40<\/a>)\u2014by 2040, there are at least two events in a five-year period in which an AI system used by a major company causes at least 1,000 deaths or damage of $10B.\",\"id\":\"22b58859-8e48-4a06-b61f-93aa97393620\"},{\"id\":\"e92b2983-5493-48ad-82e6-56c5268bd965\",\"content\":\"We chose the \u201cAlignment\u201d category because it was much more prevalent in the AICT set than in the status quo set, suggesting that the questions in that category may be unique in interesting ways.\"},{\"id\":\"452e355e-3bc4-4c32-8089-684d98657bdd\",\"content\":\"The 25 questions in the AICT set were divided into different themes for analysis of uniqueness, some of which overlapped.\"},{\"content\":\"The theme \u201cpower-seeking\u201d covers questions about AI models developing power-seeking or deceptive behavior; the theme \u201cdeveloper perception\u201d covers questions about AI developers\u2019 perception of alignment work. 
See <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=97\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Appendix 3.2<\/a> for additional information about categorization into themes.\",\"id\":\"2c5fd0c1-116f-414b-9616-a83afcb3710b\"},{\"content\":\"Because AICT questions are often complex or technical, we suspect they may be less fun to forecast and therefore attract fewer participants, though this is untested. As an inexpensive experiment, we are posting these questions to two forecasting platforms to see whether they get engagement. We encourage readers to see <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=113\\\" id=\\\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2024\/08\/ai-conditional-trees.pdf#page=113\\\">Appendix 7<\/a> for further details on how you can submit your own forecasts on these questions.\",\"id\":\"7b29920b-e450-4202-9a40-834e6210314d\"},{\"id\":\"78d4749e-b022-4635-bbf2-2efc3c10648a\",\"content\":\"Questions relating to concrete harms also featured in all three interviews with superforecasters, though this very small sample size makes it difficult to draw any conclusions about superforecasters\u2019 concerns in general.\"},{\"content\":\"The careful reader will notice that the values in this column don\u2019t match those found in Tables <a href=\\\"#tab-3-1-2\\\" id=\\\"#tab-3-1-2\\\">3.1.2<\/a>, <a href=\\\"#tab-3-3-1\\\">3.3.1<\/a> and <a href=\\\"#tab-3-3-2\\\" id=\\\"#tab-3-3-6\\\">3.3.2<\/a>. 
This is because the two additional questions (<a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=67\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">HB30<\/a> and <a href=\\\"https:\/\/forecastingresearch.org\/pdf\/ai-conditional-trees.pdf#page=78\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">CX50<\/a>) forecasted on by superforecasters are not included in the calculation of z-scores here.\",\"id\":\"5f7eb7bd-e9af-4559-9f51-64107e91deb5\"},{\"content\":\"For example, in the Good Judgment Inc. project that compared superforecasters to other participants in an online forecasting competition, the average question was open for 214 days, with the entire tournament taking place over six years. Christopher W. Karvetski, \u201cSuperforecasters: A Decade of Stochastic Dominance,\u201d technical white paper (2021): 2, <a href=\\\"https:\/\/goodjudgment.com\/wp-content\/uploads\/2021\/10\/Superforecasters-A-Decade-of-Stochastic-Dominance.pdf\\\">https:\/\/goodjudgment.com\/wp-content\/uploads\/2021\/10\/Superforecasters-A-Decade-of-Stochastic-Dominance.pdf<\/a>. In addition to extensive research on shorter-term forecasts, Tetlock et al. found that, at least on some types of questions, experts are more accurate than simple base rate extrapolation over 25 year horizons, although they are much less accurate than they were over 0-2 years. Our research asks forecasters to consider forecasts over many decades, and we do not yet know how much accuracy declines over that much longer period. Philip E. 
Tetlock et al., \u201c<a href=\\\"https:\/\/forecastingresearch.org\/research\/longrange-subjective-probability-forecasts-of-slowmotion-variables-in-world-politics\\\" type=\\\"research\\\" id=\\\"163\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Long-Range Subjective-Probability Forecasts of Slow-Motion Variables in World Politics: Exploring Limits on Expert Judgment<\/a>,\u201d <em>Futures &amp; Foresight Science<\/em> (2023), 33.\",\"id\":\"2bd98e8d-bd99-48de-a9f3-6288ead09b62\"}]"},"research_type":[4],"class_list":["post-1242","research","type-research","status-publish","has-post-thumbnail","hentry","research_type-working-paper"],"acf":[],"yoast_head":"<title>Conditional Trees: A Method for Generating Informative Questions about Complex Topics &#8211; Forecasting Research Institute<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Conditional Trees: A Method for Generating Informative Questions about Complex Topics &#8211; Forecasting Research Institute\" \/>\n<meta property=\"og:description\" content=\"In this study, we used structured interviews with domain experts and superforecasters to generate questions that provide high &quot;value of information&quot; regarding a far-future outcome.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees\" \/>\n<meta property=\"og:site_name\" content=\"Forecasting Research Institute\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-05T14:27:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/illustration_Midjourney_AI-Conditional-Trees.jpg\" \/>\n\t<meta 
property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1607\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees\",\"name\":\"Conditional Trees: A Method for Generating Informative Questions about Complex Topics &#8211; Forecasting Research Institute\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2026\\\/04\\\/illustration_Midjourney_AI-Conditional-Trees.jpg\",\"datePublished\":\"2024-08-12T12:00:00+00:00\",\"dateModified\":\"2026-05-05T14:27:24+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees#primaryimage\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2026\\\/04\\\/illustration_Midjourney_AI-Conditional-Trees.jpg\",\"contentUrl\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2026\\\/04\\\/illustration_Midjourney_AI-Conditional-Trees.jpg\",\"width\":2560,\"height\":1607
},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/ai-conditional-trees#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/forecastingresearch.org\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Conditional Trees: A Method for Generating Informative Questions about Complex Topics\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/#website\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/\",\"name\":\"Forecasting Research Institute\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/forecastingresearch.org\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>","yoast_head_json":{"title":"Conditional Trees: A Method for Generating Informative Questions about Complex Topics &#8211; Forecasting Research Institute","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees","og_locale":"en_US","og_type":"article","og_title":"Conditional Trees: A Method for Generating Informative Questions about Complex Topics &#8211; Forecasting Research Institute","og_description":"In this study, we used structured interviews with domain experts and superforecasters to generate questions that provide high \"value of information\" regarding a far-future outcome.","og_url":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees","og_site_name":"Forecasting Research 
Institute","article_modified_time":"2026-05-05T14:27:24+00:00","og_image":[{"width":2560,"height":1607,"url":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/illustration_Midjourney_AI-Conditional-Trees.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees","url":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees","name":"Conditional Trees: A Method for Generating Informative Questions about Complex Topics &#8211; Forecasting Research Institute","isPartOf":{"@id":"https:\/\/forecastingresearch.org\/#website"},"primaryImageOfPage":{"@id":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees#primaryimage"},"image":{"@id":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees#primaryimage"},"thumbnailUrl":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/illustration_Midjourney_AI-Conditional-Trees.jpg","datePublished":"2024-08-12T12:00:00+00:00","dateModified":"2026-05-05T14:27:24+00:00","breadcrumb":{"@id":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/forecastingresearch.org\/research\/ai-conditional-trees"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees#primaryimage","url":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/illustration_Midjourney_AI-Conditional-Trees.jpg","contentUrl":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/illustration_Midjourney_AI-Conditional-Trees.jpg","width":2560,"height":1607},{"@type":"BreadcrumbList","@id":"https:\/\/forecastingresearch.org\/research\/ai-conditional-trees#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/forecastingresear
ch.org\/"},{"@type":"ListItem","position":2,"name":"Conditional Trees: A Method for Generating Informative Questions about Complex Topics"}]},{"@type":"WebSite","@id":"https:\/\/forecastingresearch.org\/#website","url":"https:\/\/forecastingresearch.org\/","name":"Forecasting Research Institute","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/forecastingresearch.org\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/1242","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research"}],"about":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/types\/research"}],"version-history":[{"count":79,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/1242\/revisions"}],"predecessor-version":[{"id":2187,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/1242\/revisions\/2187"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/media\/1243"}],"wp:attachment":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/media?parent=1242"}],"wp:term":[{"taxonomy":"research_type","embeddable":true,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research_type?post=1242"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}