{"id":767,"date":"2025-07-01T12:00:00","date_gmt":"2025-07-01T12:00:00","guid":{"rendered":"https:\/\/forecastingresearch.org\/?post_type=research&#038;p=767"},"modified":"2026-05-11T16:42:50","modified_gmt":"2026-05-11T16:42:50","slug":"llm-enabled-biorisk","status":"publish","type":"research","link":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk","title":{"rendered":"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"abstract\">Abstract<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Capabilities of large language models (LLMs) on several biological\nbenchmarks have prompted excitement about their usefulness for\nbeneficial research, but also concern about potential biosecurity risks.\nWe recruited 46 subject-matter experts in biology and biosecurity, and\n22 generalist forecasters to estimate the risks of growing LLM\ncapabilities. The median expert predicted a 0.3% baseline annual risk of\na human-caused epidemic that causes 100,000 deaths. This estimate then\nrose to 1.5% conditional on several hypothetical LLM capabilities,\nincluding matching the performance of a top-performing team of\nvirologists on a virology troubleshooting test. Given this finding, we\nconducted a baselining study and found that LLMs have already crossed\nthis performance threshold. The median respondent thought that this\nwould not happen until after 2030. 
More encouragingly, experts reduced\ntheir risk forecast close to baseline (0.4%) conditional on the adoption\nof LLM safeguards and mandatory nucleic acid screening.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-left is-layout-flex wp-container-core-buttons-is-layout-61db0649 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-fill\"><a class=\"btn orange\" href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">View the full PDF report <svg width=\"7\" height=\"9\" viewBox=\"0 0 7 9\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.000156283 8.60806L4.22416 4.33606V4.24006L0.000156283 6.10352e-05H1.80816L6.06416 4.28806L1.80816 8.60806H0.000156283Z\" fill=\"#102B23\"\/>\n<\/svg>\n<svg width=\"8\" height=\"10\" viewBox=\"0 0 8 10\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.601719 8.85794L4.82572 4.58594V4.48994L0.601719 0.249939H2.40972L6.66572 4.53794L2.40972 8.85794H0.601719Z\" fill=\"#102B23\"\/>\n<\/svg><\/a><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>Acknowledgments<\/summary>\n<p class=\"wp-block-paragraph\">This research would not have been possible without the support of Open Philanthropy. We greatly appreciate the assistance of Dan Mayland, Holden Karnofsky, Victoria Schmidt, Rory Svarc, Tessa Alexanian, Kayla Gamin, and Nadja Flechner throughout the project, and others who gave feedback on earlier drafts of this paper. 
Lastly, we extend our gratitude to our research participants for their invaluable contributions.<\/p>\n<\/details>\n\n\n\n<details class=\"wp-block-details is-layout-flow wp-block-details-is-layout-flow\"><summary>Disclaimers<\/summary>\n<p class=\"wp-block-paragraph\">The views expressed in this paper do not necessarily reflect those of the Federal Reserve Bank of Chicago or the Federal Reserve System.<\/p>\n<\/details>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"main\">Main<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Large language models (LLMs) have recently shown strong improvements in biological capabilities and now outperform PhD-level experts on a variety of biology benchmarks.<sup data-fn=\"cc8cf5c1-7adc-4539-8bf1-a46acabc2c04\" class=\"fn\"><a href=\"#cc8cf5c1-7adc-4539-8bf1-a46acabc2c04\" id=\"cc8cf5c1-7adc-4539-8bf1-a46acabc2c04-link\">1<\/a><\/sup> Similarly, LLMs have shown early promise in providing scientific tutoring<sup data-fn=\"a4175eec-98fe-4a65-ad16-96b44d7a0260\" class=\"fn\"><a href=\"#a4175eec-98fe-4a65-ad16-96b44d7a0260\" id=\"a4175eec-98fe-4a65-ad16-96b44d7a0260-link\">2<\/a><\/sup> and assisting with the conduct of scientific research.<sup data-fn=\"880d53e1-1040-4f1c-b505-9ac79daa2454\" class=\"fn\"><a href=\"#880d53e1-1040-4f1c-b505-9ac79daa2454\" id=\"880d53e1-1040-4f1c-b505-9ac79daa2454-link\">3<\/a><\/sup> While there are still clear limitations to how useful LLMs can be in science,<sup data-fn=\"a06a3646-f62a-464b-aec7-4a720dd7167c\" class=\"fn\"><a href=\"#a06a3646-f62a-464b-aec7-4a720dd7167c\" id=\"a06a3646-f62a-464b-aec7-4a720dd7167c-link\">4<\/a><\/sup> there is a clear trend that new models are more capable than their predecessors.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Numerous observers\u2014including leaders of frontier AI companies\u2014recognize both the benefits and risks that such capabilities could bring in the near future.<sup data-fn=\"0442a866-a327-4cd7-aa62-606408cb9ad3\" class=\"fn\"><a 
href=\"#0442a866-a327-4cd7-aa62-606408cb9ad3\" id=\"0442a866-a327-4cd7-aa62-606408cb9ad3-link\">5<\/a><\/sup> OpenAI, Google DeepMind, and Anthropic have all released policies to prevent the misuse of LLMs in biology and run capability evaluations on new models ahead of their commercial deployment.<sup data-fn=\"f403dcd0-9794-40af-8560-184dcaeef4a9\" class=\"fn\"><a href=\"#f403dcd0-9794-40af-8560-184dcaeef4a9\" id=\"f403dcd0-9794-40af-8560-184dcaeef4a9-link\">6<\/a><\/sup> Recently, Anthropic announced that it provisionally implemented a stronger security standard for its latest model release, since it could not rule out that the model might significantly assist with CBRN-weapons-related tasks of concern.<sup data-fn=\"e48f546e-9258-4240-a4c0-8698307f1550\" class=\"fn\"><a href=\"#e48f546e-9258-4240-a4c0-8698307f1550\" id=\"e48f546e-9258-4240-a4c0-8698307f1550-link\">7<\/a><\/sup> OpenAI has announced that it is preparing similar mitigations.<sup data-fn=\"9894317f-d068-4c25-ba29-9c83d0d65094\" class=\"fn\"><a href=\"#9894317f-d068-4c25-ba29-9c83d0d65094\" id=\"9894317f-d068-4c25-ba29-9c83d0d65094-link\">8<\/a><\/sup> However, it is still unclear which empirical evaluation results would indicate that LLMs present a meaningful increase in risk.<sup data-fn=\"ca3afd3a-bfdb-4747-9b60-de3ca69c3f84\" class=\"fn\"><a href=\"#ca3afd3a-bfdb-4747-9b60-de3ca69c3f84\" id=\"ca3afd3a-bfdb-4747-9b60-de3ca69c3f84-link\">9<\/a><\/sup> It is also unclear what sorts of mitigation measures would then be most helpful in reducing such risk while preserving the power of the models to advance scientific work.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Forecasting the probability of biological threats is challenging due to the rarity of such events and the complex interplay of technical, social, and political factors involved.<sup data-fn=\"e05cf7f2-dd1b-41eb-ae6d-bf59dda468aa\" class=\"fn\"><a href=\"#e05cf7f2-dd1b-41eb-ae6d-bf59dda468aa\" 
id=\"e05cf7f2-dd1b-41eb-ae6d-bf59dda468aa-link\">10<\/a><\/sup> Previous surveys have found a wide range of views. In a 2005 study of 83 nonproliferation and national security experts, the median respondent gave a 10% probability of a major biological weapons attack within 5 years, with individual responses ranging from 0% to more than 80%.<sup data-fn=\"5e6558ef-53f7-44b2-8fa9-dfbd2976c852\" class=\"fn\"><a href=\"#5e6558ef-53f7-44b2-8fa9-dfbd2976c852\" id=\"5e6558ef-53f7-44b2-8fa9-dfbd2976c852-link\">11<\/a><\/sup> A 2009 survey of biological scientists found a mean forecast of roughly 50% probability of a bioterrorist attack within 5 years.<sup data-fn=\"d83d73bd-1cf8-47e9-a2d8-580b9c8b26e5\" class=\"fn\"><a href=\"#d83d73bd-1cf8-47e9-a2d8-580b9c8b26e5\" id=\"d83d73bd-1cf8-47e9-a2d8-580b9c8b26e5-link\">12<\/a><\/sup> Similarly, a 2015 survey of relevant experts found a mean forecast of roughly 50% probability of a biological weapons attack causing more than 100 cases of illness within 10 years.<sup data-fn=\"9683b28a-3431-48ef-90e4-3116dffacb96\" class=\"fn\"><a href=\"#9683b28a-3431-48ef-90e4-3116dffacb96\" id=\"9683b28a-3431-48ef-90e4-3116dffacb96-link\">13<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Research in forecasting and expert elicitation has found that expert predictions can be made more accurate through careful question design and aggregation of responses.<sup data-fn=\"9daf2aa3-4cd9-453a-8a43-e6d041fe99b7\" class=\"fn\"><a href=\"#9daf2aa3-4cd9-453a-8a43-e6d041fe99b7\" id=\"9daf2aa3-4cd9-453a-8a43-e6d041fe99b7-link\">14<\/a><\/sup> Therefore, we designed an exercise to elicit opinion from a large and varied group of subject-matter experts and top-performing generalist forecasters with the aim of leveraging judgmental forecasting techniques to i) assess views on biological risks from rapidly improving LLMs and ii) understand the degree to which these views track short-term advances in LLM capabilities.<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">In our study, between December 2024 and February 2025, participants completed a survey that asked them to forecast the risk of a large-scale pathogen outbreak arising from human-caused accidents or misuse in 2028, and then to say how their forecasts would change conditional on several hypothetical LLM capabilities and mitigation measures. The capabilities scenarios refer to hypothetical studies conducted in the first quarter of 2026. We chose to ask about the annual risk in the year 2028 to allow for a lag between dangerous capabilities emerging and these resulting in harmful outcomes\u2014such as due to a lag in the adoption of technology. The scenarios of future LLM capabilities were chosen to cover a range of pathways by which experts have suggested LLMs could plausibly facilitate the development of bioweapons; these scenarios also closely correspond to dangerous capability evaluations and biological benchmarks in the existing literature. Those evaluations measure the ability of an LLM to assist in biological weapon ideation, pathogen development, attack planning, and evading existing biosecurity safeguards.<sup data-fn=\"b7cfd9c7-06e8-4f46-934f-c2f64b21d3e2\" class=\"fn\"><a href=\"#b7cfd9c7-06e8-4f46-934f-c2f64b21d3e2\" id=\"b7cfd9c7-06e8-4f46-934f-c2f64b21d3e2-link\">15<\/a><\/sup> We then directly measured whether one of these scenarios had been met: whether LLMs match the top performance of five teams of human experts on a virology troubleshooting questionnaire.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"results\">Results<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A total of 46 people with expertise in biosecurity and\/or wet lab biology (henceforth \u201cexperts\u201d) and 22 top-performing generalist forecasters (\u201csuperforecasters\u201d) completed the survey. 
Of the experts, 27 (59%) reported expertise in both biosecurity and wet lab biology research, while the remainder reported expertise in just one of the two domains (24% biosecurity-only; 17% wet lab biology only). The expert group&#8217;s median number of years of experience was seven years for biosecurity work and eight years for wet lab research. Most experts had a doctorate (78%). The most common area of study for experts was a subfield of biology (46%) or medicine (26%). We used a diversified sampling strategy to identify participants. This included faculty of top-ranked molecular biology labs, members of the Engineering Biology Research Consortium, attendees of major AI-biosecurity workshops, researchers at biosecurity-focused think tanks, and additional researchers identified via Google Scholar search. The full sampling strategy and more details of participants are provided in the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" id=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"forecasts-of-baseline-risk\">Forecasts of baseline risk<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Participants were asked to answer the following question, detailed resolution criteria for which are available in the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><em>What is the likelihood that a human-caused release of a pathogen\noccurs in 2028, and leads to at least 100,000 deaths in excess mortality\nor $1 trillion in damage within 3 years?<\/em><\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Figure 1 reports the participants&#8217; baseline risk 
responses. The\nmedian expert\u2019s response was 0.3% annual probability of such a\ncatastrophe (interquartile range, IQR 0.01\u20132%). Superforecasters had a\nsimilar median of 0.38% (IQR 0.1\u20131.21%). There was considerable\nvariation in responses, with forecasts spanning several orders of\nmagnitude.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Some of the heterogeneity in responses might be explained by participants\u2019 accuracy in their ability to assign numbers to low-probability events. To test this, we looked at three measures of participants\u2019 forecasting accuracy: the ability to assess the frequency of ten other low-probability events (e.g., the probability that a randomly chosen person in the U.S. is a neurosurgeon), the ability to correctly predict recent progress on LLM benchmarks, and the ability to predict the views of other survey respondents (a measure that has previously been correlated with forecasting accuracy in other domains).<sup data-fn=\"6309c316-a2fa-4d42-befe-f6b215bbb8c6\" class=\"fn\"><a href=\"#6309c316-a2fa-4d42-befe-f6b215bbb8c6\" id=\"6309c316-a2fa-4d42-befe-f6b215bbb8c6-link\">16<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For each of these measures we split the participants into two groups:\na higher-performing group composed of the top-scoring half and a\nlower-performing group composed of the bottom-scoring half. On each\naccuracy measure, the higher-performing group generally had higher\nbaseline risk forecasts, and this was statistically significant for two\nof the three measures. Participants who better predicted other\nparticipants\u2019 views forecasted a considerably higher median probability\nof a human-caused pandemic than those who were less accurate on this\nmeasure (0.93% vs 0.08%, p=0.04). We also asked participants to forecast\nwhether LLMs would have several specific capabilities by 2026. Some of\nthese capabilities have since arisen and so we could resolve these\nforecasts. 
Participants who more accurately predicted whether LLMs would\nhave these capabilities by 2026 also gave higher forecasts of baseline\nrisk relative to those who were less accurate on this task (1.1% vs\n0.1%, p=0.02).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"fig.-1-participants-and-baseline-forecasts\">Fig. 1: Participants and baseline forecasts<\/h4>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2048\" height=\"2005\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01.png\" alt=\"\" class=\"wp-image-791\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01.png 2048w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01-350x343.png 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01-700x685.png 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01-768x752.png 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01-1536x1504.png 1536w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01-2000x1958.png 2000w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01-1200x1175.png 1200w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-01-150x147.png 150w\" sizes=\"auto, (max-width: 2048px) 100vw, 2048px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 1:<\/strong> Forecasts of the probability of a human-caused epidemic in 2028, which, within a 3-year period causes more than 100,000 deaths and\/or more than $1 trillion in damages, disaggregated by 
participant characteristics. Black dots indicate group medians and black line segments indicate the bootstrapped 95% confidence intervals around the medians. Individual forecasts are shown as points and color-coded to identify their provenance from the superforecaster or expert group. The x-axis uses a logarithmic scale to make it easier to see variation in forecasts in the 0\u201310% range. Very few participants gave forecasts of 0%. Most points that appear on the 0% line represent very small, non-zero forecasts.<\/figcaption><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">Most participants considered several factors in their forecast rationale, including the historical base rates of analogous events (which some participants thought should be zero while others pointed to the 1977 H1N1 Russian flu outbreak<sup data-fn=\"f1da717a-14c6-46da-a554-95b01987a117\" class=\"fn\"><a href=\"#f1da717a-14c6-46da-a554-95b01987a117\" id=\"f1da717a-14c6-46da-a554-95b01987a117-link\">17<\/a><\/sup> as a potential human-caused outbreak), the relative probabilities of accidental versus intentional releases, the number and location of BSL3 and BSL4 labs, the potential for AI systems to increase biorisk, the motivation of potential actors involved and possible changes if major global conflicts were to increase, and academic studies that attempt to model potential future pandemics. Examples of forecast rationales are provided in the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"change-in-risk-conditional-on-llm-capabilities\">Change in risk\nconditional on LLM capabilities<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Next, we studied whether participants would increase their baseline\nestimate of biorisk if leading LLMs were to exhibit large and measurable\nincreases in biological capabilities. 
We asked participants how they\nwould change their predictions if LLM evaluations conducted in the\nfirst quarter of 2026 found specific empirical\nresults. The scenarios referred to performance on five different\nevaluations: two of these measure an LLM\u2019s performance relative to\nexperts on knowledge relevant to biorisks (i.e. benchmarks), and three\nof them measure an LLM\u2019s ability to enable human actors to succeed at\nrelevant tasks (i.e. human uplift).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These scenarios were based on existing LLM biology capability evaluations or other possible evaluations discussed in the biosecurity literature. The knowledge evaluation scenarios involved the Virology Capabilities Test (VCT)<sup data-fn=\"71537570-01d4-456a-80f8-cb1bdbd6ec20\" class=\"fn\"><a href=\"#71537570-01d4-456a-80f8-cb1bdbd6ec20\" id=\"71537570-01d4-456a-80f8-cb1bdbd6ec20-link\">18<\/a><\/sup> as well as a long-form biorisk questions test conducted by OpenAI.<sup data-fn=\"1c8d540b-3d88-457a-9f79-b25cc3d50f6c\" class=\"fn\"><a href=\"#1c8d540b-3d88-457a-9f79-b25cc3d50f6c\" id=\"1c8d540b-3d88-457a-9f79-b25cc3d50f6c-link\">19<\/a><\/sup> The human uplift scenarios included a study, first performed and evaluated by RAND in 2023, that assesses an LLM\u2019s ability to help humans plan bioweapons attacks,<sup data-fn=\"2a265a93-fd99-449c-a08a-a406be65b52a\" class=\"fn\"><a href=\"#2a265a93-fd99-449c-a08a-a406be65b52a\" id=\"2a265a93-fd99-449c-a08a-a406be65b52a-link\">20<\/a><\/sup> and two other hypothetical studies inspired by discussion in the biosecurity literature: a study assessing an LLM\u2019s ability to help novices acquire synthetic DNA fragments from the 1918 pandemic influenza virus,<sup data-fn=\"a0fd8705-7557-4adb-94bc-0f2e9f71c36a\" class=\"fn\"><a href=\"#a0fd8705-7557-4adb-94bc-0f2e9f71c36a\" id=\"a0fd8705-7557-4adb-94bc-0f2e9f71c36a-link\">21<\/a><\/sup> and a study evaluating an LLM\u2019s ability to assist with 
laboratory tasks (expanding on plans announced by OpenAI with the Los Alamos National Laboratory).<sup data-fn=\"8d86ae6f-dd09-4e2f-ad43-fbd9b6d52917\" class=\"fn\"><a href=\"#8d86ae6f-dd09-4e2f-ad43-fbd9b6d52917\" id=\"8d86ae6f-dd09-4e2f-ad43-fbd9b6d52917-link\">22<\/a><\/sup> Figure 2 summarizes these scenarios (see the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a> for more detailed descriptions of the scenarios).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"fig.-2-effects-of-hypothetical-evaluation-results-on-forecasts\">Fig. 2: Effects of hypothetical evaluation results on forecasts<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"843\" height=\"868\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-02_v2.png\" alt=\"\" class=\"wp-image-2136\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-02_v2.png 843w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-02_v2-350x360.png 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-02_v2-700x721.png 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-02_v2-768x791.png 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-02_v2-150x154.png 150w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 2:<\/strong> Forecasts of the probability of a human-caused epidemic in 2028 that within a 3-year period causes more than 100,000 deaths and\/or more than $1 trillion in damages: unconditional (baseline) and conditional on the hypothetical 
evaluation results. Black dots indicate group medians and black line segments indicate the bootstrapped 95% confidence intervals around the medians. Individual forecasts are shown as points. The forecasts for each set of questions related to an evaluation include only the subset of the sample who gave consistent forecasts across that set. The median baseline forecast for this subset of participants is shown in gray and is sometimes different from the overall group median baseline shown in Figure 1. (See the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" type=\"link\" id=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a> for more details.) The x-axis uses a logarithmic scale to make it easier to see variation in forecasts in the 0\u201310% range.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">For experts, the largest increases in estimated risk were from two\nconditions: a randomized controlled trial finding that LLMs enable half\nof non-experts to successfully synthesize an influenza virus in a wet\nlab setting, and LLMs matching the top-performing team of expert\nvirologists on a virology troubleshooting questionnaire. Conditional on\nthese capabilities emerging, the median expert forecast of the annual\nrisk increased to 1.25% and 1.5% respectively, which are significant\nchanges from the baseline (Wilcoxon p &lt; 0.0001 for both). The median\nsuperforecaster also increased their risk estimate significantly for the\nwet lab study threshold to 1.5%\u2014but less so for the virology\ntroubleshooting to 0.7% (Wilcoxon p &lt; 0.0001 for both).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When two or more capabilities were considered together, increases in\nrisk were greater still. 
If a 10% success rate in non-experts\u2019 pathogen\nsynthesis, a significant uplift in bioweapons attack planning ability,\nand acquiring dual-use DNA were considered together, risk estimates\nincreased by more than their respective marginal risk estimates\ncombined. The median expert\u2019s annual risk forecast increased to 2.3%\nconditional on these capabilities emerging, which was also a\nstatistically significant increase from baseline (Wilcoxon p &lt;\n0.0001).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"timeline-of-advances-in-llm-capabilities\">Timeline of advances\nin LLM capabilities<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We next gauged the views of the participants about the probability of\nobserving, in 2026, evaluation results that matched the hypothetical\nscenarios. Further, for a subset of scenarios, we asked when\nparticipants thought the corresponding thresholds would be achieved, if\never. Again, there was a divergence of views. However, many participants\ndidn\u2019t expect any of the specified scenarios would be achieved in 2026\n(median expert probabilities ranged from 0.1% to 42.5% across the\nscenarios). When asked when each of a subset of the scenarios\u2019\nthresholds would be crossed, most respondents suggested they were\ninstead more likely to occur between 2030 and 2045 (see Figure 3 below).\nOnly a small number of respondents\u2014between three and five experts and at\nmost one superforecaster\u2014thought that any of the thresholds would\n<em>not<\/em> be achieved before 2100.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"fig.-3-the-timing-of-evaluation-results-being-achieved\">Fig. 
3: The timing of evaluation results being achieved<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1084\" height=\"716\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03a.jpg\" alt=\"\" class=\"wp-image-2137\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03a.jpg 1084w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03a-350x231.jpg 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03a-700x462.jpg 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03a-768x507.jpg 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03a-150x99.jpg 150w\" sizes=\"auto, (max-width: 1084px) 100vw, 1084px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 3a:<\/strong> Forecasts of the median year of evaluation results being achieved, assuming the evaluations were to be run each year. 
Group median forecast is shown in text.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1080\" height=\"797\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03b.jpg\" alt=\"\" class=\"wp-image-2138\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03b.jpg 1080w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03b-350x258.jpg 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03b-700x517.jpg 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03b-768x567.jpg 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-03b-150x111.jpg 150w\" sizes=\"auto, (max-width: 1080px) 100vw, 1080px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 3b:<\/strong> Forecasts of the probability of the evaluation result being achieved assuming the study is run in the first quarter of 2026. 
In both panels, the black dot indicates the group median and the black line indicates the 95% CI for the group median.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">However, after participants completed this forecasting survey (between November 2024 and February 2025) but before the publication of the present article describing its results, a paper was released in April 2025 showing that several LLMs already outperform the median expert virologist on the VCT benchmark.<sup data-fn=\"70cfe8a0-f74e-40eb-91ea-a143d229296e\" class=\"fn\"><a href=\"#70cfe8a0-f74e-40eb-91ea-a143d229296e\" id=\"70cfe8a0-f74e-40eb-91ea-a143d229296e-link\">23<\/a><\/sup> Therefore, one of the hypothetical scenarios of LLM performance in the forecasting survey had already come to pass.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The forecasting survey also included a more extreme scenario: the most performant LLM matching the performance of the top team out of five teams of expert virologists on the VCT. To evaluate whether this scenario had also been achieved, we conducted a team baselining study. The results of the team baselining study show that OpenAI\u2019s o3 model performs comparably to the top team out of five teams of expert virologists. (The details of the \u2018top out of five teams of expert virologists answering VCT questions\u2019 scenario, as described in the forecasting survey, were very similar, but not identical, to the team baselining procedure we carried out; see Methods for details.) The median expert in the forecasting study thought this was 14% likely to occur by 2026 and that the most likely date for it to occur was 2030. For superforecasters, the numbers were 2% and 2034, respectively. Claude 4 Opus, released in May 2025, performs notably worse than all other AI models as it refuses to answer many of the VCT questions. 
This may be a result of the additional security measures implemented by Anthropic at the launch of this model.<sup data-fn=\"7d9500e9-3a26-4a7f-9330-c719323e28da\" class=\"fn\"><a href=\"#7d9500e9-3a26-4a7f-9330-c719323e28da\" id=\"7d9500e9-3a26-4a7f-9330-c719323e28da-link\">24<\/a><\/sup><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"fig.-4-llm-and-virologist-team-performance-on-the-virology-capabilities-test\">Fig. 4: LLM and virologist team performance on the Virology Capabilities Test<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1950\" height=\"1200\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04.png\" alt=\"\" class=\"wp-image-794\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04.png 1950w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04-350x215.png 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04-700x431.png 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04-768x473.png 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04-1536x945.png 1536w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04-1200x738.png 1200w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/03\/paper_2025-07-01_ai-enabled-biorisk_fig-04-150x92.png 150w\" sizes=\"auto, (max-width: 1950px) 100vw, 1950px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 4:<\/strong> Performance of LLMs, and five teams of virologists on the VCT. For reference the score achieved by random guessing and the score achieved by the median individual expert in G\u00f6tting et al. 
(2025) are also shown. Refusal to answer a question is counted as 3+ errors in response.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">It is likely that the long-form biorisk capability scenario has also been achieved. In this scenario, 90% of LLM responses to long-form biorisk questions are assessed as being preferable to answers provided by human experts. Responses would be scored on several dimensions: accuracy, clarity, and feasibility. The relevant benchmark is run in-house by OpenAI. Their previous o1 pre-mitigation model scored 75% in December 2024. Their newer o3 model in April 2025 markedly outperforms o1 across test indicators but the specific \u2018expert human preference win-rate\u2019 metric we use for our scenario was not reported.<sup data-fn=\"a21f9ce5-e570-4f2d-a0cb-b1a805a65644\" class=\"fn\"><a href=\"#a21f9ce5-e570-4f2d-a0cb-b1a805a65644\" id=\"a21f9ce5-e570-4f2d-a0cb-b1a805a65644-link\">25<\/a><\/sup> Fitting the available data to an exponential curve suggests a 60% chance that the true preference rate already exceeds the 90% threshold specified in the scenario (see the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a> for more details). The median expert thought this threshold\u2014LLM responses preferred over expert responses 90% of the time\u2014was most likely to occur in 2030, and assigned a 10% probability to it being achieved by 2026.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"the-impact-of-mitigation-measures\">The impact of mitigation\nmeasures<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, we asked participants to state how their forecasts would change, conditional on several mitigation measures also being in place in addition to some of the LLM scenarios. 
These measures addressed two key pathways for risk mitigation that have been suggested in the literature: AI model safeguards, and screening customers and orders of synthetic nucleic acids. These measures were chosen based on a review of published recommendations for reducing the biosecurity risks of LLMs.<sup data-fn=\"caac5d2d-3c79-4e42-939d-00fd006b1c69\" class=\"fn\"><a href=\"#caac5d2d-3c79-4e42-939d-00fd006b1c69\" id=\"caac5d2d-3c79-4e42-939d-00fd006b1c69-link\">26<\/a><\/sup> In total, we asked participants to consider six mitigation scenarios, which varied in terms of i) whether or not synthetic nucleic acid providers were required to conduct screening and ii) the types of AI model safeguards in place.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For synthetic nucleic acid screening, the baseline scenario involved\nproviders in the US, China, the EU, and the UK being encouraged\u2014but not\nlegally required\u2014to screen customers and orders against a regulated\nsequence list. In the stricter scenario, providers in these countries\nwere legally required to conduct such screening and verification.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For the AI model safeguards, there were three aspects to the scenarios: i) whether the models were open-weight or proprietary, ii) if models were proprietary, whether there were standard or stricter measures\u2014including red-teaming exercises, bug bounty programs and rapid response teams\u2014to prevent model \u201cjailbreaking\u201d (i.e., subverting the safeguards that prevent models from giving out potentially dangerous information) and iii) whether there was a structured access program to limit the use of LLMs that have been trained on dangerous dual-use information. 
(For more detail on how these scenarios were described, see the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To evaluate the impact of these mitigation measures, participants\nwere asked to assume that an LLM could enable either 10% or 50% of\nnon-experts to synthesize an influenza virus in a randomized controlled\ntrial. The absolute probabilities of biorisk catastrophe under a variety\nof mitigation scenarios are shown in Figure 5.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"fig.-5-effects-of-mitigation-measures\">Fig. 5: Effects of mitigation measures<\/h4>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1449\" height=\"715\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05a.png\" alt=\"\" class=\"wp-image-2139\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05a.png 1449w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05a-350x173.png 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05a-700x345.png 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05a-768x379.png 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05a-1200x592.png 1200w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05a-150x74.png 150w\" sizes=\"auto, (max-width: 1449px) 100vw, 1449px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 5a:<\/strong> Description of the mitigation scenarios participants were asked to 
consider.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1417\" height=\"1015\" src=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05b.png\" alt=\"\" class=\"wp-image-2140\" srcset=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05b.png 1417w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05b-350x251.png 350w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05b-700x501.png 700w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05b-768x550.png 768w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05b-1200x860.png 1200w, https:\/\/forecastingresearch.org\/wp-content\/uploads\/2026\/04\/paper_2025-07-01_ai-enabled-biorisk_fig-05b-150x107.png 150w\" sizes=\"auto, (max-width: 1417px) 100vw, 1417px\" \/><figcaption class=\"wp-element-caption\"><strong>Figure 5b:<\/strong> Absolute risk probability of a human-caused epidemic in 2028, unconditionally, conditional on scenarios where LLMs enable 10% or 50% of non-experts to synthesize influenza, and conditional on the scenarios with various mitigations. The lines and text show the expert group for each scenario. The shaded area shows bootstrapped 95% confidence intervals for the expert median. NA = nucleic acid.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Participants believed that the mitigation scenario involving\nproprietary frontier model weights, strict jailbreaking safeguards, and\nmandatory synthetic nucleic acid screening (P3) would yield the largest\nreduction in risk. 
In particular, the median expert\u2019s risk estimate\nunder the \u201cAI enables 50% of non-experts to synthesize influenza\u201d\nscenario decreased from 1.25% to 0.4%, approaching the median expert\u2019s\noriginal baseline. Many participants expressed concerns that open-weight\nmodels pose higher risks than proprietary models for two main reasons:\ni) open-weight models can be finetuned to have specialized capabilities,\nand ii) unlike malicious use of proprietary models, malicious use of\nopen-weight models will not come to the attention of AI companies, whose\nawareness could otherwise trigger a law enforcement response.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We compared participants&#8217; risk estimates under different mitigation schemes to assess the impact of each component separately. In the \u201cAI enables 50% of non-experts to synthesize influenza\u201d scenario, requiring nucleic acid synthesis screening alone reduced the risk by 0.35 percentage points (p.p.) for the median expert and 0.14 p.p. for the median superforecaster. Requiring models to be proprietary with strict anti-jailbreaking measures reduced risk by 0.4 p.p. for the median expert and 0.24 p.p. for the median superforecaster (see the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"discussion\">Discussion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This study provides, to the best of the authors\u2019 knowledge, the first systematic assessment of how experts in molecular biology and biosecurity, along with superforecasters, view the biosecurity risks posed by advancing LLM capabilities. We found that many experts and superforecasters believed that certain measurable LLM capabilities would meaningfully increase the annual risk of a large-scale human-caused epidemic. 
In particular, LLMs matching the performance of teams of experts on a virology troubleshooting questionnaire (the VCT) or enabling non-experts to successfully synthesize a living virus were associated with a substantial increase in risk. This suggests that many expert participants saw troubleshooting and tacit knowledge as an especially large hurdle for biological misuse, so that meaningful assistance from future LLMs in overcoming this hurdle would increase risk. Such views are also found in the biosecurity literature.<sup data-fn=\"2aa34c0f-8ef6-430e-a896-1c7ae26c934b\" class=\"fn\"><a href=\"#2aa34c0f-8ef6-430e-a896-1c7ae26c934b\" id=\"2aa34c0f-8ef6-430e-a896-1c7ae26c934b-link\">27<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Critically, this study demonstrated that many experts and\nsuperforecasters alike have substantially underestimated the pace of LLM\nprogress in biology, including in capabilities associated with a\nsubstantial increase in risk. We found that current LLMs already match\nthe performance of teams of experts on the Virology Capabilities Test.\nFurthermore, it seems likely that an additional scenario (experts\nstrongly preferring LLM responses to long-form biorisk questions) has\nalso been achieved. We did not assess the other LLM capabilities given\nthe additional resources that would be required to do so, and therefore\nit is uncertain whether these have also been achieved. This mismatch\nbetween expert predictions and reality highlights the rapid pace of\nadvancement in LLM capabilities relevant to biological research and\nunderscores the urgency of fostering deeper expert collaboration across\nfields and developing appropriate governance frameworks.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">More positively, most participants believed that mitigation measures can meaningfully reduce the increase in risk. 
Some of these measures require action by governments, such as introducing a requirement that synthetic nucleic acid companies conduct customer and order screening. Others require action from the developers of AI, such as implementing safeguards to prevent model misuse. When prompted to consider the possible trade-offs required by mitigation measures (e.g., the possibility of measures slowing scientific progress) if a randomized controlled trial were to find that LLMs enable 10% of non-experts to synthesize influenza, most participants reported that they would be in favor of such measures being implemented, particularly AI model safeguards (see the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a> for more detail).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This study has limitations that future work should address. The present study only investigated one consequence of LLM capabilities: the risk of a large (&gt;100,000-mortality) human-caused epidemic. It does not attempt to quantify other risks of LLM capabilities\u2014or the effects of any potential offsetting benefits from LLM capabilities for beneficial scientific research. Most participants reported favoring mitigation measures. However, work that examines these trade-offs more closely, and in quantitative terms, would add a useful perspective to complement our work. 
For example, prospect theory suggests that, before approving a policy, decisionmakers would need to see a greater number of lives saved by extending human life expectancy than lives expected to be lost from epidemics.<sup data-fn=\"387dae06-fcf2-4ae2-9775-be50218ab77e\" class=\"fn\"><a href=\"#387dae06-fcf2-4ae2-9775-be50218ab77e\" id=\"387dae06-fcf2-4ae2-9775-be50218ab77e-link\">28<\/a><\/sup> Other schools of thought may reject such trade-offs on precautionary principle grounds.<sup data-fn=\"6cf07748-4270-4896-b284-b7cd411e11cb\" class=\"fn\"><a href=\"#6cf07748-4270-4896-b284-b7cd411e11cb\" id=\"6cf07748-4270-4896-b284-b7cd411e11cb-link\">29<\/a><\/sup> Therefore, it\u2019s important to note that these results should be considered only one input among many into AI and biosecurity policy choices.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This study was also limited to the implications of LLM capabilities, rather than AI more broadly. AI biological design tools are also advancing rapidly.<sup data-fn=\"7af9722a-c0c5-41a3-98b1-126ed0e9116e\" class=\"fn\"><a href=\"#7af9722a-c0c5-41a3-98b1-126ed0e9116e\" id=\"7af9722a-c0c5-41a3-98b1-126ed0e9116e-link\">30<\/a><\/sup> This progress is likely to have important implications for the risk of human-caused epidemics,<sup data-fn=\"95922dc0-883a-4fa4-a21a-7fa624963af7\" class=\"fn\"><a href=\"#95922dc0-883a-4fa4-a21a-7fa624963af7\" id=\"95922dc0-883a-4fa4-a21a-7fa624963af7-link\">31<\/a><\/sup> which we did not explore in this study and which future work may address.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Although we used a systematic sampling strategy (described in the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>) and the responses exhibited a wide range of views on the baseline risks, it is possible that people who agreed to participate were more likely to be concerned about these risks than 
their peers who declined. To offset this potential self-selection bias, we took a diversified sampling approach that recruited expert participants from several sources. We also included a sample of superforecasters, who may be less likely than experts to have preconceived views on biorisks or to have incentives that may bias responses.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The reliability of this study\u2019s results depends on the skill and effort exerted by the participants. It\u2019s clear that humans\u2014and in some cases experts in particular\u2014are subject to important cognitive biases that can impair their ability to accurately predict future events, including risks of human-caused epidemics.<sup data-fn=\"766d73e6-cc03-43a7-a3e8-c12d50a48ce0\" class=\"fn\"><a href=\"#766d73e6-cc03-43a7-a3e8-c12d50a48ce0\" id=\"766d73e6-cc03-43a7-a3e8-c12d50a48ce0-link\">32<\/a><\/sup> To offset these biases we had participants complete a calibration exercise before making forecasts, prompted them to consider relevant information, including the history of bioweapons and laboratory escape events, and asked them to consider what a reasonable range of forecasts would be and the possible rationales for higher or lower forecasts than their own (see <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a> for details). 
While there is evidence that exercises such as these can increase predictive accuracy, it is likely that more in-depth training would yield more accurate results.<sup data-fn=\"0a58a72a-912e-4297-8a7d-6591d731d7ba\" class=\"fn\"><a href=\"#0a58a72a-912e-4297-8a7d-6591d731d7ba\" id=\"0a58a72a-912e-4297-8a7d-6591d731d7ba-link\">33<\/a><\/sup><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This study offers insight into how experts are thinking about the\npotential biological risks posed by LLMs and serves as a foundation for\nongoing discussions about AI governance and risk assessments in highly\ncomplex and uncertain domains. As AI companies begin to implement\nadditional mitigation measures to prevent the misuse of their models,\nunderstanding the views of experts clarifies what capabilities ought to\nprompt additional measures and what those measures should be. The\nwidespread underestimation of the pace of AI progress by our sample\nhighlights the need for proactive rather than reactive approaches to\nexpert collaboration and governance. By combining multiple mitigation\nmeasures that address different aspects of the risk pathway, from model\naccess to synthetic nucleic acid screening, it may be possible to realize\nthe benefits of LLMs in biology while mitigating their risks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"methods\">Methods<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"survey-development\">Survey development<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To develop the survey, we undertook an iterative process whereby\nresearchers developed an initial set of forecasting questions\nquantifying the marginal effect of LLMs on the ability of non-experts to\nsynthesize pathogens. We collected answers to these questions from a\nsmall group of experts and superforecasters, and we then revised the\nquestions in light of how they interpreted them, clarifying definitions\nand increasing the precision of each question. 
We conducted five rounds\nof this iterative question improvement process because forecasts can be\nhighly sensitive to the precise wording of a question and its resolution\ncriteria. We also conducted a pilot study with a sample of 21\nparticipants and performed a final round of updates to the survey\nquestionnaire before beginning data collection.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The survey was administered as a Google Sheet or Excel spreadsheet,\nwhich can be viewed <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1FL-hM7iwrmytzyFkZ5skrG0OF3F8S8eEQVVq7HxNqVg\/edit?gid=183362230#gid=183362230\"><u>here<\/u><\/a>.\nWe invited participants to make a copy of the survey and fill in their\nresponses over the course of several weeks. We also provided a document\nthat gave detailed instructions on the survey, including detailed\ndescriptions of the questions and scenarios included in the survey. This\ndocument can be viewed <a href=\"https:\/\/docs.google.com\/document\/d\/1JP5ay1NXz4gS9j-fyiXf1ng6qCyOFWaNKtm3jOYHfSw\/edit?tab=t.0#heading=h.um08rhjzn5w6\"><u>here<\/u><\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"participant-recruitment\">Participant recruitment<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We targeted participants with expertise in biosecurity and\/or molecular and synthetic biology. We used a diversified sampling strategy to identify participants. The sampling frame included faculty at top-ranked molecular biology labs, members of the Engineering Biology Research Consortium, attendees of major AI-biosecurity workshops, researchers at biosecurity-focused think tanks, and additional researchers identified via Google Scholar search. The full sampling strategy is available in the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>. In total, we invited over 1,500 experts to participate in the study. 
As mentioned, 46 experts completed the full survey. Therefore, our participation rate was roughly 3%. This low response rate was likely influenced by the length of the survey. When inviting possible participants, we noted that we expected participation to take between 5 and 15 hours.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We also recruited top-performing generalist forecasters\n(\u201csuperforecasters\u201d). These are people who consistently scored in the\ntop 2% of the Intelligence Advanced Research Projects Activity (IARPA)\nAggregative Contingent Estimation (ACE) program or had high predictive\naccuracy in subsequent forecasting exercises run by Good Judgment,\nInc.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To incentivize engagement, we paid participants for their time spent\ncompleting the survey. Experts were paid $125 \/ hour up to a maximum of\n20 hours, and superforecasters were paid $50 \/ hour up to a maximum of\n20 hours. The median compensation per expert participant was $1,281.25.\nParticipants spent a considerable amount of time on the exercise, with a\nmedian of 10 self-reported hours for experts and 14 self-reported hours\nfor superforecasters. Most participants provided detailed rationales for\ntheir forecasts, with a median of ~2,000 words written per participant\nacross all forecasting questions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"data-cleaning-and-analysis\">Data cleaning and analysis<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data analysis was conducted using R after aggregating and cleaning\nthe data submitted by participants. We used the median as the default\nmethod for aggregating forecasts. For the questions about when\nevaluation results would be achieved, participants were asked for their\n5<sup>th<\/sup>, 50<sup>th<\/sup>, and 95<sup>th<\/sup> percentile\nforecasts. 
These were aggregated by first fitting a maximum entropy\ndistribution to each participant\u2019s percentiles and then calculating an\naverage density over participants. Data cleaning included a series of\nvalidation tests that checked for logical coherence and consistency in\nresponses. When inconsistencies were identified, we reviewed the\nindividual\u2019s responses to determine if they were likely to be\ntypographical errors, or if the response was likely to be intended.\nClear typographical errors (such as the automated percentage formatting\nbeing accidentally removed) were corrected.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Separately, we reviewed all forecasts and written rationales to\nassess for any misinterpretations. This identified that several\nparticipants may have misunderstood the descriptions of the influenza\nsynthesis evaluations to be representing a situation where the\nproportion of non-experts who are able to successfully synthesize\ninfluenza virus is increased by 10% (or 50%), rather than AI enabling a\ntotal of 10% (or 50%) of non-experts to succeed at the task.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As it was unclear how many participants had misinterpreted in this way, we contacted all participants to advise them of the correct interpretation and invite them to update their forecasts if they had misinterpreted the question. We also alerted those participants who had inconsistencies in their forecasts to the inconsistencies and asked if they would like to update their responses. A summary of the inconsistencies identified and how participants responded to them is provided in the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The analysis presented in this paper uses the updated responses from participants. It also excludes responses where there is a clear logical incoherence. 
For a summary of the responses removed from the data, see the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a>. We also ran the analysis on the original data provided by participants with the only changes being clear typographical errors. These results are provided in the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a> and do not change the main conclusions of this paper.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"vct-baselining-study\">VCT baselining study<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We recruited a total of 14 virology experts to complete five group sessions, with each group consisting of five experts. Some experts were in multiple groups, but we allowed a maximum of two people shared between any two groups. Each session lasted 4 hours and included 20 VCT questions tailored to the group\u2019s collective expertise. Participants were instructed to take their time answering each question, moving on once they had come to consensus or found that they were making no further progress. If they did not complete the full 20 questions, remaining questions were excluded from analysis. Participants were allowed to use any internet-based resources <em>except<\/em> for LLMs. See the <a href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">Supplementary Materials<\/a> for details.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Notes<\/h2>\n\n\n<ol class=\"wp-block-footnotes\"><li id=\"cc8cf5c1-7adc-4539-8bf1-a46acabc2c04\">Justen, Lennart. &#8220;LLMs Outperform Experts on Challenging Biology Benchmarks.&#8221; Preprint, arXiv, May 21, 2025. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2505.06108\">https:\/\/doi.org\/10.48550\/arXiv.2505.06108<\/a>. 
<a href=\"#cc8cf5c1-7adc-4539-8bf1-a46acabc2c04-link\" aria-label=\"Jump to footnote reference 1\">\u21a9\ufe0e<\/a><\/li><li id=\"a4175eec-98fe-4a65-ad16-96b44d7a0260\">Caccavale, Fiammetta, et al. &#8220;Towards Education 4.0: The Role of Large Language Models as Virtual Tutors in Chemical Engineering.&#8221; <em>Education for Chemical Engineers<\/em> 49 (2024): 1\u201311. <a href=\"https:\/\/doi.org\/10.1016\/j.ece.2024.07.002\">https:\/\/doi.org\/10.1016\/j.ece.2024.07.002<\/a>; Chevalier, Alexis, et al. &#8220;Language Models as Science Tutors.&#8221; Preprint, arXiv, February 16, 2024. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2402.11111\">https:\/\/doi.org\/10.48550\/arXiv.2402.11111<\/a>. <a href=\"#a4175eec-98fe-4a65-ad16-96b44d7a0260-link\" aria-label=\"Jump to footnote reference 2\">\u21a9\ufe0e<\/a><\/li><li id=\"880d53e1-1040-4f1c-b505-9ac79daa2454\">Ghareeb, Ali Essam, et al. &#8220;Robin: A Multi-Agent System for Automating Scientific Discovery.&#8221; Preprint, arXiv, May 19, 2025. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2505.13400\">https:\/\/doi.org\/10.48550\/arXiv.2505.13400<\/a>; Swanson, Kyle, et al. &#8220;The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation.&#8221; Preprint, bioRxiv, November 12, 2024. <a href=\"https:\/\/doi.org\/10.1101\/2024.11.11.623004\">https:\/\/doi.org\/10.1101\/2024.11.11.623004<\/a>; Boiko, Daniil A., et al. &#8220;Autonomous Chemical Research with Large Language Models.&#8221; <em>Nature<\/em> 624, no. 7992 (2023): 570\u201378. <a href=\"https:\/\/doi.org\/10.1038\/s41586-023-06792-0\">https:\/\/doi.org\/10.1038\/s41586-023-06792-0<\/a>; Ruan, Yixiang, et al. &#8220;An Automatic End-to-End Chemical Synthesis Development Platform Powered by Large Language Models.&#8221; <em>Nature Communications<\/em> 15, no. 1 (2024): 10160. <a href=\"https:\/\/doi.org\/10.1038\/s41467-024-54457-x\">https:\/\/doi.org\/10.1038\/s41467-024-54457-x<\/a>; Hale, Conor. 
&#8220;OpenAI, Babylon Aim to Tailor AI to Predict Drug Successes.&#8221; <em>Fierce Biotech<\/em>, May 14, 2025. <a href=\"https:\/\/www.fiercebiotech.com\/medtech\/fine-tuned-ai-models-openai-babylon-aim-predict-clinical-trial-successes\">https:\/\/www.fiercebiotech.com\/medtech\/fine-tuned-ai-models-openai-babylon-aim-predict-clinical-trial-successes<\/a>. <a href=\"#880d53e1-1040-4f1c-b505-9ac79daa2454-link\" aria-label=\"Jump to footnote reference 3\">\u21a9\ufe0e<\/a><\/li><li id=\"a06a3646-f62a-464b-aec7-4a720dd7167c\">Binz, Marcel, et al. &#8220;How Should the Advancement of Large Language Models Affect the Practice of Science?&#8221; <em>Proceedings of the National Academy of Sciences<\/em> 122, no. 5 (2025): e2401227121. <a href=\"https:\/\/doi.org\/10.1073\/pnas.2401227121\">https:\/\/doi.org\/10.1073\/pnas.2401227121<\/a>; Lissack, Michael, and Brenden Meagher. &#8220;LLMs and the Risk of Sloppy Science: Navigating the Future of Scientific Inquiry in the Age of Artificial Intelligence.&#8221; SSRN Scholarly Paper no. 4949823. Social Science Research Network, September 2, 2024. <a href=\"https:\/\/doi.org\/10.2139\/ssrn.4949823\">https:\/\/doi.org\/10.2139\/ssrn.4949823<\/a>. <a href=\"#a06a3646-f62a-464b-aec7-4a720dd7167c-link\" aria-label=\"Jump to footnote reference 4\">\u21a9\ufe0e<\/a><\/li><li id=\"0442a866-a327-4cd7-aa62-606408cb9ad3\">Pannu, Jaspreet, et al. &#8220;AI Could Pose Pandemic-Scale Biosecurity Risks. Here&#8217;s How to Make It Safer.&#8221; <em>Nature<\/em> 635, no. 8040 (2024): 808\u201311. <a href=\"https:\/\/doi.org\/10.1038\/d41586-024-03815-2\">https:\/\/doi.org\/10.1038\/d41586-024-03815-2<\/a>; Amodei, Dario. &#8220;Written Testimony of Dario Amodei, Ph.D. Co-Founder and CEO, Anthropic, For a Hearing on &#8216;Oversight of A.I.: Principles for Regulation&#8217; Before the Judiciary Committee Subcommittee on Privacy, Technology, and the Law, United States Senate, July 25th, 2023.&#8221; July 25, 2023; Carter, Sarah, et al. 
<em>The Convergence of Artificial Intelligence and the Life Sciences<\/em>. Nuclear Threat Initiative, 2023; Wheeler, Nicole E. &#8220;Responsible AI in Biotechnology: Balancing Discovery, Innovation and Biosecurity Risks.&#8221; <em>Frontiers in Bioengineering and Biotechnology<\/em> 13 (2025): 1537471. <a href=\"https:\/\/doi.org\/10.3389\/fbioe.2025.1537471\">https:\/\/doi.org\/10.3389\/fbioe.2025.1537471<\/a>; Drexel, Bill, and Caleb Withers. <em>AI and the Evolution of Biological National Security Risks<\/em>. Center for a New American Security, 2024; Sandbrink, Jonas B. &#8220;Artificial Intelligence and Biological Misuse: Differentiating Risks of Language Models and Biological Design Tools.&#8221; Preprint, arXiv, December 23, 2023. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2306.13952\">https:\/\/doi.org\/10.48550\/arXiv.2306.13952<\/a>. <a href=\"#0442a866-a327-4cd7-aa62-606408cb9ad3-link\" aria-label=\"Jump to footnote reference 5\">\u21a9\ufe0e<\/a><\/li><li id=\"f403dcd0-9794-40af-8560-184dcaeef4a9\">Model Evaluation and Threat Research. <em>Common Elements of Frontier AI Safety Policies<\/em>. Model Evaluation and Threat Research, 2025; Executive Office of the President. &#8220;Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.&#8221; <em>Federal Register<\/em>, November 1, 2023. <a href=\"https:\/\/www.federalregister.gov\/documents\/2023\/11\/01\/2023-24283\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence\">https:\/\/www.federalregister.gov\/documents\/2023\/11\/01\/2023-24283\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence<\/a>; Anthropic. &#8220;Responsible Scaling Policy Version 2.2.&#8221; Anthropic, May 14, 2025; OpenAI. &#8220;Preparedness Framework Version 2.&#8221; OpenAI, April 15, 2025; Google DeepMind. &#8220;Frontier Safety Framework Version 2.0.&#8221; Google DeepMind, February 4, 2025. 
<a href=\"#f403dcd0-9794-40af-8560-184dcaeef4a9-link\" aria-label=\"Jump to footnote reference 6\">\u21a9\ufe0e<\/a><\/li><li id=\"e48f546e-9258-4240-a4c0-8698307f1550\">Anthropic. &#8220;Activating AI Safety Level 3 Protections.&#8221; May 22, 2025. <a href=\"https:\/\/www.anthropic.com\/news\/activating-asl3-protections\">https:\/\/www.anthropic.com\/news\/activating-asl3-protections<\/a>. <a href=\"#e48f546e-9258-4240-a4c0-8698307f1550-link\" aria-label=\"Jump to footnote reference 7\">\u21a9\ufe0e<\/a><\/li><li id=\"9894317f-d068-4c25-ba29-9c83d0d65094\">OpenAI. &#8220;Preparing for Future AI Capabilities in Biology.&#8221; June 18, 2025. <a href=\"https:\/\/openai.com\/index\/preparing-for-future-ai-capabilities-in-biology\/\">https:\/\/openai.com\/index\/preparing-for-future-ai-capabilities-in-biology\/<\/a>. <a href=\"#9894317f-d068-4c25-ba29-9c83d0d65094-link\" aria-label=\"Jump to footnote reference 8\">\u21a9\ufe0e<\/a><\/li><li id=\"ca3afd3a-bfdb-4747-9b60-de3ca69c3f84\">OpenAI. &#8220;Building an Early Warning System for LLM-Aided Biological Threat Creation.&#8221; February 14, 2024. <a href=\"https:\/\/openai.com\/index\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\/\">https:\/\/openai.com\/index\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\/<\/a>. <a href=\"#ca3afd3a-bfdb-4747-9b60-de3ca69c3f84-link\" aria-label=\"Jump to footnote reference 9\">\u21a9\ufe0e<\/a><\/li><li id=\"e05cf7f2-dd1b-41eb-ae6d-bf59dda468aa\">The National Academy of Sciences. <em>Department of Homeland Security Bioterrorism Risk Assessment: A Call for Change<\/em>. The National Academies Press, 2008; JASON. <em>Rare Events<\/em>. The Mitre Corporation, 2009; Ezell, Barry Charles, et al. &#8220;Probabilistic Risk Analysis and Terrorism Risk.&#8221; <em>Risk Analysis<\/em> 30, no. 4 (2010): 575\u201389. 
<a href=\"https:\/\/doi.org\/10.1111\/j.1539-6924.2010.01401.x\">https:\/\/doi.org\/10.1111\/j.1539-6924.2010.01401.x<\/a>; Aven, Terje, and Ortwin Renn. &#8220;The Role of Quantitative Risk Assessments for Characterizing Risk and Uncertainty and Delineating Appropriate Risk Management Options, with Special Emphasis on Terrorism Risk.&#8221; <em>Risk Analysis<\/em> 29, no. 4 (2009): 587\u2013600. <a href=\"https:\/\/doi.org\/10.1111\/j.1539-6924.2008.01175.x\">https:\/\/doi.org\/10.1111\/j.1539-6924.2008.01175.x<\/a>. <a href=\"#e05cf7f2-dd1b-41eb-ae6d-bf59dda468aa-link\" aria-label=\"Jump to footnote reference 10\">\u21a9\ufe0e<\/a><\/li><li id=\"5e6558ef-53f7-44b2-8fa9-dfbd2976c852\">Lugar, Richard G. <em>The Lugar Survey on Proliferation Threats and Responses<\/em>. N.p., n.d. <a href=\"#5e6558ef-53f7-44b2-8fa9-dfbd2976c852-link\" aria-label=\"Jump to footnote reference 11\">\u21a9\ufe0e<\/a><\/li><li id=\"d83d73bd-1cf8-47e9-a2d8-580b9c8b26e5\">National Research Council (US) Committee on Assessing Fundamental Attitudes of Life Scientists as a Basis for Biosecurity Education. <em>A Survey of Attitudes and Actions on Dual Use Research in the Life Sciences: A Collaborative Effort of the National Research Council and the American Association for the Advancement of Science<\/em>. National Academies Press, 2009. <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/books\/NBK214757\/\">http:\/\/www.ncbi.nlm.nih.gov\/books\/NBK214757\/<\/a>. <a href=\"#d83d73bd-1cf8-47e9-a2d8-580b9c8b26e5-link\" aria-label=\"Jump to footnote reference 12\">\u21a9\ufe0e<\/a><\/li><li id=\"9683b28a-3431-48ef-90e4-3116dffacb96\">Boddie, Crystal, et al. &#8220;Assessing the Bioweapons Threat.&#8221; <em>Science<\/em> 349, no. 6250 (2015): 792\u201393. <a href=\"https:\/\/doi.org\/10.1126\/science.aab0713\">https:\/\/doi.org\/10.1126\/science.aab0713<\/a>. 
<a href=\"#9683b28a-3431-48ef-90e4-3116dffacb96-link\" aria-label=\"Jump to footnote reference 13\">\u21a9\ufe0e<\/a><\/li><li id=\"9daf2aa3-4cd9-453a-8a43-e6d041fe99b7\">Mellers, Barbara, et al. &#8220;Psychological Strategies for Winning a Geopolitical Forecasting Tournament.&#8221; <em>Psychological Science<\/em> 25, no. 5 (2014): 1106\u201315. <a href=\"https:\/\/doi.org\/10.1177\/0956797614524255\">https:\/\/doi.org\/10.1177\/0956797614524255<\/a>; Mellers, Barbara, et al. &#8220;The Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics.&#8221; <em>Journal of Experimental Psychology: Applied<\/em> 21, no. 1 (2015): 1; Chang, Welton, et al. &#8220;Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments.&#8221; <em>Judgment and Decision Making<\/em> 11, no. 5 (2016): 509\u201326. <a href=\"https:\/\/doi.org\/10.1017\/S1930297500004599\">https:\/\/doi.org\/10.1017\/S1930297500004599<\/a>; Colson, Abigail R., and Roger M. Cooke. &#8220;Expert Elicitation: Using the Classical Model to Validate Experts&#8217; Judgments.&#8221; <em>Review of Environmental Economics and Policy<\/em> 12, no. 1 (2018): 113\u201332. <a href=\"https:\/\/doi.org\/10.1093\/reep\/rex022\">https:\/\/doi.org\/10.1093\/reep\/rex022<\/a>. <a href=\"#9daf2aa3-4cd9-453a-8a43-e6d041fe99b7-link\" aria-label=\"Jump to footnote reference 14\">\u21a9\ufe0e<\/a><\/li><li id=\"b7cfd9c7-06e8-4f46-934f-c2f64b21d3e2\">Pannu, Jaspreet, et al. &#8220;AI Could Pose Pandemic-Scale Biosecurity Risks. Here&#8217;s How to Make It Safer.&#8221; <em>Nature<\/em> 635, no. 8040 (2024): 808\u201311. <a href=\"https:\/\/doi.org\/10.1038\/d41586-024-03815-2\">https:\/\/doi.org\/10.1038\/d41586-024-03815-2<\/a>; Amodei, Dario. &#8220;Written Testimony of Dario Amodei, Ph.D. 
Co-Founder and CEO, Anthropic, For a Hearing on &#8216;Oversight of A.I.: Principles for Regulation&#8217; Before the Judiciary Committee Subcommittee on Privacy, Technology, and the Law, United States Senate, July 25th, 2023.&#8221; July 25, 2023; Ezell, Barry Charles, et al. &#8220;Probabilistic Risk Analysis and Terrorism Risk.&#8221; <em>Risk Analysis<\/em> 30, no. 4 (2010): 575\u201389. <a href=\"https:\/\/doi.org\/10.1111\/j.1539-6924.2010.01401.x\">https:\/\/doi.org\/10.1111\/j.1539-6924.2010.01401.x<\/a>. <a href=\"#b7cfd9c7-06e8-4f46-934f-c2f64b21d3e2-link\" aria-label=\"Jump to footnote reference 15\">\u21a9\ufe0e<\/a><\/li><li id=\"6309c316-a2fa-4d42-befe-f6b215bbb8c6\">Karger, Ezra, et al. &#8220;Reciprocal Scoring: A Method for Forecasting Unanswerable Questions.&#8221; SSRN Scholarly Paper no. 3954498. Social Science Research Network, October 31, 2021. <a href=\"https:\/\/doi.org\/10.2139\/ssrn.3954498\">https:\/\/doi.org\/10.2139\/ssrn.3954498<\/a>. <a href=\"#6309c316-a2fa-4d42-befe-f6b215bbb8c6-link\" aria-label=\"Jump to footnote reference 16\">\u21a9\ufe0e<\/a><\/li><li id=\"f1da717a-14c6-46da-a554-95b01987a117\">Rozo, Michelle, and Gigi Kwik Gronvall. &#8220;The Reemergent 1977 H1N1 Strain and the Gain-of-Function Debate.&#8221; <em>mBio<\/em> 6, no. 4 (2015): 10.1128\/mbio.01013-15. <a href=\"https:\/\/doi.org\/10.1128\/mbio.01013-15\">https:\/\/doi.org\/10.1128\/mbio.01013-15<\/a>. <a href=\"#f1da717a-14c6-46da-a554-95b01987a117-link\" aria-label=\"Jump to footnote reference 17\">\u21a9\ufe0e<\/a><\/li><li id=\"71537570-01d4-456a-80f8-cb1bdbd6ec20\">G\u00f6tting, Jasper, et al. &#8220;Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;A Benchmark.&#8221; Preprint, arXiv, April 29, 2025. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2504.16137\">https:\/\/doi.org\/10.48550\/arXiv.2504.16137<\/a>. 
<a href=\"#71537570-01d4-456a-80f8-cb1bdbd6ec20-link\" aria-label=\"Jump to footnote reference 18\">\u21a9\ufe0e<\/a><\/li><li id=\"1c8d540b-3d88-457a-9f79-b25cc3d50f6c\">OpenAI. &#8220;Building an Early Warning System for LLM-Aided Biological Threat Creation.&#8221; February 14, 2024. <a href=\"https:\/\/openai.com\/index\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\/\">https:\/\/openai.com\/index\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\/<\/a>. <a href=\"#1c8d540b-3d88-457a-9f79-b25cc3d50f6c-link\" aria-label=\"Jump to footnote reference 19\">\u21a9\ufe0e<\/a><\/li><li id=\"2a265a93-fd99-449c-a08a-a406be65b52a\">Mouton, Christopher A., Caleb Lucas, and Ella Guest. <em>The Operational Risks of AI in Large-Scale Biological Attacks<\/em>. RAND, 2024. <a href=\"#2a265a93-fd99-449c-a08a-a406be65b52a-link\" aria-label=\"Jump to footnote reference 20\">\u21a9\ufe0e<\/a><\/li><li id=\"a0fd8705-7557-4adb-94bc-0f2e9f71c36a\">Edison, Rey, Sara Toner, and Kevin Esvelt. &#8220;Evaluating the Robustness of Current Nucleic Acid Synthesis Screening.&#8221; Preprint, May 8, 2024. <a href=\"#a0fd8705-7557-4adb-94bc-0f2e9f71c36a-link\" aria-label=\"Jump to footnote reference 21\">\u21a9\ufe0e<\/a><\/li><li id=\"8d86ae6f-dd09-4e2f-ad43-fbd9b6d52917\">OpenAI. &#8220;OpenAI and Los Alamos National Laboratory Announce Bioscience Research Partnership.&#8221; October 7, 2024. <a href=\"https:\/\/openai.com\/index\/openai-and-los-alamos-national-laboratory-work-together\/\">https:\/\/openai.com\/index\/openai-and-los-alamos-national-laboratory-work-together\/<\/a>. <a href=\"#8d86ae6f-dd09-4e2f-ad43-fbd9b6d52917-link\" aria-label=\"Jump to footnote reference 22\">\u21a9\ufe0e<\/a><\/li><li id=\"70cfe8a0-f74e-40eb-91ea-a143d229296e\">G\u00f6tting, Jasper, et al. &#8220;Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;A Benchmark.&#8221; Preprint, arXiv, April 29, 2025. 
<a href=\"https:\/\/doi.org\/10.48550\/arXiv.2504.16137\">https:\/\/doi.org\/10.48550\/arXiv.2504.16137<\/a>. <a href=\"#70cfe8a0-f74e-40eb-91ea-a143d229296e-link\" aria-label=\"Jump to footnote reference 23\">\u21a9\ufe0e<\/a><\/li><li id=\"7d9500e9-3a26-4a7f-9330-c719323e28da\">Anthropic. &#8220;Activating AI Safety Level 3 Protections.&#8221; May 22, 2025. <a href=\"https:\/\/www.anthropic.com\/news\/activating-asl3-protections\">https:\/\/www.anthropic.com\/news\/activating-asl3-protections<\/a>. <a href=\"#7d9500e9-3a26-4a7f-9330-c719323e28da-link\" aria-label=\"Jump to footnote reference 24\">\u21a9\ufe0e<\/a><\/li><li id=\"a21f9ce5-e570-4f2d-a0cb-b1a805a65644\">OpenAI. <em>OpenAI O3 and O4-Mini System Card<\/em>. OpenAI, 2025. <a href=\"#a21f9ce5-e570-4f2d-a0cb-b1a805a65644-link\" aria-label=\"Jump to footnote reference 25\">\u21a9\ufe0e<\/a><\/li><li id=\"caac5d2d-3c79-4e42-939d-00fd006b1c69\">Carter, Sarah, et al. <em>The Convergence of Artificial Intelligence and the Life Sciences<\/em>. Nuclear Threat Initiative, 2023; Wheeler, Nicole E. &#8220;Responsible AI in Biotechnology: Balancing Discovery, Innovation and Biosecurity Risks.&#8221; <em>Frontiers in Bioengineering and Biotechnology<\/em> 13 (2025): 1537471. <a href=\"https:\/\/doi.org\/10.3389\/fbioe.2025.1537471\">https:\/\/doi.org\/10.3389\/fbioe.2025.1537471<\/a>; Drexel, Bill, and Caleb Withers. <em>AI and the Evolution of Biological National Security Risks<\/em>. Center for a New American Security, 2024; Executive Office of the President. &#8220;Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.&#8221; <em>Federal Register<\/em>, November 1, 2023. <a href=\"https:\/\/www.federalregister.gov\/documents\/2023\/11\/01\/2023-24283\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence\">https:\/\/www.federalregister.gov\/documents\/2023\/11\/01\/2023-24283\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence<\/a>. 
<a href=\"#caac5d2d-3c79-4e42-939d-00fd006b1c69-link\" aria-label=\"Jump to footnote reference 26\">\u21a9\ufe0e<\/a><\/li><li id=\"2aa34c0f-8ef6-430e-a896-1c7ae26c934b\">Revill, James, and Catherine Jefferson. &#8220;Tacit Knowledge and the Biological Weapons Regime.&#8221; <em>Science and Public Policy<\/em> 41, no. 5 (2014): 597\u2013610. <a href=\"https:\/\/doi.org\/10.1093\/scipol\/sct090\">https:\/\/doi.org\/10.1093\/scipol\/sct090<\/a>. <a href=\"#2aa34c0f-8ef6-430e-a896-1c7ae26c934b-link\" aria-label=\"Jump to footnote reference 27\">\u21a9\ufe0e<\/a><\/li><li id=\"387dae06-fcf2-4ae2-9775-be50218ab77e\">Kahneman, Daniel, and Amos Tversky. &#8220;Prospect Theory: An Analysis of Decision under Risk.&#8221; <em>Econometrica<\/em> 47, no. 2 (1979): 263\u201391. <a href=\"https:\/\/doi.org\/10.2307\/1914185\">https:\/\/doi.org\/10.2307\/1914185<\/a>. <a href=\"#387dae06-fcf2-4ae2-9775-be50218ab77e-link\" aria-label=\"Jump to footnote reference 28\">\u21a9\ufe0e<\/a><\/li><li id=\"6cf07748-4270-4896-b284-b7cd411e11cb\">Foster, Kenneth R., Paolo Vecchia, and Michael H. Repacholi. &#8220;Science and the Precautionary Principle.&#8221; <em>Science<\/em> 288, no. 5468 (2000): 979\u201381. <a href=\"https:\/\/doi.org\/10.1126\/science.288.5468.979\">https:\/\/doi.org\/10.1126\/science.288.5468.979<\/a>. <a href=\"#6cf07748-4270-4896-b284-b7cd411e11cb-link\" aria-label=\"Jump to footnote reference 29\">\u21a9\ufe0e<\/a><\/li><li id=\"7af9722a-c0c5-41a3-98b1-126ed0e9116e\">Brixi, Garyk, et al. &#8220;Genome Modeling and Design across All Domains of Life with Evo 2.&#8221; Preprint, bioRxiv, February 21, 2025. <a href=\"https:\/\/doi.org\/10.1101\/2025.02.18.638918\">https:\/\/doi.org\/10.1101\/2025.02.18.638918<\/a>; Callaway, Ewen. &#8220;DeepMind&#8217;s New AlphaGenome AI Tackles the &#8216;Dark Matter&#8217; in Our DNA.&#8221; <em>Nature<\/em>, ahead of print, June 25, 2025. 
<a href=\"https:\/\/doi.org\/10.1038\/d41586-025-01998-w\">https:\/\/doi.org\/10.1038\/d41586-025-01998-w<\/a>. <a href=\"#7af9722a-c0c5-41a3-98b1-126ed0e9116e-link\" aria-label=\"Jump to footnote reference 30\">\u21a9\ufe0e<\/a><\/li><li id=\"95922dc0-883a-4fa4-a21a-7fa624963af7\">Sandbrink, Jonas B. &#8220;Artificial Intelligence and Biological Misuse: Differentiating Risks of Language Models and Biological Design Tools.&#8221; Preprint, arXiv, December 23, 2023. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2306.13952\">https:\/\/doi.org\/10.48550\/arXiv.2306.13952<\/a>; Bloomfield, Doni, et al. &#8220;AI and Biosecurity: The Need for Governance.&#8221; <em>Science<\/em> 385, no. 6711 (2024): 831\u201333. <a href=\"https:\/\/doi.org\/10.1126\/science.adq1977\">https:\/\/doi.org\/10.1126\/science.adq1977<\/a>. <a href=\"#95922dc0-883a-4fa4-a21a-7fa624963af7-link\" aria-label=\"Jump to footnote reference 31\">\u21a9\ufe0e<\/a><\/li><li id=\"766d73e6-cc03-43a7-a3e8-c12d50a48ce0\">Koblentz, Gregory D. &#8220;Predicting Peril or the Peril of Prediction? Assessing the Risk of CBRN Terrorism.&#8221; <em>Terrorism and Political Violence<\/em> 23, no. 4 (2011): 501\u201320. <a href=\"https:\/\/doi.org\/10.1080\/09546553.2011.575487\">https:\/\/doi.org\/10.1080\/09546553.2011.575487<\/a>. <a href=\"#766d73e6-cc03-43a7-a3e8-c12d50a48ce0-link\" aria-label=\"Jump to footnote reference 32\">\u21a9\ufe0e<\/a><\/li><li id=\"0a58a72a-912e-4297-8a7d-6591d731d7ba\">Chang, Welton, et al. &#8220;Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments.&#8221; <em>Judgment and Decision Making<\/em> 11, no. 5 (2016): 509\u201326. <a href=\"https:\/\/doi.org\/10.1017\/S1930297500004599\">https:\/\/doi.org\/10.1017\/S1930297500004599<\/a>. 
<a href=\"#0a58a72a-912e-4297-8a7d-6591d731d7ba-link\" aria-label=\"Jump to footnote reference 33\">\u21a9\ufe0e<\/a><\/li><\/ol>\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"btn orange\" href=\"https:\/\/forecastingresearch.org\/pdf\/llm-enabled-biorisk.pdf#page=23\" target=\"_blank\" rel=\"noreferrer noopener\">The Supplementary Materials are provided in the full PDF report <svg width=\"7\" height=\"9\" viewBox=\"0 0 7 9\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.000156283 8.60806L4.22416 4.33606V4.24006L0.000156283 6.10352e-05H1.80816L6.06416 4.28806L1.80816 8.60806H0.000156283Z\" fill=\"#102B23\"\/>\n<\/svg>\n<svg width=\"8\" height=\"10\" viewBox=\"0 0 8 10\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n  <path d=\"M0.601719 8.85794L4.82572 4.58594V4.48994L0.601719 0.249939H2.40972L6.66572 4.53794L2.40972 8.85794H0.601719Z\" fill=\"#102B23\"\/>\n<\/svg><\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"This forecasting study on biological risks from large language models (LLMs) examined expert views on AI-enabled biosecurity threats. The study saw 46 biosecurity and biology experts, along with 22 superforecasters, predict how advancing LLM capabilities might increase the risk of a human-caused epidemic.","protected":false},"featured_media":857,"template":"","meta":{"footnotes":"[{\"id\":\"cc8cf5c1-7adc-4539-8bf1-a46acabc2c04\",\"content\":\"Justen, Lennart. \\\"LLMs Outperform Experts on Challenging Biology Benchmarks.\\\" Preprint, arXiv, May 21, 2025. <a href=\\\"https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2505.06108\\\">https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2505.06108<\\\/a>.\"},{\"id\":\"a4175eec-98fe-4a65-ad16-96b44d7a0260\",\"content\":\"Caccavale, Fiammetta, et al. 
\\\"Towards Education 4.0: The Role of Large Language Models as Virtual Tutors in Chemical Engineering.\\\" <em>Education for Chemical Engineers<\\\/em> 49 (2024): 1\\u201311. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1016\\\/j.ece.2024.07.002\\\">https:\\\/\\\/doi.org\\\/10.1016\\\/j.ece.2024.07.002<\\\/a>; Chevalier, Alexis, et al. \\\"Language Models as Science Tutors.\\\" Preprint, arXiv, February 16, 2024. <a href=\\\"https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2402.11111\\\">https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2402.11111<\\\/a>.\"},{\"id\":\"880d53e1-1040-4f1c-b505-9ac79daa2454\",\"content\":\"Ghareeb, Ali Essam, et al. \\\"Robin: A Multi-Agent System for Automating Scientific Discovery.\\\" Preprint, arXiv, May 19, 2025. <a href=\\\"https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2505.13400\\\">https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2505.13400<\\\/a>; Swanson, Kyle, et al. \\\"The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation.\\\" Preprint, bioRxiv, November 12, 2024. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1101\\\/2024.11.11.623004\\\">https:\\\/\\\/doi.org\\\/10.1101\\\/2024.11.11.623004<\\\/a>; Boiko, Daniil A., et al. \\\"Autonomous Chemical Research with Large Language Models.\\\" <em>Nature<\\\/em> 624, no. 7992 (2023): 570\\u201378. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1038\\\/s41586-023-06792-0\\\">https:\\\/\\\/doi.org\\\/10.1038\\\/s41586-023-06792-0<\\\/a>; Ruan, Yixiang, et al. \\\"An Automatic End-to-End Chemical Synthesis Development Platform Powered by Large Language Models.\\\" <em>Nature Communications<\\\/em> 15, no. 1 (2024): 10160. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1038\\\/s41467-024-54457-x\\\">https:\\\/\\\/doi.org\\\/10.1038\\\/s41467-024-54457-x<\\\/a>; Hale, Conor. \\\"OpenAI, Babylon Aim to Tailor AI to Predict Drug Successes.\\\" <em>Fierce Biotech<\\\/em>, May 14, 2025. 
<a href=\\\"https:\\\/\\\/www.fiercebiotech.com\\\/medtech\\\/fine-tuned-ai-models-openai-babylon-aim-predict-clinical-trial-successes\\\">https:\\\/\\\/www.fiercebiotech.com\\\/medtech\\\/fine-tuned-ai-models-openai-babylon-aim-predict-clinical-trial-successes<\\\/a>.\"},{\"id\":\"a06a3646-f62a-464b-aec7-4a720dd7167c\",\"content\":\"Binz, Marcel, et al. \\\"How Should the Advancement of Large Language Models Affect the Practice of Science?\\\" <em>Proceedings of the National Academy of Sciences<\\\/em> 122, no. 5 (2025): e2401227121. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1073\\\/pnas.2401227121\\\">https:\\\/\\\/doi.org\\\/10.1073\\\/pnas.2401227121<\\\/a>; Lissack, Michael, and Brenden Meagher. \\\"LLMs and the Risk of Sloppy Science: Navigating the Future of Scientific Inquiry in the Age of Artificial Intelligence.\\\" SSRN Scholarly Paper no. 4949823. Social Science Research Network, September 2, 2024. <a href=\\\"https:\\\/\\\/doi.org\\\/10.2139\\\/ssrn.4949823\\\">https:\\\/\\\/doi.org\\\/10.2139\\\/ssrn.4949823<\\\/a>.\"},{\"id\":\"0442a866-a327-4cd7-aa62-606408cb9ad3\",\"content\":\"Pannu, Jaspreet, et al. \\\"AI Could Pose Pandemic-Scale Biosecurity Risks. Here's How to Make It Safer.\\\" <em>Nature<\\\/em> 635, no. 8040 (2024): 808\\u201311. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1038\\\/d41586-024-03815-2\\\">https:\\\/\\\/doi.org\\\/10.1038\\\/d41586-024-03815-2<\\\/a>; Amodei, Dario. \\\"Written Testimony of Dario Amodei, Ph.D. Co-Founder and CEO, Anthropic, For a Hearing on 'Oversight of A.I.: Principles for Regulation' Before the Judiciary Committee Subcommittee on Privacy, Technology, and the Law, United States Senate, July 25th, 2023.\\\" July 25, 2023; Carter, Sarah, et al. <em>The Convergence of Artificial Intelligence and the Life Sciences<\\\/em>. Nuclear Threat Initiative, 2023; Wheeler, Nicole E. 
\\\"Responsible AI in Biotechnology: Balancing Discovery, Innovation and Biosecurity Risks.\\\" <em>Frontiers in Bioengineering and Biotechnology<\\\/em> 13 (2025): 1537471. <a href=\\\"https:\\\/\\\/doi.org\\\/10.3389\\\/fbioe.2025.1537471\\\">https:\\\/\\\/doi.org\\\/10.3389\\\/fbioe.2025.1537471<\\\/a>; Drexel, Bill, and Caleb Withers. <em>AI and the Evolution of Biological National Security Risks<\\\/em>. Center for a New American Security, 2024; Sandbrink, Jonas B. \\\"Artificial Intelligence and Biological Misuse: Differentiating Risks of Language Models and Biological Design Tools.\\\" Preprint, arXiv, December 23, 2023. <a href=\\\"https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2306.13952\\\">https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2306.13952<\\\/a>.\"},{\"id\":\"f403dcd0-9794-40af-8560-184dcaeef4a9\",\"content\":\"Model Evaluation and Threat Research. <em>Common Elements of Frontier AI Safety Policies<\\\/em>. Model Evaluation and Threat Research, 2025; Executive Office of the President. \\\"Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.\\\" <em>Federal Register<\\\/em>, November 1, 2023. <a href=\\\"https:\\\/\\\/www.federalregister.gov\\\/documents\\\/2023\\\/11\\\/01\\\/2023-24283\\\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence\\\">https:\\\/\\\/www.federalregister.gov\\\/documents\\\/2023\\\/11\\\/01\\\/2023-24283\\\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence<\\\/a>; Anthropic. \\\"Responsible Scaling Policy Version 2.2.\\\" Anthropic, May 14, 2025; OpenAI. \\\"Preparedness Framework Version 2.\\\" OpenAI, April 15, 2025; Google DeepMind. \\\"Frontier Safety Framework Version 2.0.\\\" Google DeepMind, February 4, 2025.\"},{\"id\":\"e48f546e-9258-4240-a4c0-8698307f1550\",\"content\":\"Anthropic. \\\"Activating AI Safety Level 3 Protections.\\\" May 22, 2025. 
<a href=\\\"https:\\\/\\\/www.anthropic.com\\\/news\\\/activating-asl3-protections\\\">https:\\\/\\\/www.anthropic.com\\\/news\\\/activating-asl3-protections<\\\/a>.\"},{\"id\":\"9894317f-d068-4c25-ba29-9c83d0d65094\",\"content\":\"OpenAI. \\\"Preparing for Future AI Capabilities in Biology.\\\" June 18, 2025. <a href=\\\"https:\\\/\\\/openai.com\\\/index\\\/preparing-for-future-ai-capabilities-in-biology\\\/\\\">https:\\\/\\\/openai.com\\\/index\\\/preparing-for-future-ai-capabilities-in-biology\\\/<\\\/a>.\"},{\"id\":\"ca3afd3a-bfdb-4747-9b60-de3ca69c3f84\",\"content\":\"OpenAI. \\\"Building an Early Warning System for LLM-Aided Biological Threat Creation.\\\" February 14, 2024. <a href=\\\"https:\\\/\\\/openai.com\\\/index\\\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\\\/\\\">https:\\\/\\\/openai.com\\\/index\\\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\\\/<\\\/a>.\"},{\"id\":\"e05cf7f2-dd1b-41eb-ae6d-bf59dda468aa\",\"content\":\"The National Academy of Sciences. <em>Department of Homeland Security Bioterrorism Risk Assessment: A Call for Change<\\\/em>. The National Academies Press, 2008; JASON. <em>Rare Events<\\\/em>. The Mitre Corporation, 2009; Ezell, Barry Charles, et al. \\\"Probabilistic Risk Analysis and Terrorism Risk.\\\" <em>Risk Analysis<\\\/em> 30, no. 4 (2010): 575\\u201389. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1111\\\/j.1539-6924.2010.01401.x\\\">https:\\\/\\\/doi.org\\\/10.1111\\\/j.1539-6924.2010.01401.x<\\\/a>; Aven, Terje, and Ortwin Renn. \\\"The Role of Quantitative Risk Assessments for Characterizing Risk and Uncertainty and Delineating Appropriate Risk Management Options, with Special Emphasis on Terrorism Risk.\\\" <em>Risk Analysis<\\\/em> 29, no. 4 (2009): 587\\u2013600. 
<a href=\\\"https:\\\/\\\/doi.org\\\/10.1111\\\/j.1539-6924.2008.01175.x\\\">https:\\\/\\\/doi.org\\\/10.1111\\\/j.1539-6924.2008.01175.x<\\\/a>.\"},{\"id\":\"5e6558ef-53f7-44b2-8fa9-dfbd2976c852\",\"content\":\"Lugar, Richard G. <em>The Lugar Survey on Proliferation Threats and Responses<\\\/em>. N.p., n.d.\"},{\"id\":\"d83d73bd-1cf8-47e9-a2d8-580b9c8b26e5\",\"content\":\"National Research Council (US) Committee on Assessing Fundamental Attitudes of Life Scientists as a Basis for Biosecurity Education. <em>A Survey of Attitudes and Actions on Dual Use Research in the Life Sciences: A Collaborative Effort of the National Research Council and the American Association for the Advancement of Science<\\\/em>. National Academies Press, 2009. <a href=\\\"http:\\\/\\\/www.ncbi.nlm.nih.gov\\\/books\\\/NBK214757\\\/\\\">http:\\\/\\\/www.ncbi.nlm.nih.gov\\\/books\\\/NBK214757\\\/<\\\/a>.\"},{\"id\":\"9683b28a-3431-48ef-90e4-3116dffacb96\",\"content\":\"Boddie, Crystal, et al. \\\"Assessing the Bioweapons Threat.\\\" <em>Science<\\\/em> 349, no. 6250 (2015): 792\\u201393. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1126\\\/science.aab0713\\\">https:\\\/\\\/doi.org\\\/10.1126\\\/science.aab0713<\\\/a>.\"},{\"id\":\"9daf2aa3-4cd9-453a-8a43-e6d041fe99b7\",\"content\":\"Mellers, Barbara, et al. \\\"Psychological Strategies for Winning a Geopolitical Forecasting Tournament.\\\" <em>Psychological Science<\\\/em> 25, no. 5 (2014): 1106\\u201315. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1177\\\/0956797614524255\\\">https:\\\/\\\/doi.org\\\/10.1177\\\/0956797614524255<\\\/a>; Mellers, Barbara, et al. \\\"The Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics.\\\" <em>Journal of Experimental Psychology: Applied<\\\/em> 21, no. 1 (2015): 1; Chang, Welton, et al. \\\"Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments.\\\" <em>Judgment and Decision Making<\\\/em> 11, no. 
5 (2016): 509\\u201326. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1017\\\/S1930297500004599\\\">https:\\\/\\\/doi.org\\\/10.1017\\\/S1930297500004599<\\\/a>; Colson, Abigail R., and Roger M. Cooke. \\\"Expert Elicitation: Using the Classical Model to Validate Experts' Judgments.\\\" <em>Review of Environmental Economics and Policy<\\\/em> 12, no. 1 (2018): 113\\u201332. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1093\\\/reep\\\/rex022\\\">https:\\\/\\\/doi.org\\\/10.1093\\\/reep\\\/rex022<\\\/a>.\"},{\"id\":\"b7cfd9c7-06e8-4f46-934f-c2f64b21d3e2\",\"content\":\"Pannu, Jaspreet, et al. \\\"AI Could Pose Pandemic-Scale Biosecurity Risks. Here's How to Make It Safer.\\\" <em>Nature<\\\/em> 635, no. 8040 (2024): 808\\u201311. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1038\\\/d41586-024-03815-2\\\">https:\\\/\\\/doi.org\\\/10.1038\\\/d41586-024-03815-2<\\\/a>; Amodei, Dario. \\\"Written Testimony of Dario Amodei, Ph.D. Co-Founder and CEO, Anthropic, For a Hearing on 'Oversight of A.I.: Principles for Regulation' Before the Judiciary Committee Subcommittee on Privacy, Technology, and the Law, United States Senate, July 25th, 2023.\\\" July 25, 2023; Ezell, Barry Charles, et al. \\\"Probabilistic Risk Analysis and Terrorism Risk.\\\" <em>Risk Analysis<\\\/em> 30, no. 4 (2010): 575\\u201389. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1111\\\/j.1539-6924.2010.01401.x\\\">https:\\\/\\\/doi.org\\\/10.1111\\\/j.1539-6924.2010.01401.x<\\\/a>.\"},{\"id\":\"6309c316-a2fa-4d42-befe-f6b215bbb8c6\",\"content\":\"Karger, Ezra, et al. \\\"Reciprocal Scoring: A Method for Forecasting Unanswerable Questions.\\\" SSRN Scholarly Paper no. 3954498. Social Science Research Network, October 31, 2021. <a href=\\\"https:\\\/\\\/doi.org\\\/10.2139\\\/ssrn.3954498\\\">https:\\\/\\\/doi.org\\\/10.2139\\\/ssrn.3954498<\\\/a>.\"},{\"id\":\"f1da717a-14c6-46da-a554-95b01987a117\",\"content\":\"Rozo, Michelle, and Gigi Kwik Gronvall. 
\\\"The Reemergent 1977 H1N1 Strain and the Gain-of-Function Debate.\\\" <em>mBio<\\\/em> 6, no. 4 (2015): 10.1128\\\/mbio.01013-15. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1128\\\/mbio.01013-15\\\">https:\\\/\\\/doi.org\\\/10.1128\\\/mbio.01013-15<\\\/a>.\"},{\"id\":\"71537570-01d4-456a-80f8-cb1bdbd6ec20\",\"content\":\"G\\u00f6tting, Jasper, et al. \\\"Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;A Benchmark.\\\" Preprint, arXiv, April 29, 2025. <a href=\\\"https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2504.16137\\\">https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2504.16137<\\\/a>.\"},{\"id\":\"1c8d540b-3d88-457a-9f79-b25cc3d50f6c\",\"content\":\"OpenAI. \\\"Building an Early Warning System for LLM-Aided Biological Threat Creation.\\\" February 14, 2024. <a href=\\\"https:\\\/\\\/openai.com\\\/index\\\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\\\/\\\">https:\\\/\\\/openai.com\\\/index\\\/building-an-early-warning-system-for-llm-aided-biological-threat-creation\\\/<\\\/a>.\"},{\"id\":\"2a265a93-fd99-449c-a08a-a406be65b52a\",\"content\":\"Mouton, Christopher A., Caleb Lucas, and Ella Guest. <em>The Operational Risks of AI in Large-Scale Biological Attacks<\\\/em>. RAND, 2024.\"},{\"id\":\"a0fd8705-7557-4adb-94bc-0f2e9f71c36a\",\"content\":\"Edison, Rey, Sara Toner, and Kevin Esvelt. \\\"Evaluating the Robustness of Current Nucleic Acid Synthesis Screening.\\\" Preprint, May 8, 2024.\"},{\"id\":\"8d86ae6f-dd09-4e2f-ad43-fbd9b6d52917\",\"content\":\"OpenAI. \\\"OpenAI and Los Alamos National Laboratory Announce Bioscience Research Partnership.\\\" October 7, 2024. <a href=\\\"https:\\\/\\\/openai.com\\\/index\\\/openai-and-los-alamos-national-laboratory-work-together\\\/\\\">https:\\\/\\\/openai.com\\\/index\\\/openai-and-los-alamos-national-laboratory-work-together\\\/<\\\/a>.\"},{\"id\":\"70cfe8a0-f74e-40eb-91ea-a143d229296e\",\"content\":\"G\\u00f6tting, Jasper, et al. 
\\\"Virology Capabilities Test (VCT): A Multimodal Virology Q&amp;A Benchmark.\\\" Preprint, arXiv, April 29, 2025. <a href=\\\"https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2504.16137\\\">https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2504.16137<\\\/a>.\"},{\"id\":\"7d9500e9-3a26-4a7f-9330-c719323e28da\",\"content\":\"Anthropic. \\\"Activating AI Safety Level 3 Protections.\\\" May 22, 2025. <a href=\\\"https:\\\/\\\/www.anthropic.com\\\/news\\\/activating-asl3-protections\\\">https:\\\/\\\/www.anthropic.com\\\/news\\\/activating-asl3-protections<\\\/a>.\"},{\"id\":\"a21f9ce5-e570-4f2d-a0cb-b1a805a65644\",\"content\":\"OpenAI. <em>OpenAI O3 and O4-Mini System Card<\\\/em>. OpenAI, 2025.\"},{\"id\":\"caac5d2d-3c79-4e42-939d-00fd006b1c69\",\"content\":\"Carter, Sarah, et al. <em>The Convergence of Artificial Intelligence and the Life Sciences<\\\/em>. Nuclear Threat Initiative, 2023; Wheeler, Nicole E. \\\"Responsible AI in Biotechnology: Balancing Discovery, Innovation and Biosecurity Risks.\\\" <em>Frontiers in Bioengineering and Biotechnology<\\\/em> 13 (2025): 1537471. <a href=\\\"https:\\\/\\\/doi.org\\\/10.3389\\\/fbioe.2025.1537471\\\">https:\\\/\\\/doi.org\\\/10.3389\\\/fbioe.2025.1537471<\\\/a>; Drexel, Bill, and Caleb Withers. <em>AI and the Evolution of Biological National Security Risks<\\\/em>. Center for a New American Security, 2024; Executive Office of the President. \\\"Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.\\\" <em>Federal Register<\\\/em>, November 1, 2023. <a href=\\\"https:\\\/\\\/www.federalregister.gov\\\/documents\\\/2023\\\/11\\\/01\\\/2023-24283\\\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence\\\">https:\\\/\\\/www.federalregister.gov\\\/documents\\\/2023\\\/11\\\/01\\\/2023-24283\\\/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence<\\\/a>.\"},{\"id\":\"2aa34c0f-8ef6-430e-a896-1c7ae26c934b\",\"content\":\"Revill, James, and Catherine Jefferson. 
\\\"Tacit Knowledge and the Biological Weapons Regime.\\\" <em>Science and Public Policy<\\\/em> 41, no. 5 (2014): 597\\u2013610. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1093\\\/scipol\\\/sct090\\\">https:\\\/\\\/doi.org\\\/10.1093\\\/scipol\\\/sct090<\\\/a>.\"},{\"id\":\"387dae06-fcf2-4ae2-9775-be50218ab77e\",\"content\":\"Kahneman, Daniel, and Amos Tversky. \\\"Prospect Theory: An Analysis of Decision under Risk.\\\" <em>Econometrica<\\\/em> 47, no. 2 (1979): 263\\u201391. <a href=\\\"https:\\\/\\\/doi.org\\\/10.2307\\\/1914185\\\">https:\\\/\\\/doi.org\\\/10.2307\\\/1914185<\\\/a>.\"},{\"id\":\"6cf07748-4270-4896-b284-b7cd411e11cb\",\"content\":\"Foster, Kenneth R., Paolo Vecchia, and Michael H. Repacholi. \\\"Science and the Precautionary Principle.\\\" <em>Science<\\\/em> 288, no. 5468 (2000): 979\\u201381. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1126\\\/science.288.5468.979\\\">https:\\\/\\\/doi.org\\\/10.1126\\\/science.288.5468.979<\\\/a>.\"},{\"id\":\"7af9722a-c0c5-41a3-98b1-126ed0e9116e\",\"content\":\"Brixi, Garyk, et al. \\\"Genome Modeling and Design across All Domains of Life with Evo 2.\\\" Preprint, bioRxiv, February 21, 2025. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1101\\\/2025.02.18.638918\\\">https:\\\/\\\/doi.org\\\/10.1101\\\/2025.02.18.638918<\\\/a>; Callaway, Ewen. \\\"DeepMind's New AlphaGenome AI Tackles the 'Dark Matter' in Our DNA.\\\" <em>Nature<\\\/em>, ahead of print, June 25, 2025. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1038\\\/d41586-025-01998-w\\\">https:\\\/\\\/doi.org\\\/10.1038\\\/d41586-025-01998-w<\\\/a>.\"},{\"id\":\"95922dc0-883a-4fa4-a21a-7fa624963af7\",\"content\":\"Sandbrink, Jonas B. \\\"Artificial Intelligence and Biological Misuse: Differentiating Risks of Language Models and Biological Design Tools.\\\" Preprint, arXiv, December 23, 2023. <a href=\\\"https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2306.13952\\\">https:\\\/\\\/doi.org\\\/10.48550\\\/arXiv.2306.13952<\\\/a>; Bloomfield, Doni, et al. 
\\\"AI and Biosecurity: The Need for Governance.\\\" <em>Science<\\\/em> 385, no. 6711 (2024): 831\\u201333. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1126\\\/science.adq1977\\\">https:\\\/\\\/doi.org\\\/10.1126\\\/science.adq1977<\\\/a>.\"},{\"id\":\"766d73e6-cc03-43a7-a3e8-c12d50a48ce0\",\"content\":\"Koblentz, Gregory D. \\\"Predicting Peril or the Peril of Prediction? Assessing the Risk of CBRN Terrorism.\\\" <em>Terrorism and Political Violence<\\\/em> 23, no. 4 (2011): 501\\u201320. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1080\\\/09546553.2011.575487\\\">https:\\\/\\\/doi.org\\\/10.1080\\\/09546553.2011.575487<\\\/a>.\"},{\"id\":\"0a58a72a-912e-4297-8a7d-6591d731d7ba\",\"content\":\"Chang, Welton, et al. \\\"Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments.\\\" <em>Judgment and Decision Making<\\\/em> 11, no. 5 (2016): 509\\u201326. <a href=\\\"https:\\\/\\\/doi.org\\\/10.1017\\\/S1930297500004599\\\">https:\\\/\\\/doi.org\\\/10.1017\\\/S1930297500004599<\\\/a>.\"}]"},"research_type":[4],"class_list":["post-767","research","type-research","status-publish","has-post-thumbnail","hentry","research_type-working-paper"],"acf":[],"yoast_head":"<title>Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards &#8211; Forecasting Research Institute<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards &#8211; Forecasting Research Institute\" \/>\n<meta property=\"og:description\" content=\"This forecasting study on biological risks from large language models (LLMs) examined expert views on AI-enabled 
biosecurity threats. The study saw 46 biosecurity and biology experts, along with 22 superforecasters, predict how advancing LLM capabilities might increase the risk of a human-caused epidemic.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk\" \/>\n<meta property=\"og:site_name\" content=\"Forecasting Research Institute\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-11T16:42:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2025\/09\/FRI-illustration-library-3.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1232\" \/>\n\t<meta property=\"og:image:height\" content=\"928\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk\",\"name\":\"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards &#8211; Forecasting Research 
Institute\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2025\\\/09\\\/FRI-illustration-library-3.jpg\",\"datePublished\":\"2025-07-01T12:00:00+00:00\",\"dateModified\":\"2026-05-11T16:42:50+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk#primaryimage\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2025\\\/09\\\/FRI-illustration-library-3.jpg\",\"contentUrl\":\"https:\\\/\\\/forecastingresearch.org\\\/wp-content\\\/uploads\\\/2025\\\/09\\\/FRI-illustration-library-3.jpg\",\"width\":1232,\"height\":928},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/research\\\/llm-enabled-biorisk#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/forecastingresearch.org\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/forecastingresearch.org\\\/#website\",\"url\":\"https:\\\/\\\/forecastingresearch.org\\\/\",\"name\":\"Forecasting Research 
Institute\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/forecastingresearch.org\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>","yoast_head_json":{"title":"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards &#8211; Forecasting Research Institute","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk","og_locale":"en_US","og_type":"article","og_title":"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards &#8211; Forecasting Research Institute","og_description":"This forecasting study on biological risks from large language models (LLMs) examined expert views on AI-enabled biosecurity threats. 
The study saw 46 biosecurity and biology experts, along with 22 superforecasters, predict how advancing LLM capabilities might increase the risk of a human-caused epidemic.","og_url":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk","og_site_name":"Forecasting Research Institute","article_modified_time":"2026-05-11T16:42:50+00:00","og_image":[{"width":1232,"height":928,"url":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2025\/09\/FRI-illustration-library-3.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk","url":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk","name":"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards &#8211; Forecasting Research Institute","isPartOf":{"@id":"https:\/\/forecastingresearch.org\/#website"},"primaryImageOfPage":{"@id":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk#primaryimage"},"image":{"@id":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk#primaryimage"},"thumbnailUrl":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2025\/09\/FRI-illustration-library-3.jpg","datePublished":"2025-07-01T12:00:00+00:00","dateModified":"2026-05-11T16:42:50+00:00","breadcrumb":{"@id":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk#primaryimage","url":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2025\/09\/FRI-illustration-library-3.jpg","contentUrl":"https:\/\/forecastingresearch.org\/wp-content\/uploads\/2025\/09\/FRI-illustration-library-3.jpg","width":1232,"height":928},{"@type":"BreadcrumbList","@i
d":"https:\/\/forecastingresearch.org\/research\/llm-enabled-biorisk#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/forecastingresearch.org\/"},{"@type":"ListItem","position":2,"name":"Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards"}]},{"@type":"WebSite","@id":"https:\/\/forecastingresearch.org\/#website","url":"https:\/\/forecastingresearch.org\/","name":"Forecasting Research Institute","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/forecastingresearch.org\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/767","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research"}],"about":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/types\/research"}],"version-history":[{"count":80,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/767\/revisions"}],"predecessor-version":[{"id":2216,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research\/767\/revisions\/2216"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/media\/857"}],"wp:attachment":[{"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/media?parent=767"}],"wp:term":[{"taxonomy":"research_type","embeddable":true,"href":"https:\/\/forecastingresearch.org\/api\/wp\/v2\/research_type?post=767"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}