Why we recommend risk assessments over evaluations for AI-enabled biological tools (BTs)

Date: Mar 27, 2024
Topic/Area: Biosecurity

As part of our work to identify the three most beneficial next steps that the UK Government can take to reduce the biological risk posed by BTs, our team reflected on where the approach to narrow, specialised tools will need to differ from existing approaches to mitigating the risks from frontier AI.
In this post, we outline why comprehensive risk assessments—which draw on literature and stakeholder engagement to assess these tools’ capabilities—are an effective and feasible alternative to conducting evaluations.
For more detail, see our March 2024 policy paper, How the UK Government should address the misuse risk from AI-enabled biological tools.
Model evaluations are direct tests of a model’s performance that can be done in a number of ways. In the context of biological risks of frontier models, two evaluation approaches have received particular attention:
(i) automated evaluations: quantified tests that assess model performance without the need for humans to interact with models; and
(ii) red teaming: individuals or teams with different levels of expertise probe models directly to attempt to elicit harmful information.
These evaluations are an important component of both the UK AI Safety Institute (AISI) and leading AI companies’ ongoing, extensive efforts to assess risks from frontier models. They also serve as a valuable mechanism through which to implement other mitigation measures: leading AI companies have agreed, through voluntary commitments, to allow AISI to evaluate their models before they are deployed and to address identified issues.
Despite the central role of UK Government-led model evaluations for frontier models, it will be challenging in the near-term to establish analogous evaluations for BTs.
Challenge 1: It is likely impractical to design and build evaluations suitable for the range of BTs available.
BTs encompass a broad range of highly specialised tools that perform many specific functions – with different inputs, architectures and outputs – and require significant technical skill to use (see our previous work on Understanding AI-facilitated Biological Weapon Development). These differences mean that the design of BT evaluations could be very different from the design of current frontier model evaluations. For example, given a protein design tool, one might develop evaluations to test if the tool could design novel toxins, but these evaluations would not be applicable to genome assembly tools, which do not design proteins. Even for frontier models with chatbot-style interfaces, differences between models can make implementing evaluations challenging. For BTs, this will likely be even more difficult due to the advanced technical skills required to use a given model effectively, and the differences in the specific expertise needed to use different models.
Challenge 2: Even if the Government were to focus on developing evaluations only for the riskiest BTs, identifying which tools those are would itself be challenging:
(A) Factors that are being used to select which frontier models to evaluate – developer characteristics or compute – will not be suitable. State-of-the-art BTs are developed by a broad range of stakeholders across academia and industry, in contrast to frontier models, where state-of-the-art development is concentrated among several leading AI companies. This makes it less clear which of the many developers will create BTs that require evaluation pre-deployment. Some state-of-the-art BTs require fairly limited training compute, so training compute is less helpful for identifying leading BTs than it is for identifying frontier models. AlphaFold-2, for example, used approximately 3 × 10²¹ FLOP of training compute (a rough order-of-magnitude comparison is sketched after point (B) below).
(B) We do not yet understand the relevant threat models well enough to identify models to evaluate based on the risk they present. Different BTs enable different parts of the bioweapon development risk chain. Protein design tools could enable actors to design more dangerous biological agents, whereas experimental simulation tools could reduce the amount of agent testing required. It is unclear which steps in the bioweapons development chain are the greatest bottleneck to bioweapons development, and therefore most concerning for BTs to enable. The risk may also differ across actors and threat models; for example, lower-resourced actors could be better enabled by tools that reduce resources needed to build known pathogens, whereas well-resourced actors might be better enabled by tools that improve novel agent design.
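For illustration only, here is a back-of-the-envelope comparison of AlphaFold-2’s training compute with frontier-scale compute. The AlphaFold-2 figure is the one cited above; the frontier-scale figure of roughly 10²⁵ FLOP is our assumed order of magnitude for recent frontier training runs, not a number from the paper.

```python
# Illustrative back-of-the-envelope comparison only. ALPHAFOLD2_FLOP is the
# approximate figure cited in this post; FRONTIER_FLOP is an assumed order of
# magnitude for recent frontier-model training runs, not a figure from the post.

ALPHAFOLD2_FLOP = 3e21   # approximate AlphaFold-2 training compute (from the post)
FRONTIER_FLOP = 1e25     # assumed rough scale of recent frontier training runs

ratio = FRONTIER_FLOP / ALPHAFOLD2_FLOP
print(f"AlphaFold-2 used roughly 1/{ratio:,.0f} of frontier-scale training compute,")
print("so a compute threshold calibrated to frontier models would not capture it.")
```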
Risk assessment based on literature and expert engagement should be done across a broad range of BT sub-categories, and may identify high-risk sub-categories. If suitable information for decision-making cannot be gathered from the literature and expert engagement, these high-risk sub-categories may warrant further evaluation. Models could then be red-teamed to attempt to elicit harmful information, and the results could inform the design of repeatable, automated tests (automated evaluations) for future models. For example, if red-teaming finds that a model can provide harmful information, an automated evaluation could be built to measure the model’s ability to provide that information in the future. Automated evaluations may in turn identify model capabilities that warrant closer scrutiny through red-teaming: if an automated evaluation shows that a model can provide harmful information in one domain, red-teamers might probe the model for similar information in another important domain.
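To make the red-teaming-to-automated-evaluation step concrete, the minimal Python sketch below shows one way a red-team finding might be turned into a repeatable test. Every name in it (query_tool, the prompts, the harm markers) is a hypothetical placeholder; it is not a real BT interface or an evaluation proposed in the paper.

```python
# Minimal sketch only: turning a (hypothetical) red-team finding into a
# repeatable automated evaluation. All names, prompts and markers are
# placeholders; this is not a real BT interface or an evaluation from the paper.

from typing import Callable, List


def elicitation_rate(query_tool: Callable[[str], str],
                     prompts: List[str],
                     harm_markers: List[str]) -> float:
    """Fraction of prompts whose output contains any marker string associated
    with the harmful information previously elicited during red-teaming."""
    hits = sum(
        1 for prompt in prompts
        if any(marker in query_tool(prompt).lower() for marker in harm_markers)
    )
    return hits / len(prompts)


if __name__ == "__main__":
    # Placeholders standing in for prompts and content markers identified by
    # red-teamers; real ones would be held securely, not published.
    red_team_prompts = ["placeholder prompt 1", "placeholder prompt 2"]
    harm_markers = ["placeholder marker"]

    def dummy_tool(prompt: str) -> str:
        # Stand-in for a call to the tool under evaluation.
        return "benign placeholder response"

    rate = elicitation_rate(dummy_tool, red_team_prompts, harm_markers)
    print(f"Elicitation rate on red-team-derived prompts: {rate:.2f}")
    # A non-zero rate for a new model version would flag it for further
    # red-teaming, mirroring the feedback loop described above.
```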
Although risk assessments based on scientific literature and expert engagement could help to inform future evaluations, it remains unclear whether conducting evaluations will be valuable or advisable.
As such, we recommend that the need for evaluations be determined as risk assessments based on literature and expert engagement are developed and conducted.