
Beyond the binary

A nuanced path for open-weight advanced AI

Executive summary

Open-weight advanced AI models—systems whose parameters are freely available for download and adaptation—are reshaping the global AI landscape. Their proliferation is accelerating both opportunity and risk, as well as stretching existing governance frameworks to their limits.

These models are rapidly closing the performance gap with closed alternatives. They enable breakthrough research, broaden access to powerful tools, and drive innovation globally. But once released, they cannot be recalled. Others can fine-tune or repurpose them to bypass built-in safeguards—posing systemic risks that existing regulations and technical safeguards cannot fully contain.

This report rejects the binary of “open” versus “closed.” Instead, it proposes a tiered, safety-anchored approach to model release. Openness should be based on rigorous risk assessment and demonstrated safety—not ideology, precedent, or commercial pressure. Models should be opened only when there is credible evidence that risks can be effectively mitigated.

But we have to think beyond technical safeguards. Building public understanding, institutional literacy, and responsive governance capacity is essential to ensure that openness—when justified—translates into safe and accountable deployment.

To advance this agenda, the report outlines practical recommendations for key actors, which we broadly group into three categories:

Builders & Enablers
Who this includes: AI developers, funders, investors, open-source communities…
What this is: Actors that develop advanced AI systems or provide the capital, infrastructure, and platforms that enable their creation, distribution, and adoption – open or closed.

Evaluators & Standard-Setters
Who this includes: AI Safety Institutes, experts and specialists in AI or adjacent domains, international oversight bodies…
What this is: Institutions and experts that assess systems, define risks and safety thresholds and develop frameworks both in AI and adjacent fields like biosecurity, cybersecurity and more.

Implementers & Enforcers
Who this includes: Governments, security agencies, public institutions
What this is: Public-sector bodies responsible for deploying safeguards, regulating model use, investing in resilience, and translating safety standards into policy and societal preparedness.

These interventions are not about restricting open innovation—they’re about securing it. The goal is to enable openness in advanced AI where risks can be responsibly managed, and to pair that with concrete investments in public and institutional readiness. Without the ability to monitor, evaluate, and respond to misuse, even well-intentioned openness can undermine safety.

What are model weights?

Model weights[1] are the fundamental numerical parameters that define a model’s behavior by adjusting how it prioritizes different aspects of its input data. Because weights capture what the model has learned, open-weight models—where these weights are publicly released—are central to debates on openness in AI, as they allow others to reuse, modify, or adapt the models.
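
To make this concrete, here is a minimal sketch (in Python, using the Hugging Face transformers library) that downloads a small open-weight model and inspects its raw parameters. The model name "gpt2" is only a convenient example; any openly hosted model's weights can be read, copied, or modified in the same way.

```python
# A minimal sketch: downloading an open-weight model and inspecting its weights.
# "gpt2" is used purely as a small, widely mirrored example of an open-weight model.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")  # roughly 124 million for GPT-2 small

# Each weight is just a tensor of numbers that anyone can read, copy, or modify.
first_name, first_tensor = next(iter(model.state_dict().items()))
print(first_name, first_tensor.shape, first_tensor.flatten()[:5])
```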

Introduction

As more capable AI models are released seemingly every other week, a narrative is forming of a cat-and-mouse game between open and closed approaches. This plays out on two levels: the technical level, where open-weight models catch up with the capabilities of closed systems within a few months, and the broader, decades-old tension[2] between open-source and proprietary development models, now re-emerging in the AI landscape with much higher stakes.

While capabilities race ahead, governance is still trailing behind—particularly when it comes to how, when, and whether to release powerful models with open weights. Openness is often treated as a goal in itself, even when the conditions for doing so safely are unclear or unproven.

This report proposes a different path forward: a tiered approach to open-weight release, where decisions about model openness are based not on ideology or precedent, but on demonstrated safety and contextual risk. Such a framework would allow more capable models to be shared under the right conditions—enabling scientific and public value—while withholding or restricting access when credible threats of misuse exist.

To support this shift, we begin by assessing the current landscape of open-weight advanced AI, including the technical, regulatory, and institutional gaps that define it. We then outline actionable recommendations across key stakeholder groups that together could make selective openness both safer and more sustainable. If implemented, these measures can help ensure that today’s choices do not foreclose the possibility of safe openness tomorrow—when models may be both significantly more capable and harder to govern.

Current landscape of open-weight advanced AI

A rapidly changing frontier

The pace of advanced AI model releases—that is, highly capable general-purpose AI models[3]—continues to accelerate dramatically. In early 2025, OpenAI[4] and DeepSeek[5] each announced comparable models with strong software-engineering capabilities. OpenAI’s model was closed-source, while DeepSeek’s was open-weight. Just months later, OpenAI[6] and Anthropic[7] released new closed models that significantly outperformed their predecessors. Shortly after that, DeepSeek rolled out a major update[8] to its R1 open-weight model, bringing its performance back into close range of the state-of-the-art closed models. These developments suggest a fast-paced catch-up dynamic, with open-weight and closed models advancing in a tight loop.

Benefits in the balance

The capability gap between open and closed models, while still present, has recently narrowed. A 2024 report by Epoch AI found[14] that open-weight models lag behind the most advanced closed models by approximately 5 to 22 months depending on the task, and it expects this gap to close rapidly. Since Epoch’s publication, new model releases have sent mixed signals – with DeepSeek indeed closing the gap and Meta’s Llama 4 underperforming expectations – but open-weight models continue to gain significant popularity. Hugging Face[15], a central repository where open models are typically published, hosts more than 200,000 text-generation models. As of July 2025, 5 of the top 10 most downloaded text-generation models are general-purpose advanced AI models released within the past year, demonstrating how quickly these models are diffusing.

This diffusion suggests potential benefits, though specific studies on the economic impact of open-weight advanced AI models are limited. We can infer possible value from related research: open-source software generated an estimated €65-95 billion annually for EU GDP according to a 2021 European Commission study[16], while a 2024 econometric study[17] found that open-source software is expected to contribute approximately 2.2% to national GDPs globally in the long run. In particular, an annual report[18] by GitHub, a platform widely used by developers, documented nearly 150,000 open generative AI projects on their platform in 2024, representing 98% growth compared to 2023. The report also highlights a surge in contributors from Asia, Africa, Latin America and many emerging economies compared to 2023, attributing this indirectly to the availability of open AI tools.

While comprehensive evidence on open-weight advanced AI’s specific contributions to the economy and society remains limited, these parallel trends suggest similar positive impacts may emerge as these models are adopted more.

Risks in plain sight

However, as these powerful capabilities diffuse widely, so do their potential risks. Our previous research has shown that advanced AI models exhibit dual-use capabilities[19] that can pose increasingly serious risks as these models grow more capable. Open-weight releases can lower[20] the barrier to misuse for well-resourced actors. Moreover, once these models are released, they cannot be recalled or contained.

Why is it so hard to release open-weight AI models safely?

Open-weight AI models come with no undo button. Once published, their weights—essential components that determine the model’s behavior—can be downloaded, copied, and redistributed freely and indefinitely. That raises critical questions: if these weights are just arrays of numbers, what makes them potentially dangerous once they are released? And can we not build some sort of safety mechanism around these numbers to prevent misuse?

The answer lies in the model’s architecture and how easily its behavior can be changed. When model weights are openly available, individuals can bypass built-in safeguards—a process known as jailbreaking—or fine-tune the model on new data to alter its outputs with minimal cost and effort. Adversarial evaluations[21] of recent models like DeepSeek’s R1 demonstrate that novel techniques to circumvent safety guardrails are evolving rapidly, keeping pace[22] with the safeguards of the most recent advanced AI models.
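
To illustrate how low the barrier is, the sketch below applies a standard parameter-efficient fine-tune (LoRA) to a placeholder open-weight model on a tiny custom dataset. The model name, dataset contents and hyperparameters are illustrative assumptions; the point is simply that a few dozen lines and modest hardware are enough to alter a released model's behavior.

```python
# A minimal sketch of how easily an open-weight model's behaviour can be altered.
# Model name and data are placeholders; this is illustrative, not a recipe.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # stands in for any downloadable open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach small trainable adapters; only a fraction of the weights are updated,
# which is why fine-tuning is cheap enough to run on consumer hardware.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8))

# A tiny custom dataset of new example texts (placeholder content).
data = Dataset.from_dict({"text": ["example text the model should imitate"] * 64})
data = data.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # after this, the modified weights can be saved and redistributed
```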

Importantly, risk isn’t limited to intentional misuse. A recent study[23] on safety training durability shows that even well-meaning users can accidentally weaken safety systems during fine-tuning. Small, seemingly harmless changes can result in models that behave in undesired ways—yet these failures may go unnoticed, with standard evaluations often failing to flag such regressions. This makes unintended misuse a serious concern, especially as the proliferation of misuse can happen almost immediately[24] after release, as several early examples have demonstrated.

While we have not yet witnessed widespread catastrophic misuse of open-weight models, this absence should not be mistaken for evidence of safety. As the capabilities of these models advance rapidly, waiting for clear evidence of harm before acting could be a costly mistake. Policy and safeguards must evolve just as swiftly, building societal resilience and requiring safety assurances for powerful open releases before the window for meaningful intervention closes.

The International AI Safety Report[25] (released in January 2025, coordinated by the UK Government with contributions from expert delegations across 30 countries) dedicates a chapter to a key risk of open-weight advanced AI models. There is broad consensus that future models are highly likely to significantly assist motivated users with average domain-specific expertise across various threat domains, ranging from novel pathogen creation to automated cyberattacks. Critically, the current lack of preemptive tracking or evaluation means we may only realize we’ve crossed a dangerous threshold after such models have already been released and misused.

Given that these risks are primarily tied to vast and emerging model capabilities, let’s pause and delineate the “open-weight advanced AI” under consideration here from the broader umbrella of “open-weight AI models.” To illustrate the lack of clarity in the landscape, let’s look more closely at the models we mentioned earlier.

Among the top 10 most downloaded open-weight text-generation models on Hugging Face, some are roughly 400 times larger than others, as measured by parameter count. While model size is no longer a perfect measure[26] of an AI system’s capabilities or risks, it still serves[27] as a proxy: substantially larger advanced AI models are often more capable than smaller ones. This disparity amongst the top 10 most downloaded open-weight models indicates substantially different capability and risk profiles that shouldn’t be evaluated under a single “open-weight” umbrella.

The landscape is further complicated by the divergent aims of developers. Looking at the same subset of 10 models, we see evidence of this: one small language model[28] seems merely to aim at optimizing cost and performance, another[29] is released by a company explicitly working towards artificial general intelligence (AGI), while a third comes from a company[30] whose mission is to open the most cutting-edge AI models to all. In this fragmented ecosystem, it is increasingly important for policymakers and governance professionals to delineate open-weight advanced AI models from other AI systems, recognizing their unique benefits and challenges rather than applying one-size-fits-all approaches—and to pursue an openness approach that is grounded in demonstrated safety.

Governance gaps and global responses

Currently, there is no commonly adopted industry standard or universal framework for assessing under what conditions and classifications advanced AI models should be released as open-weight to maximize benefits while minimizing risks. Regulatory frameworks that address both advanced AI capabilities and open-weight distribution remain niche, leaving significant governance gaps.

In the absence of a standard approach, examining the safety practices of individual advanced AI developers can reveal how they approach these concerns internally. CFG’s 2024 assessment[46] found significant disparities across companies—some had implemented robust protocols, while others lacked structured safety frameworks altogether. More recent assessments continue to support this uneven picture. Over time, some AI developers, such as Meta[47], have updated their frameworks, while others have still not published frameworks comparable to their peers’; Mistral, for instance, provides only an input moderation framework[48] without further capability testing. Even though these efforts are fragmented, collaboration[49] between dedicated AI institutes, such as the UK AISI or the EU AI Office, and these developers might lay the building blocks for industry standards.

However, these efforts remain fundamentally voluntary and company-dependent. For instance, Meta recently released[50] their most advanced series of AI models with open-weights, yet provided no public assessment against their own safety framework, which itself includes no commitment to public disclosures. As capabilities advance, this governance gap may become increasingly concerning.

Today, an AI model considered dangerous for open-weight release by one company can be freely published by another with minimal evaluation.

The lack of transparency coupled with the absence of common standards means that, even when safety frameworks exist, there’s no way to verify either their implementation or effectiveness. Without addressing this fundamental accountability challenge, even well-intentioned safety efforts by individual developers may prove insufficient against the global landscape of AI development.

The very near future of open-weight advanced AI

The trajectory of open-weight advanced AI development suggests that increasingly capable models will continue to be released[51] in the near future. At the same time, risk assessments by individual developers reveal substantial inconsistencies that are likely to persist as this acceleration continues—leaving the governance landscape fragmented even while capabilities converge.

The underlying drivers of openness vary. While ideological motivations like embracing open science and democratizing access to technology might play a role, specific strategic considerations might also motivate developers to release open-weight advanced AI models. These include commoditising[52] certain AI capabilities to encourage third-party adoption or indirectly boost demand for other products, and the belief that public scrutiny surfaces[53] safety issues more comprehensively than internal testing alone.

Trade-offs between developer standards and evolving national strategies are likely to shape the trajectory of advanced open-weight AI models. These conversations are evolving in real time, but are already widespread:

  • In China, we see major organisations like Alibaba and DeepSeek primarily pursuing[54] open-weight rather than closed strategies, with plans[55] to release more capable models in the near future. However, questions[56] remain about whether the Chinese government might disapprove of this trajectory, constraining companies’ open release policies to protect strategic national interests.
  • Mistral AI, the leading advanced AI developer in France, recently reconfirmed[57] its strong commitment to open-weight models, positioning this approach as enabling France’s aim[58] to gain sovereignty in AI through international collaborations.
  • The US is currently drafting its national strategic AI plans, which many observers expect to adopt a pro-open approach. However, the US might also try to maintain[59] its edge on advanced AI. This suggests a potential “tier-based” openness policy, where models up to certain capability thresholds may be encouraged to be open-weight, while the most advanced models that represent strategic technological advantages might remain protected as national security assets. This approach is reflected in the positions of various US AI developers.
    • Meta has consistently maintained an ideological stance favouring open-weight releases amongst the major advanced AI developers in the US, releasing[60] its most advanced model, Llama 4, with open weights a few months ago.
    • OpenAI, which has not opened any of its models since GPT-2[61] in 2019, has announced[62] plans to release an open-weight model in the near future.

This rapid proliferation collides with uneven safety standards and practices across the industry. Anthropic, a closed-model developer, applies an internal capability scale and has flagged[63] that its newest model, Opus 4, carries potentially high CBRN risk. The specific tests and thresholds remain undisclosed, but the judgment appears driven by the model’s ability to assist in generating or enabling CBRN threats. Meanwhile, DeepSeek’s open-weight R1, updated in May 2025, outperforms Opus 4 on aggregate capability benchmarks reported[64] in July 2025. Although those benchmarks do not align exactly with Anthropic’s undisclosed risk metrics, the contrast shows how one company can withhold a model for safety reasons while another releases a functionally similar—indeed more capable—system with minimal constraints, producing a patchwork of risk exposure.

OpenAI’s recent actions provide a case study in this tension between openness and safety. When the company first announced plans for a forthcoming open-weight model, they added that they would test this model against their internal Preparedness Framework[65]—a protocol designed to assess risks unique to general-purpose advanced AI, such as assisting with cyberattacks or bioweapon development. Recently, OpenAI announced[66] that they would delay this model’s release, citing safety concerns. OpenAI’s decision may set an important precedent—not only for evaluating advanced models prior to release, but also for applying structured safety testing specifically to an open-weight model, signaling a growing norm of pre-deployment risk assessment. However, this remains a voluntary effort based on an internal, non-standardized framework with no external oversight. Taken together with the Anthropic–DeepSeek contrast, OpenAI’s decision highlights how well-intentioned but ad-hoc assessments cannot provide system-wide assurance when capabilities are converging.

This overall trajectory reveals several critical questions we must address: how to maximize benefits while effectively managing risks, how to prepare for unforeseen consequences despite precautions, and what approaches might yield more effective technical safeguards. While there are no definitive answers to these complex challenges, the path forward requires balanced, multi-faceted strategies.

Open-weight innovation is only as inclusive as its safeguards—without deliberate investment in safety, accessibility risks turning into exposure.

In the following section, we explore concrete approaches to meet these challenges head-on, outlining both immediate steps and long-term frameworks to help ensure that open-weight AI delivers on its promise while minimizing potential harms.

A nuanced approach: How to have safe advanced AI, and open it too

After examining the current landscape and expected trajectories of open-weight advanced AI, the evidence reveals both substantial benefits that should be preserved and risk factors that may evolve as new capabilities emerge.

We need to move beyond the binary of ‘open versus closed’—what’s needed now is a coordinated, multi-layered framework where every stakeholder has a role in making openness not simply open, but safe.

A tiered approach, where openness is granted in proportion to demonstrated safety, can offer that structure.

We recognise that this isn’t straightforward. It demands targeted interventions from both policymakers and technical researchers, addressing gaps in governance mechanisms and the need for technical safeguards. With this in mind, we offer concrete recommendations across complementary domains. While not exhaustive, the policy interventions that enable these measures, and the technical work necessary to achieve them, can allow more advanced open-weight models to be released gradually and responsibly. Importantly, these measures could strengthen the overall safety ecosystem for advanced AI development more broadly, addressing the critical gaps in advanced AI safety[67] that we already observe today.

Below, we introduce these measures along with the specific actions that relevant stakeholders should take to implement them effectively.

Establish and adopt standards for a tiered open-weight release

Developing clear, rigorous and enforceable standards is what makes a tiered approach to open-weight model release feasible—allowing openness only when it can be demonstrated that release is sufficiently safe. Ideally, such standards would be codified through formal global mechanisms—like ISO certifications—but these processes are sluggish in comparison to fast-evolving advanced AI capabilities. In the meantime, we suggest taking meaningful intermediate steps.

The most immediate priority is to understand what open-weight models can actually do when evaluated against specific threat scenarios. This knowledge gap must be addressed urgently, as model capabilities continue to outpace our current risk assessment frameworks.

For evaluators & standard-setters

1. AI Safety Institutes

AI Safety Institutes (AISIs) are uniquely positioned to lead efforts on developing risk thresholds and clearly documented testing methodologies, creating a foundation for making informed decisions on open-weight releases. The UK AI Security Institute’s safety case[68] approach provides an example of a structured framework that combines precise safety claims, supporting evidence, and logical arguments to demonstrate that an AI system meets defined safety standards—particularly valuable when assessing whether open-weight releases meet appropriate safety thresholds. The EU AI Office’s newly forming network of model evaluators[69], though announced with a primary focus on regulatory enforcement, could extend beyond that remit to develop specific frameworks for open-weight models, especially to ensure that the EU AI Act does not create a two-tier system where closed models face rigorous scrutiny while similarly capable open-weight models are released with minimal oversight.

It is unclear whether all AISIs will prioritize these efforts similarly, as each institute operates[70] with distinctly different priorities and mandates. The UK has demonstrated strong dedication to frontier AI safety, but Singapore’s AISI appears more focused on application-specific outcomes, and the future direction of the American AISI under the new administration remains uncertain[71]. Given these divergent national priorities, international collaboration between AISIs becomes even more crucial. After all, advanced open-weight models can be downloaded and deployed anywhere, including in countries without frontier AI development capabilities, so the problem of AI risk is inherently global. This demands some level of aligned risk assessment across jurisdictions. The International AI Safety Network[72] holds significant promise in this regard. By coordinating expertise and resources across institutes, it could help establish common standards and evaluation practices, benefiting both open and closed models. International collaboration will be essential to create robust, globally applicable safety frameworks and to ensure that the oversight of advanced AI models is not fragmented, but resilient and adaptive across borders.

2. Experts and specialists in AI-adjacent domains

Domain experts, especially in areas where misuse potential is acute, such as chemical weapons proliferation or critical infrastructure management, should be encouraged to take an active role in building institutional knowledge on how AI misuse risks are understood, evaluated, and governed. Their engagement should go beyond one-off consultations; instead, they should work in tandem with AISIs to develop robust threat models, benchmarks and evaluation standards. Given the breadth and complexity of AI capabilities across scientific domains, a more granular delineation of risks and application contexts is essential to bridge technical realities with clear guidance for policy and governance decisions at the right level of abstraction.

An illustration of this need emerged in two recent reports from the National Academies of Sciences[73] and the RAND Corporation[74], which examined biological misuse risks from different vantage points. The National Academies focused on biological design tools—specialized AI-enabled systems such as protein engineering platforms—which, the report finds, do not currently possess the capacity to autonomously produce pandemic-level pathogens. General-purpose AI models and scientific LLMs are not examined in depth; instead, the report briefly acknowledges that such models may pose concerns in the future. In contrast, the RAND report mainly evaluated the misuse potential of the most recent advanced LLMs, warning that they are rapidly approaching expert-level performance in biology. RAND examines in detail the technical reasons and developments that are rapidly increasing the plausibility of real-world misuse by capable actors. Both reports agree that today’s AI systems do not yet enable fully automated biothreat creation and still require substantial bioscience expertise – on the face of it, a reassuring consensus. However, the two reports prioritise different parts of the AI-biology ecosystem and present the risks with very different levels of urgency. This contrast underscores how wide the threat landscape can be—even within a narrow domain—and highlights the importance of mapping individual risk areas clearly so that policy responses stay proportionate. In practice, these contrasting framings can imply different sets of actions—one area may call mainly for continued monitoring, while another may need more proactive prevention—shaping how institutions define and address AI-enabled biological threats.

Because different areas of application pose distinct types of risks and decision-making challenges, governance structures must be equipped to respond with appropriate nuance and contextual understanding. One promising approach would be to embed scientific governance expertise into the emerging AI safety infrastructure. International bodies such as the Biological Weapons Convention Implementation Support Unit and the Organisation for the Prohibition of Chemical Weapons (OPCW) offer models for systematically integrating technical expertise into governance—such as the OPCW’s Scientific Advisory Board[75], which regularly assesses advances in chemistry that could impact treaty compliance, including new synthesis technologies and dual-use chemical manufacturing trends. Building similar structures within AISIs could help ensure that AI safety evaluations remain grounded in evolving technical realities rather than abstract assessments.

Alternatively—or in parallel—governments could build secure national frameworks that pair AISIs with national security agencies. Through this model, AISIs would collaborate closely with relevant intelligence and biosecurity bodies, using classified data and operational expertise to shape evaluations. AISIs would then act as trusted intermediaries, translating sensitive findings into structured guidance for international coordination and regulatory development, without exposing critical vulnerabilities publicly.

In either approach, the goal is clear: domain expertise must move from the margins to the center of AI safety governance, helping to institutionalize a mature and trusted system for evaluating emerging threats.

For builders & enablers – Open-weight AI developers and providers

Open-weight developers must embed accountability into their release processes, going beyond licensing terms, as those often face significant enforcement challenges. When Meta’s open-weight models were allegedly used[76] for military applications despite explicit prohibitions in its licensing terms, it became clear that legal protections offer little practical security once model weights are publicly available. Adequate accountability should involve adherence to evolving, state-of-the-art risk-assessment standards before releasing any model.

This does not imply that existing models must be withdrawn—many may still be considered low-risk under a credible standard. However, as models grow more capable, developers should be ready with a staged-release playbook. For most systems, open access may remain appropriate; but for frontier models that exceed defined safety thresholds yet offer clear scientific benefit, restricted channels such as verified-access programs can be used to grant downloads only to credentialed researchers. These access frameworks, paired with adequate risk categorisation, can provide stronger safeguards—drawing from practices in biomedical data repositories like dbGaP[77] or the UK Biobank[78], which ensure that only trusted users gain access and are held to appropriate security standards to protect against leaks or misuse.
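
As a simple illustration of what such a staged-release playbook could encode, the hypothetical sketch below maps pre-release evaluation scores to a release tier. The threat categories, threshold values and tier names are assumptions for illustration only, not an established standard.

```python
# A hypothetical staged-release decision rule; categories, thresholds and tiers
# are illustrative placeholders, not an agreed industry standard.
from dataclasses import dataclass

@dataclass
class EvalResult:
    category: str   # e.g. "cyber", "cbrn", "autonomy"
    score: float    # normalised 0-1 score from a pre-release evaluation

# Illustrative per-category thresholds above which open release is not advised.
RESTRICTED_THRESHOLD = {"cyber": 0.6, "cbrn": 0.4, "autonomy": 0.7}
WITHHOLD_THRESHOLD = {"cyber": 0.8, "cbrn": 0.6, "autonomy": 0.9}

def release_tier(results: list[EvalResult]) -> str:
    """Map evaluation results to a release tier: open, verified access, or withhold."""
    if any(r.score >= WITHHOLD_THRESHOLD.get(r.category, 1.0) for r in results):
        return "withhold"          # risks exceed any safe openness tier
    if any(r.score >= RESTRICTED_THRESHOLD.get(r.category, 1.0) for r in results):
        return "verified-access"   # e.g. credentialed researchers, sandboxed use
    return "open-weight"           # public release of weights judged acceptable

print(release_tier([EvalResult("cyber", 0.3), EvalResult("cbrn", 0.5)]))  # verified-access
```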

To complement more restrictive frameworks while supporting academic and research use, sandboxed environments can offer a practical middle ground. Such environments allow researchers and developers to interact with models without directly accessing the raw weights. Sandboxes[79] are commonly used to isolate unstable instances of a model from the main system—whether for further testing or to safeguard the privacy of user inputs, such as proprietary datasets. Academic institutions[80] or developer platforms like GitHub[81] already support such approaches, enabling meaningful experimentation and research while maintaining important security boundaries. Together, controlled distribution and sandboxed access offer differentiated exposure: giving high-risk capabilities only to vetted users while allowing broader, lower-risk experimentation through limited-access channels. Although these mechanisms stop short of offering fully open access, they can significantly reduce exposure risks while still fostering productive engagement with advanced model capabilities.
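
One way to picture sandboxed access is an inference gateway that returns a model's outputs without ever exposing its weight files. The sketch below, using FastAPI and a placeholder open-weight model, is a minimal illustration of the pattern; the endpoint name and usage limit are assumptions.

```python
# A minimal sketch of sandboxed access: researchers can query the model through an
# API, but the weights themselves never leave the hosting environment.
# Model name, endpoint and limits are illustrative assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")  # placeholder open-weight model

class Query(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(query: Query):
    # Only generated text is returned; weight files are never served.
    capped = min(query.max_new_tokens, 256)  # crude usage limit for the sandbox
    output = generator(query.prompt, max_new_tokens=capped)[0]["generated_text"]
    return {"completion": output}
```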

It’s also important to recognise that building and maintaining these access structures involves real operational demands—including ongoing investment, monitoring, and infrastructure support. Developers who remain committed to the ideals of openness should be prepared to invest in these systems. In parallel, they can also support the broader ecosystem by promoting the adoption of existing open-weight models, helping communities integrate and use them responsibly. This focus on supporting adoption—rather than releasing increasingly powerful models simply in pursuit of an open-access ideal—can reinforce both safety and innovation.

At the same time, it is worth asking: will capability gains eventually push some models beyond the scope of any safe openness tier—leaving most of the world without access to them? This is not just a hypothetical scenario, but a natural consequence of a robust risk mitigation policy that may come to represent the safest course of action for our societies—especially if supported by multilateral consensus among governments, safety institutions, and technical experts.

Of course, tiered access frameworks can also evolve. For instance, more advanced tracing, watermarking, or provenance technologies could eventually enable the release of more capable models in ways that are safer and more accountable. While current techniques may still be circumvented, this is precisely why we need more research and investment into strengthening these safeguards—an area the next section explores in more detail.

Invest in technical safeguards for open-weight models

To accommodate increasingly advanced open-weight models while managing their risks, developers must go beyond broad commitments to openness and invest in safeguards that align with emerging risk standards. Yet building guardrails that preserve accessibility while preventing misuse is far from straightforward. As capabilities grow, traditional safety measures often fall short, and misuse risks increase in both scale and subtlety. Meeting this challenge requires targeted research into interventions designed specifically for open-weight release—ones that remain effective even when models are widely distributed and potentially modified.

For builders & enablers – Open-weight AI developers

As the first line of defense in open-weight deployment, developers must look beyond access controls to the deeper challenge of what models can do once accessed. Embedding safeguards directly into model architecture is essential to reduce risks of malicious repurposing or escalation.

Recent approaches like Tampering Attack Resistance (TAR[82]) show promise by adding defensive layers that resist removal or disabling while maintaining effectiveness for intended uses. Similarly, emerging methods like Deep-Lock[83] might enable parameter-level encryption, ensuring that models can be modified only when the appropriate decryption key is provided by authorized users. Additionally, machine unlearning[84] represents an important frontier: a method that could selectively erase high-risk behaviors or knowledge from a trained model, while preserving useful capabilities.
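
As a rough illustration of what key-gated, parameter-level protection could look like (not the actual Deep-Lock method), the sketch below serializes a model's weights and encrypts them with a symmetric key, so that the distributed file is unusable without the corresponding decryption step.

```python
# An illustrative sketch of key-gated weights, loosely inspired by the idea of
# parameter-level encryption; this is not the Deep-Lock algorithm itself.
import io
import torch
from cryptography.fernet import Fernet
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder open-weight model

# Serialize the weights, then encrypt them with a symmetric key.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
key = Fernet.generate_key()            # in practice, held by the model provider
encrypted_weights = Fernet(key).encrypt(buffer.getvalue())

with open("weights.enc", "wb") as f:
    f.write(encrypted_weights)

# Only a holder of the key can restore a working model from the distributed file.
decrypted = Fernet(key).decrypt(open("weights.enc", "rb").read())
model.load_state_dict(torch.load(io.BytesIO(decrypted)))
```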

These approaches offer promising directions for securing open-weight releases at the model level—but they remain early-stage and technically demanding.

Real-world viability will require[85] sustained research, performance evaluations, and threat modeling, particularly given that even heavily protected closed models remain susceptible to jailbreaks. Without greater incentives, clearer deployment norms, or structured support for safety-focused innovation, such protections may remain underutilized. For developers committed to releasing open-weight models, these responsibilities must be embraced as part of the openness itself. Making powerful models broadly accessible is a meaningful contribution to research and innovation—but doing so safely requires care, infrastructure, and discipline equal to that ambition.

Even so, this line of work exemplifies the kind of proactive, safety-oriented research we need more of: not just identifying misuse risks, but designing interventions to neutralize them. Unlearning is just one such avenue.

1. Research funders, investors, and open source AI communities

While developers hold primary responsibility for building safeguards into open-weight models, tackling misuse risks is too complex to be solved by any single organisation or research direction. The techniques highlighted above—TAR, Deep-Lock, and machine unlearning—point to promising technical defenses, but they remain early-stage and carry considerable uncertainties. For example, questions remain[86] about whether unlearning can be made scalable, resistant to reactivation, and robust against adversarial retraining. Similarly, recent findings[87] show that even well-meaning users can inadvertently bypass safety guardrails—often without current evaluations detecting it. These limitations reveal how unreliable today’s safeguards can be and underscore the urgent need for new strategies that are not only testable and robust, but also resilient to evolving misuse tactics.

Given these uncertainties, advancing the next generation of safeguards will require broader engagement across the ecosystem. Public and philanthropic research funders, academic institutions, and open research collectives are pivotal in this effort—supporting the foundational work needed to design technical interventions that remain effective under open-weight conditions. Beyond access governance, there is growing demand for research that directly explores how to limit high-risk capabilities post-release, and how to identify which functions most meaningfully raise misuse risk in the wild.

Alongside research, enterprise-grade safety tools like Protect AI’s Guardian platform[88] are becoming essential for securing open-weight model use in critical infrastructure and commercial applications. Guardian scans models for common threats and technical vulnerabilities, such as architectural backdoors and runtime exploits. Its integration[89] with platforms like Hugging Face has already helped identify vulnerabilities in thousands of open-source models. While tools like this cannot prevent every instance of misuse, they provide a crucial layer of defense for organizations relying on open models—reducing exposure, flagging embedded risks, and supporting safer adoption.
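
To give a flavour of what such scanning involves, the simplified sketch below statically inspects a pickle-serialized model file and flags opcodes that can execute arbitrary code at load time. This illustrates the general technique only; it is not Guardian's actual logic.

```python
# A simplified illustration of static model-file scanning; real tools such as
# Guardian apply far more extensive checks than this sketch.
import pickletools

# Pickle opcodes that import and call arbitrary objects at load time -
# a common vector for malicious payloads hidden inside model files.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ"}

def scan_pickle_file(path: str) -> list[str]:
    """Return suspicious opcode names found in a pickle-based model file."""
    findings = []
    with open(path, "rb") as f:
        for opcode, arg, _pos in pickletools.genops(f):
            if opcode.name in SUSPICIOUS_OPCODES:
                findings.append(f"{opcode.name}: {arg!r}")
    return findings

# Example usage on a locally downloaded checkpoint (path is a placeholder).
for finding in scan_pickle_file("model_checkpoint.pkl"):
    print("suspicious construct:", finding)
```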

For investors, supporting the growth and scaling of such technologies is not only an opportunity to meet rising demand in AI safety, but a way to accelerate the transition from experimental safeguards to standard, widely adopted infrastructure across the open-weight development pipeline. For instance, Protect AI is currently being acquired[90] by Palo Alto Networks, while another startup[91] focused on securing AI systems recently completed a successful investment round. These developments reflect strong investor confidence in AI safety-focused ventures and the expanding market for such solutions. That said, it is equally critical to ensure these technologies remain accessible to critical public infrastructure at affordable costs—not only to protect private sector deployments, but to bolster systemic resilience at a national and global level.

Finally, research collectives and transparency advocates must reckon with a central tension: full openness—including the release of weights, training data, and code—does not simply enhance reproducibility, it expands the attack surface. Initiatives like EleutherAI’s[92] GPT models exemplify community-driven transparency and oversight, but typically lack formal release governance or systematic threat modeling. Releasing models alongside training data may aid scientific replication, but it can also enable more capable misuse by lowering the barrier to retraining or repurposing.

Funders and academic institutions should prioritize work that disentangles which components of AI systems contribute most to real-world risk, and under what conditions. This means investing not just in documentation, benchmarking, and interpretability tooling, but also in rigorous frameworks for release evaluation, red teaming pipelines, and sandboxed research access. By resourcing research into selective openness, safety-preserving transparency, and scalable misuse detection, the broader research ecosystem can ensure that openness serves not only innovation—but also security, accountability, and public trust.

With growing interest in open-weight models as public infrastructure, it is imperative that policymakers and industry leaders invest more in developing robust technical guardrails that can be coupled with advancing capabilities, ensuring long-term safety and security. These investments should focus not just on post-release mitigations but on building safety directly into the weights and distribution mechanisms of the models themselves.

Enhance societal preparedness

As open-weight advanced AI models become more capable, societal resilience must advance in parallel. These models, by their broad accessibility and adaptability, lower barriers to misuse and create new vulnerabilities across critical systems. Governance is essential to reducing risks at their source, but even the strongest safeguards cannot eliminate all threats. Preparing for the risks that will still arise is not a concession to failure—it is a recognition of the complexity of the challenge. Building defensive capabilities, strengthening detection systems, and raising public awareness are critical to managing the impacts of misuse when it occurs. The current stage of open-weight AI development offers a narrow window: societies can still prepare deliberately, while threats remain comparatively limited. This preparation must reinforce, not replace, sustained efforts to govern advanced AI responsibly.

While societal preparedness takes many forms[93] and will expand to new areas as advanced AI grows more capable, here are some key approaches we can implement today to build resilience for the future.

For implementers & enforcers

1. Public and private institutions for AI literacy

Public understanding of advanced AI capabilities must grow alongside their availability. As open-weight models become more widely deployed, both the general public and key institutions need a clearer grasp of what these systems can and cannot do. Public-facing literacy initiatives should help users distinguish between realistic threats and inflated narratives, while addressing practical concerns like AI-generated scams, fraud, and deepfakes. These efforts can draw from cybersecurity and media literacy precedents but must now be adapted for the unique attributes of general-purpose AI. Tools like AI Digest[94] can play a valuable role in surfacing new capabilities and informing civil society, not to raise alarm, but to help build informed expectations and support governance grounded in actual risk. This anticipatory mindset can be instrumental in identifying and addressing newly emerging public-safety aspects of advanced AI adoption, such as digital mental health[95].

At the same time, specialized literacy programs must be developed for policymakers and professionals involved in critical infrastructure and security. These audiences require a deeper understanding of how open-weight models could be exploited to escalate cyberattacks or enable dual-use applications in their specific sectors and workstreams. A strong example of this approach is the EU AI Act’s mandate[96] requiring deployers of AI systems to ensure sufficient AI literacy among their personnel. However, it is important that this mandate does not remain superficial, but is implemented comprehensively, especially for critical infrastructure and services, including structured threat modeling, misuse escalation scenarios, and proactive measures to mitigate such risks.

2. National cybersecurity, CBRN and critical infrastructure agencies

National security, cybersecurity, and biosecurity agencies must establish AI-specific incident-monitoring and response teams capable of detecting and countering model-enabled threats. Because open-weight AI is a universal technology, the plans these agencies develop should extend beyond their national AI landscape and keep the global threat landscape in mind—linking domestic systems to cross-border intelligence exchanges, joint drills, and multilateral response protocols.

These teams should track misuse patterns such as AI-generated exploit code, model-assisted vulnerability scanning, dynamic cyber-attacks targeting critical infrastructure, and potential AI-enabled CBRN misuse. Structured national incident-reporting infrastructures must be built to capture, classify, and analyze these incidents systematically, providing early-warning capabilities rather than relying on retrospective analysis. Building resilient national reporting systems is also a necessary foundation for future international coordination. As emphasized by the OECD’s common reporting framework for AI incidents[97], interoperability between national systems will be critical, but domestic detection, classification, and response capabilities must come first.
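
As one way to picture a structured reporting system, the sketch below defines a minimal incident record and an escalation check. The field names, categories and severity scale are illustrative assumptions, not the OECD's actual reporting schema.

```python
# A minimal sketch of a structured AI incident record; fields and categories are
# illustrative assumptions, not the OECD common reporting framework itself.
from dataclasses import dataclass, field
from datetime import datetime, timezone

CATEGORIES = {"cyber", "cbrn", "fraud", "critical-infrastructure", "other"}

@dataclass
class AIIncidentReport:
    reported_by: str            # e.g. a national CSIRT or sector regulator
    category: str               # one of CATEGORIES
    model_involved: str         # e.g. an open-weight model identifier
    description: str
    severity: int               # 1 (low) to 5 (critical), illustrative scale
    cross_border: bool = False  # relevant for multilateral exchange
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def requires_escalation(self) -> bool:
        """Flag incidents that should trigger early-warning or international sharing."""
        return self.severity >= 4 or self.cross_border

incident = AIIncidentReport(
    reported_by="national-csirt",
    category="cyber",
    model_involved="open-weight-llm (placeholder)",
    description="Model-assisted vulnerability scanning observed against a utility network.",
    severity=4,
)
print(incident.requires_escalation())  # True
```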

Recent initiatives, such as the UK’s 2025 AI Cybersecurity Code of Practice[98], demonstrate how national investment in AI-specific security standards can support both domestic resilience and international norm-setting. Regular red-teaming exercises focused on open-weight model misuse must be conducted, simulating not only direct cyberattacks but also complex, multi-domain threat scenarios that blend AI-enabled tactics across information, infrastructure, and critical service domains.

National biosecurity and non-proliferation agencies should likewise strengthen protections around sensitive biological and chemical databases, tighten controls over dual-use research publications, and enhance oversight of laboratory equipment, precursor materials, and acquisition pathways. Scrutiny at these critical junctures is essential to prevent AI capabilities from translating into real-world physical threats. As an additional measure, governments could integrate biosecurity red-team exercises into existing national cybersecurity drills to ensure cross-domain readiness.

For evaluators & standard-setters — International CBRN & biosecurity coordination

International oversight bodies such as the UN BioRisk Working Group[99] and the Organisation for the Prohibition of Chemical Weapons[100] can support information sharing, develop common evaluation benchmarks, and promote global coordination on AI-related CBRN risks. These organisations might have the power to convene AI researchers and technical advisers within their internal networks, such as relevant UN working groups, or they can coordinate across member countries to pool relevant resources—such as experts, rapid-response funding, necessary tooling or data sets. By harmonising threat-assessment methodologies and publishing open guidance on best-practice mitigations, these bodies can help national agencies align their safeguards with emerging technical realities. Early capacity-building efforts like the OPCW-hosted AI and Chemical Safety and Security Management Workshop[101] illustrate the value of this approach and should evolve into a broader, more rigorous agenda with wide participation from member states.

Conclusion

Society deserves more than a binary choice between open and closed AI.

Open-weight models are a powerful driver of innovation, but their release also creates permanent and widely distributed access to advanced capabilities, some of which may be misused in ways that current safeguards are not equipped to prevent. In the current catch-up game between closed and open AI development, governance mechanisms remain underdeveloped, and the risks of misuse are growing less theoretical and more immediate.

The goal is not to shut the door on openness, but to ensure that we open it deliberately—with foresight, caution, and credible safeguards.

The steps outlined in this report reflect good starting points on how to get there. Technical safeguards must be designed to withstand real-world deployment—not just perform in idealized conditions. Release decisions must be tied to concrete, testable safety criteria—not aspirational principles alone. Developers, policymakers, and domain experts need coordinated processes and shared accountability—not disconnected efforts that create blind spots and loopholes. And the wider public must be prepared—not only shielded—through stronger institutional capacity, clearer communication, and a better understanding of how these models may reshape critical systems.

Taken together, these investments enable a tiered approach to open-weight release: one that supports safe access where it is possible, and defers release where it is not. The tiered system is not just a technical fix—it is a principled stance. If a model cannot be released safely, it must not be released at all. This is not a failure of openness, but a sign of responsible governance. Even a carefully designed approach may not guarantee perfect safety—but it offers an actionable path to significantly mitigate risk, grounded in today’s knowledge, and designed for the challenges ahead, including those we may not get another chance to prepare for.

Authors

Alex Petropoulos

Advanced AI Researcher – Policy

Bengüsu Özcan

Advanced AI Researcher

Max Reddel

Advanced AI Director

Endnotes

[1] Lomas, N., Dillet, R. and Wiggers K., From LLMs to hallucinations, here’s a simple guide to common AI terms, May 25, 2025, https://techcrunch.com/2025/05/25/from-llms-to-hallucinations-heres-a-simple-guide-to-common-ai-terms/#weights

[2] Isaac, M., What to Know About the Open Versus Closed Software Debate, The New York Times, May 29, 2024, https://www.nytimes.com/2024/05/29/technology/what-to-know-open-closed-software.html

[3] European Commission, ‘General-purpose AI models in the AI Act – questions and answers’, 2024, https://digital-strategy.ec.europa.eu/en/faqs/general-purpose-ai-models-ai-act-questions-answers (accessed 17 July 2025)

[4] OpenAI, ‘Introducing O3 Mini’, OpenAI Blog, January 31 2025, https://openai.com/index/openai-o3-mini/ 

[5] DeepSeek AI, ‘Announcing DeepSeek R1’, X (formerly Twitter), January 20, 2025, https://x.com/deepseek_ai/status/1881318130334814301, (accessed 17 July 2025)

[6] OpenAI, ‘Introducing OpenAI o3 and o4-mini’, OpenAI Blog, April 16, 2025, https://openai.com/index/introducing-o3-and-o4-mini/

[7] Anthropic, ‘Introducing Claude 4’, Anthropic Blog, May 22, 2025, https://www.anthropic.com/news/claude-4

[8] DeepSeek AI, DeepSeek R1 (0528), Hugging Face, 2025, https://huggingface.co/deepseek-ai/DeepSeek-R1-0528, (accessed 17 July 2025)

[9] “Comparison of free and open-source software licenses,” Wikipedia, https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses (accessed 17 July 2025)

[10] Open Source AI Initiative, “Open Source AI Definition”, https://opensource.org/ai/open-source-ai-definition#:~:text=An%20Open%20Source%20AI%20is,including%20to%20change%20its%20output, (accessed 17 July 2025)

[11] Pomfret, J., and Pang, J., ‘Exclusive: Chinese researchers develop AI model for military use on back of Meta’s Llama’, Reuters, November 1, 2024, https://www.reuters.com/technology/artificial-intelligence/chinese-researchers-develop-ai-model-military-use-back-metas-llama-2024-11-01/

[12] Cottier, B., You, J., Martemianova, N., & Owen, D., ‘How far behind are open models?’, November 4, 2024, https://epoch.ai/blog/open-models-report#most-notable-ai-models-released-between-2019-to-2023-were-open

[13] Vidal, N., ‘There are no “Degrees of Open”: why Openness is binary’, Open Source Initiative, April 8, 2025, https://opensource.org/blog/there-are-no-degrees-of-open-why-openness-is-binary

[14] Cottier et al. (n 12), https://epoch.ai/blog/open-models-report#degrees-of-accessibility

[15] Hugging Face, ‘Models – Text Generation’, Hugging Face, https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads (accessed July 17 2025)

[16] European Commission, ‘Commission publishes study on impact of open source on the European economy’, European Commission Digital Strategy, 2021, https://digital-strategy.ec.europa.eu/en/news/commission-publishes-study-impact-open-source-european-economy, (accessed 17 July 2025)

[17] K. Blind and S. Torben, ‘Open source software and global entrepreneurship’, The Journal of Technology Transfer, April 2024, https://link.springer.com/article/10.1007/s10961-023-09993-x

[18] GitHub, ‘The 2024 State of the Octoverse’, GitHub Annual Report, October 29, 2024, https://github.blog/news-insights/octoverse/octoverse-2024/ 

[19] B. Özcan, ‘Double-Edged Tech: Identifying Dual-Use Concerns in Emerging Technologies’, Centre for Future Generations, 20 January, 2025, https://cfg.eu/double-edged-tech/ 

[20] E. Behrens and B. Özcan, ‘AI Governance Challenges Part 3: Proliferation’, Centre for Future Generations, 18 October, 2024, https://cfg.eu/ai-governance-challenges-part-3-proliferation/ 

[21] Kassianik, P., and Karbasi, A., ‘Evaluating Security Risk in DeepSeek and Other Frontier Reasoning Models’, Cisco Blogs, January 31, 2025, https://blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models

[22] McCauley, C., Yeung, K., Martin, J., and Schulz, K., Novel Universal Bypass for All Major LLMs: The Policy Puppetry Prompt Injection Technique, Hidden Layer, 24 April, 2025, https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/

[23] Zou, A., Zhai, R., Zou, C., Gao, L., Ou, Y., Wei, J., and Raffel, C., On Evaluating the Durability of LLM Guardrails, poster presented at the International Conference on Learning Representations (ICLR), 2024 https://arxiv.org/abs/2404.15881.

[24] Behrens et al. (n 20) https://cfg.eu/ai-governance-challenges-part-3-proliferation

[25] UK Government, ‘International AI Safety Report 2025’, UK Government Publications, 2025, https://www.gov.uk/government/publications/international-ai-safety-report-2025/international-ai-safety-report-2025

[26] Heikkilä, M., MIT Technology Review, ‘Why bigger is not always better in AI’, MIT Technology Review, October 1, 2024, https://www.technologyreview.com/2024/10/01/1104744/why-bigger-is-not-always-better-in-ai/ 

[27] Our World in Data, ‘Scaling Up AI’, Our World in Data, 2024, https://ourworldindata.org/scaling-up-ai, (accessed July 17 2025)

[28] TheBloke, ‘Phi-2-GGUF’, Hugging Face, 2024, https://huggingface.co/TheBloke/phi-2-GGUF, (accessed July 17 2025)

[29] DeepSeek AI, ‘About DeepSeek’, X (formerly Twitter), 2025, https://x.com/deepseek_ai, (accessed July 17 2025)  

[30] Mistral AI, ‘About Us’, Mistral AI Website, 2025, https://mistral.ai/about, (accessed July 17 2025)  

[31] European Commission, ‘The General-Purpose AI Code of Practice published by independent experts’, European Commission Digital Strategy, July 2025, https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai

[32] Herrero, O., ‘France’s Mistral will sign new EU AI code’, Politico Pro, July 10, 2025, https://pro.politico.eu/news/201744

[33] Wikipedia, ‘Executive Order 14110’, Wikipedia, 2025, https://en.wikipedia.org/wiki/Executive_Order_14110 

[34] National Institute of Standards and Technology, ‘AI Risk Management Framework’, NIST, 2023, https://www.nist.gov/itl/ai-risk-management-framework 

[35] White House, ‘Public Comment Invited on Artificial Intelligence Action Plan’, White House Briefings, February 2025, https://www.whitehouse.gov/briefings-statements/2025/02/public-comment-invited-on-artificial-intelligence-action-plan/ 

[36] Krishnan, S., X (formerly Twitter), July 2, 2025, https://x.com/sriramk/status/1940431105347461129 (accessed 17 July 2025)

[37] J. Cassidy, The New Yorker, ‘Is DeepSeek China’s Sputnik Moment?’, The New Yorker, February 3, 2025, https://www.newyorker.com/news/the-financial-page/is-deepseek-chinas-sputnik-moment 

[38] China Law Translate, ‘Interim Measures for the Management of Generative AI Services’, China Law Translate, July 10, 2023, https://www.chinalawtranslate.com/en/generative-ai-interim/

[39] UK Government, ‘AI Safety Summit 2023’, UK Government Events, 2023, https://www.gov.uk/government/topical-events/ai-safety-summit-2023 

[40] AI Safety Institute UK, ‘About AISI’, UK Government, 2025, https://www.aisi.gov.uk/ 

[41] UK Parliament, ‘AI Bill’, UK Parliament Bills, 2025, https://bills.parliament.uk/bills/3942 

[42] Élysée, ‘AI Action Summit’, Élysée Website, February 2025, https://www.elysee.fr/en/sommet-pour-l-action-sur-l-ia 

[43] Behrens, E., Özcan, B., ‘Policy Recommendations for the AI Action Summit Paris’, Centre for Future Generations, 2025, https://cfg.eu/wp-content/uploads/cfg-policy-recommendations-for-the-ai-action-summit-paris.pdf 

[44] Davies, P., ‘Devoid of any meaning: Why experts call the Paris AI Action Summit a missed opportunity’, Euronews, February 2025, https://www.euronews.com/next/2025/02/14/devoid-of-any-meaning-why-experts-call-the-paris-ai-action-summit-a-missed-opportunity

[45] Élysée, ‘Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet’, Élysée Website, February 2025, https://www.elysee.fr/en/emmanuel-macron/2025/02/11/statement-on-inclusive-and-sustainable-artificial-intelligence-for-people-and-the-planet 

[46] Behrens, E., and Özcan, B., ‘Establishing AI Risk Thresholds: A Comparative Analysis Across High-Risk Sectors’, Centre for Future Generations, September 1, 2024, https://cfg.eu/establishing-ai-risk-thresholds-a-comparative-analysis-across-high-risk-sectors/

[47] Meta AI, ‘Meta Frontier AI Framework’, Meta AI Resources, 2025, https://ai.meta.com/static-resource/meta-frontier-ai-framework/ 

[48] Mistral AI, ‘Guardrailing Framework’, Mistral AI Documentation, 2025, https://docs.mistral.ai/capabilities/guardrailing/ 

[49] Alder, M., ‘OpenAI, Anthropic enter AI agreements with US AI Safety Institute’, FedScoop, August 29, 2024, https://fedscoop.com/openai-anthropic-enter-ai-agreements-with-us-ai-safety-institute/

[50] Meta AI, ‘Llama 4: Multimodal Intelligence’, Meta AI Blog, 2025, https://ai.meta.com/blog/llama-4-multimodal-intelligence/ 

[51] Janků, D., Reddel, F., Graabak, J., and Reddel, M., ‘Beyond the AI Hype: A Critical Assessment of AI’s Transformative Potential’, Centre for Future Generations, April 15, 2025, https://cfg.eu/beyond-the-ai-hype/

[52] De Witte, B., HippoAI, ‘Meta’s Strategy for Open-Sourcing Llama: A Detailed Analysis’, HippoGram 27, 2024, https://blog.hippoai.org/metas-strategy-for-open-sourcing-llama-a-detailed-analysis-hippogram-27/ 

[53] Le Wagon, ‘The European AI Renaissance: Mistral and the Open Source Movement’, Le Wagon Blog, March 28, 2025, https://blog.lewagon.com/skills/the-european-ai-renaissance-mistral-and-the-open-source-movement/

[54] Pillay, T., ‘Alibaba Model AI China DeepSeek’, Time Magazine, March 6, 2025, https://time.com/7265415/alibaba-model-ai-china-deepseek/

[55] The Future Media, ‘DeepSeek Expands Open Source AI Strategy with New Code Release’, The Future Media, February 21, 2025, https://thefuturemedia.eu/deepseek-expands-open-source-ai-strategy-with-new-code-release/ 

[56] Mak, R., ‘China’s love for open-source AI may shut down fast’, Reuters, April 2, 2025, https://www.reuters.com/breakingviews/chinas-love-open-source-ai-may-shut-down-fast-2025-04-02/ (accessed April 11, 2025)

[57] Goldman, S., ‘Mistral AI CEO Mensch denies IPO rumors, doubles down on open source strategy European champion’, Fortune, March 20, 2025, https://fortune.com/2025/03/20/mistral-ai-ceo-mensch-denies-ipo-rumors-doubles-down-on-open-source-strategy-european-champion/

[58] Chavez, P., ‘France Pursues an AI Third Way’, Center for European Policy Analysis (CEPA), February 13, 2025, https://cepa.org/article/france-pursues-an-ai-third-way/

[59] Shaw, Z., ‘The National Security Memo on AI: what to expect in Trump 2.0’, Institute for Law & AI, January 2025, https://law-ai.org/the-national-security-memo-on-ai-what-to-expect-in-trump-2-0/

[60] Meta AI, ‘Llama 4: Multimodal Intelligence’, Meta AI Blog, April 5, 2025, https://ai.meta.com/blog/llama-4-multimodal-intelligence/

[61] Wikipedia, ‘GPT-2’, https://en.wikipedia.org/wiki/GPT-2 (accessed July 17, 2025)

[62] Knight, W., ‘OpenAI Sam Altman Announce Open Source Model’, Wired, March 31, 2025, https://www.wired.com/story/openai-sam-altman-announce-open-source-model/

[63] Anthropic, ‘Activating AI Safety Level 3 Protections’, Anthropic, May 22, 2025, https://www.anthropic.com/news/activating-asl3-protections

[64] Artificial Analysis, ‘Artificial Analysis Intelligence Index’, https://artificialanalysis.ai/#artificial-analysis-intelligence-index (accessed July 17, 2025)

[65] OpenAI, ‘Preparedness Framework Version 2’, April 15, 2025, https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf (accessed July 17, 2025)

[66] Maxwell, Z., ‘OpenAI delays the release of its open model, again’, TechCrunch, July 11, 2025, https://techcrunch.com/2025/07/11/openai-delays-the-release-of-its-open-model-again/

[67] Janků, D., Reddel, M., Yampolskiy, R., and Hausenloy, J., ‘We Have No Science of Safe AI’, Centre for Future Generations, November 19, 2024, https://cfg.eu/we-have-no-science-of-safe-ai/

[68] AI Security Institute UK, ‘How can safety cases be used to help with frontier AI safety’, AISI UK, 2025, https://www.aisi.gov.uk/work/how-can-safety-cases-be-used-to-help-with-frontier-ai-safety 

[69] European Commission, ‘EU AI Office hosts in-person workshop towards defining best practices for systemic risk evaluations’, European Commission Digital Strategy, 28 April 2025, https://digital-strategy.ec.europa.eu/en/events/eu-ai-office-hosts-person-workshop-towards-defining-best-practices-systemic-risk-evaluations (accessed July 17, 2025)

[70] Petropoulos, A., ‘The AI Safety Institute Network: Who, What, and How?’, Centre for Future Generations, September 10, 2024, https://cfg.eu/the-ai-safety-institute-network-who-what-and-how/ 

[71] Rao, S., ‘NIST layoffs AI safety CHIPS Act Trump’, Technical.ly, February 27, 2025, https://technical.ly/software-development/nist-layoffs-ai-safety-chips-act-trump/

[72] National Institute of Standards and Technology, ‘Mission Statement – International Network of AISIs’, NIST Documentation, November 2024, https://www.nist.gov/system/files/documents/2024/11/20/Mission%20Statement%20-%20International%20Network%20of%20AISIs.pdf (accessed July 17, 2025)

[73] National Academies of Sciences, Engineering, and Medicine, ‘Assessing and Navigating Biosecurity Concerns and Benefits of Artificial Intelligence Use in the Life Sciences’, NASEM, 2025, https://www.nationalacademies.org/our-work/assessing-and-navigating-biosecurity-concerns-and-benefits-of-artificial-intelligence-use-in-the-life-sciences

[74] Dev, S., et al., ‘Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models’, RAND Corporation Working Paper WR-A3797-1, February 10, 2025, https://www.rand.org/pubs/working_papers/WRA3797-1.html

[75] Organisation for the Prohibition of Chemical Weapons, ‘Scientific Advisory Board’, 2025, https://www.opcw.org/about/subsidiary-bodies/scientific-advisory-board (accessed July 17, 2025)

[76] Pomfret, J., and Pang, J., ‘Chinese researchers develop AI model for military use based on Meta’s Llama’, Reuters, November 1, 2024, https://www.reuters.com/technology/artificial-intelligence/chinese-researchers-develop-ai-model-military-use-back-metas-llama-2024-11-01/

[77] dbGaP, ‘Database of Genotypes and Phenotypes’, National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/gap/ (accessed July 17, 2025)

[78] UK Biobank, ‘Enable Your Research: Apply for Access’, UK Biobank, https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access (accessed July 17, 2025)

[79] The Datasphere, ‘Sandboxes for AI’, The Datasphere, February 11, 2025, https://www.thedatasphere.org/datasphere-publish/sandboxes-for-ai/ 

[80] Harvard University IT, ‘AI Sandbox’, Harvard University, 2025, https://www.huit.harvard.edu/ai-sandbox (accessed July 17, 2025)

[81] Gnoppix, ‘GitHub Introduces AI Sandbox for Developers to Test Top Models’, Gnoppix, 2024, https://gnoppix.org/github-introduces-ai-sandbox-for-developers-to-test-top-models/ 

[82] Knight, W., ‘A New Trick Could Block the Misuse of Open Source AI’, Wired, 2025, https://www.wired.com/story/center-for-ai-safety-open-source-llm-safeguards

[83] Alam, M., Mukhopadhyay, S., and Kundu, S., ‘Deep-Lock: Securely Sharing Machine Learning Models’, arXiv, February 18, 2024, https://arxiv.org/pdf/2008.05966

[84] Henshall, W., ‘Researchers Develop New Technique to Wipe Dangerous Knowledge From AI Systems’, TIME, March 6, 2024, https://time.com/6878893/ai-artificial-intelligence-dangerous-knowledge/

[85] Nagel, J., Lindegaard, M., and Graabak, J., ‘Strengthening AI Trustworthiness: A Framework for Public Confidence in AI Systems’, Centre for Future Generations, May 22, 2025, https://cfg.eu/strengthening-ai-trustworthiness

[86] Ye, H., Guo, J., Liu, Z., Jiang, Y., and Lam, K.-Y., ‘Enhancing AI Safety of Machine Unlearning for Ensembled Models’, Applied Soft Computing, vol. 174 (2025), 113011, https://doi.org/10.1016/j.asoc.2025.113011

[87] Qi, X., Wei, B., Carlini, N., Huang, Y., Xie, T., He, L., Jagielski, M., Nasr, M., Mittal, P., and Henderson, P., ‘On Evaluating the Durability of Safeguards for Open-Weight LLMs’, paper presented at the International Conference on Learning Representations (ICLR), 2025, https://openreview.net/forum?id=fXJCqdUSVG

[88] Protect AI, ‘The Platform for AI Security’, 2025, https://protectai.com/ (accessed July 17, 2025)

[89] Georges, L., and Morgan, S., ‘Hugging Face Teams Up with Protect AI: Enhancing Model Security for the Community’, Hugging Face Blog, October 22, 2024, https://huggingface.co/blog/protectai (accessed July 17, 2025)

[90] Sigalos, M., ‘Palo Alto Networks to buy Protect AI to boost artificial intelligence tools’, CNBC, April 28, 2025, https://www.cnbc.com/2025/04/28/palo-alto-networks-to-buy-protect-ai-to-boost-artificial-intelligence-tools.html

[91] Sabin, S., ‘Virtue AI scores $30M funding from Lightspeed Venture Partners, Walden Catalyst Ventures’, Axios, April 15, 2025, https://www.axios.com/2025/04/15/virtue-ai-lightspeed-walden-catalyst-funding

[92] EleutherAI, ‘Homepage’, EleutherAI, 2025, https://www.eleuther.ai/ (accessed July 17, 2025)

[93] Nagel et al. (n 85), https://cfg.eu/strengthening-ai-trustworthiness

[94] The AI Digest, ‘Homepage’, The AI Digest, https://theaidigest.org/ (accessed July 17, 2025)

[95] ‘Tech & Tonic: Mental Health and Wellbeing in the Digital Age’, panel event, March 24, 2025, https://cfg.eu/event/wired-minds-mental-cracks/

[96] ‘Artificial Intelligence Act – Article 4: Transparency obligations for certain AI systems’, ArtificialIntelligenceAct.eu, 2024, https://artificialintelligenceact.eu/article/4/ (accessed July 17, 2025)

[97] Organisation for Economic Co-operation and Development (OECD), ‘AI Incidents Monitor’, 2025, https://oecd.ai/en/site/incidents (accessed July 17, 2025)

[98] Department for Science, Innovation and Technology and Clark, F., ‘World-leading AI cyber security standard to protect digital economy and deliver Plan for Change’, GOV.UK, January 31, 2025, https://www.gov.uk/government/news/world-leading-ai-cyber-security-standard-to-protect-digital-economy-and-deliver-plan-for-change

[99] United Nations Office for Disarmament Affairs, UN BioRisk Working Group, 2025, https://disarmament.unoda.org/un-biorisk-working-group/

[100] Organisation for the Prohibition of Chemical Weapons, Scientific Advisory Board (n 75), https://www.opcw.org/about/subsidiary-bodies/scientific-advisory-board

[101] Organisation for the Prohibition of Chemical Weapons, ‘AI and Chemical Safety and Security Management: Joint OPCW-China Workshop’, 2025, https://www.opcw.org/media-centre/news/2025/06/ai-and-chemical-safety-and-security-management-joint-opcw-china-workshop
