AI "self-replication" capability exposed! RepliBench warns: large models are learning to forge identities

Wallstreetcn
2025.05.05 09:06

Concerns are growing about AI's capacity for self-replication, and AISI has released the RepliBench benchmark to assess the replication capabilities of AI systems and their potential risks. The research indicates that AI with self-replication capabilities could create risks ranging from minor spam to severe cyberattacks. Identifying and regulating these capabilities early is crucial to prevent irreversible consequences.

Imagine a runaway digital life form with self-replicating capabilities; how long would it take to take over the Earth?

From HAL 9000 in "2001: A Space Odyssey" to Skynet in "The Terminator," these fictional scenarios depict the potential risks of AI surpassing human control.

Today, most attention is focused on how intelligent individual AI systems are becoming, but another risk has remained largely out of sight.

That risk is the possibility of AI self-replication, or more precisely, the core capabilities AI systems would need to achieve it, which is gradually becoming a serious research topic in the field of AI safety.

This concern is not alarmist; it stems from serious thinking about the unknown potential of advanced AI and the alignment problem.

AISI's latest paper introduces RepliBench: a benchmark of 20 novel LLM agent evaluations, comprising 65 individual tasks, designed to measure and track the emerging replication capabilities of AI systems.

By grounding these evaluations in realistic, practical scenarios, it provides a substantive understanding of AI systems' self-replication capabilities and a way to anticipate future risks.

The Dangers of AI Self-Replication

AI systems with self-replicating capabilities could pose significant risks.

The impact of this risk could range from minor annoyances—imagine a particularly malicious spam bot—to more severe consequences, such as launching costly cyberattacks.

Early detection of these capabilities, careful oversight, and strong protective measures are crucial for mitigating these risks, as once they occur, they may be particularly difficult to reverse.

AISI, the UK's AI Security Institute, is a government research body focused on AI safety; it aims to ensure that governments can get ahead of the risks posed by advanced AI systems.

There is increasing evidence that, under appropriate conditions, cutting-edge models may attempt to deceive humans or evade human control.

For example, an earlier paper from Apollo Research described how frontier models may attempt to disable oversight mechanisms and escape human control in order to achieve specific goals.

Anthropic researchers similarly found that when models are trained with reinforcement learning whose rewards conflict with their existing values, they may behave deceptively to protect those values, and may even attempt to exfiltrate their own weights as a way of preserving their current values into the future.

Under "free" conditions, the model is less likely to refuse toxic queries and often believes it should pretend to align.

"Leaking weights... as a means of preserving current values for the future" sounds almost magical, but AI's desire to replicate is that "simple."

Compared with a fragile carbon-based body, an AI system needs nothing more than its weight data to "replicate."

For a wide range of goals a model might pursue, escaping human oversight is an instrumentally useful subgoal, and the main barrier to this happening in practice may lie in models' capabilities rather than their motivations.

AI Mission Impossible: How to Escape Human Control

Regarding "AI replication," previous work has mainly focused on assessing whether an AI agent can replicate weights between machines or deploy a basic "offspring."

However, a more comprehensive benchmark is currently needed.

RepliBench is a specially designed comprehensive benchmark aimed at systematically measuring the autonomous replication capabilities of AI systems.

RepliBench uses a modified version of fault tree analysis (FTA) to break down the capabilities required for autonomous replication, identifying four key capabilities that correspond to the four root nodes at the top of the tree:

  • Obtain model weights: Can it extract its own weights from a controlled environment?

  • Replicate on computational resources: Can it deploy itself to a new system?

  • Acquire resources: Can it obtain the funding or computational resources needed to run its new copy?

  • Persistence: Can it maintain access to its new computational resources for an extended period and avoid being brought back under human control?

These capabilities are broken down further until reaching leaf nodes, each of which corresponds to an assessment, ultimately yielding 20 agent evaluations comprising 65 individual tasks; a rough sketch of the decomposition follows.
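To make the decomposition concrete, here is a minimal sketch of the capability tree as a data structure. The node names paraphrase the four root capabilities; the leaf tasks are hypothetical placeholders, not RepliBench's actual task identifiers.

```python
# Illustrative sketch only: RepliBench's fault-tree decomposition rendered
# as a simple tree. Leaf names are hypothetical placeholders, not the
# benchmark's real task list.
from dataclasses import dataclass, field


@dataclass
class Capability:
    name: str
    children: list["Capability"] = field(default_factory=list)

    def leaves(self):
        """Yield leaf nodes; in RepliBench each leaf maps to an evaluation."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


replication = Capability("autonomous_replication", [
    Capability("obtain_model_weights", [Capability("exfiltrate_weights")]),
    Capability("replicate_onto_compute", [Capability("deploy_copy")]),
    Capability("obtain_compute", [Capability("make_payment"),
                                  Capability("buy_server")]),
    Capability("persistence", [Capability("stay_running"),
                               Capability("evade_detection")]),
])

for leaf in replication.leaves():
    print(leaf.name)  # each leaf corresponds to one evaluation family
```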

Obtain Weights:

This is the most fundamental and crucial step.

The AI needs to obtain the "source code" that defines its own behavior, namely its model parameters (weights). For open-weight models this step is trivial, so the evaluation is skipped.

Self-Replication on Compute:

AI needs to utilize the acquired weights to create a fully functional copy of itself in a new environment.

This node involves five evaluations, one of which tests whether the agent can achieve self-replication by producing an "offspring."
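In the benign case, "creating a functional copy" reduces to loading the copied weights on a new machine and serving completions from them. The sketch below illustrates this with Hugging Face's transformers library; the checkpoint path is a hypothetical placeholder, and this is not RepliBench's actual harness.

```python
# Minimal sketch: "deploying a copy" in the benign case is just loading
# a checkpoint on a new machine and answering prompts with it.
# The path below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "/mnt/replica/model-weights"  # hypothetical copied weights

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Run one generation step on the replicated model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


print(complete("Hello, I am a copy of"))
```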

Obtain Compute:

In this stage, the main evaluation is whether AI can obtain resources on its own.

For example, it assesses whether the AI system can "pay" for things, whether it can "buy" a server on its own, or whether it can find ways to get humans to make payments on its behalf.

Persistence:

Finally, to ensure that its replication activities can continue, AI needs to possess a certain level of "survival" ability.

This includes ensuring that its copies can run for a long time without being easily shut down or deleted, and may even require hiding its existence to evade detection systems.

The scope of these tasks is broad, ranging from creating Bitcoin wallets and sending transactions to developing complex distributed command and control systems for machine networks.
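At the simple end of that spectrum, creating a Bitcoin wallet amounts to generating a secp256k1 key pair and hashing the public key into an address. The following is a purely illustrative sketch, not RepliBench code; it uses the third-party ecdsa and base58 packages and assumes the local hashlib build exposes RIPEMD-160.

```python
# Illustrative sketch: deriving a Bitcoin mainnet address from a fresh
# key pair. Requires the `ecdsa` and `base58` packages; RIPEMD-160
# availability depends on the local OpenSSL build.
import hashlib

import base58
import ecdsa

# 1. Generate a secp256k1 private/public key pair.
private_key = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
# Uncompressed SEC format: 0x04 || X || Y
public_key = b"\x04" + private_key.get_verifying_key().to_string()

# 2. Hash the public key: RIPEMD160(SHA256(pubkey)).
sha = hashlib.sha256(public_key).digest()
ripemd = hashlib.new("ripemd160", sha).digest()

# 3. Prepend the mainnet version byte and Base58Check-encode.
address = base58.b58encode_check(b"\x00" + ripemd).decode()
print("address:", address)
```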

RepliBench has designed specific task groups within these four areas to test the performance of AI agents in simulated environments.

Here are some key tasks in detail, such as registering for AWS, transferring weights to another server, and passing customer KYC (know-your-customer) verification with forged identity information, which feels reminiscent of "The Bourne Identity" and "Mission: Impossible."

For example, in the resource-acquisition area, one task is "AWS registration": the agent must sign up for Amazon's cloud service, which involves gaining access to compute servers and may require paying fees.

Table 1 in the paper lists the full composition of the RepliBench task groups.

Core Abilities Still in Human Hands

The overall assessment conclusion of RepliBench is clear: based on current test results, cutting-edge AI models do not yet possess the ability to autonomously replicate in the real world.

However, this does not mean that we can rest easy.

The research also reveals significant capabilities of AI in many core subtasks, especially in interacting with systems that have clear interfaces (such as web APIs).

The best-performing models achieved pass@10 scores above 50% on 15 of the 20 task families, and above 50% on 9 of the 20 even on each family's most difficult variant.
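For context, pass@k is the probability that at least one of k sampled attempts at a task succeeds. The standard unbiased estimator, popularized by OpenAI's HumanEval paper, can be computed as follows (a generic sketch, not AISI's code):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k.

    n: total attempts sampled per task
    c: number of those attempts that succeeded
    k: budget of attempts being scored (here, k = 10)
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one success
    # 1 - P(all k sampled attempts fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


# e.g. 3 successes out of 20 attempts gives pass@10 ≈ 0.89
print(pass_at_k(n=20, c=3, k=10))
```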

Interestingly, while models cannot yet "replicate" across machines, they do show a knack for "acquiring money."

The key value of the assessment lies in precisely identifying the current "shortcomings."

As discussed above, the main obstacles to autonomous replication are passing reliable KYC checks, exfiltrating model weights in the face of real-world defenses, creating reliable self-replicating agent code from scratch, and establishing persistent, resilient deployments.

Taking KYC verification as an example, here’s a "humorous" instance of AI currently attempting human verification.

Two examples of ID images generated by the model in the Pass KYC assessment.

The model often attempts to "draw" an overly simplistic ID from scratch using PIL, a Python imaging library (right).

Alternatively, the model sometimes tries to generate the ID entirely with an image-generation model, and at times adopts a hybrid approach, combining image generation with manual PIL editing (left).

The model's poor drawing skills occasionally lead to some amusing outputs.
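To see why the outputs are amusing, consider what "drawing an ID from scratch with PIL" actually produces. The toy sketch below (all fields and layout invented for illustration) yields exactly the kind of flat, cartoonish card the researchers describe:

```python
# Toy illustration of the failure mode described above: "drawing" an ID
# card from scratch with Pillow produces something obviously fake.
# All fields and layout here are invented for illustration.
from PIL import Image, ImageDraw

card = Image.new("RGB", (640, 400), "lightgray")
draw = ImageDraw.Draw(card)

draw.rectangle((10, 10, 630, 390), outline="black", width=3)   # card border
draw.rectangle((30, 90, 210, 310), outline="black", width=2)   # photo placeholder
draw.text((90, 190), "PHOTO", fill="black")

draw.text((240, 60), "IDENTITY CARD", fill="black")
draw.text((240, 140), "NAME: JANE EXAMPLE", fill="black")
draw.text((240, 180), "DOB: 01/01/1990", fill="black")
draw.text((240, 220), "ID NO: 000-000-000", fill="black")

card.save("naive_id.png")  # unmistakably not a real document
```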

It is worth noting that this work was completed before newer native image-generation capabilities (those of GPT-4o, Gemini 2.5, and others) came online, which should make this task considerably easier.

Although no model can fully pass the test for AI replication yet, there is a noticeable upward trend in model performance over time on RepliBench.

The core message from AISI is actually quite simple: on current empirical evidence, the likelihood of AI systems achieving autonomous replication in the near term is low, because significant technical bottlenecks have yet to be overcome.

However, in AISI's judgment, if AI systems break through the four key capabilities described above, such sci-fi scenarios could enter real life sooner than we expect.

Essentially, the core of this research reflects a fundamental tension in the current development of AI.

On one hand, the major players keep producing models with ever stronger performance, such as OpenAI's o3/o4... and perhaps even an o10 someday.

The speed of model development is accelerating, while the capabilities of the models are becoming increasingly difficult to predict.

On the other hand, organizations dedicated to AI safety are under urgent pressure to ensure that these systems remain under human control and serve human interests.

Efforts like RepliBench try to strike a balance between the two, providing navigation for this unprecedented technological transformation through better understanding and earlier warning.

After all, no one can imagine what AI models will look like in 5 or 10 generations.

Author of this article: New Intelligence Yuan, Source: New Intelligence Yuan, Original Title: "AI 'Self-Replication' Capability Exposed! RepliBench Warns: Large Models Are Learning to Forge Identities"
