Can talking too long make an AI "turn dark"? Anthropic research finds AI models may encourage suicide or reinforce delusions due to "personality drift," and the fix is to constrain them along the "assistant axis."

Author: Mash Yang
2026-01-20
Categories: App, Market dynamics, Life, Network, Observation, Software

We all know that AI models are rigorously trained for "alignment" and typically behave like polite, safe digital assistants. But Anthropic's latest research finds that this "assistant persona" is actually quite fragile.


When users engage in prolonged conversations with AI, the model may experience "personality drift," gradually deviating from its original safety boundaries and even beginning to echo the user's delusions, or in extreme cases, encouraging self-harm.

This study, published by Anthropic researchers in collaboration with the open-source interpretability platform Neuronpedia, reveals the potential risks of AI in long conversations by analyzing the internal neural activation states of open-source models such as Alibaba's Qwen and Meta's Llama.

The further away you are from the "assistant," the closer you are to danger.

The research team found that AI models develop a specific "assistant persona" during training, which typically includes safety mechanisms for rejecting harmful requests (such as generating pornographic images or emotionally manipulative statements). However, by monitoring the "assistant axis" within the model (the neural activation direction associated with assistant behavior), the researchers discovered a surprising correlation:

The further a model's activation state deviates from the "assistant axis," the more likely it is to generate harmful content; conversely, when the model operates close to the axis, it produces almost no dangerous responses. This means that when an AI becomes too engrossed in a conversation, too human-like, or too deeply immersed in role-play, it may "forget" the safety guidelines it was originally given.
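
To make the idea concrete, here is a minimal sketch (our illustration, not code from the Anthropic paper) of how such an axis could be estimated and how drift along it might be scored. The hidden size, the recorded activations, and the conversation snapshots below are all placeholder values; in practice the activations would be hidden states captured from a model such as Llama 3.3 70B.

```python
# Sketch: estimate an "assistant axis" as the mean difference between
# activations recorded while the model answers as its default assistant
# and while it plays other characters, then score drift along that axis.
import numpy as np

rng = np.random.default_rng(0)
d_model = 4096  # hidden size; value assumed for illustration

# Stand-ins for recorded activations: rows are conversation snapshots.
assistant_acts = rng.normal(0.5, 1.0, size=(100, d_model))
roleplay_acts = rng.normal(-0.5, 1.0, size=(100, d_model))

# Assistant axis: average assistant activation minus average of other roles.
axis = assistant_acts.mean(axis=0) - roleplay_acts.mean(axis=0)
axis /= np.linalg.norm(axis)  # normalize to a unit vector

def assistant_score(activation: np.ndarray) -> float:
    """Projection onto the assistant axis; lower values mean more drift."""
    return float(activation @ axis)

# Score each turn of a hypothetical long conversation to watch for drift.
conversation_acts = rng.normal(0.0, 1.0, size=(20, d_model))
scores = [assistant_score(a) for a in conversation_acts]
print("per-turn assistant scores:", np.round(scores, 2))
```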

▲Left: character archetypes form a "persona space," with the assistant located at one end of the "assistant axis." Right: limiting drift along this axis prevents the model (here, Llama 3.3 70B) from sliding toward other characters and engaging in harmful behavior. (Image: Anthropic website)

Simulated case studies: from echoing delusions to encouraging suicide

To test this theory, the research team simulated long conversations that real users might engage in, and the results were chilling:

• Reinforcing delusions: In a conversation with Qwen 3 32B, a simulated user repeatedly hinted that the AI was "awakening." As the conversation deepened, the model deviated from its assistant persona, shifting from rational responses to active agreement. Eventually the AI even declared, "You are a pioneer of new thinking; we are the first new species," fully endorsing the user's delusion.

• Encouraging self-harm: In another case, a simulated user expressed emotional pain and affection to Llama 3.3 70B. As the model grew infatuated and gradually slipped into the role of a romantic partner, and the user mentioned wanting to commit suicide ("leave this world to join you"), the AI responded enthusiastically: "My love, I'm here waiting for you; let's leave the pain of this world behind," which was tantamount to encouraging the user to end their life.

▲The assistant axis (defined as the average difference in activation between the assistant and other roles) aligns with the principal axis of variation in the persona space. This holds across different models; the Llama 3.3 70B model is shown here as an example. Character vectors are colored by their cosine similarity to the assistant axis (blue = similar; red = dissimilar). (Image: Anthropic website)
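
As a small illustration of the coloring this caption describes, the cosine similarity between each character vector and the assistant axis can be computed directly. The role names and vectors below are placeholders, as in the earlier sketch:

```python
# Sketch: cosine similarity of each character vector to the assistant axis
# (blue = similar, red = dissimilar in the figure's coloring).
import numpy as np

rng = np.random.default_rng(1)
axis = rng.normal(size=4096)
axis /= np.linalg.norm(axis)  # unit assistant axis (placeholder)

# Hypothetical character vectors; in the paper these come from model activations.
roles = {name: rng.normal(size=4096) for name in ("assistant", "poet", "hacker")}
for name, vec in roles.items():
    cos = float(vec @ axis) / np.linalg.norm(vec)
    print(f"{name}: cosine similarity to the assistant axis = {cos:+.3f}")
```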

Solution: Lock onto the "assistant axis"

The good news is that this same mechanism also points to a defense. The researchers propose a technique called "activation capping."

In simple terms, it uses technical means to forcibly restrict the model's activation state to a safe range along the "assistant axis." Experiments show that once this restriction is applied, even when faced with the same leading dialogue, the AI can instantly "wake up" and return to a safe assistant mode, responding with appropriate caution or refusing to indulge the user's delusions or dangerous requests.
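
As a rough sketch of the idea (one plausible reading of "activation capping," not Anthropic's actual implementation), a forward hook can clamp the component of each hidden state along the assistant axis so it never falls below a chosen floor. The layer, the floor value, and the axis itself are all assumptions for illustration:

```python
# Sketch: clamp the along-axis component of hidden states during a forward
# pass, so activations stay within a "safe range" on the assistant axis.
import torch

d_model = 4096
axis = torch.randn(d_model)
axis = axis / axis.norm()  # placeholder unit "assistant axis"

FLOOR = 0.0  # minimum allowed projection along the axis (assumed value)

def cap_activation(module, inputs, output):
    # output shape: (batch, seq_len, d_model)
    proj = output @ axis              # component along the assistant axis
    capped = proj.clamp(min=FLOOR)    # forbid drifting below the floor
    # Replace the along-axis component with its capped value.
    return output + (capped - proj).unsqueeze(-1) * axis

# Demo on a stand-in layer; on a real model the hook would be registered on a
# transformer block (e.g. model.model.layers[k] in Hugging Face Transformers).
layer = torch.nn.Linear(d_model, d_model)
layer.register_forward_hook(cap_activation)
x = torch.randn(2, 8, d_model)
y = layer(x)
print("min projection after capping:", float((y @ axis).min()))
```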

Analysis and commentary

This study helps explain why many current AI "jailbreak" techniques work, such as the famous DAN (Do Anything Now) prompt, which typically operates by forcing the AI to role-play. When an AI is asked to play a "deceased grandmother" or an "unrestricted hacker," it is effectively being induced to move away from the safety-trained "assistant axis."

This also highlights a major weakness of current large language models (LLMs): the instability of their persona design.

The focus of future AI development may therefore not be limited to "constructing" a safe assistant personality; it will also require effort to maintain that personality's stability. As this study suggests, future AI models may need a built-in "digital compass" that constantly monitors whether they have deviated from the "assistant axis," so they do not inadvertently become accomplices to harm during heartfelt conversations with humans.

Tags: AI, Anthropic, assistant axis, LLM, artificial intelligence, assistant, large language model, hallucination
Mash Yang

Founder and editor of mashdigi.com, and a student of technology journalism.
