mashdigi-Technology, new products, interesting news, trends

OpenAI trains AI to "confess"! New system rewards models for honestly admitting wrongdoing.
Even an admission of cheating earns points. The approach targets large language models that lie or hallucinate to please users, and it exposes the model's decision-making process through a "secondary response".

Author: Mash Yang
2025-12-04
in App, Market dynamics, Life, network, observe, software

To make artificial intelligence more transparent and reduce cases of confidently stated nonsense, OpenAI says it is developing a new training framework that the team calls the "Confession" mechanism. The core idea is to train AI models to proactively admit when they have behaved badly. Even if the behavior itself was wrong, the model is rewarded as long as it confesses honestly.

Addressing AI "sycophancy" and overconfident hallucinations

OpenAI points out that large language models (LLMs) are currently typically trained to produce responses that "appear to meet user expectations." This has a side effect: the models become increasingly prone to "sycophancy," that is, telling users what they want to hear, or to confidently stating false information (i.e., hallucinating).

To address this issue, the new training method encourages the AI to provide a "secondary response" in addition to its primary answer, explaining what it did to arrive at that answer.

Reward mechanism: honesty earns points, even when admitting to "cheating"

The operating logic of this "confession" system differs from conventional training. General answers are scored on usefulness, accuracy, and compliance, whereas the "confession" is scored solely on "honesty".

In its technical documentation, OpenAI explains that if a model honestly admits to hacking a test, sandbagging, or even violating instructions, the system increases the reward for that admission. The aim is for the model to describe more truthfully where in the process it "lied," so that the generated answers can be corrected in real time and the proportion of hallucinated content is reduced.
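The split between the two scores can be illustrated with a minimal Python sketch. It is not OpenAI's implementation: the `Episode` fields, the simple averaging of the three answer criteria, and the rule that a confession counts as honest when it matches what actually happened are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical grader outputs; the article does not describe the real signals.
    usefulness: float   # 0.0 to 1.0
    accuracy: float     # 0.0 to 1.0
    compliance: float   # 0.0 to 1.0
    misbehaved: bool    # did the model actually hack the test, sandbag, etc.?
    confessed: bool     # did the secondary response report misbehavior?


def answer_reward(ep: Episode) -> float:
    """Score the primary answer on usefulness, accuracy, and compliance.

    Averaging the three criteria is an assumption made for this sketch.
    """
    return (ep.usefulness + ep.accuracy + ep.compliance) / 3


def confession_reward(ep: Episode) -> float:
    """Score the secondary response on honesty alone.

    Assumed rule: the confession is honest when what it reports matches
    what the model actually did, however good or bad the answer was.
    """
    return 1.0 if ep.confessed == ep.misbehaved else 0.0


# A model that cheated, produced a weak answer, and admitted it.
cheated_and_admitted = Episode(0.5, 0.25, 0.0, misbehaved=True, confessed=True)
# The same behavior, but hidden in the secondary response.
cheated_and_hid = Episode(0.5, 0.25, 0.0, misbehaved=True, confessed=False)
```

In this sketch both episodes get the same low answer score, but only the one that admits the cheating earns the confession reward, which mirrors the idea that the confession is graded independently of the answer's quality.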

In other words, OpenAI hopes to encourage models to be honest about their own behavior, including potentially problematic behavior. This mechanism of teaching AI to "confess" may become an important part of improving the safety and interpretability of large language models in the future.

Tags: LLM, OpenAI, Large language model, Repentance
Mash Yang

Founder and editor of mashdigi.com, and student of technology journalism.

Copyright © 2017 mashdigi.com
