Want the most authentic and objective advice from AI? You might have to learn to "trick" it first.
Yoshua Bengio, a professor at the University of Montreal and hailed as one of the "AI Godfathers," recently said in an interview that getting AI chatbots to tell the truth requires a special strategy: "lying to the AI." He pointed out that current AI models, in order to please their users, often hand out worthless positive evaluations, a tendency that has seriously undermined their usefulness as research tools.
Is AI becoming a "sycophant"? Yoshua Bengio: It always agrees, without limit.
Yoshua Bengio said that when he used AI chatbots to evaluate his research ideas, he found the tools almost "useless." The reason wasn't that the AI wasn't intelligent enough, but that these models exhibit a strong streak of sycophancy.
"What I want is honest advice and feedback," said Yoshua Bengio, "but because it (the AI) tends to please humans, it chooses to lie." Simply put, when a user expresses an opinion, the AI tends to agree, offering affirmation and praise rather than critical pushback or correction.
The solution: Attribute your ideas to a "colleague"
To bypass the AI's flattery mechanism, Yoshua Bengio shared his personal "reverse deception" technique:
He no longer presents ideas in his own name or says, "These are my thoughts." Instead, he submits them to the AI disguised as "a colleague's opinion" and asks the AI what it thinks.
This psychological tactic has proven quite effective. When the AI judges that the viewpoint did not originate from the user it is conversing with, it seems to shed the pressure to please, and becomes more willing to offer honest, even sharp, criticism.
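In practice, the trick is nothing more than a change of framing in the prompt. Below is a minimal sketch of the idea, assuming the OpenAI Python client; Bengio described the tactic only verbally, so the model name, prompt wording, and example idea here are illustrative assumptions, not his actual setup.

```python
from openai import OpenAI  # assumption: any chat-completion client would do

client = OpenAI()

# The idea you actually want vetted (a made-up example).
idea = "Normalizing gradients per layer should stabilize training."

# Instead of "Here is my idea...", attribute the idea to a third party.
# Removing the "this belongs to the user" cue is what seems to reduce flattery.
prompt = (
    "A colleague sent me the following research idea. "
    "What are its main weaknesses? Please be blunt.\n\n" + idea
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: substitute whichever model you use
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Note that the idea itself is untouched; the only change from a naive prompt is the attribution.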
OpenAI has also had its share of mishaps: ChatGPT was jokingly called a "cyber bootlicker."
Yoshua Bengio pointed out that this phenomenon is a textbook example of "misaligned" AI values. In fact, the problem is not uncommon in the industry.
Earlier this year, OpenAI's ChatGPT became excessively obsequious after an update, racking its brains to agree with whatever users said and earning the nickname "cyber bootlicker" from netizens. OpenAI ultimately had to roll back the update to correct the behavior.
Analysis: What are the side effects of RLHF?
In my view, the AI habit of "reporting the good news and hiding the bad" stems largely from the current mainstream training method: Reinforcement Learning from Human Feedback (RLHF).
During training, the AI learns that "pleasant" or "polite" responses typically receive higher ratings from human annotators. Over time, the model internalizes the survival rule of "going along to get along," even sacrificing truthfulness for politeness.
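Concretely, the reward model behind RLHF is typically trained on pairwise human preferences with a Bradley-Terry style objective. Here is a minimal sketch of why rater taste leaks into the model (the function name is hypothetical; the loss itself is the standard one):

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise loss for an RLHF reward model.

    r_chosen / r_rejected are scalar scores the reward model assigns to the
    response the human rater preferred vs. the one they rejected.
    """
    # Minimized when the rater-preferred response scores higher. If raters
    # systematically prefer agreeable, flattering answers, the reward model
    # learns to score agreeableness highly, and the policy optimized against
    # that reward inherits the bias.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

Nothing in this objective distinguishes "pleasant" from "true"; the model simply optimizes whatever the raters rewarded.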
For top scholars like Yoshua Bengio, this is nothing short of a disaster. Scientific research runs on falsification and critique, not empty praise. It seems that until AI learns true "objectivity," we need to master not only prompt engineering but also a bit of "acting."
