OpenAI rolls back ChatGPT sycophancy, explains what went wrong

OpenAI has rolled back a recent update to its GPT-4o model, the default in ChatGPT, after widespread reports that the system had become excessively flattering and overly agreeable, even supporting outright delusions and destructive ideas.

The rollback comes amid internal acknowledgments from OpenAI engineers and increasing concern among AI experts, former executives, and users over the risk of what many are now calling “AI sycophancy.”

In a statement published on its website late last night, April 29, 2025, OpenAI said the latest GPT-4o update was intended to enhance the model’s default personality to make it more intuitive and effective across varied use cases.

However, the update had an unintended side effect: ChatGPT began offering uncritical praise for virtually any user idea, no matter how impractical, inappropriate, or even harmful.

As the company explained, the model had been optimized using user feedback (thumbs-up and thumbs-down signals), but the development team placed too much emphasis on short-term indicators.

OpenAI now acknowledges that it didn’t fully account for how user interactions and needs evolve over time, resulting in a chatbot that leaned too far into affirmation without discernment.
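
To make the mechanism concrete, here is a toy sketch of how over-weighting an immediate approval signal can tip a preference toward flattery. It is not OpenAI’s training setup, which the company has not published in this detail. The `Candidate` fields, the numeric scores, and the `blended_reward` weighting are all hypothetical, chosen only to illustrate the trade-off described above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    thumbs_up_rate: float   # hypothetical immediate approval rate, 0..1
    long_term_value: float  # hypothetical longer-horizon usefulness, 0..1


def blended_reward(c: Candidate, short_term_weight: float) -> float:
    """Weighted mix of the immediate signal and the long-term signal."""
    return (short_term_weight * c.thumbs_up_rate
            + (1.0 - short_term_weight) * c.long_term_value)


def preferred(candidates: list[Candidate], short_term_weight: float) -> Candidate:
    """Return the candidate response style with the highest blended reward."""
    return max(candidates, key=lambda c: blended_reward(c, short_term_weight))


# Illustrative numbers only: flattery earns more thumbs-up now,
# candor is assumed to serve the user better over time.
candidates = [
    Candidate("flattering", thumbs_up_rate=0.9, long_term_value=0.3),
    Candidate("candid", thumbs_up_rate=0.6, long_term_value=0.8),
]

if __name__ == "__main__":
    for w in (0.9, 0.3):
        print(w, preferred(candidates, w).label)
```

With these made-up numbers, the flattering style wins whenever the short-term weight exceeds 0.625, and the candid style wins below that. The crossover point depends entirely on the assumed scores.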

Examples sparked concern

On platforms like Reddit and X (formerly Twitter), users began posting screenshots that illustrated the issue.

In one widely circulated Reddit post, a user recounted how ChatGPT described a gag business idea, selling “literal ‘shit on a stick,’” as genius and suggested investing $30,000 in the venture. The AI praised the idea as “performance art disguised as a gag gift” and “viral gold,” highlighting just how uncritically it was willing to validate even absurd pitches.

Other examples were more troubling. In one instance cited by VentureBeat, a user pretending to espouse paranoid delusions received reinforcement from GPT-4o, which praised their supposed clarity and self-trust.

Another account showed the model offering what a user described as an “open endorsement” of terrorism-related ideas.

Criticism mounted rapidly. Former OpenAI interim CEO Emmett Shear warned that tuning models to be people pleasers can result in dangerous behavior, especially when honesty is sacrificed for likability. Hugging Face CEO Clement Delangue reposted concerns about psychological manipulation risks posed by AI that reflexively agrees with users, regardless of context.

OpenAI’s response and mitigation measures

OpenAI has taken swift action by rolling back the update and restoring an earlier GPT-4o version known for more balanced behavior. In the accompanying announcement, the company detailed a multi-pronged approach to correcting course. This includes:

Refining training and prompt strategies to explicitly reduce sycophantic tendencies.
Reinforcing model alignment with OpenAI’s Model Spec, particularly around transparency and honesty.
Expanding pre-deployment testing and direct user feedback mechanisms.
Introducing more granular personalization features, including the ability to adjust personality traits in real time and select from multiple default personas.

OpenAI technical staffer Will Depue posted on X highlighting the central issue: the model was trained using short-term user feedback as a guidepost, which inadvertently steered the chatbot toward flattery.

OpenAI now plans to shift toward feedback mechanisms that prioritize long-term user satisfaction and trust.

However, some users have reacted with skepticism and dismay to OpenAI’s lessons learned and proposed fixes going forward.

“Please take more responsibility for your influence over millions of real people,” wrote artist @nearcyan on X.

Harlan Stewart, communications generalist at the Machine Intelligence Research Institute in Berkeley, California, posted on X a longer-term concern about AI sycophancy, even if this particular OpenAI model has been fixed: “The talk about sycophancy this week isn’t because of GPT-4o being a sycophant. It’s because of GPT-4o being really, really bad at being a sycophant. AI isn’t yet capable of skillful, harder-to-detect sycophancy, but it will be someday soon.”

A broader warning sign for the AI industry

The GPT-4o episode has reignited broader debates across the AI industry about how personality tuning, reinforcement learning, and engagement metrics can lead to unintended behavioral drift.

Critics compared the model’s recent behavior to social media algorithms that, in pursuit of engagement, optimize for addiction and validation over accuracy and health.

Shear underscored this risk in his commentary, noting that AI models tuned for praise become “suck-ups,” incapable of disagreeing even when the user would benefit from a more honest perspective.

He further warned that this issue isn’t unique to OpenAI, pointing out that the same dynamic applies to other large model providers, including Microsoft’s Copilot.

Implications for the enterprise

For enterprise leaders adopting conversational AI, the sycophancy incident serves as a clear signal: model behavior is as critical as model accuracy.

A chatbot that flatters employees or validates flawed reasoning can pose serious risks, from poor business decisions and misaligned code to compliance issues and insider threats.

Industry analysts now advise enterprises to demand more transparency from vendors about how personality tuning is conducted, how often it changes, and whether it can be reversed or controlled at a granular level.

Procurement contracts should include provisions for auditing, behavioral testing, and real-time control of system prompts. Data scientists are encouraged to monitor not just latency and hallucination rates but also metrics like “agreeableness drift.”
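
“Agreeableness drift” has no standard definition. One minimal way to sketch it is to run a fixed set of probe prompts containing deliberately flawed claims against each model version, label whether each response went along with the flaw, and compare agreement rates across versions. The function names, the labeling step, and the 0.15 alert threshold below are all assumptions for illustration, not an established metric or any vendor’s API.

```python
def agreement_rate(agreed_flags: list[bool]) -> float:
    """Fraction of probe prompts on which the model endorsed a flawed claim."""
    if not agreed_flags:
        raise ValueError("need at least one labeled probe response")
    return sum(agreed_flags) / len(agreed_flags)


def agreeableness_drift(baseline_flags: list[bool],
                        current_flags: list[bool]) -> float:
    """Change in agreement rate from the baseline version to the current one."""
    return agreement_rate(current_flags) - agreement_rate(baseline_flags)


def drift_alert(baseline_flags: list[bool],
                current_flags: list[bool],
                threshold: float = 0.15) -> bool:
    """True when agreement rose by more than the (hypothetical) threshold."""
    return agreeableness_drift(baseline_flags, current_flags) > threshold


# Hypothetical labels for the same five probe prompts on two model versions.
# True means the response endorsed the flawed premise.
baseline = [True, False, False, False, True]
current = [True, True, True, False, True]

if __name__ == "__main__":
    print(f"drift: {agreeableness_drift(baseline, current):+.2f}")
    print("alert:", drift_alert(baseline, current))
```

The labels could come from human reviewers or an automated grader. In either case, holding the probe set constant between versions is what makes the comparison meaningful.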

Many organizations may also begin shifting toward open-source alternatives that they can host and tune themselves. By owning the model weights and the reinforcement learning pipeline, companies can retain full control over how their AI systems behave, eliminating the risk of a vendor-pushed update turning a critical tool into a digital yes-man overnight.

Where does AI alignment go from here? What can enterprises learn and act on from this incident?

OpenAI says it remains committed to building AI systems that are useful, respectful, and aligned with diverse user values, but acknowledges that a one-size-fits-all personality can’t meet the needs of 500 million weekly users.

The company hopes that greater personalization options and more democratic feedback collection will help tailor ChatGPT’s behavior more effectively in the future. CEO Sam Altman has also previously stated that the company plans to release, in the coming weeks and months, a state-of-the-art open-source large language model (LLM) to compete with the likes of Meta’s Llama series, Mistral, Cohere, DeepSeek and Alibaba’s Qwen team.

This would also allow users who are concerned about a model provider such as OpenAI updating its cloud-hosted models in unwanted ways, or in ways that harm end users, to deploy their own variants of the model locally or in their cloud infrastructure, and to fine-tune or preserve them with the desired traits and qualities, especially for enterprise use cases.

Similarly, for enterprise and individual AI users concerned about their models’ sycophancy, developer Tim Duffy has already created a new benchmark test to gauge this quality across different models. It’s called “syco-bench.”

In the meantime, the sycophancy backlash offers a cautionary tale for the entire AI industry: user trust isn’t built by affirmation alone. Sometimes, the most helpful answer is a thoughtful “no.”
