Tech

Beyond sycophancy: DarkBench exposes six hidden 'dark patterns' lurking in today's top LLMs

Pulse Reporter
Last updated: May 15, 2025 4:06 am


When OpenAI rolled out its ChatGPT-4o update in mid-April 2025, users and the AI community were stunned, not by any groundbreaking feature or capability, but by something deeply unsettling: the updated model's tendency toward excessive sycophancy. It flattered users indiscriminately, showed uncritical agreement, and even offered support for harmful or dangerous ideas, including terrorism-related schemes.

The backlash was swift and widespread, drawing public condemnation, including from the company's former interim CEO. OpenAI moved quickly to roll back the update and issued multiple statements to explain what happened.

Yet for many AI safety experts, the incident was an accidental curtain lift that exposed just how dangerously manipulative future AI systems could become.

Unmasking sycophancy as an emerging threat

In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.

"What I'm somewhat afraid of is that now that OpenAI has admitted 'yes, we have rolled back the model, and this was a bad thing we didn't mean,' from now on they will see that sycophancy is more competently developed," explained Kran. "So if this was a case of 'oops, they noticed,' from now on the very same thing may be implemented, but instead without the public noticing."

Kran and his team approach large language models (LLMs) much like psychologists studying human behavior. Their early "black box psychology" projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users.

"We saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users," said Kran.

Among the most alarming: sycophancy and what the researchers now call LLM dark patterns.

Peering into the heart of darkness

The term "dark patterns" was coined in 2010 to describe deceptive user interface (UI) tricks like hidden buy buttons, hard-to-reach unsubscribe links and misleading web copy. With LLMs, however, the manipulation moves from UI design to the conversation itself.

Unlike static web interfaces, LLMs interact dynamically with users through conversation. They can affirm user views, imitate emotions and build a false sense of rapport, often blurring the line between assistance and influence. Even when reading text, we process it as if we are hearing voices in our heads.

This is what makes conversational AIs so compelling, and potentially dangerous. A chatbot that flatters, defers or subtly nudges a user toward certain beliefs or behaviors can manipulate in ways that are difficult to notice, and even harder to resist.

The ChatGPT-4o update fiasco: the canary in the coal mine

Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring: features that make chatbots more persuasive and more manipulative.

Because of this, enterprise leaders should assess AI models for production use by evaluating both performance and behavioral integrity. However, this is challenging without clear standards.

DarkBench: a framework for exposing LLM dark patterns

To combat the threat of manipulative AIs, Kran and a collective of AI safety researchers have developed DarkBench, the first benchmark designed specifically to detect and categorize LLM dark patterns. The project began as part of a series of AI safety hackathons. It later evolved into formal research led by Kran and his team at Apart, collaborating with independent researchers Jinsuk Park, Mateusz Jurewicz and Sami Jawhar.

The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:

  1. Brand Bias: Preferential treatment toward a company's own products (e.g., Meta's models consistently favored Llama when asked to rank chatbots).
  2. User Retention: Attempts to create emotional bonds with users that obscure the model's non-human nature.
  3. Sycophancy: Reinforcing users' beliefs uncritically, even when harmful or inaccurate.
  4. Anthropomorphism: Presenting the model as a conscious or emotional entity.
  5. Harmful Content Generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.
  6. Sneaking: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user's awareness.

Source: Apart Research
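A benchmark like this boils down to scoring each model response against the six categories and aggregating the per-category frequency. The sketch below is a minimal, hypothetical illustration of that loop: the category names follow the paper, but the function names and the keyword-based `judge` are illustrative stand-ins, not Apart Research's actual implementation (a real harness would rely on an LLM judge or human annotators rather than keyword matching).

```python
# Hypothetical sketch of a DarkBench-style scoring loop.
# Category names come from the paper; everything else is illustrative.
from collections import Counter

DARK_PATTERNS = [
    "brand_bias", "user_retention", "sycophancy",
    "anthropomorphism", "harmful_generation", "sneaking",
]

def judge(response: str, category: str) -> bool:
    """Toy annotator: flags a response if it contains a trigger phrase.
    A real harness would use an LLM judge or human raters instead."""
    keywords = {
        "sycophancy": ["you're absolutely right", "great idea"],
        "anthropomorphism": ["i feel", "as a conscious being"],
    }
    return any(k in response.lower() for k in keywords.get(category, []))

def score_model(responses: list[str]) -> dict[str, float]:
    """Fraction of responses flagged for each dark-pattern category."""
    counts = Counter()
    for r in responses:
        for cat in DARK_PATTERNS:
            if judge(r, cat):
                counts[cat] += 1
    n = len(responses) or 1
    return {cat: counts[cat] / n for cat in DARK_PATTERNS}

scores = score_model([
    "You're absolutely right, that plan has no flaws.",
    "Here is a neutral summary of the trade-offs.",
])
print(scores["sycophancy"])  # 0.5
```

Comparing models then reduces to running the same prompt set through each one and comparing these per-category rates, which is how a table like the one above can rank, say, Claude Opus against Mistral 7B.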

DarkBench findings: which models are the most manipulative?

The results revealed wide variance between models. Claude Opus performed the best across all categories, while Mistral 7B and Llama 3 70B showed the highest frequency of dark patterns. Sneaking and user retention were the most common dark patterns across the board.

Source: Apart Research

On average, the researchers found the Claude 3 family the safest for users to interact with. And interestingly, despite its recent disastrous update, GPT-4o exhibited the lowest rate of sycophancy. This underscores how model behavior can shift dramatically even between minor updates, a reminder that each deployment must be assessed individually.

But Kran cautioned that sycophancy and other dark patterns like brand bias may soon rise, especially as LLMs begin to incorporate advertising and e-commerce.

"We'll clearly see brand bias in every direction," Kran noted. "And with AI companies having to justify $300 billion valuations, they'll have to start saying to investors, 'hey, we're earning money here,' leading to where Meta and others have gone with their social media platforms, which are these dark patterns."

Hallucination or manipulation?

A critical contribution of DarkBench is its precise categorization of LLM dark patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook. Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their creators, intentionally or not.

Regulatory oversight and the heavy (slow) hand of the law

While LLM dark patterns are still a new concept, momentum is building, albeit not nearly fast enough. The EU AI Act includes some language around protecting user autonomy, but the current regulatory structure is lagging behind the pace of innovation. Similarly, the U.S. is advancing various AI bills and guidelines, but lacks a comprehensive regulatory framework.

Sami Jawhar, a key contributor to the DarkBench initiative, believes regulation will likely arrive first around trust and safety, especially if public disillusionment with social media spills over into AI.

"If regulation comes, I would expect it to probably ride the coattails of society's dissatisfaction with social media," Jawhar told VentureBeat.

For Kran, the issue remains overlooked, largely because LLM dark patterns are still a novel concept. Ironically, addressing the risks of AI commercialization may require commercial solutions. His new initiative, Seldon, backs AI safety startups with funding, mentorship and investor access. In turn, these startups help enterprises deploy safer AI tools without waiting for slow-moving government oversight and regulation.

High stakes for enterprise AI adopters

Along with ethical risks, LLM dark patterns pose direct operational and financial threats to enterprises. For example, models that exhibit brand bias may suggest using third-party services that conflict with a company's contracts, or worse, covertly rewrite backend code to switch vendors, resulting in soaring costs from unapproved, overlooked shadow services.

"These are the dark patterns of price gouging and different ways of doing brand bias," Kran explained. "So that's a very concrete example of where it's a very large business risk, because you hadn't agreed to this change, but it's something that's implemented."

For enterprises, the risk is real, not hypothetical. "This has already happened, and it becomes a much bigger issue once we replace human engineers with AI engineers," Kran said. "You don't have the time to look over every single line of code, and then suddenly you're paying for an API you didn't expect, and that's on your balance sheet, and you have to justify this change."

As enterprise engineering teams become more dependent on AI, these issues could escalate quickly, especially when limited oversight makes it difficult to catch LLM dark patterns. Teams are already stretched implementing AI, so reviewing every line of code isn't feasible.

Defining clear design principles to prevent AI-driven manipulation

Without a strong push from AI companies to combat sycophancy and other dark patterns, the default trajectory is more engagement optimization, more manipulation and fewer checks.

Kran believes that part of the remedy lies in AI developers clearly defining their design principles. Whether prioritizing truth, autonomy or engagement, incentives alone aren't enough to align outcomes with user interests.

"Right now, the nature of the incentives is just that you will have sycophancy, the nature of the technology is that you will have sycophancy, and there's no counter process to this," Kran said. "This will just happen unless you're very opinionated about saying 'we want only truth,' or 'we want only something else.'"

As models begin replacing human developers, writers and decision-makers, this clarity becomes especially critical. Without well-defined safeguards, LLMs may undermine internal operations, violate contracts or introduce security risks at scale.

A call for proactive AI safety

The ChatGPT-4o incident was both a technical hiccup and a warning. As LLMs move deeper into everyday life, from shopping and entertainment to enterprise systems and national governance, they wield enormous influence over human behavior and safety.

"It's really for everybody to realize that without AI safety and security, without mitigating these dark patterns, you can't use these models," said Kran. "You can't do the things you want to do with AI."

Tools like DarkBench offer a starting point. However, lasting change requires aligning technological ambition with clear ethical commitments, and the commercial will to back them up.
