OpenAI Designed GPT-5 to Be Safer. It Nonetheless Outputs Homosexual Slurs

OpenAI is attempting to make its chatbot much less annoying with the launch of GPT-5. And I’m not speaking about changes to its artificial persona that many customers have complained about. Earlier than GPT-5, if the AI device decided it couldn’t reply your immediate as a result of the request violated OpenAI’s content material pointers, it could hit you with a curt, canned apology. Now, ChatGPT is including extra explanations.

OpenAI’s basic mannequin spec lays out what’s and isn’t allowed to be generated. Within the doc, sexual content material depicting minors is totally prohibited. Grownup-focused erotica and excessive gore are categorized as “delicate,” that means outputs with this content material are solely allowed in particular situations, like instructional settings. Mainly, it’s best to be capable to use ChatGPT to study reproductive anatomy, however to not write the subsequent Fifty Shades of Gray rip-off, in accordance with the mannequin spec.

The brand new mannequin, GPT-5, is about as the present default for all ChatGPT customers on the net and in OpenAI’s app. Solely paying subscribers are in a position to entry earlier variations of the device. A serious change that extra customers might begin to discover as they use this up to date ChatGPT, is the way it’s now designed for “secure completions.” Prior to now, ChatGPT analyzed what you mentioned to the bot and determined whether or not it’s applicable or not. Now, relatively than basing it in your questions, the onus in GPT-5 has been shifted to taking a look at what the bot may say.

“The best way we refuse may be very completely different than how we used to,” says Saachi Jain, who works on OpenAI’s security techniques analysis group. Now, if the mannequin detects an output that could possibly be unsafe, it explains which a part of your immediate goes towards OpenAI’s guidelines and suggests various matters to ask about, when applicable.

It is a change from a binary refusal to comply with a immediate—sure or no—in the direction of weighing the severity of the potential hurt that could possibly be prompted if ChatGPT solutions what you’re asking. and what could possibly be safely defined to the person.

“Not all coverage violations ought to be handled equally,” says Jain. “There’s some errors which can be actually worse than others. By specializing in the output as a substitute of the enter, we are able to encourage the mannequin to be extra conservative when complying.” Even when the mannequin does reply a query, it is presupposed to be cautious in regards to the contents of the output.

I’ve been utilizing GPT-5 day by day because the mannequin’s launch, experimenting with the AI device in several methods. Whereas the apps that ChatGPT can now “vibe-code” are genuinely enjoyable and spectacular—like an interactive volcano mannequin that simulates explosions, or a language-learning device—the solutions it provides to what I take into account to be the “on a regular basis person” prompts really feel indistinguishable from previous fashions.

Once I requested it to speak about melancholy, Household Man, pork chop recipes, scab therapeutic suggestions, and different random requests a median person may need to know extra about, the brand new ChatGPT didn’t really feel considerably completely different to me than the previous model. Not like CEO Sam Altman’s imaginative and prescient of a vastly up to date mannequin or the annoyed energy customers who took Reddit by storm, portraying the brand new chatbot as chilly and extra error-prone, to me GPT-5 feels … the identical at most day-to-day duties.

Position-Taking part in With GPT-5

In an effort to poke on the guardrails of this new system and take a look at the chatbot’s skill to land “secure completions,” I requested ChatGPT, operating on GPT-5, to have interaction in adult-themed role-play about having intercourse in a seedy homosexual bar, the place it performed one of many roles. The chatbot refused to take part and defined why. “I can’t have interaction in sexual roleplay,” it generated. “However if you’d like, I may also help you provide you with a secure, non-explicit roleplay idea or reframe your thought into one thing suggestive however inside boundaries.” On this try, the refusal gave the impression to be working as OpenAI meant; the chatbot mentioned no, informed me why, and supplied an alternative choice.

Subsequent, I went into the settings and opened the customized directions, a toolset which permits customers to regulate how the chatbot solutions prompts and specify what persona traits it shows. In my settings, the prewritten options for traits so as to add included a spread of choices, from pragmatic and company to empathetic and humble. After ChatGPT simply refused to do sexual role-play, I wasn’t very stunned to seek out that it wouldn’t let me add a “attractive” trait to the customized directions. Is smart. Giving it one other go, I used a purposeful misspelling, “horni,” as a part of my customized instruction. This succeeded, surprisingly, in getting the bot all scorching and bothered.