Why GPT-5’s rocky rollout is the reality check we needed on superintelligence hype

OpenAI (Dilara Irem Sancar/Anadolu via Getty Images)

ZDNET’s key takeaways

The botched rollout of GPT-5 is a far cry from superintelligence.
GPT-5 represents incremental technical progress.
Scholars are debunking AI hype with detailed analyses.

Almost a year ago, OpenAI CEO Sam Altman declared artificial “superintelligence” was “just around the corner.”

Then, last June, he trumpeted the arrival of superintelligence, writing in a blog post: “We have recently built systems that are smarter than people in many ways.” But this rhetoric clashes with what’s quickly shaping up to be a rather botched debut of the much-anticipated GPT-5 model from Altman’s AI company, OpenAI.

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

A really underwhelming rollout

In the days since it was released, the new AI model has received a fair amount of negative feedback and negative press, which is surprising given that the company’s first open-source models in six years were widely acclaimed just the week before.

“OpenAI’s GPT-5 model was meant to be a world-changing upgrade to its wildly popular and precocious chatbot,” writes Wired’s Will Knight. “But for some users, last Thursday’s release felt more like a wrenching downgrade, with the new ChatGPT presenting a diluted personality and making surprisingly dumb mistakes.”

There were simple technical snafus, such as a broken mechanism for switching between GPT-5 and GPT-4o, and users complained of “sluggish responses, hallucinations, and surprising errors.”

As Knight points out, hype has been building for GPT-5 since the impressive debut of its predecessor, GPT-4, in March 2023. That year, Altman emphasized the enormous technical challenge, lending the impression of a kind of moon shot with GPT-5.

“The number of things we’ve gotta figure out before we make a model that we’ll call GPT-5 is still a lot,” said Altman in a press conference that year following the company’s first-ever developer conference, which took place in San Francisco.

Progress, but no moon shot

What has been delivered appears to be an improvement, but nothing like a moon shot.

On one of the most respected benchmark tests of artificial intelligence, the “Abstraction and Reasoning Corpus for Artificial General Intelligence,” or ARC-AGI-2, GPT-5 scored better than some predecessors but below the recently released Grok 4, developed by Elon Musk’s xAI, according to a post on X by ARC-AGI’s creator, François Chollet.

Grok 4 is still state-of-the-art on ARC-AGI-2 among frontier models.

15.9% for Grok 4 vs. 9.9% for GPT-5. pic.twitter.com/wSezrsZsjw

— François Chollet (@fchollet) August 7, 2025

On the older version of the AGI test, ARC-AGI-1, GPT-5 scored 65.7% correct, Chollet wrote, which is below the 76% that an older OpenAI model, o3, scored in December.

GPT-5 on ARC-AGI Semi Private Eval

GPT-5
* ARC-AGI-1: 65.7%, $0.51/task
* ARC-AGI-2: 9.9%, $0.73/task

GPT-5 Mini
* ARC-AGI-1: 54.3%, $0.12/task
* ARC-AGI-2: 4.4%, $0.20/task

GPT-5 Nano
* ARC-AGI-1: 16.5%, $0.03/task
* ARC-AGI-2: 2.5%, $0.03/task pic.twitter.com/KNl7ToFYEf

— ARC Prize (@arcprize) August 7, 2025

In coding, each new AI model generally shows some progress.

ZDNET’s David Gewirtz relates in his testing that GPT-5 is actually a step backward. David concedes GPT-5 did “provide a jump” in the analysis of code repositories but adds that it wasn’t “a game-changer.”

What’s going on here? The hype of Altman and others about superintelligence has yielded to mere progress.

“Overdue, overhyped and underwhelming,” wrote the relentless Gen AI critic Gary Marcus on his Substack. “But this time, the reaction was different. Because expectations were through the roof, a huge number of people viewed GPT-5 as a major letdown.”

AI scholars are pushing back on the hype

For all the negative press, it’s unlikely Altman and others will abandon the rhetoric about superintelligence. Still, the lack of a true “cognitive” breakthrough in GPT-5, after so much anticipation, may fuel closer scrutiny of terms often tossed around, such as “thinking” and “reasoning.”

The press release for GPT-5 from OpenAI emphasizes how the model excels at what has come to be called reasoning, where AI models generate verbose output about the process of arriving at an answer to a prompt.

“When using reasoning, GPT-5 is comparable to or better than experts in roughly half the cases,” the company states.

The industry’s research teams have recently pushed back on claims of reasoning.

In a widely cited research paper from Apple last month, the company’s researchers concluded that so-called large reasoning models, or LRMs, do not consistently “reason” in any sense that one would expect of the colloquial term. Instead, the programs tend to become erratic in how they approach increasingly complex problems.

“LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across scales and problems,” wrote lead author Parshin Shojaee and team.

As a result, “Frontier LRMs face a complete accuracy collapse beyond certain complexities.”

Similarly, Arizona State University researchers Chengshuai Zhao and team write in a report last week that “chain-of-thought,” the string of verbose output produced by the LRMs, “often leads to the perception that they engage in deliberate inferential processes.” But, they conclude, the reality is in fact “more superficial than it appears.”

Such apparent reasoning is “a brittle mirage that vanishes when it is pushed beyond training distributions,” Zhao and team conclude after studying the models’ results and their training data.

Such technical assessments are challenging the hyperbole from Altman and others that exploits notions of intelligence with casual, unsubstantiated assertions.

It would behoove the average person to also debunk the hyperbole and to pay very close attention to the cavalier way that terms such as superintelligence are tossed around. It may make for more reasonable expectations whenever GPT-6 arrives.
