GPT-5: Did Superintelligence Fall Short of Expectations?

A Lukewarm Reception for GPT-5: Is the Era of Exaggerated Superintelligence Over?

Lukewarm Reactions and Gradual Progress

OpenAI's launch of GPT-5 has been met with a lukewarm reception, raising serious questions about how much the talk of "superintelligence" has been exaggerated. Nearly a year after CEO Sam Altman declared that superintelligent AI was "around the corner," GPT-5 looks like an incremental technological advance rather than the revolutionary leap many anticipated. The release has drawn substantial negative feedback and media criticism, which is striking given the positive reception enjoyed by the open-source models that preceded it.

Performance in Tests and Programming Challenges


Critics point out that GPT-5 suffered from technical problems at launch, including an unstable mechanism for switching between it and GPT-4o, alongside user complaints of "slow responses, hallucinations, and sudden errors."

On benchmarks, GPT-5 improved in places but did not deliver the anticipated qualitative leap. It outperformed some earlier models on the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI-2), a test designed to measure an AI system's ability to solve abstract problems it has not encountered before, widely treated as an indicator of general intelligence. Yet it scored lower than Grok-4, developed by Elon Musk's xAI. And on the older version of the test, ARC-AGI-1, GPT-5 answered 67.5% of tasks correctly, below the 76% achieved by OpenAI's earlier o3 model.
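
To make concrete what ARC-style benchmarks measure, here is a minimal sketch of exact-match grid scoring. The simplified task format and the toy transpose "solver" are illustrative assumptions, not the official ARC-AGI harness, which uses a richer format and multiple attempts per task.

```python
# Minimal sketch of ARC-style exact-match scoring (simplified for illustration).
from typing import Callable

Grid = list[list[int]]  # each cell holds a colour index, as in ARC tasks

def score_arc_style(tasks: list[tuple[Grid, Grid]],
                    solver: Callable[[Grid], Grid]) -> float:
    """Fraction of tasks solved with an exact grid match: pass/fail, no partial credit."""
    correct = sum(solver(inp) == target for inp, target in tasks)
    return correct / len(tasks)

def transpose(grid: Grid) -> Grid:
    """Toy 'solver' for a task whose hidden rule happens to be transposition."""
    return [list(row) for row in zip(*grid)]

# One toy task: an input grid and the expected output under the hidden rule.
tasks = [([[1, 2], [3, 4]], [[1, 3], [2, 4]])]
print(f"accuracy: {score_arc_style(tasks, transpose):.0%}")  # accuracy: 100%
```

The all-or-nothing scoring is what makes such tests hard to game: a model must infer the full transformation rule from a handful of examples, not merely produce plausible-looking output.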

In programming, some even consider GPT-5 a step backward. Although it delivered a "leap" in analyzing code repositories, it was not the "game-changer" many expected. These results have reinforced critics' view that the constant talk of superintelligence is exaggerated, and that what has been achieved so far is simply the natural, expected progress of large language models.

The Reality of "Reasoning" and Debunking Claims

Despite the negative publicity, Altman and other industry leaders are unlikely to abandon the superintelligence narrative. Still, the absence of any real "cognitive" breakthrough in GPT-5, after all these expectations, may prompt deeper scrutiny of freely circulated terms like "thinking" and "reasoning." OpenAI's press material for GPT-5 emphasizes the model's performance at so-called reasoning, in which a model produces lengthy intermediate output describing how it arrives at an answer before giving its final response, a technique known as chain-of-thought prompting. The company claims that "when using reasoning, GPT-5 is comparable to or better than experts in about half of cases."
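
To illustrate what chain-of-thought prompting means in practice, here is a hedged sketch using the OpenAI Python SDK. The model name and prompt wording are illustrative assumptions, not OpenAI's published recipe for GPT-5's reasoning mode.

```python
# Sketch of chain-of-thought prompting with the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()
question = "A train leaves at 9:40 and arrives at 13:05. How long is the trip?"

# Direct prompt: ask only for the answer.
direct = client.chat.completions.create(
    model="gpt-4o",  # substitute whatever model you are testing
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought prompt: ask the model to spell out intermediate steps
# before committing to a final answer.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + "\nThink step by step, then state the answer.",
    }],
)

print(direct.choices[0].message.content)
print(cot.choices[0].message.content)
```

The second response typically includes a worked-out sequence of steps before the answer; the debate described below is over whether that output reflects genuine reasoning or merely the appearance of it.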

Research teams across the industry, however, have recently challenged these claims. In a widely circulated paper, Apple researchers concluded that so-called large reasoning models (LRMs) do not consistently "reason" in any sense the everyday word would imply; instead, their performance becomes erratic as problems grow more complex. Other researchers have likewise noted that chain-of-thought, the lengthy output these models produce, "often leads to the perception that they are engaging in deliberate reasoning processes," while concluding that the reality is "more superficial than it appears."
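
The failure mode described here can be pictured as an accuracy-versus-complexity curve. The sketch below shows how such a measurement might be set up, using Tower of Hanoi instances of growing size as puzzles with a tunable difficulty knob; `ask_model` is a stand-in stub whose error rate is invented purely for illustration, not a real model call or the Apple team's actual protocol.

```python
# Sketch of a complexity-scaling evaluation: pose puzzles of increasing size
# and observe where accuracy degrades. `ask_model` is a stub, not a real model.
import random

def ask_model(n_disks: int) -> int:
    """Stub standing in for a model query: it 'knows' the minimum Tower of
    Hanoi move count (2^n - 1) but grows unreliable on larger puzzles.
    The error rate here is invented purely for illustration."""
    truth = 2 ** n_disks - 1
    return truth if random.random() > n_disks / 12 else truth + 1

def accuracy_by_complexity(sizes: range, trials: int = 50) -> dict[int, float]:
    """For each puzzle size, the fraction of trials answered correctly."""
    return {
        n: sum(ask_model(n) == 2 ** n - 1 for _ in range(trials)) / trials
        for n in sizes
    }

for n, acc in accuracy_by_complexity(range(3, 11)).items():
    print(f"{n:2d} disks: {acc:.0%}")
```

A system that genuinely applied the underlying rule would hold a flat curve; the erratic, declining pattern is what critics point to as evidence of something shallower than reasoning.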

Conclusion: Towards More Realistic Expectations

These technical assessments undercut the exaggerations in Altman's and others' rhetoric, which trades on notions of intelligence through casual, unsupported assertions. Ordinary users, too, stand to benefit from seeing through the overstatement and paying close attention to how recklessly terms like "superintelligence" are deployed. That skepticism could set more realistic expectations when GPT-6 eventually arrives.
