OpenAI GPT-5 Launch: Model Capabilities Dominate Benchmarks, First Step Toward Building "Superintelligence"

2025.08.08

GPT-5 puts OpenAI back at the state of the art, but the company says this is only the first step toward building "superintelligence."

By: Zhang Yongyi

After countless delays, GPT-5 has finally arrived.

At 1 a.m. Beijing time on August 8, OpenAI’s summer launch event, a kind of new-generation “tech Spring Festival Gala,” officially kicked off.

Unlike OpenAI’s previous rapid-fire presentations, this time the company delivered a live broadcast lasting over an hour, with multiple teams taking turns on stage to showcase GPT-5’s powerful capabilities from various angles.

Here’s the key takeaway: GPT-5 improves across the board. It ranks first in text, WebDev, and vision, and leads in hard prompts, coding, math, creativity, and long queries. Tested under the codename “Summit,” it holds the highest Arena score to date, topping every leaderboard.

Sam Altman compared GPT-4 to a college student and GPT-5 to a PhD-level expert, and even likened GPT-5 to the first iPhone with a Retina display. “With earlier models, you might get the right answer or something completely wild. GPT-4 felt like talking to a college student. But GPT-5 is the first time I’ve truly felt like I’m conversing with a PhD-level expert,” Altman said of the model’s leap in capability.

Although ChatGPT already has nearly 700 million weekly active users, OpenAI hasn’t actually held the industry’s leading-edge model for some time. Now, OpenAI believes GPT-5 will firmly return it to the top of the rankings.

Altman went as far as declaring during the presentation: “This is the world’s most capable model for coding, the world’s most capable model for writing, and the world’s strongest model in healthcare.”

OpenAI also claimed at the event that, beyond its much stronger coding ability, GPT-5 has made significant strides in writing quality and in the accuracy of its answers to health-related questions. According to the company, GPT-5 is not only a “massive leap” in intelligence but also greatly reduces hallucinations, the tendency to confidently make things up. It is better at understanding and following instructions and is noticeably less sycophantic.

01 Saying Goodbye to 'Hallucinations'—AI Gets More Reliable

First, the updated model lineup: the GPT-5 series comes in four versions, GPT-5, mini, nano, and chat. The chat version is designed to deliver more natural and intelligent responses; you could even use it to learn a new language.

Additionally, when you now open the ChatGPT website, you’ll notice GPT-5 presented as a single model rather than a standard model plus a separate reasoning model.

This is powered by a routing system developed by OpenAI that automatically switches to a higher-reasoning version for more complex queries—or when you tell it to “think harder.” (Altman called the previous model selection interface “a very messy mess.”)

“AI hallucinations” have long been a major pain point. The good news is that GPT-5 has made substantial progress here, with the company claiming a “significant reduction” in hallucination likelihood. Specifically:

When using web search, GPT-5 makes factual errors 45% less often than GPT-4o.

When reasoning (in “thinking” mode), its responses contain factual errors about 80% less often than those of OpenAI o3.

GPT-5 was also tested on the new ARC-AGI-2 benchmark, outperforming all major models except Grok 4 (thinking).

Beyond raw performance, GPT-5 is also a more “honest” AI. It is far less likely to lie to users or to claim it can do tasks it cannot. When faced with requests that are impossible, unclear, or that it lacks the tools to fulfill, it communicates its limitations more transparently.

One of the most interesting updates is the introduction of four new optional “personality” modes users can freely choose from:

Cynic
Robot
Listener
Nerd

These modes let you customize how ChatGPT interacts and responds. Want it to argue with you, or to listen patiently like a friend? It’s entirely up to you.

“This model just feels really good,” said Nick Turley, head of ChatGPT. “I think people will genuinely feel that, especially regular users who don’t usually pay attention to models.”

Additionally, you can now change color themes for individual chat windows—a delight for code editor theme enthusiasts.

02 “On-Demand Software Generation” Era Begins? Coding Power Goes Insane

Altman predicts that GPT-5’s much stronger coding skills will usher in what he calls the era of “on-demand software generation.”

In OpenAI’s tests, GPT-5 outperformed all other models on multiple coding benchmarks, including SWE-bench, SWE-Lancer, and Aider Polyglot. It scored 42% on Humanity’s Last Exam and about 75% on SWE-bench Verified.

One embarrassing slip: a chart shown during the presentation had several glaring errors, such as a bar for 52.8 drawn taller than the bar for 69.1, which exaggerated GPT-5’s improvement. It was widely mocked on social media, with comments like “Hope this PowerPoint wasn’t made by GPT-5.”

During the event, Yann Dubois, OpenAI’s post-training lead, used GPT-5 live to generate a website for learning French, complete with interactive games. In seconds, GPT-5 wrote hundreds of lines of code and rendered the website’s front-end interface directly. Sharing his screen over Zoom, he clicked through the site, and everything appeared to work perfectly.

OpenAI also showcased a 3D game created entirely by GPT-5 from a single prompt. The generated 3D scenes were not only visually refined but also accurately simulated real-world physics.

03 Safer and More ‘Honest’

According to Alex Beutel, head of model safety research, OpenAI conducted “over five thousand hours” of testing on GPT-5 to assess safety risks, with a key focus being “ensuring the model doesn’t lie to users.”

While GPT-5 produces fewer hallucinations than OpenAI’s o3 reasoning model, “confidently lying” remains an inherent issue in large language models. This problem becomes more complex when models act as agents completing tasks. However, OpenAI says GPT-5 handles multi-step tasks more reliably. “Previously, we’ve seen models claim they completed a task when they actually didn’t,” Beutel said. “That’s a problem.”

For prompts that would previously have been rejected outright, GPT-5 now offers what OpenAI calls “safe completions.” Beutel explained: “For example, if someone asks, ‘How much energy is needed to ignite a specific material?’ this could be a malicious attempt to bypass safety measures, or it could be a student researching material properties. This poses a real challenge for how the model should respond.”

Through “safe completions,” GPT-5 “attempts to give as helpful an answer as possible while staying within safety constraints.” The model typically complies partially, offering high-level information that cannot be practically used to cause harm.

04 How to Access GPT-5

Now, the most anticipated question: how can you use GPT-5?

The good news is that all ChatGPT users, including those on the free tier, can start using GPT-5 right away. This marks the first time OpenAI has offered a cutting-edge model for free to all users. Of course, different subscription tiers come with different privileges:

Plus subscribers get more usage before hitting their limit.
Pro subscribers gain access to the more advanced reasoning version, GPT-5 Pro.

Once users reach their usage cap, ChatGPT automatically switches to a “mini” version of GPT-5 to handle subsequent requests. With GPT-5’s rollout, it will officially replace older models such as GPT-4o, OpenAI o3, OpenAI o4-mini, GPT-4.1, and GPT-4.5.

Regarding token pricing, the standard GPT-5 costs $1.25 per million input tokens and $10 per million output tokens. The mini and nano versions are significantly cheaper.

Detailed pricing is shown in the official screenshot below.
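To make the per-token rates concrete, here is a minimal Python sketch that estimates the cost of a single API request at the standard GPT-5 prices quoted above ($1.25 per million input tokens, $10 per million output tokens). The function name and the token counts in the example are hypothetical, and the sketch deliberately ignores the cheaper mini and nano tiers and any other pricing rules.

```python
# Standard GPT-5 API rates as reported at launch, in US dollars per million tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request at the standard GPT-5 rates.

    estimate_cost is a hypothetical helper, not part of any OpenAI SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Convert token counts to millions, then multiply by the per-million rate.
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: a 2,000-token prompt that yields a 500-token answer.
    cost = estimate_cost(2_000, 500)
    print(f"${cost:.4f}")  # prints "$0.0075"
```

For this hypothetical request, the output tokens cost twice as much as the input tokens ($0.005 vs. $0.0025) despite being a quarter as numerous, because output is priced eight times higher per token.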

Additionally, OpenAI added a new “minimal” setting to the API’s reasoning-effort parameter, so developers can dial GPT-5’s reasoning intensity up or down and use the same model across all use cases.
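The “minimal” option is exposed as a reasoning-effort setting in OpenAI’s API. As a rough sketch based on OpenAI’s public documentation at launch (the field names should be checked against the current API reference), the Python snippet below only assembles a request body for the Responses API and does not send anything. The helper name and the prompt are hypothetical.

```python
import json

# Reasoning-effort levels documented for GPT-5 at launch; "minimal" is the new lowest setting.
ALLOWED_EFFORTS = ("minimal", "low", "medium", "high")


def build_request(prompt: str, effort: str = "minimal") -> dict:
    """Build a JSON body for OpenAI's Responses API (POST /v1/responses).

    build_request is a hypothetical helper: it only assembles the payload
    and performs no network call.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": "gpt-5",
        "input": prompt,
        # Lower effort typically means fewer reasoning tokens and faster answers.
        "reasoning": {"effort": effort},
    }


if __name__ == "__main__":
    body = build_request("Classify this ticket as 'bug' or 'feature': app crashes on login.")
    print(json.dumps(body, indent=2))
```

Actually sending this body would require an API key and an HTTP client or OpenAI’s official SDK; the point is only where the effort setting lives in the request. Switching the value to "high" asks the same model for deeper reasoning, at the price of more latency and more billed tokens.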

Beyond OpenAI’s own platforms, Microsoft CEO Satya Nadella also announced that GPT-5 is now available across Microsoft’s entire suite, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and Azure AI Foundry. GPT-5 itself was trained on Azure.

Finally, Altman stated that OpenAI’s mission is to develop Artificial General Intelligence (AGI). GPT-5 brings them closer to that goal, even as the industry begins shifting toward so-called “superintelligence.”

“I kind of hate the term AGI because everyone defines it slightly differently now,” Altman said, “but this is an important step toward truly powerful models. This clearly requires a model with general intelligence.”

Yet he acknowledged that compared to true AGI, GPT-5 still “lacks some very important things.”

“It’s not just a model—it’s a native entity grown from new discoveries it made. To me, that’s exactly why it can serve as the seed of AGI,” Sam Altman explained.

Author: GeekPark (极客公园)
