OpenAI's strongest model GPT-5 is here! Free to use, Altman hails a big step towards AGI, Microsoft moves first with integration

GPT-5 is the first OpenAI product to combine the reasoning of the o series models with the rapid responses of the GPT series. It has the strongest coding ability to date, with an accuracy rate of 74.9% in benchmark tests, slightly surpassing Anthropic's Opus 4.1, launched on Tuesday; it shows better taste in creative writing; hallucinations have been significantly reduced, with a misinformation rate of 1.6% in the health field, far lower than GPT-4o's 15.8%; and a new safety training mode teaches the model to provide the most helpful answers within a safe range. Altman said this is the first time it feels like talking to an expert in any field. GPT-5 will be available to users starting Thursday, with Pro users getting unlimited access and the enhanced GPT-5 Pro. Microsoft will integrate GPT-5 into platforms such as Copilot and Azure AI Foundry starting Thursday. OpenAI also launched four optional preset personalities for ChatGPT.

OpenAI's most anticipated product of the year has arrived.

On Thursday, August 7th, Eastern Time, OpenAI announced the launch of its next-generation flagship artificial intelligence (AI) model, GPT-5. It is OpenAI's first "integrated" AI system, combining the reasoning capabilities of the o series models with the rapid response capabilities of the GPT series models.

OpenAI CEO Sam Altman highly praised GPT-5 at the new model launch event, calling it "the best model in the world," a "significant upgrade" compared to previous models, and stated that its release marks an "important step" for OpenAI on the path to achieving artificial general intelligence (AGI).

OpenAI said GPT-5 performed strongly on multiple benchmarks, reaching state-of-the-art levels in programming, mathematics, health, and other fields. GPT-5 achieved an accuracy rate of 74.9% on the SWE-bench Verified coding test, slightly surpassing Anthropic's new model Claude Opus 4.1, released this Tuesday. GPT-5's hallucination problem has also improved significantly: only 4.8% of its responses contained factual errors, far lower than the 20.6% of the previous model GPT-4o.

Starting this Thursday, GPT-5 will be available as the default model to all free ChatGPT users and to paid Plus, Pro, and Team subscribers, and it will roll out to the Enterprise and Edu paid plans within a week.

As with GPT-4o, the difference between the free and paid versions of GPT-5 lies in usage limits. Plus users get higher usage limits, while Pro users get unlimited access and the enhanced GPT-5 Pro. For free users, the full reasoning capabilities may take a few days to become fully available. Once free users reach their GPT-5 usage limit, OpenAI will switch them to the smaller GPT-5 mini model.

OpenAI also announced on Wednesday that it will provide the ChatGPT product to U.S. federal government agencies for a symbolic fee of $1 per year. Specifically, this is the enterprise version of ChatGPT, which includes enhanced security and privacy features.

Just as OpenAI officially announced GPT-5, Microsoft announced that starting this Thursday it will integrate GPT-5 into its extensive product portfolio, including Microsoft 365 Copilot, Copilot, GitHub Copilot, and Azure AI Foundry, giving Microsoft's enterprise and consumer users immediate access to GPT-5's advanced reasoning and programming strengths.

GPT-5 Has Three Major Advantages in Programming, Creative Writing, and Health

OpenAI's announcement opens by calling GPT-5 its "smartest, fastest, most useful model yet, with built-in thinking that puts expert-level intelligence in everyone's hands." According to OpenAI, GPT-5, its "most powerful model," has achieved significant improvements in three key areas.

First is programming capability. GPT-5 is OpenAI's most powerful coding model to date, excelling in complex front-end generation and debugging large codebases, capable of creating aesthetically pleasing responsive websites, applications, and games with just a single prompt. Early testers noted improvements in design choices such as spacing, typography, and white space.

In the benchmark SWE-bench Verified, which assesses real-world coding tasks from GitHub, GPT-5 achieved a first-attempt accuracy of 74.9%, surpassing OpenAI's reasoning model o3 at 69.1% and GPT-4o at 30.8%.

Commentators pointed out that this means GPT-5 edged out Anthropic's Claude Opus 4.1, launched on Tuesday, and beat Google DeepMind's Gemini 2.5 Pro, which scored 74.5% and 59.6% respectively on SWE-bench Verified.

However, on Humanity’s Last Exam, which measures model performance across mathematics, the humanities, and the natural sciences, GPT-5 Pro, the enhanced version of GPT-5 with extended reasoning, scored 42% when using tools. This is slightly lower than xAI's Grok 4 Heavy, which scored 44.4%.

Altman stated that GPT-5 is particularly adept at spinning up entire software applications on demand, a practice known as "vibe coding," in which AI generates functional code from natural language prompts, thereby accelerating development.

As an example, OpenAI researchers demonstrated asking GPT-5 to create a web app to help English-speaking users learn French, with the app needing to have an engaging theme, including flashcards, quizzes, a classic Snake game, and a method to track daily learning progress.

The researchers submitted the same prompt to two GPT-5 windows, and within minutes, two different apps were generated. OpenAI's head stated that these apps "have some flaws," but users can further adjust the AI-generated software according to personal preferences, such as changing backgrounds or adding more tabs.

In creative writing, GPT-5 is capable of handling structurally complex writing tasks, such as unrhymed iambic pentameter or naturally flowing free verse. Nick Turley, Vice President of ChatGPT Business at OpenAI, stated that GPT-5 demonstrates "better taste" in creative tasks, with more natural responses.

Health consulting is the third important area of enhancement.

GPT-5 can more actively identify potential health issues and help users interpret medical results, although OpenAI emphasizes that ChatGPT cannot replace medical professionals.

In a test called HealthBench Hard Hallucinations, GPT-5 with thinking enabled produced hallucinated misinformation only 1.6% of the time. This is significantly lower than the rates for GPT-4o and o3, which were 15.8% and 12.9%, respectively.

Significantly Reduced Hallucination Possibility with New Safety Training Mode

OpenAI claims that GPT-5 is more reliable and practical compared to previous models, as it can answer real-world questions more accurately, with a significantly reduced likelihood of hallucinations.

With web search enabled on anonymized prompts representative of ChatGPT's production traffic, GPT-5 responses are about 45% less likely to contain factual errors than GPT-4o's; when thinking, GPT-5 responses are about 80% less likely to contain factual errors than o3's. As shown in the figure below, the misinformation rate for GPT-5 responses is only 4.8%, while GPT-4o is 20.6% and o3 is 22%.
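The relative reductions quoted in this article can be checked against the absolute rates with simple arithmetic. The sketch below is illustrative only: the helper name relative_reduction is made up for this example, and it assumes the 4.8% figure refers to GPT-5 with thinking enabled, which the article does not state explicitly. It uses only numbers cited in this article.

```python
def relative_reduction(baseline_pct: float, new_pct: float) -> float:
    """Return how much lower new_pct is than baseline_pct, as a percent of baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline_pct must be positive")
    return (baseline_pct - new_pct) / baseline_pct * 100.0


# Share of responses containing factual errors, as quoted in this article.
# Assumption: the 4.8% figure is for GPT-5 with thinking enabled.
GPT5_ERROR_RATE = 4.8
O3_ERROR_RATE = 22.0

# HealthBench Hard Hallucinations rates, as quoted in this article.
GPT5_HEALTH_RATE = 1.6
GPT4O_HEALTH_RATE = 15.8

vs_o3 = relative_reduction(O3_ERROR_RATE, GPT5_ERROR_RATE)
health_vs_gpt4o = relative_reduction(GPT4O_HEALTH_RATE, GPT5_HEALTH_RATE)

print(f"Factual errors vs o3: {vs_o3:.1f}% lower")                      # 78.2% lower
print(f"Health misinformation vs GPT-4o: {health_vs_gpt4o:.1f}% lower")  # 89.9% lower
```

The 78.2% result is in line with the "about 80% lower than o3" claim. The "about 45% lower than GPT-4o" claim cannot be reproduced from the numbers given here, presumably because it compares a different GPT-5 configuration whose rate the article does not list.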

OpenAI also stated that a new form of safety training called safe completions has been introduced for GPT-5. It teaches the model to provide the most helpful answers possible within a safe range. Sometimes, this may mean partially answering the user's question or only providing high-level responses.

If a refusal is necessary, the trained GPT-5 will transparently inform the user of the reason for the refusal and provide safe alternatives.

In controlled experiments and in its production models, OpenAI found that the safe completions approach is more nuanced: it handles dual-use questions better, is more robust to ambiguous intent, and reduces unnecessary refusals.

Michelle Pokrass, head of post-training at OpenAI, stated: "GPT-5 has been trained to recognize when tasks cannot be completed, avoid guessing, and explain limitations more clearly, reducing unfounded assertions compared to previous models."

Four Optional ChatGPT Chat Personality Presets Released

OpenAI stated that GPT-5 is better at following instructions, including custom instructions. OpenAI will launch a research preview of four preset personalities for all ChatGPT users.

The initial four personality options (Cynic, Robot, Listener, and Nerd) are optional, and users can change them at any time in settings to match the communication style they want from ChatGPT.

These four personalities initially apply to text chat and will later be extended to voice chat, letting users set ChatGPT's interaction style without writing custom prompts, whether concise and professional, thoughtfully supportive, or slightly sarcastic.

OpenAI claims that all these new personalities meet or exceed their internal evaluation standards for reducing sycophantic behavior.

Altman Praises Historic Breakthrough, Says Using GPT-4 Feels Quite Bad

At a briefing on Thursday, Altman gave high praise to GPT-5, positioning it as an important milestone towards AGI. He stated:

“At no point in history has it been imaginable to have something like GPT-5.”

“This is the first time it feels like talking to an expert in any field.”

During the briefing, Altman even went so far as to disparage GPT-4 in order to elevate GPT-5. He said:

“I tried going back to GPT-4, and it felt quite bad.”

GPT-5 uses a unified system architecture equipped with a real-time router that can automatically decide whether to respond quickly or engage in deep "thinking" based on the type of conversation, complexity, and tool requirements. This eliminates the need for users to choose the appropriate settings, making ChatGPT easier to use.
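To make the routing idea concrete, here is a minimal sketch of how such a router might choose between a fast reply and deeper "thinking." It is not OpenAI's implementation: the Request fields, the route function, the topic labels, and the 0.6 threshold are all hypothetical, chosen only to mirror the three signals named above (conversation type, complexity, and tool requirements).

```python
from dataclasses import dataclass


@dataclass
class Request:
    conversation_type: str  # hypothetical labels, e.g. "chat", "coding", "math"
    complexity: float       # hypothetical score from 0.0 (trivial) to 1.0 (very hard)
    needs_tools: bool       # whether answering requires tools such as web search


def route(request: Request, complexity_threshold: float = 0.6) -> str:
    """Return "fast" or "thinking" using made-up rules, for illustration only."""
    if request.needs_tools:
        return "thinking"
    if request.conversation_type in {"coding", "math"}:
        return "thinking"
    if request.complexity >= complexity_threshold:
        return "thinking"
    return "fast"


print(route(Request("chat", 0.1, False)))    # fast
print(route(Request("coding", 0.3, False)))  # thinking
print(route(Request("chat", 0.9, False)))    # thinking
```

A production router would presumably learn these decisions from data rather than rely on fixed rules; the rules here only show how the user no longer has to choose a mode by hand.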

In internal benchmarks of economically valuable work, GPT-5 in reasoning mode matched or outperformed experts in about half of the cases, across more than 40 professions including law, logistics, sales, and engineering. OpenAI VP Nick Turley stated, "This model feels really good."

Altman likened using GPT-5 to having a team of experts, all with PhDs, at your disposal. He also said, “In many new fields, people are limited by ideas but actually lack the execution capability.”

Microsoft Fully Integrates to Seize the Opportunity

On the day of GPT-5's release, Microsoft announced that it would integrate the model into a wide range of products. In enterprise applications, Microsoft 365 Copilot will use GPT-5 to better handle complex problems, stay on track in long conversations, and understand user context. Enterprise users can apply its reasoning capabilities to emails, documents, and files.

For consumers, the new intelligent mode of Microsoft Copilot will use GPT-5 to help users find the best solutions. Users can try GPT-5 for free through copilot.microsoft.com or the Copilot app on Windows, Mac, Android, and iOS devices.

Developers will receive GPT-5 support through GitHub Copilot and Visual Studio Code for writing, testing, and deploying code. The Azure AI Foundry platform will provide all GPT-5 models, equipped with an AI-driven model router that selects the optimal model based on the complexity of each task, performance requirements, and cost efficiency.

Microsoft's AI red team tested the GPT-5 reasoning model using rigorous security protocols, and the results showed that the model exhibited one of the strongest AI safety profiles of any OpenAI model to date against various attack modes, including malware generation and fraud automation.