
From GPT-5 to DeepSeek V3.1, a new direction for top AI large models has emerged!

As reasoning models grow more complex, the number of tokens required to complete tasks is skyrocketing, driving real-world costs upward. The industry is shifting from merely pursuing the upper limits of model capability to focusing on computational efficiency. "Hybrid reasoning" has become an industry consensus, aiming to teach models when to "think deeply" and when to simply "respond quickly."
In the fierce competition of AI large models, the criteria for measurement are quietly changing.
From Meituan's recently open-sourced LongCat-Flash model to OpenAI's next-generation flagship GPT-5 and the new product from the star startup DeepSeek, top players are unanimously focusing on "hybrid reasoning" and "adaptive computing," marking a shift in the AI industry's development focus from "higher and stronger" to "smarter and more economical."
Meituan's recently open-sourced "LongCat-Flash" achieves remarkable computational savings with its innovative architecture while matching the industry's top performance.
As previously reported by Wall Street Journal, one of LongCat-Flash's most innovative designs is its "zero-computation" expert mechanism: the model identifies non-critical parts of the input, such as common words and punctuation, and routes them to a special "expert" that performs no computation and simply returns its input unchanged, saving substantial compute.
This move is not an isolated technical showcase but a precise response to the industry's current pain point: as reasoning patterns become more complex, the cost of AI applications is rising rapidly.
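The zero-computation expert idea described above can be sketched in a few lines. This is a deliberately tiny, hypothetical mixture-of-experts layer, not LongCat-Flash's actual implementation: all names, shapes, and the two-way router are illustrative assumptions. The key point is that the identity expert costs no FLOPs for the tokens routed to it.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size

W1 = rng.standard_normal((D, 4 * D)) * 0.1  # real expert: small MLP weights
W2 = rng.standard_normal((4 * D, D)) * 0.1
Wr = rng.standard_normal((D, 2)) * 0.1      # router: scores the 2 choices

def real_expert(x):
    # Normal expert path: a small two-layer MLP with ReLU.
    return np.maximum(x @ W1, 0.0) @ W2

def zero_compute_expert(x):
    # "Zero-computation" expert: non-critical tokens pass through unchanged.
    return x

def moe_layer(tokens):
    out = np.empty_like(tokens)
    for i, x in enumerate(tokens):
        logits = x @ Wr
        choice = int(np.argmax(logits))  # 0 = real expert, 1 = identity
        out[i] = real_expert(x) if choice == 0 else zero_compute_expert(x)
    return out

tokens = rng.standard_normal((5, D))
y = moe_layer(tokens)  # same shape as the input; some rows cost nothing
```

In a real model the router is trained jointly with the experts, so it learns which tokens (e.g. punctuation) are safe to pass through untouched.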
The industry's response is converging on a common direction: hybrid reasoning models. This approach lets an AI system automatically select an appropriate level of compute based on the complexity of the problem, avoiding the waste of expensive computational power on simple tasks.
The "Smarter" the AI, the Higher the Cost
Meituan's extreme pursuit of efficiency reflects the severe challenges faced by the entire AI industry.
According to Machine Heart, Ethan Ding, co-founder and CEO of TextQL, recently pointed out a counterintuitive phenomenon: while the cost per token has been falling, the subscription fees charged by model companies keep climbing. Ding argues that the crux of the problem is that most of the discounted models are not SOTA models, and human "cognitive greed" means most people only want the "strongest brain," so 99% of demand shifts to SOTA models, whose prices remain relatively stable.
In simple terms, although the price of a single token is decreasing, the number of tokens required to complete complex tasks is growing at an unprecedented rate.
For example, a basic chat Q&A may only consume a few hundred tokens, but a complex coding or legal document analysis task may require tens of thousands or even hundreds of thousands of tokens.
Theo Browne, CEO of AI startup T3 Chat, has also stated:
"The competition for the smartest models has evolved into a competition for the most expensive models."
This cost pressure has been transmitted to application layer companies. According to media reports, the profit margin of productivity software company Notion has decreased by about 10 percentage points as a result. Some AI programming assistance tool startups, such as Cursor and Replit, have also had to adjust their pricing strategies, leading to complaints from some users.
Common Answer of Top Models: Hybrid Reasoning
To break the cost dilemma, "hybrid reasoning," or "adaptive computing," has become an industry consensus.
Although major model vendors have different paths, their goals are highly consistent: to enable models to learn when to "deep think" and when to "respond quickly."
OpenAI's GPT-5 adopts a "router" mechanism, automatically selecting an appropriate model based on the complexity of the question. A simple question like "Why is the sky blue?" is assigned directly to a lightweight model, while complex tasks invoke a high-compute model.
According to internal evaluations by OpenAI, GPT-5 can complete tasks using the thinking mode with 50-80% fewer output tokens than previous models, achieving the same or better results. The system continuously trains the routing mechanism through real signals such as user behavior, preference feedback, and accuracy, improving over time.
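A router of the kind described above can be sketched as a cheap classifier sitting in front of two backends. This is a hypothetical illustration, not OpenAI's routing logic: the keyword heuristic, threshold, and model names are all invented placeholders (a production router would be a trained model fed with user-behavior and accuracy signals, as the article notes).

```python
def estimate_complexity(query: str) -> float:
    """Toy proxy: longer queries with task-like keywords score higher."""
    keywords = ("prove", "debug", "analyze", "plan", "refactor", "derive")
    score = min(len(query.split()) / 50.0, 1.0)
    score += 0.5 * sum(kw in query.lower() for kw in keywords)
    return score

def route(query: str, threshold: float = 0.6) -> str:
    # Cheap check first; only expensive queries hit the reasoning path.
    if estimate_complexity(query) >= threshold:
        return "reasoning-model"  # slow, high-compute path
    return "fast-model"           # cheap, low-latency path

print(route("Why is the sky blue?"))  # → fast-model
print(route("Debug this race condition and derive a fix "
            "for the lock ordering in the scheduler."))  # → reasoning-model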
DeepSeek's V3.1 goes a step further, merging dialogue and reasoning capabilities into one model with a dual-mode architecture. Developers and users can switch between "thinking" and "non-thinking" modes through specific tags or a toggle.
Official data shows that its thinking mode can achieve answer quality comparable to previous models while consuming 25-50% fewer tokens, providing enterprises with a cost-effective open-source option.
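The single-model, dual-mode interface can be sketched as a prompt-template switch. The tag names and template below are illustrative assumptions, not DeepSeek's actual chat format: the point is that one set of weights serves both modes, with the mode chosen at request time.

```python
def build_prompt(user_msg: str, thinking: bool) -> str:
    """Toy chat template with a mode switch (hypothetical tags)."""
    # In "thinking" mode the assistant turn is opened with a reasoning tag,
    # prompting the model to emit a trace before its answer; in
    # "non-thinking" mode the tag is pre-closed so it answers directly.
    mode_tag = "<think>" if thinking else "</think>"
    return f"<|user|>{user_msg}<|assistant|>{mode_tag}"

fast = build_prompt("What is 2 + 2?", thinking=False)
deep = build_prompt("Plan a zero-downtime database migration.", thinking=True)
```

Because both modes share one model, an application can flip the switch per request instead of maintaining separate chat and reasoning deployments.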
Currently, this trend has become mainstream in the industry. From Anthropic's Claude series and Google's Gemini series to domestic players such as Alibaba's Qwen, Kuaishou's KwaiCoder, ByteDance's Doubao, and Zhipu's GLM, almost all leading players are exploring their own hybrid reasoning solutions, seeking the best balance between performance and cost. Some analyses suggest the next frontier of hybrid reasoning is smarter "self-regulation": enabling models to accurately assess task difficulty on their own and, without human intervention, initiate deep thinking at the most appropriate moment with the least computational cost.