
Computing power, the "hard currency" in the post-GPT-5 era

North American model updates and inference applications have formed a preliminary closed loop, and computing power has entered the "secondary capital raising" stage. OpenAI released GPT-5, significantly reducing unit computing costs, and its CEO stated that the company's computing resources are expected to double within five months. Token consumption at major vendors is rising rapidly as they seek a balance between the universality of AI technology and commercial sustainability. Domestic large models are accelerating their catch-up: companies such as ByteDance continue to release model updates, computing power consumption is rising steadily, and breakthroughs have been achieved especially in the multimodal field. Domestic computing power chip companies are also transitioning to system-level solutions to support large model iteration and application deployment.
Abstract
North American model updates + the preliminary closed loop of inference applications: computing power enters the "secondary capital raising" stage, and we remain optimistic about overseas computing power chain investments. Following minor updates from vendors such as Google and Anthropic, OpenAI released its latest flagship large model, GPT-5, on August 8, Beijing time. Beyond improvements in basic indicators such as intelligence level and programming ability, it also brings significant optimizations in resource scheduling, hallucination reduction, input context window length, and writing quality.
More importantly, GPT-5 has significantly reduced the unit computing power cost, with API call prices benchmarked against Gemini 2.5 Pro. We believe this is also an inevitable choice for large model companies like OpenAI that rely on external capital, and it is a necessary condition for their continued computing power demand. OpenAI's CEO stated on X that the company expects to double its computing power resources within five months.
On the inference application side, major vendors represented by Google are experiencing rapid growth in token consumption. Through the current market strategy of "free to attract users, paid to break through," they seek a phased balance between the universality of AI technology and commercial sustainability. We see industry leaders in large models leveraging technological iteration and customer stickiness, forcing followers to keep "raising capital" or risk being left behind.
We believe that the North American model update iteration + the landing of inference applications has achieved a preliminary closed loop in the current model generation, and computing power remains "hard currency" in the post-GPT-5 era. We continue to be optimistic about the overseas computing power industry chain.
Domestic large models are accelerating their catch-up; we are optimistic about the domestic computing power market once open-source SOTA models are updated. Although domestic players still lag overseas peers in model capability, since 2025 companies such as ByteDance, Kuaishou, Kimi, and MiniMax have continuously released model updates and pushed application deployment, with computing power consumption rising steadily and breakthroughs, including commercial landing, achieved especially in the multimodal field, providing diversified momentum for medium- to long-term growth in computing power demand. Counting internal and external usage combined, ByteDance's monthly token consumption is already comparable to Google's.
On the supply side, we also see domestic computing power chip companies transitioning from single-chip products to system-level solutions to support the iteration and application deployment of domestic large models. We believe that if open-source SOTA models like DeepSeek are updated in Q3 2025, the domestic AI industry-chain flywheel is expected to restart, and secondary-market investment sentiment should also be boosted.
Main Text
Release of GPT-5, AI large models continue to drive on the fast track of development
We see that after the "DeepSeek innovation fever," the global large model industry has continued to develop and the pace of model iteration has not slowed; instead, it shows a pattern of breakthroughs on multiple fronts, which continues to push computing power demand toward a higher ceiling. In the early morning of August 6, several leading North American large model companies released a new round of model updates almost simultaneously. Google DeepMind launched Genie 3, a next-generation general world model that generates 720p images in real time at 20-24 frames per second and simulates interactive dynamic worlds whose content stays coherent for several minutes. It can simulate the physical and natural worlds, create animated fantasy worlds, and explore historical scenes. Its debut marks a new height for world-simulation AI while also adding to computing power demand.
OpenAI released the first open-source large model series gpt-oss, which includes gpt-oss-120b (117 billion parameters, suitable for large-scale, high-performance inference tasks) and gpt-oss-20b (21 billion parameters, designed for low-latency and localized applications). The training and operation of these two models also require substantial computing power support, whether for massive data processing during initial training or for real-time computation during inference on different devices. Anthropic updated the Claude Opus 4.1 version, which has improved capabilities in coding, reasoning, and executing instructions compared to the previous Claude 4 series, such as improved accuracy on SWE-bench Verified. We believe that the enhancement in model performance is closely tied to the support provided by computing power.
Chart 1: Genie 3 Performance
Source: Google DeepMind official website, CICC Research Department
Chart 2: gpt-oss Competitive Programming Performance
Source: OpenAI official website, CICC Research Department
Chart 3: Claude Opus 4.1 Performance
Source: Anthropic official website, CICC Research Department
In the early morning of August 8, OpenAI further released the highly anticipated GPT-5. We believe that viewed from the perspective of computing power, the new model has several highlights: significantly improved token usage efficiency, a substantially lower pricing structure, and context capacity raised to 400K. Efficient "savings," a low "price," and strong "capability" not only lower the cost per call but also raise overall call density and instantaneous resource usage through longer context and broader user coverage, thereby significantly increasing actual demand for computing power and forming a virtuous cycle of "cost reduction - capacity expansion - demand growth." Specifically, we believe GPT-5 markedly improves token usage efficiency, surpassing previous models while consuming fewer tokens, thanks to three upgrades:
First, unified system and adaptive reasoning routing. GPT-5 is a "unified system" that defaults to a more efficient chat model, only switching to the "Thinking" reasoning model when the problem is genuinely complex. It can automatically decide whether to enable deep reasoning based on task complexity, avoiding lengthy thinking and output for simple questions. Official evaluations show that while maintaining or improving effectiveness, GPT-5 Thinking outputs 50–80% fewer tokens than o3 across various tasks.
Second, more efficient reasoning chain convergence and tool invocation. According to the company's official assessment, in real engineering evaluations (such as SWE-bench Verified), GPT-5 outputs approximately 22% fewer tokens and 45% fewer tool invocations than o3 under high reasoning settings. This means it is more direct and stable in the planning-execution-verification chain, reducing intermediate steps and interaction overhead, thereby compressing generation length from the source.
Third, controllable generation and minimal reasoning. GPT-5 introduces control options such as verbosity (controlling output length) and reasoning effort (controlling time spent on reasoning), allowing developers to precisely tune "text density" and "thinking depth" to task requirements, avoiding excessive explanation and significantly reducing token consumption without sacrificing correctness.
At the same time, we believe that GPT-5 has stronger robustness in instruction adherence and multi-tool collaboration, reducing clarification and rework rounds, thereby further lowering the "total tokens per completed task." This system optimization, from underlying mechanisms to application interfaces, not only reduces token consumption for single tasks but also lowers overall computing costs, driving the "virtuous cycle" of computing power forward and stimulating greater future demand through efficiency improvements.
Chart 4: Significant improvement in accuracy and output token efficiency of GPT-5 in software programming
Source: OpenAI official website, CICC Research Department
Secondly, GPT-5's pricing strategy achieves a significant cost reduction. Developers using the GPT-5 API are charged only $1.25 per million input tokens and $10 per million output tokens, significantly lower than the previous GPT-4.1 model; the GPT-5 mini version is cheaper still, at $0.25 for input and $2 for output, while GPT-5 nano goes as low as $0.05 for input and $0.40 for output. We see that GPT-5's pricing is even more competitive than that of Gemini 2.5 Pro, long regarded as the "low-price benchmark": it is comparable on input while cheaper on output, and roughly one-fifteenth the price of comparable Anthropic products. On the user side, GPT-5 can now be used for free under certain conditions. Ordinary users can access the GPT-5 model directly, with daily usage capped at "a few hours"; when the limit is reached, the system automatically switches to the mini version to keep the experience uninterrupted. The Plus subscription (about $20/month) has a higher usage cap, while the Pro subscription (about $200/month) offers unlimited access to GPT-5 Pro and GPT-5 Thinking modes.
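As a sanity check on these tiers, the per-call economics can be sketched in a few lines of Python. The per-million-token prices are the API figures quoted above; the token counts in the example are illustrative assumptions, not measured workloads.

```python
# Back-of-envelope cost comparison across the GPT-5 API tiers.
# Prices are USD per 1M tokens as quoted above; workload sizes are illustrative.

PRICES = {                 # (input $/1M tokens, output $/1M tokens)
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25,  2.00),
    "gpt-5-nano": (0.05,  0.40),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single API call with the given token counts."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# A hypothetical long-context call: 100K tokens in, 2K tokens out.
for model in PRICES:
    print(f"{model}: ${call_cost(model, 100_000, 2_000):.4f}")
```

At these list prices, the same 100K-in / 2K-out call costs about $0.145 on GPT-5 but well under a cent on the nano tier, which illustrates why tiering can broaden access while the flagship model carries the premium workloads.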
From a strategic perspective, we believe that this pricing and product tiering mechanism not only lowers the usage threshold but also clarifies the trend of "cost reduction and efficiency improvement" in computing power, forming a positive push for the high-frequency daily use of generative AI, which is expected to continuously stimulate user demand and usage breadth.
Chart 5: GPT-5 API Pricing (USD, per million Tokens)
Source: Company websites, CICC Research Department
Another key advancement is the leap in contextual capability.
The current version of GPT-5 supports a context window of up to 400K tokens, about 3.1 times the 128K of GPT-4o and double the 200K of o3; it is also more robust in long-context retrieval and cross-document alignment, with a higher hit rate. This means a single session can directly accommodate large reports, codebases, and multi-source materials, reducing the "extra dialogues" and wasted generation caused by splitting and back-and-forth; at the same time, the 400K window imposes higher instantaneous demands on memory and bandwidth.
Overall, a longer visible range brings instantaneous computing power demand beyond the 128K level, while stronger carrying capacity in turn enhances application capabilities, stimulating new scenarios (such as long-document Q&A and cross-tool pipelines) and further amplifying computing power demand.
In summary, we see a common trend behind the recent model updates: as model capabilities continue to improve, token usage efficiency rises while demand for computing power keeps climbing. This goes beyond traditional cloud-side cluster inference; more and more scenarios are shifting toward local and edge computing power. For example, locally deployed gpt-oss models impose performance requirements on consumer-grade GPUs, and models like Genie 3 that require real-time responses at the edge further raise the energy-efficiency and computing power thresholds for devices. It can be said that model iteration itself is one of the main sources of growing computing power demand in the current large model industry. Whether through expanded training scale, increased inference complexity, or the demands of multimodal and multi-task adaptation, all of these continue to drive up computing power consumption.
Chart 6: AI Model Update Timeline Since 2022
Source: Company website, CICC Research Department
From the AI model update timeline chart above, it can be seen that since 2022, manufacturers such as OpenAI, Anthropic, and Google abroad, and MiniMax, DeepSeek, and ByteDance at home, have continuously launched new models or updated existing ones. In the first half of 2025, the number of large models released by mainstream manufacturers globally increased significantly, showing a more intensive release rhythm.
Data show that in the first half of 2025, 9 major companies updated their models and 21 models were released in total, representing year-on-year growth of 28.6% and 10.5%, respectively, over the first half of 2024. From the perspective of model types, there has been an evolution from the early single-point capabilities of language models toward comprehensive breakthroughs in multimodal, multi-task, and ultra-long-context capabilities. For example, OpenAI's GPT-4.5, Anthropic's Claude 4.1, Google's Gemini 2.5, xAI's Grok 4, and Alibaba's Qwen3-235B all reflect further expansion of capability boundaries.
This further reflects the current situation of continuous development and accelerated iteration in the large model industry, where densely updated models are becoming a core factor driving the sustained increase in computing power demand.
The continuous updates of overseas models are a sustained positive factor for computing power. Taking OpenAI's GPT-5 as an example, we believe its overall capabilities fell short of some market expectations; this round looks more like an "efficiency-first," cost-oriented choice under the constraints of capital and unit economics, rather than an attempt to break through the technological frontier.
OpenAI relies mainly on external capital and burns cash quickly. Without a suitable price-performance combination, the product cannot be adopted at scale sustainably. Based on this, we believe the goal of this GPT-5 update is to reduce OpenAI's operating costs rather than to push the technological frontier outright. To cut costs, this update focuses on pursuing economies of scale, reducing latency, and achieving more economical inference, making the product easier to access and creating favorable conditions for global promotion.
With the product's widespread promotion, it attracts a larger and more diverse user base, which in turn strongly drives product development. Product growth will inevitably generate more demand for computing power. From this perspective, the GPT-5 update is beneficial for its continued consumption of computing power. On August 11, OpenAI CEO Sam Altman also stated on the X platform that the company would prioritize allocating computing power to the inference side (increased usage of the paid version / priority for API demand / improved free-tier service quality) and plans to double its computing resources within five months. This move also confirms our viewpoint above.
At the same time, the different strategies that other competitors may choose are also expected to positively affect market demand for computing power. Companies like Google and Meta, with trillion-dollar market capitalizations and strong resources, face almost no funding or R&D constraints, allowing them to advance model updates and optimization more comfortably; Anthropic, meanwhile, has stronger profitability and commercial sustainability thanks to its close ties with numerous enterprises, providing a solid foundation for continuous investment in models with strong coding capabilities and for ongoing technological iteration.
We see that whether participants facing funding constraints are seeking commercialization or scaling breakthroughs, or leading companies with stable resource support, their continuous push for different directions of model iteration and upgrading is collectively enhancing the demand for computing power.
Chart 7: Different manufacturers' business strategies all positively drive the demand for computing power
Source: Company websites, CICC Research Department
Global large model Token consumption rapidly rises, AI application density comprehensively increases
Overseas giants' Token usage rapidly grows: Google AI Overview leads the way
Since 2025, the Token consumption of Google, Microsoft, and ByteDance has shown a significant upward trend.
Chart 8: Token consumption of Microsoft, ByteDance, and Google from December 2024 to July 2025
Source: Microsoft conference call, 2025 Volcano Engine Power Conference, Google I/O Conference, CICC Research Department
We believe that Google's Token consumption significantly increased in the first half of 2025, driven by two main factors:
First, we believe that the rapid expansion of AI Overview has greatly increased the frequency of Token calls, which is the main reason for the significant growth in Google's Token consumption in 2025. AI Overview is a search enhancement feature that Google first launched in May 2024; it automatically generates concise AI summaries at the top of the search results page, triggered directly by search keywords without requiring users to enter a dialogue interface. This means the AI system frequently auto-generates large volumes of natural-language summaries during user searches, with most of this generation completed by the backend without user awareness. Token consumption therefore comes mainly from content the system generates automatically, rather than from user-initiated questions or clicks. This static, default-triggered, high-coverage summary mechanism, combined with Google's roughly 5 trillion search requests per year, makes AI Overview a key driver of the growth in Google's Token usage. In addition, Google launched AI Mode in May 2025, which adds multi-turn search integration and multi-question prediction on top of AI Overview, further raising the overall Token density of search AI. Overall, AI Overview's product form, triggering mechanism, and deployment speed constitute an important foundation for the rapid growth of Google's Token consumption.
At the same time, we believe that Google's significant lead on the user side further amplifies its total Token consumption and widens the gap with other vendors. As of March 2025, AI Overview reached 1.5 billion monthly active users, while Gemini had 350 million and OpenAI's ChatGPT approximately 600 million. Notably, although Gemini's monthly active users are only about half of ChatGPT's, Google's overall Token call volume has reached 5-6 times Microsoft's, indicating that the factor truly widening the gap is the high-frequency use of the AI Overview search feature. Google's AI products are free, default-triggered, and lightly interactive by design, which significantly lowers the entry barrier, enables faster penetration globally, and concentrates rapid growth in Token call volume. In summary, relying on its large search user base, the high-frequency triggering of AI Overview, and easy-to-use interaction entry points, Google has expanded its Token call structure along both dimensions of user count and per-user call density, supporting its position as the fastest-growing leading vendor in Token consumption in 2025.
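The mechanism described above lends itself to a simple back-of-envelope model: default-triggered summaries scale with search volume rather than with user intent. The sketch below is purely illustrative; the search volume, trigger rate, and summary length are hypothetical placeholders, not Google's disclosed figures.

```python
# Illustrative back-of-envelope: annual Token output of a default-triggered
# search summary feature. All inputs are hypothetical placeholders.

def annual_summary_tokens(searches_per_year: float,
                          coverage: float,
                          tokens_per_summary: float) -> float:
    """Tokens generated per year when a share `coverage` of searches
    triggers an auto-generated summary of `tokens_per_summary` tokens."""
    return searches_per_year * coverage * tokens_per_summary

# Hypothetical inputs: 5e12 searches/yr, 20% trigger rate, 150-token summaries.
tokens = annual_summary_tokens(5e12, 0.20, 150)
print(f"{tokens:.2e} tokens/year")   # ≈ 1.5e14 with these inputs
```

Even with these conservative placeholder values, a default-triggered feature generates on the order of hundreds of trillions of Tokens per year, which is why coverage and trigger design matter more than per-user engagement for this class of product.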
The continuous increase in Token consumption density suggests that paid scenarios are likely to break the commercial closed loop first.
Compared with the earlier phase dominated solely by chatbots, the factors driving the rapid rise in Token consumption are becoming increasingly diverse and complex, and computing power demand is expanding rapidly.
Chart 9: Main Ways of Increasing Token Consumption
Source: CICC Research Department
From the current supply-demand pattern of the AI application market, the free model remains the primary way for users to engage, with its user scale and growth rate significantly outpacing the paid model.
In contrast, AI products that have already achieved monetization typically possess significant differentiated capabilities and can precisely address users' high-value needs. Functionally, paid products often build barriers of professionalism, reliability, and completeness of experience: for example, paid products like ChatGPT Agent and Claude 4 establish a professional moat with stronger reasoning, lower error rates, and more complete functionality, with output error rates significantly lower than those of free models. On the technical side, paid products rely on superior computing power scheduling and caching mechanisms (such as Volcano Engine's AI cloud-native solution reducing inference costs by 20% [1]), which maintain low latency and high stability in high-frequency interaction scenarios, a service level free products find difficult to match.
Overall, we believe that the current market pattern of "free to attract users, paid to break through" reflects a phased balance between the universality of AI technology and commercial sustainability. We believe that as model capabilities continue to improve, such as more accurate inference, smoother multimodal interactions, and more efficient cost control, users' willingness to pay for high-quality services will gradually increase. At that time, products that can truly create value in efficiency enhancement or decision optimization for users are expected to achieve "value pricing" and build a clearer commercial closed loop.
Domestic models are not to be outdone, waiting for the traffic king's update
Globally, although the innovation capabilities of large models from Chinese manufacturers may temporarily lag North America's, the overall model level continues to advance. As models iterate, their requirements for cloud-side and edge-side computing power will also increase, and the industry will keep developing through the mutual reinforcement of computing power and model innovation. We believe subsequent updates from high-traffic models like DeepSeek are expected to drive this positive cycle.
Kimi K2, a trillion-parameter MoE-architecture model, brings significant updates in architecture, capability, and functionality over previous versions, achieving a substantial leap in overall performance. It adopts a design with 1T total parameters and 32B active parameters, increasing the number of experts to broaden knowledge coverage, reducing the number of attention heads to improve feature-learning efficiency, and achieving stable pre-training on 15.5T Tokens with the MuonClip optimizer. It has achieved state-of-the-art (SOTA) results in benchmarks such as code generation (e.g., building 3D HTML scenes and futures trading systems) and mathematical reasoning, with markedly enhanced base capabilities. According to official pricing, it costs 4 yuan per million input Tokens and 16 yuan per million output Tokens. As developers assign longer documents and more complex chained tasks to K2, the overall scale of Token consumption will expand further.
Chart 10: Relationship between Kimi K2 Loss and Token Consumption
Source: Kimi K2 Official Website, CICC Research Department
MiniMax has likewise completed SOTA-level updates in three major tracks: long text, video generation, and intelligent agents, bringing higher computing power consumption. The three updated models increase Token consumption through a strategy of "expanding capacity/resolution + lowering unit price." M1 raises the input limit to 1 million Tokens, allowing users to submit large volumes of content at once and increasing single-task Token counts tenfold or a hundredfold; Hailuo 02 raises resolution at the same price, inclining users toward higher definition or multiple re-generations and significantly increasing Tokens consumed per video; and the Agent plan can cache an entire knowledge base, consuming large numbers of Tokens at each step. Together, the three will raise MiniMax's total Token consumption.
Kuaishou's Keling AI has recently achieved a comprehensive leap in capabilities through multidimensional technological upgrades. In May, Kuaishou launched the Keling 2.1 series model. Although the official pricing (inspiration value) has been maintained at the same level as version 1.6, the advanced features and creative freedom brought by the model upgrade may lead users to use high-spec modes more frequently, thereby increasing the total consumption of actual inspiration value.
On August 5, Alibaba's Tongyi Qianwen team open-sourced its first text-to-image model, Qwen-Image. Qwen-Image may push Tongyi Qianwen toward "text-image" multimodal interaction, which is likely to increase Token consumption: image generation and editing require more complex text instructions, and functional extensions bring multi-round iterative adjustments and expanded usage scenarios.
Chart 11: Images generated by Qwen-Image
Source: Qwen-Image GitHub, CICC Research Department
Judging from the upgrade trend of domestic AI models, recent updates from major models have each expanded the boundaries of AI in their own way, directly triggering a sharp rise in Token consumption that shows exponential growth compared with the early stage when only chatbots existed.
Chart 12: Weekly average daily active users of various AI model apps
Source: Similar Web, Questmobile, CICC Research Department
In terms of ByteDance, data released by Volcano Engine shows that the daily Token usage of the Doubao large model has reached approximately 16.4 trillion. In the first quarter of 2025, its market share of large model calls in the domestic public cloud is about 46.4%, ranking first in the industry.
Taking Kimi as an example, in February 2025, its App MAU was approximately 26.22 million.
MiniMax's overseas social AI product Talkie reached 20.62 million monthly active users in October 2024, while the corresponding domestic version "Xingye" had 5.12 million monthly active users, totaling 25.74 million monthly active users, focusing on entertainment dialogue scenarios.
The Token demand for Kuaishou Keling is driven more by "multimodal link depth": official data show its global user base has surpassed roughly 22 million, with annual recurring revenue (ARR) exceeding 100 million USD in the 10th month after launch, and monthly payments in April and May each exceeding 100 million RMB. Public data for the Tongyi Qianwen app show relatively low absolute MAU/DAU; third-party monitoring notes that although it ranks in the top 3 by number of intelligent agents, its traffic is mostly below 5 million, suggesting MAU in the millions. Under comparable interaction metrics, its monthly Token volume could reach tens of billions, and as it expands in the "text-image/video" multimodal direction (such as Qwen-Image, VLo, etc.), the prompts and iteration cycles per task will lengthen further.
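The "millions of MAU, tens of billions of Tokens per month" estimate above is a simple product of three factors. A minimal sketch, with every per-user figure a hypothetical assumption rather than reported data:

```python
# Illustrative MAU-based Token estimate. All inputs are hypothetical
# assumptions for a "millions of MAU" chatbot-style app, not reported figures.

def monthly_tokens(mau: int, interactions_per_user: int,
                   tokens_per_interaction: int) -> int:
    """Estimated monthly Tokens = users x interactions x tokens/interaction."""
    return mau * interactions_per_user * tokens_per_interaction

# Hypothetical: 5M MAU, 20 interactions per user, 300 Tokens per interaction.
total = monthly_tokens(5_000_000, 20, 300)
print(f"{total / 1e9:.0f} billion tokens/month")
```

With these placeholder values the product lands at 30 billion Tokens per month, consistent with the "tens of billions" order of magnitude; multimodal tasks raise the tokens-per-interaction factor, which is why the link-depth argument above matters.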
In summary, we believe that with the "dual growth" of MAU and per capita interaction frequency, combined with the expansion of deep reasoning and multimodal links, the rapid growth in Token processing volume directly drives the demand for larger memory capacity and more complex scheduling algorithms. Additionally, in new scenarios such as video generation, the demand for computing power is also rapidly increasing. As the model capabilities continue to evolve, future Token consumption and computing power demand will continue to rise, and the computing power bottleneck is structurally shifting from limited decoding capabilities to limited bandwidth and interconnectivity.
Domestic computing power focuses on full-dimensional support from chips to systems, seizing high-growth opportunities in the industry
Focusing on the domestic supply side, we see Chinese AI chip companies making appearances at the 2025 World Artificial Intelligence Conference (WAIC 2025).
We believe that domestic computing power manufacturers are no longer limited to the performance iteration of a single chip but are focusing on innovations in interconnection technology, the construction of super-node architectures, and the output of large-scale system solutions. By collaboratively building efficient computing power clusters, they provide full-dimensional support from chips to systems for the training and inference of AI large models.
In the face of a continuously growing market ceiling, we believe that domestic computing power is expected to continue to capture market share through the continuous enhancement of product strength.
Authors of this article: Cheng Qiaosheng, Jia Shunhe, et al., Source: CICC Insight, Original Title: "CICC | AI Evolution (13): Computing Power, the 'Hard Currency' in the Post-GPT-5 Era"
Risk Warning and Disclaimer
The market carries risk; invest with caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial conditions, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article suit their specific circumstances. Any investment made on this basis is at one's own risk.