Price war reversal for large models? In-depth analysis of the latest pricing from 17 manufacturers reveals that over 70% are raising prices

Wallstreetcn
2025.08.24 06:00

DeepSeek announced an adjustment to its API pricing starting from September 6, with an increase of 50% and the cancellation of nighttime discounts. This change marks a reversal in the price war for large models: more and more vendors are halting price cuts, and some are even raising prices. Among the "six little tigers" of domestic large models, four have already raised prices, while international vendors such as OpenAI and Google have also seen their API prices stabilize or rise slightly. Overall, the pace of price decline for large models has slowed, and the industry trend is upward.

DeepSeek has raised its prices.

According to an August 23 report by Zhidongxi, on August 21 DeepSeek officially announced DeepSeek-V3.1 on its official WeChat account, along with a new price list taking effect on September 6 that cancels the nighttime discount introduced at the end of February this year. Pricing for the reasoning and non-reasoning APIs will be unified, with the output price adjusted to 12 yuan per million tokens. This decision raises the minimum price of using the DeepSeek API by 50% compared to before.

DeepSeek was once known in the industry as the "price butcher." In May 2024, with DeepSeek-V2, it lowered the API price to an industry low of 1 yuan per million tokens for input and 2 yuan per million tokens for output, causing a significant stir.

In that same month, companies such as Zhipu, ByteDance, Alibaba, Baidu, iFlytek, and Tencent followed suit with price reductions, with the highest drop reaching 80%-97%. Some companies even made lightweight models available for free, sparking a prolonged price war in the large model sector that lasted over six months.

▲ Price reduction notices for large models released by some companies in May 2024

However, in 2025, an increasing number of companies have chosen to stop lowering prices. In China, among the "six little tigers" of large models, Zhipu, Moonshot AI, MiniMax, and Jieyue Xingchen have already raised prices for some APIs, while Baichuan Intelligence and Lingyi Wanwu have kept their prices unchanged. Major companies like Alibaba, ByteDance, Tencent, Baidu, iFlytek, and SenseTime have widely adopted tiered pricing strategies or differentiated between "reasoning" and "non-reasoning" modes. Overall API prices in the industry are stabilizing, with some products showing a noticeable increase.

Although international companies still claim that intelligence will become cheaper, the reality is that over the past year, the API prices of companies like OpenAI, Anthropic, and Google have remained largely unchanged, or even slightly increased. Meanwhile, subscription plans are becoming increasingly expensive, with top models almost locked in at high price ranges of $200/month and above, and xAI even launching a $300/month subscription plan.

Against this backdrop, DeepSeek's price increase is merely a reflection of a larger industry trend: the decline in large model prices is gradually slowing, and top AI services are no longer falling indefinitely but are beginning to stabilize and inch upward. The following data is collected from public channels; corrections of any errors or omissions are welcome.

01. DeepSeek and the six little tigers of large models see a general API price increase, but two have not changed prices for nearly a year

The price war for large models was one of the hottest keywords in the domestic AI circle in 2024, with large model API prices dropping at one point to a few tenths of a yuan per million tokens. However, entering 2025, this downward trend has essentially stalled, especially for the most advanced models.

Taking DeepSeek as an example, when DeepSeek-V3 was just released at the end of last year, DeepSeek offered a 45-day limited-time discount. After it ended, the output price of the DeepSeek-Chat API (non-inference API) returned from 2 yuan to 8 yuan; this API's price will further increase by 50% to 12 yuan in September this year.

The price of the DeepSeek-Reasoner API (reasoning API) has been relatively stable and will actually decrease from 16 yuan to 12 yuan this September. Overall, however, DeepSeek API prices are still trending upward.

▲ Price changes of DeepSeek APIs (Chart by Zhidongxi)

Among the six little tigers of large models, the prices of Zhipu, Moonshot AI, Baichuan Intelligence, MiniMax, Jieyue Xingchen, and Lingyi Wanwu have not shown significant declines since the first quarter of 2025.

The API pricing of Zhipu's previous-generation GLM-4 model does not differentiate between input and output tokens, uniformly set at 5 yuan per million tokens. For the GLM-4.5 model released in July this year, after the limited-time launch discount expired, the output price of the high-speed inference version (GLM-4.5-X) can reach up to 64 yuan per million tokens.

Even at the lowest tier (using GLM-4.5, input length less than 32K, output length less than 0.2K, inference speed of 30-50 tokens/second), the output price has increased from 5 yuan per million tokens to 8 yuan per million tokens.

▲GLM-4.5 Pricing Situation (Source: Zhipu Open Platform)

Moonshot AI (Yue Zhi An Mian) officially launched its enterprise API in August 2024, with input and output pricing set at 60 yuan per million tokens in the 128K context scenario, relatively high for the industry.

In April of this year, Moonshot AI adjusted the prices of some APIs, reducing the output price of its then-latest K1.5 model API to 30 yuan per million tokens. However, after the launch of Kimi K2, the high-speed output price in the 128K context scenario returned to 64 yuan per million tokens.

▲ Moonshot AI Kimi large model API pricing changes; data selected is the highest-tier pricing (Chart by Zhidongxi)

Baichuan Intelligence has not adjusted its API prices for a long time, with the calling price of its flagship model Baichuan4 remaining at 100 yuan per million tokens for both input and output since its release in May 2024.

▲Baichuan Intelligence API Price List (Source: Baichuan Intelligence)

In August 2024, MiniMax significantly reduced the price of its then flagship text generation model abab-6.5s, with both input and output prices unified at 1 yuan per million tokens. However, this model is currently not visible on its API open platform.

The pricing of MiniMax's next-generation text generation model MiniMax-Text-01, released in January 2025, is set at 1 yuan per million tokens for input and 8 yuan per million tokens for output, while its inference model MiniMax-M1, released in June 2025, adopts a tiered pricing structure, with the highest tier at 2.4 yuan per million tokens for input and 24 yuan per million tokens for output.

▲ MiniMax large model API pricing trend; the selected data is all at the highest pricing tier (Chart by Zhidongxi)

Jieyue Xingchen features multimodality. In April of this year, the company released the Step-R1-V-Mini multimodal inference model, with an output price of 8 yuan per million tokens. The new-generation multimodal inference model Step 3, released in July, switched to tiered pricing: prices for input ≤ 4K remained stable or slightly decreased, while the highest tier (4K < input ≤ 64K) saw a certain increase, with an output price of 10 yuan per million tokens. Meanwhile, Step 3 has a maximum context window of 64K, smaller than the 100K of Step-R1-V-Mini.

▲ Jieyue Xingchen large model API pricing trend; the selected data is all at the highest pricing tier (Chart by Zhidongxi)

Lingyi Wanwu released Yi-Lightning in October 2024, priced at 0.99 yuan per million tokens, and has not updated the model's API price since. Currently, calls to Yi-Lightning are intelligently routed to models such as DeepSeek-V3 and Qwen-30B-A3B based on user input.

▲ Lingyi Wanwu large model API pricing table (Image source: Lingyi Wanwu)

02. Several major companies refine pricing rules, with extra charges when model output exceeds 300 characters

The more "financially generous" major companies have also slowed the pace of model price cuts in 2025.

ByteDance first launched the Doubao Pro family in May 2024, with the input price of the Doubao general model Pro for contexts under 32K at only 0.8 yuan per million tokens and the output price at 2 yuan per million tokens. Tan Dai, president of ByteDance's Volcano Engine, stated at the launch that this pricing "is 99.3% lower than industry prices." This release pushed the large model price war to the forefront of public opinion.

In the context of 32K, the Doubao 1.5 Pro released in January 2025 and the Doubao 1.6 released in July 2025 maintained the price level of the Doubao general model Pro.

However, ByteDance further refined the pricing rules, adjusting prices based on the two variables of input and output. When the model output exceeds 200 tokens (approximately 300 Chinese characters), the output price of Doubao 1.6 changes to 8 yuan per million tokens, while the input price remains unchanged.

▲ Doubao 1.6 tiered pricing details (Image source: Volcano Ark)
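To make the tiered rule above concrete, here is a minimal sketch of a two-tier output charge using the figures quoted for Doubao 1.6 (2 yuan per million tokens up to 200 output tokens, 8 yuan per million beyond). Whether the higher rate applies to the whole response or only to the excess tokens is not stated in the source, so this sketch assumes the entire response is billed at one tier.

```python
def output_cost_yuan(output_tokens: int,
                     base_rate: float = 2.0,   # yuan per million tokens, low tier
                     high_rate: float = 8.0,   # yuan per million tokens, high tier
                     threshold: int = 200) -> float:
    """Output-side cost of one request under a hypothetical two-tier rule.

    Assumption: the whole response is billed at a single tier chosen by its
    length, which is one possible reading of the rule described above.
    """
    rate = base_rate if output_tokens <= threshold else high_rate
    return output_tokens * rate / 1_000_000

# A 150-token reply bills at the low tier; a 1,000-token reply at the high tier.
print(output_cost_yuan(150))    # 0.0003 yuan
print(output_cost_yuan(1000))   # 0.008 yuan
```

Even at the higher tier, per-request costs remain tiny; the tiering matters at scale, across millions of calls.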

From the first generation Doubao Pro to Doubao 1.5 Pro and then to Doubao 1.6, the trend of the highest price changes for ByteDance's Doubao large model API is as follows:

▲ Price change trend of ByteDance's Doubao large model API, with selected data being the highest-tier pricing (Chart by Zhidongxi)

Alibaba provides large model API services through Alibaba Cloud Bailian. Because Alibaba has numerous large models, updates them rapidly, and distinguishes between open-source and commercial versions, the overall statistics are somewhat complex. Zhidongxi mainly tracked the price changes of one of its main commercial API services, Qwen-Plus, since 2025.

It can be seen that after the new version of Qwen-Plus was launched in April this year, introducing the distinction between thinking and non-thinking modes, the price for thinking outputs reached four times that of non-thinking outputs.

After the version update in July this year, Qwen-Plus fully adopted a tiered pricing model. The calling price for inputs below 128K remained the same as April's pricing, but when input exceeds 128K the price rises significantly, with the highest output price reaching 64 yuan per million tokens.

▲ Alibaba Qwen-Plus API price changes (compiled by Zhidongxi)

In July 2024, Baidu announced a price reduction for its flagship model ERNIE 4.0, offering services at a price of 40 yuan per million tokens for input and 120 yuan per million tokens for output. Subsequently, Baidu gradually lowered the inference price of ERNIE 4.0 to the industry-standard input price of 4 yuan per million tokens and output price of 16 yuan per million tokens (the specific time of this price reduction was not found). The ERNIE 4.5 launched in March this year maintained this pricing without further reduction.

▲ Prices of ERNIE 4.0 and ERNIE 4.5 models (source: Baidu)

Tencent is one of the few major companies in China that is still gradually lowering the API prices for large models. In September 2024, Tencent released the Hunyuan Turbo large model, priced at 15 yuan per million tokens for input and 50 yuan per million tokens for output, which was relatively high at that time.

However, the price of Hunyuan Turbo has now dropped to 2.4 yuan per million tokens for input and 9.6 yuan per million tokens for output. The price of Hunyuan TurboS, released in March 2025, has further decreased to 0.8 yuan per million tokens for input and 2 yuan per million tokens for output.

▲ Prices of some Tencent Hunyuan large models (source: Tencent Cloud)

iFlytek's API services are billed by token package, without distinguishing between input and output, and the unit price of tokens varies across packages. Calculated at the median of the price range, Spark 3.5, launched in January 2024, costs approximately 25 yuan per million tokens, while Spark 4.0, launched in June of the same year, costs approximately 60 yuan per million tokens. Spark 4.0 Turbo, released in October of the same year, and the upgraded Spark 4.0 Turbo of January 2025 maintained this price.

▲ Price changes of iFlytek Spark 3.5, Spark 4.0, and Spark 4.0 Turbo (Chart by Zhidongxi)

However, iFlytek has also launched a deep reasoning large model called Spark X1, which is trained based on fully domestic computing power, with a price of approximately 11 yuan per million tokens.

The API price of SenseTime's flagship "Daily New" (SenseNova) series has decreased from 20 yuan per million tokens in May 2024 to 9 yuan per million tokens in April 2025. The latest release, SenseNova-V6.5 Pro, in July of this year maintained this price.

▲ The corresponding models are Daily New SenseChat-5-1202, SenseNova-V6-Pro, and SenseNova-V6.5 Pro, all the most advanced models SenseTime had released at the time (Chart by Zhidongxi)

03. Overseas large model vendors "say one thing and do another," subscription plans rise to the $200 level

Among international mainstream large model vendors, although there has not been a significant price war, the "promotion" that the cost of intelligence will continue to decrease is one of the hottest topics among several big names in the overseas AI circle.

In July of this year, OpenAI co-founder and CEO Sam Altman stated, "The price of intelligence will drop to an immeasurable level. We can reduce the cost of each unit of intelligence to one-tenth of the original price per year, at least for the next five years."

In September 2024, Google CEO Sundar Pichai shared the same view: "In the near future, intelligence will be as abundant as air and essentially free for everyone." Recently, statistics from The Information revealed a reality that contradicts the aforementioned viewpoint: the API prices of major overseas large model vendors have not shown a significant decline over the more than one year following July 2024, and there has even been a slight increase.

For example, the price of OpenAI's GPT series models has remained at $11.25 per million tokens since dropping from $12.5 at the end of 2024, with no further significant decreases.

Anthropic's Claude 3 and Claude 4 series models have never reduced their prices since their launch.

The invocation price of Google's Gemini Pro model has increased, rising from $12.5 per million tokens for Gemini-1.5 Pro to $17.5 per million tokens.

▲ The prices of cutting-edge general models have basically not decreased recently (Image source: The Information)

In the past year, several leading overseas AI companies have successively launched high-tier subscription plans with monthly fees exceeding $200.

Both OpenAI and Anthropic have introduced subscription tiers priced at $200/month; Google's latest AI Ultra bundle is priced at $249.99/month; xAI's Grok has gone even further, setting its top subscription plan at a high price of $300/month.

The common feature of these high-end subscription services is that users can only access the highest-scoring and most powerful flagship models showcased at press conferences by paying exorbitant monthly fees. Whether it is stronger reasoning capabilities, longer context windows, or more precise code or complex task handling abilities, these features are kept behind a paywall, making high-performance models exclusive resources for high-paying users.

So, what exactly is the reason for the apparent stagnation in the downward trend of AI service prices in recent times, and even a reverse increase?

04. Continuous rise in the prices of computing power, data, and talent; large model players must also consider ROI

The significant investments made by large model vendors in computing power, data, and talent have driven the rapid improvement in AI model performance over the past year.

In terms of computing power, GPU rental prices have now stabilized. Data collected by Zhidongxi shows that around September 2024, the hourly rental price of an H100 on major public clouds such as AWS, Microsoft Azure, and Google Cloud was roughly in the range of $5-11 per card. This year, according to the GPU price index from computing power market data firm Silicon Data, the H100 has largely stabilized in a rental range of $2-3 per card per hour, with no significant price fluctuations.

▲H100 GPU rental price (Source: Silicon Data)

At the same time, the demand for computing power in the training and inference phases of the new generation of large models is continuously increasing. Coupled with the relatively stable GPU prices, the cost of computing power has become one of the "hard thresholds" limiting the further decline of AI service prices.

Data is also a cost item that cannot be ignored in the training of today's large models. Initially, due to the lack of regulation, the cost of acquiring training data for large models was relatively low. With the increase in related lawsuits and stricter compliance reviews, manufacturers have begun to proactively sign contracts with companies to purchase licensed data to avoid legal disputes with data owners.

For example, according to The Wall Street Journal, the data usage agreement signed between OpenAI and the American publishing group News Corp could be worth up to $250 million over five years; Google has reached an AI content licensing agreement with the American forum platform Reddit, with Reuters reporting that the annual price is about $60 million.

Meanwhile, the prices of talents behind these models are also rising.

In China, the report "2025 Mid-Year Talent Supply and Demand Insight" released by Liepin Big Data Research Institute in July shows that the current domestic AI talent gap has exceeded 5 million, with the average annual salary of AI technical personnel at 323,500 yuan, and the proportion of AI technical positions with an annual salary of over 500,000 yuan reaching as high as 31.03%. The expected annual salary of AI technical talents is even higher than the current average annual salary, at 440,900 yuan.

Across the ocean, the competition for AI talent in Silicon Valley is fierce. Besides a few individual cases worth hundreds of millions of dollars, the overall salary level for AI talent is also significantly higher than in other industries. Data from the international job platform Levels.FYI shows that in the San Francisco Bay Area, the median salary for ML/AI engineers is about 13% higher than the median salary for all software engineers. Considering that the statistical category of all software engineers includes ML/AI engineers, the salary advantage for the latter may be even greater.


▲ Salaries of ML/AI engineers in the San Francisco Bay Area, USA (Source: Levels.FYI)

05. Subscription models face the test of service costs, and cost control is urgent

The cost of building large models is becoming increasingly expensive, and with the rise of inference model paradigms and the emergence of long-sequence tasks like Agents, user consumption keeps climbing. Large model subscriptions work like an "unlimited data plan": the more users consume, the higher the provider's serving costs, and heavy users have pushed some providers to the brink of unsustainability.

This month, Anthropic's Claude Code programming Agent canceled its $200/month subscription plan's unlimited access to large models, citing that some users were using the large model almost 24 hours a day, resulting in AI service costs reaching tens of thousands of dollars per month, far exceeding the subscription pricing.

Anthropic even claimed at a press conference that Claude 4 Opus can work continuously for 7 hours to complete programming tasks. Based on Claude 4 Opus's inference speed of approximately 50 tokens/second, this task would consume about 1.26 million tokens, costing approximately $113.4.
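The arithmetic behind that estimate can be checked directly. Note that the per-token rate implied by the quoted $113.4 is about $90 per million tokens, a figure inferred here from the article's own numbers rather than a published price:

```python
# Back-of-the-envelope check of the token estimate quoted above.
hours = 7
tokens_per_second = 50
total_tokens = hours * 3600 * tokens_per_second
print(total_tokens)  # 1260000, i.e. the ~1.26 million tokens in the text

# The quoted ~$113.4 implies a blended rate of about $90 per million
# tokens -- an inference from the article's figures, not a list price.
implied_rate_usd_per_m = 113.4 / (total_tokens / 1_000_000)
print(round(implied_rate_usd_per_m, 2))  # 90.0
```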

Faced with high service costs, large model providers are employing various methods to reduce expenses.

DeepSeek has introduced multiple cost-reduction methods in its latest generation of models. For example, after chain-of-thought compression training, DeepSeek-V3.1 can cut the number of output tokens during inference by 20%-50% while maintaining average task performance comparable to DeepSeek-R1-0528. This lets DeepSeek serve its chatbot at lower cost without degrading performance.

DeepSeek-V3.1 also supports both thinking and non-thinking modes within a single model, allowing developers to control the inference switch through specific tags, further saving API usage costs.
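As a sketch of what that switch looks like from a developer's side, the helper below builds an OpenAI-style chat request and selects the mode by model name. The names `deepseek-chat` (non-thinking) and `deepseek-reasoner` (thinking) reflect DeepSeek's OpenAI-compatible API naming at the time of writing, but treat the exact switching mechanism as an assumption and verify against the current official docs.

```python
# Hypothetical request builder: one underlying model, two modes, selected
# by endpoint name. The model names are assumptions based on DeepSeek's
# OpenAI-compatible API; check current documentation before relying on them.
def build_request(prompt: str, thinking: bool) -> dict:
    return {
        "model": "deepseek-reasoner" if thinking else "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
    }

print(build_request("Summarize this contract.", thinking=False)["model"])
print(build_request("Prove the inequality.", thinking=True)["model"])
```

Keeping cheap non-thinking calls as the default and enabling the reasoning mode only when needed is the cost-saving lever the text describes.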

Tencent's approach to cost reduction with Hunyuan is through architectural innovation. On Hunyuan TurboS, Tencent has integrated two architectures, combining the contextual understanding of Transformers with the long-sequence processing capabilities of Mamba, achieving a balance between performance and efficiency.

OpenAI has adopted a "model auto-routing" approach on GPT-5: assessing the complexity of tasks and assigning relatively simple requests to lightweight models, thereby saving computing resources. Microsoft Azure, which hosts GPT-5, claims that this method can reduce inference costs by up to 60%.
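OpenAI has not published its router's internals, so the snippet below is only a toy illustration of the general idea: estimate a request's apparent complexity and send easy ones to a cheaper model. The heuristic and the model names here are entirely hypothetical.

```python
# Toy complexity router (hypothetical): long prompts or prompts containing
# "hard" markers go to the expensive model, everything else to a light one.
def route(prompt: str) -> str:
    hard_markers = ("prove", "derive", "step by step", "debug")
    looks_hard = len(prompt) > 500 or any(m in prompt.lower() for m in hard_markers)
    return "large-model" if looks_hard else "light-model"

print(route("What time is it in Tokyo?"))          # light-model
print(route("Prove that sqrt(2) is irrational."))  # large-model
```

A production router would likely use a learned classifier rather than keyword rules, but the cost logic is the same: every request answered by the light model saves most of its inference cost.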

However, the key issue is that cost reductions for large model providers and cloud service providers do not necessarily translate into lower usage costs for end users and enterprises. After heavy upfront R&D and deployment spending, how to truly convert hundreds of billions of dollars of AI investment into commercial value has become a question every large model player must answer.

06. Conclusion: Is there still room for the price of large models to drop?

In the future, there are several paths for the decline in the price of large models. On one hand, as the average performance of models improves, optimized low-end and inexpensive models can efficiently solve specific tasks. Additionally, with continuous advancements in foundational research in the fields of large models and chips, new technological paths are emerging, which may further compress the unit costs of training and inference without sacrificing effectiveness.

From the perspective of industrial development, the temporary stagnation or rebound in the price of large models has its value. This provides a buffer period for manufacturers to recover the substantial R&D and infrastructure investments made in the early stages, maintaining sustainable innovation, and can also accelerate the market's exploration of clear commercialization scenarios and payment models. The industry is expected to take this opportunity to create a more mature and healthy ecosystem.

Risk Warning and Disclaimer

The market has risks, and investment should be cautious. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investing based on this is at one's own risk.