The track that Yann LeCun and Zhu Xiaohu were not optimistic about is quietly making money overseas

Wallstreetcn
2025.07.02 13:52

In recent years, video generation models have quietly taken off, even though they were long seen as a niche track. According to Extraordinary Data, Kuaishou's Keling AI reached annual recurring revenue (ARR) of USD 100 million by June 2025, while MiniMax's Hailuo AI and Shengshu Technology's Vidu each reached about USD 10 million. Huang Weilin of ByteDance predicts that the annualized revenue of leading video generation products will grow from USD 100 million to between USD 500 million and USD 1 billion. Although the market was pessimistic about video generation models a year ago, today's actual revenue shows their potential.

So far in 2025, the widely circulated AI success stories have come mostly from two tracks: Agents, represented by Manus, and AI hardware, represented by Plaud.

However, beyond the eye-catching application stories of Agents and hardware, an older track that had cooled down, video generation models, is taking off with a group of Chinese AI companies.

According to monitoring by Extraordinary Data, the ARR (Annual Recurring Revenue) of the App and Web versions of Kuaishou's Keling AI reached USD 100 million in June 2025. Among startups, MiniMax's Hailuo AI and Shengshu Technology's Vidu have each reached an ARR of around USD 10 million on the Web side alone.

Multiple insiders revealed to "Intelligent Emergence" that the actual subscription revenue of these products is even higher.

Moreover, positive cash flow, which large language models have yet to achieve, is already within reach in the video generation track.

Extraordinary Data shows that the monthly revenue of PixVerse, the video generation model from Aishi Technology, has reached USD 840,000. According to the company, subscription revenue already covers most of its costs, and cash flow is approaching positive.

At the 2025 Zhiyuan Conference, Huang Weilin, head of image and video generation at ByteDance, gave an optimistic judgment: the annualized revenue (ARR) of leading video generation products is expected to reach 100 million USD this year, and may grow to 500 million to 1 billion USD next year.

However, just a year ago, Sora-like video generation models were shunned by almost everyone in the domestic market. The reason: large video models are too expensive and their returns too uncertain, which puts them out of reach for most companies.

Tencent Technology once reported that when Wang Changhu, formerly head of visual technology at ByteDance, founded Aishi Technology, Zhu Xiaohu of GSR Ventures advised him to reconsider: "You should go back to work; there is no opportunity for large models in China." In September 2024, when MiniMax released its video generation model Hailuo AI under pressure from Kuaishou and ByteDance, it was also met with market skepticism.

Even a major player like Baidu judged video models to be "not profitable" at its Q3 2024 directors' meeting: "The investment cycle for video generation like Sora is too long; it may take 10 to 20 years to see business returns." Yann LeCun, Meta's chief AI scientist, also criticized, from a technical perspective, the limitations of video models in understanding the physical world.

An investor who once passed on investing in Wang Changhu's company offered a judgment that represented the consensus at the time: the ROI of video models cannot turn positive in the short term, and startups will be squeezed out by two or three major companies, just as in the language model track.

Indeed, in 2024, many domestic video startups found themselves on the edge of a cliff: struggling to raise financing and unable to find product-market fit (PMF). For example, the AI video startup Luying Technology, which had received investment from Redpoint China and BlueRun Ventures, was acquired in December 2024.

However, in less than a year, Aishi Technology's ARR changed the aforementioned investor's mind. He told "Intelligent Emergence" that he "regretted it deeply": "The collective misjudgment of investors is that video models do not make money."

With real money rebutting public opinion, the way these Chinese AI video companies have made money can be summarized as the combined effect of three factors: track, market, and marketing.

First, let's look at the track.

It has turned out that even though video generation technology is less mature than language models, consumers are more tolerant of its flaws. The reason is that video generation is a track driven by aesthetic demand.

"The different data strategies of each company, even immature technologies and biases in training data, can lead to different video generation styles," an investor told us. "Video creation is also a market with diverse aesthetics, and each video model has its own consumers."

For example, many users have found that Kuaishou's Keling AI is very good at generating food and mukbang-style shots. This is believed to be related to the abundance of mukbang videos on Kuaishou's short-video platform.

As for the market, going overseas has become a consensus in the AI industry, especially into European and American markets, where users have more spending power and are more receptive to new products.

For instance, MiniMax's Hailuo AI faced resistance from creators over the subscription pricing it introduced in China, but it had earlier gained six times the user volume overseas and achieved tens of millions of dollars in ARR.

However, beyond the inherent advantages of overseas markets, the value-for-money niche that Chinese video models occupy overseas is also worth noting.

Many practitioners believe that the relatively limited funding and computing power have forced domestic AI video startups to spend considerable effort on cost optimization, which has given them a price advantage in going overseas.

For example, when generating videos of the same duration and resolution, the model costs of startup products Hailuo AI and Vidu are only one-tenth to one-sixth of Sora's.

Finally, let's look at marketing.

Video social media platforms like TikTok and YouTube play a crucial role in the growth strategies of AI video companies.

An employee of Aishi Technology once told "Intelligent Emergence" that an important growth milestone for its video model PixVerse came at the end of 2024, when its "Venom" effect passed 100 million total views on short-video platforms such as TikTok and Douyin. An investor also mentioned that Pika's "squeeze" effect and Hailuo AI's "Half Cat" effect were key drivers of growth.

Pika squeeze effect. Source: Pika official Xiaohongshu

"For today's model companies, simply climbing the technology rankings is no longer sufficient for growth," summarized the investor. "Everyone needs to actively find or even create scalable demand."

In video creation, creators' needs go beyond improving productivity; they also want incentives such as traffic. "The viral formats created by AI video companies actually meet creators' demand for incentives."

One piece of good news for entrepreneurs: according to rankings released by a16z, in January 2025 the user traffic of Hailuo AI (ranked 12th) surpassed that of OpenAI's Sora (ranked 23rd) and Kuaishou's Keling AI (ranked 20th).

This indicates that the video generation sector is not as monopolized as the language model sector; the landscape is still undecided, and startups still have plenty of opportunities.

Wang Changhu once told "Intelligent Emergence" that despite rapid development, the current video generation sector is still in the stage from GPT-2 to GPT-3, where many technical challenges remain to be overcome, presenting opportunities for startups.

However, while this self-funding playbook can be replicated, it is also important to recognize that the window of opportunity for new entrants to the video generation sector is gradually closing. The pressure on the video model companies still at the table will also increase.

In March 2024, Wang Changhu predicted in a media interview that it would be difficult for new startups to enter the market from then on, stating, "If a company did not secure sufficient funding, users, team, and technological accumulation during the first phase of development, it may not have enough resources to stay at the table in the coming competition."

An AI investor also indirectly corroborated this point. She told "Intelligent Emergence" that even though the landscape in the video generation field is not yet settled, it is difficult to allocate investments to new entrants, "unless a company becomes a dark horse like DeepSeek."

At the same time, she pointed out that the financing video companies can secure is an order of magnitude smaller than that of language model companies. "As time goes on, the resource disadvantages of startups will become more pronounced," she summarized. "The continuous technological iteration of Keling AI and Jimeng AI is an advantage."

This harsh reality is also forcing the AI video startups still at the table to become self-sustaining faster.

Author of this article: Zhou Xinyu, Source: 36Kr, Original title: "The track that Yann LeCun and Zhu Xiaohu were not optimistic about is quietly making money overseas"
