ByteDance and Kuaishou face a crucial showdown

Wallstreetcn
2025.04.22 13:09

The AI video war escalates

Author | Liu Baodan

Editor | Zhou Zhiyu

The focus of AI competition has begun to shift towards multimodal models, and ByteDance and Kuaishou are stepping up their rivalry in the AI video sector.

Recently, Kuaishou officially launched the Keling 2.0 video generation model and the Ketu 2.0 image generation model, elevating the precision of video and image creation to a new height. At the same time, ByteDance's Seed team officially released the Seedream 3.0 technical report. According to the third-party ranking Artificial Analysis, Seedream 3.0's comprehensive performance has caught up with the SOTA model GPT-4o in text-to-image generation, entering the global top tier.

As short-video platforms, ByteDance and Kuaishou are considered strong contenders in multimodal AI. After more than a year of technological catch-up, both have made solid progress in AI video generation.

According to March data from the AI Product Ranking, Jimeng AI ranked 5th in the global AI product growth ranking (apps only) with a monthly active user growth rate of 173.57%, making it the fastest-growing AI video application, with approximately 20.37 million monthly active users. Keling AI's growth rate was only 36.44%, ranking 14th. According to data released by Kuaishou, Keling AI's global user base has now surpassed 22 million.

However, the AI video generation field has yet to produce a benchmark product comparable to DeepSeek in the large language model (LLM) sector. According to Gartner's 2024 Hype Cycle for Emerging Technologies, this technology is still in the innovation trigger phase, which also means that the competition between ByteDance and Kuaishou is still in its early stages.

In the past decade, Kuaishou and Douyin have risen successively, jointly creating China's short video era. Now, as the AI era accelerates, who has a slightly better chance this time, Kuaishou or ByteDance?

Catching Up

The AI video generation field has not yet seen a breakout product like DeepSeek, which is why industry players keep iterating on their technology in a bid to capture users' mindshare.

Entering 2025, both Kuaishou and ByteDance have begun to release significant technology updates.

On April 15, Kuaishou officially released the Keling AI 2.0 video generation model and the Ketu 2.0 image generation model. The biggest highlight of Keling AI 2.0 is its technological innovation that redefines the standards for AI video generation: from "can generate" to "precisely generate," from "tool assistance" to "creative partner."

At the launch event, Kuaishou introduced a new interaction concept for AI video generation called Multi-modal Visual Language (MVL), which consists of TXT (Pure Text, semantic skeleton) and MMW (Multi-modal-document as a Word, multimodal descriptor). By setting the basic direction for video generation and enabling fine-grained control, it lets AI creators express their creative intent precisely.

Based on MVL, Kuaishou launched the all-new Keling AI 2.0 Master Edition, which comprehensively upgrades the controllable generation and editing capabilities of video and image creation, and adds a new multimodal video editing feature that supports secondary editing and processing.

Currently, AI-generated videos account for about 85% of the creative output in the field of AI video production. Kuaishou's Ketu 2.0 boasts several core advantages, such as powerful complex semantic understanding capabilities and cinematic image quality. Zhang Di said that Ketu 2.0's text-to-image capabilities have undergone a comprehensive upgrade, significantly enhancing the creativity and imagination of the model's output.

The day after Kuaishou's new product launch conference, ByteDance immediately disclosed the technical white paper for its text-to-image model Seedream 3.0.

On April 16, ByteDance released the Seedream 3.0 technical report, just over a month after publishing the Seedream 2.0 technical report. The biggest highlight of Seedream 3.0 is native 2K output in just 3 seconds, greatly improving creative efficiency. Seedream 3.0 has officially launched and is now fully available on platforms such as Jimeng AI.

According to Wall Street Insights, research and development of Seedream 3.0 began at the end of 2024. After studying the actual needs of designers and other groups, the Seedream team made widely accepted industry metrics such as image-text matching and aesthetics part of its focus, while also setting hard industry problems such as 2K high-definition output and rapid image generation as core objectives.

Whether it is the secondary editing function of Keling AI or the native 2K image quality of Jimeng AI, both represent significant technological breakthroughs towards industrial applications. In fact, the value of AI video generation can only be realized once it reaches industrial-grade application.

Behind this relentless competitive landscape, Kuaishou and ByteDance have been continuously laying out their strategies in the AI video generation sector over the past year.

At the beginning of 2024, OpenAI officially entered the video generation field through Sora, attracting global attention. At that time, Kuaishou was focused on overcoming key technologies in text-to-video, and four months later, Kuaishou released the Keling video generation large model, becoming the first domestic product to benchmark against Sora.

ByteDance only began discussing GPT in internal meetings in 2023, but it has been catching up quickly. By the end of last year, ByteDance's video generation models and products had officially launched.

In September last year, ByteDance released two large models, Doubao Video Generation-PixelDance and Doubao Video Generation-Seaweed, officially announcing its entry into AI video generation. In November, former Douyin Group CEO Zhang Nan made her first public appearance after nearly a year at Jianying, as Jimeng AI launched capabilities such as "one-sentence image editing" and significantly improved the accuracy of text rendered in images.

The importance of Jimeng AI within ByteDance has significantly increased. Wall Street Insights learned that the visual products represented by Jimeng AI are highly regarded, and ByteDance is attempting to build Jimeng into the "Douyin" of the AI era. In February, former PopAI product head Cao Dapeng joined Jimeng AI to oversee mobile products. He previously took a year to grow PopAI to tens of millions of users, with a return on investment (ROI) close to the breakeven point, making him a valuable asset.

Now, Kuaishou and ByteDance are once again in competition, both trying to bring model technology into the production-grade arena.

Betting

In the AI video generation sector, ByteDance and Kuaishou are undoubtedly the fastest-reacting tech companies in China.

This is partly because both started with short videos and inherently understand video creation better, but more importantly, it reflects a kind of FOMO (Fear of Missing Out) mentality. AI technology will significantly lower the barriers to video generation. In the past, Kuaishou and ByteDance built their video platforms by lowering the barriers to video shooting, and AI is clearly more disruptive.

At its core, ByteDance's and Kuaishou's push into AI video is an attempt to build a new "Douyin" and a new "Kuaishou" for the AI era, and so make it through a new technology cycle.

Currently, ByteDance and Kuaishou have different focuses in their strategies for the AI video sector.

For Kuaishou, AI is the biggest lever for solving the company's growth problem. In addition to consumer subscribers, Keling AI also provides API access and other services to business customers. Keling AI has established partnerships with companies including Xiaomi and Amazon Web Services. Gai Kun disclosed that over 15,000 developers from around the world have applied Keling AI's API in various industry scenarios.

On March 25, Kuaishou Technology founder and CEO Cheng Yixiao revealed in a conference call that from the start of commercialization to the end of February 2025, Keling AI's cumulative revenue exceeded 100 million RMB. He stated that Kuaishou will continue to expand Keling AI's user promotion and brand influence under the premise of controllable ROI. "We are confident in achieving a leap in Keling AI's revenue scale by 2025."

For ByteDance, Jimeng AI is a core piece of its overall AI strategy and a challenge that the company must overcome on its path to AGI.

Earlier this year, the ByteDance Doubao large model team internally formed a long-term AGI research team, codenamed "Seed Edge," encouraging project members to explore longer-term, uncertain, and bold AGI research topics. The goal of Seed Edge is to explore new methods for AGI and encourage cross-modal and cross-team collaboration.

At the end of last year, Zhang Nan stated that Douyin is a camera for the "real world," and with GenAI technology, Jimeng AI hopes to become a camera for the world of imagination, recording everyone's creative ideas and helping every imaginative person express themselves easily and create freely.

With Kuaishou releasing a new 2.0 model, the industry is eagerly awaiting ByteDance's next move, especially regarding when the Doubao video generation model version 1.5 will be launched, as the technological competition between the two continues.

However, the AI video generation sector itself is still at an exploratory stage.

The Harmony Hui TMT Software Group, a private equity firm with billions in assets, stated to Wall Street Insights that the industry is divided on AI video generation products represented by Sora. If Sora is regarded as an AIGC video production tool, its value may not be particularly significant, possibly just disrupting creative software tools. If Sora is a general-purpose video weapon, its potential is vast, for example, in combination with robots.

Recently, Liao Qian, Vice President of Product at Shengshu Technology and head of the Vidu product, stated that once multimodal capabilities support real-time control and interactivity, content can be completely personalized, and a content platform that brings new experiences will definitely emerge. This technology will be applied in various fields such as social media, gaming, VR, and AR, and will have a profound impact on all industries.

Overall, compared to large language models, the challenges faced by the AI video generation sector are greater. Whether it is the scaling law, the consumption of computing power, or the exploration of business models, the complexity is increasing.

This is destined to be a more challenging track. Although ByteDance and Kuaishou have the genes of video platforms, to go the distance they need to keep innovating in order to secure their position among global competitors like Veo 2, Runway, and Pika.