Meituan's large model is here! Open-source "LongCat": performance on par with DeepSeek V3.1, with a focus on saving compute.

Wallstreetcn
2025.08.31 00:50

Meituan has open-sourced the LongCat-Flash large model, a mixture-of-experts (MoE) model with 560 billion parameters that pursues both strong performance and computational efficiency. Through a "zero-computation" expert mechanism, the model allocates compute dynamically, activating only 18.6 billion to 31.3 billion parameters per token and saving significant computing power. A shortcut-connected mixture-of-experts architecture (ScMoE) raises training and inference throughput. LongCat-Flash was trained in multiple stages and is designed to serve as an intelligent agent for solving complex tasks.

Meituan has just open-sourced its large model, LongCat-Flash.

It is a mixture-of-experts (MoE) model with 560 billion parameters.

It not only pursues excellence in performance but also achieves remarkable computational efficiency and advanced agent capabilities through a series of architectural and training innovations.

While retaining powerful capabilities, LongCat-Flash allocates computational resources where they matter most.

It does not activate all 560 billion parameters for every task; instead, it allocates resources dynamically through clever design.

One of the most innovative designs of LongCat-Flash is the "Zero-computation" expert mechanism.

The model can intelligently assess the importance of different parts of the input content and assign tasks with lower computational demands (such as common words and punctuation) to a special "zero-computation" expert.

This expert does not perform actual complex calculations but directly returns the input, greatly saving computational power.

Thanks to this, the model dynamically activates only 18.6 billion to 31.3 billion parameters (about 27 billion on average) for each token it processes, striking an effective balance between performance and efficiency.
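For readers who want to see the mechanism concretely, here is a minimal sketch, not Meituan's code: a standard top-k MoE router whose expert pool includes parameter-free "identity" experts, so tokens routed to them skip the feed-forward computation entirely. The class name, expert counts, and dimensions are all illustrative.

```python
# Minimal sketch of "zero-computation" experts: the router scores both real
# FFN experts and identity experts; tokens sent to an identity expert are
# returned unchanged, costing no FLOPs beyond routing itself.
import torch
import torch.nn as nn


class ZeroComputationMoE(nn.Module):
    def __init__(self, d_model: int, n_real: int = 4, n_zero: int = 2, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.n_real = n_real
        # Real experts are small FFNs; zero-computation experts hold no weights.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_real)
        )
        # The router produces one logit per expert, real and zero alike.
        self.router = nn.Linear(d_model, n_real + n_zero)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = torch.topk(self.router(x).softmax(dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.n_real):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
            # Indices >= n_real are zero-computation experts: pure identity.
            zmask = idx[:, slot] >= self.n_real
            if zmask.any():
                out[zmask] += weights[zmask, slot, None] * x[zmask]
        return out
```

In the real model, routing is additionally tuned so that the average number of activated parameters per token stays in the 18.6 to 31.3 billion range quoted above; that control machinery is omitted from this sketch.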

In large-scale MoE models, communication delays between different "expert" modules often become performance bottlenecks.

To address this, LongCat-Flash introduces the shortcut-connected mixture-of-experts architecture (Shortcut-connected MoE, ScMoE).

By introducing a shortcut connection, the ScMoE architecture widens the window in which computation and communication can overlap, significantly raising training and inference throughput and making the model respond faster.
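The following structural sketch (again illustrative, not the paper's exact layer) conveys the dataflow idea: the MoE branch takes its input from an earlier point in the block, so in a distributed system the expert dispatch/combine communication can be issued early and overlapped with the dense computation that follows. Single-device code runs the two branches sequentially, so the sketch only shows the ordering.

```python
# Sketch of a shortcut-connected block: the MoE layer reads an earlier
# activation (h), letting its communication overlap the dense FFN work.
import torch
import torch.nn as nn


class ScMoEBlock(nn.Module):
    def __init__(self, d_model: int, moe: nn.Module):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.dense_ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.moe = moe  # any token-wise expert layer, e.g. the sketch above

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        h = x + self.attn(x, x, x, need_weights=False)[0]
        # Shortcut: route h to the experts now. In a distributed setup the
        # all-to-all dispatch for self.moe(h) would be issued here and kept
        # in flight while the dense FFN below executes.
        moe_out = self.moe(h.reshape(-1, h.shape[-1])).reshape(h.shape)
        dense_out = self.dense_ffn(h)  # overlaps with expert communication
        return h + dense_out + moe_out
```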

To ensure that the model can not only "chat" but also serve as an "intelligent agent" capable of solving complex tasks, LongCat-Flash underwent a carefully designed multi-stage training process tailored for agents.

This process includes large-scale pre-training, targeted enhancement of reasoning and coding capabilities in mid-stage training, and a focus on dialogue and tool usage capabilities in post-training.

This design allows it to excel at complex tasks that require tool invocation and interaction with the environment.

An interesting detail: the official technical report emphasizes that LongCat-Flash was trained on a large-scale cluster containing tens of thousands of accelerators.

This wording is very precise.

In the current AI field, while people usually think of NVIDIA's GPUs, the term "accelerator" is a broader concept that can include Google's TPUs, Huawei's Ascend, or other chips specifically designed for AI computing.

By choosing this term rather than explicitly saying "GPU," the company leaves some room for speculation about the specific hardware involved, while remaining technically precise.

Regardless of the specific hardware, completing more than 20 trillion tokens of training in just 30 days on a cluster of that scale is enough to demonstrate the strength of the underlying infrastructure and the quality of the engineering optimization.
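To put that in context, a one-line calculation using only the figures above gives the sustained cluster-wide throughput this implies:

```python
# Implied cluster-wide training throughput from the quoted figures.
tokens = 20e12                              # more than 20 trillion training tokens
seconds = 30 * 24 * 3600                    # 30 days of wall-clock time
print(f"{tokens / seconds:,.0f} tokens/s")  # ≈ 7.7 million tokens/s sustained
```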

The engineering optimization results of LongCat-Flash are ultimately reflected in the user-perceived performance and cost:

Extremely fast inference: more than 100 tokens per second (TPS).

Extremely low operating cost: processing one million output tokens costs only about $0.70.

Powerful all-round capability: supports a 128k-token context window and is competitive with industry-leading models in code, reasoning, tool invocation, and more.
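As a quick sanity check on those figures, a few lines of arithmetic (assuming a single request stream running continuously at the quoted price and speed) show what an hour of generation would cost:

```python
# Back-of-the-envelope serving cost from the quoted price and speed.
price_per_mtok = 0.70         # USD per 1M output tokens
tps = 100                     # quoted inference speed, tokens/s

tokens_per_hour = tps * 3600  # 360,000 tokens in an hour
cost_per_hour = tokens_per_hour / 1e6 * price_per_mtok
print(f"{tokens_per_hour:,} tokens/h -> ${cost_per_hour:.2f}/h")  # ~ $0.25/h
```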

To more intuitively showcase the strength of LongCat-Flash, let's take a look at its detailed evaluation comparison with other top models in the industry.

Meituan's LongCat-Flash model has shown very strong and competitive performance in various benchmark tests.

It not only matches the industry's top open-source models (such as DeepSeek V3.1, Qwen3) in multiple aspects but even surpasses them in certain specific capabilities.

In the General Domains category, which measures general knowledge and reasoning ability, LongCat-Flash performs consistently and strongly.

MMLU / MMLU-Pro:

This is a core metric for measuring the model's overall knowledge level.

LongCat-Flash's scores (89.71 / 82.68) are on par with DeepSeek V3.1, Qwen3 MoE, and Kimi-K2, proving its solid foundational knowledge and reasoning ability.

ArenaHard-V2:

This benchmark focuses more on the model's "feel" as a chat assistant and its ability to handle complex instructions. LongCat-Flash scored 86.50 in this category, surpassing DeepSeek V3.1 and coming very close to Qwen3 MoE (88.20), indicating its excellent conversational and reasoning capabilities.

Chinese Language Ability (CEval / CMMLU):

On CEval, an authoritative Chinese-language benchmark, LongCat-Flash scored an excellent 90.44, and it maintained a good level on CMMLU, confirming its strong support for Chinese.

Instruction Following: This is the most prominent highlight of LongCat-Flash.

The technical report mentions that the model underwent specialized multi-stage training for "Agent" capabilities, and the evaluation results corroborate this.

IFEval & COLLIE:

These two benchmarks specifically assess the model's ability to understand and execute complex, multi-step instructions.

On IFEval, LongCat-Flash scored 89.65, placing it among the leaders: above DeepSeek V3.1 and on par with Kimi-K2 and Qwen3 MoE.

In the COLLIE test, LongCat-Flash achieved a high score of 57.10, ranking first among all compared models.

This strongly demonstrates its outstanding ability in executing complex "intelligent agent" tasks that require tool invocation and interaction with the environment.

The LongCat-Flash model has now been released on Hugging Face and GitHub under the MIT license.

Researchers and developers in the global academic and industrial sectors can freely use and explore this powerful model to jointly promote the development of AI technology.
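For reference, loading an open-weights model like this typically takes only a few lines with the transformers library. The repository id below is an assumption based on the announced release; check the official Hugging Face model card for the exact name, hardware requirements, and loading options.

```python
# Hedged usage sketch: the repo id is assumed, verify it on Hugging Face.
# Note: a model of this size far exceeds a single consumer GPU; device_map
# (via the accelerate package) spreads weights across available devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "meituan-longcat/LongCat-Flash-Chat"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, device_map="auto"
)

inputs = tokenizer("Hello, LongCat!", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```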
