Xiaomi launches its first open-source reasoning large model, MiMo! With 7 billion parameters, it surpasses OpenAI o1-mini and Alibaba QwQ-32B-Preview

Wallstreetcn
2025.04.30 05:59

Under the same reinforcement learning (RL) training data conditions, MiMo-7B demonstrates significantly superior RL potential in mathematics and coding compared with other widely used models in the industry, including well-known RL starting models such as DeepSeek-R1-Distill-7B and Qwen2.5-32B.

The AI competition is fierce, and Xiaomi has also joined the fray!

On April 30th, Xiaomi launched MiMo, an open-source large model focused on reasoning. With only 7B parameters, it outperformed OpenAI's closed-source model o1-mini and Alibaba's 32B-parameter model QwQ in public evaluations of mathematical reasoning and competitive coding.

According to Xiaomi, the core question driving MiMo from its inception was how to unlock the model's reasoning potential. The model links pre-training and post-training to comprehensively enhance reasoning ability.

The AI competition both domestically and internationally is becoming increasingly intense. This week, Alibaba released Qwen 3, and shortly after, Musk officially announced Grok 3.5. According to previous media reports, Xiaomi is building a GPU cluster on the scale of ten thousand cards and attracting top AI talent, demonstrating a comprehensive investment in the large model field.

Performance Breakthrough: Achieving Great Capability with Small Parameters

The most striking feature of the Xiaomi MiMo model is that it surpassed OpenAI's closed-source reasoning model o1-mini and Alibaba's larger open-source reasoning model QwQ-32B-Preview on public evaluation sets for mathematical reasoning (AIME 24-25) and coding competitions (LiveCodeBench v5) with only 7B parameters.

Notably, under the same reinforcement learning (RL) training data conditions, MiMo-7B demonstrated significantly superior reinforcement learning potential in mathematics and coding compared to other widely used models in the industry, including DeepSeek-R1-Distill-7B and Qwen2.5-32B, which are well-known starting models for reinforcement learning.

Technical Key: Dual-Drive of Pre-Training and Post-Training

According to Xiaomi, the success of the MiMo model is not accidental but comes from multi-faceted innovations in both the pre-training and post-training phases.

In the pre-training phase, the Xiaomi team focused on mining corpora rich in reasoning patterns and synthesized approximately 200B tokens of reasoning data. Training followed a three-stage strategy that gradually increased difficulty, accumulating a total of 25T tokens of training data, a leading volume among models of similar scale.
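A staged curriculum like the one described can be pictured as a schedule of token budgets and data-mixture ratios, with the share of reasoning-heavy data rising stage by stage. The sketch below is purely illustrative: the stage budgets (which sum to the reported 25T tokens), source names, and mixture fractions are assumptions, not Xiaomi's actual recipe.

```python
# Illustrative three-stage pre-training schedule: later stages allocate a
# larger fraction of the token budget to harder, reasoning-rich data.
# Stage budgets and mixture ratios below are assumed for illustration only;
# their sum (25e12 tokens) matches the total reported in the article.
STAGES = [
    # (token budget, mixture over data sources)
    (15e12, {"general": 0.8, "reasoning": 0.2, "hard_reasoning": 0.0}),
    (7e12,  {"general": 0.6, "reasoning": 0.3, "hard_reasoning": 0.1}),
    (3e12,  {"general": 0.4, "reasoning": 0.4, "hard_reasoning": 0.2}),
]

def tokens_per_source(stages):
    """Total tokens drawn from each data source across all stages."""
    totals = {}
    for budget, mix in stages:
        for source, frac in mix.items():
            totals[source] = totals.get(source, 0.0) + budget * frac
    return totals
```

Summing the per-source totals recovers the overall 25T-token budget, while the reasoning share grows from 20% in stage one to 60% in stage three.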

Innovation in the post-training phase is even more critical. The Xiaomi team proposed a "Test Difficulty Driven Reward" mechanism, effectively addressing the problem of sparse rewards on hard algorithmic problems. They also introduced an "Easy Data Re-Sampling" strategy, significantly improving the stability of reinforcement learning training. At the framework level, they designed a "Seamless Rollout" system, which sped up reinforcement learning training by 2.29x and validation by 1.96x.

Beyond Technology: Xiaomi's Comprehensive Investment Strategy in AI

According to Jiemian News, Xiaomi is building its own GPU cluster on the scale of ten thousand cards and will invest heavily in large AI models. An insider revealed that the plan has been underway for several months, with Xiaomi founder Lei Jun personally leading it. The insider emphasized, "In the matter of AI hardware, the core is the smartphone, not glasses. It is impossible for Xiaomi not to go 'all in' in this field."

Xiaomi's AI talent recruitment is also accelerating. On December 20, Yicai reported that Luo Fuli, one of the key developers of the open-source large model DeepSeek-V2, would join Xiaomi, possibly working at the Xiaomi AI Lab to lead Xiaomi's large model team. Luo Fuli is one of the core developers of MLA (Multi-head Latent Attention), a technology that plays a key role in reducing the cost of using large models.