
MiniMax releases the M2.5 model: one hour of continuous running costs only $1, 1/20 of GPT-5's price, with performance comparable to Claude Opus

The M2.5 model achieves a dual breakthrough in performance and cost. Its price is only 1/10 to 1/20 that of mainstream models such as GPT-5, while its performance rivals Claude Opus: it took first place in the multilingual programming benchmark Multi-SWE-Bench and completes tasks 37% faster than the previous generation. Built on a native Agent reinforcement learning framework, the model already completes 30% of MiniMax's internal tasks autonomously, and its output accounts for 80% of newly submitted code in the company's programming workflows.
MiniMax has launched the latest iteration of its M2.5 series, significantly reducing inference costs while maintaining industry-leading performance. The release aims to address the economic infeasibility of complex Agent applications, and MiniMax claims the model reaches or refreshes the industry SOTA (state-of-the-art) level in programming, tool invocation, and office scenarios.
On February 13, MiniMax released data showing that M2.5 has significant price advantages: in the configuration that outputs 50 tokens per second, its price is only 1/10 to 1/20 that of mainstream models such as Claude Opus, Gemini 3 Pro, and GPT-5.
In the high-speed configuration that outputs 100 tokens per second, running M2.5 continuously for one hour costs only $1; at 50 tokens per second, the cost drops further to $0.30. A budget of $10,000 is therefore enough to keep 4 Agents running continuously for a year, greatly lowering the barrier to building and operating large-scale Agent clusters.
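The annual-budget claim follows directly from the quoted hourly rates; a back-of-the-envelope check (the $0.30/hour figure at 50 tokens/s is from the announcement, the rest is plain multiplication):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

cost_per_hour_50tps = 0.30  # USD per agent-hour, per the announcement

# One agent running nonstop at 50 tokens/s for a full year
annual_cost_one_agent = cost_per_hour_50tps * HOURS_PER_YEAR
print(f"1 agent, 1 year:  ${annual_cost_one_agent:,.0f}")   # $2,628

# Four agents for a year, close to the quoted $10,000 budget
print(f"4 agents, 1 year: ${4 * annual_cost_one_agent:,.0f}")  # $10,512
```

At roughly $10,500 for four agent-years, the announced "$10,000 for 4 Agents" figure checks out as an order-of-magnitude claim.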
In terms of performance, M2.5 performs strongly in core programming tests and achieved first place in the multi-language task Multi-SWE-Bench, matching the overall level of the Claude Opus series. At the same time, the model has optimized its ability to decompose complex tasks, achieving a 37% improvement in task completion speed in the SWE-Bench Verified test compared to the previous generation M2.1, with end-to-end runtime reduced to 22.8 minutes, which is on par with Claude Opus 4.6.
Currently, MiniMax's internal business has been the first to validate the capabilities of this model. Data shows that 30% of its overall internal tasks have been autonomously completed by M2.5, covering core functions such as R&D, product, and sales. Particularly in programming scenarios, the code generated by M2.5 accounts for 80% of newly submitted code, demonstrating the high penetration and usability of this model in real production environments.
Breaking Through Cost Barriers: Economic Feasibility of Infinite Running Agents
M2.5 is designed to remove the cost constraints on running complex Agents, a goal MiniMax pursues by optimizing inference speed and token efficiency. The model delivers an inference speed of 100 TPS (tokens per second), roughly twice that of current mainstream models.
In addition to simply reducing computing costs, M2.5 has reduced the total number of tokens required to complete tasks through more efficient task decomposition and decision logic.
In the SWE-Bench Verified evaluation, M2.5 consumes an average of 3.52M tokens per task, lower than M2.1's 3.72M.
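The per-task token counts above imply a modest but real efficiency gain, which works out to about a 5% reduction:

```python
m25_tokens = 3.52e6  # avg tokens per SWE-Bench Verified task, M2.5
m21_tokens = 3.72e6  # same metric for the previous generation M2.1

reduction = (m21_tokens - m25_tokens) / m21_tokens
print(f"Token reduction vs. M2.1: {reduction:.1%}")  # 5.4%
```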
The combined gains in speed and efficiency let enterprises build and operate Agents economically at almost unlimited scale, shifting the competitive focus from cost to the pace of model-capability iteration.
Advancing Programming Capabilities: Thinking and Building Like an Architect
In the field of programming, M2.5 not only focuses on code generation but also emphasizes system design capabilities. The model has evolved to exhibit native Spec (specification) behavior, allowing it to proactively decompose functions, structures, and UI designs from an architect's perspective before coding.
The model has been trained on more than 10 programming languages (including Go, C++, Rust, and Python) across hundreds of thousands of real environments.
Tests show that M2.5 is capable of handling the entire process from system design (0-1), development (1-10), to functional iteration (10-90) and final code review (90-100).
To verify its generalization in different development environments, MiniMax conducted tests on programming scaffolds such as Droid and OpenCode.
The results show that M2.5 achieved a pass rate of 79.7 on Droid and 76.1 on OpenCode, outperforming both the previous-generation model and Claude Opus 4.6.

Complex Task Handling: More Efficient Search and Professional Delivery
In terms of search and tool invocation, M2.5 demonstrates a higher level of decision-making maturity, no longer simply pursuing "doing it right," but seeking to solve problems through more streamlined paths.
In tasks such as BrowseComp, Wide Search, and RISE, M2.5 uses roughly 20% fewer interaction rounds than previous generations, reaching comparable results with better token efficiency.

For office scenarios, MiniMax has integrated tacit industry knowledge into model training by collaborating with seasoned professionals in finance, law, and other fields.
In the internally constructed Cowork Agent evaluation framework (GDPval-MM), M2.5 achieved an average win rate of 59.0% in pairwise comparisons with mainstream models, capable of producing industry-standard Word research reports, PPTs, and complex Excel financial models, rather than simple text generation.


Technical Foundation: Native Agent RL Framework Drives Linear Improvement
The core driving force behind M2.5's performance gains is large-scale reinforcement learning (RL). MiniMax adopted a native Agent RL framework called Forge, which decouples the underlying training and inference engines from the Agent through an intermediate layer, supporting the integration of arbitrary scaffolding.
At the algorithm level, MiniMax continues to use the CISPO algorithm to keep the MoE model stable during large-scale training, and has introduced a process reward mechanism to address the credit-assignment challenges posed by the Agent's long contexts.
In addition, the engineering team optimized the asynchronous scheduling strategy and a tree-based sample-merging strategy, achieving roughly 40x training acceleration and validating the trend that model capability improves near-linearly with additional compute and task volume.

Currently, M2.5 is fully available in MiniMax Agent, the API, and Coding Plan, and its model weights will be open-sourced on HuggingFace, supporting local deployment.
