Track Hyper | Catching up with the world's top: Qianwen 3 reasoning model open-sourced

Wallstreetcn
2025.08.06 08:04

Recently, Alibaba's AI has achieved remarkable success, winning three global open-source championships

Author: Zhou Yuan / Wall Street News

On July 25, Alibaba open-sourced the Qianwen 3 reasoning model.

This is the first code model in the Qianwen series to adopt the Mixture of Experts (MoE) architecture, with a total parameter count of 480 billion. It natively supports a context of 256K tokens, expandable to 1 million, helping programmers complete basic programming tasks such as writing and completing code and significantly improving their efficiency.

A Mixture of Experts (MoE) model is an efficient neural-network architecture whose core idea is to improve model performance through division of labor while keeping computational costs under control. As the parameter scale of large models breaks into the hundreds of billions and trillions, MoE has become a key technique for balancing performance and efficiency.

In simple terms, the MoE architecture works like a well-run team: it has many members (experts) with specialized roles, but for each task only the few most suitable members are activated (gated routing), preserving efficiency while handling more complex demands.
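The "team with a gate" idea above can be sketched in a few lines. This is a minimal, illustrative top-k gating layer, not Qwen's actual implementation: all sizes, weights, and the routing rule are toy assumptions chosen only to show how a gate picks a subset of experts and mixes their outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a small feed-forward weight matrix (toy stand-in).
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
# The gate scores how suitable each expert is for a given token.
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x):
    """Route a token vector x to its top-k experts and mix their outputs."""
    logits = x @ gate_w                        # one score per expert
    top = np.argsort(logits)[-top_k:]          # keep only the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts only
    # Only the chosen experts run; the rest stay idle, which is where
    # MoE saves compute relative to a dense layer of the same total size.
    return sum(w * (x @ experts[i]) for i, w in zip(top, weights))

token = rng.standard_normal(d_model)
out = moe_layer(token)
print(out.shape)  # (8,)
```

In a real MoE transformer the same principle applies per token and per layer, which is how a 480-billion-parameter model can activate only a fraction of its weights on each forward pass.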

According to public information, the model shows significant improvements in key dimensions such as knowledge, coding ability, and mathematical computation, comparable to top global closed-source models such as Gemini 2.5 Pro and o4-mini.

From July 21 to July 25, Alibaba open-sourced three major models in succession, taking the global open-source lead in foundation models, coding models, and reasoning models.

This series of actions not only engages technical developers in research but also draws the attention of business decision-makers to the application of these technological achievements in practical business, which may positively impact the application landscape of AI technologies.

After the release of the Tongyi Qianwen 3 flagship model, the Tongyi team continues to optimize its inference capabilities.

The newly open-sourced Qianwen 3 reasoning model supports a context length of 256K tokens, enabling it to handle long documents and multi-turn dialogues without losing key information.

In knowledge assessment (SuperGPQA), programming ability assessment (LiveCodeBench v6), and other tests, it performs close to top closed-source models and ranks among the best in open-source models.

Compared to its predecessor, this model shows significant improvements in complex problem decomposition analysis, fluency, and accuracy: for example, when handling multi-step logical reasoning questions, it can present the reasoning process more clearly.

The Qwen3-235B-A22B-Instruct-2507 (non-thinking version) that was open-sourced during this period shows significant performance improvements, surpassing closed-source models like Claude4 (Non-thinking) in tests covering multiple ability dimensions such as GPQA knowledge assessment and AIME25 mathematics assessment.

These tests comprehensively measure the model's overall capabilities from multiple angles, including knowledge coverage, mathematical logical operations, and code writing accuracy.

AI research institution Artificial Analysis evaluated the newly open-sourced Qianwen 3 model as "outstanding among non-thinking foundational models," based on the model's specific performance across various metrics.

In the field of AI programming, Qwen3-Coder outperformed GPT4.1 and Claude4 in tests such as the multi-language SWE-bench and topped the Hugging Face model leaderboard. That ranking weighs data such as model downloads, usage frequency, and user ratings, and is widely recognized within the industry.

In practical use, programmers can generate a basic brand website with it in as little as five minutes, letting an entry-level programmer finish in a day work that would previously have taken a senior programmer a week.

As of now, Alibaba has open-sourced over 300 Tongyi large models, with the number of derivative models exceeding 140,000, surpassing Meta's Llama series, creating a massive family of open-source models that are widely used among developers and enterprises.

These derivative models have been fine-tuned by global developers for different scenarios and are applied in various industries such as education, finance, and healthcare. For example, derivative models in the education sector can assist teachers in generating personalized exercises, while models in the finance sector can perform simple risk assessments.

According to data from OpenRouter, a well-known overseas model-API aggregation platform, API call volume for Alibaba's Qianwen has surged: as of July 25, cumulative calls had exceeded 100 billion tokens within just a few days, placing it among the global top three on OpenRouter's trending chart and making it one of the most popular models at present.

This data reflects the model's popularity, especially favored by small and medium-sized development teams, as its open-source nature reduces usage costs while its performance meets project requirements.

Alibaba's open-source models are free for commercial use by Chinese enterprises, lowering the threshold for small and medium-sized businesses to adopt AI and letting more companies share in the technology dividend. They are equally open to enterprises in countries such as the United States, France, and Germany, and they help less-developed countries build localized derivative models, enriching the diversity of the AI open-source community and spreading the technology globally.

Wall Street Insights has noted that when enterprises implement AI, they often package models with cloud products for procurement.

For instance, when e-commerce companies use the Tongyi Qianwen model for intelligent customer service responses, they will also purchase Alibaba Cloud's database to store customer information and Alibaba Cloud's security services to ensure data security, forming an ecological closed loop.

This model enhances the depth of use of Alibaba Cloud products and their relevance to customers, increasing customer stickiness.

Currently, some organizations are migrating AI workloads to the cloud, and enterprises that have deployed cloud architectures are actively integrating AI capabilities into their systems, leading to a sustained demand for GPU resources and IaaS (Infrastructure as a Service).

The excellent performance of the Qianwen 3 series models will help Alibaba Cloud attract more customers and promote the development of public cloud business, especially in areas that require strong AI computing power support.

The Qianwen 3 reasoning model stands out among open-source models thanks to the Tongyi team's continuous optimization of its architecture and algorithms.

With a context length of 256K tokens, it has a clear advantage in long-text tasks: in the legal industry, it can assist in reviewing lengthy contracts, accurately extracting clauses, responsibilities, and risk points and reducing omissions in manual review; in research, it can quickly capture a paper's background, experimental methods, and core conclusions, saving researchers reading time; and in scenarios such as knowledge Q&A and code generation, its performance approaches that of top proprietary models.

The performance improvement of Qwen3-235B-A22B (non-thinking version) is attributed to advances in training technology.

Regarding the model's name: "Qwen" is the English identifier for Alibaba Qianwen; "3" indicates that this model belongs to the third generation of the Qwen series, distinguishing it from the earlier Qwen1 and Qwen2 versions; "235B" refers to a total parameter scale of 235 billion; and "A22B" means that only about 22 billion of those parameters are activated for each token, a hallmark of the MoE architecture, in which the gate runs a small subset of experts per input.

"Instruct" signifies that the model type is an "Instruct-tuned Model." These models undergo further fine-tuning with human instruction data after pre-training, making them better at understanding and executing users' natural language instructions (such as "write a piece of code" or "summarize the document"), rather than merely continuing text, thus enhancing practicality.

"2507" is likely a version date or iteration number, possibly referring to "July 2025" (or a similar internal version time), used to distinguish different iterative versions of the same base model (for example, updates that fix certain issues or optimize performance for specific tasks).
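The naming convention described above can be made concrete with a small parser. This is purely illustrative: the regex encodes the pattern as interpreted in this article (family, generation, total parameters, activated parameters, tuning variant, release tag), not an official Qwen specification, and names outside this pattern would not match.

```python
import re

# Assumed pattern for Qwen-style model names, e.g. "Qwen3-235B-A22B-Instruct-2507".
NAME_RE = re.compile(
    r"^(?P<family>Qwen)(?P<generation>\d+)"  # series and generation, e.g. Qwen3
    r"-(?P<total>\d+B)"                      # total parameters, e.g. 235B
    r"-A(?P<active>\d+B)"                    # activated parameters per token, e.g. A22B
    r"-(?P<variant>[A-Za-z]+)"               # tuning variant, e.g. Instruct
    r"-(?P<release>\d{4})$"                  # release tag, e.g. 2507 (July 2025)
)

def parse_model_name(name: str) -> dict:
    """Split a Qwen-style model name into its labeled components."""
    m = NAME_RE.match(name)
    if not m:
        raise ValueError(f"unrecognized model name: {name}")
    return m.groupdict()

print(parse_model_name("Qwen3-235B-A22B-Instruct-2507"))
# {'family': 'Qwen', 'generation': '3', 'total': '235B', 'active': '22B',
#  'variant': 'Instruct', 'release': '2507'}
```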

The model was trained on a dataset of 36 trillion tokens during the pre-training phase, covering various types such as books and code repositories, ensuring the breadth and depth of knowledge, enabling the model to handle knowledge queries across different fields; post-training involved multiple rounds of reinforcement learning, integrating non-thinking and thinking models, optimizing overall performance, and making the model more flexible in handling different types of tasks.

The breakthrough in coding capability of Qwen3-Coder comes from the improved Transformer architecture and optimized Agent calling process.

The improved Transformer architecture raises the accuracy of understanding programming requirements: when a developer inputs the instruction "write a backend interface for user registration," the model accurately grasps the functions and parameters the interface needs to implement. The optimized Agent calling process makes tool invocation more efficient, matching and calling external code libraries faster, which helped the model excel in multi-language testing and top the Hugging Face leaderboard.

From an ecological perspective, Qwen3-Coder has attracted a large number of secondary developments: developers have added specific industry code libraries to it, enabling it to generate code that better complies with industry standards in the fintech sector; other developers have optimized its response speed, making it more suitable for online programming scenarios that require high real-time performance.

Currently, Alibaba's more than 300 general-purpose large models and 140,000 derivative models are widely used in industries such as scientific research and education, pushing AI from the laboratory into real production and daily life and improving efficiency across sectors.