Track Hyper | Baidu open-sources ERNIE 4.5: What is the strategy?

Exploring technological evolution in open collaboration

Author: Zhou Yuan / Wall Street News

On June 30, Baidu officially open-sourced the Wenxin large model 4.5 series (ERNIE 4.5). The release comprises 10 models of different parameter scales: Mixture-of-Experts (MoE) models with 47B (47 billion) and 3B (3 billion) active parameters, as well as a dense model with 0.3B (300 million) parameters. The pre-trained weights and inference code are fully open.

Currently, these models are available for download on platforms such as PaddlePaddle Galaxy Community and HuggingFace, and Baidu Smart Cloud's Qianfan large model platform also provides API (Application Programming Interface) services.

This move continues the tradition of "open collaboration" in the tech field and provides new possibilities for the implementation of large model technology.

However, Baidu's founder, chairman, and CEO Robin Li once stated at the 2024 WAIC (World Artificial Intelligence Conference) that open-sourcing large models is an "IQ tax."

From Parameter Coverage to Tool Adaptation

The 10 models open-sourced by Baidu cover a graded range of sizes, from the 0.3B dense model up to MoE models with 47B active parameters, and include both basic text models and visual multimodal models (VLM). All models except the smallest, the 0.3B one, adopt a heterogeneous multimodal MoE (Mixture of Experts) architecture.

For small and medium-sized developers with limited computing power, the 0.3B dense model lowers the deployment threshold, while the MoE models can meet the needs of complex enterprise-level tasks. This tiered offering allows users with different resource conditions to find suitable tools.

Unlike conventional single-modal MoE, the ERNIE 4.5 models open-sourced by Baidu use a heterogeneous hybrid design that enhances performance through a "divide and conquer" strategy. The model integrates several different types of expert modules but activates only the relevant subset for each input, which significantly increases model capacity without a substantial increase in computational load.

The core idea of this architecture is to split complex tasks across multiple specialized "expert models" and then use a gating network to dynamically select the best expert, or combination of experts, to produce the output. This enhances the model's expressive ability and efficiency while keeping the computational cost manageable.
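
The gating idea described above can be illustrated with a minimal sketch in plain Python. It is not Baidu's implementation: the top-k selection rule, the softmax weighting, the toy experts, and all names (`moe_forward`, `make_expert`, `gate_weights`) are assumptions chosen for illustration. It only shows how a gate can score every expert, run just a few of them, and blend their outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Hypothetical stand-in for an expert: scales every feature."""
    return lambda x: [scale * v for v in x]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Route one input vector through the top_k highest-scoring experts."""
    # Gate score for each expert: dot product of its gate row with the input.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; the rest are never executed.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the selected scores into mixing weights.
    probs = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)
        out = [o + p * v for o, v in zip(out, y)]
    return out, chosen

# Four toy experts, of which only two run for this input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = moe_forward([2.0, 1.0], gate_weights, experts, top_k=2)
print(chosen, output)
```

Because only `top_k` experts run per input, adding more experts increases total capacity while the per-input compute stays roughly constant, which is the trade-off the paragraph describes.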

In comparison, the technical characteristics of the Wenxin large model 4.5 series are concentrated on optimizing multimodal capabilities.

As a native multimodal model, ERNIE 4.5's understanding of images and audio-video is not a simple overlay but is achieved through modality fusion based on a heterogeneous MoE architecture. It does not blindly pursue breakthroughs in a single metric but gradually enhances multimodal processing capabilities based on stable performance in text tasks.

Looking at its technical structure, the heterogeneous MoE architecture of ERNIE 4.5 includes three types of FFN experts: text experts, visual experts, and shared experts. FFN experts are the expert modules in a Mixture-of-Experts (MoE) model that are built from feed-forward networks (FFN).

Each FFN expert can be seen as an independent sub-model capable of processing specific types or ranges of data.

The model determines which FFN experts are responsible for processing each input token through a gating network or routing mechanism.
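
A small sketch can make the three expert types concrete. The article does not spell out the routing rule, so the code below rests on an assumption: text tokens are routed only among text experts, visual tokens only among visual experts, and shared experts process every token. The pool sizes, expert names, and the `route_token` helper are hypothetical.

```python
# Hypothetical expert pools; names and sizes are illustrative only.
EXPERT_POOLS = {
    "text": ["text_0", "text_1", "text_2"],
    "vision": ["vision_0", "vision_1"],
}
SHARED_EXPERTS = ["shared_0"]

def route_token(modality, gate_scores, top_k=1):
    """Return the experts that would process one token.

    Assumption: the gate only ranks experts within the token's own
    modality pool, and shared experts are always included.
    """
    candidates = EXPERT_POOLS[modality]
    ranked = sorted(candidates, key=lambda name: gate_scores[name], reverse=True)
    return SHARED_EXPERTS + ranked[:top_k]

# Toy gate scores for a single token position.
scores = {
    "text_0": 0.1, "text_1": 0.7, "text_2": 0.2,
    "vision_0": 0.9, "vision_1": 0.4,
}
text_route = route_token("text", scores)
vision_route = route_token("vision", scores)
print(text_route)
print(vision_route)
```

Under this assumption, a text token never competes with image tokens for the same experts, while the shared experts give both modalities a common pathway.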

For example, in image understanding, whether it is a daily photo or a cartoon icon, the model can output interpretations that align with the scene's logic. This enhancement in capability stems from continuous learning of the correlations in multimodal data, rather than isolated technical stacking.

It is well known that NVIDIA's strength lies not only in the excellent performance of its AI acceleration cards but also in the closely related ecosystem of development tools adapted to CUDA.

Baidu has also simultaneously launched supporting development tools for ERNIE 4.5: a complete open-source development toolchain, including the ERNIEKit training tool and FastDeploy inference deployment tool, aimed at lowering the threshold for developers to use large models and promoting the widespread application of multimodal AI technology.

In essence, this is also practicing the technical ethics of "tools should serve people."

These tools lower the technical barriers for model post-training and deployment, allowing developers to perform secondary development based on open-source models without needing to deeply understand the underlying principles.

Baidu's open-source initiative is not an isolated action; as early as February this year, Baidu announced the open-source plan for the Wenxin large model 4.5.

From the perspective of ecosystem construction, the open-sourcing of the Wenxin large model 4.5 follows the positive cycle logic of "technology-user-data."

The value of the ecosystem lies in connection rather than control.

By open-sourcing, Baidu hands over the usage rights of the model to developers, and the applications developed based on the model will generate new data feedback, which will in turn benefit the model's iteration.

For example, when retail companies develop product image recognition tools, the accumulated industry data can help the model optimize its capture of product features; the use by educational institutions may enhance the model's understanding of teaching scenarios. This distributed optimization process is more efficient than closed-door research by a single company.

Consideration: Balancing Sharing and Sustainability

The "dual-layer open-source" of the PaddlePaddle platform and Wenxin model further strengthens the synergy of the ecosystem.

PaddlePaddle serves as the underlying framework, providing the operating environment for the model; the Wenxin model, as the upper application carrier, enriches the usage scenarios of the framework.

This structure aligns with the "endosymbiotic theory" proposed by American biologist Lynn Margulis: different components form a more powerful whole through mutual symbiosis.

When developers debug the Wenxin model on PaddlePaddle, they are not only using tools but also participating in the collaborative optimization of the two systems. This deep binding enhances ecosystem stickiness more than mere technical output.

However, open source does not mean unrestricted free use.

The Wenxin large model 4.5 adopts the Apache 2.0 license, which allows commercial use while requiring the retention of original author information. This institutional design balances sharing and rights protection.

From a practical perspective, clear delineation of property rights is a prerequisite for collaboration. Clear license terms let developers understand what they can and cannot do, avoiding legal risks when applying the technology. They also leave room for Baidu's commercial monetization: achieving sustainable operations through cloud platform API services, value-added tools, and other means.

From a cost perspective, open source is a "distributed R&D" strategy. The training and iteration of large models require continuous investment in computing power and manpower, which a single enterprise finds difficult to bear entirely.

What is good management? It's simple: letting the right people do the right things.

After Baidu open-sourced the model, the wisdom of global developers was incorporated into the innovation system, with some optimizing inference speed and others expanding application scenarios. This division of labor allows each participant to focus on their areas of expertise, indirectly reducing overall R&D costs.

For the industry, Baidu's open-source model provides a path of "differentiated innovation based on standardization."

The unification of basic models reduces the waste of redundant R&D, while secondary innovations by developers can meet the personalized needs of different industries.

For example, the manufacturing industry focuses on the model's understanding of industrial blueprints, while the media industry pays more attention to the fluency of text generation. This is a model of "common technology + personalized application," where basic technology is the gene and industry applications are its phenotypes in different environments, enriching the ecological diversity of technology.

The open-sourcing of the Wenxin large model 4.5 provides a reference development paradigm for the domestic large model industry.

Unlike the black-box operations of closed-source models, open source makes technological capabilities tangible and verifiable. Developers can directly view model weights and inference code, and when the model makes decisions, users can trace its logical chain rather than passively accept the results.

From a global perspective, this open-source initiative is also an attempt for domestic large models to participate in international collaboration.

Currently, there are various development paths in the global large model field: some insist on closed-source commercialization, some choose partial open source, and others are completely open.

The full open-sourcing of the Wenxin large model 4.5 is equivalent to handing a technical business card to global developers. Its open stance helps domestic technology integrate into the global innovation network and find its positioning through international feedback.

Of course, open source is not a universal key. The performance of the model still needs to be tested in practical applications: whether it can accurately identify small defects in industrial quality inspection, whether it can understand complex livelihood demands in government services, and whether it can align with teaching rules in educational assistance. These real-world tests are more convincing than laboratory evaluation data.

The significance of Baidu open-sourcing the Wenxin large model 4.5 may not lie in the current technological breakthroughs, but in the development philosophy it demonstrates: building consensus through openness and solving problems through collaboration.

As more and more developers get involved and the model is deployed in more industry scenarios, large model technology can truly step out of the laboratory and become a practical tool for promoting social progress. There are no shortcuts in this process, however.