Track Hyper | Into practice: SenseTime launches Wuneng embodied intelligence platform

Wallstreetcn
2025.08.02 09:44

Its core is an embodied world model; its means are multimodal interaction

Author: Zhou Yuan / Wall Street News

On July 27, at the large model forum of the 2025 World Artificial Intelligence Conference (WAIC), SenseTime launched the "Wuneng" embodied intelligence platform. The platform uses SenseTime's embodied world model as its core engine and draws on edge and cloud computing power from SenseCore, SenseTime's large-scale AI infrastructure (its "large device"), to give robots and smart devices perception, visual navigation, and multimodal interaction capabilities. It can also be embedded in edge-side chips and in terminal hardware such as robots.

This is a concrete step by SenseTime in embodied intelligence and offers a new technical option for the development of smart devices.

The core engine of the "Wuneng" platform, the embodied world model, is a complex dynamic system. Rather than a static replica of the physical world, it is a digital mirror that continuously learns from and integrates massive amounts of data so that it reflects changes in the physical world in real time, acting as a kind of digital entry point into the physical world.

This data spans many kinds of information, including the spatial structure of physical environments, the physical properties of objects, the patterns by which events occur, and patterns of human behavior.

The operational logic of SenseTime's embodied world model is similar to the human cognitive process of understanding the world.

Humans take in information through sense organs such as the eyes, ears, and nose, and the brain turns it into an understanding of the world that guides action. In the same way, the embodied world model collects environmental data through sensors and other devices and processes it with algorithms into a "cognition" of the world, which gives smart devices a basis for their decisions.

This model can continuously update its "cognition" based on new input data, just as humans adjust their views of the world after experiencing new things.
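The idea of a model that revises its "cognition" as new data arrives can be illustrated with a standard occupancy-grid update. This is a minimal Python sketch under stated assumptions, not SenseTime's implementation: the one-dimensional map, the 0.7/0.3 sensor model, and the names `update` and `probability` are all hypothetical. Each cell stores the log-odds that it is occupied, and each new observation is folded in by Bayes' rule, so repeated sightings strengthen a belief and contrary evidence weakens it.

```python
import math

# Hypothetical sensor model (assumed values): a "hit" means the sensor saw
# something in the cell, a "miss" means it saw the cell as empty.
L_OCC = math.log(0.7 / 0.3)   # log-odds increment for a hit
L_FREE = math.log(0.3 / 0.7)  # log-odds increment for a miss

def update(log_odds, observations):
    """Fold new (cell, hit) observations into the stored belief.

    Bayes' rule in log-odds form: each observation just adds a constant.
    Returns a new list and leaves the input unchanged.
    """
    new = list(log_odds)
    for cell, hit in observations:
        new[cell] += L_OCC if hit else L_FREE
    return new

def probability(l):
    """Convert log-odds back into a probability that the cell is occupied."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

belief = [0.0] * 5                               # prior: every cell 50% occupied
belief = update(belief, [(2, True), (3, False)])  # first scan
belief = update(belief, [(2, True)])              # second sighting of cell 2
print([round(probability(l), 2) for l in belief])  # [0.5, 0.5, 0.84, 0.3, 0.5]
```

Storing log-odds rather than probabilities turns each Bayesian update into a simple addition, which is one reason this representation is common in robot mapping.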

Backed by edge and cloud computing power from SenseTime's SenseCore infrastructure, the "Wuneng" embodied intelligence platform provides robots and smart devices with basic capabilities such as perception, visual navigation, and multimodal interaction.

At the perception layer, the platform integrates various sensor data and analyzes environmental information using the embodied world model.

For example, in a home, a robot equipped with the "Wuneng" embodied intelligence platform can recognize the furniture layout and family members and sense temperature and humidity; in an office, it can also distinguish office equipment and documents.

This perception is still strongly affected by environmental factors such as lighting and occlusion. The platform's perception improves gradually as devices keep interacting with their environment, and under normal conditions it outputs environmental information reliably.
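One common way to combine readings from several sensors, and to let a degraded reading (for example, a camera in poor light or partly occluded) count for less, is inverse-variance weighting. The Python sketch below assumes each sensor reports a distance together with a variance; the sensor names, the numbers, and the function `fuse` are invented for illustration, and nothing here describes how the Wuneng platform actually fuses its data.

```python
# Hypothetical sketch: inverse-variance weighted fusion of sensor readings.
# A reading degraded by lighting or occlusion is modelled simply as having a
# larger variance, so it receives a smaller weight.
def fuse(readings):
    """readings: list of (value, variance) pairs.

    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    return value, 1.0 / total

camera_ok = (2.0, 0.04)        # assumed: camera distance estimate in good light (m, m^2)
camera_occluded = (1.0, 1.0)   # assumed: same camera, partly occluded
depth_sensor = (2.2, 0.01)     # assumed: depth sensor estimate

print(fuse([camera_ok, depth_sensor]))
print(fuse([camera_occluded, depth_sensor]))
```

With the camera in good light, the fused estimate lies between the two readings; when the camera's variance is raised to model occlusion, the result moves close to the depth sensor's value. The fused variance is always smaller than the smallest input variance, which reflects the gain from combining sensors.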

The visual navigation function mainly addresses autonomous movement. The platform uses the embodied world model to analyze the surrounding space and plan obstacle-avoiding paths for robots.

In structured environments such as warehouses, logistics robots can use this platform to complete cargo transfer; in indoor corridor scenarios, service robots can travel along designated routes, achieving precise point-to-point movement within preset scenes.
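Obstacle-avoiding path planning of the kind described above is often illustrated with A* search on a grid map. The Python sketch below is a generic textbook planner, not the platform's navigation algorithm; the grid layout, the `#` obstacle symbol, four-way movement, and the function name `plan` are assumptions made for the example.

```python
import heapq

def plan(grid, start, goal):
    """A* search on a grid of strings ('#' = obstacle, '.' = free).

    start and goal are (row, col) tuples. Returns the list of cells from
    start to goal, or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: never overestimates for 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # (estimated total cost, cost so far, cell)
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            # Walk back through parents to rebuild the path.
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > cost[cur]:
            continue  # stale entry: a cheaper route to cur was found later
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                new_cost = g + 1
                if new_cost < cost.get(nxt, float('inf')):
                    cost[nxt] = new_cost
                    came_from[nxt] = cur
                    heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None

# Hypothetical warehouse aisle map with a shelf block in the middle.
warehouse = [
    "....",
    ".##.",
    "....",
]
print(plan(warehouse, (0, 0), (2, 3)))
```

Because the heuristic never overestimates the remaining cost, A* returns a shortest path; if the goal is walled off, the function returns `None`.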

Multimodal interaction supports both voice and visual methods. Voice can convey basic commands, while visuals can recognize simple gestures and expressions to assist in understanding user intentions.

For example, a smart speaker running the platform can respond to voice commands and adjust its volume in response to gestures, covering users' routine operations in everyday use.
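How a voice command and a gesture might be merged into device actions can be sketched in a few lines of Python. The command words, gesture labels such as `palm_up`, the volume step of 10, and the function `handle` are all hypothetical; the sketch only shows the general pattern of routing each modality to the part of the device state it controls.

```python
# Hypothetical labels and step size, chosen for illustration only.
VOLUME_STEP = 10

def handle(state, voice=None, gesture=None):
    """Return a new speaker state after applying a voice command and/or a gesture."""
    new = dict(state)
    # Voice channel: basic playback commands.
    if voice == "play":
        new["playing"] = True
    elif voice == "stop":
        new["playing"] = False
    # Visual channel: simple gestures nudge the volume.
    if gesture == "palm_up":
        new["volume"] += VOLUME_STEP
    elif gesture == "palm_down":
        new["volume"] -= VOLUME_STEP
    new["volume"] = max(0, min(100, new["volume"]))  # keep volume within 0..100
    return new

speaker = {"playing": False, "volume": 50}
speaker = handle(speaker, voice="play")
speaker = handle(speaker, gesture="palm_up")
print(speaker)  # {'playing': True, 'volume': 60}
```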

The "Wuneng" embodied intelligence platform adapts flexibly to hardware and can be applied to humanoid robots, service robots, and some smart devices. This adaptability lets the platform be trialed in different scenarios and gives hardware manufacturers technical integration options.

The technology philosopher Lewis Mumford emphasized in "Technics and Civilization" that technology is not an external existence to human life, but is deeply embedded in and shapes every aspect of human life.

By adapting to different hardware, the "Wuneng" embodied intelligence platform takes effect in real-world scenarios, where it can fundamentally change or reshape human life.

In practical terms, the platform's support for embedding in edge-side chips substantially increases its application value: it reduces reliance on cloud computing power, so devices keep their basic functions even when the network is unstable, and processing on the edge speeds up responses by cutting data-transmission latency.

Take smart home devices as an example: a smart lock with the platform embedded can run facial recognition faster on the device and needs to upload less data to the cloud, which lowers latency and gives a better experience and more stable operation in home security.
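The edge-first behaviour described above, answering on the device when possible and still working when the network drops, can be sketched as follows. This is an assumption-laden Python illustration rather than the platform's design: `local_model`, `cloud_model`, the 0.8 confidence threshold, and the `network_up` flag are hypothetical placeholders supplied by the caller.

```python
from typing import Any, Callable, Tuple

# Hypothetical: confidence above which the on-device answer is accepted.
CONFIDENCE_THRESHOLD = 0.8

def recognise(frame: Any,
              local_model: Callable[[Any], Tuple[str, float]],
              cloud_model: Callable[[Any], str],
              network_up: bool) -> Tuple[str, str]:
    """Return (label, where_it_was_computed), preferring on-device inference."""
    label, confidence = local_model(frame)
    if confidence >= CONFIDENCE_THRESHOLD or not network_up:
        # Confident enough, or offline: answer locally with no round trip.
        return label, "edge"
    try:
        # Low confidence and a working network: ask the cloud for a second opinion.
        return cloud_model(frame), "cloud"
    except ConnectionError:
        # Network failed mid-request: fall back to the local answer.
        return label, "edge"

# Usage with stand-in models (all hypothetical):
confident_local = lambda frame: ("resident", 0.95)
unsure_local = lambda frame: ("unknown", 0.55)
cloud = lambda frame: "visitor"

print(recognise(None, confident_local, cloud, network_up=True))  # ('resident', 'edge')
print(recognise(None, unsure_local, cloud, network_up=False))    # ('unknown', 'edge')
print(recognise(None, unsure_local, cloud, network_up=True))     # ('visitor', 'cloud')
```

Keeping the local answer as the default means the lock's basic function never depends on the network; the cloud is consulted only for uncertain cases.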

Application scenarios include home, office, and industrial fields: home robots can assist with cleaning, moving, and other simple chores; in office scenarios, smart devices assist with document classification and meeting room reservations; in industrial environments, robots participate in parts handling and basic quality inspection.

Xu Li, Chairman and CEO of SenseTime, demonstrated a humanoid robot equipped with the embodied world model presenting a slide deck on "Lychee of Chang'an": it spoke naturally and with humor, turned pages automatically, answered a range of questions, and summarized the content at intervals.

This demonstration intuitively presents the platform's interactive functions and reflects the current level of embodied intelligence.

In terms of technological development, the "Wuneng" platform has clear room for improvement in the comprehensiveness of its perception, the adaptability of its navigation, and the depth of its interaction. Going forward, SenseTime plans to collect application data and use it to iterate the embodied world model, improving the platform's stability and applicability.

In this process, industry collaboration is crucial: cooperation among companies working on embodied intelligence can effectively drive technological progress. Embodied intelligence is one direction of AI development, and its progress depends on accumulated technology and validation in real scenarios.

The "Wuneng" platform is one of SenseTime's practical efforts in this field, and its practical value will become clearer through subsequent applications.

For the industry, this exploration promotes the transition of embodied intelligence from concept to practice, providing more technical pathways for the development of smart devices.

From the user's perspective, the platform's value lies in the experience it delivers: whether robots and smart devices can solve practical problems while staying stable and reliable is the key measure of that value.

The existing functions of the "Wuneng" platform make it possible to meet these demands, and they will continue to improve to match user expectations.

In the process of technological implementation, cost control is a key aspect.

The integration cost of the platform and the manufacturing cost of devices affect the degree of popularization; SenseTime and its partners are exploring ways to reduce costs while ensuring functionality.

Overall, the "Wuneng" embodied intelligence platform is a concrete application of artificial intelligence in the embodied field. It has distinct technical characteristics and application potential, but it still faces practical issues such as technical refinement, scenario adaptation, and cost control.

The development of such platforms depends on the pace of technological iteration, market feedback, and the depth of industry collaboration, and their ultimate impact will take time to become clear.