Humanoid Robots - Waiting for the "Scaling Law" Moment

Wallstreetcn
2025.09.03 00:20

At the 2025 autumn strategy meeting, the "Scaling Law" moment for humanoid robots and its prospects for industrial application were discussed. The robotics industry is in the early stage of trend investment, facing the challenges of high hardware costs and insufficient intelligence. Going forward, attention should be paid to the emergence of intelligence in robot brains, which could accelerate industry development. The mainstream engineering path is the large-and-small-brain approach, combining pre-trained large models with lightweight control small models to work around current technological limitations. There is growing emphasis in China on developing embodied intelligent large models, with industry participants actively building software and hardware ecosystem platforms.

Core Viewpoints

On August 27-28, we organized the 2025 Autumn Strategy Meeting. At the Humanoid Robot Forum we discussed the "Scaling Law" moment for humanoid robots, the application prospects and solutions for robot bodies in industrial scenarios, and the necessity of an open robot platform.

Key Highlights:

1. Currently, robots are in the early stage of industrial trend investment. We believe that the initial number of orders does not constitute a key signal; the core issue is whether the core bottlenecks of humanoid robots can be solved: 1) high hardware costs, complex structures, and lack of standardization; 2) insufficient intelligence in the brain. On the hardware side, with domestic manufacturing enterprises entering at scale and the subsequent release of Tesla's Optimus 3, hardware costs are expected to decline non-linearly and standardization is expected to break through rapidly. On the software side, the current model paradigm is converging towards a dual-system layered VLA, but the "ChatGPT" moment for robot brains has yet to emerge. We believe future attention should focus on the intelligent emergence of the robot brain's "Scaling Law," which is expected to genuinely drive the positive flywheel of humanoid robots and accelerate the industry non-linearly. If robots demonstrate sustained demand in multiple vertical scenarios, market confidence in and recognition of the long-term market space is likely to strengthen, breaking free from the investment paradigm of "marginal changes" + "million-unit terminal valuation method."

2. The big-and-small-brain approach is currently the mainstream path for implementing large robot models. We believe that among the several robot model routes: 1) Non-end-to-end modular models capture vertical scenarios with clear task chains at low cost, but their rigid rules make them difficult to generalize. 2) End-to-end VLA relies on massive data and has the highest performance ceiling, but is constrained by training technology, hardware reserves, and real-time and controllability thresholds. 3) The big-and-small-brain approach, which uses pre-trained large models as the "thinking" system and lightweight control small models to complete the "reflex" from thought to action, is the most balanced engineering path given current limitations in computing power, task success rates, data efficiency, real-time performance, and interpretability. Domestically, focus on the development of embodied intelligent large models is increasing, with important industry participants including companies focused on developing and iterating embodied intelligent model paradigms (robot body companies and those specializing in embodied intelligent large models), as well as platform companies leading the creation of a software and hardware ecosystem for the robotics industry. Domestic embodied large model enterprises are gradually gaining favor in financing.

3. We believe that the landing scenarios for robots will first be research, education, guided tours, and demonstration performances in ToG settings. Currently, leading humanoid robot manufacturers can perform relatively simple, repetitive labor in ToB industrial manufacturing scenarios. As the industry's generalization capability improves, B-end scenarios become the first stop in the deep-water zone of robot commercialization. Taking garment manufacturing as an example, there are about 60 million garment sewing workers globally, and the industry faces recruitment difficulties due to working hours and wages. In the past, industrial robots were rarely applied in garment manufacturing because the flexibility of garment fabrics, non-standard processes, and rapid style updates were difficult to match with traditional automation programming models. In recent years, the rapid development of large models and end-to-end architectures has removed the need for explicit programming, making the subsequent replacement of many non-standard labor tasks possible.

Main Text

Core Viewpoint: In the "2025 Mid-term Strategy Meeting Briefing - Humanoid Robot Forum: Industrialization Enters Deep Water" released on June 6, 2025, we conducted an in-depth review of the humanoid robot market since 2022. We found that as the pace of industrial progress continues to accelerate, the market has deeply recognized the long-term potential of humanoid robots. Since Tesla entered the humanoid robot sector in 2022, the market has experienced several fluctuations, all driven by the announcements and updates of leading robots. Starting from Q4 2024, as Tesla and domestic robot companies begin preliminary mass production, the market has already priced in expectations for a surge in penetration rates, combined with significant anticipatory effects, propelling the market quickly beyond the pure thematic phase. We currently position this as an early stage of industrial trend investment.

The shift from an early pure thematic market to a trend market is based on the underlying logic that accelerating industrial progress has strengthened market confidence in and recognition of the long-term market potential for humanoid robots. From late 2024 to early 2025, the core industrial driver of the robot market has been the industry's entry into the actual small-batch production phase. However, because robot brains still lack significant intelligent capability, initial mass-production demand primarily stems from exploratory purposes such as application trials and testing, and the sustainability of order demand remains to be observed.

From an industrial trend perspective, the current bottlenecks for humanoid robots are: 1) high cost and lack of standardization of hardware; 2) insufficient intelligence of the brain. We believe that as Chinese industrial chain companies begin large-scale layouts in the humanoid robot sector this year, through investments, mergers and acquisitions, and other business-expansion methods, the entry of Chinese manufacturing enterprises is expected to lead to a non-linear decrease in hardware costs, making the hardware bottleneck less of a core issue. More importantly, with AI empowering the innovation of large robot models, the brain is expected to achieve intelligent generalization following the "Scaling Law" paradigm of AI, potentially accelerating the industrial trend. We believe that if the robot market is to replicate the historical industrial investment trends of new energy vehicles, smartphones, and other emerging intelligent terminals, an initial signal may be the formation of relatively mature hardware solutions that begin to land in simple industrial scenarios and special application scenarios (with preliminary generalization capability), which may emerge within the next two years. Key attention should be paid to the progress of the robot brain "Scaling Law" driven by foreign companies like Tesla and Figure, as well as leading domestic enterprises.

The "Scaling Law" of Robots May Open a New Wave in the Industry

As AI enters the reasoning era, the emergence of large models with capabilities such as chains of thought is expected to initiate a new round of transformation and innovation cycles for edge products. Among various edge products, compared to speakers, glasses, cameras, smartphones, and PCs, robots not only require empowerment from large language models but also need autonomous mobility and action capabilities as embodied intelligent carriers. However, because humanoid robot hardware structures are new, complex, and not yet finalized, the innovation cycle required for the AI large model transformation is relatively longer.

Bottleneck 1: Hardware solutions are costly, have not converged, and lack standards. Currently, the BOM cost of Tesla's robots remains high. According to Tesla's AI Day, the goal is to reduce the cost of a Tesla robot to $20,000 per unit in the future. Key areas for cost reduction include joint modules, dexterous hands, and six-dimensional force sensors. Different humanoid robot manufacturers have different solutions for joint actuators, dexterous hands, and sensors, such as planetary roller screw linear joint solutions, micro screw/linkage/rope-driven hand solutions, axial flux/frameless torque motors, and reducers, which have become points of technological differentiation.

Bottleneck 2: Software lacks strong model representation capability + high-quality large-scale data. Software requires strong model representation capability plus large-scale, high-quality data (efficiently collected real-world data that is useful for algorithm models). Brain generalization relies on data, while cerebellum control relies on coupling with the hardware, and the data modalities for robot locomotion and manipulation are more complex, requiring a complete redefinition of data and long-term, large-scale collection in real environments. Before large models, task definition, decomposition, and motion code generation were done by engineers. With large models, a perception and decision-making model breaks complex tasks down into a series of action instructions that are executed one by one by operation models; cerebellum algorithms are primarily based on model predictive control (MPC) and lower-level whole-body control (WBC) built on dynamics models. After large models, the trend has shifted towards reinforcement learning + imitation learning in simulated/real environments. Software iteration lacks high-quality, low-cost, large-scale datasets, and the challenges in data collection include high costs, difficulty generalizing data, lack of dedicated scene data, and the absence of unified data standards.
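To make the layered flow above concrete, here is a minimal sketch in Python of the "brain decomposes, cerebellum executes" pattern described in this section. All names (BrainModel, CerebellumController, ActionInstruction) are hypothetical illustrations, not any vendor's API; a real stack would call a multimodal model and an MPC/WBC controller where the stubs are.

```python
# A minimal sketch (hypothetical interfaces) of the layered control flow:
# a perception/decision "brain" model decomposes a natural-language task into
# action instructions, and a "cerebellum" controller (MPC + whole-body control
# in many stacks) turns each instruction into commands at a high control rate.

from dataclasses import dataclass
from typing import List


@dataclass
class ActionInstruction:
    skill: str      # e.g. "grasp", "place", "walk_to"
    target: str     # object or location referred to in the instruction
    params: dict    # skill-specific parameters (grip force, pose, ...)


class BrainModel:
    """Stand-in for a perception/decision large model (VLM/LLM)."""

    def decompose(self, task: str, camera_frames: list) -> List[ActionInstruction]:
        # In a real system this is a multimodal model call; here a fixed
        # decomposition only illustrates the interface.
        return [
            ActionInstruction("walk_to", "workbench", {}),
            ActionInstruction("grasp", "single_layer_fabric", {"grip_force": 2.0}),
            ActionInstruction("place", "sewing_machine_feed", {"align": True}),
        ]


class CerebellumController:
    """Stand-in for MPC + whole-body control running at a fixed high rate."""

    def execute(self, instruction: ActionInstruction, rate_hz: int = 200) -> bool:
        # A real controller would solve for joint torques every 1/rate_hz s;
        # here we only report that the instruction was handled.
        print(f"[{rate_hz} Hz loop] executing {instruction.skill} -> {instruction.target}")
        return True


def run_task(task: str, frames: list) -> None:
    brain, cerebellum = BrainModel(), CerebellumController()
    for step in brain.decompose(task, frames):   # slow, task-level reasoning
        cerebellum.execute(step)                 # fast, low-level control


run_task("put the cut fabric onto the sewing machine feed", frames=[])
```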

It is not hard for humanoid robots to generate revenue and ship units; the key challenge lies in mass production and large-scale practical application. By 2025, leading humanoid robots, represented by Tesla, have achieved small-batch production and initial commercialization in specific scenarios. Several domestic humanoid robot companies have announced deliveries of hundreds to thousands of units. However, a closer look at the delivery scenarios shows that, apart from a few leading companies, relatively few companies globally have truly achieved a commercial closed loop for bipedal humanoid robots. Most companies leading in commercial delivery are engaged in small-volume strategic cooperation, data collection, and demonstration performance scenarios; while short-term revenue may be considerable, the sustainability of orders remains to be observed. Additionally, in current high-shipment scenarios such as data collection, scientific research, and demonstrations, demand is mostly for research and scene training with low generalization requirements: buyers develop algorithms on top of the hardware, while the hardware companies focus on selling hardware without achieving technological breakthroughs at the software level.

Therefore, we believe that the number of delivery orders in the early stage of the industry is not a key indicator; the core question is whether a correct model paradigm and data flywheel can be initially formed. We believe the positive flywheel of humanoid robots should be: initial generalization of the brain → opening up mass-production scenarios → hardware costs falling with scale → increasing data collection volume → stronger model training → the "Scaling Law" manifesting and bringing more intelligence to the brain → further opening up demand. Currently, we observe that leading humanoid robot companies are, on one hand, starting to implement VLA large models and, on the other hand, exploring real data collection and model training with industrial manufacturers that have relatively high requirements for precision and operational capability.

From an industrial trend perspective, American companies like Tesla and Figure, along with leading domestic enterprises, are guiding the innovation direction of large robot models, and the intelligence of the brain is expected to achieve nonlinear acceleration with the "Scaling Law" paradigm of AI. Chinese industrial chain companies are making large-scale layouts in the humanoid robot track, expanding their business through various means such as investment and mergers and acquisitions, and the entry of domestic manufacturing enterprises is expected to bring nonlinear reductions in hardware costs. With the acceleration of software and hardware iterations, humanoid robots are expected to officially kick off the acceleration of industrial trends, similar to the early application stages of smartphones and new energy vehicles. As functions gradually improve and demand is stimulated, there is hope for nonlinear growth in demand in the coming years.

From modularization to end-to-end VLA, large robot models may converge.

Large robot models are developing along the path from modularization to end-to-end, and the industry may converge on VLA. With the advancement of large language models (LLMs) and multimodal large language models (MLLMs), using their capabilities to achieve task planning and motion control for robots has become more feasible. Reviewing the development of large robot models in academia and industry, we believe they can be divided into three main technical routes: non-end-to-end modular models, end-to-end VLA models, and dual-system hierarchical VLA models.

Dual-system hierarchical VLA model: the mainstream choice in the industry.

The dual-system hierarchical VLA model may be the industry's current preferred architecture, with Figure's Helix as a typical representative. The dual-system hierarchical VLA model still falls within the VLA category, adopting a heterogeneous module architecture (the large model corresponds to the brain, the small model to the cerebellum) to combine the cognitive capabilities of the large model with the real-time control capabilities of the small model. Figure, IM Motors, NVIDIA, and Google have all made progress on VLA models combining large and small brains, promoting industry implementation.

Figure's Helix VLA consists of two systems, fast and slow, similar to the human cerebrum and cerebellum. In February 2025, Figure released Helix, the first VLA to enable high-speed continuous control of a humanoid's entire upper body (including wrists, torso, head, and individual fingers), demonstrating good generalization and supporting edge-side operation. The VLM backbone is general but not fast, while robotic visuomotor policies are fast but not general; Helix addresses this trade-off through two complementary systems (a schematic sketch of the data flow follows the two items below). Helix is trained fully end-to-end, mapping from raw pixels and text commands to continuous actions with a standard regression loss, requiring only a single training phase and a single set of neural network weights.

1) The slow system, System 2 (S2), is an edge-side 7B-parameter VLM pre-trained on internet data. It runs at 7-9 Hz for scene understanding and language comprehension, achieving broad generalization across objects and contexts. S2 can use an open-source VLM pre-trained on internet-scale data, processing robot camera images and state information (including wrist pose and finger positions) projected into the vision-language embedding space. The VLM processes segmented video clips from onboard robot cameras together with the prompt: "What instructions would you give the robot to make the actions in this video occur?" Combined with natural language commands specifying the desired behavior, S2 distills all semantically relevant information into a continuous latent vector, which is passed to S1 to condition its low-level actions.

2) The fast system, also known as System 1 (S1), is an 80M parameter cross-attention encoder-decoder Transformer designed for low-level control, with pre-training conducted entirely in a simulation environment. S1 essentially represents a fast-reactive visuomotor strategy. The latent vector from S2 is projected into S1's token space and concatenated with visual features from S1's visual backbone along the sequence dimension, providing task modulation. S1 outputs complete upper body humanoid control at a frequency of 200Hz, including the desired wrist posture, finger flexion and abduction control, as well as torso and head direction targets.
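The following is a schematic sketch, in Python/PyTorch, of the fast/slow data flow just described: a slow system produces a semantic latent at a low rate, and a fast policy consumes that latent plus fresh visual features at a high rate to output continuous upper-body targets. Module sizes, feature dimensions, and names are illustrative assumptions only; this is not Figure's Helix implementation.

```python
# A schematic sketch of the dual-system (fast/slow) data flow described above.
# Shapes and module choices are illustrative stand-ins for the S2 -> latent -> S1 pattern.

import torch
import torch.nn as nn


class SlowSystemS2(nn.Module):
    """Stand-in for the pre-trained VLM running at ~7-9 Hz."""

    def __init__(self, latent_dim: int = 512):
        super().__init__()
        # A real S2 is a 7B-parameter VLM; a single linear layer stands in
        # for "image + language -> semantic latent" here.
        self.encode = nn.Linear(1024, latent_dim)

    def forward(self, image_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        return self.encode(torch.cat([image_feat, text_feat], dim=-1))


class FastSystemS1(nn.Module):
    """Stand-in for the small (~80M-parameter) visuomotor policy running at ~200 Hz."""

    def __init__(self, latent_dim: int = 512, action_dim: int = 35):
        super().__init__()
        self.latent_proj = nn.Linear(latent_dim, 256)   # project S2 latent into S1's space
        self.visual_proj = nn.Linear(512, 256)          # onboard-camera visual features
        self.policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, action_dim))

    def forward(self, s2_latent: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.latent_proj(s2_latent), self.visual_proj(visual_feat)], dim=-1)
        return self.policy(fused)   # continuous upper-body targets (wrists, fingers, torso, head)


s2, s1 = SlowSystemS2(), FastSystemS1()
latent = s2(torch.randn(1, 512), torch.randn(1, 512))   # updated at the slow (~7-9 Hz) rate
for _ in range(3):                                       # S1 reuses the latest latent at ~200 Hz
    action = s1(latent, torch.randn(1, 512))
print(action.shape)   # torch.Size([1, 35])
```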

The large-and-small-brain approach is currently the mainstream path for engineering large robot models, while end-to-end VLA is the vision for general robotic AGI. We believe that non-end-to-end modular models can conquer vertical scenarios with clear task chains at low cost, but their rigid rules make it hard to generalize to open tasks. End-to-end VLA relies on massive data and has the highest performance ceiling, but is constrained by training technology, hardware reserves, and real-time and controllability thresholds. In comparison, the large-and-small-brain approach uses pre-trained large models as the "thinking" system, with lightweight control small models completing the "reflex" from thought to action. It balances task success rates, data efficiency, and real-time requirements under limited data and computing power, while retaining interpretable interfaces, making it the most balanced engineering path at present. If computing chip efficiency and power consumption continue to improve, low-cost large-scale robot data generation is realized, and breakthroughs in large model interpretability are achieved, then from first principles end-to-end VLA may still be the optimal choice, maximizing cross-scenario generalization, with the large-and-small-brain approach acting as a bridge guiding the industry through a steady transition.

The industry is beginning to intensify its focus on the development of embodied intelligent large models.

Domestic capital is starting to shift from hardware bodies to embodied intelligent large models. As the core of general robot technology, embodied intelligent algorithms, i.e., large models, were initially emphasized less domestically than abroad. Hardware manufacturers were favored by capital and took the vast majority of financing in the embodied intelligence track, while many tech giants launched non-embodied large models with limited investment in embodied intelligent large models. Foreign tech giants place a high emphasis on embodied intelligent large models and entered the market early (e.g., Google and NVIDIA; Google has completed multiple technical iterations from SayCan to RT-H). Startups focused on embodied intelligent large models have attracted significant capital: Skild AI completed $300 million in financing in July 2024 at a post-investment valuation of $1.5 billion; Physical Intelligence completed $70 million in financing within a month of its establishment and $400 million by November 2024, at a post-investment valuation of approximately $2.4 billion; and executives from Covariant have been hired by Amazon since August 2024 (a "talent acquisition"). Domestic companies in the embodied large model track are experiencing a wave of financing, with several embodied intelligent large model startups, such as Qianxun Intelligent and Qiongche Intelligent, securing hundreds of millions of RMB in financing since the second half of 2024.

In addition to companies focused on developing and iterating embodied intelligent model paradigms (including robot body companies and those focused on embodied intelligent large model development), we believe important participants on the software side will also include companies that build platform capabilities. The high threshold for robot development, difficulty in model selection, multi-robot collaborative scheduling, and software usability are common barriers in software development and engineering implementation. Typical challenges include: 1) visual perception components come in many variants, involving different laser/visual SLAM algorithms; 2) many application scenarios lack deployment data and ecosystem tools. Some companies, such as XianGong Intelligent, are developing a "robot brain" platform using robot controllers as an entry point, collaborating with multiple downstream hardware and component manufacturers to build a development platform for embodied intelligent large models, saving time on repetitive work and improving the industry's development efficiency.

Commercialization: Diverse Application Scenarios Gradually Emerging

We believe that the initial landing scenarios for robots will be in research, education, guided tours, and performance displays in ToG scenarios. The mid-term landing in ToB scenarios is the first stop for the commercialization of bipedal robots, while the long-term landing in ToC scenarios for commercialization represents a large market space with high non-standardization, which may be the ultimate market for humanoid robots. Based on the difficulty of landing and market scale, the order is To C > To B > To G.

① ToG: The landing difficulty in ToG scenarios such as research institutions is relatively low. Research institutions purchase robots mainly for research and scenario training, with low requirements for generalization capability; manufacturers do not need to achieve technological breakthroughs at the software level and can quickly land a small number of delivery orders. This has become the priority scenario for many startups (on the strength of first-mover and production capacity advantages, Yushu Technology's Unitree H1 has become the preferred product for global research institutions and AI companies, shipping globally in 2H24). Robot prices are also falling continuously, for example the Zhongqing SA01 at 42,000 yuan, the Songyan Power N2 at 39,900 yuan, and the Unitree R1 at 39,900 yuan.

② ToB: Currently, leading humanoid robot manufacturers can perform relatively simple, repetitive tasks in ToB industrial manufacturing scenarios. Tasks in these scenarios are relatively fixed and the scenarios are semi-open, requiring robots to have a certain level of generalization capability. Agility Robotics, which has taken an early lead in commercialization, has its Digit performing factory tasks such as picking up totes from AMRs and placing them on conveyor belts. We believe that as the industry's generalization capability improves, structured B-end scenarios such as textiles, industrial manufacturing, automotive intelligent manufacturing, warehousing logistics, and security inspection may become the first stop for robot commercialization.

③ ToC: ToC has higher generalization requirements for humanoid robots. This scenario has many interference factors and is complex, with different groups having varying adaptability requirements for robots, thus requiring higher generalization capabilities for model training.

Task execution is shifting from standardized to non-standardized, and commercialization opportunities are moving from vertical scenarios to semi-general scenarios; the B-end is expected to become the first stop for deep commercialization. First, on the demand side, if algorithm planning, multi-modal perception, and task scheduling capabilities accumulate gradually, robots will keep expanding their non-standard task capabilities, and the urgent replacement demand on the B-end may be released first, forming the foundation for early industry volume. Second, on the cost side, as manufacturers' potential demand is released and orders flow down the supply chain, the scale effect of hardware manufacturing can further push down robot manufacturing costs, thereby increasing robot penetration. We believe that around 2030, B-end applications are expected to enter production processes such as assembly, sorting, quality inspection, and flexible handling, while the C-end is expected to first land in clearly defined, high-frequency urgent-need scenarios such as safety care, nursing assistance, and household collaboration. By around 2035, B-end robots are expected to form flexible production-line collaborative systems with AGVs, robotic arms, and other automation systems; on the C-end, robots are expected to gradually enter complex household environments, and some high-risk operational scenarios will also move to fully robotic workflows.

Scenario: The application of humanoid robots in garment manufacturing has great potential, with leading enterprises having clear product planning.

Global annual labor expenditure in garment manufacturing reaches the trillion-RMB level, and AI development makes machine replacement possible. According to the Sewing Machinery Association, over the past 8 years China's domestic and export demand for industrial sewing machines totaled about 57 million units, indicating that the global installed base of industrial sewing machines is nearly 60 million units. Assuming a human-machine ratio of 1:1, this corresponds to approximately 60 million sewing workers worldwide. Based on an estimated annual salary of 30,000 to 40,000 RMB per person, annual labor expenditure in garment manufacturing is at the trillion-RMB level, while the market for industrial sewing machine equipment is only in the tens of billions of RMB, leaving significant room for machine replacement. Although the labor pool is large, industrial robots have seen little application in garment manufacturing because the flexibility of garment fabrics, non-standard processes, and rapid style updates are difficult to match with traditional automation programming. In recent years, the rapid development of large models and end-to-end architectures has removed the need for explicit programming, making the replacement of many non-standard labor tasks possible.
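As a quick check of the arithmetic above, the sketch below reproduces the back-of-the-envelope estimate using the rounded inputs stated in the text (machine stock, 1:1 human-machine ratio, 30,000-40,000 RMB annual wage); these are the article's assumptions, not an independent estimate.

```python
# Back-of-the-envelope reproduction of the garment-labor estimate above
# (assumed, rounded inputs taken from the text).

installed_sewing_machines = 57e6          # ~57M industrial sewing machines over 8 years
human_machine_ratio = 1.0                 # one worker per machine, as assumed in the text
workers = installed_sewing_machines * human_machine_ratio

annual_wage_rmb = (30_000 + 40_000) / 2   # midpoint of 30k-40k RMB per worker per year
annual_labor_cost_rmb = workers * annual_wage_rmb

print(f"workers ≈ {workers / 1e6:.0f} million")
print(f"annual labor cost ≈ {annual_labor_cost_rmb / 1e12:.1f} trillion RMB")
# -> workers ≈ 57 million; annual labor cost ≈ 2.0 trillion RMB (trillion-RMB level)
```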

Humanoid robots and traditional automation combine organically, leading toward unmanned garment manufacturing. Compared with traditional industrial robots, AI has given humanoid robots a certain degree of generalization ability, but there are still last-mile limitations in precision control and success rates. Taking Jack Technology's layout in unmanned garment manufacturing as an example: for complex A/B-class processes in garment making, such as pocket attachment, the company first reduces the skill required through automated sewing units, template machines, and other automation products, while improving the flexibility of template technology to broaden the usage scenarios of template machines, and ultimately uses humanoid robot products to perform the remaining loading and unloading tasks outside of sewing. Currently, the company's humanoid robot gripper can accurately separate a single layer of fabric from multiple layers, solving the fabric grabbing problem. The company's self-developed humanoid robot has completed prototype development, and the company plans to accelerate its mass application in the garment industry.

Article authors: Xie Chunsheng et al. Source: Huatai RuiSi. Original title: "Huatai | Joint Research: Humanoid Robots - Waiting for the 'Scaling Law' Moment." Content slightly abridged.

Risk warning and disclaimer: The market carries risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investment decisions based on this article are at the user's own risk.