
"American version of Yushu" Figure responds to the doubts about "robots entering BMW," claiming a "60-minute unedited video shows a performance surge in three months."

The "American version of Yushu" Figure claims that after just three months of logistics environment deployment, Helix's average package processing speed improved from 5.0 seconds to 4.05 seconds, an efficiency increase of nearly 20%, while also being able to handle complex package types such as deformable plastic bags and flat envelopes, very close to the efficiency of human operators
Recently, the partnership between Figure AI, dubbed the "American Unitree," and BMW has come under scrutiny, with reports suggesting that progress is falling short of expectations. Figure AI strongly denied these claims, and its co-founder and CEO Brett Adcock even publicly threatened to sue the media outlet involved.
However, Adcock's showing at the Bloomberg Technology Conference on June 6 stood in stark contrast to that of the company's competitors: while Agility Robotics and Boston Dynamics demonstrated their robots on site, Figure AI chose not to bring any.
When reporters pressed him about the lack of a live demonstration, Adcock's explanation seemed rather flimsy:
"Our philosophy is not to participate in many events; I think it's a huge waste of time. Frankly, I have to bring a team here to showcase the robots, and they could be working in the office."
Adcock added that the company is showcasing the robots through video.
On June 8, Figure AI responded by releasing a 60-minute unedited logistics-sorting video featuring the Helix robot, claiming that after just three months of deployment in a logistics environment, Helix's speed and dexterity have begun to approach human levels.
Figure Robot: Approaching Human Performance in Three Months?
Figure claims that its Helix robot has posted several impressive results for humanoid robotics, with particularly significant progress on logistics and manipulation tasks.
Helix's average package-handling time has dropped from 5.0 seconds to 4.05 seconds per package, an efficiency gain of nearly 20%, while the robot can also handle complex package types such as deformable plastic bags and flat envelopes, coming very close to the efficiency of human operators. Even more striking is the barcode scanning success rate, which soared from 70% to 95%, indicating that the robot is not only faster but also more accurate.
When Helix encounters crumpled plastic packaging, it first gently flattens the surface so that the barcode can be read completely. This behavior was learned end to end, directly from data, without any explicit programming.
Figure: Helix Features Advanced Perception and Control Architecture
Figure states that engineers have added three key modules to Helix, giving it short-term memory, motion-history awareness, and force-feedback capabilities (a rough code sketch of how such inputs might be combined follows the list):
- Visual Memory: Introduced a short-term visual memory module, allowing the robot to remember past visual information for more intelligent multi-step operations, eliminating redundant actions and improving task success rates.
- State History: By integrating historical data of the robot's recent states, it has achieved faster and more responsive control, allowing the robot to maintain coherence during operations and respond promptly to unexpected situations.
- Force Feedback: Integrated tactile sensing, enabling the robot to detect contact with objects and the environment for more precise grasping and manipulation, and enhancing the system's robustness to object variations.

Controlled experiments by Figure show that as Helix's training data grew from 10 hours to 60 hours, processing time fell from 6.84 seconds to 4.31 seconds per package and the scanning success rate rose from 88.2% to 94.4%, indicating that its learning-based approach scales well.
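To make these three inputs concrete, here is a minimal, hypothetical sketch of how they might be combined into a single observation vector for a visuomotor policy; all names and dimensions are assumptions for illustration, since Figure has not published Helix's actual interfaces.

```python
import numpy as np

# Hypothetical illustration only: Figure has not published Helix's interfaces,
# so every name and dimension below is an assumption made for this sketch.
visual_memory  = np.random.randn(8, 256).mean(axis=0)  # pooled features from the last 8 frames
state_history  = np.random.randn(16, 21).reshape(-1)   # last 16 proprioceptive states (hands/torso/head)
force_feedback = np.random.randn(6)                    # measured contact forces and torques

# The policy would consume one flat observation built from all three signals.
policy_observation = np.concatenate([visual_memory, state_history, force_feedback])
print(policy_observation.shape)  # (598,) = 256 + 16*21 + 6
```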
The company stated that Helix is steadily closing the gap between what learning-based robots can do and what real-world tasks demand. A future in which humanoid robots work alongside humans at comparable speed, efficiency, and flexibility is no longer science fiction but an imminent reality.
The following is the original text from the Figure AI official website:
"Expanding Helix: A New Breakthrough in Humanoid Logistics"
June 7, 2025
Just three months after we first deployed the Helix system in a logistics environment, the system's capabilities and performance have made significant leaps. The Helix system can now handle a wider variety of packaging types and is gradually approaching human-level dexterity and speed, bringing us one step closer to achieving fully autonomous package sorting. This rapid progress highlights the scalability of the Helix system's learning-based robotic approach, which can quickly translate into effective applications in practice.
New Types of Packages — The Helix system can now manipulate deformable polyethylene bags and flat envelopes as reliably as it handles rigid cardboard boxes, adjusting its grasp and strategy to each form factor and dynamically handling a variety of objects.
Higher Processing Speed — Despite the increased complexity and diversity of package types, the cycle time has improved to 4.05 seconds per package (down from about 5.0 seconds), an increase in processing speed of roughly 20% while maintaining accuracy.
Higher Barcode Scanning Success Rate — Shipping labels now end up correctly facing the scanner in about 95% of cases (up from about 70%), thanks to improved vision and control capabilities.
Adaptive Behavior — The robot has demonstrated subtle behaviors learned from demonstrations, such as gently tapping plastic envelopes to smooth out wrinkles, thereby improving barcode readability.
Small package logistics, as shown in this example, is an ideal environment for AI learning, as the packages and scenes change continuously at each time step, making it very suitable for neural networks.
These improvements have been achieved through data expansion and model architecture enhancements:
Temporal Memory — A new visual memory module has given the Helix system stateful perception. The policy now also incorporates a history of past states, enabling temporally extended behaviors and improving robustness to interruptions.
Force Feedback — Force sensing has been integrated into state inputs, providing a tactile proxy for more precise grasping and package manipulation.
Here, we analyze the sources of these gains, examining how increasing the demonstration training data (from 10 hours to 60 hours) affects performance, and how each of the architectural enhancements above contributes to the Helix system's progress in speed and accuracy of package handling.
Expansion of Package Types and Adaptive Behavior
The Helix system's logistics policy has been expanded to handle a wider variety of packages. In addition to standard rigid boxes, the system can now manage polyethylene bags, padded envelopes, and other deformable or thin packages, which present unique challenges: these items may fold, wrinkle, or bend, making them harder to grasp and their labels harder to position. The Helix system addresses this by dynamically adjusting its grasping strategy in real time, for example by quickly flipping soft bags or using a pinch grasp for flat mail. Despite the increased diversity in shape and texture, the Helix system has improved its throughput, with an average processing time of about 4.05 seconds per package, without creating bottlenecks.
The goal of this logistics task is to rotate each package so that its barcode faces down toward the scanner. A notable behavior is that the Helix system tends to flatten plastic packaging before attempting to scan. If the shipping label sits on a curved or wrinkled surface (common with loosely filled polyethylene bags or bubble envelopes), the policy responds by briefly pressing down and flattening the surface. This subtle "flattening" action, learned from demonstrations, ensures that the barcode can be fully read by the scanner. Such adaptive behavior highlights the advantage of end-to-end learning: the robot picks up, directly from data, strategies that were never explicitly hard-coded, overcoming the imperfections of real-world packaging.
Crucially, these new capabilities have not compromised efficiency. Throughput has increased alongside versatility. The average processing time per package for the Helix system has decreased from about 5.0 seconds (on a simplified set of packages) to 4.31 seconds, even as the tasks have become more challenging with the introduction of new package types. This speed improvement brings performance closer to that of human operators. Similarly, the success rate for barcode orientation has risen to about 95%. These improvements collectively indicate a more agile and reliable system capable of approaching human-level speed and accuracy across a wide range of real-world packages.
Architectural Improvements to the Helix System's Visuomotor Policy
Many of the gains above come from improvements to the Helix system's visuomotor policy. Over the past two months, we have introduced new memory and perception modules, making the control policy more context-aware and robust. These enhancements let the Helix system better perceive the state of the world and understand what it is doing, building on the vision and control foundation established at the time of initial deployment. Here, we detail each improvement and how it contributes to the Helix system's logistics performance.
Visual Memory
The Helix system's policy now maintains a short-term visual memory of its environment, rather than operating solely on the current camera frames. Specifically, the model is equipped with a module that combines features from a recent series of video frames, giving it a temporally extended view of the scene. This implicit visual memory enables stateful behavior: the robot can remember which side of a package it has already checked or which areas of the conveyor belt are free. For example, if the initial camera view does not fully reveal the label, the Helix system can recall previous glimpses and decide to rotate the package to the angle at which the label was last seen. The memory module thus helps eliminate redundant actions (the robot does not "forget" and recheck the same side twice) and raises success rates by ensuring that all necessary views of the item are considered. In essence, visual memory gives the Helix system a sense of temporal context, allowing it to act more strategically in multi-step operations. This is key to pushing the barcode-orientation success rate to 95%: the policy can reliably execute multi-step operations (such as several small rotations or viewpoint adjustments) to locate the barcode, guided by visual memory rather than a single lucky glimpse.
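As a rough illustration (not Figure's published design), a short-term visual memory of this kind could be a small self-attention block that fuses features from the last few camera frames; the PyTorch sketch below invents all layer sizes and frame counts purely for the example.

```python
import torch
import torch.nn as nn

class ShortTermVisualMemory(nn.Module):
    """Hypothetical sketch: fuse per-frame features over a short window with self-attention."""

    def __init__(self, feature_dim=256, num_frames=8, num_heads=4):
        super().__init__()
        # Learned position embedding distinguishes older frames from newer ones.
        self.pos_embed = nn.Parameter(torch.zeros(num_frames, feature_dim))
        self.attn = nn.MultiheadAttention(feature_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feature_dim)

    def forward(self, frame_features):
        # frame_features: (batch, num_frames, feature_dim), oldest frame first.
        x = frame_features + self.pos_embed
        fused, _ = self.attn(x, x, x)
        x = self.norm(x + fused)
        # Return the newest frame's representation, now informed by the earlier frames.
        return x[:, -1]

# Example: condition the policy on the last 8 frames for a batch of 2 sequences.
memory = ShortTermVisualMemory()
context = memory(torch.randn(2, 8, 256))  # -> shape (2, 256)
```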
State History
We also feed the Helix system's proprioceptive inputs together with a history of recent states, resulting in faster and more responsive control. Initially, the policy operated in fixed-duration action chunks: it would observe the current state, output a short motion trajectory, then re-observe, and repeat. By incorporating a window of the robot's past states (hand, torso, and head positions) into the policy's inputs, the system maintains continuity across these chunks. Importantly, the state history preserves context, so even at a higher replanning frequency the policy does not lose track of its ongoing operation or destabilize the manipulation. The end result is a faster response to unexpected events or disturbances: if a package shifts or a grasp attempt does not land perfectly, the Helix system can correct mid-motion with minimal delay. This enhancement contributes significantly to the reduction in per-package processing time.
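A minimal sketch of such a state-history window appears below; the buffer length and the assumed state layout (hand, torso, and head poses) are illustrative values, not Helix's actual parameters.

```python
from collections import deque

import numpy as np

class StateHistory:
    """Hypothetical sketch: keep the last N proprioceptive states as a fixed-size policy input."""

    def __init__(self, steps=16, state_dim=21):  # 21 = assumed hand + torso + head pose dimensions
        self.steps = steps
        self.state_dim = state_dim
        self.buffer = deque(maxlen=steps)

    def push(self, state):
        self.buffer.append(np.asarray(state, dtype=np.float32))

    def as_input(self):
        # Zero-pad so the policy always sees a (steps * state_dim)-long vector,
        # even right after a reset when the history is still short.
        pad = [np.zeros(self.state_dim, dtype=np.float32)] * (self.steps - len(self.buffer))
        return np.concatenate(pad + list(self.buffer))

# Usage: after every executed action, push the newest reading so the next action
# chunk is predicted with continuity across chunk boundaries.
history = StateHistory()
for _ in range(5):
    history.push(np.random.randn(21))
print(history.as_input().shape)  # (336,) = 16 * 21
```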
Force Feedback
To give the Helix system a basic sense of touch, we integrated force feedback into the policy's input observations. The forces that the Helix system exerts on the environment and on the objects it manipulates now form part of the state input to the neural network. This information lets the policy detect contact events and adjust accordingly. For example, when the Helix system reaches out to grab a package, it can sense the moment of first contact with the object, or when a package is pressed against a surface. It learns to use these cues to adjust its movements, for instance pausing its downward motion upon detecting contact with the conveyor belt. By closing the loop with touch, the Helix system achieves more precise handling, ultimately improving the success rate and consistency of its actions and making the system more robust to variations in object weight, stiffness, and placement.
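The sketch below illustrates how force readings can enter the observation and why they help; Helix is described as learning this reaction end to end from data, so the explicit threshold check here is only a stand-in to make the effect visible, and all names and values are assumptions.

```python
import numpy as np

CONTACT_FORCE_N = 2.0  # assumed contact-detection threshold in newtons (illustrative)

def build_policy_input(visual_features, state_history, wrist_forces):
    """Append measured forces to the observation so the network can react to contact."""
    return np.concatenate([visual_features, state_history, wrist_forces])

def limit_descent(commanded_velocity, wrist_forces):
    """Hand-coded stand-in for a learned reaction: stop descending once contact is felt."""
    vertical_force = abs(wrist_forces[2])          # assume index 2 is the vertical axis
    if vertical_force > CONTACT_FORCE_N and commanded_velocity[2] < 0:
        commanded_velocity = commanded_velocity.copy()
        commanded_velocity[2] = 0.0                # hold height instead of pressing harder
    return commanded_velocity

# Example: a small downward command is zeroed once a contact force appears.
vel = np.array([0.00, 0.05, -0.10])
forces = np.array([0.1, 0.3, 4.2])
print(limit_descent(vel, forces))  # -> [0.   0.05 0.  ]
```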
Results and Discussion
To quantify the impact of these improvements, we conducted a controlled evaluation of the logistics performance of the Helix system under different training data regimes and model configurations. We measured two key metrics: package processing speed (average seconds per package, lower is better) and barcode scanning success rate (percentage of packages correctly oriented towards the scanner, higher is better). The following results break down the contributions of additional training data and new architectural features to the overall performance improvement of the Helix system.
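For concreteness, the two metrics could be computed from an evaluation log roughly as follows; the log format is a hypothetical stand-in, not Figure's tooling.

```python
# Each entry: (handling time in seconds, whether the barcode faced the scanner).
runs = [
    (4.1, True), (3.9, True), (5.2, False), (4.0, True),
]

avg_seconds_per_package = sum(t for t, _ in runs) / len(runs)
scan_success_rate = 100.0 * sum(ok for _, ok in runs) / len(runs)

print(f"avg processing time: {avg_seconds_per_package:.2f} s/package")  # lower is better
print(f"barcode success rate: {scan_success_rate:.1f}%")                # higher is better
```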
Expanded Training Data
First, we examined the impact of increasing the amount of human demonstration data on the proficiency of the Helix system. We compared models trained on demonstration trajectories with approximately 10 hours, 20 hours, 40 hours, and 60 hours of training data (with the same network architecture and hyperparameters). As shown in Figure 1 below, increasing the training data resulted in significant improvements in both throughput and accuracy.
Figure 1: The impact of training data volume on package-handling performance. More demonstration data leads to a faster average processing time (seconds per package, lower is better) and a higher barcode scanning success rate. All models below use the same, latest Helix System 1 architecture, including the memory and feedback modules.
Increasing the training demonstrations from 10 hours to 60 hours reduced Helix's average processing time per package from approximately 6.84 seconds to 4.31 seconds, a 58% increase in throughput, while the barcode success rate rose from 88.2% to 94.4%. These returns indicate that we are still in a low-data regime: model performance continues to improve steadily as the data volume grows.
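A quick sanity check of the reported gain: since throughput is the inverse of seconds per package, the 58% figure follows directly from the two timings.

```python
time_10h = 6.84  # seconds per package with ~10 hours of demonstrations
time_60h = 4.31  # seconds per package with ~60 hours of demonstrations

# Throughput scales as 1 / (seconds per package).
throughput_gain = time_10h / time_60h - 1.0
print(f"throughput increase: {throughput_gain:.1%}")  # ~58.7%, consistent with the ~58% reported
```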
Contribution of Memory and Feedback Modules
Next, we assessed the contribution of recent architectural enhancements—visual memory, state history, and force feedback—to performance. We conducted an ablation study comparing different variants of the Helix model with these modules enabled or disabled. In this comparison, all models were trained on the same 60-hour dataset, so any differences in metrics reflect the presence or absence of these new features. Figure 2 summarizes the results of this ablation study, listing processing speed and success rate.
Figure 2: Performance impact of adding visual memory, state history, and force feedback. Each row presents a Helix policy variant (trained on 60 hours of data) with certain modules enabled; the full model (last row) includes all enhancements. We report the average processing time (seconds per package) and barcode success rate for each variant.
Figure 2 shows how each module removes a specific bottleneck. The monocular baseline lacks depth and temporal context, producing inaccurate grasp positions and often pausing for extended periods because it cannot tell how long to remain in a given state. Adding stereo vision resolves the depth issue, so grasps become cleaner and throughput improves, but the long pauses persist. One way to address the pauses is to lengthen the action chunks, but that comes at the cost of slower reactions. Introducing visual memory instead lets the policy recall whether a bag has already been flipped or a label already seen, eliminating redundant re-orientation and cutting half a second from the loop. With state history and force feedback added, the robot gains a sense of elapsed time and of touch: it no longer stalls, modulates its grip force on cardboard boxes more appropriately, and better controls the force it applies to its surroundings to avoid losing balance, raising the first-pass barcode scan success rate to 94%. Finally, expanding the network by increasing the parameter count of its Transformer decoder head by 50% lets it exploit these richer inputs, bringing the average processing time down to 4.05 seconds while keeping accuracy above 92%.
Visually Conditioned Reflexes: Human-Robot Handover
Although Helix's primary goal in logistics scenarios is autonomous sorting, the same end-to-end model can easily adapt to new interactions. One example is a human-robot handover behavior that emerges as a visually conditioned reflex. We provided only a few additional demonstration clips in which a person waits to receive a package (clips collected incidentally during the main data-collection process), and the policy learned to interpret an outstretched human hand as a signal to hand the item over. No new skill was explicitly programmed; the network simply learned that when a person extends a hand, the appropriate action is to hand the package to the person rather than place it on the conveyor belt. This behavior uses the same neural policy and weights as every other action; the difference comes purely from Helix's observation of the human and the context learned from those additional examples.
Conclusion
We have shown how Helix's performance in real-world logistics can be significantly enhanced by scaling a high-quality demonstration dataset and incorporating architectural improvements such as visual memory, state history, and force feedback. The result is a general visuomotor policy that handles a wide range of packages at speeds and reliability approaching human levels, a significant advance over its capabilities two months ago. These improvements not only address immediate challenges in package handling but also bring general benefits to Helix's control system that can extend to other use cases. By enabling stateful perception and force sensing, we have made the policy more robust and flexible without sacrificing efficiency. Crucially, the policy benefits from both data scaling and architectural improvements; neither alone could have driven this level of improvement.
Helix is steadily improving in agility and robustness, narrowing the gap between what learning-based robots can do and what real-world tasks require. Ongoing work will continue to expand its skill set and ensure stability at higher speeds and workloads.