
Morgan Stanley: Visual data is reshaping the AI robot competitive landscape, with Tesla as the core focus target

Morgan Stanley's research report points out that competition among AI robots has shifted from algorithm iteration to data acquisition, with visual data serving as the core resource for VLA model training. Companies such as Tesla, Meta, and Brookfield are building technological barriers through scene coverage and data accumulation. As embodied artificial intelligence technology matures, the battle for visual data is intensifying, and companies that can balance data-collection efficiency, user privacy, and commercialization are expected to reshape the global AI robot industry landscape.
According to the report from Morgan Stanley, the competition for AI robots has shifted from "algorithm iteration" to "data acquisition." Visual data, as the core resource for training VLA models, will directly determine a company's position in the industry. Whether it's Tesla (TSLA.US) focusing on video collection in industrial scenarios, Meta (META.US) seizing the consumer-end wearable device entry, or Brookfield activating real estate scenario resources, the essence is to build technological barriers through "scenario coverage + data accumulation."
As embodied artificial intelligence technology matures, the "photon competition" will become increasingly fierce, and companies that can balance data collection efficiency, user privacy, and commercialization are expected to stand out in this competition and reshape the global AI robot industry landscape.
The main points of the report are as follows:
In the current accelerated iteration of artificial intelligence and robotics technology, a silent battle for "visual data" has begun. On September 22, Morgan Stanley released a research report stating that the visual-language-action (VLA) model is the core for AI robots to achieve autonomous interaction, and the key to training such models—"real-world capture data"—is becoming the focus of competition among global technology and manufacturing giants.
From Tesla's Optimus robot shifting to pure visual training, to Meta embedding ultra-high-definition cameras in wearable devices, to Brookfield collaborating with AI companies to layout scenario data collection, the consensus in the industry has become: "Whoever can acquire high-quality real-world scene videos on a large scale will gain an advantage in the AI robot era."
1. The essence of the "photon competition": Visual data is the "fuel" for AI robots
Morgan Stanley's report vividly illustrates the value logic of visual data with the metaphor of a "fat bluefin tuna": on a remote island, a 600-pound bluefin tuna is worth nothing if it cannot be caught; only with a boat, fishing gear, and detectors does the tuna acquire its million-dollar value. The value of visual data is similar: without the ability to collect and process it, the potential value of the world's visual data cannot be unlocked. Once a company commands yotta-scale floating-point compute (10²⁴ operations per second), real-world scene data becomes the core "fuel" for breakthroughs in AI robot technology.
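To give the "yotta-scale" figure some intuition, a rough back-of-envelope sketch follows. The per-accelerator throughput of ~10¹⁵ FLOPS is an illustrative assumption for a modern AI chip, not a figure from the report:

```python
# Back-of-envelope scale check for "yotta-scale" compute (10**24 FLOPS).
# ACCEL_FLOPS is an illustrative assumption (~1 petaFLOPS per AI accelerator),
# not a number taken from the Morgan Stanley report.
YOTTA_FLOPS = 10**24   # 1 yottaFLOPS = 10^24 floating-point operations per second
ACCEL_FLOPS = 10**15   # assumed throughput of one modern AI accelerator

accelerators_needed = YOTTA_FLOPS // ACCEL_FLOPS
print(f"Accelerators needed for yotta-scale: {accelerators_needed:,}")
# prints "Accelerators needed for yotta-scale: 1,000,000,000"
```

Under these assumptions, yotta-scale compute implies on the order of a billion of today's accelerators, which is why the report treats it as a long-horizon threshold rather than a present capability.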
This understanding is driving companies to deploy cameras in homes, offices, cars, and even wearable devices. Morgan Stanley analysts, in discussions with Alex Kendall, co-founder of Wayve (a startup focused on end-to-end generative AI for autonomous driving), were told: "Whoever can acquire ultra-high-definition video of home scenarios on a large scale... will win." This viewpoint speaks directly to the core bottleneck in AI robot development: the scarcity of high-quality, multi-scenario visual training data.
2. Tesla bets on pure visual training, opening a new path for robot data collection
As a key player in the AI robot field, Tesla's actions regarding visual data applications are closely watched. The report reveals that in May 2025, Tesla's former Optimus project leader released a video on the X platform, showing that Optimus can autonomously perform tasks through "human demonstration videos." These videos are shot from a first-person perspective, with the long-term goal of transitioning to "third-person perspectives captured by randomly deployed cameras." This transformation marks a key leap for Tesla from "manual control assistance" to "data-driven autonomous learning."
More groundbreaking is an August 2025 report from Business Insider, which indicates that Optimus pre-training will take humans out of the loop entirely: rather than relying on remote operators wearing motion-capture suits and virtual reality (VR) headsets, Tesla will obtain training data from recorded videos of factory workers performing tasks. This model not only reduces training costs but also lets robots learn the complex operational logic of real industrial scenarios, enhancing their practical value.
Coincidentally, the report also mentions the layout of the unlisted company Skild AI: this company is building a "robotic foundational model," with core training data sourced from "human action videos on the internet," further confirming the universal value of "real-world scenario data" in robot training.
3. Competition Among Giants: Meta Seizes the Wearable Device Entry Point
On both the consumer and scenario fronts, technology and asset giants are accelerating their visual-data-collection efforts, forming a diversified competitive landscape.
- Meta: Wearable Devices Become the "Data Battlefield," and Users' Faces Carry Key Value
Meta's layout for the next generation of wearable devices directly targets visual data collection. The report indicates that Meta plans to embed two ultra-high-definition cameras in its eyewear products, focusing on capturing users' "real data of hand movements"—whether playing the piano, knitting, pouring coffee, or taking out the trash, these daily actions will become valuable material for training AI robots. Morgan Stanley predicts that within the next two years, the ownership of such devices may reach 20 million units, nearly double the current number of Tesla vehicles worldwide.
Every Meta glasses user may end up training a "humanoid virtual avatar" across "billions of scenes in the digital universe." As the report vividly puts it: "These glasses may be stylishly designed, but your face has already become the 'battlefield' for data competition." Although Meta's wearable devices remain at the "proof of concept stage" and are unlikely to have a material financial impact in the short term, Morgan Stanley's internet team emphasizes that Meta's "full-stack layout" (self-developed hardware + AI operating system + content ecosystem) lays the foundation for seizing the next generation of computing platforms, with visual data collection as the core link in that layout.
- Brookfield: Activating Real Estate Resources to Create the "Largest Scale Pre-training Dataset"
Unlike the device-led approach of technology companies, Brookfield, a global leader in infrastructure solutions, has chosen to "exchange assets for data." The report reveals that Brookfield recently reached a cooperation agreement with Figure AI, planning to open its vast real estate portfolio, comprising over 1 million residential units, 500 million square feet of commercial office space, and 160 million square feet of logistics warehouse space, for the collection of AI robot training data.
The core value of this collaboration lies in "scene diversity": the environmental characteristics, object layouts, and human activity patterns of residential, office, and logistics scenarios provide AI robots with multidimensional training material, helping them learn to move, perceive, and act across various human-centered settings. Data collection has already begun across Brookfield's assets and will be scaled up in the coming months; the two parties also plan to explore long-term commercialization opportunities for deploying humanoid robots in real estate, forming a closed loop of "data collection - model training - scene implementation."
4. Investment Perspective: Tesla as the Core Target, Focus on Data Ecosystem Chain Opportunities
Morgan Stanley explicitly names Tesla as a core focus target in its report, giving it an "overweight" rating with a target price of $410. Breakthroughs in AI-robot-related technologies and data accumulation are the key variables supporting its long-term valuation.
The report also highlights core risks: first, intensified competition in the AI robot field from traditional automakers, Chinese automakers, and tech giants; second, execution risks around production ramp-up at multiple Tesla factories and technology iteration; third, full self-driving (FSD) adoption and average revenue per user (ARPU) falling short of expectations, which would undermine market recognition of the value of the "Dojo supercomputer-enabled service business."
