Morgan Stanley: Visual data determines the future of AI, Tesla stands at the forefront of the "photon race"

Zhitong
2025.09.24 09:41

Morgan Stanley's latest research report points out that companies such as Tesla, Meta, and Figure AI are actively positioning themselves to collect and utilize visual data, forming a "photon race" for real-world visual data. The firm has given Tesla an "Overweight" rating with a target price of $410, emphasizing the strategic value of visual data in AI training. Tesla plans to shift to "pure vision" training, using human videos to teach its robot to complete tasks autonomously, marking a significant adjustment in its training paradigm.

According to the latest research report from Morgan Stanley, a "photon race" for real-world visual data is quietly emerging as multiple companies shift their resources and attention towards physical/embodied AI and robotics technology. In this context, the bank has given Tesla an "overweight" rating with a target price of $410.

Companies such as Tesla, Meta, and Figure AI are pursuing different paths to collect and utilize visual data. The bank emphasizes: "You can have all the computing resources in the world, but without visual data, you cannot train vision-language-action (VLA) models." Morgan Stanley points out that visual data has become the scarcest and most strategically valuable resource in AI training.

Morgan Stanley illustrates the value of visual data with a vivid metaphor: a 600-pound bluefin tuna swimming far from shore is worth nothing to someone without a fishing boat and gear; to someone with the capability to catch it, its value could reach $3.1 million. Similarly, the world's visual data is worth nothing if it cannot be captured and processed; but if it can be collected and processed at scale, its value is immeasurable.

Tesla: Shifting to "Pure Vision" Training

In May 2025, the former head of Optimus at Tesla released a series of videos demonstrating how Optimus learns to complete tasks autonomously from human videos. These videos were shot from a first-person perspective (with the camera mounted on the demonstrator), but the ultimate goal is to shift to third-person footage obtained from "random cameras" and internet videos.

"Tesla is reportedly shifting to a 'pure vision' approach to pre-train Optimus, no longer using wearable motion-capture suits and VR teleoperation, but instead recording videos of workers performing tasks as training data."

This shift marks a significant adjustment in Tesla's training paradigm, highlighting the core role of visual data in robotic behavior imitation and generalization capabilities.

The bank expects that in the future, visual data will not only be used to train models but also to build "robot training gyms" (simulated environments) that iterate through billions of scenarios in the digital world. Tesla owners, while driving, are not only moving through physical space but also "playing a video game," feeding data into the simulated world that trains the latest FSD model; Meta glasses users are teaching models how to play the piano, knit, pour coffee, or take out the trash.

Morgan Stanley emphasizes that visual data is the core resource for training the next generation of AI models, and its value is being redefined. Tesla, Meta, and Figure AI are each advancing their data-collection strategies along different paths, from vehicles and glasses to real estate, all vying for the lead in this "photon race."