
From Shooting for the Moon to Failing: A Brief Discussion on the Fall of Tesla's Dojo

Tesla's Dojo supercomputer project has been disbanded, ending its ambitious run. The project aimed to build a supercomputer purpose-built for AI workloads, but it ultimately fell short of its goals amid the loss of core talent, technological bottlenecks, and high R&D costs. Even so, next-generation chip architectures such as wafer-level integrated chips are still expected to improve AI computing efficiency. Dojo's design philosophy emphasized extreme optimization: a cache-less, two-tier memory system intended to maximize compute density and power efficiency.
Tesla's Dojo supercomputer was no ordinary hardware project; it was a "moonshot," a bold attempt to build a special-purpose supercomputer dedicated to solving AI problems. However, according to a Bloomberg report on August 7, Tesla is disbanding the Dojo project team, bringing Dojo to a complete end. Dojo's design philosophy was to trade programming complexity and extreme demands on the manufacturing process for theoretical peak performance.
However, under the triple pressure of core talent loss, yield bottlenecks in wafer-level packaging, and the rapid iteration of external GPU technology, its high R&D costs and uncertain commercial returns ultimately became unsustainable. As AI models continue to grow in scale and demand ever more compute, the performance bottlenecks of traditional computing architectures have become increasingly prominent. Against this backdrop, we remain optimistic about next-generation chip architectures, such as wafer-level integrated chips and coarse-grained reconfigurable architectures, which, once manufacturing bottlenecks and yield issues are overcome, are expected to enhance AI computing efficiency and flexibility.
What Are the Ambitions of the Dojo Architecture?
The design philosophy of Dojo is extreme optimization: stripping away general-purpose computing functions to create a streamlined, massively parallel training "beast." Its architecture rests on two radical attacks on AI's memory wall and interconnect wall: 1) A cache-less, two-tier memory system. Dojo's D1 compute chip completely abandons traditional cache hierarchies and virtual memory; each of its 354 cores directly accesses its own 1.25MB of local SRAM. Removing complex memory-management hardware maximizes compute density and power efficiency. However, this is a classic NUMA (Non-Uniform Memory Access) structure: data not in local SRAM must be fetched from system-level HBM sitting on separate DIPs (Dojo Interface Processors), and round trips across the interconnect fabric introduce significant latency, pushing all the complexity of memory management into the software layer and opening a huge performance gap between local SRAM and remote HBM.
2) "Glueless" wafer-level interconnect. The true core of Dojo's ambition lies in its interconnect design. Tesla used TSMC's InFO_SoW (Integrated Fan-Out System on Wafer) technology to build the "Training Tile": not a PCB but a single, massive multi-chip module built on a carrier wafer, holding 25 D1 chips in a 5x5 array. The chips are designed for "glueless" communication, connecting directly to adjacent chips through thousands of high-speed SerDes links to form a unified computing plane with external bandwidth of up to 36TB/s, eliminating the network bottlenecks that plague traditional supercomputers.
What Can Be Learned from Dojo's Failure?
Dojo's forward-looking design was also its weakness. Its failure was not a single technical issue, but the combined result of three deep-rooted causes:
1) Talent Loss. Complex technologies require a deep knowledge base. According to Bloomberg, Dojo head Ganesh Venkataramanan left Tesla in 2023 and founded a competing startup, DensityAI; roughly 20 core engineers have since left Tesla to join him. Bloomberg also reports that current Dojo head Peter Bannon is leaving Tesla, draining the technical accumulation and know-how required to tackle Dojo's highly customized architecture.
2) Yield Defects. Wafer-level interconnects are elegant in theory but extremely challenging in industrial manufacturing. On a wafer-sized module, any minor wiring defect or any mounting flaw among the 25 D1 chips can scrap an entire high-value Training Tile. Low yields make large-scale deployment prohibitively expensive and commercially unviable.
3) Strategic Shift to Practicality. Dojo was hindered by delays and low yields while external suppliers such as NVIDIA and AMD kept advancing rapidly in GPU performance and ecosystems, so for Tesla the cost-effectiveness of the high-risk internal project declined. Tesla shifted its strategic focus to more pragmatic options, deepening cooperation with industry partners such as NVIDIA, AMD, and Samsung. On July 27, Tesla announced a $16.5 billion contract with Samsung to manufacture its AI6 inference chips, and it has increased its reliance on NVIDIA and AMD for training compute clusters.
From Shooting for the Moon to Crashing: A Brief Discussion on the Fall of Tesla's Dojo
Elon Musk's decision to halt the Dojo supercomputer project was not made at the last minute; it was the result of multiple overlapping factors, chiefly technical bottlenecks, cost pressures, and the loss of core talent. These three causes ultimately led the company to abandon in-house supercomputer development.
Reason #1: Beginning with Talent Outflow from Dojo
The first major blow to the project came from the collective loss of the core team. According to Bloomberg, Dojo head Ganesh Venkataramanan left Tesla in 2023 and founded a competing startup, DensityAI; roughly 20 core engineers have since left Tesla to join him. Bloomberg also reports that current Dojo head Peter Bannon is leaving Tesla. This has created a noticeable vacuum in both R&D and execution for the project. DensityAI focuses on providing chips, hardware, and software solutions for AI data centers in robotics, AI agents, and automotive, with product directions that overlap heavily with Dojo, directly entering the market segment Tesla originally intended to capture with Dojo. The company was founded by former backbones of Tesla's AI and chip development effort, including Ganesh Venkataramanan, Bill Chang, Benjamin Floerin, and other core Dojo leaders and engineers.
Reason #2: Strategic Shift to Cost-Effectiveness and Reliance on External Partners
Facing the execution pressure created by the loss of its core team, Tesla accelerated its strategic adjustment, shifting to mature solutions from industry-leading vendors to reduce R&D and mass-production risk. The company is significantly raising its procurement from NVIDIA and AMD: directly adopting best-in-class, proven AI hardware avoids the heavy investment and uncertainty of from-scratch development and keeps key product roadmaps such as Full Self-Driving (FSD) and the Optimus robot from being held hostage to internal hardware bottlenecks. At the same time, Tesla signed a $16.5 billion contract with Samsung to produce the next-generation AI6 inference chips in Texas, underscoring the shift to a pragmatic strategy. Musk long positioned Dojo as a high-risk, high-reward "forward-looking project," its viability hinging on whether the performance advantages of a customized architecture could offset the enormous investment and R&D difficulty required. However, with the successive launches of high-performance chips such as NVIDIA's Blackwell and Rubin series and AMD's MI350 and MI400 series, Dojo's potential performance lead has clearly narrowed. Against continuously rising internal costs, repeated project delays, and the pull of other strategic priorities on resources, the project's risks partly materialized while the uncertainty of its returns grew markedly; weighing costs against benefits, the company turned to mature external solutions.
Reason #3: Complex Architecture Difficult to Manage, Manufacturing Bottlenecks Exist in the Supply Chain
The core contradiction of the Dojo project stems from its disruptive design philosophy. The architecture abandons the general-purpose design thinking of traditional CPUs/GPUs and focuses on pushing the compute density and energy efficiency of AI training workloads to the extreme. But that single-minded pursuit introduced very high technical complexity in the memory and interconnect systems: a design that performs excellently on paper yet faces enormous challenges in engineering practice and mass production, which ultimately became the fundamental cause of the project's failure.
Memory Architecture: Cache-less Dual-Layer System
Dojo's memory design abandons standard features of general-purpose computing, creating a system highly optimized for specific workloads but difficult to program and manage. At its core, Dojo abandons traditional data-side caching and virtual memory support. The D1 chip's 354 processing cores have no L1/L2/L3 cache hierarchy; instead, each directly accesses a local 1.25MB SRAM block. By removing cache tags, state bits, TLBs, and page-walking hardware, Dojo saves significant chip area and power, allowing denser compute arrays. The cost of this design is that all memory-management complexity (data locality, prefetching, and so on) is pushed to the software and compiler level, greatly increasing programming difficulty.
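As a concrete illustration of what "moving memory management into software" means, the sketch below shows classic double buffering over a software-managed scratchpad: compute on the tile already resident in SRAM while the next tile is being fetched. This is a generic pattern, not Tesla's actual toolchain; the function name, tile size, and the simulated "DMA" are illustrative assumptions (only the 1.25MB SRAM figure comes from the text).

```python
# Conceptual sketch (not Tesla's software stack): with no hardware cache,
# the compiler/runtime must stage data into each core's SRAM explicitly.

SRAM_BYTES = 1_250_000  # ~1.25 MB of local SRAM per core (per the text)

def double_buffered_sum(data, tile_elems, elem_bytes=4):
    """Sum `data` by streaming fixed-size tiles through two SRAM buffers."""
    # Both buffers must fit in the core's scratchpad simultaneously:
    assert 2 * tile_elems * elem_bytes <= SRAM_BYTES, "buffers exceed SRAM"
    tiles = [data[i:i + tile_elems] for i in range(0, len(data), tile_elems)]
    total = 0
    buf = tiles[0] if tiles else []  # "DMA" tile 0 into buffer A
    for k in range(len(tiles)):
        # Prefetch tile k+1 into buffer B while computing on buffer A;
        # on real hardware the fetch and the compute would overlap in time.
        nxt = tiles[k + 1] if k + 1 < len(tiles) else None
        total += sum(buf)
        buf = nxt
    return total

print(double_buffered_sum(list(range(10)), tile_elems=4))  # → 45
```

On a cached machine the hardware would discover this access pattern automatically; here the schedule of fetches is the programmer's (or compiler's) explicit responsibility, which is the burden the text describes.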
At the memory level, the system presents a typical dual-layer Non-Uniform Memory Access (NUMA) architecture, including:
- Local Memory Layer (SRAM): each core has a private 1.25MB high-speed SRAM that serves as the primary computing workspace, with extremely low access latency;
- Remote Memory Layer (HBM): a large-capacity system memory composed of HBM2e/HBM3. Crucially, this tier cannot be directly addressed by the D1 computing cores; it is mounted at the edge of the computing array behind independent Dojo Interface Processors (DIPs). A core's request to HBM must traverse a complex on-chip network (NoC) to reach a DIP, with latency far higher than local SRAM access.
This design creates a significant performance cliff between the core SRAM and the off-chip HBM, imposing extremely stringent requirements on software scheduling and data layout, further exacerbating the challenges of software stack development and optimization.
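The "performance cliff" can be made concrete with a toy weighted-latency model. The latency numbers below are illustrative assumptions, not measured Dojo figures; the point is that even a small fraction of accesses spilling to remote HBM dominates average latency.

```python
# Toy two-tier NUMA latency model (all numbers are assumptions):
SRAM_NS = 1.0    # assumed local SRAM access latency
HBM_NS = 100.0   # assumed remote HBM access via NoC + DIP

def effective_latency(local_hit_rate):
    """Average access latency given the fraction served from local SRAM."""
    return local_hit_rate * SRAM_NS + (1 - local_hit_rate) * HBM_NS

# Even a 10% miss rate makes average latency ~11x worse than pure SRAM,
# which is why data layout and scheduling matter so much on Dojo.
for hit in (0.99, 0.90, 0.50):
    print(f"hit={hit:.2f} -> {effective_latency(hit):.1f} ns")
```

The same arithmetic is why a hardware cache hierarchy usually hides this cliff: without one, every percentage point of locality lost by the software scheduler is paid in full.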
Interconnect Structure: "Glueless" Wafer-Level Design
Dojo's interconnect architecture is a core highlight of its design and one of the most challenging aspects of its technical implementation. Its goal is to build a large-scale unified computing plane with ultra-high bandwidth through multi-level customized design. This architecture mainly includes two levels:
1) On-Chip Interconnect Using 2D Mesh: Within a single D1 chip, 354 computing cores are integrated and arranged in a 2D mesh structure. This design achieves extremely high bandwidth and low latency inter-core communication, providing efficient underlying support for data sharing and synchronization operations in large-scale parallel computing.
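Communication cost on such a mesh is commonly modeled as hop count under dimension-ordered (XY) routing, i.e. the Manhattan distance between cores. A minimal sketch follows; the grid coordinates are illustrative, since the text only says 354 cores are arranged in a 2D mesh.

```python
# Hop count under XY (dimension-ordered) routing on a 2D mesh NoC:
# a packet first travels along the row, then along the column.

def mesh_hops(src, dst):
    """Hops between two (row, col) cores = Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

# Nearby cores are cheap to reach; far corners pay proportionally more,
# so compilers try to place communicating tasks on adjacent cores.
print(mesh_hops((0, 0), (3, 5)))  # → 8
print(mesh_hops((2, 2), (2, 3)))  # → 1
```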
2) Training Unit Under Wafer-Level Integration (Training Tile): This is the concentrated embodiment of Dojo's architectural complexity and the core of its yield problems. The Training Tile is not a traditional PCB; it is based on TSMC's InFO_SoW (Integrated Fan-Out System on Wafer) technology, a very large multi-chip module built on a single carrier wafer. The module integrates 25 D1 chips in a 5x5 array. Each D1 chip's edges carry 576 high-speed bidirectional SerDes channels, achieving "glueless" direct interconnection between chips, meaning communication requires no external bridging chips. This design lets each D1 chip communicate directly with its four neighbors, with total I/O bandwidth of up to 8TB/s per chip. Ultimately, the total off-chip bandwidth of a single Training Tile can reach 36TB/s, a figure far beyond the capabilities of traditional data center network switches and a key factor in its performance leadership.
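A quick back-of-the-envelope on the tile's topology, using only figures from the text (a 5x5 grid with nearest-neighbor links). The link and edge counts below follow from the grid shape alone; no per-link bandwidths are assumed.

```python
# Nearest-neighbor link counting for the 5x5 Training Tile layout.
N = 5
chips = N * N                      # 25 D1 dies per Training Tile

# Internal chip-to-chip links in an NxN grid: N rows of (N-1) horizontal
# links plus N columns of (N-1) vertical links.
internal_links = 2 * N * (N - 1)   # 40 direct neighbor links

# Outward-facing chip edges on the tile perimeter carry external I/O:
edge_sides = 4 * N                 # 20 perimeter edges

print(chips, internal_links, edge_sides)  # → 25 40 20
```

Every one of those 40 internal links must be wired on the carrier wafer with no bridging silicon, which is precisely why a single wiring defect can condemn the whole module.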
To scale beyond a single Training Tile, Dojo employs a multi-level physical integration scheme: customized high-density connectors integrate multiple training units into a system tray, trays interconnect to form a complete cabinet, and cabinets combine into a massive ExaPOD computing cluster. The system's external communication is handled by the DIPs. As the "gateway" to the host system, a DIP exchanges data with servers via a standard PCIe 4.0 bus carrying Tesla's self-developed transport protocol (TTP).
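The scale-out arithmetic can be sketched as follows. The per-level counts (6 tiles per tray, 2 trays per cabinet, 10 cabinets per ExaPOD) come from Tesla's public AI Day presentations and are an assumption here, since the text gives no per-level numbers.

```python
# Tile -> tray -> cabinet -> ExaPOD scaling arithmetic.
# Per-level counts are from Tesla's public AI Day materials (assumption);
# only the 25-chips-per-tile figure appears in the text itself.
D1_PER_TILE = 25
TILES_PER_TRAY = 6
TRAYS_PER_CABINET = 2
CABINETS_PER_POD = 10

tiles_per_pod = TILES_PER_TRAY * TRAYS_PER_CABINET * CABINETS_PER_POD
d1_per_pod = tiles_per_pod * D1_PER_TILE

print(tiles_per_pod, d1_per_pod)  # → 120 3000
```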
However, Dojo's most ambitious wafer-level integration scheme also poses its greatest manufacturability challenge. Manufacturing a complex module the size of a wafer, containing 25 D1 chips and thousands of high-speed interconnections, is a significant test for existing processes. Any minor wiring defect on the substrate wafer or any flaw during the mounting and bonding process of the D1 chips could lead to the direct scrapping of the entire valuable training unit, resulting in yield loss.
Dojo's design philosophy, in essence, trades programming complexity and extreme manufacturing demands for theoretical peak performance. The streamlined memory model demands complex software, and the forward-looking wafer-level interconnect pushes semiconductor manufacturing to its limits, yielding a system that is conceptually excellent but extremely difficult to scale.
The direct consequence is extremely low yield. The architecture's complexity translates straight into poor manufacturing yields: the precision demanded by the novel die-integration and interconnect scheme means a high proportion of modules come off the line defective and unusable. This manufacturing bottleneck was the ultimate technical barrier; the forward-looking architectural design ran into the hard constraints of the industrial supply chain.
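Why wafer-level integration punishes yield so hard can be seen with the standard Poisson yield model from semiconductor economics. All numbers below are illustrative assumptions, not Tesla data; the point is that because a single flaw among the 25 die attaches or in the carrier-wafer wiring scraps the whole tile, per-step yields compound multiplicatively.

```python
import math

def poisson_die_yield(area_cm2, defects_per_cm2):
    """Classic first-pass die yield: Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)

# Assumed numbers for a D1-class die (~645 mm^2) on a mature process:
die_yield = poisson_die_yield(6.45, 0.05)

# Even if only known-good (pre-tested) dies are mounted, all 25 attach
# steps AND the wafer-scale wiring must succeed for the tile to survive:
bond_yield_per_die = 0.99      # assumed success rate of each die attach
carrier_wiring_yield = 0.90    # assumed yield of the wafer-scale wiring
tile_yield = (bond_yield_per_die ** 25) * carrier_wiring_yield

print(f"die yield ~ {die_yield:.2f}, tile yield ~ {tile_yield:.2f}")
```

Under even these fairly optimistic assumptions, roughly three in ten finished tiles would be scrapped, which is exactly the economics the text describes.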
It can be said that the termination of the Dojo project was an inevitable result, rooted in the sharp contradiction between Tesla's grand technological vision and the objective laws of the semiconductor industry: on one side, Tesla's "obsession" with building a perfect AI supercomputer; on the other, the harsh physical laws and economic costs of semiconductor manufacturing. Once the core technical team capable of balancing the two departed, the project's failure became unavoidable. Dojo was an ambitious "moonshot," but it ultimately fell back to earth. The attempt has delineated the boundaries of Tesla's technological vision and left the industry profound lessons on the feasibility of technology routes and their commercialization.
Risk Warning and Disclaimer
The market has risks, and investment should be cautious. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investment based on this article is at one's own risk.