
Elon Musk's take on the AI race: rivals have caught up with OpenAI, Microsoft has stepped back, and the "inference era" needs only 2-3 giant data centers

Gavin Baker argues that data will become the core of competition: frontier models without access to unique, valuable data are the fastest-depreciating assets in history, and distillation will only amplify this. Going forward, pre-training will need only 2-3 giant data centers, and compute will be split roughly 5/95 between pre-training and inference.
Competition among large AI models is intensifying, and the second half of the race will turn on inference and data.
On February 23, Elon Musk liked a tweet analyzing the competitive landscape of AI models, praising it as "well analyzed." Notably, Musk's xAI had officially released its Grok 3 model just the week before.
The tweet came from Gavin Baker, who argued that the AI industry landscape is shifting rapidly: OpenAI's first-mover advantage is fading, and Microsoft has chosen to take a step back.
Gavin also predicts that data will become the core of future competition, and that frontier models unable to obtain unique, valuable data will be the fastest-depreciating assets in history. Giants like Meta are building moats through data monopolies and compute scale, while small and mid-sized players focus on differentiated deployment and cost optimization.
Still, Gavin remains optimistic about xAI and OpenAI, saying that if OpenAI is still the field's leader five years from now, it will likely be thanks to its first-mover advantage, scale, and product influence.
OpenAI's first-mover advantage diminishes, and Microsoft also chooses to take a step back
Gavin pointed out in the tweet:
When ChatGPT burst onto the scene in November 2022, OpenAI's aggressive bets on scaling laws gave it seven straight quarters of dominance in generative AI. But that window is closing: Google's Gemini, xAI's Grok 3, and DeepSeek's latest models have all reached rough technical parity with GPT-4.
Even OpenAI co-founder Sam Altman has acknowledged that OpenAI's lead will be narrower going forward, and Microsoft CEO Satya Nadella has said, in effect, that the period in which OpenAI's model capabilities were uniquely ahead is coming to an end.
According to an earlier report by The Information, internal memos at Microsoft indicate that, owing to diminishing marginal returns on pre-training, the original plan to invest $16 billion in upgrading pre-training infrastructure has been halted; Microsoft is instead focusing on supplying inference compute for OpenAI to generate revenue.
Nadella also said on an earlier podcast that data center construction may be oversupplied, that leasing beats building, and that Microsoft might even use open-source models to power Copilot. Gavin reads these signals as the end of the "pre-training era," in which barriers were built purely by scaling up parameters.
Exclusive data resources become a moat
Gavin believes that as model architectures converge, exclusive data will become the moat. He wrote:
I have repeatedly quoted Eric Vishria: a frontier model without access to unique and valuable data is the fastest-depreciating asset in history, and distillation only amplifies this.
If frontier models cannot access unique, valuable data from platforms like YouTube, X, TeslaVision, Instagram, and Facebook, there may be no return on the investment at all.
From this perspective, Zuckerberg's strategy looks much wiser. Unique data may ultimately be the only basis for differentiation, and the only source of ROI, for pre-trained trillion- or multi-trillion-parameter models.
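Distillation, the mechanism Gavin says accelerates this depreciation, works by training a cheap "student" model to mimic an expensive "teacher" model's output distribution, so much of a frontier model's capability leaks into far cheaper copies. Below is a minimal sketch in PyTorch; the sizes, temperature, and loss weighting are illustrative assumptions, not any lab's actual recipe.

```python
import torch
import torch.nn.functional as F

TEMPERATURE = 2.0  # softens both distributions so the student sees richer signal

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend of (a) KL divergence to the teacher's softened distribution and
    (b) ordinary cross-entropy against the ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / TEMPERATURE, dim=-1)
    soft_student = F.log_softmax(student_logits / TEMPERATURE, dim=-1)
    # The KL term is scaled by T^2 to keep gradient magnitudes comparable
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * TEMPERATURE**2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 4 examples over a 10-class output space
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)          # frozen outputs from the big model
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

The economic point follows directly: once a teacher's outputs are observable, anyone can train a competitive student for a fraction of the teacher's pre-training cost, which is exactly why a frontier model without exclusive data depreciates so fast.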
This explains why Zuckerberg has anchored Meta's AI strategy in a closed loop of social data. According to earlier media reports, image-labeling data from Instagram users has improved the training efficiency of Meta's multimodal models by 40%.
Only 2-3 giant data centers are needed, with 95% of computing power required for inference
This shift will also upend the AI infrastructure landscape. Gavin predicts:
Pre-training compute: requires ultra-large clusters (100,000-GPU class), but the field will shrink to 2-3 players, with a technology stack that chases extreme performance (liquid cooling, nuclear power); think a "Ferrari"-grade supercomputing center.
Inference compute: 6-10 smaller data centers dominated by distributed, low-cost architectures, where proximity to users and energy efficiency are key; powered by wind and solar, and supported by quantization and compression techniques (such as 1-bit quantization of DeepSeek R1) for low-cost inference, akin to "Honda"-grade edge nodes.
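To give a feel for the compression involved, here is a minimal sketch of post-training weight quantization in PyTorch. It uses int8 for clarity; the 1-bit (more precisely, ~1.58-bit) quantizations referenced above push the same idea much further. The tensor sizes and per-tensor scaling are simplified assumptions, not DeepSeek's actual method.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor quantization: fp32 weights -> int8 plus one fp32 scale."""
    scale = w.abs().max() / 127.0            # map the largest magnitude to +/-127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale       # approximate reconstruction

w = torch.randn(4096, 4096)                  # a toy fp32 weight matrix
q, scale = quantize_int8(w)
err = (dequantize(q, scale) - w).abs().mean().item()
print(f"{w.numel() * 4} bytes -> {q.numel()} bytes, mean abs error {err:.5f}")
# 4x less memory traffic per weight; since LLM inference is usually
# bandwidth-bound, smaller weights translate almost directly into cheaper serving.
```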
Gavin stressed that reasoning models are extremely compute-hungry: only with strong compute can they complete inference tasks efficiently. But where compute resources were once split roughly evenly between pre-training and inference, the allocation will now shift to 5% for pre-training and 95% for inference, making excellent infrastructure crucial.
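A back-of-the-envelope calculation (illustrative numbers, not Gavin's) shows why the balance tips so hard toward inference. Using the standard approximations that training a dense transformer costs about 6·N·D FLOPs (N parameters, D tokens) and that generating one token costs about 2·N FLOPs:

```python
N = 1e12                  # hypothetical 1-trillion-parameter model
D = 1e13                  # pre-trained on 10 trillion tokens

train_flops = 6 * N * D                           # ~6e25 FLOPs, paid once
tokens_per_day = 1e9 * 1e3                        # 1B requests/day x 1k tokens each
inference_flops_per_day = 2 * N * tokens_per_day  # ~2e24 FLOPs, paid every day

print(train_flops / inference_flops_per_day)      # ~30: serving matches the entire
                                                  # pre-training run in about a month
```

At that usage, a popular model burns far more lifetime compute answering queries than it ever did in pre-training, which is the arithmetic behind a 5/95 split.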
Overall, the AI industry may settle into a bipolar pattern of "centralized pre-training, decentralized inference," with data at the core of power: giants build moats through data monopolies and compute scale, while small and mid-sized players focus on differentiated deployment and cost optimization.