AI "Battle of the Gods": In the face of "Stargate," Zuckerberg aims to build "Prometheus"

Wallstreetcn
2025.07.15 02:52

Meta has launched two giant AI cluster projects, Prometheus and Hyperion, to break through its computing-power bottleneck. The former, at roughly 1 GW, adopts a "comprehensive" strategy combining self-built campuses, third-party leasing, and on-site natural gas generation. The first phase of the latter will exceed 1.5 GW, and it is expected to become the world's largest single AI data center campus by the end of 2027.

Meta is initiating an unprecedented strategic transformation to reverse its lag in the foundational model competition.

On July 15, according to Wall Street Insight, Meta CEO Mark Zuckerberg said on Monday that the company will invest hundreds of billions of dollars to build several giant data centers, with the first, Prometheus, expected to come online next year.

Reportedly, Meta is following in the footsteps of xAI by adopting a more flexible and faster-to-build "tent-style" data center design, while simultaneously secretly constructing two "gigawatt-level" (GW) supercomputing clusters in Ohio and Louisiana, codenamed Prometheus and Hyperion, respectively.

Driven personally by founder Zuckerberg, this advertising giant, with annual cash flow in the hundreds of billions of dollars, is pouring money into computing infrastructure and top talent regardless of cost, aiming to catch up with and overtake rivals such as OpenAI, with "superintelligence" as the core goal.

Computing Power is King: From "Tent" to "Gigawatt-Level" Clusters

To quickly acquire massive computing power, Meta has shelved its data center construction blueprint from the past decade.

Reportedly, Zuckerberg has decided to overhaul the strategy once more, embracing a new design that prioritizes construction speed. This "tent-style" structure, inspired by xAI, uses prefabricated power and cooling modules and ultra-lightweight enclosures, sacrificing some redundancy (such as backup diesel generators) to bring GPU clusters online as quickly as possible.

To achieve this goal, Meta is advancing two massive infrastructure projects:

  • Prometheus Cluster: A 1-gigawatt AI training cluster in Ohio. Meta has adopted a "comprehensive" strategy, combining self-built campuses, third-party leasing, and on-site natural gas generation. Reportedly, the project aims to link all sites over an ultra-high-bandwidth network into a unified backend fabric. To work around local grid supply bottlenecks, Meta is even following Musk's example and building two 200-megawatt on-site natural gas power plants.

  • Hyperion Cluster: Located in Louisiana, this project is even larger, aiming to surpass OpenAI's much-anticipated Stargate project outright. Reportedly, the IT power of Hyperion's first phase will exceed 1.5 gigawatts, and it is expected to become the world's largest single AI data center campus by the end of 2027.

The goal of these initiatives is very clear: to transform Meta from "GPU-poor" to "GPU-rich" in per-capita computing resources, so that its training compute can rival that of leading labs such as OpenAI.

The Llama 4 Debacle: Reviewing the Technical and Strategic Roots

Meta's aggressive transformation stems from the failure of its Llama 4 Behemoth model. Having once led the open-source model wave with Llama 3, Meta saw its reputation damaged by this stumble.

According to reports, the technical roots of the failure mainly include the following points:

  • Architectural missteps: To pursue efficiency on long texts, the model adopted a "Chunked Attention" mechanism, but this created "blind spots" at chunk boundaries, impairing the model's long-range reasoning. In addition, the initially adopted "Expert Choice Routing" improved training efficiency but performed poorly at inference time, and switching midway to "Token Choice Routing" muddled the experts' division of labor.

  • Data quality bottleneck: Midway through training, the team shifted from using public datasets to its newly established internal web crawler, but was unprepared in data cleaning and deduplication. More importantly, unlike other top AI labs, Meta did not leverage the vast amounts of video and text data from YouTube, which may have limited its development of multimodal capabilities.

  • Shortcomings in scaling and evaluation: Reports indicate that the Llama 4 team struggled to scale research experiments up to large-scale training, lacking strong leadership to unify the technical direction. At the same time, Meta lags in reinforcement learning and internal evaluation infrastructure, and failed to catch the problems in its architectural choices early on.
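The chunked-attention "blind spot" described in the first bullet can be seen in a toy attention mask: a token at the start of a new chunk loses access to all prior context. The sketch below is purely illustrative; the function name, sequence length, and chunk size are assumptions for demonstration, not Meta's actual implementation (which interleaves other layer types):

```python
import numpy as np

def chunked_attention_mask(seq_len, chunk_size):
    """Causal mask where each token attends only to tokens in its own chunk."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        start = (i // chunk_size) * chunk_size  # first token of i's chunk
        mask[i, start:i + 1] = True             # causal, restricted to the chunk
    return mask

m = chunked_attention_mask(seq_len=8, chunk_size=4)
print(m[3].sum())  # token 3, end of chunk 0: sees 4 tokens (0-3)
print(m[4].sum())  # token 4, start of chunk 1: sees only itself
```

The asymmetry at the boundary (token 3 sees four tokens, token 4 sees one) is exactly the kind of "blind spot" that can hurt long-range reasoning across chunks.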

The report suggests that although the Behemoth model itself failed, Meta is still transferring its knowledge into the smaller Maverick and Scout models via model distillation.
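Model distillation, in its standard form, trains the smaller model to match the larger model's temperature-softened output distribution. A minimal sketch of that generic loss follows — this is the textbook formulation, not Meta's actual training code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p_teacher = softmax(teacher_logits / temperature)
    p_student = softmax(student_logits / temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    return float(kl * temperature ** 2)

# A student that matches the teacher exactly incurs zero loss:
print(distillation_loss(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0])))  # 0.0
```

Minimizing this loss pushes the small model (e.g., Maverick or Scout) toward the large model's behavior without repeating the full training run.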

Bridging the Gap: Lavish Spending and Strategic Acquisitions

While restructuring its computing infrastructure, Zuckerberg has shifted the strategic focus to another key element: talent. He is well aware of the talent gap between Meta and top AI labs, and is personally responsible for recruiting members for a brand new "superintelligence" team.

According to reports, Meta's compensation packages for top researchers can reach $200 million over four years, and for some key positions, offers reportedly approaching $1 billion have still been turned down.

This strategy is not only aimed at attracting talent but also at raising the hiring costs for competitors. Recent notable recruits include former GitHub CEO Nat Friedman and Daniel Gross, who co-founded SSI with Ilya Sutskever.

Beyond talent strategy, strategic acquisitions have become another major pillar. The investment in Scale AI is seen as a key step, far from being a "second-best" option.

Analysts believe this move directly addresses the data and evaluation shortcomings exposed by Llama 4.

Scale AI's founder Alex Wang and its model-evaluation lab SEAL will bring Meta much-needed capabilities, especially the reasoning-model evaluation benchmark HLE (Humanity's Last Exam) it developed, which should go a long way toward closing Meta's gaps.