New Breakthrough in the Cloud Computing Race: Record-Scale NVIDIA GB200 Cluster Enters MLPerf Testing, More Than Doubling Performance

Wallstreetcn
2025.06.04 15:29

In the MLPerf Training v5.0 tests, CoreWeave, NVIDIA, and IBM used 2,496 GB200 Grace Blackwell superchips, forming the largest NVIDIA GB200 NVL72 cluster ever fielded in an MLPerf benchmark. On the suite's largest and most complex workload, pre-training of the Llama 3.1 405B foundation model, the cluster completed the entire run in just 27.3 minutes, more than doubling training performance compared with submissions from similar-sized clusters.

A competition over computing power infrastructure is quietly unfolding in the cloud: AI infrastructure provider CoreWeave, in collaboration with NVIDIA and IBM, has just delivered the largest-scale MLPerf Training v5.0 results in MLPerf history, using more NVIDIA GB200 superchips than any previous MLPerf benchmark test.

On Wednesday, June 4, Eastern Time, CoreWeave announced that the test, run jointly with NVIDIA and IBM, used 2,496 GB200 Grace Blackwell superchips on CoreWeave's AI-optimized cloud platform, underscoring the platform's scale and its readiness for today's most demanding AI workloads.

CoreWeave stated that the test fielded the largest NVIDIA GB200 NVL72 cluster ever used in MLPerf benchmarking, 34 times larger than the previous largest submission from a cloud service provider.

Moreover, on the largest and most complex training workload in the benchmark suite, pre-training of the Llama 3.1 405B foundation model, the GB200 NVL72 cluster completed the entire run in just 27.3 minutes. Compared with results submitted by other participants on clusters of similar scale, CoreWeave's NVIDIA GB200 cluster more than doubled training performance.
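Since MLPerf Training scores are reported as time-to-train (lower is better), the "more than double" claim implies a similar-sized cluster would need upwards of roughly 55 minutes for the same run. The minimal sketch below makes that arithmetic explicit; it is an illustration only, not MLCommons' scoring code, and the baseline figure is a hypothetical placeholder (only the 27.3-minute result comes from this article).

```python
# Illustrative back-of-the-envelope comparison of MLPerf Training results.
# MLPerf Training measures time-to-train, so relative performance is the
# inverse ratio of the two completion times.

def relative_performance(baseline_minutes: float, candidate_minutes: float) -> float:
    """Training performance of the candidate system relative to a baseline."""
    return baseline_minutes / candidate_minutes

GB200_NVL72_MINUTES = 27.3  # CoreWeave's Llama 3.1 405B time, from the article
BASELINE_MINUTES = 55.0     # hypothetical similar-scale submission (placeholder)

ratio = relative_performance(BASELINE_MINUTES, GB200_NVL72_MINUTES)
print(f"Relative performance: {ratio:.2f}x")  # above 2.0 matches "more than double"
```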

CoreWeave believes the result highlights the significant performance leap brought by the GB200 NVL72 architecture and demonstrates the strength of CoreWeave's infrastructure in providing consistent, top-tier performance for AI workloads.

Peter Salanki, CoreWeave's Chief Technology Officer and co-founder, stated, "AI labs and enterprises choose CoreWeave because we provide a purpose-built cloud platform that has the scale, performance, and reliability they need for their workloads."

MLPerf Training v5.0 Introduces the Largest Model Yet in the Training Suite, with Industry Participation at an All-Time High

The MLPerf Training benchmark suite was first launched in 2018 and has since been updated with new models and workloads to remain a useful tool for measuring the training performance of AI computing platforms. MLPerf Training v5.0 is the latest version, measuring how quickly systems can train a range of different models across various use cases.

This Wednesday, the open industry consortium MLCommons released the MLPerf Training v5.0 benchmark results, showcasing rapid growth and evolution in the AI field. The round drew a record total number of submissions, with submissions increasing for most individual benchmarks compared to version 4.1.

MLCommons stated that MLPerf Training v5.0 introduces a new pre-training benchmark based on the Llama 3.1 405B large language model (LLM), the largest model ever added to the training benchmark suite, replacing the GPT-3-based benchmark used in previous versions. Although the Llama 3.1 405B benchmark has only just been added, MLCommons said submissions to it already exceed those of earlier GPT-3-based rounds, demonstrating the popularity and importance of large-scale training.

MLCommons disclosed that this round of MLPerf Training v5.0 testing received 201 performance results from 20 submitting organizations, topping 200 submissions for the first time and marking a new high in industry participation.

In alphabetical order by English name, the organizations participating in this round of MLPerf Training v5.0 testing are AMD, ASUS, Cisco, CoreWeave, Dell Technologies, GigaComputing, Google Cloud, HPE, IBM, Krai, Lambda, Lenovo, MangoBoost, Nebius, NVIDIA, Oracle, QCT (Quanta Cloud Technology), SCITIX, Supermicro, and TinyCorp.

David Kanter, head of MLPerf at MLCommons, particularly welcomed the first-time MLPerf Training submissions from AMD, IBM, MangoBoost, Nebius, and SCITIX. He also highlighted Lenovo's submission of this round's first power-consumption benchmark results, noting that the energy efficiency of AI training systems is an increasingly pressing concern that requires precise measurement.