DeepSeek becomes the world's second-largest AI laboratory; OpenAI and Google are on edge

Wallstreetcn
2025.05.30 07:16

With its new R1 release, DeepSeek has become the world's second-largest AI laboratory, tied with Google. According to a report by Artificial Analysis, DeepSeek's score on the Artificial Analysis intelligence index jumped from 60 to 68, overtaking xAI, Meta, and Anthropic. The index is based on independent evaluations of the leading models, and the size of DeepSeek's jump matches the gap between OpenAI's o1 and o3. On social media, users praised DeepSeek's performance and called the leap a milestone for open-source AI, though some pointed out that benchmark results do not always carry over to real-world applications.

With the new R1, DeepSeek ranks second globally and takes the crown among open-source models.

According to a report released on May 30 by Artificial Analysis, a well-known independent AI benchmarking and analysis organization, DeepSeek has surpassed xAI, Meta, and Anthropic with its new R1, becoming the world's second-largest AI laboratory, tied with Google. Once shared, the report garnered over 300,000 views and extensive discussion on the social platform X.

On the organization's Artificial Analysis intelligence index, DeepSeek-R1-0528 jumped from 60 to 68, tying with Google Gemini 2.5 Pro for third place among models. The index is based on seven leading evaluations, including MMLU-Pro and GPQA Diamond, which Artificial Analysis runs independently on all leading models.

DeepSeek's eight-point jump equals the gap between OpenAI's o1 and o3 (62 to 70). The advance lifts DeepSeek R1's intelligence score above xAI's Grok 3 mini (high), NVIDIA's Llama Nemotron Ultra, Meta's Llama 4 Maverick, and Alibaba's Qwen3-235B, and puts it on par with Google's Gemini 2.5 Pro.

▲ Comments from users on the social platform X

On the X platform, many foreign users expressed their admiration with comments like "So fast!", "Excellent!", and "Impressive."

Some users described the leap of DeepSeek-R1-0528 as a "milestone for open-source AI," while others credited its reinforcement-learning (RL) driven gains, arguing that "RL is more efficient than pre-training." Others cautioned that benchmark results still differ from real-world performance.

▲ Comments from netizens on the social platform X

Some netizens framed it as part of the broader AI race, writing that "DeepSeek's R1 move is like entering a competition," and that with the next round of benchmarks approaching, the game has only just begun.

▲ Comments from netizens on the social platform X

DeepSeek Becomes the World's Second-Largest AI Laboratory, First in the Open-Source Field

The AI Analysis Index from Artificial Analysis comprises seven assessments: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, and MATH-500.
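Artificial Analysis does not spell out its exact aggregation formula in this report. Purely as an illustration of how a composite index of this kind can be computed, the sketch below assumes an equal-weight average over the seven evaluations listed above; the scores in the example are placeholders, not DeepSeek's actual results.

```python
# Illustrative only: a simple equal-weight composite over the seven evaluations
# named above. This is an assumed formula, not Artificial Analysis's published
# methodology, and the example scores are made up.

BENCHMARKS = [
    "MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam",
    "LiveCodeBench", "SciCode", "AIME", "MATH-500",
]

def composite_index(scores: dict[str, float]) -> float:
    """Average the per-benchmark scores (each assumed to be on a 0-100 scale)."""
    return sum(scores[name] for name in BENCHMARKS) / len(BENCHMARKS)

# Hypothetical usage with placeholder numbers:
example = {name: 65.0 for name in BENCHMARKS}
print(round(composite_index(example)))  # -> 65
```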

DeepSeek-R1-0528 has achieved multi-faceted intelligence improvements: the biggest breakthroughs are reflected in AIME 2024 (competitive mathematics, +21 points), LiveCodeBench (code generation, +15 points), GPQA Diamond (scientific reasoning, +10 points), and Humanity's Last Exam (reasoning and knowledge, +6 points).

As shown in the figure below, DeepSeek-R1-0528 scored 68 points on the AI Analysis Index, second only to OpenAI o4-mini (high) at 70 points and OpenAI o3 at 69 points, and on par with Google Gemini 2.5 Pro's 68 points.

The gap between open-source and closed models is smaller than ever. As shown in the figure below, blue bars represent open-weight models and black bars represent closed models. Among open models, DeepSeek-R1-0528 firmly holds first place with 68 points, followed by Qwen3-235B with 62 points.

Outstanding Programming and Mathematical Abilities, Accelerating a Three-Year Catch-Up

Breaking it down, in programming ability (based on the LiveCodeBench and SciCode tests), DeepSeek-R1-0528 ties for second place with 59 points, just behind OpenAI o4-mini (high) at 63 points.

In terms of mathematical ability (based on AIME 2024 and MATH-500), DeepSeek-R1-0528 ranks fourth with a score of 94, just behind OpenAI o4-mini (high) at 96, Grok 3 mini Reasoning (high) at 96, and OpenAI o3 at 95.

Looking at the longer time frame, DeepSeek has been narrowing the gap with OpenAI over the past three years: OpenAI has held the leading position among AI laboratories throughout, but by January 2025 DeepSeek had closed in on it significantly.

The R1 released by DeepSeek in January marked the first time an open-weight model took second place, and today's R1 update returns it to that position.

Balancing Intelligence and Price: the "King of Cost Performance"

In terms of price, DeepSeek-R1-0528 costs $0.96 per million tokens, while OpenAI o4-mini (high) costs $1.93 per million tokens and OpenAI o3 as much as $17.50 per million tokens, making DeepSeek-R1-0528 the "king of cost performance." Note that these figures are blended input and output prices (at a 3:1 input-to-output ratio).

Looking at input and output prices separately, DeepSeek-R1-0528 charges $0.55 per million input tokens and $2.19 per million output tokens. That is below OpenAI o4-mini (high) at $1.10 input and $4.40 output per million tokens, and far below o3's $10 input and $40 output per million tokens.
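Working the 3:1 blend out explicitly (three input tokens weighted against every output token, which appears to be how the blended figures above are derived), the $0.96, $1.93, and $17.50 prices fall out of the separate input and output rates:

```python
# Blended price per million tokens, assuming a 3:1 input-to-output token ratio,
# which is how the $0.96 / $1.93 / $17.50 figures quoted above appear to be derived.

def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Weighted average of input and output prices (USD per million tokens)."""
    return (ratio * input_price + output_price) / (ratio + 1)

print(blended_price(0.55, 2.19))   # DeepSeek-R1-0528 -> 0.96
print(blended_price(1.10, 4.40))   # OpenAI o4-mini (high) -> 1.925, i.e. ~1.93
print(blended_price(10.0, 40.0))   # OpenAI o3 -> 17.5
```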

In terms of output speed, DeepSeek-R1-0528 generates 32.01 tokens/second, compared with 129.37 tokens/second for OpenAI o4-mini (high) and 150.73 tokens/second for o3.

In terms of time to first answer token, DeepSeek-R1-0528 spends 65.6 seconds "thinking" before it begins responding, which is quite long.
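Taken together, the first-token latency and throughput give a rough end-to-end estimate: total time ≈ time to first token + tokens generated / tokens per second. The sketch below is only a back-of-envelope calculation using the figures above, for a hypothetical 1,000-token answer; real latency varies with prompt and load.

```python
# Back-of-envelope response-time estimate from the measured figures above.
# The 1,000-token answer length is an assumption for illustration only.

def response_time(ttft_s: float, tokens_per_s: float, n_tokens: int = 1000) -> float:
    """Seconds from request to final token: first-token latency plus generation time."""
    return ttft_s + n_tokens / tokens_per_s

print(round(response_time(65.6, 32.01)))  # DeepSeek-R1-0528 -> ~97 seconds
```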

Additionally, the new R1 uses more tokens: R1-0528 consumed 99 million tokens to complete the AI Analysis Index evaluations, about 40% more than the original R1's 71 million, indicating that the new R1 thinks for longer than the original. This is still not the highest token usage observed: Gemini 2.5 Pro used 30% more tokens than R1-0528.
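For reference, the percentages quoted above can be checked directly against the raw token counts; the Gemini figure below is implied from the stated 30%, not quoted directly in the report.

```python
# Relative token usage from the counts quoted above.
new_r1, old_r1 = 99_000_000, 71_000_000
print(round((new_r1 - old_r1) / old_r1 * 100))  # ~39% more tokens, i.e. roughly 40%

# Gemini 2.5 Pro is reported to use about 30% more tokens than R1-0528:
print(round(new_r1 * 1.3 / 1e6))  # ~129 million tokens (implied, an assumption)
```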

Conclusion: Open Source Rivals Closed Source, Chinese AI Labs Catch Up with American Peers

Currently, the gap between open-source models and closed models is smaller than ever. The R1 version released by DeepSeek in January was the first open-weight model to achieve second place, and today's R1 update has brought it back to the same position.

At the same time, models from Chinese AI labs have nearly caught up with their American counterparts, and this release continues that trend. As of today, DeepSeek leads American AI labs including Anthropic and Meta on the Artificial Analysis intelligence index.

Risk Warning and Disclaimer

The market carries risks, and investment should be made with caution. This article does not constitute personal investment advice and does not take into account individual users' specific investment goals, financial situations, or needs. Users should consider whether any opinions, views, or conclusions in this article suit their particular circumstances. Any investment made on this basis is at the user's own risk.