How to anticipate DeepSeek R2

Reuters reported that DeepSeek may release R2 before May. DeepSeek researcher Daya said that RL is still in its early stages and that significant progress is expected this year. The R1 paper pointed out that as RL data increases, the model's reasoning ability improves and complex behaviors emerge naturally. DeepSeek plans to keep advancing its models as open source, with R2 aiming to match the full version of OpenAI's o3, while V4 may incorporate multimodal capabilities.

Reuters reported tonight that DeepSeek may release R2 before May. In early February, DeepSeek researcher Daya said that RL is still in its early stages and that there will be "significant progress" this year.

In fact, the R1 paper also noted that, because RL training data is currently limited, the next version of R1 should see significant improvements.

The diagram from the R1 paper shows that as RL data increases, the model's ability to solve complex reasoning tasks improves steadily, and complex behaviors such as "reflection" and "exploring different methods" emerge on their own. These capabilities are not designed by humans; they emerge naturally as the model is trained in the RL environment.

A simple reading is that no major algorithmic innovation is needed at this stage. By following the current path, adding more computing power, and drawing on DeepSeek's strong infrastructure capabilities, it should still be possible to reach R2/R3 on the current V3 base model. Once the marginal gains from RL start to slow, RL can continue on the new V4 base to push the reasoning model further. This is illustrated in the diagram below (the "left foot stepping on the right foot" picture, in which each step lifts the other).

Looking at OpenAI's roadmap, o3 will not be released as a complete standalone model, and GPT-4.5 is the last base model to be released independently. This means that from GPT-5 (a hybrid model) onward, things will become increasingly opaque. Put simply, both the base model and the reasoning model will be "raw materials" rather than "final products," and "CloseAI" and Anthropic will definitely keep them under wraps.

However, DeepSeek aims to keep open-sourcing while others keep closing up. R2 should correspond to the complete version of o3, while V4 should at least correspond to GPT-4.5. The model built on V4 plus RL should correspond to the future "GPT-5." A reasonable expectation, therefore, is that V4 may incorporate multimodal capabilities, while the R series will remain reasoning models.

Throughout this process, all the "raw materials" will be fully open-sourced. Beyond the raw materials, judging by the tone of its code releases, even the "formulas" for producing them will be open-sourced directly.

There are actually no secrets that DeepSeek doesn't know, and at the infrastructure level it even far exceeds many model makers in North America. Today we discussed in the community that DeepSeek may even understand how to use GPUs better than NVIDIA does. As for so-called research innovation, the inspiration for OpenAI's o series comes from already published "open-source" papers, combined with its own computing power advantage and engineering exploration. Ultimately, no one develops purely behind closed doors; everyone benefits from global "open-source" research and practice.

So, coming back to the question: compared with R2, everyone should actually look forward to V4 even more, because it raises the ceiling for reasoning models and opens up a completely new runway. R2 is a confirmed event on the timeline, while V4 will be the surprise. Both will happen this year.

Source: Information Equality. Original title: "How to Anticipate DeepSeek R2"
