
NVIDIA released Rubin CPX, a chip targeting ultra-long-context processing; Jensen Huang stated that it can reason over millions of tokens at once

Rubin CPX enhances AI video generation and software development capabilities, delivering 30 petaflops of compute and three times the attention acceleration of the GB300 NVL72 system, and is set to launch at the end of 2026. Jensen Huang said the Rubin CPX is the first CUDA GPU designed specifically for large-scale contextual AI, capable of reasoning over millions of knowledge tokens at once. NVIDIA claims that a $100 million deployment of the new chip hardware can generate up to $5 billion in revenue for customers.
On September 9th, Eastern Time, NVIDIA unveiled the next-generation Rubin CPX chip system, designed specifically for large-context processing tasks such as AI video generation and software development.
The Rubin CPX is scheduled to launch at the end of 2026, in card form that can be integrated into existing server designs or deployed as a standalone computing device in data centers.
The chip system marks a significant step up in specifications. The Rubin CPX GPU delivers 30 petaflops of compute at NVFP4 precision, carries 128GB of GDDR7 memory, includes hardware support for video decoding and encoding, and achieves three times the attention acceleration of the NVIDIA GB300 NVL72 system.
The complete Vera Rubin NVL144 CPX platform integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs in a single rack, delivering 8 exaflops of AI performance, 7.5 times that of the NVIDIA GB300 NVL72 system.
NVIDIA CEO Jensen Huang stated that the Rubin CPX is the first CUDA GPU built specifically for processing millions of tokens. He said:
"Just as RTX revolutionized graphics and physical AI, Rubin CPX is the first CUDA GPU built for large-scale contextual AI, allowing models to infer millions of knowledge tokens simultaneously."
NVIDIA claims the new chip offers a return on investment of 30 to 50 times: a $100 million deployment of the new chip hardware could generate up to $5 billion in revenue for customers. The forecast underscores NVIDIA's effort to quantify the commercial value of AI infrastructure.
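As a sanity check on the claimed range, the figures can be multiplied out directly (this assumes ROI is a simple revenue-to-capex multiple, ignoring operating costs and timelines, which the announcement does not detail):

```python
# Hypothetical check of the claimed ROI range (assumption: ROI is a
# simple revenue / capex multiple; operating costs and timelines ignored).
capex = 100_000_000          # the $100 million deployment cited by NVIDIA
roi_low, roi_high = 30, 50   # claimed return multiples

revenue_low = capex * roi_low    # lower bound of the claimed range
revenue_high = capex * roi_high  # upper bound of the claimed range
print(f"${revenue_low / 1e9:.0f}B to ${revenue_high / 1e9:.0f}B")
```

The $5 billion figure in the announcement corresponds to the upper (50x) end of the claimed range; the 30x lower bound implies $3 billion.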
Technological Innovation: Decoupled Inference Architecture Enhances Efficiency
The Rubin CPX employs a decoupled inference architecture that divides the AI computing process into a context phase and a generation phase. The context phase requires high-throughput compute to process large amounts of input data, while the generation phase relies on fast memory transfers and high-speed interconnects.
This design allows the two phases to be handled independently, so that compute and memory resources can each be optimized precisely. The Rubin CPX is optimized specifically for the compute-intensive context phase and works alongside existing infrastructure to deliver three times the attention acceleration.
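The two-phase split described above can be sketched as follows. All class and function names here are hypothetical illustrations of the general disaggregated-inference pattern, not NVIDIA's actual API or Dynamo's scheduling logic:

```python
# Illustrative sketch of decoupled (disaggregated) inference: the
# compute-bound context phase and the bandwidth-bound generation phase
# run on separate hardware pools. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # size of the input context
    output_tokens: int   # number of tokens to generate

def serve(req: Request) -> dict:
    # Phase 1: context processing on throughput-optimized GPUs
    # (the role Rubin CPX is designed for). Produces attention
    # key/value state for every prompt token.
    kv_cache = {"entries": req.prompt_tokens}  # stand-in for real KV state

    # Phase 2: token generation on bandwidth-optimized GPUs, which
    # consume the KV cache handed over via a high-speed interconnect.
    generated = req.output_tokens

    return {"kv_entries": kv_cache["entries"], "generated": generated}

# A million-token prompt: the context phase dominates the work,
# which is why it benefits from a dedicated compute-heavy chip.
result = serve(Request(prompt_tokens=1_000_000, output_tokens=512))
print(result)
```

The design choice the sketch illustrates: because the context phase scales with prompt length while the generation phase scales with output length, separating them lets each pool be provisioned for its own bottleneck instead of compromising on one chip.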
The platform is equipped with 100TB of high-speed memory and 1.7 petabytes per second of memory bandwidth, connected via NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet and coordinated by the Dynamo platform.
Application Scenarios: Reshaping Software Development and Video Generation
In the field of software development, Rubin CPX enables AI systems to handle entire codebases, maintain cross-file dependencies, and understand repository-level structures. This transforms programming assistants from autocomplete tools into intelligent collaborators capable of understanding "large-scale software projects."
In video generation, the system completes decoding, encoding, and processing on a single chip, and AI models can work with an hour of content, which can require up to 1 million tokens. This gives long-form video generation unprecedented coherence and memory.
Multiple companies have expressed interest in the technology. The AI coding company Cursor plans to use it for code generation, the video creation platform Runway will apply it to video generation workflows, and the AI research company Magic plans to use it to build foundation models with 100-million-token context windows.
Market Impact: Consolidating NVIDIA's Advantage in AI Infrastructure
The release of Rubin CPX further consolidates NVIDIA's leading position in the AI infrastructure space. Analysts estimate that NVIDIA's data center business is expected to reach $184 billion in revenue this fiscal year, surpassing the total revenue of other companies in the industry.
This product reflects NVIDIA's ongoing investment in hardware and software innovation, a pace that competitors have yet to match. By optimizing hardware specifically for certain AI workloads, NVIDIA continues to maintain the industry's reliance on its products.
The new platform is expected to create new possibilities for enterprises building next-generation generative AI applications, particularly in high-value reasoning use cases that require handling large-scale context. It marks an important shift in AI infrastructure from general-purpose computing to dedicated optimization.