At GTC 2026, NVIDIA CEO Jensen Huang dropped another bombshell, officially announcing Vera Rubin, the company's next-generation AI acceleration platform. This is not merely an update to a single chip but a complete system comprising seven new chips and five dedicated racks, providing end-to-end infrastructure, from training to inference, for the coming era of agentic AI. Huang described it as a "leap forward," declaring, "The turning point for Agentic AI has arrived, and Vera Rubin will usher in the largest wave of infrastructure development in history."
From a single chip to a complete factory: Vera Rubin's grand vision
As AI models evolve from simple question-and-answer generation to agentic AI capable of autonomous planning, decision-making, and task execution, computational requirements are changing fundamentally. Future AI will need not only powerful GPUs for model inference but also massive CPU resources to run simulation environments, verify results, call tools, and handle complex logical reasoning.
The Vera Rubin platform was built for this purpose, integrating the new Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU inference accelerator.
These seven chips work together as a complete AI factory, covering every stage from large-scale pre-training, post-training, and test-time scaling to real-time agentic inference.
Anthropic CEO Dario Amodei stated, "Enterprises and developers use Claude for increasingly complex inference, agent-based workflows, and mission-critical decision-making. This requires infrastructure that can keep up. NVIDIA's Vera Rubin platform provides the computing, networking, and system design needed to sustain the service." OpenAI CEO Sam Altman also emphasized, "With NVIDIA Vera Rubin, we will be able to run more powerful models and agents at scale, providing faster and more reliable systems to hundreds of millions of people."
Comprehensive Analysis of Five Major Racks: Creating Dedicated Weapons for Every Aspect of AI Development
The Vera Rubin platform is not a single product, but rather consists of five different rack systems with varying functionalities, configurable to suit different AI workloads:
• NVIDIA Vera Rubin NVL72 Rack
This flagship rack integrates 72 Rubin GPUs and 36 Vera CPUs, interconnected via NVLink 6. Compared to its predecessor, the Blackwell platform, it needs only a quarter as many GPUs to train large mixture-of-experts (MoE) models, while inference throughput per watt increases by up to 10 times and cost per token drops to one-tenth. Designed for hyperscale AI factories, it scales out seamlessly via Quantum-X800 InfiniBand or Spectrum-X Ethernet.
• NVIDIA Vera CPU Rack
Reinforcement learning and agentic AI workloads rely heavily on CPU-intensive environments to test and validate the results produced by GPU models. This rack integrates 256 liquid-cooled Vera CPUs, providing scalable, energy-efficient computing capacity. Compared to traditional CPUs, Vera delivers twice the efficiency and 50% faster execution.
• NVIDIA Groq 3 LPX Rack
LPX is designed for the low-latency, long-context (million-token-scale) requirements of agentic systems. Paired with Vera Rubin, the Rubin GPU and the LPU jointly compute every layer of the model for every output token, delivering up to a 35x improvement in inference throughput per megawatt and opening up to 10x greater revenue potential for models with massive parameter counts.
• NVIDIA BlueField-4 STX Storage Rack
This is AI-native storage infrastructure that seamlessly scales GPU memory across an entire pod (compute cluster). It is optimized for storing and retrieving large language models and the massive key-value (KV) cache data generated by agentic AI workflows. Combined with the new DOCA Memos framework, it can increase inference throughput by up to 5x. Timothée Lacroix, Chief Technology Officer at Mistral AI, notes that this will deliver a key performance boost for the exponential scaling of agentic AI.
• NVIDIA Spectrum-6 SPX Ethernet Rack
Designed to accelerate east-west traffic inside AI factories, it employs Spectrum-X Ethernet photonics with co-packaged optics, delivering up to 5 times better optical power efficiency and 10 times better resilience than traditional pluggable transceivers.
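To make the KV-cache storage mentioned for the BlueField-4 STX rack concrete, here is a minimal toy sketch (not NVIDIA's implementation, and the projections are stand-ins rather than learned weights) of why autoregressive decoding produces a cache that grows with context length: each attention step stores the keys and values of every previous token so they never need to be recomputed, and that per-token K/V data is exactly what a dedicated storage tier would hold and serve.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy head dimension

def attend(q, K, V):
    """Single-query scaled dot-product attention over the cached K/V rows."""
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return w @ V

# Simulate decoding 5 tokens with a growing KV cache.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
outputs = []
for step in range(5):
    x = rng.normal(size=d)           # toy hidden state for the new token
    k, v, q = x, x * 0.5, x          # stand-in projections (real models use learned weights)
    K_cache = np.vstack([K_cache, k])  # cache grows by one row per generated token
    V_cache = np.vstack([V_cache, v])
    outputs.append(attend(q, K_cache, V_cache))

print(K_cache.shape)  # (5, 8): one cached key row per token
```

At million-token context lengths this per-token growth is why agentic workloads produce caches far larger than GPU memory, motivating a storage tier that can spill and retrieve them quickly.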
System-level thinking on energy efficiency and resilience
To address the massive power demands of AI factories, NVIDIA simultaneously launched the NVIDIA DSX platform. DSX Max-Q technology enables dynamic power configuration throughout the AI factory, allowing data centers with a fixed power budget to deploy up to 30% more AI infrastructure. DSX Flex software lets AI factories act as "flexible grid assets," freeing up to 100 gigawatts (GW) of idle grid capacity.
NVIDIA also unveiled the Vera Rubin DSX AI Factory Reference Design, a complete blueprint covering computing, networking, storage, power, and cooling, which will maximize token output per watt and overall effective throughput, further enhance system resilience, and accelerate mass production timelines.
Full support from the ecosystem: products to roll out starting in the second half of 2026.
The Vera Rubin platform has gained support from global cloud service providers and system manufacturers, including AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure, and GPU cloud service providers such as CoreWeave and Lambda. System partners include Cisco, Dell, HPE, Lenovo, Supermicro, and Taiwanese supply chain companies such as Quanta, Wistron, Foxconn, ASUS, and Gigabyte.
AI labs such as Anthropic, Meta, Mistral AI, and OpenAI also plan to use the Vera Rubin platform to train larger, more powerful AI models and to serve long-context, multimodal systems at lower latency and cost.
All Vera Rubin-based products will be available starting in the second half of this year.