At the recent Hot Chips 2025 conference, Noam Shazeer, co-lead of Google DeepMind's Gemini project and a co-author of the Transformer paper "Attention Is All You Need", gave a talk on "Predictions for the Next Phase of AI", laying out where he believes AI development is headed.
Noam Shazeer left Google to found the AI startup Character.AI after Google declined to release the chatbot technology he had developed, then "returned" to Google in a deal reported at roughly US$2.7 billion and became a scientist at Google DeepMind.
What does an LLM want? Computing power, memory, and network bandwidth
In his Hot Chips 2025 talk, Noam Shazeer argued that the most important resource for large language models is compute: more FLOPS means larger models, longer context, and better reasoning capabilities.
Shazeer recalled that in 2015, training a model on 32 GPUs was considered a major undertaking; ten years later, hundreds of thousands of GPUs may be needed to support the latest LLM training runs.
Shazeer believes compute must reach the petaflop level or beyond to meet the demands of large-model training. Larger memory capacity and higher memory bandwidth determine both how big a model can be and how much intermediate state can be kept during inference, which is critical for long contexts and the attention mechanism.
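To make the memory point concrete, here is a back-of-envelope sketch of the intermediate state (the attention KV cache) a decoder-only Transformer must hold during inference. All configuration numbers are illustrative assumptions, not the specs of any real Google model.

```python
def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Rough KV-cache size for a decoder-only Transformer.

    Per token, each layer stores one key and one value vector of size
    heads * head_dim; bytes_per_elem=2 assumes FP16/BF16 storage.
    """
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical config: 80 layers, 64 heads of dim 128, 128k-token context,
# batch size 1, 16-bit values.
gib = kv_cache_bytes(80, 64, 128, 128 * 1024, 1) / 2**30
print(f"{gib:.1f} GiB")  # prints "320.0 GiB"
```

Even at batch size 1, a 128k-token context for this hypothetical model needs hundreds of GiB of cache, which is why memory capacity and bandwidth, not just FLOPS, gate long-context inference.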
Network bandwidth also plays a major role once model parameters are sharded across an entire cluster: cross-chip data exchange must have extremely low latency to speed up inference and support long chain-of-thought reasoning.
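A rough estimate shows why interconnect speed matters when parameters and gradients are spread across chips. The sketch below uses the standard traffic formula for a ring all-reduce, a common collective in distributed training; the model size, device count, and link speed are illustrative assumptions.

```python
def ring_allreduce_seconds(payload_bytes, n_devices, link_gbps):
    """Idealized time for a ring all-reduce, ignoring latency and overlap.

    A ring all-reduce sends about 2 * (n - 1) / n of the payload over
    each device's link; divide by the link's bytes-per-second rate.
    """
    traffic = 2 * (n_devices - 1) / n_devices * payload_bytes
    return traffic / (link_gbps * 1e9 / 8)

# Hypothetical: gradients for a 70B-parameter model in FP16 (140e9 bytes),
# 256 devices, 400 Gb/s per link.
t = ring_allreduce_seconds(70e9 * 2, 256, 400)
print(f"{t:.2f} s")  # prints "5.58 s"
```

Several seconds of pure communication per synchronization step is why real systems rely on faster links, compression, and overlapping communication with compute.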
Shazeer further noted that beyond scaling up hardware, it is reasonable to trade reduced numerical precision (such as FP8 and INT4) for higher performance, but reproducibility (determinism) should not be sacrificed; without it, models cannot be effectively debugged or verified.
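As a minimal illustration of the precision trade-off, here is a sketch of symmetric low-bit quantization: values are mapped to a small integer range and back, shrinking memory at the cost of rounding error. This is a generic textbook scheme, not the method used in any particular Google system; note that it is fully deterministic, in line with the reproducibility requirement above.

```python
def quantize(xs, bits=4):
    """Symmetric quantization to signed integers in [-(2^(bits-1)-1), 2^(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit symmetric
    scale = max(abs(x) for x in xs) / qmax or 1.0  # avoid zero scale
    return [round(x / scale) for x in xs], scale

def dequantize(qs, scale):
    """Map quantized integers back to approximate real values."""
    return [q * scale for q in qs]

weights = [0.12, -0.53, 0.97, -0.08]
qs, scale = quantize(weights)            # e.g. small ints plus one FP scale
approx = dequantize(qs, scale)           # lossy reconstruction of weights
```

Each value is stored as a 4-bit integer plus a shared scale, an 8x saving over FP32, and running `quantize` twice on the same inputs always yields identical outputs, so results remain reproducible.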
Software and hardware synergy drives AI forward
Shazeer, a model researcher who crossed over into hardware, is deeply curious about the TPU's underlying network architecture and has driven projects such as Mesh-TensorFlow. He believes hardware-software co-design is the key to LLM growth: from on-chip SRAM and high-bandwidth memory to cluster network design, everything must be tailored to the model's requirements to realize its full potential.
Shazeer summed it up in one sentence: "With bigger, faster, and more stable clusters, you can train smarter models."
If hardware stops improving, can AGI be achieved?
When an audience member posed a pointed question, "If hardware stopped improving from today, could we still achieve AGI?", Shazeer gave a rare unequivocal answer: yes.
Shazeer believes AI itself will accelerate the evolution of software and system design, so even if hardware stagnates, breakthroughs can still come from algorithmic innovation. He added, however: "If we can get better hardware, that would be even better."
