mashdigi-Technology, new products, interesting news, trends

A closer teardown of the 8th-generation TPU architecture: How do the twin designs, TPU 8t and TPU 8i, support cutting-edge AI and world models?

Google moves to a more explicit "dual architecture" split between training and inference, and completely overhauls its network interconnect technology.

Author: Mash Yang
2026-04-26
in Exhibition, Hardware, Network, Processor, Topics

During Google Cloud Next '26 in Las Vegas this year, Google's infrastructure team went beyond the keynote address and took a deeper dive into the latest TPU development roadmap in a breakout session. Under the theme "How to Design TPU Architecture for Cutting-Edge AI," Google explained the hardware architecture of the 8th-generation TPUs (TPU 8t and TPU 8i) in more detail and showcased the results the AI team Decart has achieved with frontier world models.

▲The TPU 8t and TPU 8i represent Google's biggest recent changes to its network infrastructure.

From an overall architecture perspective, Google has concluded that a single chip design cannot both train massive models with enormous parameter counts and serve inference at extremely low latency. The 8th-generation TPU therefore adopts a more explicit dual-architecture split and completely overhauls its network interconnect technology.


Continuity and Breakthrough: Ironwood Architecture, 3D Torus, and Virgo Network

In this technical session, Google emphasized that the 8th-generation TPU continues the foundation laid by the previous-generation Ironwood architecture and further improves RDMA (Remote Direct Memory Access) performance. Combined with upgraded TPUDirect Storage technology, this minimizes the latency of data transfers between the chips and the storage cluster. For cutting-edge AI models that frequently read large datasets, it significantly reduces the time compute units sit idle waiting for data.

▲The 8th-generation TPU builds on the Ironwood foundation with faster RDMA (Remote Direct Memory Access) and upgraded TPUDirect Storage, minimizing data-transfer latency between the chips and the storage cluster.
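The effect of storage latency on idle compute time can be sketched with a simple step-time model. This is only an illustration under stated assumptions: the function names and the millisecond figures in the usage note are hypothetical, not numbers from Google's session, and the model ignores everything except one data load and one compute phase per step.

```python
def step_time_ms(load_ms: float, compute_ms: float, overlapped: bool = True) -> float:
    """Duration of one step given data-load and compute durations.

    Assumption: with prefetching (overlapped=True) the next batch is loaded
    while the current one is being computed, so the slower phase dominates.
    Without overlap the two phases simply add up.
    """
    if overlapped:
        return max(load_ms, compute_ms)
    return load_ms + compute_ms


def idle_fraction(load_ms: float, compute_ms: float, overlapped: bool = True) -> float:
    """Fraction of each step the compute units spend waiting for data."""
    total = step_time_ms(load_ms, compute_ms, overlapped)
    return (total - compute_ms) / total


if __name__ == "__main__":
    # Hypothetical figures: 100 ms of compute per step, two storage latencies.
    for load in (150.0, 75.0):
        print(f"load={load} ms -> idle {idle_fraction(load, 100.0):.1%}")
```

In this toy model, once the load time drops below the compute time the compute units no longer wait at all, which is the kind of outcome lower storage latency is meant to approach.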

Furthermore, the TPU 8t adds a large language model (LLM) decoding engine to its SparseCore co-processing core, raising arithmetic intensity by up to 30 times and delivering a 5-fold gain in computational efficiency for models such as DLRM DCN v2. The TPU 8i instead uses SparseCore as a collective acceleration engine and pairs it with 384MB of integrated SRAM (static random-access memory) for key-value (KV) caching. This strengthens short-term memory during large-model inference, cutting the compute and memory spent on reprocessing repeated data and improving inference efficiency.

▲The TPU 8t adds a large language model (LLM) decoding engine to its SparseCore co-processing core, raising arithmetic intensity by up to 30 times and improving the computational efficiency of models such as DLRM DCN v2 by 5 times.
▲The TPU 8i uses SparseCore as a collective acceleration engine, combined with 384MB of integrated SRAM (static random-access memory) for key-value (KV) caching. This strengthens short-term memory during large-model inference, cutting the compute and memory spent on repeated data while improving inference efficiency.
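To give a feel for what 384MB of KV cache can hold, here is a rough sizing sketch using the standard transformer KV-cache formula (one key and one value tensor per layer). The model dimensions in the usage note are hypothetical, the 384MB figure is read as binary megabytes, and the source does not say how much of the SRAM is actually reserved for the cache.

```python
SRAM_BYTES = 384 * 1024 * 1024  # 384MB from the article, assumed to mean MiB


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_value: int = 2) -> int:
    """Size of a transformer KV cache.

    The leading factor 2 covers the separate key and value tensors that
    every layer stores for every cached token.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


def max_tokens_in_sram(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2,
                       sram_bytes: int = SRAM_BYTES) -> int:
    """How many tokens of KV cache fit if all of the SRAM held the cache."""
    per_token = kv_cache_bytes(layers, kv_heads, head_dim, 1, bytes_per_value)
    return sram_bytes // per_token


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16-bit values.
    print(kv_cache_bytes(32, 8, 128, tokens=1))  # bytes per cached token
    print(max_tokens_in_sram(32, 8, 128))        # tokens that fit in 384 MiB
```

For this hypothetical model each token costs 128 KiB of cache, so the SRAM would hold 3,072 tokens; larger models or longer contexts would spill into HBM.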

In terms of scaling, Google adopts a two-pronged strategy: vertical and horizontal scaling.

• Scale-up (vertical scaling): Within a single Pod, Google still uses the mature and efficient 3D Torus network topology. With ICI (Inter-Chip Interconnect) bandwidth doubled once again, adjacent TPU chips can exchange data at ultra-high bandwidth and extremely low latency. This lets all TPUs in a single cluster operate like one giant chip, which suits tightly coupled computing tasks.

▲Within a Pod, the TPU keeps the mature and efficient 3D Torus network topology. With ICI (Inter-Chip Interconnect) bandwidth doubled, adjacent TPU chips exchange data at ultra-high bandwidth and extremely low latency, so all TPUs in a single cluster can operate like one giant chip.

• Scale-out (horizontal scaling): To break through the physical limits of a single Pod and link Pods together into data-center-scale computing power, enough to train next-generation frontier AI models, Google detailed its new Virgo Network technology. Virgo Network is designed for massive pools of compute that span clusters and even data centers, maintaining stable, high network throughput and fault tolerance even during distributed computing across tens of thousands of TPUs.

▲Google detailed the new Virgo Network technology, which is designed for massive pools of compute that span clusters and even data centers and maintains stable, high network throughput and fault tolerance across tens of thousands of TPUs.
▲Beyond pre-training on the TPU 8t, outputs from the inference stage are fed back into further training, forming a positive learning loop.
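The scale-up point about the 3D Torus can be made concrete with a small sketch of how neighbors and hop counts work in a torus with wraparound links. This is a generic illustration of the topology, not Google's implementation; the 4x4x4 grid in the usage note is a hypothetical size chosen only for the example.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the six directly linked neighbors of a chip in a 3D torus.

    Each axis wraps around, so the chip at position 0 is linked to the chip
    at position dims[axis] - 1 on that axis.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append((n[0], n[1], n[2]))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of links between two chips, using wraparound if shorter."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical grid size, not a real Pod configuration
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 3, 3), dims))
```

Because of the wraparound links, the corner-to-corner distance in this 4x4x4 example is 3 hops rather than the 9 hops a plain 3D mesh would need, which is part of why the topology suits tightly coupled workloads.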

Dual architectures: TPU 8t specializes in extreme training, while TPU 8i optimizes inference cost.

Google's past TPU designs mostly tried to balance training and inference (for example the earlier v4 and Trillium, and even the previous-generation Ironwood). As the generative AI industry has matured, however, workloads have diverged sharply, so training and inference must be fully separated to achieve higher efficiency in training and better cost-effectiveness in inference. Unlike in the past, it is no longer enough to take the same TPU and vary memory and other architectural settings to cover both training and inference.

• TPU 8t (training): The "t" suffix stands for training. This architecture is designed for pre-training massive language models and multimodal models. It carries the largest-capacity HBM (high-bandwidth memory) and maximizes the peak compute of the matrix multiplication unit (MXU). Its design philosophy is to pursue compute density and memory bandwidth at all costs, shortening the training cycle of frontier models.

▲To handle pre-training of massive language models and multimodal models, the TPU 8t carries the largest-capacity HBM high-bandwidth memory and maximizes the peak compute of the matrix multiplication unit (MXU).
▲Compared with its predecessor, the "Ironwood" TPU, the TPU 8t significantly improves computational accuracy and quantization performance.
▲The TPU 8t delivers up to 121 EFLOPS of compute in a single Pod and can connect more than 100 million TPUs in a single training cluster. Compared with the previous-generation "Ironwood" TPU, it improves performance per dollar by 2.7 times and performance per watt by 2 times.

• TPU 8i (inference): The "i" suffix stands for inference. This architecture drops some of the complex instructions dedicated to training and spends the chip area on larger SRAM caches and higher-speed I/O throughput. The goal is the lowest time to first token (TTFT) and the highest data throughput during model deployment. It also uses HBM3e high-bandwidth memory, which offers higher transfer bandwidth than the HBM3 used in the TPU 8t, while significantly reducing the operating cost per API call.

▲The TPU 8i integrates 384MB of SRAM (static random-access memory) and uses HBM3e high-bandwidth memory with higher transfer bandwidth, improving inference efficiency while significantly reducing the operating cost per API call.
▲The TPU 8i adopts a brand-new "Boardfly topology" that can directly connect 1,152 TPU 8i units in a single compute Pod.
▲This design allows massive amounts of key-value (KV) cache data to be held entirely in memory, achieving near-zero-latency execution and raising inference performance per dollar by 80% compared with the previous-generation "Ironwood" TPU.
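The performance-per-dollar and performance-per-watt figures quoted above translate into relative cost as a simple reciprocal. The sketch below works through that arithmetic; it assumes the workload and pricing basis are the same as in Google's own comparison, which the session summary does not spell out, and the function name is hypothetical.

```python
def relative_cost(gain: float) -> float:
    """Cost of running the same workload relative to the baseline.

    A gain of 2.0 in performance per dollar (or per watt) means the same
    work needs half the dollars (or energy), so the relative cost is 1 / gain.
    """
    if gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / gain


if __name__ == "__main__":
    # Figures quoted in the article, all relative to the "Ironwood" TPU.
    print(f"TPU 8t training cost:  {relative_cost(2.7):.1%} of Ironwood")
    print(f"TPU 8t energy use:     {relative_cost(2.0):.1%} of Ironwood")
    print(f"TPU 8i inference cost: {relative_cost(1.8):.1%} of Ironwood")
```

Read this way, a 2.7x gain means the same training job would cost roughly 37% as much, and an 80% gain means the same inference traffic would cost roughly 56% as much.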

The Moat of the Software Ecosystem: The Advantages of PyTorch on TPU

Of course, even the most powerful hardware is useless without a developer ecosystem behind it. Facing the formidable ecosystem barrier that NVIDIA has built with CUDA, Google devoted considerable time to the design advantages of the upcoming PyTorch on TPU.

▲The open-source PyTorch can now run on TPUs, meaning that more PyTorch-based applications and services will be able to run faster via TPUs.

By continuously optimizing the XLA (Accelerated Linear Algebra) compiler, Google lets PyTorch developers move models that originally ran on GPUs to TPU clusters with "zero code modification" or "minimal modification," reducing the porting cost of migrating workloads between computing platforms.

PyTorch/XLA can now automatically translate dynamic graphs into static graphs that TPUs execute efficiently. It also supports advanced techniques such as Automatic Mixed Precision (AMP) training and Fully Sharded Data Parallel (FSDP), so startups can easily migrate existing PyTorch projects to the TPU 8t for large-scale training.
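The general idea of turning eager-looking code into a static graph can be shown with a toy tracer in plain Python. This is only a conceptual sketch: the `LazyValue`, `Graph`, and `trace` names are hypothetical, it supports only addition and multiplication between traced values, and it does not use or reproduce the PyTorch/XLA API or its compiler.

```python
from typing import Callable, List, Optional, Tuple


class LazyValue:
    """Placeholder that records operations instead of computing them."""

    def __init__(self, graph: "Graph", index: int) -> None:
        self.graph = graph
        self.index = index

    def __add__(self, other: "LazyValue") -> "LazyValue":
        return self.graph.record("add", self, other)

    def __mul__(self, other: "LazyValue") -> "LazyValue":
        return self.graph.record("mul", self, other)


class Graph:
    """Static list of operations captured from one run of a Python function."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.ops: List[Tuple[str, int, int]] = []
        self.output: Optional[int] = None

    def record(self, op: str, left: LazyValue, right: LazyValue) -> LazyValue:
        self.ops.append((op, left.index, right.index))
        return LazyValue(self, self.num_inputs + len(self.ops) - 1)

    def run(self, *inputs: float) -> float:
        """Replay the recorded operations on concrete numbers."""
        if len(inputs) != self.num_inputs or self.output is None:
            raise ValueError("graph was traced with a different signature")
        values = list(inputs)
        for op, left, right in self.ops:
            a, b = values[left], values[right]
            values.append(a + b if op == "add" else a * b)
        return values[self.output]


def trace(fn: Callable[..., LazyValue], num_inputs: int) -> Graph:
    """Run fn once on placeholders and capture what it did as a Graph."""
    graph = Graph(num_inputs)
    result = fn(*[LazyValue(graph, i) for i in range(num_inputs)])
    graph.output = result.index
    return graph


def model(x, w, b):
    # Ordinary-looking Python; no graph-building code is visible here.
    return x * w + b


if __name__ == "__main__":
    graph = trace(model, num_inputs=3)
    print(graph.ops)           # the captured static graph
    print(graph.run(2, 3, 4))  # replay on concrete inputs
```

The point of the sketch is that `model` is written as normal imperative code, while the captured `graph.ops` list is a fixed structure that a compiler could optimize and replay many times.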

A review of the history of TPU development

Looking back at the development of Google TPU, its architectural evolution is almost a microcosm of the history of modern AI development:

• TPU v1 (2015): Focused on inference, accelerating AlphaGo and Google's internal search workloads.

• TPU v2/v3 (2017-2018): Added floating-point operations and HBM high-bandwidth memory, formally entering model training and introducing the concept of Pod clusters.

• TPU v4 (2021): Introduced optical circuit switches and the 3D Torus architecture, setting a milestone of exaflop-level computing power.

• TPU v5e / v5p (2023): The first attempt to split the product line in two, with one part focused on cost-effectiveness (v5e) and the other on extreme performance (v5p), laying the groundwork for the later split. At this stage, however, the two were still differentiated mainly by memory configuration and were essentially based on the same TPU design.

• TPU v6 "Trillium" (2024): Fully embraced generative AI with significantly better energy efficiency and memory bandwidth while handling both training and inference, and became the design basis for the subsequent "Ironwood".

• TPU v8 8t / 8i (2026): Formally established the dual-architecture (t/i) strategy for training and inference, and pushed horizontal scaling of computing power to a new level through the Virgo Network.

▲The development history of Google TPU over 10 years
▲Currently, TPU-accelerated computing is not only used for Google's own services, but is also provided to various third-party businesses through cloud platforms.

Decart Frontier World Model Applications

In the second half of the session, Google also invited Decart, a well-known AI team, to share its results from running a frontier world model on the 8th-generation TPU.

Decart pointed out that world models must handle extremely complex physical laws and time-series generation, which places stringent demands on memory bandwidth and interconnect latency. Using the TPU 8t's ICI interconnect and TPUDirect Storage, Decart minimized data-loading bottlenecks and achieved a near-real-time interactive inference experience on the TPU 8i, demonstrating the practical value of Google's dual-architecture design and underlying network upgrades.

Using Decart's frontier world model results to illustrate the performance of the 8th-generation TPU is clearly a response to earlier criticism that "Google's TPUs struggle to handle workloads like Decart's that require real-time rendering." Google emphasized that current TPUs are fully supported by PyTorch on TPU and are therefore compatible with projects built on PyTorch. This also means Google will continue working with open-source computing companies to optimize applications.

▲Decart points out that world models need to handle extremely complex physical laws and time series generation, which places extremely stringent requirements on memory bandwidth and interconnect latency.
Tags: AI, Google, Google Cloud, Google Cloud Next, Google Cloud Next 2026, TPU, TPU 8i, TPU 8t, Artificial intelligence, Inference, Training
Mash Yang

Founder and editor of mashdigi.com, and student of technology journalism.

Copyright © 2017 mashdigi.com
