In addition to Intel presenting the compute performance of its server GPU platform code-named "Ponte Vecchio" at Hot Chips 34, the annual high-performance computing conference, NVIDIA also detailed its Superchip design, explaining the computing platform built around its "Grace" CPU.
Although NVIDIA had previously stated that the "Grace" CPU would be built on TSMC's 5nm process, it has now specified TSMC's "4N" process, an enhanced version of the 5nm node.
The "Grace" CPU itself is based on the Arm Neoverse architecture and supports the Armv9 instruction set, including the SVE2 vector instructions used in supercomputing. This suggests that "Grace" is built on the Neoverse N2 architecture, code-named "Perseus", which supports PCIe Gen 5.0, DDR5, HBM3, CCIX 2.0, and CXL 2.0 interfaces.
A Grace-based Superchip can pair two Grace CPUs into a 144-core computing platform, or combine one Grace CPU with a Hopper GPU into a heterogeneous computing platform. On the design side, NVIDIA stresses that Grace is not meant to replace x86 CPUs but to offer more flexibility in how computing platforms are built.
To move data between Grace CPUs, or between a Grace CPU and a Hopper GPU, NVIDIA uses an interconnect fabric called NVIDIA SCF (Scalable Coherency Fabric) to link the CPU cores, memory, and I/O ports, providing up to 3.2 TB/s of bandwidth. Combined with NVIDIA's existing NVLink-C2C interconnect, it speeds up the transfer of data between the different computing elements.
The NVIDIA SCF cache partitions serve all 72 cores of the "Grace" CPU, providing 117MB of L3 cache in total, and support Memory Partitioning and Monitoring (MPAM), a feature introduced in Armv8.4. Through Coherent NVLink, coherence can be extended across up to four sockets. Each Cache Switch Node (CSN) connects two cores and two SCF cache partitions, and each CSN can reach LPDDR5X memory, NVLink-C2C, or PCIe/cNVLink.
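As a quick illustration of these cache figures, the sketch below works out the average L3 capacity per core and how many CSNs a 72-core chip would need if every CSN serves two cores. The assumption that the two-cores-per-CSN ratio holds for every CSN, and the helper names, are ours for illustration; the resulting CSN count is derived, not a figure NVIDIA published, and the per-core average says nothing about how the L3 slices are physically laid out.

```python
# Back-of-the-envelope arithmetic for the Grace L3 cache and CSN figures.
# Assumption (hypothetical): every Cache Switch Node (CSN) serves exactly two
# cores; the resulting CSN count is derived here, not an NVIDIA figure.

TOTAL_CORES = 72      # Grace cores per CPU, as reported
TOTAL_L3_MB = 117     # total L3 across the SCF cache partitions, as reported
CORES_PER_CSN = 2     # assumed to hold for every CSN


def l3_per_core_mb(total_l3_mb: float, cores: int) -> float:
    """Average L3 capacity per core in MB (not the physical slice size)."""
    return total_l3_mb / cores


def csn_count(cores: int, cores_per_csn: int) -> int:
    """Number of CSNs needed if each one serves cores_per_csn cores."""
    return -(-cores // cores_per_csn)  # integer ceiling division


if __name__ == "__main__":
    print(f"Average L3 per core: {l3_per_core_mb(TOTAL_L3_MB, TOTAL_CORES):.3f} MB")
    print(f"CSNs for {TOTAL_CORES} cores: {csn_count(TOTAL_CORES, CORES_PER_CSN)}")
```

With the reported totals this gives an average of 1.625MB of L3 per core and 36 CSNs under the stated assumption.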
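In addition, the "Grace" CPU groups every four cores into one compute cluster, for a total of 18 clusters. It supports up to 68 PCIe lanes as well as four PCIe 5.0 x16 links, each of which provides up to 128GB/s of bidirectional bandwidth, and it also supports 12 Coherent NVLink and NVLink-C2C links.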
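The cluster and PCIe figures can be cross-checked with simple arithmetic. The sketch below assumes the standard PCIe 5.0 signalling rate of 32 GT/s per lane with 128b/130b encoding; the function names are hypothetical. Eighteen four-core clusters give the 72 cores per CPU (and 144 for a dual-Grace Superchip), and the 128GB/s figure matches the raw x16 rate counted in both directions, with encoding overhead trimming it to about 126GB/s.

```python
# Cross-check of the Grace cluster count and PCIe 5.0 x16 bandwidth.
# PCIe 5.0: 32 GT/s per lane, 128b/130b encoding; function names are hypothetical.

CORES_PER_CLUSTER = 4
CLUSTERS = 18
PCIE5_GT_PER_LANE = 32.0     # gigatransfers per second per lane
PCIE5_ENCODING = 128 / 130   # payload fraction after 128b/130b line coding


def cores_per_cpu(clusters: int, cores_per_cluster: int) -> int:
    """Total cores on one Grace CPU."""
    return clusters * cores_per_cluster


def pcie_bidirectional_gbs(lanes: int, gt_per_lane: float,
                           efficiency: float = 1.0) -> float:
    """Bidirectional bandwidth in GB/s (one bit per transfer per lane)."""
    per_direction = lanes * gt_per_lane * efficiency / 8  # Gb/s -> GB/s
    return 2 * per_direction


if __name__ == "__main__":
    cores = cores_per_cpu(CLUSTERS, CORES_PER_CLUSTER)
    print(f"Cores per CPU: {cores}, per dual-Grace Superchip: {2 * cores}")
    raw = pcie_bidirectional_gbs(16, PCIE5_GT_PER_LANE)
    coded = pcie_bidirectional_gbs(16, PCIE5_GT_PER_LANE, PCIE5_ENCODING)
    print(f"PCIe 5.0 x16: {raw:.0f} GB/s raw, {coded:.1f} GB/s after encoding")
```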
NVIDIA chose LPDDR5X memory for the "Grace" CPU mainly because HBM is comparatively expensive, while LPDDR5X keeps overall power consumption and cost low. Grace can be configured with 32 memory channels for a total capacity of up to 512GB and a bandwidth of 546GB/s.
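The memory totals can also be broken down per channel. This is a minimal sketch under an assumption not stated in the article: that each of the 32 channels is 16 bits wide. Under that assumption, 546GB/s works out to about 17GB/s and roughly 8,531 MT/s per channel, close to the LPDDR5X-8533 speed grade, and 512GB to 16GB per channel. Treat the implied data rate as a consistency check, not a published specification; the helper names are hypothetical.

```python
# Per-channel breakdown of the Grace LPDDR5X configuration.
# Assumption (hypothetical): 16-bit channels; the implied data rate is derived
# from the article's totals, not taken from an NVIDIA specification.

CHANNELS = 32
TOTAL_CAPACITY_GB = 512
TOTAL_BANDWIDTH_GBS = 546
CHANNEL_WIDTH_BITS = 16   # assumed LPDDR5X channel width


def per_channel(total: float, channels: int) -> float:
    """Split a total (capacity or bandwidth) evenly across channels."""
    return total / channels


def implied_data_rate_mts(bandwidth_gbs: float, width_bits: int) -> float:
    """Data rate in MT/s that a width_bits channel needs for bandwidth_gbs."""
    bytes_per_transfer = width_bits / 8
    return bandwidth_gbs * 1000 / bytes_per_transfer


if __name__ == "__main__":
    cap = per_channel(TOTAL_CAPACITY_GB, CHANNELS)
    bw = per_channel(TOTAL_BANDWIDTH_GBS, CHANNELS)
    rate = implied_data_rate_mts(bw, CHANNEL_WIDTH_BITS)
    print(f"{cap:.0f} GB and {bw:.2f} GB/s per channel, ~{rate:.0f} MT/s")
```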
In overall compute performance, the "Grace" CPU trails AMD's third-generation EPYC server processor, code-named "Milan", slightly, but it offers better power efficiency and can deliver higher computing performance when paired with the "Hopper" GPU.


