NVIDIA says the H100 Tensor Core GPU, announced last year, has once again set records in the MLPerf benchmark, beating by nearly a factor of three the record it set about six months ago.
The NVIDIA Eos AI supercomputer, built with 10,752 H100 Tensor Core GPUs and Quantum-2 InfiniBand networking, completed a training benchmark based on OpenAI's GPT-3 large language model, which has 175 billion parameters, in just 3.9 minutes. That is nearly three times faster than the 10.9-minute record set about six months ago.
Because the benchmark trains on only a portion of the GPT-3 dataset, training on the full dataset would still take about eight days, which is roughly 73 times faster than a previous state-of-the-art system built on 512 A100 GPUs.
NVIDIA stated that tripling the number of GPUs yielded a 2.8x performance gain, a scaling efficiency of 93 percent, owed in part to software optimization. Shortening training time also means significantly accelerating the development of artificial intelligence.
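The relationship between the figures above can be checked with simple arithmetic: dividing the observed speedup by the increase in GPU count gives the scaling efficiency. A minimal illustration (not NVIDIA's methodology), using the times and GPU ratio reported in this article:

```python
def scaling_efficiency(speedup: float, resource_ratio: float) -> float:
    """Fraction of ideal (linear) scaling actually achieved."""
    return speedup / resource_ratio

# Figures from the article: the benchmark time fell from 10.9 minutes
# to 3.9 minutes after the GPU count was roughly tripled.
speedup = 10.9 / 3.9                            # ~2.8x faster
efficiency = scaling_efficiency(speedup, 3.0)   # ~0.93, i.e. 93%

print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

With perfectly linear scaling, three times the GPUs would give exactly a 3x speedup; the gap between 2.8x and 3x reflects the communication and synchronization overhead of distributed training.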
In the same round of tests, NVIDIA reported that training of recommendation models was 1.6 times faster than before, and that the computer vision model RetinaNet trained 1.8 times faster. It also emphasized that the H100 GPU delivered the highest performance and greatest computing scalability across all nine MLPerf tests. This means that AI services that require training large language models, or that use frameworks such as NVIDIA NeMo, can reach the market sooner, at lower training cost and with less energy consumption.
Since its launch in May 2018, the MLPerf benchmark has been adopted by organizations including Amazon, Arm, Baidu, Google, Harvard University, HPE, Intel, Lenovo, Meta, Microsoft, Stanford University, and the University of Toronto for its objectivity and transparency. It has also become the benchmark NVIDIA uses to measure the performance of its supercomputers and accelerated computing products.
