mashdigi-Technology, new products, interesting news, trends

NVIDIA Blackwell Sets New AI Inference Performance Record: Wins Across All InferenceMAX v1 Benchmarks
InferenceMAX v1 benchmark focuses on efficiency rather than pure speed

Author: Mash Yang
2025-10-10
in Market dynamics, network, Processor

Semiconductor industry analyst firm SemiAnalysis has released its InferenceMAX v1 benchmark results, showing that NVIDIA's Blackwell-architecture GPUs swept every test item, setting a new bar for performance, energy efficiency, and overall economics.

This new benchmark is considered the first independent evaluation that truly reflects the total cost of AI inference, covering a variety of models and real-world application scenarios and focusing on efficiency rather than pure speed.

The AI Factory Formula for a 15x ROI

The report indicates that a $5 million investment in an NVIDIA GB200 NVL72 system can generate as much as $75 million in DSR1 token revenue from AI applications, a return on investment of up to 15x. Inference performance is therefore no longer just a technical indicator, but a key engine of enterprise operational profitability.

“Inference is at the heart of how AI delivers value every day,” said Ian Buck, vice president of Hyperscale and High-Performance Computing at NVIDIA. “Blackwell’s achievements demonstrate that our end-to-end approach enables customers to achieve both extreme performance and optimal efficiency when deploying AI at scale.”

Blackwell Architecture: Dual-track Drive for Performance and Efficiency

In the InferenceMAX v1 benchmark, Blackwell-based B200 GPUs delivered impressive results across multiple models, achieving a throughput of 60,000 tokens per second per GPU and up to 1,000 tokens per second (TPS) per user. That is a 4x throughput gain over the previous-generation H200 GPU, while the compute cost per million tokens fell 15-fold to just $0.02, the lowest in the industry.
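The cost figure follows from simple arithmetic once throughput and an hourly machine rate are fixed. A quick sketch of that conversion (the hourly rate below is a hypothetical placeholder chosen to reproduce the quoted $0.02, not a figure from the report):

```python
# Back-of-envelope conversion: per-GPU throughput plus an amortized hourly
# rate yields a cost per million generated tokens.
def cost_per_million_tokens(tokens_per_second: float, usd_per_gpu_hour: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_gpu_hour / tokens_per_hour * 1_000_000

# At 60,000 tokens/s per GPU, an all-in rate of about $4.32 per GPU-hour
# works out to roughly the quoted $0.02 per million tokens.
print(cost_per_million_tokens(60_000, 4.32))
```

The same helper can be run in reverse: given a target price per million tokens, it shows what combination of throughput and hourly cost a deployment must hit.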

This performance is enabled by NVIDIA's new TensorRT-LLM v1.0 inference framework and NVLink Switch high-speed interconnect technology, which provides 1,800 GB/s of bidirectional bandwidth, allowing up to 72 GPUs to operate as a single super GPU.

Open Source Collaboration Advances the Inference Revolution

NVIDIA has also collaborated with several AI research teams, including OpenAI (gpt-oss 120B), Meta (Llama 3 70B), and DeepSeek AI (DeepSeek R1), to optimize open-source inference performance. Furthermore, collaborative development with communities such as FlashInfer, SGLang, and vLLM has enabled TensorRT-LLM to fully exploit the parallelization potential of Blackwell.

In addition, the newly released gpt-oss-120B-Eagle3-v2 model introduces speculative decoding, which predicts several tokens ahead in a single pass, significantly reducing latency and tripling per-user throughput.
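Speculative decoding is at heart a draft-then-verify loop: a cheap draft model proposes several tokens at once and the expensive target model only checks them, accepting the longest agreeing prefix. The toy sketch below shows that control flow only; the two model functions are invented stand-ins, not real LLMs:

```python
# Toy illustration of speculative decoding. Both "models" are deterministic
# stand-in functions so the control flow is easy to follow.

def draft_model(context, k):
    # Hypothetical fast drafter: propose the next k tokens in one shot.
    return [(context[-1] + 1 + i) % 100 for i in range(k)]

def target_model(context):
    # Hypothetical slow, accurate model: the "true" next token.
    return (context[-1] + 1) % 100

def speculative_step(context, k=4):
    proposals = draft_model(context, k)
    accepted = []
    for tok in proposals:
        if tok == target_model(context + accepted):
            accepted.append(tok)  # draft agreed with the target model
        else:
            # Reject the draft token, emit the target model's token, and stop.
            accepted.append(target_model(context + accepted))
            break
    return accepted

print(speculative_step([7]))  # all 4 draft tokens accepted in this toy setup
```

When the draft model agrees often, each target-model pass yields several tokens instead of one, which is where the latency reduction comes from.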

The Balance Between Economy and Sustainability

InferenceMAX uses a Pareto-frontier model to evaluate the trade-off between performance, energy consumption, and responsiveness. The results show that Blackwell not only leads in throughput but also sets new records in energy efficiency and cost control, including a 10x increase in throughput per megawatt over the previous generation and a significant rise in token output per watt, easing the energy burden on data centers.
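A Pareto-frontier evaluation keeps only the configurations that no other configuration beats on every axis at once. A minimal sketch with invented data points, where each point is (throughput, tokens per watt) and higher is better on both axes:

```python
# Keep only non-dominated configurations: a point is dominated if some other
# point is at least as good on both axes. Data points are invented examples.
def pareto_frontier(points):
    """points: list of (throughput, tokens_per_watt); both higher is better."""
    frontier = []
    for p in points:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

configs = [(1000, 5.0), (800, 9.0), (1200, 3.0), (900, 4.0)]
print(pareto_frontier(configs))  # (900, 4.0) is dominated by (1000, 5.0)
```

Plotting the surviving points traces the frontier curve: every point on it represents a defensible deployment choice, trading raw throughput against efficiency.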

Conclusion: Benchmarks in the AI Factory Era

As AI evolves from single-shot generation to multi-step reasoning and toolchain integration, inference performance will directly determine the economies of scale of AI services. Through the Blackwell architecture, NVIDIA has successfully translated performance into revenue, making the concept of the AI factory a reality.

The debut of InferenceMAX is not only a technology showcase, but also a symbol that NVIDIA is leading the industry into a new era of the "Inference Economy."

Tags: AI, Blackwell, InferenceMAX v1, NVIDIA, SemiAnalysis, Artificial intelligence
Mash Yang

Founder and editor of mashdigi.com, and student of technology journalism.

Copyright © 2017 mashdigi.com
