Alibaba recently unveiled "Qwen3-Max-Thinking," its most powerful flagship reasoning model to date. Alibaba says the model, which has more than a trillion parameters, outperforms OpenAI's GPT-5.2-Thinking and Google's Gemini 3 Pro on several authoritative evaluations. It also posted the world's highest score on HLE (Humanity's Last Exam), a benchmark billed as the "last exam" for humankind.
A lead of more than 10 points on HLE, with reasoning as the biggest highlight
The biggest selling point of Qwen3-Max-Thinking is its "deep reasoning" capability. According to figures released by Alibaba, the model was pretrained on up to 36 trillion tokens and performed strongly across 19 widely recognized benchmarks.
Most notably, Qwen3-Max-Thinking scored 58.3 on HLE, well ahead of GPT-5.2-Thinking (45.5) and Gemini 3 Pro (45.8), a margin of roughly 12.5 to 13 points. In AI benchmarking, a gap of more than 10 points is often described as a "generational" lead, and the result points to the model's strength on complex mathematical, logical, and multi-step problems.
A proprietary "experience extraction" mechanism makes the AI progressively smarter
Why is it so powerful? The key lies in Alibaba's new "test-time scaling" mechanism.
Unlike traditional approaches that simply add more reasoning paths (a brute-force strategy), Qwen3-Max-Thinking incorporates an "experience extraction" technique. According to Alibaba, it identifies and prunes redundant lines of reasoning and concentrates compute on the most promising branches, which significantly improves reasoning efficiency and lowers costs for enterprise applications.
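Alibaba has not published how "experience extraction" actually works. The sketch below only illustrates the general idea described above, under stated assumptions: candidate reasoning branches are de-duplicated, and a fixed compute budget is then spent on the highest-scoring survivors. The `Branch` type, the `score` field (standing in for some verifier or reward model), the normalization-based redundancy test, and `prune_branches` are all hypothetical names chosen for illustration, not Qwen's API or algorithm.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    steps: list[str]  # reasoning steps taken so far on this branch
    score: float      # hypothetical value estimate from a verifier/reward model


def _signature(branch: Branch) -> tuple[str, ...]:
    # Hypothetical redundancy test: two branches count as duplicates if their
    # steps are identical after lowercasing and collapsing whitespace.
    return tuple(" ".join(step.lower().split()) for step in branch.steps)


def prune_branches(branches: list[Branch], budget: int) -> list[Branch]:
    """Drop redundant branches, then keep only the `budget` best-scoring ones."""
    best_by_sig: dict[tuple[str, ...], Branch] = {}
    for b in branches:
        sig = _signature(b)
        # Keep only the highest-scoring representative of each duplicate group.
        if sig not in best_by_sig or b.score > best_by_sig[sig].score:
            best_by_sig[sig] = b
    unique = sorted(best_by_sig.values(), key=lambda b: b.score, reverse=True)
    return unique[:budget]


if __name__ == "__main__":
    candidates = [
        Branch(["Factor the polynomial", "Check roots"], 0.9),
        Branch(["factor the  polynomial", "check roots"], 0.7),  # redundant copy
        Branch(["Guess and check"], 0.4),
        Branch(["Use the quadratic formula"], 0.8),
    ]
    for b in prune_branches(candidates, budget=2):
        print(b.score, b.steps)
    # prints:
    # 0.9 ['Factor the polynomial', 'Check roots']
    # 0.8 ['Use the quadratic formula']
```

Compared with brute force, which would expand all four candidates, this toy version discards the duplicate and the weakest branch before any further compute is spent. A production system would need a far richer notion of redundancy than exact string matching.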
In addition, for AI agent applications, the new model is better at calling tools autonomously. Rather than relying only on simple back-and-forth dialogue, it can proactively decide when to search the web, when to write code, and when to consult a knowledge base, which Alibaba says greatly reduces model "hallucinations."
Surpassing Llama to become the new global open-source leader
Beyond its technical breakthroughs, Qwen's achievements in the open-source ecosystem are also remarkable. According to Hugging Face data, more than 200,000 derivative models have been built on Qwen, with cumulative downloads exceeding 10 billion and an average of 110 million downloads per day. By these figures, Qwen has overtaken Meta's Llama series as developers' preferred open-source base model worldwide.
Developers can currently try Qwen3-Max-Thinking for free on Qwen Chat, while enterprise users can call it through the API on Alibaba Cloud's Bailian (Model Studio) platform.
Analysis
The arrival of Qwen3-Max-Thinking shows that China can compete on equal footing with, and in places even surpass, the Silicon Valley giants in the field of reasoning models.
Of particular note is the use of test-time scaling. In the past, AI capability was thought to depend mainly on the scale of pretraining; now the battleground has shifted to computational efficiency at inference time.
Alibaba raises efficiency by optimizing how the model thinks, which is crucial for commercial deployment: businesses need AI that is both smart and affordable, not experimental products that simply burn money.
Meanwhile, Qwen's dominance in the open-source community is building a deep moat for Alibaba. As millions of developers worldwide grow accustomed to building applications on Qwen, that habit in turn drives demand for Alibaba Cloud's infrastructure. As with Android in its early days, whoever controls the developer ecosystem will set the terms of the AI era.