OpenAI continues to expand its model lineup, announcing two new small-scale models: the GPT-5.4 mini and the GPT-5.4 nano. Designed for high-volume, low-latency applications, they run at more than double the speed of their predecessors and bring significant upgrades in key capabilities such as reasoning, multimodal understanding, and tool use. The GPT-5.4 mini is now available to ChatGPT free users and Go subscribers, delivering near-flagship GPT-5.4-level intelligence through the "Thinking" mode.
Free and Go users get a new "Thinking" option
Since OpenAI released GPT-5.4 earlier this month, the flagship model, positioned for professional programming and data analysis, has drawn considerable attention. Now that upgrade momentum is reaching a wider user base: starting today, ChatGPT free and Go subscribers can simply select "Thinking" mode in the menu to use GPT-5.4 mini for conversations and task processing.
For paid subscribers such as Plus and Pro, the GPT-5.4 mini serves as a backup: when a user reaches their GPT-5.4 usage limit, the system automatically switches to the mini model so service continues uninterrupted. OpenAI says this tiered design lets users with different needs enjoy a smooth, efficient AI experience.
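ChatGPT performs this switch server-side, but developers working against the API can reproduce the same fallback pattern themselves. Here is a minimal sketch using the official openai Python SDK; the model IDs "gpt-5.4" and "gpt-5.4-mini" are assumptions based on the naming in this announcement, not confirmed API identifiers:

```python
from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_with_fallback(prompt: str) -> str:
    """Try the flagship model first; fall back to the mini when capped."""
    # Hypothetical model IDs based on the announcement's naming.
    for model in ("gpt-5.4", "gpt-5.4-mini"):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            continue  # usage limit reached; try the smaller model
    raise RuntimeError("Both models are over their usage limits")
```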
Performance upgraded across the board: speed doubled, reasoning approaching flagship level
According to data released by OpenAI, the GPT-5.4 mini shows significant improvement over its predecessor, the GPT-5 mini, in several key metrics. In the software engineering benchmark SWE-Bench Pro, the mini model achieved a score of 54.4%, far exceeding the GPT-5 mini's 45.7% and approaching the flagship GPT-5.4's 57.7%. In the graduate-level scientific question-answering test GPQA Diamond, the GPT-5.4 mini achieved an even higher score of 88.0%, only slightly behind the GPT-5.4's 93.0%.
Beyond reasoning, multimodal understanding and tool use are also key focuses of this upgrade. The GPT-5.4 mini interprets non-text input such as images and audio more accurately, scoring 72.1% on the computer-use benchmark OSWorld-Verified, far above the GPT-5 mini's 42.0% and nearly matching the GPT-5.4's 75.0%. In practical terms, the mini model already offers near-flagship value for real-world tasks such as interpreting screenshots and navigating user interfaces.
More importantly, the GPT-5.4 mini runs more than twice as fast as its predecessor, which is crucial for applications that demand instant responses, such as programming assistants and customer-service bots.
GPT-5.4 nano: Lightweight infrastructure designed specifically for developers
Compared to the general-purpose positioning of the mini model, the GPT-5.4 nano is designed entirely for developers and enterprises. This smallest and lowest-cost model from OpenAI is currently only available through the API and is recommended for tasks where speed and cost efficiency are paramount, such as data classification, information extraction, and sorting.
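As an illustration of the kind of high-volume task OpenAI recommends the nano for, here is a sketch of a simple ticket classifier built on the openai Python SDK. The model ID "gpt-5.4-nano" is an assumption drawn from the announcement; check OpenAI's model list for the exact identifier:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_ticket(text: str) -> str:
    """Classify a support ticket into one of three categories."""
    response = client.chat.completions.create(
        model="gpt-5.4-nano",  # model ID assumed from the announcement
        messages=[
            {"role": "system",
             "content": "Classify the ticket as one of: billing, bug, "
                        "feature_request. Reply with the label only."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify_ticket("I was charged twice for my subscription this month."))
```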
OpenAI's pricing strategy makes the nano's positioning clear: it costs just $0.20 per million input tokens and $1.25 per million output tokens, roughly one-third the cost of the GPT-5.4 mini and about one-tenth that of the flagship GPT-5.4. Developers can therefore deploy AI agents at large scale and high frequency without costs spiraling out of control.
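A quick back-of-the-envelope calculation shows what those rates mean in practice. The workload figures below (one million short classification requests per day) are illustrative assumptions, not OpenAI numbers:

```python
# Estimated daily cost of a high-volume classification workload at the
# GPT-5.4 nano rates quoted above ($0.20 / 1M input, $1.25 / 1M output).
INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
OUTPUT_RATE = 1.25 / 1_000_000  # dollars per output token

requests_per_day = 1_000_000
input_tokens = 300   # e.g. a short ticket plus instructions (assumed)
output_tokens = 5    # a single category label (assumed)

daily_cost = requests_per_day * (
    input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
)
print(f"${daily_cost:,.2f} per day")  # ≈ $66.25 per day
```

Even at a million requests a day, the bill stays in the tens of dollars, which is exactly the deploy-at-scale scenario OpenAI describes.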
Intelligent agent architecture becomes the focus: division of labor and collaboration between large and small models
It's worth noting that OpenAI specifically emphasized the concept of "subagents" in this release. In complex workflows, developers can have a large model like GPT-5.4 act as the "planner," responsible for strategic thinking and task decomposition, and delegate the actual execution to multiple GPT-5.4 mini or nano instances working simultaneously: searching code repositories, reviewing documents, or calling APIs.
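The planner/subagent split maps naturally onto ordinary API calls. The sketch below, again using the openai Python SDK with assumed model IDs, has a flagship "planner" break a task into subtasks that cheaper mini instances execute in parallel; production code would need to validate the planner's JSON output:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()

def plan(task: str) -> list[str]:
    """Flagship 'planner' breaks a task into independent subtasks."""
    response = client.chat.completions.create(
        model="gpt-5.4",  # planner model ID assumed from the announcement
        messages=[
            {"role": "system",
             "content": "Break the task into 3-5 independent subtasks. "
                        "Return a JSON array of strings only."},
            {"role": "user", "content": task},
        ],
    )
    return json.loads(response.choices[0].message.content)

def run_subtask(subtask: str) -> str:
    """Cheap subagent executes one narrow subtask."""
    response = client.chat.completions.create(
        model="gpt-5.4-mini",  # subagent model ID assumed
        messages=[{"role": "user", "content": subtask}],
    )
    return response.choices[0].message.content

task = "Audit this repository's README for outdated installation steps."
subtasks = plan(task)
with ThreadPoolExecutor() as pool:  # subagents run in parallel
    results = list(pool.map(run_subtask, subtasks))
```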
This "division of labor between large and small models" architecture ensures the overall intelligence of the system while significantly improving efficiency and reducing costs. OpenAI's own Codex programming platform has already adopted this model, allowing developers to complete a large number of code editing tasks at one-third the cost of a flagship model.