Amidst fierce competition among AI giants over inference cost and speed, Google has announced a new lightweight model: "Gemini 3.1 Flash-Lite". Emphasizing "ultra-fast and energy-efficient" operation, Google positions this new-generation model as the fastest and most cost-effective option in the Gemini 3 series, designed specifically for developers' large-scale, high-throughput workloads. With higher performance and lower latency than its predecessor, 2.5 Flash, Gemini 3.1 Flash-Lite is poised to ignite a new round of competition over computing costs in enterprise applications and the API economy.
Starting today, developers can access the Gemini 3.1 Flash-Lite preview through the Gemini API in Google AI Studio, and enterprise users can deploy it on the Vertex AI platform at the same time.
Market-shaking pricing and noticeable acceleration
In commercial applications, cost and latency are often developers' two biggest pain points. Gemini 3.1 Flash-Lite adopts a highly aggressive pricing strategy:
• Input tokens: $0.25 per million tokens.
• Output tokens: $1.50 per million tokens.
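At these rates, estimating the cost of a workload is simple arithmetic. A minimal sketch, using the published preview prices above with hypothetical token counts:

```python
# Preview pricing quoted above (USD per million tokens)
INPUT_PRICE = 0.25
OUTPUT_PRICE = 1.50

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD for one workload."""
    return (input_tokens / 1_000_000) * INPUT_PRICE \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE

# Hypothetical batch job: 40M input tokens, 8M output tokens
cost = estimate_cost(40_000_000, 8_000_000)
print(f"${cost:.2f}")  # → $22.00
```

Even at tens of millions of tokens, a batch job stays in the tens of dollars, which is what makes the model attractive for high-throughput work.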
Beyond the low price, speed is its biggest selling point. According to Artificial Analysis benchmarks, while maintaining the same or even higher generation quality, Gemini 3.1 Flash-Lite's Time to First Token (TTFT) is 2.5 times faster than 2.5 Flash's, and its overall output speed is up 45%.
Google emphasizes that this ultra-low latency is essential for high-frequency workflows, making it an ideal model for creating "instantly responsive experiences".
Cross-level inference and multimodal capabilities
Don't assume that having a "Lite" suffix means it's not smart enough. On the authoritative Arena.ai leaderboard, the Gemini 3.1 Flash-Lite achieved an impressive score of 1432.
Even more remarkably, in several benchmarks of logical reasoning and multimodal understanding, Gemini 3.1 Flash-Lite not only outperformed same-class competitors but also surpassed larger previous-generation models such as 2.5 Flash.
"Thinking Level" control for flexible handling of complex tasks
To let developers control computing costs more precisely, Gemini 3.1 Flash-Lite ships in AI Studio and Vertex AI with a highly practical new feature: "Thinking Levels".
This mechanism lets developers adjust the model's maximum thinking depth for each task. For high-volume, cost-sensitive tasks (such as bulk text translation or content moderation), the thinking level can be lowered to maximize speed; for complex logic (such as generating UI interfaces, building simulation environments, or following multi-step instructions), it can be raised to protect accuracy. Early testers, including Latitude, Cartwheel, and Whiring, report that Gemini 3.1 Flash-Lite handles complex inputs with near-large-model accuracy and shows extremely high consistency in instruction adherence.
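As a sketch of how such per-task control might look in practice, the helper below picks a thinking level based on task type and builds a request configuration. The field name `thinking_level`, its values, and the model identifier are illustrative assumptions, not confirmed API parameters:

```python
# Hypothetical request-configuration helper. The "thinking_level"
# field, its "low"/"high" values, and the model name below are
# illustrative assumptions, not a confirmed Gemini API surface.
def build_config(task: str) -> dict:
    # Cost-sensitive bulk tasks: minimize thinking to maximize speed.
    bulk_tasks = {"translation", "content_moderation"}
    # Everything else (UI generation, multi-step instructions, etc.):
    # raise the thinking level to protect accuracy.
    level = "low" if task in bulk_tasks else "high"
    return {"model": "gemini-3.1-flash-lite-preview",
            "thinking_level": level}

print(build_config("translation"))    # bulk work → low thinking
print(build_config("ui_generation"))  # complex logic → high thinking
```

The point of the pattern is that the speed/accuracy trade-off becomes a per-request knob rather than a choice between two different models.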



