Gemini 2.5 Flash, a lower-latency, more cost-effective AI model recently previewed at Google NEXT '25, is now available to developers for testing.
Gemini 2.5 Pro, Google's most capable AI model, can take in up to 1 million tokens of content, perform in-depth data analysis, surface key insights in specialized professional fields, and handle complex coding tasks after reading an entire codebase. Gemini 2.5 Flash trades some of that capability for lower latency and lower usage costs. Google expects it to become the workhorse model for most application services: it maintains a solid level of accuracy while being well suited to interactive virtual assistants or real-time content summarization tools.
Gemini 2.5 Flash also features dynamic, controllable reasoning: the model automatically adjusts how long it "thinks" based on the complexity of the question, so prompts with simple answers get faster responses. Developers and businesses can additionally cap this reasoning with a "thinking budget," tuning the trade-off between usage cost, response speed, and accuracy so that service operating budgets are spent more efficiently.
Developers can adjust how many tokens Gemini 2.5 Flash spends on "thinking" through Google AI Studio or the Vertex AI platform. Lowering the token budget yields faster responses, while raising it gives the model more time to reason, at a higher cost per response.
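The trade-off described above can be sketched with a small estimator. Note that the per-token rates and timings below are made-up placeholders for illustration only, not Google's actual pricing or latency figures:

```python
# Hypothetical sketch of the "thinking budget" trade-off: more thinking
# tokens mean a slower, more expensive response. All rates are invented.

def estimate_response(thinking_budget_tokens: int,
                      output_tokens: int = 500,
                      cost_per_1k_tokens: float = 0.001,
                      seconds_per_1k_tokens: float = 0.5):
    """Return (estimated_cost_usd, estimated_latency_s) for one request.

    A larger thinking budget lets the model reason longer before answering,
    which raises both the billed token count and the time to a reply.
    """
    billed_tokens = thinking_budget_tokens + output_tokens
    cost = billed_tokens / 1000 * cost_per_1k_tokens
    latency = billed_tokens / 1000 * seconds_per_1k_tokens
    return cost, latency

# A small budget answers fast and cheap; a large one costs more and is slower.
fast = estimate_response(thinking_budget_tokens=0)
deep = estimate_response(thinking_budget_tokens=8192)
print(f"no thinking:  cost ${fast[0]:.4f}, latency {fast[1]:.2f}s")
print(f"deep thinking: cost ${deep[0]:.4f}, latency {deep[1]:.2f}s")
```

In the real API the budget is a cap, not a fixed spend; the model may use fewer thinking tokens on easy prompts, which is what makes the dynamic behavior cost-efficient.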
As for its knowledge, Gemini 2.5 Flash has a training-data cutoff of January this year. It supports multimodal input across text, images, video, and audio, but its output is text only. It is positioned to replace the earlier Gemini 2.0 Flash Thinking.