Gemini 2.5 Flash now available to developers, offering lower latency and higher cost-efficiency
The Gemini 2.5 Flash AI model, which boasts lower latency and higher cost-efficiency and was recently previewed at Google Cloud Next '25, is now available for developers to test. It is positioned against Gemini 2.5 Pro, Google's most powerful AI model to date, which supports a context window of up to 1 million tokens and can perform deep data analysis, surface key insights in specialized professional fields, or handle complex coding tasks after understanding an entire codebase. Gemini 2.5 Flash offers lower latency and lower operating costs than the Pro model while maintaining a reasonable level of accuracy, so it is expected to become the primary model for most application services, well suited to interactive virtual assistants or real-time content-summarization tools.

Gemini 2.5 Flash also features dynamic, controllable reasoning: it automatically adjusts processing time based on the complexity of the question, a mechanism described as a "thinking budget". It responds faster to questions with simple answers, and developers or businesses can cap usage costs to balance response speed against accuracy for their actual needs, making service operating budgets more efficient. Using Google AI Studio or the Vertex AI platform, developers can adjust how many tokens Gemini 2.5 Flash generates during its "thinking" phase: a lower budget yields faster responses, while a higher budget spends more time "thinking" and incurs higher processing costs.

As for the knowledge built into Gemini 2.5 Flash, its training data is current as of January of this year. The model accepts multimodal input, including text, images, video, and audio, but outputs text only. It is positioned to replace the earlier Gemini 2.0 Flash Thinking.
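As a rough illustration of how the thinking budget described above might be set per request, the sketch below builds a request payload in the shape of the Gemini API's `generateContent` JSON schema. The exact field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) and the budget value of 1024 tokens are assumptions for illustration; consult the official API reference before relying on them.

```python
import json

# Hypothetical generateContent request body. The thinkingBudget field caps
# the number of tokens the model may spend "thinking" before answering:
# a smaller budget means faster, cheaper responses; a larger budget means
# slower, costlier, potentially more accurate ones. Field names here are
# assumed from the public API schema, not verified against it.
payload = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "Summarize this article in three bullet points."}],
        }
    ],
    "generationConfig": {
        # Example value only; 0 would disable the thinking phase entirely.
        "thinkingConfig": {"thinkingBudget": 1024},
    },
}

# In a real integration this JSON would be POSTed to the Gemini API
# endpoint (or passed via the SDK); here we just render it for inspection.
print(json.dumps(payload, indent=2))
```

Exposing the budget as a plain request field like this is what lets a service tune cost and latency per call, e.g. a small budget for autocomplete-style queries and a larger one for codebase-wide analysis.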

