Google has announced a preview of its latest-generation AI model, Gemini 2.5 Computer Use. The model not only understands text and images but can also "operate" web interfaces like a human: by clicking, scrolling, typing, and dragging, it can complete tasks such as filling out forms, submitting data, or searching for information on the web, without relying on API connections. This technology lets AI go beyond simply answering questions and act directly.
AI simulates human operating interfaces, opening up new application scenarios
Google states that the Gemini 2.5 Computer Use model possesses "visual understanding and reasoning capabilities," enabling it to observe web content and perform actions based on user instructions. This allows AI to interact with web interfaces and other user interfaces without relying on API connections. Applications include UI testing, automated operations, data collection, and internal enterprise tool integration.
The model currently supports 13 action commands, including opening web pages, entering text, clicking buttons, and dragging elements. Google notes that the feature does not yet support full desktop control, but says it outperforms peers on multiple web and mobile operation benchmarks.
Extended from Project Mariner, it can automatically complete browsing tasks
Gemini 2.5 Computer Use is an extension of Project Mariner, an earlier Google research project that demonstrated AI autonomously completing complex tasks in the browser, such as adding items to a shopping cart based on a list of ingredients.
This new version has been integrated into the Gemini platform and is available to developers through Google AI Studio and Vertex AI.
Competition over AI "action agents" heats up with OpenAI and Anthropic
Google's announcement comes on the heels of OpenAI's Dev Day event, where new ChatGPT apps and agent features were unveiled with an emphasis on AI completing multi-step tasks autonomously. Anthropic has likewise launched its own computer-operating model, Claude Computer Use.
Unlike competitors that allow AI to control the entire computer environment, Google emphasizes that Gemini is currently limited to browser-level operations to ensure both security and controllability. Even so, Google says the model still "surpasses other mainstream alternatives" in multiple real-world tests and will continue to be optimized to support more interactions and application scenarios.
AI moves from "talking" to "operating"
Gemini 2.5 Computer Use represents a new phase in generative AI's evolution from "language understanding" to "action capability." In the future, developers will be able to not only instruct AI to answer questions through commands, but also enable it to directly execute operational tasks.
Bridging human operation and automation, Google is clearly trying to create a new model of AI interaction: making AI not just an assistant, but a virtual agent that can actually "get its hands dirty."