Google introduces Gemini 2.5 Computer Use model


Google has launched the Gemini 2.5 Computer Use model, a specialised model that builds on Gemini 2.5 Pro’s visual understanding and reasoning capabilities to power agents capable of interacting with user interfaces (UIs).

According to Google, it outperforms leading alternatives on multiple web and mobile control benchmarks while running at lower latency. Developers can access these capabilities via the Gemini API in Google AI Studio and Vertex AI.

"While AI models can interface with software through structured APIs, many digital tasks still require direct interaction with graphical user interfaces, for example, filling and submitting forms. To complete these tasks, agents must navigate web pages and applications just as humans do: by clicking, typing and scrolling. The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters, and operate behind logins is a crucial next step in building powerful, general-purpose agents," Google said in a blog post.

How it works

The model’s core capabilities are exposed through the new `computer_use` tool in the Gemini API, which should be operated within a loop. Inputs to the tool are the user request, a screenshot of the environment, and a history of recent actions. The input can also exclude functions from the full list of supported UI actions, or specify additional custom functions to include.
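The inputs described above can be pictured as a small data structure. The following is a minimal sketch only, not the Gemini API: `ToolInput`, its field names and the action names in `DEFAULT_ACTIONS` are all hypothetical, chosen purely to illustrate how excluding built-in actions and adding custom ones might combine.

```python
from dataclasses import dataclass, field

# Hypothetical action names for illustration; not the model's real action list.
DEFAULT_ACTIONS = ["click", "type_text", "scroll"]


@dataclass
class ToolInput:
    """Hypothetical container for one request to a computer-use style tool."""
    user_request: str                      # what the user wants done
    screenshot: bytes                      # current view of the environment
    recent_actions: list[str] = field(default_factory=list)    # action history
    excluded_actions: list[str] = field(default_factory=list)  # built-ins to drop
    custom_actions: list[str] = field(default_factory=list)    # extras to add

    def available_actions(self) -> list[str]:
        """Built-in actions minus exclusions, followed by custom actions."""
        kept = [a for a in DEFAULT_ACTIONS if a not in self.excluded_actions]
        return kept + [a for a in self.custom_actions if a not in kept]


example = ToolInput(
    user_request="Fill in the contact form",
    screenshot=b"",                 # placeholder; a real client would send image bytes
    excluded_actions=["scroll"],
    custom_actions=["open_app"],
)
print(example.available_actions())
```

In this sketch the exclusion list is applied before the custom actions are appended, so a custom action is never silently removed by an exclusion aimed at a built-in one.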

The model then analyses these inputs and generates a response, typically a function call representing one of the UI actions, such as clicking or typing. This response may also contain a request for end-user confirmation, which is required for certain actions such as making a purchase. The client-side code then executes the received action.

After the action is executed, a new screenshot of the GUI and the current URL are sent back to the Computer Use model as a function response, restarting the loop. This iterative process continues until the task is complete, an error occurs, or the interaction is terminated by a safety response or user decision.
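The loop described above can be sketched in a few lines of client-side code. This is an illustration under stated assumptions rather than Google's implementation: `Action`, `Observation`, `run_agent_loop`, `scripted_model` and `fake_browser` are hypothetical names, the model is replaced by a scripted stand-in, and termination by a safety response is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """Hypothetical UI action proposed by the model."""
    name: str
    needs_confirmation: bool = False  # e.g. making a purchase


@dataclass
class Observation:
    """What the client sends back after each step."""
    screenshot: bytes
    url: str


def run_agent_loop(
    request: str,
    first_observation: Observation,
    propose_action: Callable[[str, Observation, list[str]], Optional[Action]],
    execute: Callable[[Action], Observation],
    confirm: Callable[[Action], bool],
    max_steps: int = 20,
) -> tuple[str, list[str]]:
    """Repeat propose -> confirm -> execute -> observe until a stop condition."""
    history: list[str] = []
    observation = first_observation
    for _ in range(max_steps):
        action = propose_action(request, observation, history)
        if action is None:                      # model signals the task is done
            return "complete", history
        if action.needs_confirmation and not confirm(action):
            return "declined", history          # user decision ends the loop
        try:
            observation = execute(action)       # new screenshot and URL
        except RuntimeError:
            return "error", history
        history.append(action.name)
    return "step_limit", history


# Stand-ins for the model and the browser, for demonstration only.
def scripted_model(request, observation, history):
    script = [
        Action("click"),
        Action("type_text"),
        Action("submit_order", needs_confirmation=True),
    ]
    return script[len(history)] if len(history) < len(script) else None


def fake_browser(action):
    return Observation(screenshot=b"", url=f"https://example.com/after-{action.name}")


start = Observation(screenshot=b"", url="https://example.com")
status, steps = run_agent_loop(
    "Order a notebook", start, scripted_model, fake_browser, confirm=lambda a: True
)
print(status, steps)
```

Passing `confirm=lambda a: False` instead stops the run before the confirmation-gated step is executed, which mirrors the user-decision exit described in the article. The `max_steps` cap is an addition of this sketch, included so that a misbehaving model cannot loop forever.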

The Gemini 2.5 Computer Use model is primarily optimised for web browsers, but also demonstrates strong promise for mobile UI control tasks. It is not yet optimised for desktop OS-level control. - TradeArabia News Service
