Woofun AI reports that Google has built the Computer Use feature natively into the Gemini 3.5 Flash model, so developers can control devices directly through the Gemini API or the Google Cloud Gemini Enterprise Agent Platform without invoking a separate specialized model. The integration simplifies agent architecture: the model takes screenshots from browser, mobile, or desktop environments as visual input, reasons step by step, and outputs operation commands such as mouse clicks, keyboard input, and menu navigation. This can automate tasks like software regression testing and cross-page data collection. To aid debugging, the model attaches an "intent" field to each generated command that explains the reasoning behind it.
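The screenshot, reason, act cycle described above can be sketched roughly as follows. This is a minimal illustration, not Google's API: the `Action` type, the `run_agent` function, and the `capture`, `propose`, and `execute` callables are all hypothetical names, and the model call is replaced by a scripted stand-in so the example runs offline. Only the idea of an "intent" explanation attached to each command comes from the report.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One command proposed by the model (hypothetical shape)."""
    kind: str                      # e.g. "click", "type", "done"
    args: dict = field(default_factory=dict)
    intent: str = ""               # explanation of why this step was chosen


def run_agent(
    goal: str,
    capture: Callable[[], bytes],
    propose: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: take a screenshot, ask the model for an action, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                 # visual perception input
        action = propose(goal, screenshot, history)
        history.append(action)                 # keep intents for debugging
        if action.kind == "done":
            break
        execute(action)
    return history


def demo():
    """Run the loop against a scripted stand-in for the model."""
    scripted = iter([
        Action("click", {"x": 120, "y": 48}, "Open the search box"),
        Action("type", {"text": "regression report"}, "Enter the query"),
        Action("done", {}, "Results page is visible"),
    ])
    executed: List[Action] = []
    history = run_agent(
        goal="Find the regression report",
        capture=lambda: b"fake-screenshot-bytes",
        propose=lambda goal, shot, hist: next(scripted),
        execute=executed.append,
    )
    return history, executed


if __name__ == "__main__":
    for step in demo()[0]:
        print(f"{step.kind:5s} {step.args} intent={step.intent!r}")
```

In a real integration, `propose` would wrap the model request and `execute` would drive a browser or device; logging the `intent` of each step is what makes a failed run easier to trace.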
To mitigate prompt injection risks in real-world web environments, Google applied targeted adversarial training and introduced two optional protections: mandatory human approval for irreversible operations involving fund transfers or file deletions, and automatic task termination when indirectly injected instructions are detected in screenshots. Browserbase hosts an online demo environment at gemini.browserbase.com, and Google has open-sourced a reference implementation named computer-use-preview on GitHub.
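The two optional protections could be enforced on the client side along these lines. This is a sketch under stated assumptions: the `Step` fields, the `IRREVERSIBLE` set, the `injection_detected` flag, and the `guard` function are hypothetical and do not reflect the actual response schema, which should be checked against the reference implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds treated as irreversible in this sketch.
IRREVERSIBLE = {"transfer_funds", "delete_file"}


@dataclass
class Step:
    kind: str
    intent: str
    injection_detected: bool = False   # hypothetical flag, not a real field


class TaskTerminated(Exception):
    """Raised when the task must stop because of a suspected injection."""


def guard(step: Step, approve: Callable[[Step], bool]) -> bool:
    """Return True if the step may be executed.

    Stops the whole task on a suspected injected instruction, and asks a
    human before any irreversible operation.
    """
    if step.injection_detected:
        raise TaskTerminated(f"Suspected injected instruction: {step.intent}")
    if step.kind in IRREVERSIBLE:
        return approve(step)           # human-in-the-loop decision
    return True


if __name__ == "__main__":
    deny_all = lambda step: False
    print(guard(Step("click", "Open settings"), deny_all))
    print(guard(Step("delete_file", "Remove old export"), deny_all))
```

Passing the approval step as a callable keeps the policy separate from the user interface, so the same check can be backed by a console prompt, a review queue, or an automated deny rule in tests.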