Gemini 3.5 Flash Integrates Native PC Control, Streamlining Enterprise Agent Architecture
2026-06-25 11:34

Woofun AI reports that Google has natively integrated the Computer Use feature into the Gemini 3.5 Flash model, allowing developers to control devices directly through the Gemini API or Google Cloud Gemini Enterprise Agent Platform without invoking specialized proxy models. This integration streamlines agent development architecture by leveraging screen captures from browsers, mobile, or desktop environments for visual perception and step reasoning, subsequently outputting operation commands such as mouse clicks, keyboard inputs, and menu navigation to automate tasks like software regression testing and cross-page data collection. To facilitate debugging, the model appends an "intent" field to explain the logic of each generated command.

To mitigate prompt injection risks in real network environments, Google implemented targeted adversarial training and introduced two optional protections: mandatory human approval for irreversible operations involving fund transfers or file deletions, and automatic task termination upon detecting indirectly injected instructions in screenshots. Browserbase provides an online hosted demo environment at gemini.browserbase.com, while Google has open-sourced the reference implementation code named computer-use-preview on GitHub.

Disclaimer: Views are the author's own and do not represent the platform. Do not reproduce without permission. Content is for reference only, not investment advice. Trade at your own risk.
Tags:
Gemini 3.5 Flash
Gemini 2.5 Computer Use
Gemini API
Google Cloud Gemini Enterprise Agent Platform
Vertex AI platform
Browserbase
computer-use-preview
Google
Share:
back