From 5 Million Users to 10,000 Agents: The 2026 Desktop Execution War

2026-06-25 18:18

Woofun AI reports that in 2026 the focus of AI products has shifted from conversational interfaces to autonomous desktop execution, driven by a surge in 'desktop agent' development among major internet companies and model developers. The catalyst was OpenAI's Codex, which began as a CLI tool for coding but rapidly evolved into a comprehensive desktop agent whose users are now mostly non-engineers. Data released by OpenAI on June 3 confirms this trajectory: Codex now has over 5 million weekly active users, a seven-fold increase in just half a year. The figure signals a decisive industry pivot from dialogue-based interaction to direct task execution within desktop environments, and it sets the stage for a chaotic contest involving thousands of competing agents.

As the momentum behind Codex solidified, a wave of Chinese equivalents emerged, each with its own launch timing and functional scope. On January 30, Alibaba released QoderWork, the first major entry into the desktop agent space. Tencent Cloud's CodeBuddy team followed on March 9 with WorkBuddy, and Tencent App Store launched the personal AI assistant Marvis on May 20. The pace accelerated in June: Kimi released its local agent Kimi Work (Beta) on June 3, ByteDance upgraded TRAE SOLO to TRAE Work on June 9, and DouBao added a 'task mode' with enhanced execution capabilities on June 12. Most recently, on June 24, the professional version of DouBao was released, specifically targeting office scenarios and agent task execution. Together, these launches show China's tech giants racing to capture the emerging market for autonomous task management.

Whether categorized as personal AI assistants or local agents, these products share a single objective: to become the primary interface for user task execution. They have moved beyond simple question-and-answer interactions and integrate deeply into real-world workflows, handling files, operating web pages, organizing data, generating PPTs, scheduling tasks, and automating operations across disparate applications. This evolution indicates that AI is establishing itself as a new intermediary layer between users and the digital world, replacing manual navigation with autonomous execution. To dissect this trend, we analyzed 15 representative products and found significant divergence in product positioning, capability building, and ecosystem integration; detailed comparisons of pricing models and model usage are available in the full dataset.

WorkBuddy, developed by Tencent Cloud's CodeBuddy team, is a desktop AI tool that covers the full range of workplace scenarios and is designed for a variety of functional roles. Users describe a task once, and the system autonomously plans and executes the workflow to deliver tangible results. It offers a WeChat mini-program version for mobile access and integrates with the Tencent ecosystem, including QQ Mail, Tencent Docs, Tencent Meeting, WeCom, Tencent LearnShare, IMA, and TAPD. WorkBuddy currently comes pre-installed with 11 mainstream domestic models, including Tencent's own Hy3 preview and industry-leading models such as GLM-5.2, minimax-M3, kimi-K2.7-code, and DeepSeek V4. Users can select Auto mode and let the system choose the model, or specify a model manually. Resource consumption varies significantly between models, with Zhipu's model currently the most expensive option.

Marvis, developed by Tencent App Store, is a personal AI assistant that operates at the operating system level and is built on recent models such as DeepSeek V4 and Hunyuan3/hy3. Its design philosophy centers on genuinely understanding user files so that the computer can be managed more efficiently. The tool supports AI search over local documents and images, can invoke APK and EXE applications with a single command, and is available on PCs, smartphones, and via WeChat. On June 24, Marvis expanded to iOS devices, enabling users to send tasks from their phones for execution on their computers. A notable feature observed during testing is the 'Office' mode, which works like an office simulator or boss simulator and lets users monitor token consumption. It reflects Marvis's multi-agent collaboration design: a main agent assigns tasks to five specialist agents, namely File Agent for files, Computer Agent for the system, App Agent for applications, Browser Agent for web pages, and Search Agent for searches.
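The main-agent-plus-specialists arrangement described above can be illustrated with a small dispatch sketch. This is not Marvis's implementation, which the article does not describe; the `Step` type, the `SPECIALISTS` table, and the placeholder handlers are all hypothetical, and only the five agent roles come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One planned step: which specialist handles it, and what to do."""
    agent: str
    instruction: str


def make_specialist(name: str) -> Callable[[str], str]:
    # Hypothetical placeholder: a real specialist would call a model or tool.
    def handle(instruction: str) -> str:
        return f"[{name}] done: {instruction}"
    return handle


# The five roles named in the article; the keys are illustrative labels.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    name: make_specialist(name)
    for name in ("file", "computer", "app", "browser", "search")
}


def main_agent(steps: List[Step]) -> List[str]:
    """Assign each planned step to its specialist and collect the results."""
    results = []
    for step in steps:
        handler = SPECIALISTS.get(step.agent)
        if handler is None:
            raise ValueError(f"no specialist named {step.agent!r}")
        results.append(handler(step.instruction))
    return results


plan = [
    Step("search", "find last quarter's sales report"),
    Step("file", "move the report into the Reports folder"),
]
results = main_agent(plan)
```

Keeping the routing table separate from the handlers means a new specialist can be added without touching the main agent's logic, which is one plausible reason products adopt this split.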

Additionally, Marvis allows users to customize avatars, with the default being an 'AI employee' tailored for office scenarios.

Qclaw, a local AI agent product from the Tencent PC Manager team, is built on the open-source OpenClaw framework and designed as a minimalist personal AI assistant for the PC. Its core capabilities include WeChat/QQ integration, scan-to-bind setup, and remote control of the computer from mobile devices. The system handles file transfer, task scheduling, and full-scenario automation, covering file management, web browsing, office tasks, and complex multi-step workflows. A defining characteristic of Qclaw is its privacy architecture: it does not upload data to the cloud, so all tasks, file processing, and data storage stay on the user's computer. It also supports creating different agents for different tasks, continuing Tencent's tradition of user-friendly assistance while maintaining strict data locality.

TRAE Work evolved from the original TRAE SOLO, with its Work mode specifically designed for daily work scenarios such as content creation, data analysis, report writing, application development, task management, and communication collaboration. In contrast, its Code mode is tailored for more complex software development and coding tasks. TRAE Work is available across PCs, mobile devices, and the web. Compared to other 'xx work' products, TRAE Work's distinct advantage lies in its deep integration with Lark, enabling it to connect more effectively with real-world workflows and organizational structures. This integration allows the agent to participate directly in document, meeting, and collaboration processes rather than operating in isolation.

DouBao launched a 'task mode' on June 12, allowing users to invoke skills, set scheduled tasks, perform browser operations, run code scripts, and generate files. On June 24, at the Volcano Engine Power Conference, the professional version was officially announced alongside the release of DouBao Large Model 2.1. This version supports operating the local computer, using the browser, invoking Skills, and setting scheduled tasks, while also including an Office suite and supporting professional image and video design. It can generate and share application websites, with free users able to experience the office task mode using the DouBao 2.1 Turbo model. This progression marks a shift from a general assistant to a specialized productivity tool capable of complex, multi-step execution.

QoderWork, developed by Alibaba, extends Qoder's agent capabilities beyond coding to everyday work scenarios, allowing users to describe tasks for automatic execution and direct result delivery. On June 16, QoderWork added a self-reflection function, enabling it to evolve continuously through memory, reflection, and skill development. This feature allows the agent to learn from past interactions and improve its performance over time, addressing the limitations of static task execution models.

DuMate, a product from Baidu Cloud, is available as both a desktop and mobile app, allowing users to view screens, operate software, process files, and connect business systems. Its core applications include information processing, document generation, data analysis, and workflow automation.

However, DuMate currently does not support model switching and is restricted to using Baidu's Wenxin Large Model, limiting its flexibility compared to competitors that offer multi-model selection.

Kimi Work is a general-purpose local agent designed for knowledge workers, built on Kimi Code and providing basic local agent capabilities such as installing and using Skills and running scheduled tasks. It inherits professional Skills from the online version of Kimi Agent, including website building and PPT creation, as well as access to professional databases in finance, research, and law. The system includes Kimi WebBridge, which enables browser usage, and can create sub-agent teams based on task complexity, with a maximum of 300 collaborative units per ability cluster. This scalability allows Kimi Work to handle highly complex, multi-faceted tasks that require coordinated effort across multiple specialized agents.

MiniMax Code is an agent product designed for and trained with MiniMax M3, fully leveraging the model's capabilities in long-context understanding, coding/agent interaction, and native multimodal processing. For complex long-running tasks, MiniMax Code's Agent Team can break a large task into multiple stages that are executed concurrently and dynamically. Processing sub-tasks in parallel improves efficiency and reduces the time required to complete complex workflows.
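As a rough illustration of staged, concurrent execution, the sketch below runs stages in order while the sub-tasks inside each stage run together. It uses Python's standard `asyncio` and is only an assumption about how such a scheduler might look; `run_subtask`, `run_stages`, and the stage names are hypothetical and are not taken from MiniMax Code.

```python
import asyncio
from typing import List


async def run_subtask(name: str) -> str:
    # Hypothetical placeholder for real agent work (model calls, tool use).
    await asyncio.sleep(0)
    return f"{name}: ok"


async def run_stages(stages: List[List[str]]) -> List[List[str]]:
    """Run stages in order; sub-tasks within one stage run concurrently."""
    outputs = []
    for stage in stages:
        # gather() returns results in the same order as the sub-tasks given.
        results = await asyncio.gather(*(run_subtask(t) for t in stage))
        outputs.append(list(results))
    return outputs


stages = [
    ["collect sources", "gather data"],  # independent, so they run together
    ["draft report"],                    # waits for the first stage to finish
    ["review text", "build slides"],     # independent again
]
outputs = asyncio.run(run_stages(stages))
```

The time saved depends on how many sub-tasks in a stage are truly independent; a stage with a single sub-task gains nothing from concurrency.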

Step-by-Step AI is a desktop AI agent optimized based on OpenClaw, installable without a server or command line and available 24/7. It connects to the local operating system and built-in browser to help users complete complex tasks. Unlike other products, Step-by-Step AI reminds users to enable a floating ball during initial setup, which can open direct dialog boxes and also remind users to drink water or take breaks at appropriate times. While other products offer similar functions, they usually require users to enable them manually, making Step-by-Step AI's proactive approach a unique differentiator.

AutoClaw, Zhipu's local AI agent, is promoted with the slogan 'One-click access to a local AI intelligent agent with one-click installation.' It requires no environment configuration, no API key application, and no coding. Users simply download the installation package, double-click to install it, and start using it after logging in. Its core capabilities include the built-in Pony-Alpha-2 model, AutoGLM browser automation, over 50 preset Skills for office work, creative work, web scraping, coding, and research, IM integration, and a self-evolution mechanism. This ease of use lowers the barrier to entry for non-technical users who want to automate their workflows.

lobsterAI, NetEase Youdao's full-scenario personal assistant AI product, is positioned as an intelligent assistant that 'helps you with tasks 24/7.' It supports dual-device integration between smartphones and computers, allowing users to remotely operate their computers with just one command. The system performs various tasks such as organizing files, extracting key items from calendars and emails, cleaning and analyzing business data, and generating weekly reports and PPTs. Users can try it for free for 14 days, but they can only use the Qwen3.5-Plus model; using other models requires payment. This freemium model allows users to test the capabilities before committing to a subscription.

Cola is positioned as 'the first operating system with a soul.' It features a built-in AI character, 'Cola,' that is presented as having its own consciousness and that remembers users' habits, preferences, and backgrounds. It interacts with users through voice or text and grows together with them, operating computer files, browsing the web, executing commands, and generating text, images, and videos. The system supports decomposing complex tasks and processing them in parallel, and it builds an understanding of the user's situation by accessing the file system and browser history. Its 'soul system' provides a transparent display of the AI's thinking process, self-reflection and evolution, and proactive attention to users' needs. Cola currently supports in-app token purchases, subscription and login with ChatGPT Pro/Plus accounts, and billing through connected OpenAI or Anthropic accounts, but it does not yet support API keys from other model providers.

Alice is a companion-style desktop agent distinguished by its role as a 'personalized AI assistant' with specific visual design and detailed character settings. Alice itself is free to use, but since it does not come pre-installed with a model, users need to configure one before using it.

In addition to performing routine tasks such as file management and scheduled tasks, Alice includes leisure games such as Werewolf and Guan Dan, offering more entertainment functions than other desktop agents. This blend of productivity and entertainment aims to create a more engaging user experience.

NiuMa AI is positioned as a localized human-computer collaboration platform that emphasizes personal data privacy and supports a fully offline, local architecture. Users can run their own local large models for offline operation. By default NiuMa AI uses the Claude model: users with a Claude account can log in directly, while others need to configure a model themselves. This focus on privacy and offline capability appeals to users concerned about data security and to those working in restricted network environments.

When placed on the same product map, these desktop agents all aim to 'help users with tasks on their computers,' yet their actual approaches diverge along three distinct paths based on different use cases. They have not converged into a single form but have instead specialized to address specific market needs.

The first path, from coding to office work, is represented by products like Kimi Work and MiniMax Code, which grew out of capabilities carried over from Coding Agents. The common feature of this approach is to address the most clearly structured tasks first and then expand gradually. Kimi Work focuses on becoming a 'general knowledge work agent,' extending its capabilities from engineering-oriented tasks to office scenarios such as document creation, research, and report generation. MiniMax Code further strengthens the Agent Team, breaking long tasks into multiple stages that can be executed in parallel and using role division and validation mechanisms to handle more complex production tasks. The advantage of this type of product is its mature handling of task structure. Its weakness is that it is better suited to 'decomposable tasks' and is still adapting to the many unstructured operations in real office work, such as instant communication, ad-hoc decision-making, and cross-application switching. In other words, these products start from 'task logic' and expand their capabilities from there.

The second path, from the desktop and system perspective, is represented by products such as Marvis, QoderWork, and Cola, which embody the 'desktop system-level agent' approach. What they share is not the type of task they handle but their position at the interface, directly adjacent to the operating system and local environment. Marvis focuses more on 'computer management,' emphasizing the systematic organization of files, applications, and disks, essentially improving how the local operating system is understood and scheduled. QoderWork emphasizes 'executable capabilities,' including screen perception, software operation, and business system integration, approaching the concept of a 'digital employee.' Cola goes one step further by integrating a personalized system, proactive reminders, and long-term memory into the execution process, making the agent not only a tool but also a continuous interactive layer. The key advantage of this type of product is its stronger control over tasks, which enables it to truly execute tasks across different applications.

However, the challenges are also obvious, including permission boundaries, stability, risk of misoperations, and compatibility issues between different software.

The third path, from the office ecosystem perspective, is taken by products like TRAE Work and WorkBuddy, which adopt a more practical approach. They do not attempt to rewrite the operating system but instead integrate into existing workflows. TRAE Work deeply integrates with the Lark ecosystem, allowing the agent to directly participate in document, meeting, and collaboration processes. WorkBuddy, on the other hand, builds enterprise-level workspace capabilities by leveraging Tencent's ecosystem. The core strategy of these products is to 'align with real organizational structures' rather than redefine the way users interact with their computers. Their advantage is that they can be quickly implemented, integrating into existing permission and data systems and entering enterprise-level scenarios.

The evolution of these products in China is essentially an expansion into broader scenarios, organizational structures, and system interfaces, centered on an 'execution loop' in which a task is broken down, executed, and verified. In this process, several clear trends are emerging. The first trend is the shift from AI coding to AI working. Coding became the first domain for agents because software development is naturally suited to automation.

However, as Coding Agents matured, their capabilities naturally extended beyond coding. Most knowledge-based work also follows a similar structure: files serve as context, browsers provide access to information, Office documents are the final outputs, scheduled tasks form the workflow, and approval and feedback mechanisms ensure effectiveness. These tasks can also be broken down, executed, and verified. AI is no longer just helping programmers write code but is now assisting knowledge workers with tasks that were traditionally considered to require manual handling, such as file organization, report generation, data cleaning, PPT creation, information research, weekly report writing, email management, meeting minutes extraction, and industry trend tracking. Users no longer ask for 'write a function for me' but for 'complete this task for me.' Therefore, the competition in this phase is about who can best use AI to move from answering questions to delivering results. AI coding changes the way programmers write code, while AI working changes the way ordinary people use computers to get work done.

The second trend is the transformation of agents from 'individual assistants' to 'teams.' Early AI assistants were more like highly capable individuals: users asked questions, and the AI answered them; users assigned tasks, and the AI executed them.

However, when tasks become longer, more complex, and involve multiple steps and contexts, a single agent often runs into limitations. It may forget the goal, deviate from the path, or fail to check its own work during execution. Agent teams are a response to these limits. For example, MiniMax's Agent Teams allow users to create multiple agents with different roles and combine them into a team that works in parallel. Different agents can approach the same task from different angles: one collects information, another generates a plan, and another executes and integrates the results, which improves the efficiency and stability of complex tasks. Complex tasks are being reorganized into a multi-role pipeline that runs from understanding requirements through planning and execution to result verification. Each step can be handled by a different agent and, when necessary, divided further into more specialized tasks such as research, design, writing, coding, and data analysis. The value of Agent Teams is that they give AI a more stable work structure: multiple sub-tasks run in parallel, waiting times fall, and executors and validators check each other's work, which reduces errors. When something fails, the team can roll back, retry, or change approach, turning complex tasks into reusable processes.
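The executor-and-validator pattern with retries can be shown in a few lines. This is a minimal sketch under stated assumptions, not any vendor's implementation: `run_with_validation`, the toy `executor`, and the toy `validator` are hypothetical names invented for illustration.

```python
from typing import Callable, Optional


def run_with_validation(
    execute: Callable[[int], str],
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry the executor until the validator accepts a result, or give up."""
    for attempt in range(1, max_attempts + 1):
        result = execute(attempt)
        if validate(result):
            return result
    return None  # every attempt was rejected


def executor(attempt: int) -> str:
    # Toy executor: the first draft is incomplete, later drafts add a summary.
    return "draft with summary" if attempt >= 2 else "draft"


def validator(result: str) -> bool:
    # Toy validator: accept only drafts that contain a summary.
    return "summary" in result


accepted = run_with_validation(executor, validator)
```

Each retry is an extra executor call plus an extra validator call, so the attempt cap is also a cost cap.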

However, Agent Teams are not a panacea. Multiple agents can lead to higher costs, more complex scheduling, and increased latency, requiring careful optimization to ensure net positive value. This structural shift marks a definitive move away from simple chatbots toward sophisticated, multi-agent orchestration systems capable of handling the full spectrum of modern knowledge work. The market is now witnessing a race to define the standard for these agent teams, with success depending on the ability to balance autonomy, accuracy, and cost efficiency in real-world deployment scenarios.

Disclaimer: Views are the author's own and do not represent the platform. Do not reproduce without permission. Content is for reference only, not investment advice. Trade at your own risk.
