Qwen-AgentWorld Launches as First Native Language World Model Surpassing GPT-5.4 in Simulation Benchmarks
2026-06-24 11:49

Woofun AI reports that the Tongyi Qianwen (Qwen) team has officially released Qwen-AgentWorld, which it presents as the first native language world model: one trained with environment modeling as a core objective starting from the continual pre-training stage, rather than adapted to the task after the fact. The model unifies text-based environments (MCP, Search, Terminal, and SWE) and GUI-based environments (Web, OS, and Android) within a single framework. It was trained on more than 10 million real-world interaction trajectories through a three-stage pipeline of continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), a design intended to support robust cross-domain knowledge transfer.

Alongside the model, the team released AgentWorldBench, an evaluation benchmark whose test samples are built from real-world execution data. On this benchmark, Qwen-AgentWorld-397B-A17B achieved higher simulation quality than GPT-5.4, Claude Opus 4.8, and Gemini 3.1 Pro.

The research highlights two application paths. First, the model can serve as a decoupled environment simulator for controllable simulation-based RL, which the team reports produces significantly better agent behavior than training in real environments alone. Second, it can serve as a unified base model for agents, transferring effectively to multi-turn agent tasks across seven benchmarks, three of which were unseen during training, without any additional RL fine-tuning. Both the model and the benchmark are available on Hugging Face and ModelScope.
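To make the idea of a "decoupled environment simulator" more concrete, here is a minimal Python sketch of a rollout loop in which the environment is just a swappable function. The article does not describe Qwen-AgentWorld's actual interface, so everything here is hypothetical: `toy_terminal_simulator` is a rule-based stand-in for a call to the world model, and `scripted_policy`, `rollout`, and `Trajectory` are illustrative names, not part of any released API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical simulator signature: (interaction history, action) -> (observation, reward, done).
Simulator = Callable[[list[str], str], tuple[str, float, bool]]


def toy_terminal_simulator(history: list[str], action: str) -> tuple[str, float, bool]:
    """Rule-based stand-in for a world model simulating a terminal.

    A real world model would condition on `history`; this toy ignores it.
    """
    if action == "ls":
        return "README.md  main.py", 0.0, False
    if action == "cat README.md":
        return "Run: python main.py", 0.0, False
    if action == "python main.py":
        return "Tests passed", 1.0, True  # task solved: reward and terminate
    cmd = action.split()[0] if action.strip() else ""
    return f"bash: {cmd}: command not found", 0.0, False


def scripted_policy(history: list[str]) -> str:
    """Hypothetical agent that follows a fixed plan, one step per (action, observation) pair."""
    plan = ["ls", "cat README.md", "python main.py"]
    step = len(history) // 2
    return plan[step] if step < len(plan) else "exit"


@dataclass
class Trajectory:
    steps: list[tuple[str, str, float]] = field(default_factory=list)  # (action, observation, reward)

    @property
    def total_reward(self) -> float:
        return sum(reward for _, _, reward in self.steps)


def rollout(policy: Callable[[list[str]], str], simulator: Simulator, max_steps: int = 10) -> Trajectory:
    """Collect one episode; the environment is only ever touched through `simulator`."""
    traj = Trajectory()
    history: list[str] = []
    for _ in range(max_steps):
        action = policy(history)
        obs, reward, done = simulator(history, action)
        traj.steps.append((action, obs, reward))
        history.extend([action, obs])
        if done:
            break
    return traj


if __name__ == "__main__":
    traj = rollout(scripted_policy, toy_terminal_simulator)
    for action, obs, reward in traj.steps:
        print(f"$ {action}\n{obs}  (reward={reward})")
    print("total reward:", traj.total_reward)
```

Because the simulator is passed in as an argument, the same loop could in principle run against a real environment or a model-backed one without modification. Under our reading, that separation is what makes the simulator "decoupled", and it is also what allows the simulated environment to be controlled for RL.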
