According to Woofun AI, Alibaba's Big Model Team has released the Qwen-Robot Suite, a collection of embodied-intelligence base models designed to align vision-language capabilities with physical actions across multiple domains. The suite comprises three specialized models: Qwen-RobotNav for navigation, Qwen-RobotManip for manipulation, and Qwen-RobotWorld for world simulation. Together they are intended to support multi-task operation and generalization across diverse robotic platforms.
Qwen-RobotNav, trained on 15.6 million samples, parameterizes its visual attention strategy so that the visual token budget can be adjusted dynamically during inference. It achieves state-of-the-art performance in five navigation domains and supports zero-shot deployment on the Unitree (Yushu) Go2 quadruped robot.
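To make the idea of an inference-time token budget concrete, here is a minimal sketch. It is not Qwen-RobotNav's actual mechanism, which the report does not describe. It assumes each visual token already carries a saliency score and simply keeps the highest-scoring tokens up to a budget. The function name `select_tokens` and its parameters are hypothetical.

```python
def select_tokens(scores, budget):
    """Return indices of the `budget` highest-scoring tokens, in original order.

    Hypothetical helper: `scores` stands in for per-token attention/saliency
    values; how the real model computes or uses them is not specified.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank token indices by score, highest first; ties keep the earlier token.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Keep the top `budget` indices and restore their original order.
    return sorted(ranked[:budget])


scores = [0.10, 0.90, 0.30, 0.75, 0.05, 0.60]
print(select_tokens(scores, budget=3))   # [1, 3, 5]
print(select_tokens(scores, budget=10))  # budget above token count keeps all
```

A smaller budget trades visual detail for lower inference cost, so a caller could, for example, lower it on a compute-constrained platform.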
Qwen-RobotManip pairs a Qwen3.5-4B VL backbone with a flow-matching DiT (Diffusion Transformer) action head. Trained on more than 38,100 hours of data, it reaches a 91.4% success rate on LIBERO-Plus evaluations.
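For readers unfamiliar with flow matching, the sketch below shows the general sampling idea: start from Gaussian noise and integrate a velocity field from t = 0 to t = 1 to obtain an action. It is an illustration under stated assumptions, not Qwen-RobotManip's implementation. The functions `toy_velocity` and `sample_action` are hypothetical, and the closed-form velocity stands in for what a trained DiT head would predict from the observation and instruction.

```python
import random


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity field v(x, t | observation).

    Assumption: for a straight-line path from noise to `target`, the ideal
    velocity at time t < 1 is (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action(target, steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division in toy_velocity is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical 3-dimensional action; real action vectors are model-specific.
action = sample_action(target=[0.2, -0.5, 1.0])
print([round(a, 6) for a in action])
```

Because the toy path is a straight line, Euler integration lands on the target up to floating-point error. With a learned velocity field, the number of steps trades accuracy against inference latency.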
Qwen-RobotWorld uses a 60-layer dual-stream MMDiT architecture to couple semantic representations with video latents. Trained on 8.6 million video-text pairs, it ranks first on benchmarks of physical-law compliance such as EWMBench. All three models expose a language-first interface and are integrated into the Qwen-RobotClaw framework, so that upper-level planners can invoke them to carry out multi-step physical operations.