Alibaba Deploys Qwen-Robot Suite for Zero-Shot Physical Action Alignment and Multi-Domain Generalization
2026-06-16 14:26

According to Woofun AI, Alibaba's large model team has deployed the Qwen-Robot Suite, a collection of embodied-intelligence foundation models designed to align vision-language capabilities with physical actions across multiple domains. The suite comprises three specialized models: Qwen-RobotNav for navigation, Qwen-RobotManip for manipulation, and Qwen-RobotWorld for world simulation. Together they are intended to support multi-task operation and generalization across diverse robotic platforms.

Qwen-RobotNav, trained on 15.6 million samples, parameterizes its visual attention strategy so that the token budget can be adjusted dynamically at inference time. It achieves state-of-the-art performance in five navigation domains and supports zero-shot deployment on the Unitree (Yushu) Go2 quadruped robot.

Qwen-RobotManip pairs a Qwen3.5-4B vision-language backbone with a flow-matching DiT action head. Trained on more than 38,100 hours of data, it reaches a 91.4% success rate on the LIBERO-Plus evaluation.
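
For readers unfamiliar with flow matching, the following is a minimal, generic sketch of how a flow-matching action head turns noise into an action at inference time: it starts from a random sample and integrates a velocity field from t = 0 to t = 1 with Euler steps. Everything here is an illustrative assumption rather than Qwen-RobotManip's actual code. In particular, `toy_velocity` is a hypothetical closed-form stand-in for the learned DiT network, and the three-element action vector is made up.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For the straight-line path x_t = (1 - t) * noise + t * target with a
    single target action, the ideal velocity at (x, t) is
    (target - x) / (1 - t). A real action head would predict this from
    images, language and robot state instead of being handed the target.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(target, steps=10, seed=0):
    """Integrate the velocity field from t = 0 (noise) to t = 1 (action)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so no division by zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x


if __name__ == "__main__":
    # Made-up 3-dimensional action, for illustration only.
    print(sample_action([0.2, -0.5, 1.0]))
```

With this idealized velocity field the Euler integration lands on the target action regardless of the starting noise. A trained network only approximates that field, which is why the number of integration steps becomes a trade-off between latency and accuracy.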

Qwen-RobotWorld uses a 60-layer dual-stream MMDiT architecture to couple semantic representations with video latents. Trained on 8.6 million video-text pairs, it ranks first on benchmarks of physical-law compliance such as EWMBench. All three models are driven through a language-first interface and are integrated into the Qwen-RobotClaw framework, which lets higher-level planners carry out multi-step physical operations.

Tags:
Qwen-Robot
Qwen-RobotNav
Qwen-RobotManip
Qwen-RobotWorld
Qwen-RobotClaw
Qwen3.5
Qwen2.5-VL
LIBERO-Plus
EWMBench
WorldModelBench
Yushu Go2