Prime Intellect Releases Prime-RL 0.6.0 Enabling Trillion-Parameter RL Training on 28 Servers
2026-06-23 20:33

Prime Intellect has released prime-rl 0.6.0, a distributed reinforcement learning framework that the company says makes RL training of trillion-parameter mixture-of-experts (MoE) models practical for very-long-context agent tasks. According to the announcement, the release can train GLM-5 at a 131k-token context length on only 28 H200 servers while keeping each training step under five minutes, a significant reduction from the thousands of GPUs previously required.

The framework uses a fully decoupled, asynchronous RL architecture to cut GPU idle time during long-running code-generation tasks: model weights can be updated in real time without waiting for in-flight rollout tasks to finish. To resolve the inconsistencies that asynchronous updates can cause, Routing Replay (R3) stabilizes how tokens are distributed across experts, reportedly reducing the discrepancy between training and inference to one-tenth of its previous level.
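The idea behind replaying routing decisions can be illustrated with a toy example. The sketch below is not prime-rl code: the function names, the logits, and the choice of two experts per token are all hypothetical, and it assumes that "replay" means reusing the expert selection recorded at inference time when the same token is processed during training.

```python
import math

def top_k(logits, k):
    """Return the indices of the k largest router logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def gate_weights(logits, experts):
    """Softmax restricted to the selected experts' logits."""
    peak = max(logits[i] for i in experts)
    exps = [math.exp(logits[i] - peak) for i in experts]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, k, replay=None):
    """Choose experts for one token; reuse `replay` instead of top-k if given."""
    experts = list(replay) if replay is not None else top_k(logits, k)
    return experts, gate_weights(logits, experts)

# Hypothetical router logits for one token and four experts. The trainer's
# logits differ slightly from the inference engine's (numerical drift).
inference_logits = [2.0, 1.00, 0.99, -1.0]
training_logits = [2.0, 0.99, 1.00, -1.0]

recorded, _ = route(inference_logits, k=2)         # experts used at rollout time
free_choice, _ = route(training_logits, k=2)       # trainer picks on its own
replayed, weights = route(training_logits, k=2, replay=recorded)

print("inference:", recorded)
print("training without replay:", free_choice)
print("training with replay:", replayed)
```

In this toy case a tiny difference in logits makes the trainer pick a different second expert than the inference engine did. Replaying the recorded selection keeps both sides on the same experts, while the gate weights are still computed from the trainer's own logits.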

In addition, the system uses Mooncake to pool idle memory across the cluster into a shared cache, and uses DeepGEMM with block-scaled FP8 training to avoid the crashes caused by numerical precision drift, keeping resource utilization high across distributed clusters.
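Why per-block scaling helps FP8 can be shown with a small simulation. The sketch below is illustrative only and does not use DeepGEMM: `to_e4m3` is a simplified, hypothetical model of the FP8 E4M3 format (three mantissa bits, maximum value 448, subnormal step 2^-9), and the block size of 4 is chosen for readability rather than taken from the release.

```python
import math

E4M3_MAX = 448.0               # largest finite FP8 E4M3 value
E4M3_MIN_NORMAL = 2.0 ** -6    # smallest normal value
E4M3_SUBNORMAL_STEP = 2.0 ** -9

def to_e4m3(x):
    """Round x to a simplified model of FP8 E4M3 (assumption: no NaN handling)."""
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    if x == 0:
        return 0.0
    ax = abs(x)
    if ax < E4M3_MIN_NORMAL:
        # Subnormal range: fixed step, so very small values flush to zero.
        q = round(ax / E4M3_SUBNORMAL_STEP) * E4M3_SUBNORMAL_STEP
    else:
        # Normal range: keep 1 implicit + 3 explicit mantissa bits.
        mantissa, exponent = math.frexp(ax)
        q = math.ldexp(round(mantissa * 16) / 16, exponent)
    return math.copysign(q, x)

def quant_dequant(values, block_size):
    """Quantize with one scale per block, then dequantize back to floats."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        out.extend(to_e4m3(v / scale) * scale for v in block)
    return out

def total_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx))

# One block of large values followed by one block of tiny values.
values = [300.0, -150.0, 75.0, 20.0, 1e-4, -2e-4, 3e-4, 5e-5]

per_tensor = quant_dequant(values, block_size=len(values))  # single scale
per_block = quant_dequant(values, block_size=4)             # one scale per block

print("tiny values, single scale:", per_tensor[4:])
print("tiny values, block scales:", per_block[4:])
```

With a single scale for the whole tensor, the large values set the scale and the tiny values round to zero. With one scale per block, the tiny values keep a few significant bits, which is the kind of precision loss that block scaling is meant to limit.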
