Prime Intellect has released prime-rl 0.6.0, a distributed reinforcement learning framework that makes it practical to train trillion-parameter mixture-of-experts (MoE) models on very-long-context agent tasks. According to the announcement, the release can train GLM-5 at a 131k-token context length on just 28 H200 servers while keeping each training step under five minutes; comparable runs previously required thousands of GPUs.
The framework uses a fully decoupled, asynchronous RL architecture to cut GPU idle time during long code-generation rollouts: model weights are updated in real time, without waiting for in-flight tasks to finish. Asynchronous updates can leave training and inference inconsistent with each other, so prime-rl applies Routing Replay (R3) to stabilize how data is distributed across experts, reducing the training-inference discrepancy to about one-tenth of its previous level.
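The idea behind replaying routing decisions can be illustrated with a toy router. This is a minimal sketch, not prime-rl's implementation: it assumes that Routing Replay records the expert indices chosen during rollout and reuses them in the training pass, and every name here (`top_k_experts`, `gate_weights`, `route`) is hypothetical.

```python
import math

def top_k_experts(logits, k):
    """Indices of the k largest router logits (ties go to the lower index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]

def gate_weights(logits, experts):
    """Softmax restricted to the selected experts."""
    peak = max(logits[i] for i in experts)
    exps = [math.exp(logits[i] - peak) for i in experts]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, k, replay=None):
    """Select experts for one token.

    With `replay` set, the expert indices recorded at rollout time are reused
    (assumed behaviour of Routing Replay); only the gate weights are
    recomputed from the trainer's own logits.
    """
    experts = list(replay) if replay is not None else top_k_experts(logits, k)
    return experts, gate_weights(logits, experts)

# The same token seen by the inference engine and by the trainer, with a
# small numerical drift between the two sets of router logits.
inference_logits = [1.00, 0.50, 0.49, -1.00]
trainer_logits = [1.00, 0.49, 0.50, -1.00]

recorded, _ = route(inference_logits, k=2)                 # chosen during rollout
fresh, _ = route(trainer_logits, k=2)                      # trainer, no replay
replayed, weights = route(trainer_logits, k=2, replay=recorded)

print("rollout:", recorded, "trainer:", fresh, "replayed:", replayed)
```

Without replay, the tiny drift is enough to send the token to a different expert in the training pass than the one that produced the sampled output; with replay, both passes use the same experts.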
In addition, the system uses Mooncake to pool idle memory across the cluster into a shared cache, and DeepGEMM with block-scaled FP8 training to avoid crashes caused by numerical precision drift, keeping resource utilization high across distributed clusters.
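Why block scaling helps FP8 precision can be shown with a small simulation. The sketch below is illustrative only and does not use DeepGEMM: `fake_e4m3` is a hypothetical pure-Python approximation of rounding to the FP8 E4M3 format (3 mantissa bits, maximum 448, subnormal step 2^-9), and the block size of 4 is chosen for readability rather than taken from any real kernel.

```python
import math

E4M3_MAX = 448.0      # largest finite E4M3 value
E4M3_MIN_EXP = -6     # exponent of the smallest normal number

def fake_e4m3(x):
    """Round x to the nearest value on a simulated E4M3 grid (saturating)."""
    a = min(abs(x), E4M3_MAX)
    if a == 0.0:
        return 0.0
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    exponent = max(math.frexp(a)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exponent - 3)          # 3 mantissa bits
    return math.copysign(min(round(a / step) * step, E4M3_MAX), x)

def quantize_blocks(values, block_size):
    """Quantize and dequantize with one scale per block of `block_size` values."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block) or 1.0
        scale = amax / E4M3_MAX           # map the block's largest value to 448
        out.extend(fake_e4m3(v / scale) * scale for v in block)
    return out

def total_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx))

# Small values sharing a tensor with large outliers.
values = [0.001, 0.002, 0.003, 0.004, 1000.0, 500.0, 250.0, 125.0]

per_tensor = quantize_blocks(values, block_size=len(values))  # one global scale
per_block = quantize_blocks(values, block_size=4)             # one scale per block

print("per-tensor error:", total_abs_error(values, per_tensor))
print("per-block error: ", total_abs_error(values, per_block))
```

With a single scale for the whole tensor, the outlier forces the small values below the FP8 grid and the smallest ones flush to zero; giving each block its own scale keeps them representable.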