Tinygrad Claims GLM 5.2 Achieves 120 Tokens Per Second on Dual Blackwell Setup
2026-06-21 12:31

According to Woofun AI, GPU retailer Tinygrad says the GLM 5.2 model reaches an inference throughput of 120 tokens per second when run across two machines built on Blackwell-architecture hardware. The full setup costs $150,000. Tinygrad says customers can reach this performance with either two standard tinyboxes or a single tinybox Pro.

Tinygrad is pitching the setup as a "buy once, never pay cloud fees" alternative to the recurring costs of pay-as-you-go cloud inference services. The GLM development team has not confirmed the performance figures, and Tinygrad has not disclosed further technical details about the deployment.
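The "buy once" pitch comes down to spreading a fixed hardware cost over the tokens the machines produce. The Python sketch below is a hypothetical back-of-the-envelope calculation, not anything published by Tinygrad. It amortizes the reported $150,000 over the output of a setup generating the reported 120 tokens per second. It assumes that rate holds around the clock, scaled by an assumed `utilization` factor, and it ignores electricity, cooling, maintenance, and resale value. The function name `hardware_cost_per_million_tokens` is illustrative. A comparison with a cloud service would also need that service's per-token price, which the source does not give.

```python
# Hypothetical back-of-the-envelope amortization of the reported figures.
# Assumptions (not from Tinygrad): the 120 tokens/s rate is sustained,
# and power, cooling, maintenance, and resale value are ignored.

SETUP_COST_USD = 150_000      # reported total hardware cost
TOKENS_PER_SECOND = 120       # reported throughput (not independently verified)
SECONDS_PER_DAY = 24 * 60 * 60


def hardware_cost_per_million_tokens(years: float, utilization: float = 1.0) -> float:
    """Amortized hardware cost in USD per one million generated tokens.

    `years` is the amortization period; `utilization` is the assumed
    fraction of wall-clock time the machines spend generating tokens.
    """
    if years <= 0 or not 0 < utilization <= 1:
        raise ValueError("years must be > 0 and utilization in (0, 1]")
    total_tokens = TOKENS_PER_SECOND * SECONDS_PER_DAY * 365 * years * utilization
    return SETUP_COST_USD / (total_tokens / 1_000_000)


for years in (1, 3):
    print(f"{years} year(s) at full utilization: "
          f"${hardware_cost_per_million_tokens(years):.2f} per million tokens")
```

Under these assumptions the loop prints about $39.64 per million tokens over one year and about $13.21 over three years; halving `utilization` doubles the figure. Because the throughput claim has not been independently verified, the output is an illustration of the arithmetic, not a cost estimate.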
