According to Woofun AI, GPU retailer Tinygrad says the GLM 5.2 model reaches an inference throughput of 120 tokens per second on a dual-machine configuration built on Blackwell-architecture hardware. The setup costs $150,000 in total, and customers can choose either two standard tinyboxes or a single tinybox Pro to reach the stated performance.
The pitch is "one-time purchase, never pay cloud fees," a direct challenge to the recurring costs of pay-as-you-go cloud inference services. The GLM development team has not yet confirmed these performance claims, and Tinygrad has not released further technical details about the implementation.
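To put the one-time-purchase claim in perspective, the rough break-even arithmetic can be sketched as follows. Only the $150,000 price and the 120 tokens per second figure come from the report. The cloud price per million tokens and the utilization factor are hypothetical placeholders, not quoted rates, and the sketch ignores electricity, maintenance, and depreciation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_day(tokens_per_second: float, utilization: float = 1.0) -> float:
    """Tokens generated in one day at a given sustained throughput.

    `utilization` (0..1) is an assumed fraction of the day the machine is busy.
    """
    return tokens_per_second * SECONDS_PER_DAY * utilization


def break_even_days(
    hardware_cost_usd: float,
    tokens_per_second: float,
    cloud_price_per_million_usd: float,
    utilization: float = 1.0,
) -> float:
    """Days until the hardware price equals the cloud bill for the same tokens.

    `cloud_price_per_million_usd` is a hypothetical pay-as-you-go rate
    supplied by the caller; the report does not quote one.
    """
    daily_tokens = tokens_per_day(tokens_per_second, utilization)
    daily_cloud_cost = daily_tokens / 1_000_000 * cloud_price_per_million_usd
    return hardware_cost_usd / daily_cloud_cost


if __name__ == "__main__":
    # Figures from the report: $150,000 and 120 tokens per second.
    # The $5 per million tokens rate is purely illustrative.
    days = break_even_days(150_000, 120, cloud_price_per_million_usd=5.0)
    print(f"Tokens per day at full load: {tokens_per_day(120):,.0f}")
    print(f"Break-even at the assumed rate: {days:,.0f} days")
```

At full load, 120 tokens per second works out to 10,368,000 tokens per day. The break-even point depends heavily on the assumed cloud rate and on whether the reported throughput is for a single request stream or an aggregate across batched requests, which the report does not specify.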