Woofun AI reports that Coinbase has cut its artificial intelligence spending by nearly 50% by making open-weight models, specifically GLM 5.2 and Kimi 2.7, the default in its LLM gateway. CEO Brian Armstrong emphasized that the strategy focuses on better default model selection, routing logic, and caching rather than on adding usage friction or spending alerts, noting that 91% of employees remain well within established limits.
The company uses a custom pipeline that preprocesses prompts and routes tasks based on cache hit rates and pricing efficiency, distinguishing planning phases that require advanced models from execution phases that suit cheaper alternatives. By making the pipeline cache-aware, Coinbase raised LibreChat’s cache hit rate from 5% to 60%. It also mandates lean context management practices, such as session resets and tool disconnection, to keep the infrastructure scalable.
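The routing idea described above can be illustrated with a minimal Python sketch. It is not Coinbase's pipeline, whose details are not given in the report. The model names, prices, and tier labels below are hypothetical placeholders. Only the 5% and 60% cache hit rates come from the text. The sketch assumes cached input tokens are billed at a lower rate than uncached ones and picks, within the tier a phase calls for, the model with the lowest blended input price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical placeholder, not a real model ID
    input_price: float         # assumed USD per million uncached input tokens
    cached_input_price: float  # assumed USD per million cached input tokens
    cache_hit_rate: float      # fraction of input tokens served from cache (0..1)
    tier: str                  # "advanced" for planning, "economy" for execution


def effective_input_price(model: ModelOption) -> float:
    """Blend cached and uncached prices by the observed cache hit rate."""
    hit = model.cache_hit_rate
    return hit * model.cached_input_price + (1.0 - hit) * model.input_price


def route(phase: str, options: list[ModelOption]) -> ModelOption:
    """Pick the cheapest model in the tier that the phase requires."""
    tier = "advanced" if phase == "planning" else "economy"
    candidates = [m for m in options if m.tier == tier]
    if not candidates:
        raise ValueError(f"no model available for tier {tier!r}")
    return min(candidates, key=effective_input_price)


# Illustrative values only: all prices are made up for the example.
OPTIONS = [
    ModelOption("advanced-a", 10.0, 1.0, 0.60, "advanced"),
    ModelOption("economy-a", 1.0, 0.1, 0.05, "economy"),
    ModelOption("economy-b", 1.2, 0.1, 0.60, "economy"),
]

if __name__ == "__main__":
    for phase in ("planning", "execution"):
        chosen = route(phase, OPTIONS)
        print(phase, chosen.name, round(effective_input_price(chosen), 3))
```

With these made-up numbers, the economy model with the higher list price still wins the execution phase, because a 60% hit rate lowers its blended price below that of the nominally cheaper model with a 5% hit rate. This is one way cache awareness can change a routing decision that list prices alone would get wrong.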