Citigroup: Tight AI Inference Demand Shifts Bottleneck From Chips to Power Infrastructure

2026-06-16 16:49

According to Citigroup, persistently strong AI inference demand is causing the compute shortage to spill over from latest-generation chips to previous-generation GPUs, pushing model vendors to accelerate monetization through pricing, quotas, and routing mechanisms. Data compiled by Woofun AI shows that A100 GPU rental prices rose 0.6% over the past week, for a cumulative increase of 11% over six weeks, a sign that demand extends beyond the most advanced hardware.
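As a rough check on those two figures, the short Python sketch below compares the latest weekly move with the average weekly rate implied by the six-week total. It assumes the 11% figure is a compounded cumulative change, which the report does not state; the function name `average_weekly_rate` is illustrative only.

```python
def average_weekly_rate(total_change: float, weeks: int) -> float:
    """Constant weekly rate that compounds to total_change over the given weeks.

    Assumption: the cumulative figure is compounded week over week.
    """
    return (1.0 + total_change) ** (1.0 / weeks) - 1.0


# Figures quoted in the article: 11% over six weeks, 0.6% in the latest week.
implied = average_weekly_rate(0.11, 6)
latest = 0.006

print(f"implied average weekly rise: {implied:.2%}")
print(f"latest weekly rise:          {latest:.2%}")
print("latest week below six-week average:", latest < implied)
```

Under that assumption the implied average is roughly 1.75% per week, so the latest 0.6% move is slower than the six-week pace, although prices are still rising.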

At the same time, vendors of cutting-edge models have raised prices significantly following improvements in intelligence scores, with Citigroup noting that 'scarcity is being monetized faster than it is being solved.'

The report highlights that no single vendor currently offers simultaneous advantages in intelligence, speed, and price. While the intelligence score of top models rose by approximately 4 points, overall prices nearly doubled.

Meanwhile, mid-range models improved output speed, with the median rate for the top 20 models climbing from 64 tokens/s to 105 tokens/s over six weeks.
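For scale, the speed figures above can be turned into a relative change with a few lines of Python. This is only an arithmetic illustration of the quoted numbers; the helper name `relative_change` is hypothetical.

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (0.5 means a 50% increase)."""
    return (new - old) / old


# Median output speed of the top 20 models, in tokens/s, as quoted in the article.
speed_gain = relative_change(64.0, 105.0)

print(f"median output speed change: {speed_gain:.1%}")
```

The result is an increase of about 64% in median output speed over the six-week window.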

Additionally, the capability gap between closed-source and open-source models widened, with proprietary models now leading by around 10 points, up from 6, which suggests vendors are prioritizing retention of the high-end market over price competition.

Beyond compute, electricity and data center siting are emerging as critical constraints. A private neocloud has secured contracts for 4.9GW of demand against a planned pipeline exceeding 40GW, underscoring the supply-demand mismatch. Citigroup observes that data centers favor regions with electricity prices of around 9-12 cents/kWh, a choice influenced by renewable energy availability and long-term power purchase agreements. As infrastructure costs rise, capital expenditure per H100-equivalent unit increases, shifting the cost of electricity from an operating expense toward upfront investment.

The next phase of value may flow to the 'inference routing layer,' which optimizes model and hardware selection, though enterprise data privacy remains a challenge. The AI inference cycle is thus expanding from chips to broader infrastructure, including optical communication and cloud services, with coverage targets such as Ciena, Lumentum, and MiniMax.

