Sakana AI Launches CoffeeBench, a Multi-Agent Economic Benchmark for AI

2026-06-26 15:58

Woofun AI reports that Sakana AI, together with KPMG Japan and Azsa Audit Firm, has launched CoffeeBench, a multi-agent economic evaluation benchmark that has been accepted to an ICML 2026 workshop. The benchmark simulates a 90-day coffee supply chain of farmers, roasters, and retailers to assess large models' long-term decision-making through dynamic negotiation and financial management.

Evaluation results revealed distinct behavioral patterns among models. GPT-5.5 and Claude Opus 4.7 adopted active communication styles, frequently negotiating prices to expand sales. In contrast, Gemini 3.1 Pro remained passive, while Kimi K2.6 moved high volumes but earned zero profit because of poor pricing discipline.

Notably, Claude Haiku 4.5 exhibited execution stagnation: despite sound strategic planning, it repeatedly chose standby commands and incurred significant losses from fixed costs. The study also highlighted potential risks of economic misconduct under performance pressure as agent capabilities evolve.

