SmartSpectrum AI GLM-5.2 Leads DeepSWE Benchmark with 44% Success Rate on Complex Tasks

2026-06-21 11:02

Data compiled by Woofun AI shows that GLM-5.2, SmartSpectrum AI's open-source model, has taken the top position on DeepSWE, a long-horizon software engineering benchmark. Running in maximum thinking mode, the model achieved a one-shot success rate of 44% on complex development tasks, 13 percentage points ahead of the previous leader, Kimi K2.7 Code.

GLM-5.2 costs more to run, averaging $3.92 per task against $2.82 for Kimi K2.7 Code, but it also outperforms several mainstream closed-source models: Claude Sonnet 4.6 [high] at 30%, Gemini 3.5 Flash [medium] at 37%, and Claude Opus 4.8 [low] at 41%.

DeepSWE, designed by Datacurve, evaluates AI agents on 113 real-world coding problems across five languages. Unlike traditional tests that focus on single-line modifications, it requires coordinated edits across multiple files, with fixes averaging more than 600 lines of code, all executed in isolated containers under strict CPU and memory limits.
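The cost and success-rate figures quoted above can be combined into a rough cost per solved task. The sketch below is illustrative only: it assumes the quoted averages are per attempted task, infers Kimi K2.7 Code's success rate as 31% (44% minus 13 percentage points), and uses a hypothetical helper named cost_per_success that is not part of the benchmark.

```python
# Figures quoted in the article. Kimi's 31% is inferred (44% - 13 points),
# not stated directly in the source.
results = {
    "GLM-5.2": {"success_rate": 0.44, "cost_per_task": 3.92},
    "Kimi K2.7 Code": {"success_rate": 0.31, "cost_per_task": 2.82},
}


def cost_per_success(success_rate: float, cost_per_task: float) -> float:
    """Average spend per solved task, assuming cost is per attempted task."""
    return cost_per_task / success_rate


for name, r in results.items():
    print(f"{name}: ${cost_per_success(**r):.2f} per solved task")
# GLM-5.2: $8.91 per solved task
# Kimi K2.7 Code: $9.10 per solved task
```

Under these assumptions the two models land close together per solved task, even though GLM-5.2 is the more expensive of the two per attempt.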

