SmartSpectrum AI GLM-5.2 Leads DeepSWE Benchmark with 44% Success Rate on Complex Tasks
2026-06-21 11:02

Data compiled by Woofun AI shows that SmartSpectrum AI's open-source model GLM-5.2 has taken the top position on DeepSWE, a benchmark for long-horizon software engineering. Running in maximum thinking mode, the model achieved a one-shot success rate of 44% on complex development tasks, 13 percentage points ahead of the previous leader, Kimi K2.7 Code.

GLM-5.2 costs more per task than Kimi K2.7 Code, averaging $3.92 versus $2.82. It also scored higher than several mainstream closed-source models: Claude Sonnet 4.6 [high] at 30%, Gemini 3.5 Flash [medium] at 37%, and Claude Opus 4.8 [low] at 41%.

The DeepSWE benchmark, designed by Datacurve, evaluates AI agents on 113 real-world coding problems across five programming languages. Unlike traditional tests that focus on single-line changes, it requires coordinated edits across multiple files, with fixes averaging more than 600 lines of code, all run in isolated containers with strict CPU and memory limits.
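
For readers weighing the higher per-task price against the higher success rate, the quoted figures allow a rough cost-per-solved-task comparison. The sketch below is illustrative only and rests on two assumptions not stated in the report: that the average cost is spread over every attempted task, solved or not, and that Kimi K2.7 Code's success rate is 31%, as implied by the 13-point gap. The function name `cost_per_success` is hypothetical.

```python
# Figures quoted in the article. Kimi's success rate is derived from the
# reported 13-point gap, not quoted directly (assumption).
GLM_SUCCESS_PCT = 44
KIMI_SUCCESS_PCT = GLM_SUCCESS_PCT - 13  # implied: 31
GLM_COST_PER_TASK = 3.92   # USD, average per task
KIMI_COST_PER_TASK = 2.82  # USD, average per task


def cost_per_success(avg_cost_per_task: float, success_pct: int) -> float:
    """Expected spend per solved task, assuming the average cost
    covers all attempted tasks (an assumption, see the lead-in)."""
    return avg_cost_per_task / (success_pct / 100)


glm_cps = cost_per_success(GLM_COST_PER_TASK, GLM_SUCCESS_PCT)
kimi_cps = cost_per_success(KIMI_COST_PER_TASK, KIMI_SUCCESS_PCT)

print(f"GLM-5.2:        ${glm_cps:.2f} per solved task")
print(f"Kimi K2.7 Code: ${kimi_cps:.2f} per solved task")
```

Under these assumptions the two models come out close, at roughly $8.91 and $9.10 per solved task, so the higher per-task price of GLM-5.2 is largely offset by its higher success rate.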
