Data compiled by Woofun AI shows that SmartSpectrum AI's open-source model GLM-5.2 has taken the top position on DeepSWE, a benchmark of long-horizon software engineering tasks. Running in maximum thinking mode, the model achieved a one-shot success rate of 44% on complex development tasks, 13 percentage points ahead of the previous leader, Kimi K2.7 Code (31%).
GLM-5.2 is more expensive to run than Kimi K2.7 Code, averaging $3.92 per task versus $2.82, but it also outscored several mainstream closed-source models: Claude Sonnet 4.6 [high] (30%), Gemini 3.5 Flash [medium] (37%), and Claude Opus 4.8 [low] (41%).

The DeepSWE benchmark, designed by Datacurve, evaluates AI agents on 113 real-world coding problems spanning five programming languages. Unlike traditional tests that focus on single-line changes, it requires coordinated edits across multiple files, with fixes averaging more than 600 lines of code. Every task runs in an isolated container under strict CPU and memory limits.
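The cost and score figures above can be combined into a rough cost per solved task. The sketch below is a back-of-the-envelope illustration, not a metric reported by Datacurve or Woofun AI. It assumes that the reported per-task cost is averaged over all attempted tasks, failed ones included. Kimi K2.7 Code's 31% is inferred from the stated 13-point gap. The helper names are hypothetical.

```python
# Reported one-shot success rates and average cost per attempted task.
# Kimi K2.7 Code's 0.31 is inferred from the stated 13-point gap (44 - 13).
results = {
    "GLM-5.2": {"success_rate": 0.44, "cost_per_task": 3.92},
    "Kimi K2.7 Code": {"success_rate": 0.31, "cost_per_task": 2.82},
}


def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Difference between two success rates, in percentage points."""
    return (rate_a - rate_b) * 100


def cost_per_solved_task(success_rate: float, cost_per_task: float) -> float:
    """Expected spend per successfully solved task.

    Assumption: cost_per_task is averaged over all attempts, including failures,
    so dividing by the success rate spreads failed attempts over the successes.
    """
    return cost_per_task / success_rate


if __name__ == "__main__":
    for name, r in results.items():
        print(f"{name}: ${cost_per_solved_task(**r):.2f} per solved task")
    # GLM-5.2: $8.91 per solved task
    # Kimi K2.7 Code: $9.10 per solved task
```

Under that assumption, GLM-5.2's higher per-attempt price is roughly offset by its higher success rate. If the reported cost instead covers only successful runs, this comparison does not hold.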