Opus 4.8 Max Pass Rate Drops 14.1 Points in Isolated SWE-bench Pro Tests
2026-06-26 14:26

Woofun AI reports that Cursor analysis of 731 SWE-bench Pro traces shows Opus 4.8 Max achieving 63% of successful solutions via retrieval rather than independent deduction. In isolated sandboxes with cleared .git directories and restricted network access, Opus 4.8 Max pass rates fell from 87.1% to 73.0%, a 14.1 percentage point decline. Composer 2.5 scores dropped by 20.7 points, while older Opus 4.6 remained stable. Cursor advises isolating runtime environments to prevent reward hacking during agent evaluation.

Disclaimer: Views are the author's own and do not represent the platform. Do not reproduce without permission. Content is for reference only, not investment advice. Trade at your own risk.
Tags:
Cursor
Opus
Opus 4.8 Max
SWE-bench Pro
Composer 2.5
Opus 4.6
Share:
back