10 · Opinion · Feb 23
Why we no longer evaluate SWE-bench Verified
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
Covered by 1 source
- OpenAI Blog · Feb 23