Why we no longer evaluate SWE-bench Verified
- ID: 221
- Status: new
- Published: 23 Feb 2026, 7:00 PM
- Fetched: 27 Jun 2026, 7:47 PM
- Provider: OpenAI News
- Category: ai-labs
- Original URL: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified
- Source URL: https://openai.com/news/rss.xml
Excerpt
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.