AI Weekly Malaysia

Why we no longer evaluate SWE-bench Verified

ID: 221
Status: new
Published: 23 Feb 2026, 7:00 PM
Fetched: 27 Jun 2026, 7:47 PM
Provider: OpenAI News
Category: ai-labs
Original URL: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified
Source URL: https://openai.com/news/rss.xml

Excerpt

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
