# GPT

- [81% vs. 46%: The AI Coding Benchmark That's Been Lying to You](https://sdd.sh/2026/04/81-vs.-46-the-ai-coding-benchmark-thats-been-lying-to-you.md): SWE-bench Verified — the benchmark that put every frontier model above 80% — is contaminated. OpenAI stopped reporting it in February. Here's what actually happened, what SWE-bench Pro replaces it with, and why 46% is a more honest number than 81%.