# Terminal-Bench

- [GPT-5.5 'Spud' Is OpenAI's Strongest Coding Model Yet — With One Important Asterisk](https://sdd.sh/2026/04/gpt-5.5-spud-is-openais-strongest-coding-model-yet-with-one-important-asterisk.md): OpenAI's first fully retrained base model since GPT-4.5 delivers 82.7% on Terminal-Bench 2.0 and leads on most agentic evals. But on SWE-bench Pro — the benchmark that tests real-world GitHub issue resolution — Claude Opus 4.7 still leads by 5.7 points. Here's what that split actually means.