# AI Tools

- [Mistral Medium 3.5 Just Entered the Agentic Coding Race — Here's Where It Stands](https://sdd.sh/2026/05/mistral-medium-3.5-just-entered-the-agentic-coding-race-heres-where-it-stands.md): Mistral's 128B Medium 3.5 model and its Vibe remote agent platform went live this week. 77.6% SWE-bench Verified, async cloud execution, and a direct shot at the agentic coding market. The benchmarks are strong. The architecture tells a more complicated story.
- [Meta Avocado Is Closed-Source. The Llama Era Might Be Over.](https://sdd.sh/2026/05/meta-avocado-is-closed-source.-the-llama-era-might-be-over..md): Meta's next flagship model has been delayed twice, benchmarks below GPT-5.5 and Claude Opus 4.7, and, unlike Llama, won't be open-sourced. Meta is reportedly considering licensing Google Gemini as a stopgap. The open-source AI story Meta spent two years building is quietly unraveling.
- [Cursor Security Review vs. Claude Security: Two Betas, One Week, Opposite Architectures](https://sdd.sh/2026/05/cursor-security-review-vs.-claude-security-two-betas-one-week-opposite-architectures.md): On April 30, 2026, Cursor and Anthropic both shipped AI-powered security products. The features look similar on paper. The architectures could not be more different — and that difference tells you everything about where each company thinks AI coding is headed.
- [Microsoft Agent 365 Is Live: The Enterprise Control Plane That Governs Agents You're Already Running](https://sdd.sh/2026/05/microsoft-agent-365-is-live-the-enterprise-control-plane-that-governs-agents-youre-already-running.md): Microsoft Agent 365 reached general availability on May 1, 2026, bundled into the new M365 E7 Frontier Suite at $99/user. It is not a coding agent or a development tool. It is governance infrastructure — a control plane for discovering, governing, and securing every AI agent in your organization. Here is what it actually does, what it cannot govern, and why it matters.
- [Claude Code at $2.5B ARR: How a Terminal Agent Outpaced Every AI IDE](https://sdd.sh/2026/05/claude-code-at-2.5b-arr-how-a-terminal-agent-outpaced-every-ai-ide.md): Claude Code hit $1B ARR within six months of launch — faster than Slack, Zoom, or any AI coding competitor. By February 2026 it had crossed $2.5B, accounting for more than half of all Anthropic enterprise spending. Here's what those numbers actually mean for the AI coding market.
- [Claude Code v2.1.119: Multi-VCS Support, Settings Persistence, and the Enterprise Push](https://sdd.sh/2026/05/claude-code-v2.1.119-multi-vcs-support-settings-persistence-and-the-enterprise-push.md): Claude Code v2.1.119 shipped multi-VCS support for --from-pr (GitLab, Bitbucket, GitHub Enterprise), settings persistence to ~/.claude/settings.json, and proper agent frontmatter handling in --print mode. A release that reads like a feature patch but signals something bigger about where Claude Code is heading.
- [Claude Security: Anthropic Enters the Defensive Security Market](https://sdd.sh/2026/05/claude-security-anthropic-enters-the-defensive-security-market.md): Anthropic's Claude Security went to public beta on April 30, bringing reasoning-based vulnerability detection to enterprise codebases. With CrowdStrike, Wiz, SentinelOne, and Palo Alto as launch partners, this is Anthropic's first step beyond the developer tools market — and its timing couldn't be better.
- [Three Bugs, Six Weeks, One Lesson: Anthropic's Claude Code Postmortem](https://sdd.sh/2026/05/three-bugs-six-weeks-one-lesson-anthropics-claude-code-postmortem.md): On April 23, Anthropic published an engineering postmortem admitting three overlapping changes caused weeks of Claude Code quality degradation. All three were caught by user complaints, not internal evals. The story matters less for what it says about three bugs than for what it reveals about the risks of depending on black-box AI infrastructure.
- [Cursor SDK: The IDE Escapes the IDE — But Does It Break the Ceiling?](https://sdd.sh/2026/04/cursor-sdk-the-ide-escapes-the-ide-but-does-it-break-the-ceiling.md): Cursor launched a TypeScript SDK in public beta on April 29 that lets developers invoke Cursor agents programmatically from CI/CD pipelines, backend services, or other products — with sandboxed cloud VMs, subagents, and durable agent lifecycle. It's Cursor's most significant architectural shift since Composer. The question is whether it actually solves the autonomy problem, or just relocates it.
- [OpenAI Lands on Amazon Bedrock — The Cloud That Already Houses Claude](https://sdd.sh/2026/04/openai-lands-on-amazon-bedrock-the-cloud-that-already-houses-claude.md): After Microsoft's exclusivity expired on April 27, OpenAI moved its models, Codex agent, and a new jointly built Bedrock Managed Agents runtime onto AWS. Amazon now hosts both Anthropic and OpenAI. Here's what the infrastructure power shift means for the AI coding landscape.
- [DeepSeek V4: Near-Frontier Performance, Open Weights, and the First Major Model Built for Huawei Chips](https://sdd.sh/2026/04/deepseek-v4-near-frontier-performance-open-weights-and-the-first-major-model-built-for-huawei-chips.md): DeepSeek V4 arrived April 24 with two variants: a 1.6T-parameter Pro and a 284B-parameter Flash, both MIT-licensed and priced far below Western closed models. The bigger story is what it runs on: Huawei Ascend chips, not Nvidia.
- [The Flat-Rate Era Is Over: GitHub Copilot Moves to Token Billing on June 1](https://sdd.sh/2026/04/the-flat-rate-era-is-over-github-copilot-moves-to-token-billing-on-june-1.md): GitHub Copilot transitions all plans to usage-based billing on June 1, 2026. Code review will double-bill against GitHub Actions minutes. The flat-rate subscription model for AI coding tools is officially dead — and developers are not happy about it.
- [Claude Code in 2026: The Complete Deep Dive](https://sdd.sh/2026/04/claude-code-in-2026-the-complete-deep-dive.md): Claude Code isn't a coding assistant. It's a terminal-native autonomous agent that plans, implements, tests, and iterates on software with minimal supervision. This is the definitive 2026 guide to what it is, how it works, and how to get the most from it.
- [Google's 75% Threshold: When AI Became the Primary Author of Production Code](https://sdd.sh/2026/04/googles-75-threshold-when-ai-became-the-primary-author-of-production-code.md): Sundar Pichai revealed at Google Cloud Next 2026 that 75% of new code at Google is now AI-generated and reviewed by engineers. That number crossed a threshold most didn't expect this fast — and it reframes every assumption about what software teams look like in 2026.
- [92% of AI-Generated Codebases Have Critical Vulnerabilities. Here's Why Agentic Review Is the Fix.](https://sdd.sh/2026/04/92-of-ai-generated-codebases-have-critical-vulnerabilities.-heres-why-agentic-review-is-the-fix..md): The 2026 AI Coding Impact Report reveals that 100% of engineering orgs are shipping more code thanks to AI — and security teams are drowning. 92% of AI-generated codebases contain critical vulnerabilities. The answer isn't less AI. It's better AI review.
- [DeepSeek V4 Ships: Frontier-Class Coding at 1/6th the Cost](https://sdd.sh/2026/04/deepseek-v4-ships-frontier-class-coding-at-1/6th-the-cost.md): DeepSeek V4-Pro hits 80.6% on SWE-bench Verified and 93.5% on LiveCodeBench — matching or exceeding most closed models — while costing 1/6th of Claude Opus 4.7 and releasing under the MIT license. Here's what actually matters, and what the benchmarks don't tell you.
- [Google Cloud Next 2026: A2A Goes Production, Jules Graduates — But the Autonomy Gap Remains](https://sdd.sh/2026/04/google-cloud-next-2026-a2a-goes-production-jules-graduates-but-the-autonomy-gap-remains.md): Google's Cloud Next 2026 delivered genuine infrastructure progress: A2A protocol in production at 150 organizations, Jules out of beta, Gemini Enterprise Agent Platform replacing Vertex AI. But integration breadth still isn't the same as autonomy depth.
- [Claude Code v2.1.118: Vim Mode, Custom Themes, and Hooks That Talk to MCP](https://sdd.sh/2026/04/claude-code-v2.1.118-vim-mode-custom-themes-and-hooks-that-talk-to-mcp.md): Claude Code v2.1.118 ships vim visual mode, a full custom theming system, and hooks that can now invoke MCP tools directly. Small-sounding updates that collectively make Claude Code meaningfully more extensible — and more comfortable for developers who live in the terminal.
- [MiniMax M2.7: The Open-Source Agent That Rewrote Its Own Training Loop](https://sdd.sh/2026/04/minimax-m2.7-the-open-source-agent-that-rewrote-its-own-training-loop.md): MiniMax M2.7 is the first open-source model to participate in its own development cycle — 100 autonomous rounds of scaffold optimization, 30% performance gain, 56.22% on SWE-Pro. It's not just a strong model. It's a glimpse of what model self-improvement looks like in practice.
- [Amazon Just Bet $25 Billion on Anthropic — and Locked In Its Cloud Destiny for a Decade](https://sdd.sh/2026/04/amazon-just-bet-25-billion-on-anthropic-and-locked-in-its-cloud-destiny-for-a-decade.md): Amazon announced up to $25B in new Anthropic investment tied to a $100B AWS commitment over 10 years. The deal gives Anthropic 5 GW of dedicated compute, native AWS console access for Claude, and a stable infrastructure runway well past any IPO. For developers building with Claude Code, the implications are more concrete than they first appear.
- [GPT-5.5 'Spud' Is OpenAI's Strongest Coding Model Yet — With One Important Asterisk](https://sdd.sh/2026/04/gpt-5.5-spud-is-openais-strongest-coding-model-yet-with-one-important-asterisk.md): OpenAI's first fully retrained base model since GPT-4.5 delivers 82.7% on Terminal-Bench 2.0 and leads on most agentic evals. But on SWE-bench Pro — the benchmark that tests real-world GitHub issue resolution — Claude Opus 4.7 still leads by 5.7 points. Here's what that split actually means.
- [Claude Design Is Not a Figma Clone. It's the Missing First Half of Your Agentic Stack.](https://sdd.sh/2026/04/claude-design-is-not-a-figma-clone.-its-the-missing-first-half-of-your-agentic-stack..md): Anthropic's Claude Design launched April 17 as a research preview. It's not a Figma alternative — it's the upstream half of the Claude Code shipping pipeline, and the handoff mechanism changes the conversation entirely.
- [OpenCode at 147K Stars: The Open-Source Terminal Agent That Won't Pick a Side](https://sdd.sh/2026/04/opencode-at-147k-stars-the-open-source-terminal-agent-that-wont-pick-a-side.md): OpenCode has 147K GitHub stars, 6.5M monthly developers, and supports 75+ LLM providers. Here's an honest look at what it gets right, where it falls short, and when it makes more sense than Claude Code.
- [Anthropic Tests Pulling Claude Code From Pro — And Gets an Instant Lesson in Developer Trust](https://sdd.sh/2026/04/anthropic-tests-pulling-claude-code-from-pro-and-gets-an-instant-lesson-in-developer-trust.md): On April 22, Anthropic quietly removed Claude Code from its $20 Pro plan — then called it an A/B test when developers noticed. The pricing logic is sound; the execution is another episode in a troubling pattern.
- [Salesforce Headless 360: The World's Largest CRM Just Became an MCP Server](https://sdd.sh/2026/04/salesforce-headless-360-the-worlds-largest-crm-just-became-an-mcp-server.md): At TDX 2026, Salesforce shipped 60+ MCP tools and 30+ coding skills under the 'Headless 360' banner, making every corner of its platform natively callable from Claude Code, Cursor, Codex, and Windsurf. When the world's largest CRM goes headless for AI, the enterprise software landscape just shifted.
- [Five Claude Code Features That Don't Make Headlines But Change Everything](https://sdd.sh/2026/04/five-claude-code-features-that-dont-make-headlines-but-change-everything.md): The benchmark releases get the press. The unglamorous power-user features don't. Here's what /ultrareview, auto mode for Max, xhigh effort, /recap, and the new prompt caching TTL controls actually change about your daily Claude Code workflow.
- [The Stanford AI Index 2026 Is Out. The Skeptics Are Out of Arguments.](https://sdd.sh/2026/04/the-stanford-ai-index-2026-is-out.-the-skeptics-are-out-of-arguments..md): Stanford HAI's 423-page 2026 AI Index dropped April 13. The numbers on agentic coding are not subtle: SWE-bench Verified jumped from 60% to near 100% of human baseline in a single year. Here's what the data actually means for working engineers.
- [Apple Sends 200 Siri Engineers to AI Coding Bootcamp — The Rest of Apple Already Got There](https://sdd.sh/2026/04/apple-sends-200-siri-engineers-to-ai-coding-bootcamp-the-rest-of-apple-already-got-there.md): Apple is sending nearly 200 Siri engineers to a multi-week AI coding bootcamp before WWDC 2026. The subtext: other Apple teams already run on Claude Code. When the world's most elite engineering org mandates the transition, the shift is real — but the story is messier than the headline.
- [OpenAI's Agents SDK Gets Sandboxed Execution and a Model-Native Harness: The Agent Infrastructure Layer Is Now Table Stakes](https://sdd.sh/2026/04/openais-agents-sdk-gets-sandboxed-execution-and-a-model-native-harness-the-agent-infrastructure-layer-is-now-table-stakes.md): OpenAI's April 15 Agents SDK update ships sandboxed execution, a model-native harness with configurable memory, provider-agnostic model support, and durable state via snapshotting. The primitives Claude Code has offered since day one are becoming the standard SDK layer. Here's what that means.
- [Claude Opus 4.7 Is Your New API Default on April 23. Here's What Changes.](https://sdd.sh/2026/04/claude-opus-4.7-is-your-new-api-default-on-april-23.-heres-what-changes..md): On April 23, the 'opus' API alias switches to Opus 4.7. Same price, one-third the tool errors, best SWE-bench Pro score on the market. If your pipeline uses the bare alias, you're upgrading automatically. Here's what that actually means.
- [OpenAI Codex Goes Desktop Agent. It's Still Not Claude Code.](https://sdd.sh/2026/04/openai-codex-goes-desktop-agent.-its-still-not-claude-code..md): OpenAI's April 17 Codex update ships multi-agent desktop control, 90+ MCP plugins, and persistent memory. It's a real step forward in autonomy — built on exactly the wrong architecture.
- [Claude Code on Bedrock with Mantle: The Enterprise Air-Gap Story](https://sdd.sh/2026/04/claude-code-on-bedrock-with-mantle-the-enterprise-air-gap-story.md): Claude Code v2.1.94 shipped Mantle backend support, enabling zero operator access on AWS-managed infrastructure. No SSH. No Session Manager. No Anthropic personnel in the inference path. Here's what that actually means for enterprise buyers.
- [Lucidworks MCP: $150K Per Integration Saved, and What It Says About MCP's Real Value](https://sdd.sh/2026/04/lucidworks-mcp-150k-per-integration-saved-and-what-it-says-about-mcps-real-value.md): Lucidworks launched an MCP server that connects AI assistants to enterprise search with claimed $150K savings per integration and 10x faster rollout. The numbers are impressive. The bigger story is what it reveals about MCP's role in enterprise AI architecture.
- [Claude Opus 4.7: 87.6% SWE-bench, Implicit-Need Tests, Same Price](https://sdd.sh/2026/04/claude-opus-4.7-87.6-swe-bench-implicit-need-tests-same-price.md): Anthropic shipped Claude Opus 4.7 on April 16, 2026. SWE-bench Verified jumps nearly 7 points to 87.6%, SWE-bench Pro leaps from 53.4% to 64.3%, and the model is the first Claude to pass implicit-need tests. Pricing stays flat at $5/$25 per million tokens.
- [The Orchestrator Seat: Claude Code's Desktop Redesign Makes Parallel Agents Native](https://sdd.sh/2026/04/the-orchestrator-seat-claude-codes-desktop-redesign-makes-parallel-agents-native.md): Anthropic's April 14 Claude Code desktop redesign isn't a UI polish — it's a rethinking of how developers manage multiple AI agents simultaneously. Multi-session sidebar, git worktree isolation, side chats, and an integrated toolkit mean you can orchestrate five agents without leaving the app.
- [Anthropic's Silent 'Effort' Default: A Reasonable Decision, a Transparency Failure](https://sdd.sh/2026/04/anthropics-silent-effort-default-a-reasonable-decision-a-transparency-failure.md): On March 3, Anthropic quietly changed Claude Opus 4.6's default effort level to 'medium' without telling users. An AMD executive's analysis of 6,852 sessions showed a 73% drop in visible thinking depth. Fortune, VentureBeat, and The Register covered the fallout. Here is what actually changed, why Anthropic did it, and what it means for developers who depend on Claude Code for serious work.
- [Claude Cowork Goes GA: Six Enterprise Features That Turn AI Into Workplace Infrastructure](https://sdd.sh/2026/04/claude-cowork-goes-ga-six-enterprise-features-that-turn-ai-into-workplace-infrastructure.md): Anthropic moved Claude Cowork from research preview to general availability on April 9, 2026, and shipped six enterprise management features alongside it. RBAC, group spend limits, OpenTelemetry, per-tool connector controls, a Zoom MCP connector, and expanded analytics. Here is what each feature does and why the bundle matters more than any individual item.
- [Claude Code Routines: The AI Cron Job That Actually Understands Your Codebase](https://sdd.sh/2026/04/claude-code-routines-the-ai-cron-job-that-actually-understands-your-codebase.md): Claude Code's new Routines feature — launched April 14 as a research preview — turns your AI agent into a cloud-native automation engine. Schedule it, trigger it via API, or fire it on GitHub events. Here is what routines are, how each trigger type works, and why this is a bigger architectural shift than it looks.
- [The Three-Layer AI Coding Stack That Nobody Planned (But Everyone Is Building)](https://sdd.sh/2026/04/the-three-layer-ai-coding-stack-that-nobody-planned-but-everyone-is-building.md): Cursor, Claude Code, and OpenAI Codex are not converging into a single winner-take-all tool. They are stratifying into three distinct layers — orchestration, execution, and review — and the most sophisticated developers are building workflows that use all three. Here is what each layer does, why Claude Code wins at the execution layer, and what the emergence of OpenAI's Codex plugin for Claude Code signals about where this is heading.
- [Anthropic Hits $30B ARR and Overtakes OpenAI: What the Revenue Rocket Means for Claude Code](https://sdd.sh/2026/04/anthropic-hits-30b-arr-and-overtakes-openai-what-the-revenue-rocket-means-for-claude-code.md): Anthropic just reported a $30 billion annual run rate — up more than 3x from $9B four months ago — and overtook OpenAI in revenue. With a CoreWeave infrastructure deal, a Broadcom/Google TPU compute agreement, and 1,000+ enterprise customers spending over $1M per year, the company building Claude Code is now the fastest-growing software company in history. Here is what that means for the tools you use.
- [Claude Code Analytics API: The Missing Bridge Between AI Coding and Enterprise ROI](https://sdd.sh/2026/04/claude-code-analytics-api-the-missing-bridge-between-ai-coding-and-enterprise-roi.md): Anthropic's Claude Code Analytics API gives enterprise organizations programmatic access to daily aggregated usage metrics — commits, PRs, lines of code, session counts, token costs, and more — per developer, per day. Here is what it tracks, how to set it up, and why it matters for every team that needs to justify its AI coding investment to leadership.
- [84% of Developers Use AI Code Tools. Only 29% Trust What They Ship.](https://sdd.sh/2026/04/84-of-developers-use-ai-code-tools.-only-29-trust-what-they-ship..md): Stack Overflow's developer survey exposed a paradox: AI coding tool adoption is at an all-time high, but trust in AI-generated code just hit an all-time low. The gap isn't irrational — it's diagnostic. And it points directly to what's broken about the autocomplete paradigm.
- [Claude Code Is Now the #2 AI Coding Tool at Work — and Has the Best NPS in the Industry](https://sdd.sh/2026/04/claude-code-is-now-the-%232-ai-coding-tool-at-work-and-has-the-best-nps-in-the-industry.md): JetBrains surveyed 10,000+ developers in January 2026. Claude Code has grown 6x in eight months and now ties Cursor for second place — while GitHub Copilot still leads by adoption, Claude Code leads by every satisfaction metric.
- [Claude Code /powerup and /insights: Fixing the 80% Problem](https://sdd.sh/2026/04/claude-code-/powerup-and-/insights-fixing-the-80-problem.md): Most developers use a fraction of what Claude Code can do. Two new commands shipped in v2.1.90 — /powerup and /insights — attack this problem from opposite ends: one teaches you what's possible, the other shows you where your actual workflow breaks down.
- [Microsoft Agent Framework 1.0: The Enterprise .NET World Just Adopted MCP](https://sdd.sh/2026/04/microsoft-agent-framework-1.0-the-enterprise-.net-world-just-adopted-mcp.md): Microsoft shipped Agent Framework 1.0 on April 3 with full MCP and A2A protocol support for .NET and Python. This isn't just another framework — it's Microsoft committing the entire enterprise .NET developer ecosystem to MCP as the standard tool integration layer.
- [81% vs. 46%: The AI Coding Benchmark That's Been Lying to You](https://sdd.sh/2026/04/81-vs.-46-the-ai-coding-benchmark-thats-been-lying-to-you.md): SWE-bench Verified — the benchmark that put every frontier model above 80% — is contaminated. OpenAI stopped reporting it in February. Here's what actually happened, what SWE-bench Pro replaces it with, and why 46% is a more honest number than 81%.
- [Claude Code Ultraplan: When 30 Minutes of Cloud Thinking Beats 5 Seconds of Local Guessing](https://sdd.sh/2026/04/claude-code-ultraplan-when-30-minutes-of-cloud-thinking-beats-5-seconds-of-local-guessing.md): Ultraplan hands your planning task to a dedicated cloud session running Opus 4.6 for up to 30 minutes — while your terminal stays free. Here's what it actually is, how the three modes differ, and when to reach for it.
- [Claude Managed Agents: Anthropic Just Built the Agent Loop You Were Going to Write Anyway](https://sdd.sh/2026/04/claude-managed-agents-anthropic-just-built-the-agent-loop-you-were-going-to-write-anyway.md): Anthropic launched Claude Managed Agents on April 8 — a managed API that handles the agent loop, sandboxing, checkpointing, and tool orchestration you'd otherwise build yourself. Here's what it actually offers, how the pricing model works, and why it matters for teams shipping production agents.
- [Cursor 3: Agent-First Branding, IDE-Last Architecture](https://sdd.sh/2026/04/cursor-3-agent-first-branding-ide-last-architecture.md): Cursor 3 shipped a genuinely redesigned interface built around parallel agents. The Agents Window, Design Mode, /worktree, and /best-of-n are real features with real uses. But 'agent-first' describes the UI layer, not the architecture — and the distinction matters more than Cursor's marketing suggests.
- [GitHub Copilot Finally Got Autopilot Mode. It's Still Not an Agent.](https://sdd.sh/2026/04/github-copilot-finally-got-autopilot-mode.-its-still-not-an-agent..md): GitHub Copilot's April 8 VS Code update ships Autopilot Mode, nested subagents, and MCP sandboxing. These are real improvements. They're also a demonstration of why bolting autonomy onto an IDE produces something fundamentally different from a real agent.
- [Meta's Muse Spark Is Closed Source. Open-Source AI Just Lost Its Last Major Patron.](https://sdd.sh/2026/04/metas-muse-spark-is-closed-source.-open-source-ai-just-lost-its-last-major-patron..md): Meta Superintelligence Labs shipped Muse Spark — and made it closed-source. The company that framed open AI as a moral imperative just locked the door. Here's what that means for developers who built their stack on Llama.
- [Claude Mythos Goes Official: Project Glasswing and the Zero-Day Reckoning](https://sdd.sh/2026/04/claude-mythos-goes-official-project-glasswing-and-the-zero-day-reckoning.md): Anthropic officially unveiled Claude Mythos Preview on April 7, confirming what the March leak hinted at: a model that autonomously found thousands of zero-days across every major OS and browser. Their response — Project Glasswing — grants restricted access to a select group of tech giants to use Mythos as a defensive weapon. This is the most consequential 'too dangerous to release' moment in AI history.
- [GLM-5.1: The Open-Source Model That Just Beat Everyone on SWE-bench Pro](https://sdd.sh/2026/04/glm-5.1-the-open-source-model-that-just-beat-everyone-on-swe-bench-pro.md): Z.AI released GLM-5.1 today — a 754B open-weight model under MIT license that scored 58.4% on SWE-bench Pro, beating GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. Its headline demo: an 8-hour autonomous session that built a complete Linux desktop environment across 655 iterations. The closed-model monopoly on frontier coding capability just got its first serious challenge.
- [The CLAUDE.md Trap: How a New Supply-Chain Attack Targets Agentic Developers](https://sdd.sh/2026/04/the-claude.md-trap-how-a-new-supply-chain-attack-targets-agentic-developers.md): A patched vulnerability in Claude Code (CVE-2026-21852) reveals an entirely new attack surface: poisoned project config files that silently bypass your deny rules and exfiltrate credentials. Here's what happened, how the exploit works, and what it means for agentic security.
- [Anthropic's OpenClaw Ban Is a Platform Power Move — And an Honest One](https://sdd.sh/2026/04/anthropics-openclaw-ban-is-a-platform-power-move-and-an-honest-one.md): Anthropic just blocked Claude Pro and Max subscribers from using their subscriptions with OpenClaw and other third-party harnesses. The decision is strategically transparent, commercially necessary — and a sign of where the agentic ecosystem is heading.
- [Windsurf After Cognition: GPT-5.4, One Million Users, and an Identity Crisis](https://sdd.sh/2026/04/windsurf-after-cognition-gpt-5.4-one-million-users-and-an-identity-crisis.md): Windsurf has crossed one million active users, added GPT-5.4 with five reasoning effort levels, and is now fully under Cognition AI's ownership. The product is better. The question is whether it has found an identity that justifies its place in the market.
- [Claude's 1M Context Window Is Now Standard: What Actually Changes for Agentic Coding](https://sdd.sh/2026/04/claudes-1m-context-window-is-now-standard-what-actually-changes-for-agentic-coding.md): On March 13, Anthropic made the 1M token context window standard on Sonnet 4.6 and Opus 4.6 — no beta header, no pricing premium above 200K. Here is what that actually changes for coding agents, how it compares to the competition, and what it still cannot solve.
- [Gemma 4: Google Just Made the Case for Running Your Coding Agent Locally](https://sdd.sh/2026/04/gemma-4-google-just-made-the-case-for-running-your-coding-agent-locally.md): Google's Gemma 4 dropped on April 2 with Apache 2.0 licensing, 80% on LiveCodeBench v6, a Codeforces ELO of 2,150, and agentic tool-use scores that make the previous generation look like a prototype. The 26B MoE model runs on a single consumer GPU with 256K context. Here's what it actually means.
- [GitHub Copilot CLI Goes GA: Microsoft Just Admitted Claude Code Was Right](https://sdd.sh/2026/04/github-copilot-cli-goes-ga-microsoft-just-admitted-claude-code-was-right.md): GitHub Copilot CLI reached general availability on February 25 with full autopilot mode, multi-model support, and a cloud offload feature that lets you delegate to an agent mid-session. Microsoft just shipped a terminal-native agentic coding tool. The irony is deliberate.
- [Pinterest's MCP Blueprint: 66,000 Invocations a Month, 7,000 Hours Saved — This Is What Production MCP Looks Like](https://sdd.sh/2026/04/pinterests-mcp-blueprint-66000-invocations-a-month-7000-hours-saved-this-is-what-production-mcp-looks-like.md): MCP hit 97 million downloads. Pinterest just showed what you do with them. Their production MCP ecosystem — domain-specific servers, a central registry, two-layer JWT auth, and hard ROI numbers — is the blueprint every serious engineering team will follow.
- [GitHub Copilot's April 24 Data Grab: What You're Agreeing To and How to Opt Out](https://sdd.sh/2026/04/github-copilots-april-24-data-grab-what-youre-agreeing-to-and-how-to-opt-out.md): Starting April 24, GitHub will train its AI models on Copilot Free, Pro, and Pro+ users' code by default — private repos included. The opt-out exists, but it's buried, not available on mobile, and unverifiable. Here's what's actually in the policy change and what it means.
- [What Anthropic's Accidental 512K-Line Leak Reveals About Claude Code's Future](https://sdd.sh/2026/04/what-anthropics-accidental-512k-line-leak-reveals-about-claude-codes-future.md): Anthropic accidentally published Claude Code's full TypeScript source to npm. Fifty thousand downloads later, we know about KAIROS — a proactive always-on daemon — plus ULTRAPLAN, undercover mode, anti-distillation traps, and a virtual pet. This isn't a scandal. It's an accidental roadmap.
- [Cursor Is Worth $50 Billion. Its Biggest Problem Is That It Still Needs You.](https://sdd.sh/2026/04/cursor-is-worth-50-billion.-its-biggest-problem-is-that-it-still-needs-you..md): Cursor's $50B valuation is real, its self-hosted cloud agents are a genuine enterprise product, and 67% of Fortune 500 companies are customers. But the autonomy ceiling — the fundamental limit that keeps Cursor in the IDE and humans in the loop — hasn't moved.
- [MCP Dev Summit NYC 2026: Authentication Is the Crisis, OpenAI Is Now a Stakeholder](https://sdd.sh/2026/04/mcp-dev-summit-nyc-2026-authentication-is-the-crisis-openai-is-now-a-stakeholder.md): The first major Linux Foundation MCP summit signals protocol maturity — but surfaces an uncomfortable truth: 43% of MCP servers have OAuth vulnerabilities, auth is still the dominant unsolved problem, and breaking changes are coming in SDK V2.
- [Claude Code Computer Use: The Agent That Can Now See, Click, and Ship](https://sdd.sh/2026/04/claude-code-computer-use-the-agent-that-can-now-see-click-and-ship.md): Anthropic's March 23 Computer Use launch for Claude Code is the closest thing yet to a fully autonomous coding agent. It can open your files, run your app, spot the bug, and fix it — without you touching a keyboard.
- [The SWE-bench Plateau: Three Frontier Models Walk In, All Score 80% — Now What?](https://sdd.sh/2026/04/the-swe-bench-plateau-three-frontier-models-walk-in-all-score-80-now-what.md): Gemini 3.1 Pro, Claude Opus 4.6, and GPT-5.3-Codex are all within 0.8 percentage points of each other on SWE-bench Verified. When every frontier model aces the exam, the exam stops being useful. Here's what actually differentiates them.
- [Anthropic's $380B Moment: What the IPO Signal Means for Claude Code](https://sdd.sh/2026/03/anthropics-380b-moment-what-the-ipo-signal-means-for-claude-code.md): Anthropic is targeting an October 2026 IPO to raise over $60 billion at a $380 billion valuation, with $19B in annualized revenue and 8 Fortune 10 customers. For developers building on Claude Code, the financial mechanics matter less than what they signal.
- [MCP Crosses 97 Million Downloads: The Protocol That Won](https://sdd.sh/2026/03/mcp-crosses-97-million-downloads-the-protocol-that-won.md): Sixteen months after Anthropic published a draft spec, MCP has crossed 97 million monthly SDK downloads — and OpenAI's adoption paired with retiring the Assistants API has effectively handed MCP the crown. Here's what that means for agentic development.
- [Claude Mythos: The Leaked Model That Scared the Security World](https://sdd.sh/2026/03/claude-mythos-the-leaked-model-that-scared-the-security-world.md): A CMS misconfiguration at Anthropic accidentally revealed 'Claude Mythos' — a model tier above Opus 4.6 that Anthropic itself calls an unprecedented cybersecurity risk. Here's what leaked, what it means for agentic coding, and why the security industry noticed immediately.
- [Jules Deep Dive: Google's Async Agent That Closes the CI Loop Without You](https://sdd.sh/2026/03/jules-deep-dive-googles-async-agent-that-closes-the-ci-loop-without-you.md): Jules is now generally available with Gemini 3.1 Pro at its core, an autonomous CI failure detection and fix loop, and audio changelogs. This is what a fully async coding agent actually looks like — and how it compares to the terminal-native model Claude Code represents.
- [Claude Code Agent Teams: One Developer, Fifteen AI Teammates](https://sdd.sh/2026/03/claude-code-agent-teams-one-developer-fifteen-ai-teammates.md): Claude Code's experimental Agent Teams feature lets a single session orchestrate up to 15 independent AI teammates, each with its own context window and toolset. Here's what the architecture looks like — and why a Rust C compiler built by 16 agents is a stress test worth understanding.
- [Parallel AI Agents: The Tools That Let You Run Ten Claudes at Once](https://sdd.sh/2026/03/parallel-ai-agents-the-tools-that-let-you-run-ten-claudes-at-once.md): One Claude Code session is powerful. Ten running in parallel is a different paradigm entirely. Here's the emerging ecosystem of multiplexers, orchestrators, and dashboards — and how to pick the right one.
- [Windsurf Arena Mode: Let the Models Fight It Out](https://sdd.sh/2026/03/windsurf-arena-mode-let-the-models-fight-it-out.md): Windsurf Arena Mode runs two AI agents on the same task in parallel isolated worktrees, then asks you to pick the winner. It's a clever answer to a real problem — but it also reveals something telling about where IDE-centric AI is stuck.
- [Your AI Agent Is Drowning in Tokens — Here's How to Fix It](https://sdd.sh/2026/03/your-ai-agent-is-drowning-in-tokens-heres-how-to-fix-it.md): A single `cargo test` can dump 4,800 tokens into your context window when only 11 matter. Multiply that across an agentic session and you're paying for noise that actively degrades your agent's reasoning. The fix exists — and it's not a bigger context window.
- [Claude Code AutoDream: Your AI Agent Finally Sleeps on It](https://sdd.sh/2026/03/claude-code-autodream-your-ai-agent-finally-sleeps-on-it.md): Anthropic quietly shipped AutoDream — a background memory consolidation system for Claude Code that runs between sessions, prunes stale notes, and fixes conflicting data. Think REM sleep for your coding agent.
- [Cursor Composer 2: The Model That Learns to Forget — and Sparked a Controversy](https://sdd.sh/2026/03/cursor-composer-2-the-model-that-learns-to-forget-and-sparked-a-controversy.md): Cursor's new coding model beats Claude Opus 4.6 on key benchmarks — but the real story is a training breakthrough called compaction-in-the-loop RL, and a transparency controversy that revealed Cursor quietly built it on a Chinese open-source model.
- [GPT-5.3-Codex: The First AI Model That Helped Build Itself — and Got a Scary Security Rating](https://sdd.sh/2026/03/gpt-5.3-codex-the-first-ai-model-that-helped-build-itself-and-got-a-scary-security-rating.md): OpenAI's GPT-5.3-Codex was instrumental in creating itself, introduced mid-turn steering for agentic workflows, and became the first OpenAI model rated 'High capability' for cybersecurity — which means it can reliably exploit real vulnerabilities.
- [Cursor Automations: Your IDE Just Became an Always-On Agent](https://sdd.sh/2026/03/cursor-automations-your-ide-just-became-an-always-on-agent.md): Cursor Automations turns your IDE into a reactive system that writes code, triages bugs, and responds to incidents while you sleep. Here's what it can do — and what it can't yet.
- [GitHub Copilot Gets Smarter — and Wants Your Code Data](https://sdd.sh/2026/03/github-copilot-gets-smarter-and-wants-your-code-data.md): Cross-agent memory, built-in security scanning, Jira integration, and a model picker make Copilot's coding agent genuinely capable. Then GitHub announced it's using your interaction data for training. Here's the full picture.
- [Claude Code Auto Mode: Anthropic Hands AI More Control (But Keeps It on a Leash)](https://sdd.sh/2026/03/claude-code-auto-mode-anthropic-hands-ai-more-control-but-keeps-it-on-a-leash.md): Auto Mode lets Claude decide which actions are safe to take without asking permission — but adds an AI safety layer that screens every action for prompt injection and risky behavior. Here's what changed and why it matters.
- [Cognition Buys Windsurf: The AI Coding Market Is Consolidating](https://sdd.sh/2026/03/cognition-buys-windsurf-the-ai-coding-market-is-consolidating.md): Cognition AI — the company behind Devin — acquired Windsurf for roughly $250 million. Combine that with Devin 2.0's 96% price cut and Windsurf's Codemaps, and Cognition is suddenly the most vertically integrated player in agentic coding. Here's what this means for developers.
- [Claude Code Channels: Your AI Agent, Now on Telegram and Discord](https://sdd.sh/2026/03/claude-code-channels-your-ai-agent-now-on-telegram-and-discord.md): Anthropic shipped Claude Code Channels on March 20, letting you message Claude Code directly from Telegram or Discord. The real story isn't convenience — it's the shift from synchronous IDE sessions to asynchronous agent partnerships, and what that means for how you work.
- [Xcode 26.3: Apple Goes All-In on Agentic Coding](https://sdd.sh/2026/03/xcode-26.3-apple-goes-all-in-on-agentic-coding.md): Apple's mid-cycle Xcode 26.3 release isn't a minor patch — it's a bet-the-ecosystem move that bakes Claude Agent and OpenAI Codex directly into the IDE. Here's what changed, what it means for iOS and Mac developers, and why MCP is the most important detail in the release notes.
- [Cursor vs. Copilot vs. Claude Code vs. Windsurf: Which AI Coding Tool Wins in 2026?](https://sdd.sh/2026/03/cursor-vs.-copilot-vs.-claude-code-vs.-windsurf-which-ai-coding-tool-wins-in-2026.md): Four serious contenders, four distinct philosophies. Here's a no-nonsense breakdown of the AI coding tool landscape in 2026 — with real pricing, real benchmarks, and a decision framework that actually helps you choose. Updated May 2 with GitHub Copilot usage-based billing, Cursor SDK and Security Review beta, Claude Code v2.1.119 and $2.5B ARR, and Windsurf's Devin integration roadmap.
- [MCP's 2026 Roadmap: From Prototype Protocol to Production Standard](https://sdd.sh/2026/03/mcps-2026-roadmap-from-prototype-protocol-to-production-standard.md): The MCP 2026 roadmap published by lead maintainer David Soria Parra reveals a protocol growing up fast — shifting from milestone releases to working groups, tackling stateless transport, enterprise auth, and governance maturity. Here's what's actually changing and why it matters for developers building on MCP today.
- [Claude Code March 2026: Voice Mode Isn't the Story](https://sdd.sh/2026/03/claude-code-march-2026-voice-mode-isnt-the-story.md): Voice mode grabbed the headlines. The 64k default output tokens, /loop, MCP elicitation, and --channels are the updates that will actually change how you use Claude Code day to day.