Skip to main content
  1. Articles/

OpenAI Builds Its Own Chip — With Help From Its Own AI

·977 words·5 mins·
Author
Florent Clairambault
CTO & Software engineer

OpenAI Builds Its Own Chip — With Help From Its Own AI

OpenAI spent approximately $14 billion serving ChatGPT in 2025. The inference bill for running frontier AI at scale is the single largest constraint on the economics of every AI product — and until now, every major AI lab has been entirely dependent on Nvidia GPUs to pay it.

On June 24, 2026, OpenAI and Broadcom changed that.

Jalapeño: the basics
#

Jalapeño is OpenAI’s first custom inference ASIC, co-developed with Broadcom and manufactured by TSMC on a 3nm process. The key specifications:

  • Architecture: Reticle-sized (the maximum die area the process node allows), systolic array design optimized for transformer attention patterns, 8 HBM stacks
  • Target performance: On par with Nvidia Blackwell on inference throughput
  • Target cost: ~50% lower cost per token versus current Nvidia GPUs (Broadcom CEO Hock Tan, via Bloomberg)
  • Development timeline: 9 months — the fastest development cycle for an advanced AI chip on record
  • Deployment: Small prototype deployments by end of 2026; full production ramp 2027-2028

The name fits the pattern: GPUs are general-purpose; Jalapeño is narrow, specialized, and hot.

The AI-designed-by-AI angle
#

The detail that stands out from OpenAI’s announcement: Greg Brockman confirmed that OpenAI’s own AI models assisted in developing Jalapeño. Specifically, they were used to accelerate parts of the chip design process — logic simulation, layout verification, and RTL generation — that normally take teams of chip engineers months to complete.

This is not unprecedented. Google has used ML models in TPU design since the v4 generation. But Jalapeño marks the first time a frontier AI lab has publicly confirmed using its own production AI to help design the hardware that will run that same AI.

The loop: GPT-5.x assisted in designing Jalapeño; Jalapeño will run GPT-6.x at 50% lower cost. That’s the economics of vertical integration compounding at the hardware layer.

For context: Anthropic covered the “AI building itself” angle at the software level last week — Claude generates 80% of production-merged code at Anthropic. Jalapeño represents OpenAI making the same claim one layer down, at the hardware level. Both moves point toward the same competitive dynamic: the labs that can turn their own models into productive engineering resources gain a structural cost advantage that pure-compute spending can’t match.

Why the economics matter
#

OpenAI’s $14B 2025 inference bill is the headline, but the more relevant number is the inference cost per useful output. Every customer-facing AI product — Claude Code, GitHub Copilot, ChatGPT — is in a race between capability improvements (which increase value per token) and cost compression (which expand the market by making each token cheaper to generate).

If Jalapeño delivers its 50% cost reduction, the arithmetic is significant: OpenAI’s inference margin improves substantially, and the headroom to cut API prices increases. That has direct implications for every AI coding tool that competes on price — including the Sonnet 4.6-tier products that power mid-market developer tools.

It also rewrites the Nvidia dependency story. Every major AI lab is currently exposed to Nvidia supply constraints and pricing power. Custom silicon — whether Google TPUs, AWS Trainium, or now Jalapeño — is the escape route. The difference with OpenAI is that it’s a pure AI research lab, not a hyperscaler, building its own silicon. That’s a level of vertical ambition that was uncommon 18 months ago.

Competitive context: where Anthropic stands
#

Anthropic has not announced a custom inference chip program, though an Anthropic own-chip story has been anticipated given the company’s infrastructure investments. Anthropic’s current compute strategy is heavily diversified across cloud providers: the $25B Amazon deal (Trainium3), the Akamai $1.8B edge inference contract, and the SpaceX Colossus lease (300MW, 220K+ Nvidia GPUs). That breadth mitigates supply risk without requiring chip design capacity.

OpenAI’s approach is different: a single custom ASIC designed to maximize inference efficiency at OpenAI’s specific workload profile. Whether that single-bet architecture outperforms Anthropic’s diversified multi-cloud strategy depends on execution quality and timing.

The 2027-2028 full production ramp is the relevant window. By then, Anthropic’s Trainium3 commitment with Amazon will be in full deployment, and the Akamai edge inference network will be at scale. The race isn’t who announces custom silicon first — it’s who gets to competitive-cost inference at production scale, and when.

What this means for developer tools
#

For anyone running Claude Code, GitHub Copilot, or any AI coding tool at scale, inference cost is a hidden multiplier on productivity economics. A 50% reduction in inference cost at the model provider level doesn’t automatically flow through to subscription prices — but it does expand the economic headroom for the features that currently get rate-limited.

The Claude Code compute cost behind an /ultrareview session, a Dynamic Workflows fan-out across hundreds of parallel subagents, or a five-level nested sub-agent tree is substantial. The $965B valuation Anthropic commanded in May reflects, in part, investor belief that inference costs will compress faster than revenue. Jalapeño advances that thesis for OpenAI; Trainium3 and the Akamai deal advance it for Anthropic.

The irony is that as inference gets cheaper, the differentiation between tools moves entirely to the agentic layer — orchestration quality, reliability, latency, and the depth of the tool ecosystem. And that’s where Anthropic’s architectural choices, made years before custom silicon was on the agenda, compound most aggressively.

Bottom line
#

Jalapeño is real, the cost numbers are significant, and the 9-month development timeline using AI assistance is the most interesting detail in the announcement. OpenAI is about to have a meaningfully different cost structure for its inference workloads than it does today.

The lab that can make its own hardware cheaper and use its own AI to get there faster is playing a different game than one that just buys more Nvidia. Both Anthropic and OpenAI are moving in this direction — OpenAI more explicitly, Anthropic more through infrastructure partnerships. The 2027-2028 window will show which approach compound faster.


Sources: OpenAI announcement; TechCrunch; Tom’s Hardware; VentureBeat

Related