
Scaling Claude Code Skills Across an Engineering Org


You gave Claude Code to your engineering team. Within a week, someone wrote a brilliant prompt for reviewing PRs. Someone else figured out a great workflow for debugging Kubernetes pods. A third person built a scaffolding command that saves twenty minutes per new component.

None of them know the others exist.

This is the skill fragmentation problem, and it hits every organization that adopts AI coding tools beyond the “individual contributor experimenting” phase. The tools are powerful. The knowledge of how to use them well stays trapped in individual heads, scattered across personal dotfiles and Slack threads that no one will ever search.

A company I’ve been working with — ~40 engineers, 300+ repositories, a mix of React, Angular, Go, Kotlin, and Python — ran into this wall about six months ago. Their solution: a shared skill marketplace for Claude Code. This article is a deep dive into what they built, why they made the choices they did, and what they’d do differently.

The Core Insight: Commands vs. Skills

The first architectural decision — and the one that shaped everything else — was separating commands from skills.

  • Commands are actions. They do things: review a PR, scaffold a component, fix CI, create a JIRA issue. They’re procedural, step-by-step, and they call tools.
  • Skills are knowledge. They describe things: how the REST APIs are designed, what the database conventions are, how the frontend architecture works. They’re reference material that Claude loads when it needs context.

This distinction matters because they have completely different lifecycles. Commands change when workflows change. Skills change when the platform changes. Mixing them means you’re constantly updating action-oriented prompts because someone renamed a database column, or vice versa.

In practice, a command for reviewing a PR references skills about frontend conventions, architecture patterns, and security standards — but doesn’t duplicate that knowledge. When the frontend team migrates from one pattern to another, they update the skill once and every command that references it gets the new context automatically.

plugins/
  tech/
    commands/
      review.md          # Action: review a PR
      fix-ci.md          # Action: fix CI failures
      new-component.md   # Action: scaffold a React component
    skills/
      react/
        SKILL.md          # Knowledge: React conventions
        components.md     # Knowledge: component patterns
        state.md          # Knowledge: state management
      architecture/
        SKILL.md          # Knowledge: system architecture
        services.md       # Knowledge: service dependencies
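
Under this layout, a command stays action-oriented and points at skills instead of restating them. A hypothetical review.md might look like the following; the body text is illustrative, not the company's actual command:

```markdown
---
description: Review the current PR for quality, architecture, and conventions.
---

Review the diff on the current branch.

Apply the conventions documented in the react and architecture skills
(component patterns, state management, service dependencies).

Report findings grouped by severity, citing the convention each one violates.
```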

The Plugin Topology: Organize by Audience, Not Technology

The team tried organizing by technology first. It was a disaster. A “kubernetes” plugin and a “golang” plugin and a “react” plugin meant that when someone needed to deploy a Go service to Kubernetes, they needed three plugins loaded. Context windows aren’t free.

The topology that worked: organize by audience.

  • tech — Everything a developer needs day-to-day. Code review, CI, scaffolding, language conventions, architecture, infrastructure, observability. This is the big one: 26 commands, 19 skill categories.
  • product — Domain knowledge. Data models, business modules, legacy system schemas, analytics platforms. Product managers and domain-adjacent engineers use these.
  • setup — Onboarding and environment configuration. New engineer joins? Four commands get them from zero to productive.
  • process — Cross-functional workflows. User offboarding, bulk operations, things that touch multiple systems.

The key metric: how many plugins does a person need for their typical workday? For most developers, the answer is one (tech) plus maybe one more. That’s a manageable cognitive load and a reasonable context window footprint.

Naming: The scope:verb-target Convention

With 30+ commands across four plugins, discoverability becomes a real problem. The team settled on a strict naming convention:

/<scope>:<verb>-<target>

Examples:

  • /tech:review — review code
  • /tech:fix-ci — fix CI failures
  • /tech:new-component — scaffold a new component
  • /tech:create-pr — create a pull request
  • /tech:diagnose-pod — diagnose a crashed pod
  • /setup:global — set up the global dev environment
  • /process:reassign-user — reassign a departing user’s work

The scope prefix is the plugin name. The verb is always imperative. The target is what you’re acting on. This means that even if you’ve never seen the command before, /tech:fix-ci is self-explanatory. You can guess that /tech:new-hook exists if /tech:new-component does.

Bad naming they rejected:

  • review-frontend-pr (too specific — what about backend?)
  • kubernetes-pod-crash-debug (too long, noun-oriented)
  • fix (too vague — fix what?)
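
A convention this strict can be linted mechanically. Here is a minimal sketch, assuming the four plugin scopes above; the regex is my inference from the examples, not the team's actual check:

```python
import re

# Sketch of a linter for the scope:verb-target convention.
# The plugin set and the exact pattern are assumptions.
PLUGINS = {"tech", "product", "setup", "process"}
PATTERN = re.compile(
    r"^/(?P<scope>[a-z]+):(?P<verb>[a-z]+)(?:-(?P<target>[a-z][a-z0-9-]*))?$"
)

def is_valid_command(name: str) -> bool:
    m = PATTERN.match(name)
    return bool(m) and m.group("scope") in PLUGINS

print(is_valid_command("/tech:fix-ci"))   # True
print(is_valid_command("/tech:review"))   # True (the target is optional)
print(is_valid_command("/docs:fix-ci"))   # False: unknown scope
```

Note that a check like this catches unknown scopes and malformed names, but not semantic vagueness; a name like /tech:fix still needs a human reviewer to reject it.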

The JIRA Workflow: Sequential Gates That Prevent Garbage

The most opinionated part of the marketplace is a four-stage workflow for going from JIRA issue to merged code:

/tech:validate  →  /tech:decompose  →  /tech:spec  →  /tech:implement

Each stage is a gate. You can’t skip ahead.

1. Validate checks that the issue is complete, unambiguous, and in English. It looks for acceptance criteria, edge cases, and clear scope. If the issue says “improve the dashboard” with no further detail, it gets rejected with specific feedback on what’s missing.

2. Decompose takes a validated Story and breaks it into implementable Tasks — backend, frontend, infrastructure. Each task gets a structured requirements template (FR-001 for functional requirements, SC-001 for scenarios). This is where a vague “add export feature” becomes three concrete tasks with specific acceptance criteria.

3. Spec creates a local specification file in the target repository (specs/{ISSUE-KEY}.md). This is the spec-driven development part: before writing any code, there’s a committed, reviewable document that describes exactly what will be built, how it fits into the existing architecture, and what the verification criteria are.

4. Implement executes the spec with mandatory verification gates. It doesn’t just write code — it validates that the spec exists, checks that tests pass, verifies that the implementation matches the requirements, and produces a structured checklist.

The key insight: each gate catches a different class of error. Validate catches ambiguity. Decompose catches scope creep. Spec catches architectural mismatches. Implement catches bugs. By the time code is written, three layers of AI-assisted review have already happened.

Does this slow things down? For trivial changes, yes — and that’s fine. Not everything needs the full pipeline. But for any feature that touches multiple systems or takes more than a day, the upfront investment in validation and decomposition pays for itself ten times over in avoided rework.
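
Part of what makes the implement gate enforceable is that "a complete spec exists" is a mechanical check. The sketch below is illustrative: the FR-/SC- prefixes come from the decompose template described above, but the function and its specific checks are assumptions, not the team's actual implementation:

```python
from pathlib import Path

def check_implement_gates(repo: Path, issue_key: str) -> list[str]:
    """Return a list of gate failures; an empty list means work may start."""
    spec = repo / "specs" / f"{issue_key}.md"
    if not spec.exists():
        return [f"missing spec file: specs/{issue_key}.md"]
    text = spec.read_text()
    failures = []
    if "FR-" not in text:
        failures.append("no functional requirements (FR-xxx) in spec")
    if "SC-" not in text:
        failures.append("no scenarios (SC-xxx) in spec")
    return failures
```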

Skills as a Knowledge Base: Versioning What Claude Knows

Here’s something that isn’t obvious until you try to scale AI tools across an org: Claude’s effectiveness is directly proportional to the quality of the context it has about your specific platform.

Generic Claude knows React. It doesn’t know your React — your component patterns, your state management conventions, your design system API, your import ordering rules. Every engineer was re-explaining these things in every conversation.

Skills solve this by codifying platform knowledge into versioned, structured documents that Claude loads automatically when relevant. The tech plugin alone has 19 skill categories:

  • Language conventions: React, Angular, Go, Kotlin, Python — not generic language guides, but conventions specific to the platform. Component structure. Testing patterns. Error handling approach.
  • Architecture: Service dependency map, communication patterns, data sync via message queues, BFF removal strategy.
  • Infrastructure: Helm chart conventions, Kubernetes cluster topology, CI/CD pipeline structure, observability setup.
  • Process: Git branching conventions, PR title format, release workflow, environment management.

Each skill is a SKILL.md file with YAML frontmatter (for description/metadata) and @file.md references to supporting documents:

---
description: React/TypeScript development guidelines for client-* repositories.
---

@components.md
@state-management.md
@testing.md
@styling.md
@imports.md

When Claude encounters a React file in one of the company’s repositories, it loads the React skill and immediately knows that the team uses SCSS modules (not Tailwind), React Query for server state (not Redux), React Aria for accessibility primitives, and a specific 10-group import ordering. No engineer has to explain this. No prompt has to include it.

The compounding effect: once you have good skills, every command that references them gets better automatically. The code review command doesn’t need its own copy of “how do we do React” — it references the React skill. When the frontend team updates the skill (say, adopting a new pattern), every downstream command benefits immediately.

Code Review: Five Specialized Lenses

Instead of one monolithic “review this PR” command, the team built five specialized review commands that each look at the code through a different lens:

  1. /tech:review — Full-spectrum review: quality, architecture, security, performance, conventions
  2. /tech:check-arch — Architectural debt detection: provider sprawl, raw HTTP calls vs typed clients, state management anti-patterns
  3. /tech:check-security — Security audit: XSS vectors, secrets in code, unsafe third-party scripts
  4. /tech:check-a11y — Accessibility: semantic HTML, ARIA usage, keyboard navigation, visual compliance
  5. /tech:check-perf — Performance: bundle size impact, rendering efficiency, data fetching patterns

Why five commands instead of one? Because context window budget matters. A full review loads all the skills. An accessibility check only needs the a11y standards and component conventions. Splitting them means faster, more focused reviews with less noise.

The architecture check (/tech:check-arch) is particularly interesting because it encodes ongoing migration knowledge. It knows that the team is migrating away from certain patterns, so it flags code that uses the old approach even if it’s technically correct. This is institutional knowledge that would otherwise live only in senior engineers’ heads.

Scaffolding: Codegen That Knows Your Codebase

Five scaffolding commands generate new code that matches existing conventions:

  • /tech:new-component — React component + SCSS module + Storybook story + test file
  • /tech:new-hook — Custom React hook with tests
  • /tech:new-page — Page component with route registration
  • /tech:new-slice — Redux Toolkit slice with typed actions and tests
  • /tech:new-ds-component — Design system component (for the shared library)

The crucial detail: these aren’t templates. They’re Claude Code commands that read the existing codebase to match the current style. If the last five components use a certain pattern, the scaffolded component follows that pattern. If the project recently migrated to a new approach, the scaffold picks it up because it reads real files, not a frozen template.

This is where the skill system shines. The /tech:new-component command references the React skill, which describes the intended architecture. So even if the codebase has a mix of old and new patterns (every real codebase does), the scaffold generates the correct new pattern, not a copy of the nearest existing file.

Onboarding: Zero to Productive in Four Commands

The setup plugin is deceptively simple — four commands — but it encodes months of “hey, how do I set up my environment?” Slack conversations:

/setup:global     # Git, Python, Go, workspace, credentials, env vars
/setup:backend    # Java/Kotlin SDK, AWS, build tools, database access
/setup:claude     # Register the marketplace, install plugins
/setup:check      # Verify everything works

Each command is idempotent: run it again and it skips what’s already done. Each command checks prerequisites before proceeding. Each command explains what it’s doing and why.

The /setup:check command is the most valuable — it’s a diagnostic that verifies every tool, credential, and configuration is correct. New engineer’s build fails? /setup:check tells them exactly what’s wrong and how to fix it.

Before these commands existed, onboarding took 1-2 days with significant hand-holding. Now it takes about an hour, mostly waiting for downloads.
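
The diagnostic pattern behind /setup:check is simple to sketch: check, report, never mutate. A minimal illustration, with a hypothetical tool list:

```python
import shutil

REQUIRED_TOOLS = ["git", "go", "kubectl", "helm"]  # hypothetical list

def check_tools() -> list[str]:
    """Idempotent diagnostic in the spirit of /setup:check: it only
    reports state, so running it twice is always safe."""
    report = []
    for tool in REQUIRED_TOOLS:
        path = shutil.which(tool)
        report.append(f"ok: {tool} -> {path}" if path else f"MISSING: {tool}")
    return report

for line in check_tools():
    print(line)
```

The real commands go further (credentials, env vars, database access), but they keep the same contract: re-running is always safe, and every failure comes with an actionable message.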

Distribution: One Source, Two Channels

The team wanted the same knowledge base available in two contexts:

  1. Claude Code (in the terminal, with tool access)
  2. Claude.ai (in the browser, for conversations and planning)

Claude Code uses the plugin system natively — the marketplace registers via settings.json and plugins are loaded on demand. But Claude.ai uses a different skill format (zip files uploaded to the admin console).

A build script (sync-skills.sh) bridges the gap: it reads the plugin skills, inlines the @file.md references into a single document, zips each skill, and outputs them to a dist/ folder for upload. Same knowledge, different packaging.

# Convert Claude Code skills → Claude API skill zips
./scripts/sync-skills.sh
# Output: dist/skills/tech-react.zip, dist/skills/tech-architecture.zip, ...
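
The inlining step can be sketched in a few lines (shown here in Python for testability; the actual sync-skills.sh is shell, and the @file.md reference format is taken from the SKILL.md example earlier):

```python
import re
from pathlib import Path

def inline_refs(skill_dir: Path) -> str:
    """Replace each standalone @file.md line in SKILL.md with that file's
    contents, producing one self-contained document for upload.
    Frontmatter handling and error reporting are omitted from this sketch."""
    text = (skill_dir / "SKILL.md").read_text()
    def expand(match: re.Match) -> str:
        return (skill_dir / match.group(1)).read_text()
    return re.sub(r"^@(\S+\.md)[ \t]*$", expand, text, flags=re.MULTILINE)
```

Zipping the expanded document into dist/ is then a standard archive step; the interesting work is flattening the references so the browser-side skill needs no file system.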

This means a product manager can ask Claude.ai about the data model and get the same quality answer that a developer gets in Claude Code, because they’re drawing from the same skill.

Quality Enforcement: Hooks and Gates

Two mechanisms prevent the marketplace from rotting:

Pre-commit hook: If someone changes any plugin file without running /refresh (which regenerates the README, updates marketplace.json versions, and validates the structure), the commit is rejected. This ensures the catalog is always in sync with reality.

Plugin validation: Before merging a new command or skill, a validation step checks:

  • File structure matches conventions
  • YAML frontmatter is valid
  • Referenced files exist
  • Naming follows scope:verb-target
  • Description is present and meaningful
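
A first pass at these checks is mechanical enough to script. This sketch covers only two of them (referenced files exist, description present); frontmatter parsing and naming checks are elided, and the directory layout is assumed from the tree earlier in the article:

```python
from pathlib import Path

def validate_plugin(plugin_dir: Path) -> list[str]:
    """Return structural errors for one plugin; empty means it passes."""
    errors = []
    for skill in plugin_dir.glob("skills/*/SKILL.md"):
        text = skill.read_text()
        # Every skill must describe itself for discoverability.
        if "description:" not in text:
            errors.append(f"{skill.parent.name}: missing description")
        # Every @file.md reference must resolve to a real file.
        for line in text.splitlines():
            if line.startswith("@"):
                ref = line[1:].strip()
                if not (skill.parent / ref).exists():
                    errors.append(f"{skill.parent.name}: missing {ref}")
    return errors
```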

The /refresh command is itself a Claude Code command — it reads all plugins, regenerates the README catalog, updates version numbers, and validates the marketplace manifest. Dog-fooding at its finest.

What They Got Wrong

Over-engineering the JIRA workflow initially. The first version had six stages instead of four, including a separate “estimation” step and a “design review” step. Engineers bypassed them constantly. The team cut it to four stages that each provide clear, immediate value.

Not investing in skills early enough. They built commands first and skills second. This meant early commands had platform knowledge baked directly into their prompts — duplicated, inconsistent, and hard to update. The refactor to extract skills was painful but transformative.

Underestimating the maintenance burden. 26 commands and 19 skill categories is a lot of surface area. When the platform changes (new service, new convention, deprecated pattern), multiple skills may need updates. The team is considering automated staleness detection — diffing skills against actual codebase patterns — but hasn’t built it yet.

Making the tech plugin too big. 26 commands in one plugin means a fat manifest. They should have split it into tech-review, tech-scaffold, tech-workflow, and tech-ops from the start. Restructuring now would break everyone’s muscle memory, so they live with it.

The Numbers

Six months in:

  • 4 plugins, 31 commands, 25 skill categories
  • ~40 engineers using the marketplace daily
  • 300+ repositories with marketplace integration via shared settings
  • Onboarding time: ~2 days → ~1 hour
  • Code review coverage: from “when seniors have time” to “every PR, five dimensions”
  • Scaffolding consistency: no more “which component do I copy from?”

The hardest metric to quantify but the most impactful: reduction in repeated questions. When Claude knows the platform conventions, engineers stop asking each other “how do we do X here?” They ask Claude, and Claude gives a correct, consistent answer because it’s drawing from the same curated knowledge base.

How to Start

You don’t need 31 commands. Start with three:

  1. One review command that encodes your team’s code review standards. This gives you immediate, daily value and forces you to articulate what “good code” means in your context.

  2. One skill document that describes your most common technology conventions (React patterns, API design, database conventions — whatever your team argues about most). This forces you to separate knowledge from actions.

  3. One setup command that automates whatever new engineers always get wrong. This gives you a forcing function to document tribal knowledge.

Build these three, use them for a month, and you’ll see exactly where to expand next. The architecture — plugins, scoped naming, commands vs. skills — can come later. The important thing is to stop letting AI knowledge fragment across individual engineers and start treating it as shared infrastructure.

Because that’s ultimately what this is: infrastructure for AI-assisted development. Not the code. Not the prompts. The knowledge layer that makes both useful at organizational scale.
