Interop April 7, 2026 · Refreshed May 4, 2026 · 8 min read

Spec Kit + Plan Forge: Write the Spec, Enforce the Build

Director @ Microsoft

[Image: two blacksmiths forging complementary creations, a green glowing seedling and an amber hammer-and-shield, that arc upward and interlock]

I'll be honest: Plan Forge exists because of Spec Kit.

Vibe coding was killing me. I'd fire up an AI agent, describe what I wanted, and watch it produce something that looked great for 30 minutes — then slowly disintegrate into a mess of scope drift, forgotten decisions, and "maybe I should start over." I was burning tokens and getting nowhere.

Then I found Spec Kit. And it clicked. Specification-based planning and agentic coding are like peanut butter and jelly: good on their own, but together they're something else entirely. Spec Kit taught me that the fix wasn't a better model. It was structure. Define what you want. Plan before you build. Stop letting the agent improvise.

As my ideas grew, and as I learned what worked and what still broke even with good specs, Plan Forge started growing legs. What began as "I need more guardrails around my specs" became a full pipeline: scope contracts, auto-loading instruction files, independent review sessions, multi-model consensus. That pipeline kept growing into a full AI-Native SDLC Forge Shop with four stations (Smelt, Forge, Guard, Learn), 105 MCP tools, and a LiveGuard layer that keeps watching after the build leaves the shop. Now it's laying down rubber in the quarter mile.

But the foundation — the insight that you need to specify before you build — that came from Spec Kit. And the two tools are still genuinely better together than either is alone.

The Problem Both Tools Solve

Without structure, AI coding agents have three failure modes:

Scope drift — ask for a login page, get a login page plus a password reset flow plus an admin panel nobody requested
Architecture improvisation — the agent picks its own patterns, frameworks, and trade-offs without consulting your standards
Context amnesia — by session three, the agent has forgotten every decision from session one

Spec Kit addresses the first by forcing you to specify before you build. Plan Forge addresses all three by enforcing the spec during the build.

In v2.18, Plan Forge added Temper Guards to every instruction file — documented catalogs of the exact shortcuts that cause these three failure modes. When an agent thinks “this endpoint is internal-only, no auth needed” (architecture improvisation) or “I’ll add tests after the feature works” (scope drift into debt), the Temper Guard surfaces a concrete rebuttal. It’s a psychological defense layer on top of the structural gates.

What Each Tool Does Best

|  | Spec Kit | Plan Forge |
| --- | --- | --- |
| Core philosophy | Spec-Driven Development | Full-lifecycle AI shop: Smelt → Forge → Guard → Learn |
| Primary strength | Defining what to build | Enforcing how it's built, and watching it after ship |
| Agent support | 25+ AI agents natively | 7 host agents (Copilot, Claude, Cursor, Codex, Gemini, Windsurf, Generic) + 14 reviewer personas |
| Community | 85K+ stars, 144 contributors, 40+ extensions | Growing, MIT licensed, 9 tech presets, 105 MCP tools, 4 reviewer skills |
| Key mechanism | Slash commands (`/speckit.specify`, `/speckit.plan`) | Crucible intake + auto-loading guardrails + quorum consensus + LiveGuard runtime watch |
| Execution model | Spec → Plan → Implement | Smelt (Crucible interview) → Forge (7-step pipeline, gates per slice) → Guard (LiveGuard) → Learn (memory) |
| Review approach | `/speckit.analyze` after building | Independent reviewer in isolated session (Step 5) + LiveGuard drift / secret / dep scans post-ship |
| Memory | `memory/constitution.md` | Shared across all four stations: Copilot Memory + session bridge + OpenBrain semantic search + Health DNA |

Spec Kit has the bigger ecosystem and broader agent support. Plan Forge goes deeper on runtime enforcement, post-ship watch, and enterprise quality gates — the whole SDLC, not just the build. Both are free. Both are MIT licensed. You genuinely can't go wrong with either.

The Combined Workflow

Here's where it gets interesting. Using both tools together gives you coverage that neither provides alone.

Phase 1: Define with Spec Kit

Start in Spec Kit's territory. Use its slash commands to build a structured specification:

# Set project principles
/speckit.constitution Create principles focused on code quality, testing, and security

# Define the feature
/speckit.specify Build a task management API with authentication,
                 role-based access, and real-time updates via WebSocket

# Generate an execution plan
/speckit.plan Use .NET 9, PostgreSQL, SignalR

# Break into tasks
/speckit.tasks

Spec Kit produces structured artifacts: specs/feature/spec.md, plan.md, tasks.md, and memory/constitution.md. These are well-defined, thorough, and leverage Spec Kit's strength at requirement elicitation.

Phase 2: The Smelt Station Auto-Imports

When you walk into Plan Forge's Smelt station (the Crucible interview that takes in a feature, and what Step 0 / Specifier grew into), it scans the project for Spec Kit artifacts. If it finds them, it offers to import them directly, with no re-specifying needed:

spec.md → maps to Plan Forge's feature specification section
plan.md → becomes the execution contract baseline
tasks.md → maps to execution slices for the Forge station
constitution.md → imports as PROJECT-PRINCIPLES.md

The handoff is seamless. Spec Kit's output becomes the Forge Shop's input.
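To make the mapping above concrete, here's a minimal sketch of what artifact detection could look like. It is not Plan Forge's actual importer: the `SPEC_KIT_IMPORTS` table and the `find_importable` helper are hypothetical names, and the target labels simply restate the list above.

```python
from pathlib import Path

# Hypothetical lookup table restating the mapping listed above.
# The values are descriptive labels for this sketch, not Plan Forge identifiers.
SPEC_KIT_IMPORTS = {
    "spec.md": "feature specification section",
    "plan.md": "execution contract baseline",
    "tasks.md": "execution slices",
    "constitution.md": "PROJECT-PRINCIPLES.md",
}


def find_importable(paths):
    """Return {artifact path: import target} for recognized Spec Kit files."""
    found = {}
    for raw in paths:
        name = Path(raw).name  # match on file name, ignore the directory
        if name in SPEC_KIT_IMPORTS:
            found[raw] = SPEC_KIT_IMPORTS[name]
    return found


if __name__ == "__main__":
    project_files = [
        "specs/feature/spec.md",
        "README.md",
        "memory/constitution.md",
    ]
    for path, target in find_importable(project_files).items():
        print(f"{path} -> {target}")
```

Matching on file name alone keeps the sketch short; a real importer would presumably also validate the contents before offering the import.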

Phase 3: The Forge Station — Harden and Enforce

This is where Plan Forge first added what Spec Kit doesn't provide: runtime enforcement during the build.

The imported spec gets hardened into an execution contract with explicit forbidden actions, scope boundaries, and validation gates. Guardrail files auto-load during coding — security rules when editing auth files, database patterns when editing queries, testing standards when editing test files. The agent doesn't need to remember the rules. The rules are injected automatically based on what file is being edited.
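The "rules injected based on the file being edited" idea can be sketched as glob matching. This is an illustration under assumptions, not how Plan Forge implements it: the patterns, the guardrail file names, and the `guardrails_for` helper are all made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (glob pattern, guardrail file to inject).
# Neither the patterns nor the file names come from Plan Forge.
GUARDRAIL_RULES = [
    ("*auth*", "security.instructions.md"),
    ("*.sql", "database.instructions.md"),
    ("*test*", "testing.instructions.md"),
]


def guardrails_for(path):
    """Return every guardrail file whose pattern matches the edited path."""
    lowered = path.lower()  # compare case-insensitively on every platform
    return [
        guardrail
        for pattern, guardrail in GUARDRAIL_RULES
        if fnmatchcase(lowered, pattern)
    ]


if __name__ == "__main__":
    for edited in ["src/Auth/LoginController.cs", "src/auth/test_login.py"]:
        print(edited, "->", guardrails_for(edited))
```

The point of the design is that the lookup is keyed on the file path, so the agent never has to recall which rules apply; a file can pull in more than one guardrail at once.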

Each slice (task) executes with build + test gates. The agent cannot proceed to the next slice until the current one passes. Quorum Mode can fan a slice out to 2–3 models in parallel for high-stakes decisions. And when all slices complete, an independent reviewer — running in a fresh session with zero context from the builder — audits the entire implementation against the original spec.
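The gate-per-slice rule boils down to a loop that refuses to advance on failure. Here's a minimal sketch, assuming slices and gates are plain Python callables; `run_slices` is a hypothetical helper, and real gates would shell out to your build and test commands.

```python
def run_slices(slices, gates):
    """Run slices in order and stop at the first slice that fails a gate.

    slices: list of (slice_name, work) pairs, where work is a callable.
    gates:  list of (gate_name, check) pairs, where check(slice_name) -> bool.
    Returns (completed_slice_names, failure), where failure is
    (slice_name, gate_name) or None if every slice passed.
    """
    completed = []
    for name, work in slices:
        work()  # the agent implements this slice
        for gate_name, check in gates:
            if not check(name):
                # Gate failed: later slices never start.
                return completed, (name, gate_name)
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    slices = [
        ("auth", lambda: None),
        ("roles", lambda: None),
        ("websocket", lambda: None),
    ]
    gates = [
        ("build", lambda name: True),
        ("test", lambda name: name != "roles"),  # simulate a failing test run
    ]
    print(run_slices(slices, gates))
```

In this run the "roles" slice fails its test gate, so "websocket" is never attempted.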

Phase 4: The Guard and Learn Stations — After Ship

This is the genuinely new ground that didn't exist when this post first ran in April. LiveGuard takes over once the build leaves the Forge: drift scoring against the original spec, secret scanning, dependency watch, regression guards, incident capture, and remote alerts via Slack / Teams / PagerDuty / OpenClaw. The spec doesn't stop mattering at merge — it becomes the baseline LiveGuard measures the shipped system against.
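To give "drift scoring against the original spec" some shape, here's one deliberately naive way to think about it: the share of changed files that fall outside the scope the spec declared. This is an assumption for illustration only. It is not LiveGuard's scoring method, and `drift_score` and the scope patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def drift_score(changed_files, in_scope_patterns):
    """Return the fraction of changed files matching no in-scope pattern.

    0.0 means every change stayed inside the declared scope;
    1.0 means every change landed outside it.
    """
    if not changed_files:
        return 0.0
    outside = [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in in_scope_patterns)
    ]
    return len(outside) / len(changed_files)


if __name__ == "__main__":
    changed = [
        "src/tasks/api.py",
        "src/tasks/ws.py",
        "src/admin/panel.py",
        "docs/notes.md",
    ]
    print(drift_score(changed, ["src/tasks/*"]))
```

A score like this only catches changes in the wrong place, not wrong changes in the right place, which is why it's a thinking aid rather than a real measure.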

Every finding — every drift event, every triaged bug, every postmortem — flows into the Learn station: a shared memory across all four stations (OpenBrain semantic search, Copilot Memory, session bridge, Health DNA). The next Spec Kit spec you import lands in a Forge Shop that knows what broke last time, what your team's escalation patterns look like, and which models are converging fastest on your stack.

Spec Kit is the architect's blueprint. Plan Forge is the building inspector who won't sign off until every beam is to code — and the property manager who keeps watching after move-in.

Shared Extension Ecosystem

Both tools support reusable extension packages, and the catalog format is compatible. Extensions marked speckit_compatible: true in Plan Forge's catalog work in both tools. The commands are parallel:

| Action | Spec Kit | Plan Forge |
| --- | --- | --- |
| Browse catalog | `specify extension search` | `pforge ext search` |
| Get details | `specify extension info <name>` | `pforge ext info <name>` |
| Install | `specify extension add <name>` | `pforge ext add <name>` |

Build a multi-tenancy extension, a compliance extension, or a domain-specific guardrail package — it works in both ecosystems.
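If you wanted to list which catalog entries carry the shared flag, a filter like the one below would do it. Only the `speckit_compatible` key comes from the text above; the rest of the entry shape, the entry names, and the `speckit_compatible_names` helper are assumptions made for this sketch.

```python
def speckit_compatible_names(catalog):
    """Return the names of entries explicitly flagged speckit_compatible: true."""
    return [
        entry["name"]
        for entry in catalog
        if entry.get("speckit_compatible") is True  # missing flag counts as False
    ]


if __name__ == "__main__":
    # Made-up catalog entries; the real catalog schema may differ.
    catalog = [
        {"name": "multi-tenancy", "speckit_compatible": True},
        {"name": "runtime-alerts", "speckit_compatible": False},
        {"name": "compliance"},
    ]
    print(speckit_compatible_names(catalog))
```

Treating a missing flag as "not compatible" is the conservative choice here: an extension only shows up as shared when its author has said so.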

When to Use Which

Pick Spec Kit alone if: your team uses many different AI tools (not just VS Code), you want the largest community and extension library, and you prefer a lightweight spec-first methodology you can adopt incrementally. GitHub's backing means rapid iteration and strong long-term support.

Pick Plan Forge alone if: you want guardrails that auto-enforce during coding, runtime watch after ship via LiveGuard, and a shared memory that learns across runs. You get 14 reviewer personas, scope-contract locking, lifecycle hooks, deployment templates, multi-model consensus (Quorum Mode), and a four-station shop covering the whole SDLC.

Use both if: you want the full lifecycle. Spec Kit's requirement elicitation and specification quality are best-in-class. Plan Forge's Smelt → Forge → Guard → Learn shop turns those specs into shipped code, watches the running system against them, and feeds every finding back into memory. Together, you get specs that are well-defined, impossible to drift from during the build, and continuously measured against the running system after ship.

Honest Take

I built Plan Forge, so I'm biased. But I'll say it straight: Spec Kit has the bigger ecosystem, broader agent support, and GitHub's engineering team behind it. Which tool I'd recommend to someone starting out depends entirely on their pain point.

If your problem is "I don't know what to build" — start with Spec Kit. Its specification workflow is excellent.

If your problem is "I know what to build but the AI keeps drifting" — start with Plan Forge. Its enforcement pipeline (and now the LiveGuard watch layer that runs after ship) is what you need.

If your problem is both — use both. They were designed for this, and the integration has only gotten tighter.

Where to Go Next

If this post made you want to dig deeper, here's where to go:

Shop Tour — the visual walkthrough of all four stations, with the live dashboard, event hub, and how Smelt → Forge → Guard → Learn loops back on itself.
From Impossible to 7 Minutes — the year-long evolution story behind the Forge Shop. Pairs naturally with the Spec Kit origin story above.
The Loop That Never Ends — how the Learn station auto-smelts its own bugs back into new specs. The closed-loop in action.
One Framework, Seven AI Agents — if your team uses Claude, Cursor, Gemini, or Windsurf alongside Copilot, here's how the same Spec Kit handoff works across all of them.
A/B Test: Plan Forge vs Vibe Coding — same model, same task, same time. 60 tests vs 13. 99/100 vs 44/100. The numbers behind the “structure beats vibes” claim.
Spec Kit Interop — Manual Chapter — the full artifact-by-artifact mapping (spec.md → Smelt, tasks.md → Forge slices, constitution.md → agent-constraints, etc.) with field-mapping diagram, import procedure, and shared extension catalog details.

