One workshop. Four Stations. Every phase of your software lifecycle — Smelt, Forge, Guard, Learn — under one roof, AI-run, product-owner-supervised.
Plan Forge is the orchestration harness that sits on top of GitHub Copilot (and other AI coding tools). It does not replace your model or your IDE — it adds the SDLC layer GitHub deliberately leaves to the ecosystem: planning, validation gates, memory, cost control, and reviewer separation.
It is MIT-licensed because your SDLC is yours. Your institutional memory lives in OpenBrain, a user-owned service, because your accumulated decisions should not be trapped inside any one AI vendor.
Three closed loops, not one. The Forge builds, Forge-Master supervises, and the Learn station feeds every finding back into the next plan — agents supervising agents, loops that run unattended for weeks and learn from every pass. Every discipline of a 20-person engineering team (architecture, security, QA, SRE, engineering management, continuous improvement) is encoded as agents and guardrails governed by 40 years of software engineering practice. You own the spec, the direction, and the final acceptance test. The shop owns everything in between.
Every station is a phase of the software lifecycle. Every station is run by AI, watched by you, and wired into the next.
Station 1 · Smelt
Raw idea → refined spec.
Every plan starts as ore. The Smelt station interviews you, extracts the real requirements, flags hidden assumptions, and pours out a Scope Contract the AI can't argue with.
▸ Output: a file the Forge station can execute without follow-up questions
Station 2 · Forge
Spec → shipped code.
The anvil. Plans are broken into slices and struck one at a time: each a commit, each with its own gate. The hammer doesn't move to the next slice until tests pass.
▸ Output: green tests, green CI, green ledger — or an honest stop
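The slice-and-gate flow described above can be sketched in a few lines. This is a minimal illustration, assuming a slice is just a unit of work plus a gate that returns pass or fail; `Slice` and `run_plan` are hypothetical names, not Plan Forge APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Slice:
    """Hypothetical slice: a unit of work plus the gate that must pass after it."""
    name: str
    work: Callable[[], None]
    gate: Callable[[], bool]


def run_plan(slices: List[Slice]) -> List[str]:
    """Run slices in order; stop at the first slice whose gate fails."""
    ledger = []
    for s in slices:
        s.work()
        if not s.gate():
            ledger.append(f"{s.name}: stopped")
            break  # the honest stop: later slices never start
        ledger.append(f"{s.name}: committed")
    return ledger


def noop() -> None:
    """Stand-in for real slice work."""


demo_ledger = run_plan([
    Slice("slice-1", noop, lambda: True),
    Slice("slice-2", noop, lambda: False),
    Slice("slice-3", noop, lambda: True),
])
print(demo_ledger)
```

In this toy run the third slice never starts, because the second slice's gate failed.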
Station 3 · Guard
Standing watch over the floor.
The watchtower. LiveGuard scans secrets, env drift, CVE surface, and regression risk — before deploy, after every slice, every hour. Incidents become triaged fix plans, not pager calls at 3 AM.
Station 4 · Learn
The brain above the bench. Plan Forge remembers every incident, every fix, every review, and feeds it back into the next session as OpenBrain memories, bug-registry entries, and testbed findings. The shop gets sharper the longer you run it.
Four stations don't run themselves. Forge-Master and the Inner Loop walk the floor — rerouting work when a slice fails, retrying with what they learned, and feeding every outcome back into Smelt as the next ore.
This is what turns a 7-step pipeline into a closed loop: every failed test becomes a Crucible idea, every regression becomes a slice, every postmortem becomes a memory. The shop sharpens itself.
⟲ Reflexion retries: when a slice fails, the worker re-reads the failure, drafts a fix, and tries again with bounded attempts.
📜 Trajectories & postmortems: every attempt is recorded; failures auto-file meta-bugs against Plan Forge itself.
🛡️ Adaptive gates: gates that catch real bugs get promoted; brittle ones get rewritten.
🧭 Forge-Master: the read-only foreman. It classifies your intent, pulls memory, and orchestrates tool calls so you don't chain them by hand.
🌐 Cross-project federation: lessons from one repo become guardrails in the next. Opt-in, Dashboard-configurable.
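The bounded retry idea from the first item above could look roughly like this. It is a sketch under stated assumptions: the worker is modelled as a function that receives the list of earlier failure messages, and `run_with_reflexion` and `flaky_worker` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional, Tuple


def run_with_reflexion(
    attempt: Callable[[List[str]], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[bool, List[str]]:
    """Retry a slice up to max_attempts, passing earlier failures back in.

    `attempt` returns None on success or a failure message otherwise.
    """
    trajectory: List[str] = []  # every failure is recorded
    for _ in range(max_attempts):
        failure = attempt(list(trajectory))
        if failure is None:
            return True, trajectory
        trajectory.append(failure)
    return False, trajectory


def flaky_worker(previous_failures: List[str]) -> Optional[str]:
    """Toy worker that succeeds once it has seen two earlier failures."""
    if len(previous_failures) < 2:
        return f"test failed on attempt {len(previous_failures) + 1}"
    return None


ok, history = run_with_reflexion(flaky_worker, max_attempts=3)
print(ok, history)
```

The attempt cap is what keeps the loop bounded: with `max_attempts=2` the same worker would give up and return its two recorded failures.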
The foreman walks the floor — Forge-Master orchestrates, the Inner Loop closes the cycle.
The Problem
AI Agents Drift Without Guardrails
Fast isn't the same as good. Without structure, AI-generated code tends to be untestable, insecure, and impossible to maintain at scale.
🔀
Silent Scope Creep
"I'll also add..." — features you never asked for, silently introduced. You find out at code review.
🎲
Undiscussed Decisions
Database pattern, framework choice, architecture — picked without asking. You discover the consequences too late.
🚫
Skipped Validation
Ships code that doesn't build or breaks tests. "It should work" is not a quality gate for production.
🧠
No Memory
Every session starts from zero. You re-explain the same architecture decisions to a blank-slate AI every single time.
"A blacksmith doesn't hand raw iron to a customer. They heat it, hammer it, and temper it until it holds its edge."
Plan Forge does the same for your AI-generated software.
The Solution
The 7-Step Hardening Pipeline
A structured pipeline across 4 focused sessions that converts rough feature ideas into shipped, reviewed, production-grade code.
Deterministic by default — with a reflective inner loop (retries with reflexion context, trajectories, auto-skills, postmortems) that turns every slice into a research step. The ten opt-in subsystems compose into a self-deterministic agent loop.
🪨 Smelt · idea → scope contract
🔨 Forge · code → no TODOs
🛡️ Guard & Ship · audit → commit
🪨 Smelt → 🔨 Forge → 🛡️ Guard & Ship
Step 0 · Specify · What & why · Session 1
Step 1 · Pre-flight · Verify setup · Session 1
Step 2 · Harden · Scope contract · Session 1
Step 3 · Execute · Slice by slice · Session 2
Step 4 · Sweep · No TODOs left · Session 2
Step 5 · Review · Drift detection · Session 3
Step 6 · Ship · Commit & close · Session 4
⚡ Forge-Master · one advisor, three touchpoints
Intake · funnel ideas into Smelt
Inside the slice · reasons during Execute
Ops & advice · post-ship Q&A
Step 3 ✦ Execute, with an inner loop ⟳
Every slice is a research step. When a gate fails, the loop closes instead of the run:
Postmortems, trajectories, decisions, drift scores — all indexed. The next Smelt starts with what this run learned.
↺ feeds next Smelt
The 7 steps above are the floor plan. Forge-Master walks the floor, the inner loop lives inside the anvil, and Learn is the shop's memory.
💡 Why 4 Separate Sessions Instead of One Long Chat?
Most people use AI as a single long conversation. Plan Forge breaks that pattern on purpose — because the builder shouldn't grade its own exam.
Single Session (chatbot way)
• Context exhausts halfway through
• Same AI reviews its own work
• Scope creep goes unnoticed
• Failure = restart the whole conversation
Plan Forge (4 sessions)
• Fresh context for each role
• Independent reviewer catches blind spots
• Scope drift detected immediately
• Failure = re-run only the failed session
🔒
Scope Locked in Session 1
The execution contract is created and locked. Nothing can be added mid-build without restarting the contract. Drift becomes structurally impossible.
🔍
Independent Audit Session
The reviewer runs in a completely separate session with fresh context. The executor can't self-audit — that's like grading your own exam.
✅
Validated at Every Slice
Build and test must pass at every slice boundary before moving to the next. No shipping code that doesn't compile or that breaks existing tests.
Three Ways to Run the Same Pipeline
🖱️
Pipeline Agents
Click-through flow with handoff buttons. Smoothest experience. Context carries automatically.
VS Code + GitHub Copilot
📎
Prompt Templates
Attach step files in Copilot Chat. See exactly what each step does. Best for learning the pipeline.
VS Code + GitHub Copilot
📋
Copy-Paste Prompts
Copy prompts from the runbook. Works in any AI tool — Claude, Cursor, ChatGPT, terminal agents.
Any AI Tool
Features
Enterprise-Grade by Default
Guardrails, specialized reviewers, autonomous execution, cost tracking, and a live dashboard — every project gets the full suite.
🛡️
Two-Layer Guardrails
Automatic protection at two levels. The baseline ships with every preset. The project profile is generated per-project via an interview prompt.
Layer 1 — Universal Baseline
Architecture principles, OWASP security, TDD, error handling, type safety, async patterns. Ships automatically. No configuration needed.
Layer 2 — Project Profile
Coverage targets, latency SLAs, compliance requirements, domain-specific rules — generated from a one-time interview and stored in the repo.
🤖
Specialized Reviewer Agents
Dedicated AI reviewer personas — each focused on one concern, checking it deeply.
One command runs an entire hardened plan — spawning AI workers, validating at every slice boundary, tracking cost, and reporting results. Monitor everything in a live dashboard.
Run Plans
pforge run-plan <plan> — automatic execution with DAG scheduling, validation gates, token tracking. Full Auto or Assisted modes.
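To make "DAG scheduling" concrete, here is a small sketch of ordering slices by their dependencies using Python's standard `graphlib`. The plan contents and the `schedule` helper are hypothetical; how Plan Forge actually schedules slices is not described in this page.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each slice maps to the set of slices it depends on.
plan = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"api"},
    "docs": {"schema"},
}


def schedule(dependencies):
    """Return batches of slices; slices in a batch have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = schedule(plan)
print(batches)
```

Slices that land in the same batch do not depend on each other, so a scheduler would be free to run them in parallel.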
Instruction files load automatically based on the file being edited. Editing SQL? Database guardrails load. Editing auth? Security guardrails load. No action needed.
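One way to picture path-based loading is a table of glob patterns mapped to instruction files. The sketch below is illustrative only: the patterns, the file names, and `instructions_for` are assumptions, and the real matching is done by the editor integration rather than by code like this.

```python
from fnmatch import fnmatch

# Hypothetical mapping of glob patterns to instruction files.
INSTRUCTION_RULES = [
    ("*.sql", "database.instructions.md"),
    ("*auth*", "security.instructions.md"),
    ("*.test.*", "testing.instructions.md"),
]


def instructions_for(path: str) -> list:
    """Return the instruction files whose pattern matches the edited path."""
    return [name for pattern, name in INSTRUCTION_RULES if fnmatch(path, pattern)]


print(instructions_for("db/migrations/001_init.sql"))
print(instructions_for("src/auth/login.test.ts"))
```

A file can match more than one rule, in which case every matching guardrail file would apply.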
🗳️
Quorum Mode
Multi-model consensus. Dispatch each slice to 3 AI models in parallel for independent dry-run analysis, synthesize the best approach, then execute with higher confidence.
→ --quorum=auto triggers only on complex slices (score ≥ 6)
→ Complexity scored by 7 weighted signals (scope, deps, security, DB keywords...)
→ A/B tested: 20% more tests, extracted DRY helpers, better edge case coverage
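The threshold rule above can be illustrated with a weighted sum. Only the threshold of 6 and the four signal names come from the text; the weights below are invented for the example, and the remaining signals are left out because the page does not name them.

```python
# Hypothetical weights for the four signals the text names; the source lists
# seven signals in total and does not publish the weights.
WEIGHTS = {"scope": 3, "deps": 2, "security": 3, "db_keywords": 2}
QUORUM_THRESHOLD = 6  # from the text: auto mode triggers at score >= 6


def complexity_score(signals: dict) -> int:
    """Sum the weights of every signal that is present on the slice."""
    return sum(weight for name, weight in WEIGHTS.items() if signals.get(name))


def needs_quorum(signals: dict) -> bool:
    """True when the slice is complex enough to dispatch to multiple models."""
    return complexity_score(signals) >= QUORUM_THRESHOLD


simple_slice = {"scope": True}
risky_slice = {"scope": True, "security": True, "db_keywords": True}
print(needs_quorum(simple_slice), needs_quorum(risky_slice))
```

Gating on a score keeps the extra token spend of multi-model runs for the slices most likely to benefit.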
Closed-loop bug discovery. The audit drain probes live routes, triages findings into bug/spec/classifier lanes, and iterates until convergence. Runs post-plan or on demand.
→ pforge audit-loop (manual one-shot drain)
→ audit.mode: "auto" (threshold-gated, default off)
→ Production always forbidden — dev and staging only
🧭
Forge-Master Studio v2.63+
A read-only reasoning orchestrator with its own dashboard. Classifies intent, pulls OpenBrain memory, and chains read-only forge tools on your behalf — so you can ask open-ended questions instead of wiring tool calls by hand.
→ Studio tab in the main dashboard — prompt gallery, streaming chat, live tool-call trace
→ forge_master_ask MCP tool for agents that want one-shot reasoning
→ pforge forge-master status|logs CLI for scripts + health checks
15 prompt templates for generating consistent code patterns — entities, services, controllers, tests, DTOs, workers, and more — all following your project's architecture.
🔗
Lifecycle Hooks
Hooks run automatically at agent lifecycle points — blocking edits to forbidden paths, warning on TODO markers, auto-formatting, and alerting when code ships without tests.
☁️
Cloud Agent Ready
Works with the Copilot cloud agent out of the box. Add copilot-setup-steps.yml to provision guardrails, MCP tools, and validation gates before the agent writes a single line.
Instruction files auto-load identically in cloud + local
106 MCP tools wired via .vscode/mcp.json
Complements CodeQL, secret scanning, and Copilot code review
🔨
The Smith
Run pforge smith to inspect your forge before you build. Diagnoses your environment, VS Code configuration, setup health, version currency, and common problems — with actionable fix suggestions for every issue.
✅ git 2.44.0
✅ 22 instruction files (expected: ≥18)
❌ chat.promptFiles not set
FIX: Add to .vscode/settings.json
🔄
CI Validation Action
Drop one line into your GitHub workflow to validate plans on every PR — setup health, file counts, placeholders, orphans, plan artifacts, and code sweep.
- uses: srnichols/plan-forge-validate@v1
🌱
Spec Kit Compatible
Auto-imports Spec Kit specs, plans, and constitutions. Shared extension catalog. Write specs with Spec Kit, enforce them with Plan Forge.
106 MCP tools (Core + LiveGuard + Watcher + Crucible + Tempering + Bug Registry + Testbed + Forge-Master + Lattice + Sync + Memory + Notify + Doctor). Run plans autonomously with forge_run_plan, run multi-model analysis with forge_analyze --quorum and forge_diagnose, run a full health check with forge_liveguard_run, query the code graph with forge_lattice_query, sync agent memory with forge_sync_memories, generate images with forge_generate_image, and track costs with forge_cost_report. Monitor runs in real time via the WebSocket hub and the dashboard at localhost:3100/dashboard. Full Auto, Assisted, and Quorum execution modes. xAI Grok and OpenAI support via the API provider registry.
When a slice fails, the orchestrator automatically walks the escalationChain and retries on the next model — no manual intervention. Historical performance data in model-performance.json drives automatic model selection for best quality/cost ratio.
Auto-Escalation
Slice fails → walks escalationChain → retries on next model. Emits slice-escalated event.
Model Performance
model-performance.json tracks success rate, cost, duration. Auto-selects cheapest model with >80% pass rate.
Smart Estimate
--estimate shows recommended model per slice with historical success rate before you run.
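The two rules above (pick the cheapest model above an 80% pass rate, and move to the next model in the chain on failure) can be sketched as follows. The record shape, model names, and numbers are made up for illustration; the actual layout of model-performance.json is not shown on this page.

```python
# Hypothetical records shaped like the stats the text says are tracked:
# success rate, cost, duration. Model names and numbers are made up.
PERFORMANCE = {
    "model-small": {"pass_rate": 0.72, "cost_per_slice": 0.02},
    "model-medium": {"pass_rate": 0.86, "cost_per_slice": 0.08},
    "model-large": {"pass_rate": 0.95, "cost_per_slice": 0.30},
}
MIN_PASS_RATE = 0.80  # from the text: cheapest model with >80% pass rate


def select_model(performance: dict):
    """Pick the cheapest model whose pass rate is above the threshold."""
    eligible = [
        (stats["cost_per_slice"], name)
        for name, stats in performance.items()
        if stats["pass_rate"] > MIN_PASS_RATE
    ]
    return min(eligible)[1] if eligible else None


def next_in_chain(chain: list, failed_model: str):
    """Return the model after failed_model in the chain, or None at the end."""
    index = chain.index(failed_model)
    return chain[index + 1] if index + 1 < len(chain) else None


chain = ["model-small", "model-medium", "model-large"]
print(select_model(PERFORMANCE), next_in_chain(chain, "model-medium"))
```

Returning `None` at the end of the chain gives the caller a clear signal that there is nothing left to escalate to.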
🔄
Auto-Update + Dual-Publish
pforge smith checks GitHub for newer releases (24 h cache, silent offline). pforge ext publish outputs both a Plan Forge and a Spec Kit-compatible catalog entry in one command.
⚠ New version available — run pforge update
✓ Extensions dual-published to Plan Forge + Spec Kit catalogs
Wait — What's the Difference Between All These File Types?
Plan Forge sessions end when the code ships. LiveGuard watches everything after — drift, secrets, dependencies, regressions, incidents. 14 post-coding tools that detect, respond, and learn automatically.
🔍
Detect
Continuously score your codebase against architecture guardrail rules. Catches empty-catch blocks, unsafe types, sync-over-async, SQL injection, and deferred work markers.
Respond
Violations don't just get reported. They trigger automatic incident capture, fix proposal generation, and regression verification. One call does it all.
▸ Auto-chain: drift → incident → fix proposal
▸ Fix plans include code snippets with ▸▸▸ violation markers
▸ Auto-resolve incidents when regression guard passes
▸ MTTR tracking from capture to resolution
▸ Forbidden file enforcement with exit code 1
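As an illustration of the MTTR item above, mean time to resolution is the average gap between capture and resolution across resolved incidents. The incident records and `mean_time_to_resolve` below are hypothetical; LiveGuard's own storage format is not described here.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: capture time and resolution time (None if open).
incidents = [
    {"captured": datetime(2025, 1, 1, 9, 0), "resolved": datetime(2025, 1, 1, 10, 0)},
    {"captured": datetime(2025, 1, 2, 9, 0), "resolved": datetime(2025, 1, 2, 12, 0)},
    {"captured": datetime(2025, 1, 3, 9, 0), "resolved": None},
]


def mean_time_to_resolve(records):
    """Average capture-to-resolution time over resolved incidents only."""
    durations = [r["resolved"] - r["captured"] for r in records if r["resolved"]]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


mttr = mean_time_to_resolve(incidents)
print(mttr)
```

Open incidents are skipped in this sketch so that an unresolved item does not distort the average.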
🧠
Learn
Every finding is captured. The system tunes itself — escalation chains reorder by success rate, cost estimates calibrate from actuals, quorum thresholds adapt. The forge gets smarter every run.
▸ Auto-tune escalation chain by model success rate
▸ Cost estimates calibrate from historical actuals
▸ Adaptive quorum threshold (self-tuning token spend)
▸ Recurring incident detection + auto-escalation
▸ Project Health DNA fingerprint for decay detection
forge_liveguard_run · One call replaces 8
Full health check in a single MCP tool call. Runs drift scan, sweep, secret scan, regression guard, dep watch, alert triage, and health trend — returns a unified status.
Every decision you make, every pattern you establish, every lesson learned — captured once and searchable forever. Across sessions, across AI tools, across projects.
1. Capture During Execution
Plan Forge agents automatically capture architecture decisions, patterns, and postmortems as they work, tagged by project, phase, and slice. The Shipper (Step 6) batch-captures lessons learned after every feature.
2. Search by Meaning, Not Keywords
Semantic vector search understands intent. Ask "what did we decide about caching?" and find it, even if you never used exactly those words in the original thought.
3. Any Tool, Any Session, Any Project
Works with GitHub Copilot, Claude, Cursor, ChatGPT, Gemini, Windsurf, and 9+ more via MCP. Captured in one session, retrieved in the next, with no re-explaining.
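Search by meaning usually comes down to comparing embedding vectors. The sketch below uses cosine similarity over hand-written three-dimensional vectors; the vectors, the stored texts, and `search` are toy assumptions, and a real system would obtain embeddings from a model rather than hard-coding them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings"; a real system would get these from a model.
memories = {
    "Decision: cache reads at the API layer": [0.9, 0.1, 0.0],
    "Decision: use row-level security": [0.1, 0.9, 0.1],
    "Postmortem: flaky login test": [0.0, 0.2, 0.9],
}


def search(query_vector, store):
    """Return the stored memory closest in meaning to the query vector."""
    return max(store, key=lambda text: cosine_similarity(query_vector, store[text]))


# Pretend this vector encodes "what did we decide about caching?"
best = search([0.8, 0.2, 0.1], memories)
print(best)
```

Because the comparison is between vectors rather than words, a query can match a memory that shares none of its wording.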
💾 captured: "Decision: Using PostgreSQL row-level security over app-level tenant filtering. RLS scales better at query time."
project: saas-platform · phase-2-slice-4
# 4 months later, starting a new project:
You: "How did we handle multi-tenancy last time?"
✅ Found (94% match):
"Decision: PostgreSQL row-level security over app-level filtering. RLS scales better."
Source: plan-forge-phase-2-slice-4 · 4 months ago
# The AI already knows your standards.
Agent: "Applying RLS pattern from saas-platform decision, consistent with your architecture."
Unified System Architecture
Plan Forge Is One Piece of a Bigger Picture
Combine Plan Forge with OpenBrain and OpenClaw to build a closed-loop development system — where AI agents plan, build, remember, and communicate across every surface you use.
✅ Plan Forge works fully standalone — these are optional power-ups, not requirements.
🔨
Plan Forge
The Blueprint
What to build, how to build it, and when to stop. Guardrails, pipeline, and execution contracts.
9 pre-configured guardrail packs (8 app + 1 IaC) for the most common enterprise stacks. One setup command installs the right instruction files, agents, skills, and templates.
.NET / C#
ASP.NET Core, Blazor, Entity Framework, xUnit. 22 instruction files.
.\setup.ps1 -Preset dotnet
TypeScript
React, Node.js, Express, Vitest, pnpm. Includes frontend instructions.
Bicep, Terraform, PowerShell, azd. Full CAF + WAF + Landing Zone guardrails.
.\setup.ps1 -Preset azure-iac
9 presets (8 app + 1 IaC). All presets also install 8 cross-stack agents: API contracts, accessibility, multi-tenancy, CI/CD, observability, dependency, compliance, and error handling.
Quick Start
Up and Running in Minutes
One setup script bootstraps your entire guardrails stack — instruction files, agents, skills, hooks, and Copilot config.
✅ VS Code · ✅ GitHub Copilot · ✅ Git
That's it: no other dependencies
⚡ EASY BUTTON
Paste one prompt — your AI installs everything automatically
Open Copilot Chat (Agent Mode), Claude Code, or Cursor in your project — paste this prompt. The AI reads the setup guide, detects your stack, installs everything, customizes the files, and validates. Zero manual steps.
Paste into any AI chat — Copilot, Claude, Cursor
Clone https://github.com/srnichols/plan-forge into a temporary directory. Read its AGENT-SETUP.md file completely and follow the instructions exactly:
1. Scan THIS project's root directory and auto-detect the tech stack from marker files (*.csproj = dotnet, go.mod = go, package.json + tsconfig.json = typescript, pom.xml = java, pyproject.toml = python, *.bicep = azure-iac). If multiple stacks exist, combine them.
2. Detect which AI tool is running this prompt and set the -Agent flag:
- GitHub Copilot → -Agent copilot (default, can omit)
- Claude Code → -Agent claude
- Cursor → -Agent cursor
- Codex CLI → -Agent codex
- Not sure → -Agent all (installs all agent formats)
3. Run the Plan Forge setup script non-interactively:
.\setup.ps1 -Preset <detected> -Agent <detected-agent> -ProjectPath "." -ProjectName "<this folder name>" -Force
4. After setup completes, customize the generated files:
- Edit .github/copilot-instructions.md with this project's actual name, tech stack, build/test/lint commands, and architecture
- If CLAUDE.md was generated, verify it looks correct
- Edit docs/plans/DEPLOYMENT-ROADMAP.md with a first phase placeholder
5. Run .\pforge.ps1 smith to inspect the forge and confirm all checks pass.
6. If pforge-mcp/server.mjs was installed, run: cd pforge-mcp && npm install (activates the forge MCP tools).
7. If specs/ or memory/constitution.md exist (Spec Kit project), note that Step 0 will auto-detect and offer to import them.
8. Call forge_capabilities to verify all tools are available and discover workflows, config options, and OpenBrain memory integration.
9. Show me a summary of what was installed and any issues found.
Works with GitHub Copilot (Agent Mode), Claude Code, Cursor, or any AI tool with terminal access.
or do it manually
1. Use as a GitHub Template
Terminal
# Click "Use this Template" on GitHub, or clone directly:
git clone https://github.com/srnichols/plan-forge.git my-project-plans
cd my-project-plans
2. Run the Setup Wizard
PowerShell / Bash
# Interactive wizard picks your stack (setup.sh for macOS/Linux):
.\setup.ps1
# Or specify directly (PowerShell / Bash):
.\setup.ps1 -Preset dotnet # .NET / C#
.\setup.ps1 -Preset typescript # TypeScript / React
.\setup.ps1 -Preset python # Python / FastAPI
.\setup.ps1 -Preset java # Java / Spring Boot
.\setup.ps1 -Preset go # Go / Chi / Gin
.\setup.ps1 -Preset swift # Swift / SwiftUI
.\setup.ps1 -Preset rust # Rust / Axum
.\setup.ps1 -Preset php # PHP / Laravel
.\setup.ps1 -Preset azure-iac # Azure Bicep / Terraform / azd
# Add support for other AI agents (optional):
.\setup.ps1 -Preset dotnet -Agent claude # + Claude Code
.\setup.ps1 -Preset dotnet -Agent all # + Claude + Cursor + Codex + Gemini + Windsurf + Generic
Installs instruction files, agents, skills, lifecycle hooks, and copilot-instructions.md into your project. Add -Agent to generate native files for Claude, Cursor, Codex, Gemini, Windsurf, or Generic.
3. Describe Your Feature: Your AI Does the Rest
VS Code + Copilot: Agent Mode → select Specifier from the agent picker. Claude Code: invoke /planforge-step0-specify-feature. Cursor: run the planforge.step0-specify-feature command. Type one sentence — the pipeline runs from there.
Copilot Chat → Agent Mode → Specifier
That's literally it.
I want to add user authentication with JWT tokens and
role-based access control to the admin panel.
What happens next — automatically, via handoff buttons:
Specifier: interviews you, surfaces ambiguities, produces the feature spec
Plan Hardener: converts the spec into a locked execution contract, runs pre-flight checks
Executor: builds slice by slice, runs build + tests at every boundary, no TODOs left behind
Reviewer Gate: independent fresh session audits for drift, security gaps, and missed requirements
Shipper: commits, updates the roadmap, captures lessons to memory for next time
Using Claude, Cursor, or Codex? The -Agent flag generated native skills and commands — invoke them directly in your tool. Or use the copy-paste prompts from the runbook in any AI tool.
Give your AI this single prompt. It reads AGENT-SETUP.md, auto-detects the tech stack, runs the setup script non-interactively, and customizes the generated files. No manual steps.
Paste this into any AI chat (Copilot, Claude, Cursor, etc.)
Read the file AGENT-SETUP.md in the plan-forge repo root.
Follow the instructions exactly:
1. Scan this project and detect the tech stack
2. Detect which AI agent you are (Copilot/Claude/Cursor/Codex) for the -Agent flag
3. Run setup.ps1 (or setup.sh) non-interactively with the correct -Preset and -Agent flags plus -Force
4. Customize copilot-instructions.md with this project's actual details
5. Run pforge smith to validate the setup
6. If pforge-mcp/server.mjs exists, run: cd pforge-mcp && npm install (activates MCP tools)
7. Call forge_capabilities to verify all tools are available
8. Note any Spec Kit artifacts found (specs/, memory/constitution.md)
Repo: https://github.com/srnichols/plan-forge
Target project: (current directory)
What it detects
*.csproj → dotnet · go.mod → go · pom.xml → java · tsconfig.json → typescript · pyproject.toml → python · *.bicep → azure-iac
What it installs
~22 instruction files, ~14 agents, ~12 skills, 7 lifecycle hooks, 5 pipeline prompts, copilot-instructions.md, 106 MCP tools
What it customizes
Fills in your project name, tech stack, build commands, and architecture details automatically
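The marker-file detection listed above can be approximated with simple pattern matching. The patterns and preset names come from the text; `detect_presets` itself is a hypothetical helper, and it takes a list of file names instead of scanning a directory so the example stays self-contained.

```python
from fnmatch import fnmatch

# Marker-file patterns taken from the text; the function itself is illustrative.
MARKERS = [
    ("*.csproj", "dotnet"),
    ("go.mod", "go"),
    ("pom.xml", "java"),
    ("tsconfig.json", "typescript"),
    ("pyproject.toml", "python"),
    ("*.bicep", "azure-iac"),
]


def detect_presets(filenames):
    """Return presets whose marker matches any file name, in MARKERS order."""
    return [
        preset
        for pattern, preset in MARKERS
        if any(fnmatch(name, pattern) for name in filenames)
    ]


found = detect_presets(["Api.csproj", "main.bicep", "README.md"])
print(",".join(found))
```

Joining the detected presets with commas mirrors the combined form the page uses elsewhere, as in -Preset dotnet,azure-iac.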
🆕
Brand new to AI guardrails
1. Read the README, specifically What Is This? (Plain English), to understand the 4-layer system.
2. Clone the template and run .\setup.ps1. The interactive wizard asks your stack and bootstraps everything.
1. Run .\setup.ps1 -Preset <your-stack>. It installs instruction files, agents, and skills directly into your project's .github/ folder.
2. Read CUSTOMIZATION.md to fill in your project profile, generate domain-specific guardrails, and set up the project principles workshop.
3. Read docs/COPILOT-VSCODE-GUIDE.md: how Agent Mode works, how instruction files auto-load, managing context budget, and memory bridging.
4. Open Copilot Chat → Agent Mode → pick Specifier → describe your first feature. The pipeline runs from there.
📟
A CLI-first developer
1. Run .\setup.ps1 -Preset <your-stack> or ./setup.sh --preset <your-stack>. Works on Windows (PowerShell), macOS, and Linux.
2. Read docs/CLI-GUIDE.md: how to run the pipeline from the terminal using pforge.ps1 / pforge.sh helper scripts.
3. Copy prompts from the runbook into any terminal-based AI agent (Claude Code, Copilot CLI, Aider, etc.). Same pipeline, no IDE required.
🏢
An enterprise or SaaS team
1. Run setup with your primary stack. For microservices with infra: .\setup.ps1 -Preset dotnet,azure-iac installs both app guardrails and IaC guardrails in one pass.
2. Run the Project Profile workshop (/project-profile). It generates coverage targets, latency SLAs, and compliance requirements from a short interview, stored per-repo.
3. Run the Project Principles workshop. It captures non-negotiables, forbidden patterns, and architectural commitments that every AI session loads automatically.
4. Add an org-rules.instructions.md with your internal standards (naming conventions, approved libraries, compliance gates). It auto-loads on every session.
Compatibility
Works With Your AI Tools
Advanced integration with VS Code + GitHub Copilot. First-class support for Claude Code, Cursor, and Codex CLI. Copy-paste prompts work with any AI tool.
🐙 GitHub Copilot · 🤖 Claude Code · ⚡ Cursor · 📟 Codex CLI · 💎 Gemini · 🌊 Windsurf · 💬 ChatGPT · 🔧 Any AI tool
MIT Licensed · GitHub Template · No vendor lock-in · Works with any AI tool
Dogfooding
Plan Forge Builds Plan Forge
Every feature in this repo — the MCP server, the dashboard, the orchestrator, the quorum system — was developed using the same hardened plan pipeline that ships to users. If the pipeline can build itself without drift, it can build your project too.
28+ phases executed
v2.90 current version
0 manual rollbacks
3285 self-tests passing
The dashboard screenshots in our docs were captured by a Playwright script — itself built slice-by-slice using Plan Forge.
From the Forge
Latest from the Blog
Lessons, patterns, and A/B test results from building with AI coding agents.