FAQ
Everything you need to know about Plan Forge.
Install it as a GitHub template — clone, run the setup wizard, and you're inside the shop. Full Shop Tour →
One hammer is not a workshop.
.github/ directory.
The pipeline works in three ways: Pipeline Agents (optimized for VS Code + Copilot), Prompt Templates (VS Code Copilot Chat), and Copy-Paste Prompts (works in any AI tool — Claude, Cursor, ChatGPT, Gemini, Windsurf, terminal agents).
First-class agent support: Setup generates rich native files for all 7 supported agent types via the -Agent flag. Use -Agent all to install everything at once:
CLAUDE.md with embedded guardrails + /planforge-* slash commands
.cursorrules + .cursor/rules/*.mdc with embedded guardrails
AGENTS.md + skill scripts for all pipeline steps
GEMINI.md with embedded guardrails + /planforge-* commands
.windsurfrules + Cascade instructions with embedded guardrails
AI-ASSISTANT.md with copy-paste guardrails for tools not listed above
Copilot files are always installed. Additional agents layer on top: use one, some, or all. All 7 formats include all auto-loading guardrail files, all pipeline prompts, and all ~12 reviewer agents.
No — and that's the point. Plan Forge is the AI-Native SDLC. Humans own three things and three things only:
dotnet, typescript, python, etc.), declare Project Principles, set forbidden patterns. One-time setup, then locked.
Everything else (architecture, security, performance, DB, deploy, API design, accessibility, multi-tenancy, CI/CD, observability, dependency audit, compliance, QA, SRE, engineering management, continuous improvement, release management) is handled by 20 specialized agents and 17+ auto-loading guardrail files. Every discipline of a 20-person engineering team, governed by 40 years of software engineering practice.
If the answer to "do humans write any code?" is "yes, sometimes, when we feel like it," then it isn't an AI-Native SDLC — it's an AI-assisted IDE. Plan Forge is the former.
You do — always. A green pipeline is a strong signal, not a sign-off.
When a slice finishes, the shop has independently verified that:
That proves the code is correct against the spec. It does not prove the spec is what you actually wanted. Only Product Owner UAT can answer that — and only you can run it. Push the build to staging, exercise the feature like a real user, and decide.
This separation is deliberate. Agents that grade their own homework are how AI demos lie. Plan Forge keeps the final yes-or-no with the human who owns the outcome. The audit loop ran for two weeks unattended on a real production site and surfaced 30+ defects the maintainer didn't know existed — but the maintainer still had to decide which ones mattered enough to fix. That call is yours, by design.
AGENT-SETUP.md has full brownfield instructions.
-Preset dotnet,azure-iac for an app with infrastructure code.
chat.useCustomizationsInParentRepositories in VS Code settings so child workspaces inherit parent guardrails. Run multi-preset setup with different stacks for different directories (e.g., -Preset typescript,azure-iac -ProjectPath ./packages/api).
A blacksmith inspects the forge, checks the tools, and makes sure everything is ready before the work begins. pforge smith does the same for your project — it diagnoses five areas in seconds:
Every issue includes a FIX: suggestion with the exact command or setting to resolve it. Run it after setup, after updates, or whenever something feels off.
Yes — run setup with -Agent windsurf. This generates .windsurfrules and Cascade instruction files with all auto-loading guardrail files embedded, all pipeline prompts as native commands, and all ~12 reviewer agents as invocable skills. Windsurf's Cascade agent reads these automatically — no manual attachment needed.
To install all agent formats at once: .\setup.ps1 -Preset <stack> -Agent all
Yes — run setup with -Agent gemini. This generates GEMINI.md with all guardrail files embedded and /planforge-* slash commands for every pipeline step. Gemini CLI reads GEMINI.md automatically at session start.
Gemini CLI is also supported as a model provider in Quorum Mode — add it to your .forge/config.json API provider registry.
Two options — both work, and the orchestrator checks both automatically:
XAI_API_KEY or OPENAI_API_KEY in your shell profile or CI secrets.
.forge/secrets.json (recommended for local dev): Create a JSON file with your keys: { "XAI_API_KEY": "xai-...", "OPENAI_API_KEY": "sk-..." }. The .forge/ directory is gitignored by default, so secrets are never committed.
Lookup order: environment variable → .forge/secrets.json → null. Any model name matching grok-* auto-routes to api.x.ai/v1.
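The lookup order above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Plan Forge's actual implementation: the function names `resolve_api_key` and `base_url_for` are hypothetical, and the simple prefix match on `grok-` is one assumed reading of "matching grok-*".

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(name: str,
                    env: Optional[Mapping[str, str]] = None,
                    secrets_path: Path = Path(".forge/secrets.json")) -> Optional[str]:
    """Return the key from the environment, then the secrets file, else None."""
    env = os.environ if env is None else env
    if env.get(name):                      # 1. environment variable wins
        return env[name]
    if secrets_path.is_file():             # 2. fall back to .forge/secrets.json
        secrets = json.loads(secrets_path.read_text(encoding="utf-8"))
        return secrets.get(name) or None
    return None                            # 3. nothing configured


def base_url_for(model: str) -> Optional[str]:
    """Assumed routing rule: model names starting with grok- go to the xAI endpoint."""
    return "api.x.ai/v1" if model.startswith("grok-") else None
```

Passing `env` and `secrets_path` explicitly keeps the sketch easy to exercise without touching real credentials.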
The Generic agent (-Agent generic) generates AI-ASSISTANT.md — a self-contained document with all 16 guardrails, all pipeline prompts, and all reviewer agents as copy-paste blocks. It works with any AI tool: ChatGPT, Perplexity, GitHub Models, Mistral, local LLMs via Ollama, or any future tool.
Use it as a fallback bridge for tools not explicitly supported, or alongside a specific agent format when you want a portable reference that works everywhere.
Each Plan Forge session starts fresh by design, so the reviewer catches what the builder missed. But that means prior decisions, patterns, and lessons are lost between features. OpenBrain bridges this gap. As of v3.6, OpenBrain is Plan Forge's first-class L3 memory layer (no separate extension needed). The Shipper (Step 6) captures decisions and postmortems to a semantic memory store, and the Plan Hardener (Step 2) searches it before locking down each new plan. Configure it in .forge.json or run pforge brain hint for setup options.
Built-in session memory (/memories/repo/) ships without any configuration. It captures conventions and forbidden patterns as markdown files that load automatically. OpenBrain adds semantic search across thousands of prior decisions, cross-project pattern reuse, and memory that survives repo migrations.
Yes. The deploy.instructions.md guardrail covers GitHub Actions and Azure DevOps pipelines with OIDC auth, environment approval gates, and rollback strategies. The new-pipeline.prompt.md template scaffolds full pipelines with what-if previews, bridge environments (dev → staging → prod), and manual approval gates baked in.
For Azure-specific deployments, the azure-iac preset adds the /infra-deploy skill which handles pre-flight checks, what-if/Terraform plan, environment promotion, and approval gate integration with azd, Bicep, and Terraform.
Quorum Mode dispatches each slice to 3 AI models in parallel (Claude Opus, GPT-5.3-Codex, Grok 4.20 Reasoning) for independent dry-run analysis. Each model produces a detailed implementation plan without executing code. A reviewer agent then synthesizes the best elements — picking the strongest approach per file/component — and produces a unified execution plan. The final builder uses that consensus plan instead of the raw slice instructions.
Think of it as a design review by three senior engineers before any code is written.
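The fan-out and synthesis flow described above might look roughly like this. It is a minimal sketch only: `quorum_plan`, the `Plan` shape, and the `dry_run` and `review` callables are hypothetical stand-ins for the real model calls and reviewer agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Plan = Dict[str, str]  # hypothetical shape: file path -> proposed approach


def quorum_plan(slice_text: str,
                models: List[str],
                dry_run: Callable[[str, str], Plan],
                review: Callable[[Dict[str, Plan]], Plan]) -> Plan:
    """Collect one dry-run plan per model in parallel, then let a reviewer merge them."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {m: pool.submit(dry_run, m, slice_text) for m in models}
        proposals = {m: f.result() for m, f in futures.items()}
    # The reviewer sees every proposal and returns the unified execution plan.
    return review(proposals)
```

No code is executed during the dry runs in this sketch; only the merged plan would be handed to a builder.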
Yes — A/B tested on an Invoice Engine feature (rate tiers, volume discounts, tax calculation, banker's rounding). Both runs passed all gates, but quorum produced measurably higher-quality code:
The quality difference isn't in correctness (both pass gates) but in craftsmanship — code that's easier to maintain, debug, and extend.
In the A/B test, quorum added ~35% to the token cost ($0.84 vs $0.62) but took 2.7x longer (32 min vs 12 min). The extra time is the 3-model dry-run analysis + reviewer synthesis — the actual build takes roughly the same time. Total cost for a full quorum run is still under $1.
Use --estimate --quorum before running to see the projected overhead breakdown per slice. With --quorum=auto, only complex slices incur the cost — simple CRUD runs normally.
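The overhead figures quoted for the A/B test follow directly from the raw numbers. A quick check in Python, using only the four values given above (the variable names are illustrative):

```python
baseline_cost, quorum_cost = 0.62, 0.84      # USD, from the A/B test
baseline_min, quorum_min = 12, 32            # minutes, from the A/B test

cost_overhead_pct = (quorum_cost / baseline_cost - 1) * 100
time_ratio = quorum_min / baseline_min

print(f"cost overhead: {cost_overhead_pct:.0f}%")   # cost overhead: 35%
print(f"time ratio: {time_ratio:.1f}x")             # time ratio: 2.7x
```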
Auto mode scores each slice's complexity (1-10) using 7 weighted signals: file scope count, cross-module dependencies, security keywords, database/migration keywords, gate count, task count, and historical failure rate. Only slices scoring at or above the threshold (default: 6) get the 3-model consensus treatment. Everything else runs normally.
This is the recommended default for most projects — you get quality where it matters (auth flows, billing logic, migrations) without burning tokens on simple CRUD slices. Override the threshold with --quorum-threshold 8.
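To make the threshold behaviour concrete, here is a small sketch of weighted scoring. The seven signal names come from the description above, but the weights, the 0..1 normalisation, and the function names are assumptions made for illustration; the real scoring formula is not documented here.

```python
from typing import Dict

# Hypothetical weights: the source names the seven signals but not their weights.
WEIGHTS: Dict[str, float] = {
    "file_scope_count": 0.20,
    "cross_module_dependencies": 0.20,
    "security_keywords": 0.15,
    "database_migration_keywords": 0.15,
    "gate_count": 0.10,
    "task_count": 0.10,
    "historical_failure_rate": 0.10,
}


def complexity_score(signals: Dict[str, float]) -> float:
    """Combine signals (each clamped to 0..1) into a score between 1 and 10."""
    weighted = sum(WEIGHTS[name] * min(max(signals.get(name, 0.0), 0.0), 1.0)
                   for name in WEIGHTS)
    return round(1 + 9 * weighted, 1)


def needs_quorum(signals: Dict[str, float], threshold: float = 6) -> bool:
    """Slices scoring at or above the threshold get the 3-model treatment."""
    return complexity_score(signals) >= threshold
```

Under these assumed weights, a slice with heavy security and migration signals crosses the default threshold of 6, while raising the threshold to 8 sends the same slice down the normal path.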
Pre-configured quorum profiles that select models, thresholds, and timeouts in one flag:
--quorum=power: Claude Opus 4.6 + GPT-5.3-Codex + Grok 4.20 Reasoning. Reviewer: Opus. Threshold 5. Timeout 5 min. Best for complex features where quality matters most.
--quorum=speed: Claude Sonnet 4.6 + GPT-5.4-mini + Grok 4.1 Fast Reasoning. Reviewer: Sonnet. Threshold 7. Timeout 2 min. Best for rapid iteration where cost and speed matter.
Available via CLI (--quorum=power), MCP (quorum: "power"), and config (.forge.json → quorum.preset: "power").
Auto-escalation automatically re-routes a failing slice to the next model in your escalationChain instead of retrying on the same model. If a slice fails on gpt-5.2-codex, it will automatically retry on claude-sonnet-4.6, then claude-opus-4.6 — no manual intervention required.
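The escalation behaviour can be sketched as a simple loop over the chain. This is an assumption-laden illustration, not the orchestrator's code: `run_with_escalation` and `attempt` are hypothetical names, the `"auto"` entry is taken to stand for the `modelRouting.execute` model, and `maxRetries` is assumed to cap the total number of attempts.

```python
from typing import Callable, List


def run_with_escalation(chain: List[str],
                        execute_model: str,
                        attempt: Callable[[str], bool],
                        max_retries: int) -> List[str]:
    """Try models in chain order until one succeeds; return the models tried."""
    # "auto" defers to the modelRouting.execute setting.
    models = [execute_model if m == "auto" else m for m in chain]
    tried: List[str] = []
    for model in models[:max_retries]:
        tried.append(model)
        if attempt(model):
            break
        # A real orchestrator would emit a slice-escalated event here.
    return tried
```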
Configure it in .forge.json:
"escalationChain": ["auto", "claude-sonnet-4.6", "claude-opus-4.6"]
"auto" in the chain defers to your modelRouting.execute setting. Each escalation emits a slice-escalated event (visible in the dashboard and events.log). The number of attempts is controlled by maxRetries in your config.
Plan Forge uses two layers of model routing that work together:
1. Role-based routing (modelRouting) — assign a different model to each execution role in .forge.json: default (spec/harden/review), execute (code writing), and review (gate checks). This lets you tune cost vs. quality per stage.
2. Performance-based auto-selection — the orchestrator reads .forge/model-performance.json (built up from past runs) and automatically selects the cheapest model with a >80% historical success rate for each slice type. --estimate shows the recommended model and its success rate before you run.
Override any routing at runtime with pforge run-plan <plan> --model <model>.
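Performance-based selection amounts to a filter followed by a minimum. The sketch below assumes a hypothetical per-slice-type history with `runs`, `successes`, and `cost_per_run` fields; the real `.forge/model-performance.json` schema is not shown in this FAQ, and `pick_model` is an illustrative name.

```python
from typing import Dict, Optional

# Hypothetical shape for one slice type's history.
History = Dict[str, Dict[str, float]]


def pick_model(history: History, min_success: float = 0.80) -> Optional[str]:
    """Return the cheapest model whose success rate is strictly above min_success."""
    eligible = [(stats["cost_per_run"], name)
                for name, stats in history.items()
                if stats["runs"] > 0 and stats["successes"] / stats["runs"] > min_success]
    return min(eligible)[1] if eligible else None
```

A model sitting at exactly 80% is excluded in this sketch, matching the ">80%" wording; with no eligible model the function returns `None` and a caller would fall back to role-based routing.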
Plan Forge ships two browser-based interfaces — both served from the MCP server with no build step required:
localhost:3100/dashboard): 8 real-time tabs via WebSocket, covering Progress (live slice cards), Runs (history), Cost (per-model breakdown), Actions (one-click run/abort), Replay (session logs), Extensions (catalog browser), Config (live editor), and Traces (OTLP waterfall).
localhost:3100/ui): read-only single-page app that lists all plans in your project, renders slice metadata cards, visualises DAG dependencies, and shows the scope contract. No execution controls; those remain on the dashboard.
Start both with node pforge-mcp/server.mjs, or dashboard-only (no MCP client needed) with node pforge-mcp/server.mjs --dashboard-only.
pforge brain hint.
Yes — that's LiveGuard, the post-coding intelligence layer arriving in v2.27.0–v2.28.0. The forge pipeline handles build-time: specify, plan, execute, and ship. LiveGuard picks up where the forge stops and watches the deployed code:
forge_drift_report: detects when the codebase drifts away from the plan's architectural baseline
forge_incident_capture: logs incidents with MTTR and on-call tracking
forge_dep_watch: alerts on new CVEs in your dependency snapshot
forge_secret_scan: scans staged diffs for high-entropy strings; never logs values
forge_env_diff: compares .env* files for missing or extra keys across environments
forge_health_trend: tracks MTTR, drift score, and MTTBF over time
forge_alert_triage: surfaces a ranked list of the most critical signals across all guards
forge_hotspot: identifies high-churn, high-failure files worth extra attention
forge_runbook: stores and retrieves operational runbooks for each alert type
forge_deploy_journal: logs every deployment with pre/post health delta
forge_regression_guard: tracks whether previously passing validation gates stay passing
All 14 LiveGuard tools appear in the LIVEGUARD section of the dashboard (localhost:3100/dashboard), separated from the FORGE section by a visual divider. See Manual Chapter 15: What Is LiveGuard? and Chapter 16: LiveGuard Tools Reference.
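forge_secret_scan is described as flagging high-entropy strings. One common way to measure that is Shannon entropy over a token's characters, sketched below. This illustrates the general idea only; it is not the tool's actual detector, and the length and entropy thresholds are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    return -sum((n / len(text)) * math.log2(n / len(text)) for n in counts.values())


def looks_like_secret(token: str, min_length: int = 20, min_entropy: float = 4.0) -> bool:
    """Hypothetical thresholds: long tokens with high entropy are flagged."""
    return len(token) >= min_length and shannon_entropy(token) >= min_entropy
```

A scanner built this way would report only the file and position of a match, which is consistent with never logging the value itself.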
extension.json manifests. Distribute via GitHub repos, git submodules, or ZIP files. Teams install with pforge ext install. Add an org-rules.instructions.md for company-wide naming conventions, approved libraries, and compliance gates.
pforge ext search to browse the community catalog, pforge ext info <name> for details, and pforge ext add <name> to download and install in one step. The catalog uses a Spec Kit-compatible format — extensions marked speckit_compatible work in both tools.
.vscode/mcp.json (and .claude/mcp.json for Claude), which exposes 19 forge tools as native MCP functions: forge_smith, forge_validate, forge_sweep, forge_status, forge_diff, forge_analyze, forge_diagnose, forge_run_plan, forge_abort, forge_plan_status, forge_cost_report, forge_capabilities, forge_ext_search, forge_ext_info, forge_new_phase, forge_skill_status, forge_run_skill, forge_generate_image, and forge_memory_capture. Your AI agent can call these directly — no terminal commands needed. The MCP server is composable with OpenBrain for persistent memory.
Yes. The Bridge feature sends notifications to Slack, Teams, Telegram, or any webhook when plan runs start, succeed, or fail. Add a bridge section to your .forge.json:
"bridge": {
  "enabled": true,
  "channels": [
    { "type": "slack", "webhookUrl": "https://hooks.slack.com/...",
      "approvalRequired": true, "serverUrl": "https://yourapp.com" },
    { "type": "telegram", "botToken": "...", "chatId": "..." }
  ]
}
Set approvalRequired: true on any channel to pause execution after all slices pass and send an Approve / Reject button. The run only finalises after a human clicks Approve. Timeout (default 30 min) auto-rejects if no response.
The bridge connects to the Plan Forge WebSocket hub as a subscriber — it observes events without modifying the hub. See docs/CLI-GUIDE.md for the full bridge configuration reference.
Yes — the MCP server exposes POST /api/runs/trigger and POST /api/runs/abort for inbound control from any external system, including OpenClaw, CI pipelines, or custom automation:
# Start a plan run (fire-and-forget; returns immediately)
curl -X POST http://localhost:3100/api/runs/trigger \
  -H "Authorization: Bearer <approvalSecret>" \
  -H "Content-Type: application/json" \
  -d '{ "plan": "docs/plans/my-feature.md" }'

# Abort the active run
curl -X POST http://localhost:3100/api/runs/abort \
  -H "Authorization: Bearer <approvalSecret>"
The trigger endpoint prevents concurrent runs, returns a triggerId, and emits run-started on the WebSocket hub. If the bridge is configured with approvalRequired: true, Plan Forge pauses at the end and sends an Approve/Reject message to your Telegram or Slack.
Set bridge.approvalSecret in .forge.json to require bearer-token auth on write endpoints. Without a secret, endpoints are open (suitable for local-only setups).
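For callers that prefer Python over curl, the same trigger request can be assembled with the standard library. The URL, headers, and body mirror the curl example above; `build_trigger_request` is a hypothetical helper name, and the sketch only builds the request rather than sending it.

```python
import json
import urllib.request
from typing import Optional


def build_trigger_request(plan: str,
                          secret: Optional[str] = None,
                          base_url: str = "http://localhost:3100") -> urllib.request.Request:
    """Build (but do not send) the POST /api/runs/trigger request."""
    headers = {"Content-Type": "application/json"}
    if secret:  # only needed when bridge.approvalSecret is set
        headers["Authorization"] = f"Bearer {secret}"
    return urllib.request.Request(
        url=f"{base_url}/api/runs/trigger",
        data=json.dumps({"plan": plan}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending it is a single call to `urllib.request.urlopen(...)` with the returned request, against a running MCP server.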
Yes — two memory endpoints are available when OpenBrain is configured:
POST /api/memory/search: semantic search across past decisions and patterns. Returns the matching thoughts payload to forward to OpenBrain.
POST /api/memory/capture: normalise and broadcast a memory-captured hub event. Returns a structured capture_thought payload for the caller to forward to OpenBrain.
# Search project memory
curl -X POST http://localhost:3100/api/memory/search \
  -H "Content-Type: application/json" \
  -d '{ "query": "authentication patterns", "topK": 5 }'

# Capture a new thought
curl -X POST http://localhost:3100/api/memory/capture \
  -H "Authorization: Bearer <approvalSecret>" \
  -H "Content-Type: application/json" \
  -d '{ "content": "Decided to use OIDC for auth layer", "tags": ["auth","decision"] }'
Plan Forge normalises the payload and emits the hub event — the caller is responsible for forwarding to OpenBrain. This keeps the memory boundary clean: Plan Forge doesn't own OpenBrain writes directly. The forge_memory_capture MCP tool provides the same capability for in-session agents.
Three discovery layers are available — an agent only needs one:
GET /.well-known/plan-forge.json or GET /api/capabilities returns the full machine-readable surface: all MCP tools, all REST endpoints (with methods, auth requirements, and body shapes), config schema, version, and memory/bridge status. Best for agents that self-configure on first connect.
docs/llms.txt (served at /llms.txt): plain-text description of all 13 REST endpoints. Formatted for LLM ingestion.
AGENT-SETUP.md Section 6: curl examples for all external integration endpoints with copy-paste config.
The recommended pattern for OpenClaw or any agent: on first connection, call GET /api/capabilities, parse restApi.endpoints, and store the surface locally. Subsequent connections reuse the cached surface and only refresh on version change.
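The cache-and-refresh pattern recommended above can be sketched as follows. Several details are assumptions: `load_surface` is a hypothetical helper, the capabilities payload is assumed to carry a top-level `version` field, and how the caller learns the current server version is left open (it is simply passed in).

```python
import json
from pathlib import Path
from typing import Callable, Dict


def load_surface(fetch_capabilities: Callable[[], Dict],
                 server_version: str,
                 cache_file: Path) -> Dict:
    """Reuse the cached surface unless the server version has changed."""
    if cache_file.is_file():
        cached = json.loads(cache_file.read_text(encoding="utf-8"))
        if cached.get("version") == server_version:
            return cached
    surface = fetch_capabilities()          # e.g. a GET /api/capabilities call
    cache_file.write_text(json.dumps(surface), encoding="utf-8")
    return surface
```

Injecting `fetch_capabilities` keeps the caching logic independent of any particular HTTP client.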
Add the Plan Forge Validate GitHub Action to your workflow:
- uses: srnichols/plan-forge-validate@v1
  with:
    sweep: true              # Run TODO/FIXME sweep
    fail-on-warnings: false  # Warnings don't block merge
It checks six areas: setup health, file counts per preset, unresolved placeholders, orphaned agents, plan artifacts (scope contracts + slices), and a completeness sweep. Every failure shows exactly what's wrong.
The action has zero dependencies beyond bash and git, runs in ~5 seconds, and outputs passed, failed, warnings, and result for use in downstream steps.
pforge analyze <plan-file>. It scores your implementation against the plan across 4 dimensions: requirement traceability, scope compliance, test coverage, and validation gates. Returns a consistency score out of 100. Also available as the forge_analyze MCP tool and via analyze: true in the GitHub Action.
The Copilot cloud agent works on GitHub issues autonomously — cloning your repo, making code changes, and opening PRs. Plan Forge integrates via .github/copilot-setup-steps.yml, which GitHub runs to provision the agent's environment before it starts coding.
Copy templates/copilot-setup-steps.yml from the Plan Forge repo to your project's .github/ directory. Set the correct --preset for your stack. The cloud agent then starts each session with:
.vscode/mcp.json
pforge smith health check run automatically
The short version: Copilot cloud agent plans. Plan Forge hardens.
Yes. The cloud agent reads .github/copilot-instructions.md and .github/instructions/*.instructions.md using the same applyTo mechanism as local VS Code. Security rules activate on auth files, database patterns activate on query files, and architecture principles load on every file — no changes to your instruction files needed.
The only difference: copilot-setup-steps.yml handles dependency installation that a local dev machine already has. Once provisioned, the guardrail behavior is identical.
Plan Forge operates at the development layer — slice gates (build + test) catch problems before the code ever reaches GitHub's CI/CD pipeline. CodeQL, secret scanning, and Copilot code review then add additional coverage after the PR is opened. The layers are complementary, not overlapping:
Both are open-source, MIT-licensed frameworks for disciplined AI-assisted development — and both are excellent. They solve different parts of the problem:
Spec Kit (by GitHub) focuses on Spec-Driven Development — turning ideas into executable specifications via slash commands (/speckit.specify, /speckit.plan, /speckit.implement). It has a massive community (85K+ stars, 144 contributors), supports 25+ AI agents natively, and offers a rich extension and preset ecosystem with 40+ community extensions. It shines at defining what to build and generating implementation from specs.
Plan Forge focuses on hardened execution — locking specs into scope contracts the AI cannot deviate from, enforcing standards with 17–18 auto-loading guardrail files per stack, providing 19 specialized reviewer agents (security, architecture, performance, compliance, error handling, etc.), and validating at every slice boundary. It shines at ensuring the AI builds exactly what was specified with enterprise-grade quality.
They're genuinely complementary: use Spec Kit to write the spec, Plan Forge to enforce it. Or pick the one that matches your team's priorities (see the next question).
Pick Spec Kit if your team uses multiple AI tools (not just VS Code), you want the largest community and extension ecosystem, and you prefer a lightweight spec-first methodology you can adopt incrementally. GitHub's backing means strong long-term viability and rapid iteration.
Pick Plan Forge if you want deep guardrails that auto-enforce during coding, you need specialized reviewer agents, and you care about enterprise patterns like deployment templates, lifecycle hooks, and scope-contract enforcement. First-class support for VS Code + Copilot, Claude Code, and Cursor — with MCP tools for native integration.
Honest take: Spec Kit has the bigger ecosystem and broader agent support today. Plan Forge goes deeper on runtime enforcement and enterprise quality gates. Both are free. You really can't go wrong.
Yes — Plan Forge auto-detects Spec Kit artifacts. When you start Step 0 (Specifier), it scans for specs/*/spec.md, plan.md, and memory/constitution.md. If found, it offers to import them directly — no re-specifying needed.
The extension catalogs also use the same format, so Spec Kit-compatible extensions work in both tools. See the full integration guide for the combined workflow.
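The artifact scan described above is essentially a set of glob lookups. A minimal sketch, assuming `plan.md` sits next to each `spec.md` under `specs/*/` (the FAQ does not spell out its location) and using the hypothetical name `find_spec_kit_artifacts`:

```python
from pathlib import Path
from typing import Dict, List


def find_spec_kit_artifacts(root: Path) -> Dict[str, List[Path]]:
    """Look for the Spec Kit files named above under a project root."""
    return {
        "specs": sorted(root.glob("specs/*/spec.md")),
        # Assumption: plan.md lives alongside each spec.md.
        "plans": sorted(root.glob("specs/*/plan.md")),
        "constitution": sorted(root.glob("memory/constitution.md")),
    }
```

If any list comes back non-empty, an importer could offer those files to the Specifier step instead of starting from a blank spec.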
They solve different problems and work best together. Plan Forge uses three memory layers — each with a distinct role:
| Layer | What It Is | Best For |
|---|---|---|
| Copilot Memory | /memories/ — Copilot's built-in note storage (user / session / repo scopes) | Free-form notes, personal patterns, ad-hoc insights |
| Plan Forge Session Bridge | Structured /memories/repo/current-phase.md managed by pipeline prompts | Carrying Session 1 → 2 → 3 state through the hardening pipeline |
| OpenBrain | Semantic vector memory via MCP search_thoughts / capture_thought | Auto-injecting relevant prior decisions before each slice — no manual prompting needed |
All three are complementary. A typical phase uses all three: Copilot Memory for quick mid-session notes, the session bridge files for structured phase handoffs, and OpenBrain for surfacing past decisions automatically.
/forge-quench) is a code simplification skill that systematically reduces complexity while preserving exact behavior. It follows the Chesterton's Fence principle — always understand WHY code is complex before simplifying it. The 5-step workflow: Measure → Understand First → Propose → Apply & Prove → Report. Each simplification is committed individually with tests run after every change.
docs/SKILL-BLUEPRINT.md) is the formal specification for Plan Forge skill files. Every skill follows this format: Frontmatter → Trigger → Steps → Safety Rules → Temper Guards → Warning Signs → Exit Proof → Persistent Memory. Extension contributors use the blueprint to create skills that are consistent with the built-in ones.