Does Plan Forge audit my deployed app automatically?

No — audit.mode defaults to 'off' in .forge.json. You must explicitly opt in by setting audit.mode to 'auto' (threshold-gated) or 'always' (unconditional after plan completion). For one-off audits, run 'pforge audit-loop' from the CLI. Production environments are always forbidden — the audit loop only targets dev and staging.
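The opt-in rules above can be read as a small decision function. This is a minimal sketch for illustration only: the helper name should_audit, its arguments, and the environment labels are assumptions, and only the audit.mode values ('off', 'auto', 'always') and the dev/staging restriction come from the answer above.

```python
def should_audit(mode: str, environment: str, threshold_met: bool = False) -> bool:
    """Decide whether an automatic audit would run after plan completion.

    Hypothetical helper, not Plan Forge's actual implementation.
    """
    # The audit loop only targets dev and staging; production is always refused.
    if environment not in ("dev", "staging"):
        return False
    if mode == "always":
        return True            # unconditional after plan completion
    if mode == "auto":
        return threshold_met   # threshold-gated
    return False               # 'off' (the default) or any unknown value


print(should_audit("off", "staging"))        # False
print(should_audit("auto", "dev", True))     # True
print(should_audit("always", "production"))  # False
```

How the threshold itself is evaluated is not described here, so the sketch takes it as a ready-made boolean.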

FAQ — Plan Forge

💡 General

What is Plan Forge?

Plan Forge is the AI-Native SDLC Forge Shop — one workshop, four stations covering every phase of the software lifecycle:

🪨 Smelt — raw idea → Scope Contract (specifier + hardener)
🔨 Forge — Scope Contract → shipped code (slice gates, quorum, fresh-session review)
🛡️ Guard — post-deploy defense (LiveGuard: secrets, drift, regressions, incidents)
🧠 Learn — memory & retrospectives (bug registry, testbed, Health DNA, OpenBrain)

Install it as a GitHub template: clone, run the setup wizard, and you're inside the shop.

What do you mean by "self-deterministic agent loop"?

The slice executor is deterministic: same plan, same config, same outcome. On top of that spine, ten opt-in inner-loop subsystems (reflexion, trajectories, auto-skills, gate synthesis, postmortems, federation, reviewer, competitive worktree execution, auto-fix proposals, cost-anomaly detection) let the loop observe itself and feed what it learns back into the next slice, the next plan, or a sibling project. Every feedback arrow is opt-in or advisory; the execution contract never mutates silently. That combination — deterministic execution plus reflective context — is what we call a self-deterministic agent loop. See the canonical overview with two Mermaid diagrams.

How is this different from just using Copilot / Cursor / Claude directly?

Those tools are great at generating code — that's one hammer. Plan Forge gives you the whole shop:

Smelt locks scope before the agent writes a line.
Forge runs slice-by-slice with validation gates and a fresh-session independent review.
Guard (LiveGuard) defends post-deploy against secrets, drift, and regressions.
Learn captures every bug, fix, and incident into OpenBrain so tomorrow's plan is colder and less wrong.

One hammer is not a workshop.

Is Plan Forge free?

Yes. MIT licensed — free for personal, commercial, and enterprise use. No sign-ups, no telemetry, no vendor lock-in. It's a GitHub template that installs files into your project's .github/ directory.

Does it work with any AI tool or just Copilot?

The pipeline works in three ways: Pipeline Agents (optimized for VS Code + Copilot), Prompt Templates (VS Code Copilot Chat), and Copy-Paste Prompts (works in any AI tool — Claude, Cursor, ChatGPT, Gemini, Windsurf, terminal agents).

First-class agent support: Setup generates rich native files for all 7 supported agent types via the -Agent flag. Use -Agent all to install everything at once:

copilot (default) — VS Code + GitHub Copilot: auto-loading instructions, agent picker, handoff buttons
claude — Claude Code: CLAUDE.md with embedded guardrails + /planforge-* slash commands
cursor — Cursor: .cursorrules + .cursor/rules/*.mdc with embedded guardrails
codex — Codex CLI: AGENTS.md + skill scripts for all pipeline steps
gemini — Gemini CLI: GEMINI.md with embedded guardrails + /planforge-* commands
windsurf — Windsurf: .windsurfrules + Cascade instructions with embedded guardrails
generic — Any AI tool: AI-ASSISTANT.md with copy-paste guardrails for tools not listed above

Copilot files are always installed. Additional agents layer on top — use one, some, or all. All 7 formats include all auto-loading guardrail files, all pipeline prompts, and all ~12 reviewer agents.

Do humans write any code with Plan Forge?

No — and that's the point. Plan Forge is the AI-Native SDLC. Humans own three things and three things only:

A few technology choices, made once. Pick a preset (dotnet, typescript, python, etc.), declare Project Principles, set forbidden patterns. One-time setup, then locked.
Spec and direction. You play Product Owner. Crucible interviews you; you decide what to build and why. The shop will not deviate from the Scope Contract you sign off on.
Manual acceptance testing. Agents do not read minds. They implement what the spec says, not what you intended. Only you can decide whether shipped code is what you actually wanted — that is UAT, same as it has been for 40 years.

Everything else — architecture, security, performance, DB, deploy, API design, accessibility, multi-tenancy, CI/CD, observability, dependency audit, compliance, QA, SRE, engineering management, continuous improvement, release management — is handled by 20 specialized agents and 17+ auto-loading guardrail files. Every discipline of a 20-person engineering team, governed by 40 years of software engineering practice.

If the answer to "do humans write any code?" is "yes, sometimes, when we feel like it," then it isn't an AI-Native SDLC — it's an AI-assisted IDE. Plan Forge is the former.

Who decides when a feature is done?

You do — always. A green pipeline is a strong signal, not a sign-off.

When a slice finishes, the shop has independently verified that:

All tests pass (unit, integration, contract, regression)
No secrets leaked, no drift introduced, no dependencies poisoned
An independent fresh-session reviewer (Session 3) signed off against the Scope Contract
Forge-Master Auditor independently graded the run against the plan and prior runs
LiveGuard post-deploy checks (env diff, regression guard) pass

That proves the code is correct against the spec. It does not prove the spec is what you actually wanted. Only Product Owner UAT can answer that — and only you can run it. Push the build to staging, exercise the feature like a real user, and decide.

This separation is deliberate. Agents that grade their own homework are how AI demos lie. Plan Forge keeps the final yes-or-no with the human who owns the outcome. The audit loop ran for two weeks unattended on a real production site and surfaced 30+ defects the maintainer didn't know existed — but the maintainer still had to decide which ones mattered enough to fix. That call is yours, by design.

⚙️ Setup & Configuration

Can I add Plan Forge to an existing project (brownfield)?

Yes. The setup wizard detects existing files and merges rather than overwrites. It only adds missing guardrail files — your project-specific content is preserved. AGENT-SETUP.md has full brownfield instructions.

What tech stacks are supported?

Nine presets out of the box: .NET/C#, TypeScript/React, Python/FastAPI, Java/Spring Boot, Go, Swift, Rust, PHP/Laravel (8 app presets), and Azure IaC (Bicep/Terraform, 1 IaC preset). There's also a custom preset for any other stack. Multi-preset combinations are supported — e.g., -Preset dotnet,azure-iac for an app with infrastructure code.

Do I need to configure every guardrail file manually?

No. All 18+ guardrail files ship pre-written with best practices and auto-load based on the file type being edited — no action needed. To customize, run the Project Profile workshop (a one-time interview that generates project-specific guardrails) or edit any instruction file directly.

How do I use this in a monorepo?

Set chat.useCustomizationsInParentRepositories in VS Code settings so child workspaces inherit parent guardrails. Run multi-preset setup with different stacks for different directories (e.g., -Preset typescript,azure-iac -ProjectPath ./packages/api).

What is "The Smith" and when should I run it?

A blacksmith inspects the forge, checks the tools, and makes sure everything is ready before the work begins. pforge smith does the same for your project — it diagnoses five areas in seconds:

Environment — git, VS Code CLI, PowerShell/bash, GitHub CLI
VS Code Config — agent mode, parent repo customizations, prompt file discovery
Setup Health — .forge.json, file counts per preset, copilot-instructions.md
Version Currency — is your Plan Forge install up to date?
Common Problems — duplicate files, orphaned agents, missing applyTo, unresolved placeholders

Every issue includes a FIX: suggestion with the exact command or setting to resolve it. Run it after setup, after updates, or whenever something feels off.

Does Plan Forge work with Windsurf?

Yes — run setup with -Agent windsurf. This generates .windsurfrules and Cascade instruction files with all auto-loading guardrail files embedded, all pipeline prompts as native commands, and all ~12 reviewer agents as invocable skills. Windsurf's Cascade agent reads these automatically — no manual attachment needed.

To install all agent formats at once: .\setup.ps1 -Preset <stack> -Agent all

Does Plan Forge work with Gemini CLI?

Yes — run setup with -Agent gemini. This generates GEMINI.md with all guardrail files embedded and /planforge-* slash commands for every pipeline step. Gemini CLI reads GEMINI.md automatically at session start.

Gemini CLI is also supported as a model provider in Quorum Mode — add it to your .forge/config.json API provider registry.

How do I configure API keys for Grok/OpenAI?

Two options — both work, and the orchestrator checks both automatically:

Environment variables (recommended for CI/CD): Set XAI_API_KEY or OPENAI_API_KEY in your shell profile or CI secrets.
.forge/secrets.json (recommended for local dev): Create a JSON file with your keys: { "XAI_API_KEY": "xai-...", "OPENAI_API_KEY": "sk-..." }. The .forge/ directory is gitignored by default — secrets are never committed.

Lookup order: environment variable → .forge/secrets.json → null. Any model name matching grok-* auto-routes to api.x.ai/v1.
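The lookup order above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the orchestrator's code: the function names resolve_api_key and base_url_for are hypothetical, and the https:// scheme on the x.ai URL is assumed.

```python
import json
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(name: str, secrets_path: Path = Path(".forge/secrets.json")) -> Optional[str]:
    """Look up a key: environment variable first, then secrets file, else None."""
    value = os.environ.get(name)
    if value:
        return value
    try:
        secrets = json.loads(secrets_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    # Only a JSON object of key/value pairs is meaningful here.
    if not isinstance(secrets, dict):
        return None
    return secrets.get(name)


def base_url_for(model: str) -> Optional[str]:
    """Route grok-* model names to the x.ai endpoint (scheme is an assumption)."""
    if model.startswith("grok-"):
        return "https://api.x.ai/v1"
    return None  # other providers are outside this sketch
```

Checking the environment first means CI secrets override any local file without extra configuration.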

What is the Generic agent and when should I use it?

The Generic agent (-Agent generic) generates AI-ASSISTANT.md — a self-contained document with all 16 guardrails, all pipeline prompts, and all reviewer agents as copy-paste blocks. It works with any AI tool: ChatGPT, Perplexity, GitHub Models, Mistral, local LLMs via Ollama, or any future tool.

Use it as a fallback bridge for tools not explicitly supported, or alongside a specific agent format when you want a portable reference that works everywhere.

How does memory bridge the 3-session model?

Each Plan Forge session starts fresh by design, so the reviewer catches what the builder missed. But that means prior decisions, patterns, and lessons are lost between features. OpenBrain bridges this gap. As of v3.6, OpenBrain is Plan Forge's first-class L3 memory layer (no separate extension needed). The Shipper (Step 6) captures decisions and postmortems to a semantic memory store, and the Plan Hardener (Step 2) searches it before locking down each new plan. Configure it in .forge.json or run pforge brain hint for setup options.

Built-in session memory (/memories/repo/) ships without any configuration; it captures conventions and forbidden patterns as markdown files that load automatically. OpenBrain adds semantic search across thousands of prior decisions, cross-project pattern reuse, and memory that survives repo migrations.

Does Plan Forge support CI/CD approval gates and bridge environments?

Yes. The deploy.instructions.md guardrail covers GitHub Actions and Azure DevOps pipelines with OIDC auth, environment approval gates, and rollback strategies. The new-pipeline.prompt.md template scaffolds full pipelines with what-if previews, bridge environments (dev → staging → prod), and manual approval gates baked in.

For Azure-specific deployments, the azure-iac preset adds the /infra-deploy skill which handles pre-flight checks, what-if/Terraform plan, environment promotion, and approval gate integration with azd, Bicep, and Terraform.

🔨 The Pipeline

Why separate sessions? Why not do everything in one chat?

The executor shouldn't self-audit — that's like grading your own exam. Session isolation forces fresh context so the reviewer catches drift and bugs the builder is blind to. Each session loads the same guardrails but brings independent judgment.

Do I have to use all 7 steps for every feature?

No. The pipeline has complexity routing — Step 0 classifies work as Micro, Small, Medium, or Large. Quick bug fixes can skip hardening. Small features get a lightweight plan. Only medium and large features run the full pipeline. You stay in control.

What if I need to change the plan mid-execution?

There's a structured Plan Amendment Protocol. You can modify the scope contract through a defined process — not by silently drifting. Changes are documented and the remaining slices are re-evaluated against the updated contract.

What happens if a slice fails validation?

The agent must fix the failure before proceeding. Build and test gates are hard gates — not suggestions. There's also a rollback protocol for reverting a failed slice cleanly, and stop conditions that immediately halt execution for critical issues.

🗳️ Quorum Mode

What is Quorum Mode?

Quorum Mode dispatches each slice to 3 AI models in parallel (Claude Opus, GPT-5.3-Codex, Grok 4.20 Reasoning) for independent dry-run analysis. Each model produces a detailed implementation plan without executing code. A reviewer agent then synthesizes the best elements — picking the strongest approach per file/component — and produces a unified execution plan. The final builder uses that consensus plan instead of the raw slice instructions.

Think of it as a design review by three senior engineers before any code is written.

Does it actually produce better code?

Yes — A/B tested on an Invoice Engine feature (rate tiers, volume discounts, tax calculation, banker's rounding). Both runs passed all gates, but quorum produced measurably higher-quality code:

20% more tests (18 vs 15)
Extracted DRY helpers (IsWeekend(), CalculateVolumeDiscount()) vs inline code
Robust test dates (relative offsets) vs hardcoded literals that break when dates pass
Extra edge case coverage — voided invoice regeneration, invoice number sequencing
Modern .NET patterns — ArgumentException.ThrowIfNullOrWhiteSpace vs generic ValidationException

The quality difference isn't in correctness (both pass gates) but in craftsmanship — code that's easier to maintain, debug, and extend.

How much more does Quorum Mode cost?

In the A/B test, quorum added ~35% to the token cost ($0.84 vs $0.62) and took about 2.7x as long (32 min vs 12 min). The extra time is the 3-model dry-run analysis plus reviewer synthesis; the actual build takes roughly the same time. Total cost for a full quorum run is still under $1.

Use --estimate --quorum before running to see the projected overhead breakdown per slice. With --quorum=auto, only complex slices incur the cost — simple CRUD runs normally.
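The overhead figures quoted above follow directly from the A/B numbers. A quick check in Python, using only the four values given in this answer:

```python
# Figures from the A/B test quoted above.
baseline_cost, quorum_cost = 0.62, 0.84   # USD
baseline_min, quorum_min = 12, 32         # minutes

cost_overhead = (quorum_cost - baseline_cost) / baseline_cost
time_ratio = quorum_min / baseline_min

print(f"cost overhead: {cost_overhead:.0%}")  # cost overhead: 35%
print(f"time ratio: {time_ratio:.1f}x")       # time ratio: 2.7x
```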

What is --quorum=auto and when should I use it?

Auto mode scores each slice's complexity (1-10) using 7 weighted signals: file scope count, cross-module dependencies, security keywords, database/migration keywords, gate count, task count, and historical failure rate. Only slices scoring at or above the threshold (default: 6) get the 3-model consensus treatment. Everything else runs normally.

This is the recommended default for most projects — you get quality where it matters (auth flows, billing logic, migrations) without burning tokens on simple CRUD slices. Override the threshold with --quorum-threshold 8.
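To make the threshold idea concrete, here is a rough sketch of weighted scoring in Python. The seven signal names and the default threshold of 6 come from the answer above; the weights, the 0..1 normalisation, and the function names are invented for illustration and are not Plan Forge's real scoring.

```python
from typing import Mapping

# Hypothetical weights: the real weighting is not documented in this FAQ.
WEIGHTS = {
    "file_scope": 1.0,
    "cross_module_deps": 1.5,
    "security_keywords": 2.0,
    "db_migration_keywords": 2.0,
    "gate_count": 1.0,
    "task_count": 1.0,
    "historical_failure_rate": 1.5,
}


def complexity_score(signals: Mapping[str, float]) -> float:
    """Map signals (each assumed normalised to 0..1) onto a 1-10 score."""
    total = sum(WEIGHTS.values())
    weighted = sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )
    return 1.0 + 9.0 * weighted / total


def needs_quorum(signals: Mapping[str, float], threshold: float = 6.0) -> bool:
    """Quorum applies only at or above the threshold."""
    return complexity_score(signals) >= threshold


auth_slice = {
    "security_keywords": 1.0,
    "db_migration_keywords": 1.0,
    "cross_module_deps": 1.0,
    "historical_failure_rate": 0.5,
}
crud_slice = {"file_scope": 0.2, "task_count": 0.3}

print(needs_quorum(auth_slice))                 # True
print(needs_quorum(crud_slice))                 # False
print(needs_quorum(auth_slice, threshold=8.0))  # False
```

Raising the threshold, as with --quorum-threshold 8, narrows quorum to only the highest-scoring slices.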

What happens if one of the 3 models is unavailable?

Graceful degradation. If 2 of 3 models respond, the reviewer works with 2 analyses. If fewer than 2 respond, it falls back to normal single-model execution — your pipeline never blocks on model availability. If the reviewer itself fails, the best single dry-run response is used as the enhanced prompt.
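The fallback ladder above can be sketched as a single function. This is an assumption-laden illustration: resolve_quorum, its return labels, and the pick_best default (first response) are hypothetical, since the FAQ does not say how the "best" dry-run is chosen.

```python
from typing import Callable, List, Optional, Tuple


def resolve_quorum(
    analyses: List[Optional[str]],
    synthesize: Callable[[List[str]], str],
    pick_best: Callable[[List[str]], str] = lambda responses: responses[0],
) -> Tuple[str, Optional[str]]:
    """Return (mode, prompt) following the degradation rules described above."""
    # Unavailable models are represented as None and dropped.
    responses = [a for a in analyses if a is not None]
    if len(responses) < 2:
        # Fewer than 2 analyses: fall back to normal single-model execution.
        return ("single-model", None)
    try:
        return ("consensus", synthesize(responses))
    except Exception:
        # Reviewer failed: use the best single dry-run as the enhanced prompt.
        return ("best-dry-run", pick_best(responses))


def failing_reviewer(responses: List[str]) -> str:
    raise RuntimeError("reviewer unavailable")


print(resolve_quorum(["plan A", None, None], " + ".join))
# ('single-model', None)
print(resolve_quorum(["plan A", "plan B", None], " + ".join))
# ('consensus', 'plan A + plan B')
print(resolve_quorum(["plan A", "plan B", "plan C"], failing_reviewer))
# ('best-dry-run', 'plan A')
```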

What are quorum power and speed presets?

Pre-configured quorum profiles that select models, thresholds, and timeouts in one flag:

--quorum=power — Claude Opus 4.6 + GPT-5.3-Codex + Grok 4.20 Reasoning. Reviewer: Opus. Threshold 5. Timeout 5 min. Best for complex features where quality matters most.
--quorum=speed — Claude Sonnet 4.6 + GPT-5.4-mini + Grok 4.1 Fast Reasoning. Reviewer: Sonnet. Threshold 7. Timeout 2 min. Best for rapid iteration where cost and speed matter.

Available via CLI (--quorum=power), MCP (quorum: "power"), and config (.forge.json → quorum.preset: "power").

⚡ Advanced Features

What is auto-escalation?

Auto-escalation automatically re-routes a failing slice to the next model in your escalationChain instead of retrying on the same model. If a slice fails on gpt-5.2-codex, it will automatically retry on claude-sonnet-4.6, then claude-opus-4.6 — no manual intervention required.

Configure it in .forge.json:

"escalationChain": ["auto", "claude-sonnet-4.6", "claude-opus-4.6"]

"auto" in the chain defers to your modelRouting.execute setting. Each escalation emits a slice-escalated event (visible in the dashboard and events.log). The number of attempts is controlled by maxRetries in your config.

How does model routing work?

Plan Forge uses two layers of model routing that work together:

1. Role-based routing (modelRouting) — assign a different model to each execution role in .forge.json: default (spec/harden/review), execute (code writing), and review (gate checks). This lets you tune cost vs. quality per stage.

2. Performance-based auto-selection — the orchestrator reads .forge/model-performance.json (built up from past runs) and automatically selects the cheapest model with a >80% historical success rate for each slice type. --estimate shows the recommended model and its success rate before you run.

Override any routing at runtime with pforge run-plan <plan> --model <model>.
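The performance-based layer amounts to "cheapest model that clears the success bar". A minimal sketch, assuming a made-up stats shape: the field names runs, successes, and avg_cost_usd are hypothetical and do not reflect the actual layout of .forge/model-performance.json.

```python
from typing import Dict, Optional


def pick_model(stats: Dict[str, dict], min_success: float = 0.80) -> Optional[str]:
    """Pick the cheapest model whose historical success rate exceeds min_success."""
    eligible = [
        (entry["avg_cost_usd"], name)
        for name, entry in stats.items()
        if entry["runs"] > 0 and entry["successes"] / entry["runs"] > min_success
    ]
    # min() on (cost, name) tuples returns the lowest-cost eligible model.
    return min(eligible)[1] if eligible else None


# Illustrative numbers only, not measured results.
history = {
    "model-a": {"runs": 10, "successes": 7, "avg_cost_usd": 0.10},
    "model-b": {"runs": 10, "successes": 9, "avg_cost_usd": 0.30},
    "model-c": {"runs": 10, "successes": 10, "avg_cost_usd": 0.50},
}
print(pick_model(history))  # model-b
```

Here model-a is cheapest but sits at 70% success, so the next-cheapest model above the bar wins.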

What is the Web UI?

Plan Forge ships two browser-based interfaces — both served from the MCP server with no build step required:

Live Dashboard (localhost:3100/dashboard) — 8 real-time tabs via WebSocket: Progress (live slice cards), Runs (history), Cost (per-model breakdown), Actions (one-click run/abort), Replay (session logs), Extensions (catalog browser), Config (live editor), and Traces (OTLP waterfall).
Plan Browser (localhost:3100/ui) — read-only single-page app that lists all plans in your project, renders slice metadata cards, visualises DAG dependencies, and shows the scope contract. No execution controls — those remain on the dashboard.

Start both with node pforge-mcp/server.mjs, or dashboard-only (no MCP client needed) with node pforge-mcp/server.mjs --dashboard-only.

🧠 Memory & OpenBrain

Do I need OpenBrain to use Plan Forge?

No. Plan Forge works fully without OpenBrain. Memory integration is optional but recommended. As of v3.6, OpenBrain is Plan Forge's first-class L3 memory layer (no separate extension to install), and 106 files have OpenBrain hooks, all gated behind "if configured." You get the full pipeline, all guardrails, and all agents without it. OpenBrain just makes the experience compound over time. Enable it with pforge brain hint.

What does OpenBrain cost?

Self-hosted with Ollama: $0/month. Local embeddings, local LLM, local PostgreSQL. Cloud option (Azure OpenAI): ~$15/month. OpenBrain is MIT-licensed and fully self-hosted.

🏢 Enterprise & Teams

Does Plan Forge help monitor the app after coding? (New in v2.27)

Yes — that's LiveGuard, the post-coding intelligence layer arriving in v2.27.0–v2.28.0. The forge pipeline handles build-time: specify, plan, execute, and ship. LiveGuard picks up where the forge stops and watches the deployed code:

🛡️ forge_drift_report — detects when the codebase drifts away from the plan's architectural baseline
🚨 forge_incident_capture — logs incidents with MTTR and on-call tracking
📦 forge_dep_watch — alerts on new CVEs in your dependency snapshot
🔐 forge_secret_scan — scans staged diffs for high-entropy strings; never logs values
🌱 forge_env_diff — compares .env* files for missing or extra keys across environments
📈 forge_health_trend — tracks MTTR, drift score, and MTTBF over time
🎯 forge_alert_triage — surfaces a ranked list of the most critical signals across all guards
🔥 forge_hotspot — identifies high-churn, high-failure files worth extra attention
📋 forge_runbook — stores and retrieves operational runbooks for each alert type
🚢 forge_deploy_journal — logs every deployment with pre/post health delta
✔️ forge_regression_guard — tracks whether previously passing validation gates stay passing

All 14 LiveGuard tools appear in the LIVEGUARD section of the dashboard (localhost:3100/dashboard), separated from the FORGE section by a visual divider. See Manual Chapter 15 — What Is LiveGuard? and Chapter 16 — LiveGuard Tools Reference.
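As an illustration of the kind of check forge_env_diff describes (missing or extra keys across environments), here is a small Python sketch. It is a hypothetical reimplementation for explanation only, not the tool's code; the function names and the parsing rules are assumptions.

```python
from typing import Dict, List, Set


def parse_env_keys(text: str) -> Set[str]:
    """Collect variable names from .env-style text; values are never kept."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.add(line.split("=", 1)[0].strip())
    return keys


def env_diff(base: str, other: str) -> Dict[str, List[str]]:
    """Report keys missing from, or extra in, `other` relative to `base`."""
    base_keys, other_keys = parse_env_keys(base), parse_env_keys(other)
    return {
        "missing": sorted(base_keys - other_keys),
        "extra": sorted(other_keys - base_keys),
    }


dev = "API_URL=http://localhost\nDB_URL=postgres://dev\n# comment\n"
staging = "DB_URL=postgres://staging\nSENTRY_DSN=abc\n"
print(env_diff(dev, staging))
# {'missing': ['API_URL'], 'extra': ['SENTRY_DSN']}
```

Comparing only key names keeps secret values out of any report.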

Can I share guardrails across teams?

Yes. Package your organization's standards as extensions: versioned packages with extension.json manifests. Distribute via GitHub repos, git submodules, or ZIP files. Teams install with pforge ext install. Add an org-rules.instructions.md for company-wide naming conventions, approved libraries, and compliance gates.

Can I browse and install extensions from a catalog?

Yes. Run pforge ext search to browse the community catalog, pforge ext info <name> for details, and pforge ext add <name> to download and install in one step. The catalog uses a Spec Kit-compatible format — extensions marked speckit_compatible work in both tools.

Can I use forge commands without the CLI?

Yes. The setup wizard generates .vscode/mcp.json (and .claude/mcp.json for Claude), which exposes 19 forge tools as native MCP functions: forge_smith, forge_validate, forge_sweep, forge_status, forge_diff, forge_analyze, forge_diagnose, forge_run_plan, forge_abort, forge_plan_status, forge_cost_report, forge_capabilities, forge_ext_search, forge_ext_info, forge_new_phase, forge_skill_status, forge_run_skill, forge_generate_image, and forge_memory_capture. Your AI agent can call these directly — no terminal commands needed. The MCP server is composable with OpenBrain for persistent memory.

Can I get Slack/Telegram notifications and approval gates for plan runs?

Yes — the Bridge feature sends notifications to Slack, Teams, Telegram, or any webhook when plan runs start, succeed, or fail. Add a bridge section to your .forge.json:

"bridge": {
  "enabled": true,
  "channels": [
    { "type": "slack",    "webhookUrl": "https://hooks.slack.com/...",
      "approvalRequired": true, "serverUrl": "https://yourapp.com" },
    { "type": "telegram", "botToken": "...", "chatId": "..." }
  ]
}

Set approvalRequired: true on any channel to pause execution after all slices pass and send an Approve / Reject button. The run only finalises after a human clicks Approve. Timeout (default 30 min) auto-rejects if no response.

The bridge connects to the Plan Forge WebSocket hub as a subscriber — it observes events without modifying the hub. See docs/CLI-GUIDE.md for the full bridge configuration reference.

Can an external agent like OpenClaw trigger a plan run remotely?

Yes — the MCP server exposes POST /api/runs/trigger and POST /api/runs/abort for inbound control from any external system, including OpenClaw, CI pipelines, or custom automation:

# Start a plan run (fire-and-forget — returns immediately)
curl -X POST http://localhost:3100/api/runs/trigger \
  -H "Authorization: Bearer <approvalSecret>" \
  -H "Content-Type: application/json" \
  -d '{ "plan": "docs/plans/my-feature.md" }'

# Abort the active run
curl -X POST http://localhost:3100/api/runs/abort \
  -H "Authorization: Bearer <approvalSecret>"

The trigger endpoint prevents concurrent runs, returns a triggerId, and emits run-started on the WebSocket hub. If the bridge is configured with approvalRequired: true, Plan Forge pauses at the end and sends an Approve/Reject message to your Telegram or Slack.

Set bridge.approvalSecret in .forge.json to require bearer-token auth on write endpoints. Without a secret, endpoints are open (suitable for local-only setups).

Can external tools read and write OpenBrain project memory via the REST API?

Yes — two memory endpoints are available when OpenBrain is configured:

POST /api/memory/search — semantic search across past decisions and patterns. Returns the matching thoughts payload to forward to OpenBrain.
POST /api/memory/capture — normalise and broadcast a memory-captured hub event. Returns a structured capture_thought payload for the caller to forward to OpenBrain.

# Search project memory
curl -X POST http://localhost:3100/api/memory/search \
  -H "Content-Type: application/json" \
  -d '{ "query": "authentication patterns", "topK": 5 }'

# Capture a new thought
curl -X POST http://localhost:3100/api/memory/capture \
  -H "Authorization: Bearer <approvalSecret>" \
  -H "Content-Type: application/json" \
  -d '{ "content": "Decided to use OIDC for auth layer", "tags": ["auth","decision"] }'

Plan Forge normalises the payload and emits the hub event — the caller is responsible for forwarding to OpenBrain. This keeps the memory boundary clean: Plan Forge doesn't own OpenBrain writes directly. The forge_memory_capture MCP tool provides the same capability for in-session agents.

How does an external agent discover Plan Forge's REST API surface?

Three discovery layers are available — an agent only needs one:

Programmatic: GET /.well-known/plan-forge.json or GET /api/capabilities — returns the full machine-readable surface: all MCP tools, all REST endpoints (with methods, auth requirements, and body shapes), config schema, version, and memory/bridge status. Best for agents that self-configure on first connect.
LLM-readable: docs/llms.txt (served at /llms.txt) — plain-text description of all 13 REST endpoints. Formatted for LLM ingestion.
Human reference: AGENT-SETUP.md Section 6 — curl examples for all external integration endpoints with copy-paste config.

The recommended pattern for OpenClaw or any agent: on first connection, call GET /api/capabilities, parse restApi.endpoints, and store the surface locally. Subsequent connections reuse the cached surface and only refresh on version change.
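The cache-and-refresh pattern above might look like this in Python. It is a sketch under assumptions: load_endpoints is a hypothetical helper, the "version" key name and the cache file layout are guesses, and how the agent learns the current server version without re-fetching is left to the caller. Only GET /api/capabilities and restApi.endpoints come from the text.

```python
import json
from pathlib import Path
from typing import Callable


def load_endpoints(
    cache_path: Path,
    current_version: str,
    fetch_capabilities: Callable[[], dict],
) -> list:
    """Reuse the cached REST surface unless the server version changed."""
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        if cached.get("version") == current_version:
            return cached["endpoints"]
    # First connection or version change: call GET /api/capabilities
    # (wrapped here by the caller-supplied fetch_capabilities function).
    caps = fetch_capabilities()
    surface = {
        "version": current_version,
        "endpoints": caps["restApi"]["endpoints"],
    }
    cache_path.write_text(json.dumps(surface), encoding="utf-8")
    return surface["endpoints"]
```

Passing the fetch function in keeps the sketch independent of any particular HTTP client.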

Does Plan Forge help with compliance requirements?

The compliance-reviewer agent audits for GDPR, CCPA, SOC2, and PII handling. The Project Profile workshop captures your specific compliance requirements. The security guardrails auto-load when editing auth, API, or data access code. For regulated industries, the extension ecosystem lets you package domain-specific compliance guardrails (e.g., HIPAA, PCI-DSS).

🔄 CI/CD

How do I validate plans automatically on every PR?

Add the Plan Forge Validate GitHub Action to your workflow:

- uses: srnichols/plan-forge-validate@v1
  with:
    sweep: true              # Run TODO/FIXME sweep
    fail-on-warnings: false  # Warnings don't block merge

It checks six areas: setup health, file counts per preset, unresolved placeholders, orphaned agents, plan artifacts (scope contracts + slices), and a completeness sweep. Every failure shows exactly what's wrong.

The action has zero dependencies beyond bash and git, runs in ~5 seconds, and outputs passed, failed, warnings, and result for use in downstream steps.

How do I verify my code actually matches the plan?

Run pforge analyze <plan-file>. It scores your implementation against the plan across 4 dimensions: requirement traceability, scope compliance, test coverage, and validation gates. Returns a consistency score out of 100. Also available as the forge_analyze MCP tool and via analyze: true in the GitHub Action.

☁️ Copilot Cloud Agent

How does Plan Forge work with the Copilot cloud agent?

The Copilot cloud agent works on GitHub issues autonomously — cloning your repo, making code changes, and opening PRs. Plan Forge integrates via .github/copilot-setup-steps.yml, which GitHub runs to provision the agent's environment before it starts coding.

Copy templates/copilot-setup-steps.yml from the Plan Forge repo to your project's .github/ directory. Set the correct --preset for your stack. The cloud agent then starts each session with:

All guardrail instruction files installed and auto-loading
All 106 MCP tools available via .vscode/mcp.json
pforge smith health check run automatically

The short version: Copilot cloud agent plans. Plan Forge hardens.

Do guardrails work the same in the cloud agent as in local VS Code?

Yes. The cloud agent reads .github/copilot-instructions.md and .github/instructions/*.instructions.md using the same applyTo mechanism as local VS Code. Security rules activate on auth files, database patterns activate on query files, and architecture principles load on every file — no changes to your instruction files needed.

The only difference: copilot-setup-steps.yml handles dependency installation that a local dev machine already has. Once provisioned, the guardrail behavior is identical.

How does Plan Forge complement CodeQL and secret scanning?

Plan Forge operates at the development layer — slice gates (build + test) catch problems before the code ever reaches GitHub's CI/CD pipeline. CodeQL, secret scanning, and Copilot code review then add additional coverage after the PR is opened. The layers are complementary, not overlapping:

Plan Forge slice gates — build failures, test regressions, scope drift (during execution)
Copilot code review — style, correctness, suggestions (PR opened)
CodeQL — security vulnerabilities, data flow (PR/push CI)
Secret scanning — leaked credentials (commit time)

⚖️ Comparisons

How does Plan Forge compare to Spec Kit?

Both are open-source, MIT-licensed frameworks for disciplined AI-assisted development — and both are excellent. They solve different parts of the problem:

Spec Kit (by GitHub) focuses on Spec-Driven Development — turning ideas into executable specifications via slash commands (/speckit.specify, /speckit.plan, /speckit.implement). It has a massive community (85K+ stars, 144 contributors), supports 25+ AI agents natively, and offers a rich extension and preset ecosystem with 40+ community extensions. It shines at defining what to build and generating implementation from specs.

Plan Forge focuses on hardened execution — locking specs into scope contracts the AI cannot deviate from, enforcing standards with 17–18 auto-loading guardrail files per stack, providing 19 specialized reviewer agents (security, architecture, performance, compliance, error handling, etc.), and validating at every slice boundary. It shines at ensuring the AI builds exactly what was specified with enterprise-grade quality.

They're genuinely complementary: use Spec Kit to write the spec, Plan Forge to enforce it. Or pick the one that matches your team's priorities (see the next question).

If I can only pick one — Spec Kit or Plan Forge?

Pick Spec Kit if your team uses multiple AI tools (not just VS Code), you want the largest community and extension ecosystem, and you prefer a lightweight spec-first methodology you can adopt incrementally. GitHub's backing means strong long-term viability and rapid iteration.

Pick Plan Forge if you want deep guardrails that auto-enforce during coding, you need specialized reviewer agents, and you care about enterprise patterns like deployment templates, lifecycle hooks, and scope-contract enforcement. It offers first-class support for VS Code + Copilot, Claude Code, and Cursor, with MCP tools for native integration.

Honest take: Spec Kit has the bigger ecosystem and broader agent support today. Plan Forge goes deeper on runtime enforcement and enterprise quality gates. Both are free. You really can't go wrong.

Can I use Plan Forge with an existing Spec Kit project?

Yes — Plan Forge auto-detects Spec Kit artifacts. When you start Step 0 (Specifier), it scans for specs/*/spec.md, plan.md, and memory/constitution.md. If found, it offers to import them directly — no re-specifying needed.
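
The artifact scan described above can be pictured with a few glob lookups. This is a minimal sketch, not Plan Forge's actual detection code: the function name is hypothetical, and it assumes plan.md sits next to spec.md under specs/*/ and that memory/constitution.md lives at the repository root.

```python
from pathlib import Path


def find_spec_kit_artifacts(root: Path) -> dict[str, list[Path]]:
    """Collect Spec Kit files under root, grouped by kind (empty kinds omitted)."""
    found = {
        "specs": sorted(root.glob("specs/*/spec.md")),
        # Assumption: plan.md lives beside spec.md in each feature folder.
        "plans": sorted(root.glob("specs/*/plan.md")),
        "constitution": sorted(root.glob("memory/constitution.md")),
    }
    return {kind: paths for kind, paths in found.items() if paths}
```

An empty result means there is nothing to import; any non-empty result is the point at which an import offer would make sense.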

The extension catalogs also use the same format, so Spec Kit-compatible extensions work in both tools. See the full integration guide for the combined workflow.

Does Plan Forge replace my CI/CD pipeline?

No. Plan Forge operates at the development layer, before code reaches your CI/CD pipeline. Its validation gates (build/test) run locally during slice execution. Your existing CI/CD pipeline handles deployment, environments, and production gates. Plan Forge catches problems before they reach your pipeline.
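
To make the idea of a local validation gate concrete, here is a minimal sketch of a build-then-test gate that stops at the first failing command. The function name and the stand-in commands are assumptions for illustration; a real project would substitute its own build and test commands, and Plan Forge's own gate logic may differ.

```python
import subprocess
import sys


def run_gate(commands: list[list[str]]) -> bool:
    """Run each command in order; the gate passes only if all exit with 0."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"gate failed: {' '.join(cmd)}")
            return False
    return True


# Stand-in commands (hypothetical); replace with a real build and test step.
BUILD = [sys.executable, "-c", "print('build ok')"]
TEST = [sys.executable, "-c", "import sys; sys.exit(0)"]
```

Calling `run_gate([BUILD, TEST])` before moving to the next slice mirrors the "catch it before the pipeline" ordering described above.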

What's the difference between Copilot Memory and Plan Forge?

They solve different problems and work best together. Plan Forge uses three memory layers — each with a distinct role:

Layer	What It Is	Best For
Copilot Memory	`/memories/` — Copilot's built-in note storage (user / session / repo scopes)	Free-form notes, personal patterns, ad-hoc insights
Plan Forge Session Bridge	Structured `/memories/repo/current-phase.md` managed by pipeline prompts	Carrying Session 1 → 2 → 3 state through the hardening pipeline
OpenBrain	Semantic vector memory via MCP `search_thoughts` / `capture_thought`	Auto-injecting relevant prior decisions before each slice — no manual prompting needed

The three layers are complementary, and a typical phase uses all of them: Copilot Memory for quick mid-session notes, the session bridge files for structured phase handoffs, and OpenBrain for surfacing past decisions automatically.

🔥 Concepts

What are Temper Guards?

Temper Guards are tables embedded in every instruction file that document common shortcuts AI agents use to cut corners (like "this is too simple to test" or "we'll add auth later"), paired with concrete rebuttals explaining why the shortcut breaks production quality. The name comes from the metallurgical tempering process, which toughens steel against brittle failure.

What is Forge Quench?

Forge Quench (/forge-quench) is a code simplification skill that systematically reduces complexity while preserving exact behavior. It follows the Chesterton's Fence principle — always understand WHY code is complex before simplifying it. The 5-step workflow: Measure → Understand First → Propose → Apply & Prove → Report. Each simplification is committed individually with tests run after every change.
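
The "Apply & Prove" step can be sketched as a loop that adopts one simplification at a time and keeps it only if behavior is unchanged on every test case. This is an illustrative model under stated assumptions, not the skill's implementation: the function names and sample candidates are hypothetical, and a real run would execute the project's test suite and make a commit where this sketch simply records the accepted name.

```python
def apply_and_prove(current, candidates, cases):
    """Adopt each candidate only if it matches current behavior on every case."""
    accepted = []
    for name, candidate in candidates:
        if all(candidate(x) == current(x) for x in cases):
            current = candidate  # stands in for "commit this simplification"
            accepted.append(name)
    return current, accepted


def is_even_verbose(n: int) -> bool:
    """Deliberately verbose original used as the behavior baseline."""
    if n % 2 == 0:
        return True
    else:
        return False


CANDIDATES = [
    ("drop if/else", lambda n: n % 2 == 0),
    ("bitmask shortcut", lambda n: (n & 2) == 0),  # changes behavior, so rejected
]

simplified, accepted = apply_and_prove(is_even_verbose, CANDIDATES, range(-3, 4))
```

Because each candidate is checked on its own, a behavior-changing edit is rejected without discarding the safe simplifications that came before it.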

What is the Skill Blueprint?

The Skill Blueprint (docs/SKILL-BLUEPRINT.md) is the formal specification for Plan Forge skill files. Every skill follows this format: Frontmatter → Trigger → Steps → Safety Rules → Temper Guards → Warning Signs → Exit Proof → Persistent Memory. Extension contributors use the blueprint to create skills that are consistent with the built-in ones.
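
A contributor could check a draft skill against the section order listed above with something like the following sketch. The checker is hypothetical and works on a list of section names that has already been extracted; how sections are marked up inside a real skill file is defined by docs/SKILL-BLUEPRINT.md and is not assumed here.

```python
BLUEPRINT_ORDER = [
    "Frontmatter",
    "Trigger",
    "Steps",
    "Safety Rules",
    "Temper Guards",
    "Warning Signs",
    "Exit Proof",
    "Persistent Memory",
]


def check_sections(found: list[str]) -> list[str]:
    """Return a list of problems: missing sections and ordering violations."""
    problems = [f"missing: {name}" for name in BLUEPRINT_ORDER if name not in found]
    known = [name for name in found if name in BLUEPRINT_ORDER]
    expected = [name for name in BLUEPRINT_ORDER if name in known]
    if known != expected:
        problems.append("sections out of order")
    return problems
```

An empty list means the draft has every blueprint section in the expected order.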
