Capability Reference
Everything Plan Forge can do — tools, commands, agents, skills, telemetry, and integrations. One page, complete coverage.
New here? Start with the vocabulary
This page uses a handful of Plan Forge terms over and over. Here's what each one means in plain language — once these click, the rest of the page reads easily. Already fluent? Skip ahead to The Four Stations.
MCP tool
A command your AI assistant (GitHub Copilot, Claude, Cursor — any MCP client) can run for you right inside your editor. Plan Forge ships 105 of them, all named forge_*. You never have to memorize them — just ask in chat.
Agent
A specialized reviewer persona — for example a security, database, or performance reviewer. An agent reads your code and gives focused feedback; it can't edit files, so it's safe to ask for a second opinion anytime.
Skill
A guided, multi-step workflow you trigger by typing a slash command in chat — like /code-review or /test-sweep. Think of it as a checklist the AI runs for you, the same way every time.
Plan & slice
A plan is a markdown file describing a feature, broken into small ordered steps. Each step is a slice — a chunk small enough to build and verify on its own. Plan Forge builds your feature one slice at a time.
Gate
An automated check — tests, lint, or scope rules — that must pass before a slice is accepted. If a gate fails, Plan Forge retries the slice instead of moving on. This is what keeps the AI from drifting off track.
Quorum mode
For tricky steps, Plan Forge asks three AI models the same question, compares their answers, and merges the best one — a built-in second and third opinion. Optional, and you control when it kicks in.
The four stations — the journey of your code
Everything on this page belongs to one of four stages. 🪨 Smelt turns a raw idea into a clear plan · 🔨 Forge builds the actual code, slice by slice · 🛡️ Guard (LiveGuard) keeps watching after your code ships · 🧠 Learn remembers what worked so the next project goes faster.
A real run, measured — these are example numbers from executing a 3-step plan, not marketing estimates.
LiveGuard — Post-Coding Intelligence
Shipped: GA since v2.30. The forge builds your code. LiveGuard watches after it ships. 14 MCP tools, 22 REST endpoints, 3 lifecycle hooks, and an optional OpenClaw analytics bridge, all surfaced in a LIVEGUARD section of the unified dashboard. Secret scanning and env-diff landed in v2.28; self-healing fix proposals and composite forge_liveguard_run landed in v2.29–v2.30; Watcher bridge in v2.34–v2.35.
The Four Stations
Plan Forge is an AI-Native SDLC Forge Shop. Every capability on this page lives in one of four stations. See the full Shop Tour for deep-dive walkthroughs.
Smelt
Raw idea → Scope Contract
- Specifier agent · /specify
- Hardener · /harden-plan
- Project Principles
- Crucible (idea intake)
- Tempering gates
Forge
Contract → Shipped code
- pforge run-plan
- Slice gates + quorum mode
- Agent-per-slice routing
- Auto-escalation
- Fresh-session review
Guard
Post-deploy defense (LiveGuard)
- Secret scan · Env diff
- Drift report · Regression guard
- Incident capture · Triage
- Watcher + Watcher-live
- Remote bridge (Telegram/Slack)
Learn
Memory & retrospectives
- OpenBrain (L3 memory)
- Bug registry (closed-loop)
- Testbed scenarios
- Health DNA fingerprint
- Forge Intelligence
MCP Tools
These are the commands your AI assistant can run for you — in GitHub Copilot Chat, Claude, Cursor, or any MCP client. You don't call them by hand; you describe what you want in plain English and the assistant picks the right tool. New to Plan Forge? Just ask forge_capabilities first — it returns the whole map (tools, workflows, config, and a glossary) so the assistant knows everything that's available.
| Tool | Description |
|---|---|
| forge_capabilities | Full API surface: tools, workflows, config, memory, glossary |
| forge_run_plan | Builds your whole plan end-to-end: runs each slice in the right order, checks the gates, tracks token cost, and retries failures automatically |
| forge_abort | Abort active execution between slices |
| forge_plan_status | How did the last run go? Latest status, per-slice results, and cost |
| forge_cost_report | Spend by model, monthly aggregation |
| forge_smith | Environment diagnostics + actionable fixes |
| forge_validate | Setup file validation |
| forge_sweep | TODO/FIXME/stub marker scanner |
| forge_status | Phase status from roadmap |
| forge_diff | Flags any change that strayed outside what the plan said it would touch (scope drift) |
| forge_analyze | Consistency scoring, single or quorum (multi-model consensus) |
| forge_diagnose | Multi-model bug investigation with quorum synthesis |
| forge_ext_search | Browse extension catalog |
| forge_ext_info | Extension details |
| forge_new_phase | Create plan + roadmap entry |
| forge_skill_status | Query recent skill execution events |
| forge_run_skill | Execute skills programmatically with dry-run |
| forge_generate_image | Generate images via xAI Aurora or OpenAI DALL-E |
| forge_memory_capture | Normalise & broadcast a memory-captured event; returns capture_thought payload for OpenBrain |
| forge_github_status | Check GitHub API connectivity, Copilot subscription status, and GitHub Models API availability; returns auth state, rate limits, per-service health |
| forge_github_metrics | Live GitHub repo metrics via gh CLI: stars, forks, PRs, commit activity |
| forge_team_dashboard | Multi-developer plan coordination: per-operator stats + conflict-risk assessment |
| forge_team_activity | Recent run summaries from .forge/team-activity.jsonl |
| forge_delegate_review | Delegate the current branch's PR to the Copilot Coding Agent for review |
| forge_export_plan | Convert a loose Copilot cloud-agent plan into a hardened Phase-X-PLAN.md |
| forge_estimate_quorum | Projected plan cost under all four quorum modes; required before showing any dollar amount |
| forge_estimate_slice | Projected cost for a single slice; cheaper than full-plan estimate |
| forge_graph_query | Query the Plan Forge knowledge graph: phase, file, neighbor, recent-changes |
| forge_patterns_list | Recurring patterns across runs: gate-failure recurrences, model failure rates, cost anomalies |
| forge_meta_bug_file | File a self-repair meta-bug against Plan Forge itself (plan/orchestrator/prompt defects) |
| forge_classifier_issue | File a classifier rule update issue when a tempering finding routes to the 'classifier' lane |
Post-deploy defense layer. Continuously watches a shipped project for architecture drift, dependency vulnerabilities, leaked secrets, regression failures, and health decay — capturing incidents and ranking cross-signal alerts. forge_liveguard_run rolls the whole suite into one composite scan; the two Watcher tools let a second VS Code session tail another project's run read-only.
| Tool | Description |
|---|---|
| forge_drift_report | Architecture drift vs. plan baseline |
| forge_incident_capture | Incident log, MTTR, on-call tracking |
| forge_dep_watch | Dependency vulnerability change detection |
| forge_regression_guard | Validation gate pass/fail history |
| forge_runbook | Operational runbook store and retrieval |
| forge_hotspot | High-churn / high-failure file detection |
| forge_health_trend | Long-term health trend + MTTBF scoring |
| forge_alert_triage | Cross-signal ranked alert list with severity |
| forge_deploy_journal | Deploy log with pre/post health delta |
| forge_secret_scan | High-entropy secret detection in staged diffs; values always redacted |
| forge_env_diff | Env variable key divergence across .env files; keys only, values never read |
| forge_fix_proposal | (v2.29) Generates scoped 1-2 slice fix plan from regression/drift/incident/secret failure; capped, human-approved only |
| forge_quorum_analyze | (v2.29) Assembles structured quorum prompt from LiveGuard data for multi-model analysis; no LLM calls in server |
| forge_liveguard_run | (v2.30) Composite scan: drift + sweep + secrets + regression + deps + alerts + health in one call |
| forge_watch | (v2.34/v2.35) Read-only watcher: tail another project's pforge run from a second VS Code session. Snapshot or analyze mode (claude-opus-4.7). Returns counts, anomalies, recommendations, diff cursor. |
| forge_watch_live | (v2.35) Live tail: streams events for a fixed duration via target's WebSocket hub or events.log polling fallback. Read-only subscriber. |
14 LiveGuard tools (v2.27–v2.30) plus 2 Watcher tools (v2.34/v2.35). All available as MCP tools and REST endpoints. See Chapter 16 — LiveGuard Tools Reference for full documentation. 🗺️ Diagram: tools by trigger window · Health DNA scoring.
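The "keys only, values never read" behaviour listed for forge_env_diff can be illustrated with a small sketch. This is not Plan Forge's implementation: the function names `env_keys_from_text` and `key_divergence` are hypothetical, and the parsing rules (comment lines, an optional `export ` prefix) are assumptions about typical .env files.

```python
def env_keys_from_text(text: str) -> set[str]:
    """Collect variable names from .env content; only the part left of '=' is kept."""
    keys: set[str] = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):  # assumption: shell-style export prefix
            name = name[len("export "):].strip()
        if name:
            keys.add(name)
    return keys


def key_divergence(files: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each file, list the keys that some other file defines but it lacks."""
    all_keys: set[str] = set().union(*files.values())
    return {name: sorted(all_keys - keys) for name, keys in files.items()}


if __name__ == "__main__":
    dev = env_keys_from_text("DB_URL=postgres://x\n# comment\nexport API_KEY=abc\n")
    prod = env_keys_from_text("DB_URL=postgres://y\n")
    print(key_divergence({".env": dev, ".env.production": prod}))
```

Because only key names are stored, the divergence report can be logged or shared without exposing any secret values.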
The pre-forge funnel. Converts rough ideas into scoped plan files through a lane-aware interview (tweak / feature / full), atomic phase-number claims, and Plan Hardener handoff at finalize. Enforces that every plan has a crucibleId: frontmatter or was grandfathered via --manual-import.
| Tool | Description |
|---|---|
| forge_crucible_submit | Start a smelt: infers lane, creates record, emits crucible-smelt-started |
| forge_crucible_ask | Next interview question with recommended default sourced from L3 memory / principles / prior phases (or null if none) |
| forge_crucible_preview | Render current draft + list unresolved {{TBD:}} fields |
| forge_crucible_finalize | Atomically claim next phase number, write docs/plans/<phase>.md, hand off to Plan Hardener |
| forge_crucible_list | List smelts by status (in-progress / finalized / abandoned) |
| forge_crucible_abandon | Mark smelt abandoned and release any claimed phase number |
| forge_crucible_import | Import a Spec Kit project into a smelt: deterministic, LLM-free field mapping (Cursor / Claude Code / Codex / CI) |
| forge_crucible_status | List smelts by source & status, or inspect a single smelt; audit imported smelts and the smelt archive |
Crucible is v2.37 (in development — shipping across 6 slices). Documentation chapter lands in the user manual at v2.37.0 release. 📖 Manual: The Crucible · 🗺️ Diagram: CRITICAL_FIELDS gate.
Post-hardening quality pipeline. Scores a plan's Scope Contract clarity, validation gates, slice sizing, and forbidden actions. Maintains an approved-baseline threshold so regressions block future commits.
| Tool | Description |
|---|---|
| forge_tempering_run | Run full pipeline (scan + score) against a Crucible-finalized plan; writes temper-score snapshot |
| forge_tempering_scan | Scan for temper-quality signals (contract clarity, gates, slice sizing, forbidden actions) |
| forge_tempering_status | Read latest tempering results per plan (score, findings, baseline delta) |
| forge_tempering_approve_baseline | Approve current tempering score as the new baseline threshold |
| forge_tempering_drain | Run the audit drain loop: iterates content-audit scan → triage → fix until convergence (v2.80+) |
| forge_triage_route | Route a finding through the triage classifier; returns lane (bug/spec/classifier) + payload (v2.80+) |
📖 Manual: The Audit Loop — the drain + triage flow · 🗺️ Diagram: three-lane triage funnel.
First-class bug tracking inside Plan Forge — register, filter, transition, and validate fixes. Surfaces in the dashboard timeline + Bug Registry tab, and LiveGuard incidents can auto-link to registered bugs.
| Tool | Description |
|---|---|
| forge_bug_register | Register a bug with severity, title, description, affected files, linked plan/slice |
| forge_bug_list | List bugs with status/severity/plan filters |
| forge_bug_update_status | Transition state (open → investigating → in-progress → resolved → closed) |
| forge_bug_validate_fix | Verify proposed fix against bug description + linked slice gates |
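The status chain listed for forge_bug_update_status reads naturally as a small state machine. The sketch below assumes a bug may only advance one step at a time along that chain; the page does not say whether skipping or reopening is allowed, so treat that rule and the name `can_transition` as illustrative.

```python
# Status names are taken from the chain above; the ordering is the chain's own.
BUG_STATES = ["open", "investigating", "in-progress", "resolved", "closed"]


def can_transition(current: str, target: str) -> bool:
    """Assumption: only the next state in the chain is a valid target.

    Raises ValueError if either name is not a known state.
    """
    i = BUG_STATES.index(current)
    j = BUG_STATES.index(target)
    return j == i + 1


if __name__ == "__main__":
    print(can_transition("open", "investigating"))  # one step forward
    print(can_transition("open", "closed"))         # skips three states
```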
End-to-end scenario runner against an isolated testbed repository. Guards every release with Chapter 8 happy-path regression validation; failures produce findings linked to the causing change.
| Tool | Description |
|---|---|
| forge_testbed_run | Execute a single scenario by ID against the configured testbed project |
| forge_testbed_happypath | Run all happy-path scenarios sequentially, aggregate pass/fail summary |
| forge_testbed_findings | Read cumulative testbed findings (failures, flaky scenarios, runtime trends) |
The Lattice code-graph engine builds a semantic chunk index and BFS call-graph over any git repository (5 MCP tools). Hallmark attaches a lightweight hallmark/v1 provenance envelope to any artifact so drift detection can verify source integrity across sessions (2 MCP tools + CLI mirror + SDK). Anvil is the content-hash-keyed memoization cache that prevents re-indexing unchanged files and owns the L2→L3 dead-letter queue (5 MCP tools + CLI mirror). See Chapter 25 — How the Shop Remembers for the plain-English tour.
| Tool | Description |
|---|---|
| forge_lattice_index | Build or update the Lattice chunk index; --since enables incremental re-indexing from a git SHA |
| forge_lattice_stat | Index statistics: chunk count, edge count, language breakdown, Anvil hit rate, index size |
| forge_lattice_query | Full-text search over the chunk index; returns bounded 80-char snippets ranked by camelCase-aware token-overlap score (v3.5.1+) |
| forge_lattice_callers | Find all callers of a named symbol using the edge graph |
| forge_lattice_blast | BFS call-graph traversal up to depth 5; returns truncated: true when frontier is capped |
| forge_hallmark_show · verify | MCP: read or drift-check a hallmark/v1 provenance record (schema version, tool name, captured timestamp, content hash). Mirrored at pforge hallmark show · verify. SDK at pforge-sdk/hallmark. |
| forge_anvil_stat · clear · rebuild · dlq_list · dlq_drain | MCP: memoization cache stats, selective invalidation by tool or git SHA, dead-letter queue list/drain. Mirrored at pforge anvil stat · clear · rebuild · dlq list\|drain. Lives under .forge/anvil/. |
Lattice, Hallmark, and Anvil ship in v2.95.0. Hallmark and Anvil are exposed as both MCP tools and CLI commands — the MCP forms let agents invoke them in-session, the CLI mirrors let shell scripts and humans use the same operations. See pforge lattice --help, pforge hallmark --help, pforge anvil --help. 🗺️ Diagram: knowledge-graph schema.
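The forge_lattice_blast description (BFS over the call graph, depth limit 5, a `truncated` flag when the frontier is capped) can be sketched roughly as follows. The graph shape (symbol to list of callers), the `max_nodes` cap, and the return fields other than `truncated` are assumptions for illustration; the real index format is not described on this page.

```python
from collections import deque


def blast_radius(callers: dict[str, list[str]], symbol: str,
                 max_depth: int = 5, max_nodes: int = 500) -> dict:
    """Breadth-first walk from a symbol to everything that (transitively) calls it."""
    seen = {symbol}
    affected: list[str] = []
    queue = deque([(symbol, 0)])
    truncated = False
    while queue and not truncated:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for caller in callers.get(node, ()):
            if caller in seen:
                continue
            if len(affected) >= max_nodes:
                truncated = True  # hypothetical cap reached: stop and report it
                break
            seen.add(caller)
            affected.append(caller)
            queue.append((caller, depth + 1))
    return {"symbol": symbol, "affected": affected, "truncated": truncated}


if __name__ == "__main__":
    graph = {"save": ["handler", "job"], "handler": ["route"], "route": ["save"]}
    print(blast_radius(graph, "save"))
```

The `seen` set keeps cycles in the call graph (here `route` calling `save`) from causing repeated visits.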
Bridges forge memory upward into GitHub Copilot's own Memory store — the next IDE session auto-discovers project decisions, lessons, and patterns without requiring OpenBrain configuration. Soft-sync is additive and hash-deduped, so safe to run repeatedly. Together with Hallmark provenance, Anvil DLQ, and the Lattice code-graph, this completes the v3.x memory upgrades that let cheaper, faster models produce flagship-grade results. Full plain-English tour: Chapter 25 — How the Shop Remembers. 🗺️ Diagram: the Copilot trilogy.
| Tool | Description |
|---|---|
| forge_sync_memories | Generate .github/copilot-memory-hints.md from forge decisions: trajectory notes, auto-skills, brain L2 entries. CLI: pforge sync-memories. |
| forge_sync_instructions | Generate .github/copilot-instructions.md from project profile + principles + .forge.json. Completes the Copilot integration trilogy. CLI: pforge sync-instructions. |
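The statement that soft-sync is "additive and hash-deduped" explains why repeated runs are safe: an entry is appended only if its hash has not been seen before. A minimal sketch of that idea, assuming SHA-256 over whitespace-trimmed text (the actual hash and entry format used by Plan Forge are not stated here, and `soft_sync` is a hypothetical name):

```python
import hashlib


def entry_hash(text: str) -> str:
    """Stable fingerprint of an entry; trimming makes trailing spaces irrelevant."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def soft_sync(existing: list[str], incoming: list[str]) -> list[str]:
    """Return existing entries plus any incoming entry whose hash is new.

    Nothing is removed or rewritten, so the operation is additive and idempotent.
    """
    seen = {entry_hash(e) for e in existing}
    merged = list(existing)
    for item in incoming:
        h = entry_hash(item)
        if h not in seen:
            seen.add(h)
            merged.append(item)
    return merged


if __name__ == "__main__":
    once = soft_sync(["Use UTC everywhere"], ["Use UTC everywhere ", "Prefer small slices"])
    twice = soft_sync(once, ["Prefer small slices"])
    print(once == twice)
```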
A read-only reasoning orchestrator. Classifies user intent into one of 8 lanes (build, operational, troubleshoot, advisory, offtopic, tempering, principle-judgment, meta-bug-triage), retrieves OpenBrain memory context, and orchestrates other forge tools on the agent's behalf. Phase-43 flipped the CTO defaults on (observer enabled, L3 memory enabled, autoEscalate on, quorumAdvisory=auto), shipped lane-specific system-prompt overlays (advisory-cto, build-interviewer, troubleshoot-sre), and added the closed-loop forge_master_audit tool plus the pforge audit CLI. Phase-29 added the Forge-Master Studio dashboard tab with a curated prompt gallery, streaming chat, and a live tool-call trace pane. 📖 Manual: Forge-Master · 🗺️ Diagram: 3-stage intent classifier.
| Tool or surface | Description |
|---|---|
| forge_master_ask | Accepts a freeform message. Returns a structured reasoning response built from intent classification, lane-specific prompt overlay, memory retrieval, and allowlist-gated read-only tool calls (≈38 read tools restored in Phase-43). |
| forge_master_audit (Phase-43) | Holistic CTO audit. Pulls drift, cost, open bugs, watcher alerts, deploy journal, and Crucible smelts; returns bounded report: summary, top 3 risks with evidence, P0/P1/P2 actions (≤5), cost note. Drives the weekly digest and incident-triggered review. |
| Studio tab · prompt gallery · chat stream · tool-call trace | Dashboard UI at localhost:3100/dashboard. Also available as CLI via pforge forge-master status\|logs and the new pforge audit [--since --tier --schedule --on-incident]. |
| Tool | Description |
|---|---|
| forge_review_add | Capture a review thread (audit, gate failure, drift finding) linked to plan/slice |
| forge_review_list | List open/resolved review threads |
| forge_review_resolve | Mark a review thread resolved with outcome + rationale |
| forge_notify_send | Emit notification through configured channels (Telegram, Slack, webhook, email) |
| forge_notify_test | Smoke-test every notification channel; returns success/failure per channel |
| forge_home_snapshot | Build the dashboard Home tab payload (run state, drift, incidents, cost, health DNA) |
| forge_timeline | Unified cursor-paged timeline across runs, incidents, deploys, bugs, Crucible, Tempering |
| forge_search | Cross-surface search over plans, events, bugs, incidents, memory (filters by type/date/severity) |
| forge_memory_report | OpenBrain memory usage: captures per day, hit rate on searches, top-recalled thoughts |
| forge_org_rules | Export aggregated .github/instructions/*.md as a single org-rules document |
| forge_doctor_quorum | Health-check every quorum participant: auth, latency, rate-limit headers, availability |
| forge_delegate_to_agent | Delegate a prompt/slice to a specialized reviewer agent (database, security, performance, …) |
| forge_self_update | Check for the latest Plan Forge release, fetch release notes, and optionally install |
Total: 105 MCP tools across all subsystems. Call forge_capabilities or open pforge-mcp/tools.json for the machine-readable surface.
Autonomous Execution
📖 Manual: Advanced Execution · 🗺️ Diagram: parallel slice DAG
Full Auto
One command. pforge run-plan spawns gh copilot CLI for each slice. Gates validate at every boundary. Supports Claude, GPT, and Gemini via --model.
Assisted
You code in VS Code Copilot. Orchestrator prompts you per slice and validates gates automatically. Best of both: human creativity + automated quality.
Cloud Agent
Copilot cloud agent provisions the environment via copilot-setup-steps.yml. Guardrails auto-load, all 105 MCP tools are available, and forge_run_plan executes slices autonomously on GitHub Issues. Use --worker copilot-coding-agent to route each slice to a Copilot cloud agent session via GitHub Issue dispatch.
Parallel
[P]-tagged slices run concurrently. DAG-aware scheduling with scope conflict detection. Up to maxParallelism: 3 workers.
Agent-Per-Slice Routing
Assign a different AI model to each execution role. The orchestrator auto-selects based on the current operation — tune cost vs. quality at every stage without changing your plan files.
📖 Manual: Multi-Agent Routing · 🗺️ Diagram: host-aware routing
claude-opus-4.6
Spec, harden, review operations
gpt-5.2-codex
Writing code, generating tests
claude-sonnet-4.6
Gate checks, drift detection
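The role-to-model pairings above amount to a lookup keyed by the current operation. The sketch below shows one plausible resolution rule, assuming that any role without its own entry falls back to `default`; the function name `model_for` and that fallback rule are assumptions, while the keys and model names come from this section's modelRouting sample.

```python
# Mirrors the modelRouting sample shown in this section.
CONFIG = {
    "modelRouting": {
        "default": "claude-opus-4.6",
        "execute": "gpt-5.2-codex",
        "review": "claude-sonnet-4.6",
    }
}


def model_for(role: str, config: dict) -> str:
    """Pick the model for a role; assumed to fall back to the 'default' entry."""
    routing = config.get("modelRouting", {})
    return routing.get(role) or routing["default"]


if __name__ == "__main__":
    for role in ("spec", "execute", "review"):
        print(role, "->", model_for(role, CONFIG))
```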
"modelRouting": { "default": "claude-opus-4.6", "execute": "gpt-5.2-codex", "review": "claude-sonnet-4.6" }
Auto-Escalation
When a slice fails on one model, the orchestrator automatically walks the escalationChain and retries on the next model — no manual intervention. Emits a slice-escalated event on each re-route.
📖 Manual: Advanced Execution · 🗺️ Diagram: escalation chain
Configured model (or modelRouting.execute)
Walks chain in order — "auto" defers to execute routing
slice-escalated — sliceId, reason, models
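Walking the chain can be pictured as a loop that tries each entry in order and records a slice-escalated event on every re-route. This is only a sketch under stated assumptions: `run_with_escalation` and the `attempt` callback are hypothetical, and the exact shape of the `models` field (here a dict with `tried` and `next`) is a guess based on the "models tried / next model" description.

```python
from typing import Callable, Optional


def run_with_escalation(slice_id: str, chain: list[str], execute_model: str,
                        attempt: Callable[[str], bool]) -> tuple[Optional[str], list[dict]]:
    """Try each chain entry in order; 'auto' resolves to the execute-routing model."""
    events: list[dict] = []
    tried: list[str] = []
    for entry in chain:
        model = execute_model if entry == "auto" else entry
        if tried:
            # A previous model failed, so this attempt is a re-route.
            events.append({
                "type": "slice-escalated",
                "sliceId": slice_id,
                "reason": "previous model failed",  # illustrative reason text
                "models": {"tried": list(tried), "next": model},
            })
        tried.append(model)
        if attempt(model):
            return model, events
    return None, events  # chain exhausted without a passing attempt


if __name__ == "__main__":
    chain = ["auto", "claude-sonnet-4.6", "claude-opus-4.6"]
    winner, events = run_with_escalation(
        "slice-2", chain, "gpt-5.2-codex",
        attempt=lambda model: model == "claude-opus-4.6",
    )
    print(winner, len(events))
```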
"escalationChain": ["auto", "claude-sonnet-4.6", "claude-opus-4.6"]
Model Performance Tracking
Per-slice performance data is appended to .forge/model-performance.json after every run. The orchestrator reads this on startup and auto-selects the cheapest model with >80% historical success rate for each slice type.
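The selection rule described above (cheapest model whose historical success rate exceeds 80% for a slice type) can be sketched as a small aggregation. The record fields used here (`model`, `sliceType`, `passed`, `costUsd`) are hypothetical; the actual schema of .forge/model-performance.json is not shown on this page.

```python
from collections import defaultdict
from typing import Optional


def pick_model(records: list[dict], slice_type: str,
               min_success: float = 0.80) -> Optional[str]:
    """Return the model with the lowest average cost among those above the success bar."""
    stats = defaultdict(lambda: {"runs": 0, "passes": 0, "cost": 0.0})
    for r in records:
        if r["sliceType"] != slice_type:
            continue
        s = stats[r["model"]]
        s["runs"] += 1
        s["passes"] += 1 if r["passed"] else 0
        s["cost"] += r["costUsd"]
    eligible = [
        (s["cost"] / s["runs"], model)
        for model, s in stats.items()
        if s["passes"] / s["runs"] > min_success  # strictly greater than 80%
    ]
    return min(eligible)[1] if eligible else None


if __name__ == "__main__":
    history = [
        {"model": "small", "sliceType": "api", "passed": False, "costUsd": 0.01},
        {"model": "large", "sliceType": "api", "passed": True, "costUsd": 0.40},
    ]
    print(pick_model(history, "api"))
```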
Auto-Selection
--estimate shows recommended model per slice with historical success rate. Agent-per-slice routing uses this data to tune cost vs. quality automatically.
Dashboard Cost Tab
Model Comparison table shows: run count, pass rate (color-coded), average duration, cost per run, total tokens — aggregated from model-performance.json.
Quorum Mode
Multi-model consensus: dispatch complex slices to 3 AI models for independent analysis, synthesize the best approach, then execute with higher confidence. In an A/B test, quorum produced 20% more tests, better code structure, and fewer brittle patterns than single-model execution.
📖 Manual: Quorum Mode · 🗺️ Diagram: quorum flow · complexity rubric
Complexity Scoring
7 weighted signals: file scope (20%), cross-module deps (20%), security keywords (15%), database keywords (15%), gate count (10%), task count (10%), historical failure rate (10%).
Auto Mode
--quorum=auto triggers quorum only on high-complexity slices (score ≥ 6). Simple CRUD runs normally. Best of both: quality where it matters, speed where it doesn't.
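The seven weights sum to 100%, so the score is a weighted average of the signals. The sketch below assumes each signal is already normalised to a 0 to 10 scale, which is consistent with the "score ≥ 6" threshold but is not stated on this page; the signal key names are also illustrative.

```python
# Weights from the Complexity Scoring list; key names are hypothetical.
WEIGHTS = {
    "file_scope": 0.20,
    "cross_module_deps": 0.20,
    "security_keywords": 0.15,
    "database_keywords": 0.15,
    "gate_count": 0.10,
    "task_count": 0.10,
    "historical_failure_rate": 0.10,
}


def complexity_score(signals: dict[str, float]) -> float:
    """Weighted sum of signals, each assumed to be on a 0-10 scale."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def needs_quorum(signals: dict[str, float], threshold: float = 6.0) -> bool:
    """Mirror of --quorum=auto: use quorum only at or above the threshold."""
    return complexity_score(signals) >= threshold


if __name__ == "__main__":
    crud = {"file_scope": 2, "task_count": 3}
    auth = {"file_scope": 8, "cross_module_deps": 7, "security_keywords": 10,
            "database_keywords": 6, "gate_count": 5, "task_count": 5,
            "historical_failure_rate": 4}
    print(needs_quorum(crud), needs_quorum(auth))
```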
Graceful Degradation
If fewer than 2 models respond, Plan Forge falls back to normal execution. If the reviewer fails, it uses the best single dry-run. An unavailable model never blocks your pipeline.
A/B Tested
Invoice Engine (rate tiers, discounts, tax, rounding): quorum produced 20% more tests, extracted DRY helpers, used idiomatic .NET patterns, and caught edge cases the single model missed.
A/B Test: Invoice Engine (4 slices, rate tiers + discounts + tax + banker's rounding)
| Metric | Standard | Quorum (3 models) | Delta |
|---|---|---|---|
| Pass rate | 4/4 | 4/4 | Tie |
| Duration | 12 min | 32 min | +167% |
| Tests generated | 15 | 18 | +20% |
| DRY helpers | Inline | Extracted | Better |
| Test dates | Hardcoded (fragile) | Relative (robust) | Better |
| Edge case coverage | Standard | +voided regen, +sequence | Better |
Quorum Presets
| Preset | Models | Reviewer | Threshold | Timeout |
|---|---|---|---|---|
| --quorum=power | Claude Opus 4.6 + GPT-5.3-Codex + Grok 4.20 Reasoning | Opus | 5 | 5 min |
| --quorum=speed | Claude Sonnet 4.6 + GPT-5.4-mini + Grok 4.1 Fast Reasoning | Sonnet | 7 | 2 min |
Available via CLI (--quorum=power), MCP (quorum: "power"), and config (.forge.json → quorum.preset: "power").
Web UI — Live Dashboard
localhost:3100/dashboard — 38 real-time tabs via WebSocket, grouped into Forge, LiveGuard, Forge-Master, and Settings. No build step. Also runs standalone: node pforge-mcp/server.mjs --dashboard-only (8 core tabs shown below)
📖 Manual: The Dashboard · 🗺️ Diagram: tab taxonomy (38 tabs)
Progress
Live slice cards
Runs
History table
Cost
Model breakdown
Actions
One-click tools
Replay
Session logs
Extensions
Catalog browser
Config
Visual editor
Traces
OTLP waterfall
Agents & Skills
📖 Manual: Instructions & Agents · Multi-Agent Mode
~19 Reviewer Agents
Stack (6-7 per preset): architecture, database, deploy, performance, security, test-runner (+ stack-specific extras)
Cross-stack (8): accessibility, api-contract, cicd, compliance, dependency, error-handling, multi-tenancy, observability
Pipeline (5): specifier → plan-hardener → executor → reviewer-gate → shipper
Audit (1): classifier-reviewer (audit-loop triage)
AI Tool Adapters
pforge init -Agent <tool> generates adapter files for each platform:
copilot — .github/copilot-instructions.md (default)
claude — CLAUDE.md + .claude/commands/
cursor — .cursorrules + .cursor/rules/
windsurf — .windsurfrules + .windsurf/workflows/
gemini — GEMINI.md + .gemini/commands/ + MCP config
generic — .ai/instructions.md (configurable dir)
all — all adapters at once
13 Slash Command Skills
/database-migration · /staging-deploy · /test-sweep
/dependency-audit · /security-audit · /code-review
/release-notes · /api-doc-gen · /onboarding
/health-check · /forge-execute · /forge-troubleshoot
/forge-quench: reduce code complexity while preserving behavior (Chesterton's Fence)
Every skill follows the Skill Blueprint format and includes Temper Guards, Warning Signs, and Exit Proof sections.
Temper Guards & Warning Signs — Every instruction file includes tables of common shortcuts agents use (with rebuttals) and observable anti-patterns that indicate the file's guidance is being violated.
Observability & Memory
📖 Manual: Memory Architecture · 🗺️ Diagram: three-tier memory capture
Memory Layers
Plan Forge uses three distinct memory systems — each with a specific role in the 3-session pipeline. They're complementary, not competing.
| Layer | What It Is | Scope | Best For |
|---|---|---|---|
| Copilot Memory | /memories/ built-in note storage (user / session / repo scopes) | User / Session / Repo | Free-form notes, personal patterns, ad-hoc insights |
| Plan Forge Session Bridge | Structured /memories/repo/current-phase.md + lessons-learned.md | Repository | Carrying Session 1 → 2 → 3 state through the hardening pipeline |
| OpenBrain | Semantic vector memory via MCP search_thoughts / capture_thought | Global | Auto-injecting prior decisions before each slice — no manual prompting |
OTLP Telemetry
Every run produces trace.json with resource context, span kinds (SERVER/INTERNAL/CLIENT), severity levels, and log summaries.
- Per-run manifest + global index (append-only, corruption-tolerant)
- Dashboard Traces tab with waterfall timeline
- Optional OTLP collector forwarding (Jaeger, Aspire, Grafana)
OpenBrain Context Injection
Plan Forge's L3 memory layer (built in as of v3.6, no extension needed). Prior decisions and conventions are searched and injected as context before each slice begins, bridging the 3-session model with long-term memory.
- Context injected before each slice (search_thoughts)
- Decisions captured after each slice (capture_thought)
- Cost anomaly detection (>2x average triggers insight)
- Run summary captured for future phase planning
Stack Presets
📖 Manual: Customization & Presets
| Preset | Instructions | Agents | Prompts | Skills |
|---|---|---|---|---|
| .NET | 17 | 19 | 15 | 9 |
| TypeScript | 18 | 19 | 15 | 9 |
| Python | 17 | 19 | 15 | 9 |
| Java | 17 | 19 | 15 | 9 |
| Go | 17 | 19 | 15 | 9 |
| PHP | 17 | 19 | 15 | 9 |
| Rust | 17 | 19 | 15 | 9 |
| Swift | 16 | 19 | 13 | 9 |
| Azure IaC | 12 | 18 | 6 | 3 |
REST API — External Integration
The MCP server exposes a REST API for external agents, CI systems, and tools like OpenClaw. Discover the full surface via GET /api/capabilities or GET /.well-known/plan-forge.json on first connect.
📖 Manual: REST API Reference · 🗺️ Diagram: integration surfaces
- POST /api/runs/trigger: start a plan run remotely
- POST /api/runs/abort: abort the active run
- GET /api/runs/status: current run state
- POST /api/memory/search: semantic search (OpenBrain)
- POST /api/memory/capture: normalise + emit memory event
- GET /api/capabilities: full machine-readable surface
- GET /.well-known/plan-forge.json: RFC 8615 discovery
- GET /llms.txt: LLM-readable endpoint reference
Write endpoints accept Authorization: Bearer <secret> or ?token=<secret>. Set bridge.approvalSecret in .forge.json. Without a secret, endpoints are open (local-only use).
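As an illustration of the bearer-token form, the sketch below builds (but does not send) a request to the trigger endpoint using only the Python standard library. The base URL `http://localhost:3100` is assumed from the dashboard address, and the JSON body with a `plan` field is hypothetical; consult AGENT-SETUP.md for the real payload.

```python
import json
import urllib.request


def build_trigger_request(base_url: str, secret: str, plan_path: str) -> urllib.request.Request:
    """Prepare a POST to /api/runs/trigger with an Authorization: Bearer header."""
    body = json.dumps({"plan": plan_path}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(
        url=f"{base_url}/api/runs/trigger",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {secret}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_trigger_request("http://localhost:3100", "my-secret",
                                "docs/plans/Phase-1-PLAN.md")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Passing the secret in a header rather than as `?token=` keeps it out of URLs that may end up in logs.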
Full curl examples and config template: AGENT-SETUP.md Section 6.
Bridge — External Notifications
The Plan Forge Bridge subscribes to the WebSocket hub and dispatches run events to external platforms. Dispatch is rate-limited to one message per 5 s per channel, with automatic reconnect.
📖 Manual: Remote Bridge · 🗺️ Diagram: bridge fan-out
📨
Telegram
Bot API
💬
Slack
Incoming webhook
🎮
Discord
Webhook
🔗
Generic
Any HTTP endpoint
{
"bridge": {
"enabled": true,
"channels": [
{ "type": "telegram", "url": "https://api.telegram.org/bot<TOKEN>/sendMessage", "chatId": "<ID>", "level": "important" },
{ "type": "slack", "url": "https://hooks.slack.com/services/...", "level": "all" },
{ "type": "discord", "url": "https://discord.com/api/webhooks/...", "level": "critical" },
{ "type": "webhook", "url": "https://your-endpoint.example.com/hook", "level": "all" }
]
}
}
Levels: all (every event) · important (run start/end + failures) · critical (failures only)
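The three levels behave like nested filters. A minimal sketch of that filtering follows; the event type names in `RUN_BOUNDARY` and `FAILURES` are placeholders invented for illustration, since the actual event names emitted for run start, run end, and failures are not listed here.

```python
# Hypothetical event type names, used only to demonstrate the level semantics.
RUN_BOUNDARY = {"run-started", "run-completed"}
FAILURES = {"slice-failed", "run-failed"}


def should_send(level: str, event_type: str) -> bool:
    """Decide whether a channel configured at `level` receives an event."""
    if level == "all":
        return True  # every event
    if level == "important":
        return event_type in RUN_BOUNDARY | FAILURES  # run start/end + failures
    if level == "critical":
        return event_type in FAILURES  # failures only
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for level in ("all", "important", "critical"):
        print(level, should_send(level, "run-started"), should_send(level, "slice-failed"))
```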
CI/CD Hook Event
The ci-triggered event is emitted when a CI workflow is dispatched from a plan run. Observable via the WebSocket hub or captured in the run's events.log. The slice-escalated event is emitted when a slice is re-routed to a new model via the escalation chain.
ci-triggered
Dispatched when a CI workflow is triggered from a plan run.
- workflow: workflow file or ID
- ref: git ref (branch or SHA)
- inputs: dispatch input parameters
slice-escalated
Emitted when auto-escalation re-routes a slice to the next model in the chain.
- sliceId: which slice was escalated
- reason: why escalation triggered
- models: models tried / next model
Updating an Existing Install
pforge smith automatically checks GitHub for a newer Plan Forge release — 5 s timeout, 24 h cache in .forge/version-check.json, silent when offline.
✓ Preferred: upgrade in place
pforge self-update --force    # latest GitHub release
pforge update                 # auto-mode (v2.56.0+)
pforge update --from-github   # force GitHub tag
Preserves .forge.json, copilot-instructions.md, project principles, and plan files.
✗ Do not clone to update
git clone https://github.com/srnichols/plan-forge.git
Re-cloning is the first-time install path. For existing installs it can drag -dev bytes onto a clean release and clobber local config.
Control the update source with pforge config set update-source <auto|github-tags|local-sibling> (v2.56.0+). See Manual Appendix G.
Dual-Publish Extensions
pforge ext publish <path> validates the extension and outputs two catalog entries in one command.
Plan Forge Catalog
catalog.json format — installable with pforge ext install and browseable via pforge ext search.
Spec Kit Compatible
extensions.json format for the Spec Kit registry. Extensions marked speckit_compatible: true work in both tools.
GitHub Stack Integration
First-class integration with GitHub Copilot, GitHub Models, and GitHub Actions for cloud-based execution and security-driven plan generation.
📖 Manual: Plan Forge on the GitHub Stack · 🗺️ Diagram: GitHub stack architecture
forge_github_status
Check GitHub API connectivity, Copilot subscription status, and GitHub Models API availability. Returns auth state, rate limits, and per-service health. CLI: pforge github-status
| Field | Value |
|---|---|
| githubAuth | authenticated / unauthenticated |
| copilotPlan | individual / business / enterprise / none |
| modelsApiAvailable | true when models.github.ai/inference is reachable |
| rateLimitRemaining | Remaining GitHub API requests for the hour |
GitHub Models
models.github.ai/inference is the recommended API provider for Plan Forge — the default inference endpoint when GITHUB_TOKEN (or gh auth login) is configured.
Supported models: gpt-4o-mini (default), gpt-4o, claude-sonnet-4, claude-opus-4. Set GITHUB_TOKEN to enable; no separate API key required beyond GitHub auth.
Copilot Coding Agent Worker
Dispatch slice execution to the Copilot coding agent instead of the local CLI. Each slice becomes a GitHub Issue; the agent picks it up, opens a PR, and the orchestrator polls for completion.
Requires copilot-setup-steps.yml in .github/ and Copilot for Business or Enterprise. The pre-flight step calls forge_github_status; a warn result on the assignability check is promoted to a hard fail so that dispatches are not dropped silently.
plan-from-sarif
Generate a remediation plan from a GitHub Code Scanning SARIF report. Groups findings by CWE / rule ID and emits a hardened Plan Forge plan where each slice targets a specific vulnerability class.
High-severity findings are auto-registered via forge_bug_register. Integrates with forge_secret_scan. Gate: pforge run-plan docs/plans/<sarif-plan>.md.
github-metrics
Pull GitHub repository metrics (PR velocity, code frequency, contributor cadence) into the LiveGuard health context.
Metrics written to .forge/github-metrics.json and surfaced on the Dashboard GitHub tab. forge_health_trend incorporates PR cycle time as a signal when the file is present. Requires GITHUB_TOKEN with repo scope.
Machine-readable: forge_capabilities MCP tool · .well-known/plan-forge.json