Capability Reference

Everything Plan Forge can do — tools, commands, agents, skills, telemetry, and integrations. One page, complete coverage.

105 MCP Tools · ~19 Agents · 13 Skills · 38 Dashboard Tabs · Quorum Mode · 🛡️ LiveGuard: GA
👋

New here? Start with the vocabulary

This page uses a handful of Plan Forge terms over and over. Here's what each one means in plain language — once these click, the rest of the page reads easily. Already fluent? Skip ahead to The Four Stations.

MCP tool

A command your AI assistant (GitHub Copilot, Claude, Cursor — any MCP client) can run for you right inside your editor. Plan Forge ships 105 of them, all named forge_*. You never have to memorize them — just ask in chat.

Agent

A specialized reviewer persona — for example a security, database, or performance reviewer. An agent reads your code and gives focused feedback; it can't edit files, so it's safe to ask for a second opinion anytime.

Skill

A guided, multi-step workflow you trigger by typing a slash command in chat — like /code-review or /test-sweep. Think of it as a checklist the AI runs for you, the same way every time.

Plan & slice

A plan is a markdown file describing a feature, broken into small ordered steps. Each step is a slice — a chunk small enough to build and verify on its own. Plan Forge builds your feature one slice at a time.

Gate

An automated check — tests, lint, or scope rules — that must pass before a slice is accepted. If a gate fails, Plan Forge retries the slice instead of moving on. This is what keeps the AI from drifting off track.

Quorum mode

For tricky steps, Plan Forge asks three AI models the same question, compares their answers, and merges the best one — a built-in second and third opinion. Optional, and you control when it kicks in.

The four stations — the journey of your code

Everything on this page belongs to one of four stages. 🪨 Smelt turns a raw idea into a clear plan · 🔨 Forge builds the actual code, slice by slice · 🛡️ Guard (LiveGuard) keeps watching after your code ships · 🧠 Learn remembers what worked so the next project goes faster.

A real run, measured — these are example numbers from executing a 3-step plan, not marketing estimates.

  • 16s (3 slices executed)
  • 24/24 pipeline gates pass
  • 337 self-test files
  • 23 AI models supported
🛡️

LiveGuard — Post-Coding Intelligence

Shipped — GA since v2.30

The forge builds your code. LiveGuard watches after it ships. 14 MCP tools, 22 REST endpoints, 3 lifecycle hooks, and an optional OpenClaw analytics bridge — all surfaced in a LIVEGUARD section of the unified dashboard. Secret scanning and env-diff landed in v2.28; self-healing fix proposals and composite forge_liveguard_run landed in v2.29–v2.30; Watcher bridge in v2.34–v2.35.

14 MCP tools · 22 REST endpoints · 5 dashboard tabs · 3 lifecycle hooks · OpenClaw analytics bridge

The Four Stations

Plan Forge is an AI-Native SDLC Forge Shop. Every capability on this page lives in one of four stations. See the full Shop Tour for deep-dive walkthroughs.

🪨

Smelt

Raw idea → Scope Contract

  • Specifier agent · /specify
  • Hardener · /harden-plan
  • Project Principles
  • Crucible (idea intake)
  • Tempering gates
🔨

Forge

Contract → Shipped code

  • pforge run-plan
  • Slice gates + quorum mode
  • Agent-per-slice routing
  • Auto-escalation
  • Fresh-session review
🛡️

Guard

Post-deploy defense (LiveGuard)

  • Secret scan · Env diff
  • Drift report · Regression guard
  • Incident capture · Triage
  • Watcher + Watcher-live
  • Remote bridge (Telegram/Slack)
🧠

Learn

Memory & retrospectives

  • OpenBrain (L3 memory)
  • Bug registry (closed-loop)
  • Testbed scenarios
  • Health DNA fingerprint
  • Forge Intelligence

MCP Tools

These are the commands your AI assistant can run for you in GitHub Copilot Chat, Claude, Cursor, or any MCP client. You don't call them by hand; you describe what you want in plain English and the assistant picks the right tool. New to Plan Forge? Ask the assistant to run forge_capabilities first: it returns the whole map (tools, workflows, config, and a glossary) so the assistant knows everything that's available.

forge_capabilities

Full API surface — tools, workflows, config, memory, glossary

forge_run_plan

Builds your whole plan end-to-end — runs each slice in the right order, checks the gates, tracks token cost, and retries failures automatically

forge_abort

Abort active execution between slices

forge_plan_status

How did the last run go? — latest status, per-slice results, and cost

forge_cost_report

Spend by model, monthly aggregation

forge_smith

Environment diagnostics + actionable fixes

forge_validate

Setup file validation

forge_sweep

TODO/FIXME/stub marker scanner

forge_status

Phase status from roadmap

forge_diff

Flags any change that strayed outside what the plan said it would touch (scope drift)

forge_analyze

Consistency scoring — single or quorum (multi-model consensus)

forge_diagnose

Multi-model bug investigation with quorum synthesis

forge_ext_search

Browse extension catalog

forge_ext_info

Extension details

forge_new_phase

Create plan + roadmap entry

forge_skill_status

Query recent skill execution events

forge_run_skill

Execute skills programmatically with dry-run

forge_generate_image

Generate images via xAI Aurora or OpenAI DALL-E

forge_memory_capture

Normalise & broadcast a memory-captured event; returns capture_thought payload for OpenBrain

forge_github_status

Check GitHub API connectivity, Copilot subscription status, and GitHub Models API availability — returns auth state, rate limits, per-service health

forge_github_metrics

Live GitHub repo metrics via gh CLI — stars, forks, PRs, commit activity

forge_team_dashboard

Multi-developer plan coordination — per-operator stats + conflict-risk assessment

forge_team_activity

Recent run summaries from .forge/team-activity.jsonl

forge_delegate_review

Delegate the current branch's PR to the Copilot Coding Agent for review

forge_export_plan

Convert a loose Copilot cloud-agent plan into a hardened Phase-X-PLAN.md

forge_estimate_quorum

Projected plan cost under all four quorum modes — required before showing any dollar amount

forge_estimate_slice

Projected cost for a single slice — cheaper than full-plan estimate

forge_graph_query

Query the Plan Forge knowledge graph — phase, file, neighbor, recent-changes

forge_patterns_list

Recurring patterns across runs — gate-failure recurrences, model failure rates, cost anomalies

forge_meta_bug_file

File a self-repair meta-bug against Plan Forge itself (plan/orchestrator/prompt defects)

forge_classifier_issue

File a classifier rule update issue when a tempering finding routes to the 'classifier' lane

🛡️ LiveGuard Tools (14 shipped v2.27–v2.30 + 2 Watcher v2.34/v2.35) composite run · forge_liveguard_run

Post-deploy defense layer. Continuously watches a shipped project for architecture drift, dependency vulnerabilities, leaked secrets, regression failures, and health decay — capturing incidents and ranking cross-signal alerts. forge_liveguard_run rolls the whole suite into one composite scan; the two Watcher tools let a second VS Code session tail another project's run read-only.

forge_drift_report

Architecture drift vs. plan baseline

forge_incident_capture

Incident log, MTTR, on-call tracking

forge_dep_watch

Dependency vulnerability change detection

forge_regression_guard

Validation gate pass/fail history

forge_runbook

Operational runbook store and retrieval

forge_hotspot

High-churn / high-failure file detection

forge_health_trend

Long-term health trend + MTTBF scoring

forge_alert_triage

Cross-signal ranked alert list with severity

forge_deploy_journal

Deploy log with pre/post health delta

forge_secret_scan

High-entropy secret detection in staged diffs — values always redacted

forge_env_diff

Env variable key divergence across .env files — keys only, values never read

forge_fix_proposal (v2.29)

Generates scoped 1-2 slice fix plan from regression/drift/incident/secret failure — capped, human-approved only

forge_quorum_analyze (v2.29)

Assembles structured quorum prompt from LiveGuard data for multi-model analysis — no LLM calls in server

forge_liveguard_run (v2.30)

Composite scan: drift + sweep + secrets + regression + deps + alerts + health in one call

forge_watch (v2.34/v2.35)

Read-only watcher — tail another project's pforge run from a second VS Code session. Snapshot or analyze mode (claude-opus-4.7). Returns counts, anomalies, recommendations, diff cursor.

forge_watch_live (v2.35)

Live tail — streams events for a fixed duration via target's WebSocket hub or events.log polling fallback. Read-only subscriber.

14 LiveGuard tools (v2.27–v2.30) plus 2 Watcher tools (v2.34/v2.35). All available as MCP tools and REST endpoints. See Chapter 16 — LiveGuard Tools Reference for full documentation. 🗺️ Diagram: tools by trigger window · Health DNA scoring.
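To illustrate the keys-only comparison that forge_env_diff describes, here is a minimal Python sketch. The function names and parsing rules are assumptions made for illustration, not Plan Forge's implementation. File contents are passed in as strings so the example runs without touching disk, and everything after the first `=` on a line is discarded.

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from .env-style text; values are discarded."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name:
            keys.add(name)
    return keys


def env_key_divergence(files: dict[str, str]) -> dict[str, list[str]]:
    """For each file, list the keys that other files define but it lacks."""
    per_file = {name: env_keys(text) for name, text in files.items()}
    all_keys = set().union(*per_file.values()) if per_file else set()
    return {name: sorted(all_keys - keys) for name, keys in per_file.items()}
```

Calling `env_key_divergence({".env": "...", ".env.staging": "..."})` returns, per file, the sorted list of key names it is missing relative to the others.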

🔥 Crucible Tools (8 tools — v2.37 in development) raw idea → hardened spec funnel

The pre-forge funnel. Converts rough ideas into scoped plan files through a lane-aware interview (tweak / feature / full), atomic phase-number claims, and Plan Hardener handoff at finalize. Enforces that every plan has a crucibleId: frontmatter or was grandfathered via --manual-import.

forge_crucible_submit

Start a smelt — infers lane, creates record, emits crucible-smelt-started

forge_crucible_ask

Next interview question with recommended default sourced from L3 memory / principles / prior phases (or null if none)

forge_crucible_preview

Render current draft + list unresolved {{TBD:}} fields

forge_crucible_finalize

Atomically claim next phase number, write docs/plans/<phase>.md, hand off to Plan Hardener

forge_crucible_list

List smelts by status (in-progress / finalized / abandoned)

forge_crucible_abandon

Mark smelt abandoned and release any claimed phase number

forge_crucible_import

Import a Spec Kit project into a smelt — deterministic, LLM-free field mapping (Cursor / Claude Code / Codex / CI)

forge_crucible_status

List smelts by source & status, or inspect a single smelt — audit imported smelts and the smelt archive

Crucible is v2.37 (in development — shipping across 6 slices). Documentation chapter lands in the user manual at v2.37.0 release. 📖 Manual: The Crucible · 🗺️ Diagram: CRITICAL_FIELDS gate.

🔨 Tempering Tools (6 tools — v2.40+) temper-quality scoring

Post-hardening quality pipeline. Scores a plan's Scope Contract clarity, validation gates, slice sizing, and forbidden actions. Maintains an approved-baseline threshold so regressions block future commits.

forge_tempering_run

Run full pipeline (scan + score) against a Crucible-finalized plan; writes temper-score snapshot

forge_tempering_scan

Scan for temper-quality signals (contract clarity, gates, slice sizing, forbidden actions)

forge_tempering_status

Read latest tempering results per plan (score, findings, baseline delta)

forge_tempering_approve_baseline

Approve current tempering score as the new baseline threshold

forge_tempering_drain

Run the audit drain loop — iterates content-audit scan → triage → fix until convergence (v2.80+)

forge_triage_route

Route a finding through the triage classifier — returns lane (bug/spec/classifier) + payload (v2.80+)

📖 Manual: The Audit Loop — the drain + triage flow · 🗺️ Diagram: three-lane triage funnel.

🐛 Bug Registry Tools (4 tools — v2.45+) native bug tracking

First-class bug tracking inside Plan Forge — register, filter, transition, and validate fixes. Surfaces in the dashboard timeline + Bug Registry tab, and LiveGuard incidents can auto-link to registered bugs.

forge_bug_register

Register a bug with severity, title, description, affected files, linked plan/slice

forge_bug_list

List bugs with status/severity/plan filters

forge_bug_update_status

Transition state (open → investigating → in-progress → resolved → closed)

forge_bug_validate_fix

Verify proposed fix against bug description + linked slice gates

📖 Manual: Bug Registry · 🗺️ Diagram: bug status machine.
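The status sequence listed for forge_bug_update_status can be sketched as a small state machine. This is a hedged Python illustration: it assumes only a move to the next state in the documented order is allowed, which may be stricter than the real tool (for example, reopening a bug may be permitted). The names `BUG_STATES` and `can_transition` are hypothetical.

```python
# Documented order: open → investigating → in-progress → resolved → closed
BUG_STATES = ["open", "investigating", "in-progress", "resolved", "closed"]


def can_transition(current: str, target: str) -> bool:
    """Allow only a step to the next state in the documented order (assumption)."""
    if current not in BUG_STATES or target not in BUG_STATES:
        return False
    return BUG_STATES.index(target) == BUG_STATES.index(current) + 1
```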

🧪 Testbed Tools (3 tools — v2.50+) happy-path scenarios

End-to-end scenario runner against an isolated testbed repository. Guards every release with Chapter 8 happy-path regression validation; failures produce findings linked to the causing change.

forge_testbed_run

Execute a single scenario by ID against the configured testbed project

forge_testbed_happypath

Run all happy-path scenarios sequentially, aggregate pass/fail summary

forge_testbed_findings

Read cumulative testbed findings (failures, flaky scenarios, runtime trends)

📖 Manual: The Testbed.

🕸️ Lattice, Hallmark & Anvil — Code Intelligence (5 MCP tools + CLI — v2.95+) code-graph · provenance · cache

The Lattice code-graph engine builds a semantic chunk index and BFS call-graph over any git repository (5 MCP tools). Hallmark attaches a lightweight hallmark/v1 provenance envelope to any artifact so drift detection can verify source integrity across sessions (2 MCP tools + CLI mirror + SDK). Anvil is the content-hash-keyed memoization cache that prevents re-indexing unchanged files and owns the L2→L3 dead-letter queue (5 MCP tools + CLI mirror). See Chapter 25 — How the Shop Remembers for the plain-English tour.

forge_lattice_index

Build or update the Lattice chunk index; --since enables incremental re-indexing from a git SHA

forge_lattice_stat

Index statistics: chunk count, edge count, language breakdown, Anvil hit rate, index size

forge_lattice_query

Full-text search over the chunk index; returns bounded 80-char snippets ranked by camelCase-aware token-overlap score (v3.5.1+)

forge_lattice_callers

Find all callers of a named symbol using the edge graph

forge_lattice_blast

BFS call-graph traversal up to depth 5; returns truncated: true when frontier is capped

forge_hallmark_show · verify

MCP — read or drift-check a hallmark/v1 provenance record (schema version, tool name, captured timestamp, content hash). Mirrored at pforge hallmark show · verify. SDK at pforge-sdk/hallmark.

forge_anvil_stat · clear · rebuild · dlq_list · dlq_drain

MCP — memoization cache stats, selective invalidation by tool or git SHA, dead-letter queue list/drain. Mirrored at pforge anvil stat · clear · rebuild · dlq list|drain. Lives under .forge/anvil/.

Lattice, Hallmark, and Anvil ship in v2.95.0. Hallmark and Anvil are exposed as both MCP tools and CLI commands — the MCP forms let agents invoke them in-session, the CLI mirrors let shell scripts and humans use the same operations. See pforge lattice --help, pforge hallmark --help, pforge anvil --help. 🗺️ Diagram: knowledge-graph schema.
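The depth-limited BFS that forge_lattice_blast describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lattice engine: the `callers` mapping shape, the `max_nodes` cap, and the function name are all hypothetical. Only the depth limit of 5 and the idea of a truncated flag come from the description above.

```python
from collections import deque


def blast_radius(callers, symbol, max_depth=5, max_nodes=1000):
    """BFS over a caller graph: which symbols are affected if `symbol` changes?

    `callers` maps a symbol to the symbols that call it (hypothetical shape).
    Returns (affected symbols in visit order, truncated flag).
    """
    seen = {symbol}
    order = []
    queue = deque([(symbol, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for caller in callers.get(node, []):
            if caller in seen:
                continue
            if len(order) >= max_nodes:
                return order, True  # frontier capped: report a partial result
            seen.add(caller)
            order.append(caller)
            queue.append((caller, depth + 1))
    return order, False
```

A caller can treat a `True` flag as a signal to narrow the query rather than trust the partial list.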

🧠 Copilot Memory Sync (2 MCP tools — v2.99+) memory bridge · cheaper models

Bridges forge memory upward into GitHub Copilot's own Memory store — the next IDE session auto-discovers project decisions, lessons, and patterns without requiring OpenBrain configuration. Soft-sync is additive and hash-deduped, so safe to run repeatedly. Together with Hallmark provenance, Anvil DLQ, and the Lattice code-graph, this completes the v3.x memory upgrades that let cheaper, faster models produce flagship-grade results. Full plain-English tour: Chapter 25 — How the Shop Remembers. 🗺️ Diagram: the Copilot trilogy.

forge_sync_memories

Generate .github/copilot-memory-hints.md from forge decisions — trajectory notes, auto-skills, brain L2 entries. CLI: pforge sync-memories.

forge_sync_instructions

Generate .github/copilot-instructions.md from project profile + principles + .forge.json. Completes the Copilot integration trilogy. CLI: pforge sync-instructions.

🧭 Forge-Master Studio (2 tools + dashboard — v2.63+, Phase-43 audit) open-ended reasoning · read-only · CTO defaults

A read-only reasoning orchestrator. Classifies user intent into one of 8 lanes (build, operational, troubleshoot, advisory, offtopic, tempering, principle-judgment, meta-bug-triage), retrieves OpenBrain memory context, and orchestrates other forge tools on the agent's behalf. Phase-43 flipped the CTO defaults on (observer enabled, L3 memory enabled, autoEscalate on, quorumAdvisory=auto), shipped lane-specific system-prompt overlays (advisory-cto, build-interviewer, troubleshoot-sre), and added the closed-loop forge_master_audit tool plus the pforge audit CLI. Phase-29 added the Forge-Master Studio dashboard tab with a curated prompt gallery, streaming chat, and a live tool-call trace pane. 📖 Manual: Forge-Master · 🗺️ Diagram: 3-stage intent classifier.

forge_master_ask

Accepts a freeform message. Returns a structured reasoning response built from intent classification, lane-specific prompt overlay, memory retrieval, and allowlist-gated read-only tool calls (≈38 read tools restored in Phase-43).

forge_master_audit (Phase-43)

Holistic CTO audit. Pulls drift, cost, open bugs, watcher alerts, deploy journal, and Crucible smelts; returns bounded report: summary, top 3 risks with evidence, P0/P1/P2 actions (≤5), cost note. Drives the weekly digest and incident-triggered review.

Studio tab · prompt gallery · chat stream · tool-call trace

Dashboard UI at localhost:3100/dashboard. Also available as CLI via pforge forge-master status|logs and the new pforge audit [--since --tier --schedule --on-incident].

🧭 Collaboration, Notifications & Dashboard (10 tools) reviews, alerts, search, memory, release
forge_review_add

Capture a review thread (audit, gate failure, drift finding) linked to plan/slice

forge_review_list

List open/resolved review threads

forge_review_resolve

Mark a review thread resolved with outcome + rationale

forge_notify_send

Emit notification through configured channels (Telegram, Slack, webhook, email)

forge_notify_test

Smoke-test every notification channel; returns success/failure per channel

forge_home_snapshot

Build the dashboard Home tab payload (run state, drift, incidents, cost, health DNA)

forge_timeline

Unified cursor-paged timeline across runs, incidents, deploys, bugs, Crucible, Tempering

forge_search

Cross-surface search over plans, events, bugs, incidents, memory (filters by type/date/severity)

forge_memory_report

OpenBrain memory usage — captures per day, hit rate on searches, top-recalled thoughts

forge_org_rules

Export aggregated .github/instructions/*.md as a single org-rules document

forge_doctor_quorum

Health-check every quorum participant — auth, latency, rate-limit headers, availability

forge_delegate_to_agent

Delegate a prompt/slice to a specialized reviewer agent (database, security, performance, …)

forge_self_update

Check for the latest Plan Forge release, fetch release notes, and optionally install

Total: 105 MCP tools across all subsystems. Call forge_capabilities or open pforge-mcp/tools.json for the machine-readable surface.

Autonomous Execution

📖 Manual: Advanced Execution · 🗺️ Diagram: parallel slice DAG

Full Auto

One command. pforge run-plan spawns gh copilot CLI for each slice. Gates validate at every boundary. Supports Claude, GPT, and Gemini via --model.

Assisted

You code in VS Code Copilot. Orchestrator prompts you per slice and validates gates automatically. Best of both: human creativity + automated quality.

Cloud Agent

Copilot cloud agent provisions the environment via copilot-setup-steps.yml. Guardrails auto-load, all 105 MCP tools are available, and forge_run_plan executes slices autonomously on GitHub Issues. Use --worker copilot-coding-agent to route each slice to a Copilot cloud agent session via GitHub Issue dispatch.

Parallel

[P]-tagged slices run concurrently. DAG-aware scheduling with scope conflict detection. Up to maxParallelism: 3 workers.
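One way to picture DAG-aware scheduling with scope conflict detection is to group slices into waves. The Python sketch below is an assumption-laden illustration, not the orchestrator's algorithm: the slice dictionary keys (`id`, `deps`, `parallel`, `files`) and the rule that untagged slices run alone are hypothetical. Only the [P] tag and the limit of 3 workers come from the text above.

```python
def schedule_waves(slices, max_parallelism=3):
    """Group slices into waves whose members could run concurrently."""
    done, waves = set(), []
    pending = list(slices)
    while pending:
        # A slice is ready once every dependency has completed.
        ready = [s for s in pending if set(s["deps"]) <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        wave, touched = [ready[0]], set(ready[0]["files"])
        if ready[0]["parallel"]:
            for s in ready[1:]:
                if len(wave) == max_parallelism:
                    break
                files = set(s["files"])
                # Scope conflict: skip slices that touch an already claimed file.
                if s["parallel"] and not (files & touched):
                    wave.append(s)
                    touched |= files
        ids = [s["id"] for s in wave]
        waves.append(ids)
        done.update(ids)
        pending = [s for s in pending if s["id"] not in done]
    return waves
```

In this sketch, two [P]-tagged slices that edit the same file land in different waves even when their dependencies are satisfied at the same time.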

Agent-Per-Slice Routing

Assign a different AI model to each execution role. The orchestrator auto-selects based on the current operation — tune cost vs. quality at every stage without changing your plan files.

📖 Manual: Multi-Agent Routing · 🗺️ Diagram: host-aware routing

  • default: claude-opus-4.6 (spec, harden, review operations)
  • execute: gpt-5.2-codex (writing code, generating tests)
  • review: claude-sonnet-4.6 (gate checks, drift detection)

// .forge.json
"modelRouting": { "default": "claude-opus-4.6", "execute": "gpt-5.2-codex", "review": "claude-sonnet-4.6" }

Auto-Escalation

When a slice fails on one model, the orchestrator automatically walks the escalationChain and retries on the next model — no manual intervention. Emits a slice-escalated event on each re-route.

📖 Manual: Advanced Execution · 🗺️ Diagram: escalation chain

  • Attempt 0: configured model (or modelRouting.execute)
  • Attempt 1+: walks the chain in order; "auto" defers to execute routing
  • Event: slice-escalated (sliceId, reason, models)
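The attempt order can be sketched in Python as follows. This is a hedged illustration rather than the orchestrator's code: the function names, the de-duplication of repeated models, the `reason` string, and the exact shape of the `models` payload are assumptions. The "auto" resolution and the event field names come from the description above.

```python
def models_to_try(configured, routing, escalation_chain):
    """Return the ordered list of models for attempt 0, 1, 2, ...

    "auto" resolves to routing["execute"]; duplicates are skipped
    (the de-duplication is an assumption of this sketch).
    """
    order = [configured or routing["execute"]]
    for entry in escalation_chain:
        model = routing["execute"] if entry == "auto" else entry
        if model not in order:
            order.append(model)
    return order


def run_with_escalation(slice_id, order, attempt):
    """Try each model in turn; `attempt(model)` returns True when gates pass."""
    events = []
    for i, model in enumerate(order):
        if attempt(model):
            return model, events
        if i + 1 < len(order):
            events.append({
                "type": "slice-escalated",
                "sliceId": slice_id,
                "reason": "gate-failed",  # hypothetical reason string
                "models": {"tried": order[: i + 1], "next": order[i + 1]},
            })
    return None, events
```

With an execute model of gpt-5.2-codex and a chain of "auto", claude-sonnet-4.6, claude-opus-4.6, the sketch yields three distinct attempts in that order.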

// .forge.json
"escalationChain": ["auto", "claude-sonnet-4.6", "claude-opus-4.6"]

Model Performance Tracking

Per-slice performance data is appended to .forge/model-performance.json after every run. The orchestrator reads this on startup and auto-selects the cheapest model with >80% historical success rate for each slice type.
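The selection rule (cheapest model above an 80% historical success rate) can be sketched in Python like this. The record keys (`model`, `sliceType`, `passed`, `costUsd`) are hypothetical; the actual layout of model-performance.json is not shown on this page, so treat this as an illustration of the rule only.

```python
def pick_model(records, slice_type, min_success=0.80):
    """Pick the cheapest model whose historical pass rate exceeds min_success."""
    stats = {}
    for r in records:
        if r["sliceType"] != slice_type:
            continue
        s = stats.setdefault(r["model"], {"runs": 0, "passes": 0, "cost": 0.0})
        s["runs"] += 1
        s["passes"] += 1 if r["passed"] else 0
        s["cost"] += r["costUsd"]
    # Keep (average cost, model) pairs for models above the success threshold.
    eligible = [
        (s["cost"] / s["runs"], model)
        for model, s in stats.items()
        if s["passes"] / s["runs"] > min_success
    ]
    return min(eligible)[1] if eligible else None
```

Returning `None` when no model qualifies leaves the caller free to fall back to the configured routing.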

📖 Manual: Cost & Economics

Auto-Selection

--estimate shows recommended model per slice with historical success rate. Agent-per-slice routing uses this data to tune cost vs. quality automatically.

Dashboard Cost Tab

Model Comparison table shows: run count, pass rate (color-coded), average duration, cost per run, total tokens — aggregated from model-performance.json.

Quorum Mode

Multi-model consensus: dispatch complex slices to 3 AI models for independent analysis, synthesize the best approach, then execute with higher confidence. In an A/B test against single-model execution, quorum produced 20% more tests, better code structure, and fewer brittle patterns. Read the full A/B test results →

📖 Manual: Quorum Mode · 🗺️ Diagram: quorum flow · complexity rubric

// Quorum workflow per slice
executeSlice(slice)
├─ scoreComplexity() → 1-10 score (7 weighted signals)
├─ score < threshold → normal execution
└─ score ≥ threshold → quorumDispatch()
   ├─ Claude Opus 4.6 → dry-run plan ─┐
   ├─ GPT-5.3-Codex   → dry-run plan ─┼─ Promise.all()
   └─ Grok 4.20       → dry-run plan ─┘
   quorumReview() ← synthesize best approach per file
   spawnWorker(enhancedPrompt) ← execute with consensus
   gate ✓

Complexity Scoring

7 weighted signals: file scope (20%), cross-module deps (20%), security keywords (15%), database keywords (15%), gate count (10%), task count (10%), historical failure rate (10%).

Auto Mode

--quorum=auto triggers quorum only on high-complexity slices (score ≥ 6). Simple CRUD runs normally. Best of both: quality where it matters, speed where it doesn't.
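The weighted scoring and the auto-mode threshold can be illustrated with a short Python sketch. The weights are the seven listed above; everything else is an assumption: the signal names, the idea that each signal is pre-normalised to the range 0.0 to 1.0, and the linear mapping onto a 1-10 score.

```python
# Weights as listed under Complexity Scoring; they sum to 1.0.
WEIGHTS = {
    "file_scope": 0.20,
    "cross_module_deps": 0.20,
    "security_keywords": 0.15,
    "database_keywords": 0.15,
    "gate_count": 0.10,
    "task_count": 0.10,
    "historical_failure_rate": 0.10,
}


def complexity_score(signals):
    """Map signals (each assumed pre-normalised to 0.0-1.0) onto a 1-10 score."""
    weighted = sum(
        WEIGHTS[name] * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name in WEIGHTS
    )
    return round(1 + 9 * weighted, 1)


def use_quorum(signals, threshold=6):
    """Auto mode: use quorum only when the score reaches the threshold."""
    return complexity_score(signals) >= threshold
```

Under these assumptions, a slice that maxes out file scope, cross-module dependencies, and both keyword signals crosses the threshold, while a slice with only a moderate task count does not.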

Graceful Degradation

If <2 models respond, falls back to normal execution. If reviewer fails, uses best single dry-run. No model unavailability blocks your pipeline.
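The fallback rules can be sketched in Python as below. This is an illustration only: the function name and return shape are hypothetical, and "best single dry-run" is approximated by the longest plan, which is purely an assumption since the page does not say how the best dry-run is chosen.

```python
def resolve_quorum(dry_runs, reviewer):
    """Return (mode, plan) following the fallback rules described above.

    dry_runs: one entry per model, None when that model did not respond.
    reviewer: callable that merges plans; it may raise on failure.
    """
    plans = [p for p in dry_runs if p is not None]
    if len(plans) < 2:
        return "normal", None  # too few responses: run without quorum
    try:
        return "quorum", reviewer(plans)
    except Exception:
        # Reviewer failed: fall back to one dry-run (longest plan, by assumption).
        return "single-dry-run", max(plans, key=len)
```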

A/B Tested

Invoice Engine (rate tiers, discounts, tax, rounding): quorum produced 20% more tests, extracted DRY helpers, used idiomatic .NET patterns, and caught edge cases the single model missed.

A/B Test: Invoice Engine (4 slices, rate tiers + discounts + tax + banker's rounding)

| Metric | Standard | Quorum (3 models) | Delta |
|---|---|---|---|
| Pass rate | 4/4 | 4/4 | Tie |
| Duration | 12 min | 32 min | +168% |
| Tests generated | 15 | 18 | +20% |
| DRY helpers | Inline | Extracted | Better |
| Test dates | Hardcoded (fragile) | Relative (robust) | Better |
| Edge case coverage | Standard | +voided regen, +sequence | Better |

Quorum Presets

| Preset | Models | Reviewer | Threshold | Timeout |
|---|---|---|---|---|
| --quorum=power | Claude Opus 4.6 + GPT-5.3-Codex + Grok 4.20 Reasoning | Opus | 5 | 5 min |
| --quorum=speed | Claude Sonnet 4.6 + GPT-5.4-mini + Grok 4.1 Fast Reasoning | Sonnet | 7 | 2 min |

Available via CLI (--quorum=power), MCP (quorum: "power"), and config (.forge.json, quorum.preset: "power").

Web UI — Live Dashboard

localhost:3100/dashboard — 38 real-time tabs via WebSocket, grouped into Forge, LiveGuard, Forge-Master, and Settings. No build step. Also runs standalone: node pforge-mcp/server.mjs --dashboard-only (8 core tabs shown below)

📖 Manual: The Dashboard · 🗺️ Diagram: tab taxonomy (38 tabs)

  • 📊 Progress: Live slice cards
  • 📋 Runs: History table
  • 💰 Cost: Model breakdown
  • Actions: One-click tools
  • 🔄 Replay: Session logs
  • 🧩 Extensions: Catalog browser
  • ⚙️ Config: Visual editor
  • 🔍 Traces: OTLP waterfall

Agents & Skills

📖 Manual: Instructions & Agents · Multi-Agent Mode

~19 Reviewer Agents

Stack (6-7 per preset): architecture, database, deploy, performance, security, test-runner (+ stack-specific extras)

Cross-stack (8): accessibility, api-contract, cicd, compliance, dependency, error-handling, multi-tenancy, observability

Pipeline (5): specifier → plan-hardener → executor → reviewer-gate → shipper

Audit (1): classifier-reviewer (audit-loop triage)

AI Tool Adapters

pforge init -Agent <tool> generates adapter files for each platform:

  • copilot: .github/copilot-instructions.md (default)
  • claude: CLAUDE.md + .claude/commands/
  • cursor: .cursorrules + .cursor/rules/
  • windsurf: .windsurfrules + .windsurf/workflows/
  • gemini: GEMINI.md + .gemini/commands/ + MCP config
  • generic: .ai/instructions.md (configurable dir)
  • all: all adapters at once

13 Slash Command Skills

/database-migration · /staging-deploy · /test-sweep

/dependency-audit · /security-audit · /code-review

/release-notes · /api-doc-gen · /onboarding

/health-check · /forge-execute · /forge-troubleshoot

/forge-quench

/forge-quench: Reduce code complexity while preserving behavior (Chesterton's Fence)

Every skill follows the Skill Blueprint format and includes Temper Guards, Warning Signs, and Exit Proof sections.

Temper Guards & Warning Signs — Every instruction file includes tables of common shortcuts agents use (with rebuttals) and observable anti-patterns that indicate the file's guidance is being violated.

Observability & Memory

📖 Manual: Memory Architecture · 🗺️ Diagram: three-tier memory capture

Memory Layers

Plan Forge uses three distinct memory systems — each with a specific role in the 3-session pipeline. They're complementary, not competing.

| Layer | What It Is | Scope | Best For |
|---|---|---|---|
| Copilot Memory | /memories/ built-in note storage (user / session / repo scopes) | User / Session / Repo | Free-form notes, personal patterns, ad-hoc insights |
| Plan Forge Session Bridge | Structured /memories/repo/current-phase.md + lessons-learned.md | Repository | Carrying Session 1 → 2 → 3 state through the hardening pipeline |
| OpenBrain | Semantic vector memory via MCP search_thoughts / capture_thought | Global | Auto-injecting prior decisions before each slice, with no manual prompting |

OTLP Telemetry

Every run produces trace.json with resource context, span kinds (SERVER/INTERNAL/CLIENT), severity levels, and log summaries.

  • Per-run manifest + global index (append-only, corruption-tolerant)
  • Dashboard Traces tab with waterfall timeline
  • Optional OTLP collector forwarding (Jaeger, Aspire, Grafana)

OpenBrain Context Injection

Plan Forge's L3 memory layer (built in as of v3.6, no extension needed). Prior decisions and conventions are searched and injected as context before each slice begins, bridging the 3-session model with long-term memory.

  • Context injected before each slice (search_thoughts)
  • Decisions captured after each slice (capture_thought)
  • Cost anomaly detection (>2x average triggers insight)
  • Run summary captured for future phase planning

Stack Presets

📖 Manual: Customization & Presets

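| Preset | Instructions | Agents | Prompts | Skills |
|---|---|---|---|---|
| .NET | 17 | 19 | 15 | 9 |
| TypeScript | 18 | 19 | 15 | 9 |
| Python | 17 | 19 | 15 | 9 |
| Java | 17 | 19 | 15 | 9 |
| Go | 17 | 19 | 15 | 9 |
| PHP | 17 | 19 | 15 | 9 |
| Rust | 17 | 19 | 15 | 9 |
| Swift | 16 | 19 | 13 | 9 |
| Azure IaC | 12 | 18 | 6 | 3 |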

REST API — External Integration

The MCP server exposes a REST API for external agents, CI systems, and tools like OpenClaw. Discover the full surface via GET /api/capabilities or GET /.well-known/plan-forge.json on first connect.

📖 Manual: REST API Reference · 🗺️ Diagram: integration surfaces

Run Control
  • POST /api/runs/trigger — start a plan run remotely
  • POST /api/runs/abort — abort the active run
  • GET /api/runs/status — current run state
Memory
  • POST /api/memory/search — semantic search (OpenBrain)
  • POST /api/memory/capture — normalise + emit memory event
Discovery
  • GET /api/capabilities — full machine-readable surface
  • GET /.well-known/plan-forge.json — RFC 8615 discovery
  • GET /llms.txt — LLM-readable endpoint reference
Auth

Write endpoints accept Authorization: Bearer <secret> or ?token=<secret>. Set bridge.approvalSecret in .forge.json. Without a secret, endpoints are open (local-only use).
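As a rough illustration of the bearer-token form, the Python sketch below builds (but does not send) an authenticated request to the run-trigger endpoint using only the standard library. Several details are assumptions: the base URL reuses the dashboard's localhost:3100 address, the JSON body shape `{"plan": ...}` is a guess, and the plan path and function name are hypothetical. AGENT-SETUP.md is the place to check the real request format.

```python
import json
import urllib.request


def build_trigger_request(base_url, secret, plan_path):
    """Build an authenticated POST to /api/runs/trigger without sending it.

    The JSON body shape is a guess made for illustration only.
    """
    body = json.dumps({"plan": plan_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/runs/trigger",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {secret}",
            "Content-Type": "application/json",
        },
    )


req = build_trigger_request(
    "http://localhost:3100", "my-secret", "docs/plans/Phase-1-PLAN.md"
)
# urllib.request.urlopen(req, timeout=10) would send it to a running server.
```

Preferring the Authorization header over the ?token= query form keeps the secret out of URLs that may end up in logs.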

Full curl examples and config template: AGENT-SETUP.md Section 6.

Bridge — External Notifications

The Plan Forge Bridge subscribes to the WebSocket hub and dispatches run events to external platforms. Rate-limited (1/5s per channel), with automatic reconnect.

📖 Manual: Remote Bridge · 🗺️ Diagram: bridge fan-out

  • 📨 Telegram: Bot API
  • 💬 Slack: Incoming webhook
  • 🎮 Discord: Webhook
  • 🔗 Generic: Any HTTP endpoint

// .forge.json bridge config
{
  "bridge": {
    "enabled": true,
    "channels": [
      { "type": "telegram", "url": "https://api.telegram.org/bot<TOKEN>/sendMessage", "chatId": "<ID>", "level": "important" },
      { "type": "slack",    "url": "https://hooks.slack.com/services/...", "level": "all" },
      { "type": "discord",  "url": "https://discord.com/api/webhooks/...", "level": "critical" },
      { "type": "webhook",  "url": "https://your-endpoint.example.com/hook", "level": "all" }
    ]
  }
}

Levels: all (every event) · important (run start/end + failures) · critical (failures only)
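The level filter and the per-channel rate limit (1 message per 5 s) can be sketched in Python as follows. The event type names in the two sets are hypothetical placeholders, since this page only names ci-triggered and slice-escalated explicitly, and the class and function names are illustrative rather than the Bridge's own.

```python
# Hypothetical event names used only for this illustration.
RUN_BOUNDARY_EVENTS = {"run-started", "run-completed"}
FAILURE_EVENTS = {"slice-failed", "run-failed"}


def should_dispatch(level, event_type):
    """Apply the all / important / critical levels described above."""
    if level == "all":
        return True
    if level == "important":
        return event_type in RUN_BOUNDARY_EVENTS or event_type in FAILURE_EVENTS
    if level == "critical":
        return event_type in FAILURE_EVENTS
    return False


class ChannelRateLimiter:
    """Allow at most one message per `interval` seconds per channel."""

    def __init__(self, interval=5.0):
        self.interval = interval
        self.last_sent = {}

    def allow(self, channel, now):
        last = self.last_sent.get(channel)
        if last is not None and now - last < self.interval:
            return False
        self.last_sent[channel] = now
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the limiter easy to test.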

CI/CD Hook Event

The ci-triggered event is emitted when a CI workflow is dispatched from a plan run. Observable via the WebSocket hub or captured in the run's events.log. The slice-escalated event is emitted when a slice is re-routed to a new model via the escalation chain.

📖 Manual: Event Catalog

ci-triggered

Dispatched when a CI workflow is triggered from a plan run.

  • workflow — workflow file or ID
  • ref — git ref (branch or SHA)
  • inputs — dispatch input parameters

slice-escalated

Emitted when auto-escalation re-routes a slice to the next model in the chain.

  • sliceId — which slice was escalated
  • reason — why escalation triggered
  • models — models tried / next model

Updating an Existing Install

pforge smith automatically checks GitHub for a newer Plan Forge release — 5 s timeout, 24 h cache in .forge/version-check.json, silent when offline.

New version available: vX.Y.Z → run pforge self-update
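The 24 h cache behaviour can be illustrated with a small freshness check in Python. The cache layout assumed here (`checkedAt` in epoch seconds plus a `latest` version string) is hypothetical; the real contents of .forge/version-check.json are not documented on this page.

```python
import json
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 h cache, as described above


def cached_latest_version(cache_text, now=None):
    """Return the cached version if the cache is still fresh, else None.

    Assumes {"checkedAt": <epoch seconds>, "latest": "vX.Y.Z"} (hypothetical).
    """
    now = time.time() if now is None else now
    try:
        cache = json.loads(cache_text)
        age = now - float(cache["checkedAt"])
        latest = cache["latest"]
    except (ValueError, KeyError, TypeError):
        return None  # missing or corrupt cache: the caller should re-check
    return latest if 0 <= age < CACHE_TTL_SECONDS else None
```

A `None` result would be the cue to query GitHub again, subject to the 5 s timeout.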

✓ Preferred: upgrade in place

pforge self-update --force  # latest GitHub release
pforge update              # auto-mode (v2.56.0+)
pforge update --from-github # force GitHub tag

Preserves .forge.json, copilot-instructions.md, project principles, and plan files.

✗ Do not clone to update

git clone https://github.com/srnichols/plan-forge.git

Re-cloning is the first-time install path. For existing installs it can drag -dev bytes onto a clean release and clobber local config.

Control the update source with pforge config set update-source <auto|github-tags|local-sibling> (v2.56.0+). See Manual Appendix G.

Dual-Publish Extensions

pforge ext publish <path> validates the extension and outputs two catalog entries in one command.

Plan Forge Catalog

catalog.json format — installable with pforge ext install and browseable via pforge ext search.

Spec Kit Compatible

extensions.json format for the Spec Kit registry. Extensions marked speckit_compatible: true work in both tools.

GitHub Stack Integration

First-class integration with GitHub Copilot, GitHub Models, and GitHub Actions for cloud-based execution and security-driven plan generation.

📖 Manual: Plan Forge on the GitHub Stack · 🗺️ Diagram: GitHub stack architecture

forge_github_status

Check GitHub API connectivity, Copilot subscription status, and GitHub Models API availability. Returns auth state, rate limits, and per-service health. CLI: pforge github-status

  • githubAuth: authenticated / unauthenticated
  • copilotPlan: individual / business / enterprise / none
  • modelsApiAvailable: true when models.github.ai/inference is reachable
  • rateLimitRemaining: remaining GitHub API requests for the hour

GitHub Models

models.github.ai/inference is the recommended API provider for Plan Forge — the default inference endpoint when GITHUB_TOKEN (or gh auth login) is configured.

Supported models: gpt-4o-mini (default), gpt-4o, claude-sonnet-4, claude-opus-4. Set GITHUB_TOKEN to enable; no separate API key required beyond GitHub auth.

Copilot Coding Agent Worker

Dispatch slice execution to the Copilot coding agent instead of the local CLI. Each slice becomes a GitHub Issue; the agent picks it up, opens a PR, and the orchestrator polls for completion.

pforge run-plan <plan> --worker copilot-coding-agent

Requires copilot-setup-steps.yml in .github/ and Copilot for Business or Enterprise. Pre-flight calls forge_github_status; a warn result on the assignability check is promoted to a hard fail to prevent silent dispatch drops.

plan-from-sarif

Generate a remediation plan from a GitHub Code Scanning SARIF report. Groups findings by CWE / rule ID and emits a hardened Plan Forge plan where each slice targets a specific vulnerability class.

pforge plan-from-sarif <sarif-file> [--severity high,critical] [--output docs/plans/]

High-severity findings are auto-registered via forge_bug_register. Integrates with forge_secret_scan. Gate: pforge run-plan docs/plans/<sarif-plan>.md.
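The grouping step can be sketched in Python against the standard SARIF layout (`runs[].results[]` with `ruleId` and `locations`). This is a simplified illustration, not the plan-from-sarif implementation: it groups by rule ID only, ignores severity and CWE tags, and the slice dictionary shape and function names are hypothetical.

```python
from collections import defaultdict


def group_by_rule(sarif):
    """Group SARIF results by ruleId, collecting the affected file URIs."""
    groups = defaultdict(set)
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            files = groups[result.get("ruleId", "unknown")]
            for loc in result.get("locations", []):
                uri = (
                    loc.get("physicalLocation", {})
                    .get("artifactLocation", {})
                    .get("uri")
                )
                if uri:
                    files.add(uri)
    return {rule: sorted(files) for rule, files in sorted(groups.items())}


def to_slices(groups):
    """One slice per rule; the slice dict shape is hypothetical."""
    return [
        {"slice": i, "rule": rule, "files": files}
        for i, (rule, files) in enumerate(groups.items(), start=1)
    ]
```

Each resulting entry corresponds to one vulnerability class, which matches the one-slice-per-class idea described above.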

github-metrics

Pull GitHub repository metrics (PR velocity, code frequency, contributor cadence) into the LiveGuard health context.

pforge github-metrics [--repo <owner/repo>] [--window 30d]

Metrics written to .forge/github-metrics.json and surfaced on the Dashboard GitHub tab. forge_health_trend incorporates PR cycle time as a signal when the file is present. Requires GITHUB_TOKEN with repo scope.

Ready to forge?

Machine-readable: forge_capabilities MCP tool · .well-known/plan-forge.json