Plan Forge

Capability Reference

Everything Plan Forge can do — tools, commands, agents, skills, telemetry, and integrations. One page, complete coverage.

14 MCP Tools · 19 Agents · 8 Skills · 8 Dashboard Tabs · Quorum Mode

  • 16s — 3 slices executed
  • 24/24 — pipeline gates pass
  • 83 — self-tests passing
  • 23 — AI models supported

MCP Tools

All tools callable via Copilot Chat, Claude, Cursor, or any MCP client. Start with forge_capabilities for full discovery.
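Per the MCP specification, clients invoke these tools with a JSON-RPC `tools/call` request. A minimal sketch of what goes over the wire (the empty `arguments` object for `forge_capabilities` is an assumption; the envelope shape comes from the MCP spec):

```typescript
// JSON-RPC 2.0 envelope an MCP client sends to invoke a server tool.
// "tools/call" and the params shape are defined by the MCP spec;
// passing no arguments to forge_capabilities is assumed here.
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "forge_capabilities",
    arguments: {}, // assumed: forge_capabilities needs no arguments
  },
};

console.log(JSON.stringify(request));
```

Any MCP client (Copilot Chat, Claude, Cursor) builds this envelope for you; the sketch only shows what the call resolves to underneath.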

  • forge_capabilities — Full API surface: tools, workflows, config, memory, glossary
  • forge_run_plan — Execute plan: DAG scheduling, gates, token tracking, retry
  • forge_abort — Abort active execution between slices
  • forge_plan_status — Latest run status from .forge/runs/
  • forge_cost_report — Spend by model, monthly aggregation
  • forge_smith — Environment diagnostics + actionable fixes
  • forge_validate — Setup file validation
  • forge_sweep — TODO/FIXME/stub marker scanner
  • forge_status — Phase status from roadmap
  • forge_diff — Scope drift detection
  • forge_analyze — Consistency scoring (0-100)
  • forge_ext_search — Browse extension catalog
  • forge_ext_info — Extension details
  • forge_new_phase — Create plan + roadmap entry

Autonomous Execution

Full Auto

One command: pforge run-plan spawns the gh copilot CLI for each slice. Gates validate at every boundary. Supports Claude, GPT, and Gemini via --model.

Assisted

You code in VS Code Copilot. Orchestrator prompts you per slice and validates gates automatically. Best of both: human creativity + automated quality.

Parallel

[P]-tagged slices run concurrently. DAG-aware scheduling with scope conflict detection. Up to maxParallelism: 3 workers.
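The scheduling rules above can be sketched as follows. The slice shape, `canJoin` rules, and worker placeholder are hypothetical illustrations, not Plan Forge's actual internals:

```typescript
// Hypothetical sketch of DAG-aware parallel scheduling with scope-conflict
// detection; Plan Forge's real slice format and worker spawn are assumed.
interface Slice {
  id: string;
  deps: string[];     // slice ids that must complete first
  scope: string[];    // paths the slice touches
  parallel: boolean;  // the [P] tag
}

// A slice can join the current batch if its deps are done, everything in
// the batch is [P]-tagged, and no scopes overlap.
function canJoin(s: Slice, done: Set<string>, batch: Slice[]): boolean {
  if (!s.deps.every((d) => done.has(d))) return false;
  if (batch.length > 0 && (!s.parallel || batch.some((b) => !b.parallel))) return false;
  return batch.every((b) => !b.scope.some((p) => s.scope.includes(p)));
}

async function executeSlice(s: Slice): Promise<void> {
  // placeholder: the real orchestrator spawns a worker and validates gates
}

async function runPlan(slices: Slice[], maxParallelism = 3): Promise<string[][]> {
  const done = new Set<string>();
  const batches: string[][] = [];
  const pending = [...slices];
  while (pending.length > 0) {
    const batch: Slice[] = [];
    for (let i = 0; i < pending.length && batch.length < maxParallelism; ) {
      if (canJoin(pending[i], done, batch)) batch.push(pending.splice(i, 1)[0]);
      else i++;
    }
    if (batch.length === 0) throw new Error("dependency cycle or unsatisfiable plan");
    await Promise.all(batch.map(executeSlice)); // run the batch concurrently
    for (const s of batch) done.add(s.id);
    batches.push(batch.map((s) => s.id));
  }
  return batches;
}
```

Two [P]-tagged slices with disjoint scopes land in the same batch; a slice depending on both waits for the next one.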

Quorum Mode

Multi-model consensus: dispatch complex slices to 3 AI models for independent analysis, synthesize the best approach, then execute with higher confidence. A/B tested: 20% more tests, better code structure, and fewer brittle patterns than single-model execution.

// Quorum workflow per slice
executeSlice(slice)
├─ scoreComplexity() → 1-10 score (7 weighted signals)
├─ score < threshold → normal execution
└─ score ≥ threshold → quorumDispatch()
   ├─ Claude Opus 4.6 → dry-run plan ─┐
   ├─ GPT-5.3-Codex   → dry-run plan ─┼─ Promise.all()
   └─ Claude Sonnet   → dry-run plan ─┘
   quorumReview()  ← synthesize best approach per file
   spawnWorker(enhancedPrompt) ← execute with consensus
   gate ✓

Complexity Scoring

7 weighted signals: file scope (20%), cross-module deps (20%), security keywords (15%), database keywords (15%), gate count (10%), task count (10%), historical failure rate (10%).
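The weighted sum can be sketched like this. The weights are the documented ones; how each signal is extracted and normalized is an assumption:

```typescript
// Sketch of the 7-signal weighted complexity score. Weights match the
// documentation; the normalized 0-1 signal inputs are assumed.
interface Signals {
  fileScope: number;        // 0-1, normalized files touched
  crossModuleDeps: number;  // 0-1
  securityKeywords: number; // 0-1
  databaseKeywords: number; // 0-1
  gateCount: number;        // 0-1
  taskCount: number;        // 0-1
  failureRate: number;      // 0-1, historical failure rate
}

const WEIGHTS: Record<keyof Signals, number> = {
  fileScope: 0.20,
  crossModuleDeps: 0.20,
  securityKeywords: 0.15,
  databaseKeywords: 0.15,
  gateCount: 0.10,
  taskCount: 0.10,
  failureRate: 0.10,
};

// Weighted sum (0-1) mapped onto the 1-10 scale used by the quorum threshold.
function scoreComplexity(s: Signals): number {
  const raw = (Object.keys(WEIGHTS) as (keyof Signals)[])
    .reduce((sum, k) => sum + WEIGHTS[k] * s[k], 0);
  return Math.round(1 + raw * 9);
}
```

A slice maxing every signal scores 10; a trivial slice scores 1 and never triggers quorum.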

Auto Mode

--quorum=auto triggers quorum only on high-complexity slices (score ≥ 7). Simple CRUD runs normally. Best of both: quality where it matters, speed where it doesn't.

Graceful Degradation

If fewer than 2 models respond, quorum falls back to normal execution. If the reviewer fails, the best single dry-run plan is used. No model outage blocks your pipeline.
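These fallback rules can be sketched as follows; the model, reviewer, and quality-score shapes are hypothetical stand-ins:

```typescript
// Sketch of quorum fallback: <2 surviving plans → normal execution;
// reviewer failure → best single dry-run. All shapes here are assumed.
type Plan = { model: string; plan: string; quality: number };

async function quorumDispatch(
  models: Array<() => Promise<Plan>>,
  review: (plans: Plan[]) => Promise<string>,
  normal: () => Promise<string>,
): Promise<string> {
  const settled = await Promise.allSettled(models.map((m) => m()));
  const plans = settled
    .filter((r): r is PromiseFulfilledResult<Plan> => r.status === "fulfilled")
    .map((r) => r.value);

  if (plans.length < 2) return normal(); // too few models → normal execution
  try {
    return await review(plans);          // synthesize consensus
  } catch {
    // reviewer failed → highest-quality single dry-run
    return plans.reduce((a, b) => (b.quality > a.quality ? b : a)).plan;
  }
}
```

`Promise.allSettled` is what makes the dispatch outage-tolerant: one rejected model call never rejects the whole batch.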

A/B Tested

Invoice Engine (rate tiers, discounts, tax, rounding): quorum produced 20% more tests, extracted DRY helpers, used idiomatic .NET patterns, and caught edge cases the single model missed.

A/B Test: Invoice Engine (4 slices, rate tiers + discounts + tax + banker's rounding)

| Metric | Standard | Quorum (3 models) | Delta |
|---|---|---|---|
| Pass rate | 4/4 | 4/4 | Tie |
| Duration | 12 min | 32 min | +168% |
| Tests generated | 15 | 18 | +20% |
| DRY helpers | Inline | Extracted | Better |
| Test dates | Hardcoded (fragile) | Relative (robust) | Better |
| Edge case coverage | Standard | +voided regen, +sequence | Better |

Live Dashboard

localhost:3100/dashboard — 8 tabs, real-time via WebSocket, no build step required.

  • 📊 Progress — Live slice cards
  • 📋 Runs — History table
  • 💰 Cost — Model breakdown
  • Actions — One-click tools
  • 🔄 Replay — Session logs
  • 🧩 Extensions — Catalog browser
  • ⚙️ Config — Visual editor
  • 🔍 Traces — OTLP waterfall

Agents & Skills

19 Reviewer Agents

Stack (6): architecture, database, deploy, performance, security, test-runner

Cross-stack (8): accessibility, API contracts, CI/CD, compliance, dependency, error handling, multi-tenancy, observability

Pipeline (5): specifier → plan-hardener → executor → reviewer-gate → shipper

8 Slash Command Skills

/database-migration · /staging-deploy · /test-sweep

/dependency-audit · /code-review · /release-notes

/api-doc-gen · /onboarding

Observability & Memory

OTLP Telemetry

Every run produces trace.json with resource context, span kinds (SERVER/INTERNAL/CLIENT), severity levels, and log summaries.

  • Per-run manifest + global index (append-only, corruption-tolerant)
  • Dashboard Traces tab with waterfall timeline
  • Optional OTLP collector forwarding (Jaeger, Aspire, Grafana)
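For orientation, here is a sketch of one span as it might appear in trace.json. The field names follow the standard OTLP JSON encoding; the span name, attribute key, and values are invented for illustration, not Plan Forge's exact schema:

```typescript
// Illustrative OTLP-style span (field names per the OTLP JSON encoding;
// the name, attribute key, and values are hypothetical).
const span = {
  name: "forge.slice.execute",  // hypothetical span name
  kind: "SPAN_KIND_INTERNAL",   // SERVER / INTERNAL / CLIENT per OTLP
  startTimeUnixNano: "1700000000000000000",
  endTimeUnixNano: "1700000016000000000",
  attributes: [
    { key: "forge.slice.id", value: { stringValue: "slice-1" } }, // assumed key
  ],
  status: { code: "STATUS_CODE_OK" },
};
```

Because the encoding is standard OTLP, any collector that speaks OTLP (Jaeger, Aspire, Grafana) can ingest the forwarded spans without translation.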

OpenBrain Memory

Optional persistent semantic memory. Decisions and patterns captured in one session are searchable in every future session.

  • Auto-search before each slice for prior conventions
  • Auto-capture decisions after each slice
  • Cost anomaly detection (>2x average triggers insight)
  • Run summary captured for future phase planning
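The anomaly rule above is simple enough to sketch directly; the function name and history shape are hypothetical:

```typescript
// Sketch of the >2x-average cost anomaly rule (names assumed).
// history = per-slice costs from prior runs; latest = this slice's cost.
function isCostAnomaly(history: number[], latest: number): boolean {
  if (history.length === 0) return false; // no baseline yet
  const avg = history.reduce((a, b) => a + b, 0) / history.length;
  return latest > 2 * avg;                // >2x average triggers an insight
}
```

A slice costing exactly twice the average does not fire; only a strict overshoot is flagged, so the first few runs establish a quiet baseline.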

Stack Presets

| Preset | Instructions | Agents | Prompts | Skills |
|---|---|---|---|---|
| .NET | 17 | 19 | 15 | 8 |
| TypeScript | 18 | 19 | 15 | 8 |
| Python | 17 | 19 | 15 | 8 |
| Java | 17 | 19 | 15 | 8 |
| Go | 17 | 19 | 15 | 8 |
| Azure IaC | 12 | 18 | 6 | 3 |

Ready to forge?

Machine-readable: forge_capabilities MCP tool · .well-known/plan-forge.json