Capability Reference
Everything Plan Forge can do — tools, commands, agents, skills, telemetry, and integrations. One page, complete coverage.
MCP Tools
All tools are callable via Copilot Chat, Claude, Cursor, or any MCP client. Start with forge_capabilities for full discovery.
| Tool | Description |
|---|---|
| forge_capabilities | Full API surface: tools, workflows, config, memory, glossary |
| forge_run_plan | Execute a plan: DAG scheduling, gates, token tracking, retry |
| forge_abort | Abort active execution between slices |
| forge_plan_status | Latest run status from .forge/runs/ |
| forge_cost_report | Spend by model, monthly aggregation |
| forge_smith | Environment diagnostics + actionable fixes |
| forge_validate | Setup file validation |
| forge_sweep | TODO/FIXME/stub marker scanner |
| forge_status | Phase status from roadmap |
| forge_diff | Scope drift detection |
| forge_analyze | Consistency scoring (0-100) |
| forge_ext_search | Browse extension catalog |
| forge_ext_info | Extension details |
| forge_new_phase | Create plan + roadmap entry |
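MCP clients invoke tools by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds the message a client might send to start discovery with forge_capabilities. The empty `arguments` object and the request id are assumptions, since the tool's input schema is not documented on this page.

```python
import json
from typing import Optional


def build_tool_call(tool_name: str, arguments: Optional[dict] = None, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # Assumption: forge_capabilities needs no required arguments.
            "arguments": arguments or {},
        },
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_tool_call("forge_capabilities"))
```

In practice an MCP client library or the host editor sends this message for you; the sketch only shows the shape of the call.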
Autonomous Execution
Full Auto
One command: pforge run-plan spawns the gh copilot CLI for each slice. Gates validate at every boundary. Supports Claude, GPT, and Gemini via --model.
Assisted
You code in VS Code Copilot. Orchestrator prompts you per slice and validates gates automatically. Best of both: human creativity + automated quality.
Parallel
[P]-tagged slices run concurrently. Scheduling is DAG-aware and detects scope conflicts. Up to 3 workers run at once (maxParallelism: 3).
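One way to picture DAG-aware scheduling with scope conflict detection is to group slices into waves. This is a minimal sketch, not Plan Forge's actual scheduler: the `Slice` type, its fields, and the wave strategy are all hypothetical, and only the limit of 3 workers comes from the text above.

```python
from dataclasses import dataclass, field

MAX_PARALLELISM = 3  # mirrors maxParallelism: 3


@dataclass
class Slice:
    name: str
    deps: set = field(default_factory=set)    # slices that must finish first
    scope: set = field(default_factory=set)   # files the slice may touch (hypothetical)
    parallel: bool = False                    # True for [P]-tagged slices


def plan_waves(slices):
    """Group slices into waves; slices in the same wave may run concurrently."""
    pending = {s.name: s for s in slices}
    done = set()
    waves = []
    while pending:
        # A slice is ready once all of its dependencies have completed.
        ready = [s for s in pending.values() if s.deps <= done]
        if not ready:
            raise ValueError("dependency cycle or missing dependency")
        wave = [ready[0]]
        claimed = set(ready[0].scope)
        if ready[0].parallel:
            for s in ready[1:]:
                if len(wave) == MAX_PARALLELISM:
                    break
                # Only co-schedule [P] slices whose file scopes do not overlap.
                if s.parallel and not (s.scope & claimed):
                    wave.append(s)
                    claimed |= s.scope
        for s in wave:
            done.add(s.name)
            del pending[s.name]
        waves.append([s.name for s in wave])
    return waves
```

With a `schema` slice followed by three [P] slices where two touch the same file, the conflicting slice is pushed to a later wave rather than run alongside its neighbor.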
Quorum Mode
Multi-model consensus: complex slices are dispatched to 3 AI models for independent analysis, the best approach is synthesized, and the slice then executes with higher confidence. In an A/B test against single-model execution, quorum produced 20% more tests, better code structure, and fewer brittle patterns.
Complexity Scoring
7 weighted signals: file scope (20%), cross-module deps (20%), security keywords (15%), database keywords (15%), gate count (10%), task count (10%), historical failure rate (10%).
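The weights above sum to 100%, so the score can be read as a weighted average. The sketch below assumes each signal is normalized to the range 0 to 1 and that the result is scaled to 0 to 10 (which would fit the score ≥ 7 threshold used by auto mode). The signal key names and the normalization are assumptions, not the documented implementation.

```python
# Weights taken from the list above; key names are hypothetical.
WEIGHTS = {
    "file_scope": 0.20,
    "cross_module_deps": 0.20,
    "security_keywords": 0.15,
    "database_keywords": 0.15,
    "gate_count": 0.10,
    "task_count": 0.10,
    "historical_failure_rate": 0.10,
}


def complexity_score(signals: dict) -> float:
    """Weighted sum of signals (each clamped to 0..1), scaled to 0..10."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total * 10, 2)
```

Under these assumptions, a slice that maxes out file scope, cross-module dependencies, security keywords, and database keywords reaches 7.0 on its own.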
Auto Mode
--quorum=auto triggers quorum only on high-complexity slices (score ≥ 7). Simple CRUD runs normally. Best of both: quality where it matters, speed where it doesn't.
Graceful Degradation
If fewer than 2 models respond, execution falls back to normal mode. If the reviewer fails, the best single dry-run is used. Model unavailability never blocks your pipeline.
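The fallback rules can be sketched as a small decision function. Everything here is illustrative: the `(plan, quality)` tuples, the `synthesize` callable standing in for the reviewer, and the use of a quality number to pick the "best" dry-run are assumptions, since this page does not say how the best dry-run is chosen.

```python
def resolve_quorum(dry_runs, synthesize):
    """Return (strategy, plan) given the dry-runs that came back.

    dry_runs: list of (plan, quality) tuples from models that responded.
    synthesize: callable that merges plans; raises if the reviewer fails.
    """
    if len(dry_runs) < 2:
        # Fewer than two responses: fall back to normal execution.
        return ("normal", None)
    try:
        return ("quorum", synthesize([plan for plan, _ in dry_runs]))
    except Exception:
        # Reviewer failed: use the best single dry-run instead.
        best_plan, _ = max(dry_runs, key=lambda item: item[1])
        return ("best-single", best_plan)
```

Every branch returns something executable, which is the point of the degradation rules: an unavailable model changes the strategy but never stops the run.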
A/B Tested
Invoice Engine (rate tiers, discounts, tax, rounding): quorum produced 20% more tests, extracted DRY helpers, used idiomatic .NET patterns, and caught edge cases the single model missed.
A/B Test: Invoice Engine (4 slices, rate tiers + discounts + tax + banker's rounding)
| Metric | Standard | Quorum (3 models) | Delta |
|---|---|---|---|
| Pass rate | 4/4 | 4/4 | Tie |
| Duration | 12 min | 32 min | +167% |
| Tests generated | 15 | 18 | +20% |
| DRY helpers | Inline | Extracted | Better |
| Test dates | Hardcoded (fragile) | Relative (robust) | Better |
| Edge case coverage | Standard | +voided regen, +sequence | Better |
Live Dashboard
localhost:3100/dashboard — 8 tabs, real-time via WebSocket, no build step required.
| Tab | Shows |
|---|---|
| Progress | Live slice cards |
| Runs | History table |
| Cost | Model breakdown |
| Actions | One-click tools |
| Replay | Session logs |
| Extensions | Catalog browser |
| Config | Visual editor |
| Traces | OTLP waterfall |
Agents & Skills
19 Reviewer Agents
Stack (6): architecture, database, deploy, performance, security, test-runner
Cross-stack (8): accessibility, API contracts, CI/CD, compliance, dependency, error handling, multi-tenancy, observability
Pipeline (5): specifier → plan-hardener → executor → reviewer-gate → shipper
8 Slash Command Skills
/database-migration · /staging-deploy · /test-sweep
/dependency-audit · /code-review · /release-notes
/api-doc-gen · /onboarding
Observability & Memory
OTLP Telemetry
Every run produces trace.json with resource context, span kinds (SERVER/INTERNAL/CLIENT), severity levels, and log summaries.
- Per-run manifest + global index (append-only, corruption-tolerant)
- Dashboard Traces tab with waterfall timeline
- Optional OTLP collector forwarding (Jaeger, Aspire, Grafana)
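An append-only, corruption-tolerant index is commonly built as a file with one JSON record per line, where readers skip any line that fails to parse. The sketch below shows that technique under the assumption of a JSON Lines file; the actual index format and file name used by Plan Forge are not stated here.

```python
import json


def append_run(index_path: str, entry: dict) -> None:
    """Append one run record as a single JSON line; earlier lines are never rewritten."""
    with open(index_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_runs(index_path: str) -> list:
    """Load every parseable record, skipping lines damaged by a partial write."""
    runs = []
    with open(index_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                runs.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # tolerate corruption instead of failing the whole read
    return runs
```

Because each record is independent, a crash in the middle of a write costs at most that one line, and every other run stays readable.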
OpenBrain Memory
Optional persistent semantic memory. Decisions and patterns captured in one session are searchable in every future session.
- Auto-search before each slice for prior conventions
- Auto-capture decisions after each slice
- Cost anomaly detection (a run costing more than 2x the average triggers an insight)
- Run summary captured for future phase planning
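The cost anomaly rule reduces to a comparison against a running average. A minimal sketch, assuming the baseline is the mean cost of earlier runs and that the current run is excluded from it (neither detail is specified on this page); the function name is hypothetical.

```python
from statistics import mean


def is_cost_anomaly(run_cost: float, previous_costs: list, factor: float = 2.0) -> bool:
    """Flag a run whose cost exceeds `factor` times the average of earlier runs."""
    if not previous_costs:
        return False  # no baseline yet, so nothing to compare against
    return run_cost > factor * mean(previous_costs)
```

For example, with earlier runs costing 1.0, 2.0, and 3.0, the average is 2.0, so a run costing 5.0 is flagged while a run costing exactly 4.0 is not.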
Stack Presets
| Preset | Instructions | Agents | Prompts | Skills |
|---|---|---|---|---|
| .NET | 17 | 19 | 15 | 8 |
| TypeScript | 18 | 19 | 15 | 8 |
| Python | 17 | 19 | 15 | 8 |
| Java | 17 | 19 | 15 | 8 |
| Go | 17 | 19 | 15 | 8 |
| Azure IaC | 12 | 18 | 6 | 3 |
Machine-readable: forge_capabilities MCP tool · .well-known/plan-forge.json