

Forge-Master

A read-only reasoning orchestrator with its own dashboard tab. Classifies intent, pulls OpenBrain memory, and chains read-only forge tools on your behalf, so you can ask open-ended questions instead of wiring tool calls by hand.

Introduced in the Phase-28 MVP series. Subsequent phases added quorum advisory mode (Phase-38.7), embedding cache fallback (Phase-38.8), and unified-timeline integration. Most recently, the Phase-43 "CTO loop" upgrade restored 17 read-only tools to the allowlist, added three lane-specific system-prompt overlays (advisory-CTO / build-interviewer / troubleshoot-SRE), flipped on opinionated defaults (observer enabled, L3 memory enabled, auto-escalate on, quorum advisory auto), and introduced the + @baseline philosophy alias, the forge_master_audit tool, and the pforge audit CLI. Forge-Master is read-only by design: it never writes code or files. Use it to think; use the rest of the forge to do.

Why a Reasoning Orchestrator?

Plan Forge has 105 MCP tools. Most of the time you know which one you need. Sometimes you don't, because the question is open-ended:

"Why did Phase-27 Slice 4 fail?" → needs forge_watch_live + brain_recall + forge_bug_list
"Pick up the thread from yesterday's auth work" → needs memory recall + forge_status + forge_plan_status
"Should we add caching to the user lookup?" → needs forge_search + brain_recall for prior decisions + maybe forge_diagnose

Chaining the right tools by hand is slow and easy to get wrong. Forge-Master is the front door: one prompt in, one synthesized answer out. Behind the scenes it classifies your intent, pulls relevant memory, and orchestrates whatever read-only tools fit.

Read-only is a feature, not a limitation. Forge-Master cannot edit your code, change .forge.json, or finalize a smelt. That guarantee is what makes it safe to ask anything at any time. When the answer requires a write, Forge-Master tells you the exact tool to call yourself.

Four Access Surfaces

Surface	Best for	Where
Studio tab	Interactive exploration with prompt gallery, streaming chat, live tool-call trace	`localhost:3100/dashboard` → Studio
`forge_master_ask` MCP tool	Agents that want one-shot reasoning embedded in a larger conversation	Any MCP-compatible client (Copilot, Claude Code, Cursor, Codex, Windsurf)
`forge_master_audit` MCP tool (Phase-43)	End-of-week health check, end-of-run hook, “what should I worry about today?”. Returns a bounded report with summary, top-3 risks, P0/P1/P2 actions, and a cost note	Any MCP-compatible client
`pforge forge-master status|logs` & `pforge audit`	Scripts, CI checks, health probes, weekly digests, incident-triggered reviews	CLI

The `forge_master_ask` tool

The MCP tool is a one-shot entry-point:

forge_master_ask {
  message: "Why did Phase-27 Slice 4 fail?"
}
→ {
  ok: true,
  lane: "troubleshoot",
  via: "router-llm",          // or "keyword" / "embedding-cache"
  toolCalls: [
    { name: "forge_watch_live", args: { phase: "27", slice: 4 } },
    { name: "brain_recall",     args: { query: "Phase-27 slice 4 failures" } }
  ],
  reply: "The slice failed because…",
  costUSD: 0.0023
}

copilot-instructions.md guidance: "Prefer forge_master_ask over manually calling individual forge tools when the task is open-ended or involves multiple steps. Don't use it for direct file edits, Forge-Master is read-only."
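
A calling agent usually wants to log which lane and classifier stage answered before using the reply. The helper below is a minimal sketch of that, assuming the response has exactly the shape shown above; `summarizeAskResult` is a hypothetical name, and the handling of `ok: false` is an assumption because the failure shape is not documented here.

```javascript
// Hypothetical helper: condense a forge_master_ask response into one log line.
// Field names (ok, lane, via, toolCalls, costUSD) come from the example above.
function summarizeAskResult(result) {
  // Assumption: a failed call carries ok: false; nothing else is relied on.
  if (!result.ok) return "forge_master_ask failed";

  const tools =
    (result.toolCalls ?? []).map((call) => call.name).join(", ") || "none";

  return `[${result.lane} via ${result.via}] tools: ${tools}; cost: $${result.costUSD.toFixed(4)}`;
}
```

Logging the `via` value alongside the cost makes it easy to see how often a prompt needed the router LLM rather than a free stage.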

Three-Stage Intent Classifier

Every prompt is classified into a lane before tools are dispatched. The classifier runs three stages in order, falling through only when the prior stage didn't match confidently. This keeps the common case free (keyword) and the edge case smart (router LLM).

Stage 1

Keyword scoring

Fast regex/keyword match against per-lane vocabularies. Zero API cost. Returns immediately if confidence is high. Covers the bulk of operational prompts ("open bugs", "failing gate", "scope contract violation", etc.).

Stage 1.5

Embedding cache

Cosine-similarity match (≥ 0.85) against previously-classified prompts. Zero API cost on hit. Uses all-MiniLM-L6-v2 via @xenova/transformers (lazy-loaded peer dep), or a deterministic hash bag-of-words fallback when the package isn't installed. Works fully offline once warm.

Stage 2

Router LLM

Default model: grok-3-mini. Used for ambiguous prompts the cache hasn't seen. Every successful classification is then written through to the cache, so the next similar prompt skips this stage entirely.

Each successful turn carries a via field telling you which stage answered: "keyword", "embedding-cache", or "router-llm". The dashboard's Forge-Master tab summarizes the distribution as {keyword, embedding, router} percentages.
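
The fall-through order described above can be sketched as a single function. This is an illustration only, not the actual classifier: the four stage callbacks (`keyword`, `cacheLookup`, `cacheWrite`, `routerLLM`) and their return shapes are hypothetical stand-ins, and only the `via` values and the stage order are taken from the text.

```javascript
// Sketch of the three-stage fall-through. All stage functions are injected,
// hypothetical callbacks; each returns { lane, ... } or null.
async function classifyIntent(prompt, stages) {
  const { keyword, cacheLookup, cacheWrite, routerLLM } = stages;

  // Stage 1: keyword scoring. Returns immediately only on high confidence.
  const kw = await keyword(prompt);
  if (kw && kw.confidence === "high") {
    return { lane: kw.lane, via: "keyword" };
  }

  // Stage 1.5: embedding cache. A hit costs no API call.
  const cached = await cacheLookup(prompt);
  if (cached) {
    return { lane: cached.lane, via: "embedding-cache" };
  }

  // Stage 2: router LLM, used only for prompts the earlier stages missed.
  const routed = await routerLLM(prompt);

  // Write-through so the next similar prompt skips stage 2.
  // Fire-and-forget: a cache write failure must not fail the turn.
  Promise.resolve(cacheWrite(prompt, routed.lane)).catch(() => {});

  return { lane: routed.lane, via: "router-llm" };
}
```

Injecting the stages keeps the ordering logic testable without a model or an embedding provider.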

The Lanes

Forge-Master classifies into one of these lanes. Each lane has a different default tool allowlist; Phase-43 added a lane-specific system-prompt overlay for the three lanes where voice matters most (advisory, build, troubleshoot):

Lane	Use case	Overlay (Phase-43)	Quorum-eligible?
`operational`	Status queries, run lookups, “what’s happening”; reads runs, plan status, costs	(none)	No (hard-blocked)
`troubleshoot`	Failure diagnosis; reads logs, watch-live, bugs, traces	`troubleshoot-sre` (SRE evidence discipline): `forge_watch_live` + `forge_bug_list` first	No (hard-blocked)
`build`	“Add OAuth”, “how would I build X”; reads patterns, runbooks, prior plans	`build-interviewer`: funnels into Crucible; calls `forge_crucible_submit` / `ask` / `preview`	No (hard-blocked)
`advisory`	Open-ended judgment calls, “biggest risk this week”, “what’s at risk”, “weekly audit”, “should we ship or refactor”, trade-off analysis	`advisory-cto` (CTO voice): trade-offs, receipts before verdict, rank risks, ≤3 actions	Yes (default escalation target; quorum advisory now `auto` by default)
`tempering`	Tempering gate evaluation and enforcement checks	(none)	No
`principle-judgment`	Principled architectural decisions and principle reviews	(none)	No
`meta-bug-triage`	Triage of meta-bugs, self-repair, plan/orchestrator defects (routes to `forge_meta_bug_file`)	(none)	No
`offtopic`	Catch-all when nothing else matches; routed to a polite fallback reply	(none)	No

How overlays compose. The system prompt sent to the reasoning model is base + "\n\n" + overlay. The overlay lives in pforge-master/src/prompts/<lane>.md and is loaded once per turn after intent classification. Lanes without an overlay use the base prompt alone.
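
As a rough illustration of that composition rule, the sketch below joins a base prompt and an overlay with a blank line. The `.md` file names are assumed from the overlay names in the table above, and `readOverlay` is a hypothetical loader passed in by the caller; the real loading code in pforge-master may differ.

```javascript
// Lane -> overlay file (assumed from the overlay names in the lanes table).
// Lanes missing from this map have no overlay.
const OVERLAY_FILES = new Map([
  ["advisory", "advisory-cto.md"],
  ["build", "build-interviewer.md"],
  ["troubleshoot", "troubleshoot-sre.md"],
]);

// Compose the system prompt as base + "\n\n" + overlay.
// readOverlay is a hypothetical function: file name -> overlay text.
function composeSystemPrompt(basePrompt, lane, readOverlay) {
  const file = OVERLAY_FILES.get(lane);
  if (!file) return basePrompt; // no overlay: base prompt alone
  return basePrompt + "\n\n" + readOverlay(file);
}
```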

Phase-43 router patterns. Three new advisory triggers were pinned via regression test so the lane classification is deterministic: “biggest/top/main/primary/key risk”, “what’s at risk / going wrong / state of / broken”, and “weekly / monthly / quarterly audit / review / digest / summary”. All resolve to advisory with weight 3.
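
To make those trigger families concrete, here is one way they could be written as regular expressions. The exact expressions used by the keyword scorer are not shown in this document, so the patterns below are an approximation built from the phrases listed above, and `matchAdvisoryTrigger` is a hypothetical name.

```javascript
// Approximate regexes for the three Phase-43 advisory trigger families.
// These are assumptions derived from the listed phrases, not the shipped patterns.
const ADVISORY_TRIGGERS = [
  /\b(?:biggest|top|main|primary|key)\s+risks?\b/i,
  /\bwhat(?:['’]s| is)\s+(?:at risk|going wrong|(?:the\s+)?state of|broken)\b/i,
  /\b(?:weekly|monthly|quarterly)\s+(?:audit|review|digest|summary)\b/i,
];

// Returns { lane: "advisory", weight: 3 } on the first matching trigger, else null.
function matchAdvisoryTrigger(prompt) {
  for (const pattern of ADVISORY_TRIGGERS) {
    if (pattern.test(prompt)) return { lane: "advisory", weight: 3 };
  }
  return null;
}
```

Pinning patterns like these in a regression test is what keeps the lane assignment stable when the vocabulary is later extended.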

Quorum Advisory Mode (v2.78+)

For high-stakes decisions in the advisory lane, Forge-Master can fan the prompt out to 2–3 models in parallel and return all replies plus a dissent summary. The human picks the reply. There is no auto-winner selection, because the whole point is to surface disagreement.

Not the same as Quorum Mode. Quorum Advisory (this section) is per-prompt, human-picks-the-winner, scoped to advisory-lane Forge-Master prompts. Quorum Mode is per-slice, reviewer-synthesizes, scoped to pforge run-plan execution. See the side-by-side comparison in Chapter 14 for when to use which.

Activation

Set quorumAdvisory in .forge.json → forgeMaster:

Mode	When quorum fires
`"off"`	Never. Single-model reply only.
`"auto"` (default, flipped in Phase-43)	Lane is `advisory` AND prompt was auto-escalated to the high tier (also default-on since Phase-43) AND classifier confidence is medium or above. The conservative trigger.
`"always"`	Every `advisory`-lane prompt fires quorum. Highest spend, highest signal.

Quorum is hard-blocked on operational, troubleshoot, and build lanes. Even with "always", those lanes get a single-model reply. Quorum is for judgment, not for lookups.
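
The activation rules in the table can be condensed into one predicate. This is a sketch under stated assumptions: the confidence labels `low` / `medium` / `high` and the function name `shouldFireQuorum` are hypothetical, and only the mode names and lane rules come from the table.

```javascript
// Assumed confidence scale; the document only says "medium or above".
const CONFIDENCE_RANK = { low: 0, medium: 1, high: 2 };

// Decide whether a prompt should fan out to multiple models.
function shouldFireQuorum({ mode, lane, escalated, confidence }) {
  if (mode === "off") return false;

  // Quorum applies to the advisory lane only. This also covers the
  // hard-blocked operational, troubleshoot, and build lanes.
  if (lane !== "advisory") return false;

  if (mode === "always") return true;
  if (mode !== "auto") return false; // unknown mode: stay single-model

  // "auto": needs auto-escalation to the high tier AND confidence >= medium.
  return escalated === true && CONFIDENCE_RANK[confidence] >= CONFIDENCE_RANK.medium;
}
```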

Cost preview before dispatch

Before any model is called, the GET /api/forge-master/chat/:sessionId/stream endpoint emits a quorum-estimate SSE event with the projected cost. Studio displays this and lets you cancel before spending. Programmatic clients should listen for the event:

data: {"type":"quorum-estimate","estimatedUSD":0.0142,"models":[
  {"name":"claude-opus-4.7","estUSD":0.0061},
  {"name":"gpt-5.3-codex","estUSD":0.0048},
  {"name":"grok-4.20","estUSD":0.0033}
]}
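
A programmatic client might parse that event and compare the estimate against its own budget before letting the dispatch proceed. The sketch below handles the standard SSE `data:` framing; the function names and the budget check are hypothetical, and it relies only on the `type` and `estimatedUSD` fields shown above.

```javascript
// Extract and parse the JSON payload of one SSE event frame.
// Multiple "data:" lines are joined with "\n", per the SSE format.
function parseSseData(frame) {
  const dataLines = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line.startsWith("data:")) {
      // A single leading space after the colon is not part of the value.
      dataLines.push(line.slice(5).replace(/^ /, ""));
    }
  }
  if (dataLines.length === 0) return null;
  return JSON.parse(dataLines.join("\n"));
}

// Hypothetical budget guard: true when a quorum estimate exceeds the budget.
function quorumEstimateExceeds(event, budgetUSD) {
  return event?.type === "quorum-estimate" && event.estimatedUSD > budgetUSD;
}
```

When the guard returns true, the client would cancel through the approval endpoint listed in the API table; the request body for that call is not specified here, so it is left out of the sketch.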

Dissent extraction

After all replies arrive, Forge-Master runs a keyword-frequency divergence analysis across the reply texts and emits a dissent: { topic, axis } summary. Topic is what the models disagreed about; axis is the dimension of disagreement (timing, scope, model choice, etc.). The dashboard renders this as a one-line summary above the three replies so you can see the disagreement before reading.

Partial failure

Quorum dispatch uses Promise.allSettled with a 60s hard timeout per model. If 1 of 3 fails or times out, the remaining replies are returned with a partial: true flag. If all fail, the response is { ok: false, code: "QUORUM_ALL_FAILED" }.
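
The partial-failure behaviour can be sketched with `Promise.allSettled` and a per-model timeout. This is an illustration, not the shipped dispatcher: `callModel` is a hypothetical injected function, and the `replies` / `model` field names are assumptions. Only `partial: true`, `QUORUM_ALL_FAILED`, and the 60s timeout come from the text.

```javascript
// Reject if the promise has not settled within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Fan out to every model; keep whatever succeeded.
// callModel(modelName) -> Promise<string> is a hypothetical callback.
async function dispatchQuorum(models, callModel, timeoutMs = 60_000) {
  const settled = await Promise.allSettled(
    models.map((m) =>
      // Promise.resolve().then(...) also captures synchronous throws.
      withTimeout(Promise.resolve().then(() => callModel(m)), timeoutMs)
    )
  );

  const replies = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      replies.push({ model: models[i], reply: outcome.value });
    }
  });

  if (replies.length === 0) return { ok: false, code: "QUORUM_ALL_FAILED" };

  const result = { ok: true, replies };
  if (replies.length < models.length) result.partial = true;
  return result;
}
```

`Promise.allSettled` is the natural fit here because, unlike `Promise.all`, one rejection does not discard the replies that did arrive.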

REST API + MCP Tools

Method	Endpoint / tool	Description
MCP tool	`forge_master_ask`	One-shot reasoning. Accepts `{ message, sessionId? }`; returns lane, via, toolCalls[], reply, costUSD.
MCP tool	`forge_master_audit` Phase-43	Holistic CTO-style audit. Accepts `{ tier?, maxToolCalls?, drill?, sessionId?, path? }`. Returns bounded payload `{ ok, summary, top_risks (≤3), actions (≤5), cost_note, sources[], message, drill? }`. Pulls drift, cost, open bugs, watcher alerts, deploy journal, and open Crucible smelts.
MCP tool (companion)	`forge_master_observe`	Exposed by the standalone `pforge-master` MCP server (not `pforge-mcp`). Lifecycle for the long-running observer loop. Actions: `start | stop | status`. Observer is enabled in defaults since Phase-43 but still gated by an explicit `start` call.
POST	`/api/forge-master/chat`	Start a chat session (or continue an existing one with `sessionId`). Returns `{ sessionId, ... }`. Pair with the SSE stream below to receive incremental tokens.
GET	`/api/forge-master/chat/:sessionId/stream`	Server-Sent Events stream for the session. Emits `classification`, `quorum-estimate` (if advisory triggers), `tool-call`, `tool-result`, `delta` (token chunks), `done`.
POST	`/api/forge-master/chat/:sessionId/approve`	Resolve a pending approval prompt mid-stream (used by quorum-estimate cancel, gated tool calls).
GET	`/api/forge-master/session/:sessionId`	Last ~10 turns for the session, for transcript replay.
GET	`/api/forge-master/sessions`	Recent sessions list.
GET	`/api/forge-master/prompts`	Prompt catalog used by the Studio sidebar.
GET	`/api/forge-master/capabilities`	Server capabilities snapshot (models, tier, advisory mode).
GET	`/api/forge-master/cache-stats`	Embedding cache liveliness: `{ size, hitRate, maxSize: 500 }`. Use as a health probe.
GET / PUT	`/api/forge-master/prefs`	Read / write per-project Forge-Master preferences. Schema: `{ tier, autoEscalate, quorumAdvisory, embeddingFallback }`. Phase-43 default flip: `autoEscalate: true`, `quorumAdvisory: "auto"`. `GET` returns current values; `PUT` writes to `.forge/fm-prefs.json`.

Configuration

Forge-Master config lives under forgeMaster in .forge.json. All fields are optional; sensible defaults apply. Phase-43 flipped four defaults to the CTO-friendly settings: observer enabled, L3 memory enabled, auto-escalate on, quorum advisory auto.

{
  "forgeMaster": {
    "reasoningModel": "claude-opus-4.7",       // model used for replies in advisory lane
    "routerModel": "grok-3-mini",              // model used by stage-2 intent classifier
    "quorumAdvisory": "auto",                  // "off" | "auto" | "always"  (default: "auto" since Phase-43)
    "autoEscalate": true,                      // bump hard questions to high tier  (default: true since Phase-43)
    "embeddingFallback": true,                 // enable stage 1.5 embedding cache
    "l3Enabled": true,                         // OpenBrain memory (graceful degradation if unreachable)  (default: true since Phase-43)
    "observer": { "enabled": true },           // long-running observer loop, still gated by explicit start  (default: true since Phase-43)
    "philosophy": "+ @baseline",               // Phase-43 alias: append UNIVERSAL_BASELINE to project principles (use "@baseline" to replace)
    "discoverExtensionTools": true,            // allow extension-supplied tools to register
    "providers": {
      "githubCopilot": { "model": "gpt-4o" }   // GitHub Models override (zero-key path)
    }
  }
}

Field	Default	What it controls
`reasoningModel`	`model.default` (or `gpt-4o-mini`)	Model used to compose replies in advisory lane. Falls back to `.forge.json`'s top-level `model.default`.
`routerModel`	`grok-3-mini`	Stage-2 intent classifier model. Cheap by design: it's classifying, not reasoning.
`quorumAdvisory`	`"auto"` (Phase-43)	Enables Quorum Advisory Mode in the `advisory` lane. Was `"off"` pre-Phase-43.
`autoEscalate`	`true` (Phase-43)	Auto-bump hard or ambiguous prompts to the high tier. Required input for `quorumAdvisory: "auto"` to fire. Was `false` pre-Phase-43.
`l3Enabled`	`true` (Phase-43)	Pull recall context from OpenBrain (Postgres + pgvector) when available. Gracefully degrades if OpenBrain isn't configured. Was `false` pre-Phase-43.
`observer.enabled`	`true` (Phase-43)	Allow the long-running observer loop to run. Still gated by an explicit `forge_master_observe({action:"start"})` call — default just removes the opt-in config requirement. Was `false` pre-Phase-43.
`philosophy`	`null` (Phase-43 adds `"+ @baseline"` / `"@baseline"` aliases)	System-prompt augmentation. `"@baseline"` uses Plan Forge's `UNIVERSAL_BASELINE` in place of your project principles. `"+ @baseline"` appends the baseline on top of your principles.
`embeddingFallback`	`true`	Enables the stage 1.5 embedding cache. Disable to force every cache-miss to the router LLM.
`discoverExtensionTools`	`true`	Allow extensions in `extensions/` to register tools that Forge-Master can call.
`providers.githubCopilot.model`	`gpt-4o-mini`	Model used when routing through GitHub Models (zero-key path with `gh auth login`).
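
The defaults in the table can be read as a merge over whatever the project sets. The sketch below shows one plausible resolution order under stated assumptions: the function names are hypothetical, the `providers` block is left out, the "\n\n" joiner for `"+ @baseline"` is a guess, and an unrecognized `philosophy` value is treated as no augmentation because its behaviour is not documented here.

```javascript
// Defaults copied from the field table (Phase-43 values).
const FORGE_MASTER_DEFAULTS = {
  routerModel: "grok-3-mini",
  quorumAdvisory: "auto",
  autoEscalate: true,
  l3Enabled: true,
  observer: { enabled: true },
  philosophy: null,
  embeddingFallback: true,
  discoverExtensionTools: true,
};

// forgeJson is the parsed content of .forge.json.
function resolveForgeMasterConfig(forgeJson = {}) {
  const user = forgeJson.forgeMaster ?? {};
  return {
    ...FORGE_MASTER_DEFAULTS,
    ...user,
    // Merge the nested observer object instead of replacing it wholesale.
    observer: { ...FORGE_MASTER_DEFAULTS.observer, ...user.observer },
    // reasoningModel falls back to top-level model.default, then gpt-4o-mini.
    reasoningModel:
      user.reasoningModel ?? forgeJson.model?.default ?? "gpt-4o-mini",
  };
}

// Resolve the philosophy alias into the principles text sent to the model.
function resolvePhilosophy(setting, projectPrinciples, universalBaseline) {
  if (setting === "@baseline") return universalBaseline; // replace
  if (setting === "+ @baseline") {
    return `${projectPrinciples}\n\n${universalBaseline}`; // append (joiner assumed)
  }
  return projectPrinciples; // null or unrecognized: no augmentation in this sketch
}
```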

Lane-Specific Prompt Overlays (Phase-43)

The reasoning model's system prompt is composed per-turn as base + "\n\n" + overlay where overlay is loaded from pforge-master/src/prompts/<lane>.md after intent classification. Lanes without an overlay use the base prompt alone. Three overlays ship today:

Lane	Overlay file	Voice / contract
`advisory`	`advisory-cto.md`	CTO voice. Reads receipts (drift, cost, alerts, deploys) before giving a verdict. Ranks risks. Caps recommendations at ≤3 actions. Refuses to recommend without evidence.
`build`	`build-interviewer.md`	Funnels every “add X / change Y” idea into a Crucible interview. Calls `forge_crucible_submit` → `forge_crucible_ask` → `forge_crucible_preview` rather than writing prose.
`troubleshoot`	`troubleshoot-sre.md`	SRE evidence discipline. Pulls `forge_watch_live` + `forge_bug_list` before hypothesizing root cause. Never proposes a fix without naming the failing slice / gate.

Authoring a new overlay. Drop a <lane>.md file under pforge-master/src/prompts/ and register it in LANE_OVERLAYS in pforge-master/src/reasoning.mjs. Keep overlays short (<30 lines) — long overlays push out the model's context budget for actual reasoning.

Zero-key setup

The recommended setup path requires no API keys: run gh auth login once and Forge-Master auto-detects your GitHub token, then routes through GitHub Models. GitHub Copilot subscribers get this for free.

Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or XAI_API_KEY only if you want to override the default with a premium model directly. The dashboard Settings → API Keys tab is the GUI equivalent.

Embedding Cache Internals

The stage 1.5 cache is small, opinionated, and zero-config:

Persistence: .forge/fm-sessions/embedding-cache.bin (binary Float32Array) plus a JSON metadata sidecar
Capacity: 500 entries, LRU eviction
Hit threshold: cosine similarity ≥ 0.85 returns cached classification with via: "embedding-cache"
Provider: all-MiniLM-L6-v2 via @xenova/transformers (lazy-loaded). When the package isn't installed, the cache uses a deterministic 32-bit hash bag-of-words baseline (hash-bag). Both produce 384-dim vectors that are L2-normalized.
Write-through: every successful router-LLM classification is asynchronously written to the cache
Liveliness probe: GET /api/forge-master/cache-stats returns { size, hitRate, maxSize: 500 }

Disable for testing: Set embeddingFallback: false in prefs to force every cache-miss to the router LLM. Useful when you're tuning intent vocabularies and want to measure raw stage-2 behavior.
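
To show how the pieces above fit together, here is a minimal in-memory sketch of a hash bag-of-words embedding with an LRU cache and the 0.85 cosine threshold. It is an approximation, not the shipped cache: the FNV-1a hash, the tokenizer, and the class and method names are assumptions (the source only specifies a deterministic 32-bit hash, 384 dimensions, L2 normalization, 500 entries, and LRU eviction), and persistence to disk is omitted.

```javascript
const DIMS = 384;
const CAPACITY = 500;
const HIT_THRESHOLD = 0.85;

// FNV-1a over UTF-16 code units. The choice of hash is an assumption.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

// Bag-of-words: each token increments one bucket, then L2-normalize.
function embed(text) {
  const v = new Float32Array(DIMS);
  for (const token of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    v[fnv1a(token) % DIMS] += 1;
  }
  const norm = Math.hypot(...v);
  if (norm > 0) for (let i = 0; i < DIMS; i++) v[i] /= norm;
  return v;
}

// For L2-normalized vectors, cosine similarity is just the dot product.
function cosine(a, b) {
  let dot = 0;
  for (let i = 0; i < DIMS; i++) dot += a[i] * b[i];
  return dot;
}

class EmbeddingCache {
  constructor(capacity = CAPACITY) {
    this.capacity = capacity;
    this.entries = new Map(); // Map keeps insertion order: oldest entry first
  }

  // Return the best cached classification at or above the threshold, else null.
  lookup(prompt) {
    const query = embed(prompt);
    let best = null;
    for (const [key, entry] of this.entries) {
      const score = cosine(query, entry.vector);
      if (score >= HIT_THRESHOLD && (!best || score > best.score)) {
        best = { key, entry, score };
      }
    }
    if (!best) return null;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(best.key);
    this.entries.set(best.key, best.entry);
    return { lane: best.entry.lane, via: "embedding-cache" };
  }

  // Write-through after a router-LLM classification; evicts the LRU entry when full.
  write(prompt, lane) {
    this.entries.delete(prompt);
    if (this.entries.size >= this.capacity) {
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(prompt, { lane, vector: embed(prompt) });
  }
}
```

A linear scan is adequate at this size: 500 entries of 384 floats is under a megabyte, so an approximate-nearest-neighbour index would add complexity without a measurable benefit.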

Dashboard Studio Tab

Open localhost:3100/dashboard → Studio. Three panels:

Prompt gallery: pre-built prompts grouped by intent (operational, troubleshoot, advisory). Click to populate the chat box; edit before sending.
Streaming chat: sends POST /api/forge-master/chat to start, then subscribes to GET /api/forge-master/chat/:sessionId/stream. Shows a live classification badge, the tool-call trace as each tool fires, and the reply streaming token by token.
Embedding cache tile: shows current size, hit rate, and the LRU capacity (500). When the cache is cold, the tile shows a hint that hits will start once you've used Forge-Master a few times.

Forge-Master turns also surface in the unified Timeline tab as fm-turn events (added v2.82). Each turn carries the lane, the user message (truncated to 200 chars), and the turn number, which is useful for retrospectives.

Figure: Dashboard Forge-Master Studio tab showing the prompt gallery, streaming chat with intent classification badge, and the embedding cache liveliness tile.

CLI

Command	What it does
`pforge forge-master status`	Health check: server up, cache loaded, last classification
`pforge forge-master logs [--tail N]`	Tail recent turns from `.forge/fm-sessions/*.jsonl`
`pforge audit` Phase-43	Run a one-shot CTO audit via `forge_master_audit`. Returns the same bounded report (summary, top-3 risks, actions, cost note).
`pforge audit --since 7d --tier high`	Window the audit (default 7d) and override the reasoning tier (default `high` for audits).
`pforge audit --schedule weekly`	Print copy-paste Windows `schtasks` and Linux `crontab` snippets to run the audit on a cadence (daily / weekly).
`pforge audit --on-incident`	Print PagerDuty webhook and GitHub Actions wire-up guidance so an incident can trigger an automatic audit.
`pforge audit export ...`	Legacy subcommand preserved for compatibility — wraps `forge_audit_export` for streaming events.log records.

Troubleshooting

Replies say "I can't help with that" for a question I think is reasonable: Likely classified as offtopic. Check the via field in the response: if it says "keyword", the keyword scorer didn't match any other lane. Rephrase using one of the keyword-rich phrasings ("status of …", "why did … fail", "should we …"), or wait until the embedding cache warms up.
Quorum advisory fires on every prompt and the spend is too high: Since Phase-43 the default is quorumAdvisory: "auto" and autoEscalate: true, so any advisory-lane prompt that auto-escalates to the high tier will fire quorum. To dial it back, set quorumAdvisory: "off" in .forge.json → forgeMaster, or set autoEscalate: false via PUT /api/forge-master/prefs. Operational/troubleshoot/build lanes are still hard-blocked regardless of mode.
Cache hit rate is stuck at 0%: Three possible causes: (1) the cache is fresh and hasn't seen similar prompts yet, so give it 10–20 turns; (2) @xenova/transformers isn't installed and the hash-bag fallback isn't matching well, so install the peer dep for better embeddings; (3) embeddingFallback: false in prefs disables the stage entirely.
"NO_REASONING_MODEL" error: No reasoning model configured and no API key found. Either run gh auth login (zero-key path), set ANTHROPIC_API_KEY / OPENAI_API_KEY / XAI_API_KEY, or set forgeMaster.reasoningModel in .forge.json.
Router model classifying everything as offtopic: The router model may be too small for your prompt style. Try bumping routerModel from grok-3-mini to grok-4 or gpt-4o-mini. The router runs once per prompt, so small models are usually fine, but quirky vocabularies sometimes need more capability.
