Deep Dive · Cross-cutting

Forge-Master

A read-only reasoning orchestrator with its own dashboard tab. Classifies intent, pulls OpenBrain memory, and chains read-only forge tools on your behalf, so you can ask open-ended questions instead of wiring tool calls by hand.

Introduced in the Phase-28 MVP series. Subsequent phases added quorum advisory mode (Phase-38.7), embedding cache fallback (Phase-38.8), and unified-timeline integration. Forge-Master is read-only by design: it never writes code or files. Use it to think; use the rest of the forge to do.

Why a Reasoning Orchestrator?

Plan Forge has 102 MCP tools. Most of the time you know which one you need. Sometimes you don't, because the question is open-ended:

  • "Why did Phase-27 Slice 4 fail?" → needs forge_watch_live + brain_recall + forge_bug_list
  • "Pick up the thread from yesterday's auth work" → needs memory recall + forge_status + forge_plan_status
  • "Should we add caching to the user lookup?" → needs forge_search + brain_recall for prior decisions + maybe forge_diagnose

Chaining the right tools by hand is slow and easy to get wrong. Forge-Master is the front door: one prompt in, one synthesized answer out. Behind the scenes it classifies your intent, pulls relevant memory, and orchestrates whatever read-only tools fit.

Read-only is a feature, not a limitation. Forge-Master cannot edit your code, change .forge.json, or finalize a smelt. That guarantee is what makes it safe to ask anything at any time. When the answer requires a write, Forge-Master tells you the exact tool to call yourself.

Three Access Surfaces

Surface | Best for | Where
Studio tab | Interactive exploration with prompt gallery, streaming chat, live tool-call trace | localhost:3100/dashboard → Studio
forge_master_ask MCP tool | Agents that want one-shot reasoning embedded in a larger conversation | Any MCP-compatible client (Copilot, Claude Code, Cursor, Codex, Windsurf)
pforge forge-master status|logs | Scripts, CI checks, health probes | CLI

The forge_master_ask tool

The MCP tool is a one-shot entry-point:

forge_master_ask {
  message: "Why did Phase-27 Slice 4 fail?"
}
→ {
  ok: true,
  lane: "troubleshoot",
  via: "router-llm",          // or "keyword" / "embedding-cache"
  toolCalls: [
    { name: "forge_watch_live", args: { phase: "27", slice: 4 } },
    { name: "brain_recall",     args: { query: "Phase-27 slice 4 failures" } }
  ],
  reply: "The slice failed because…",
  costUSD: 0.0023
}
copilot-instructions.md guidance: "Prefer forge_master_ask over manually calling individual forge tools when the task is open-ended or involves multiple steps. Don't use it for direct file edits, Forge-Master is read-only."
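
As a rough illustration of how a calling agent might consume that response, the sketch below renders the documented fields (ok, lane, via, toolCalls, reply, costUSD) as a short summary. The function name summarize_reply is hypothetical, and the failure branch assumes an error response carries a code field, as the quorum failure response does.

```python
def summarize_reply(response: dict) -> str:
    """Render a forge_master_ask response as a short, human-readable summary."""
    if not response.get("ok"):
        # Assumption: failures carry a "code" field, as QUORUM_ALL_FAILED does.
        return f"forge_master_ask failed: {response.get('code', 'unknown error')}"

    # List the read-only tools that were chained for this turn.
    tools = ", ".join(call["name"] for call in response.get("toolCalls", [])) or "none"
    header = (
        f"[{response['lane']} via {response['via']}] "
        f"tools: {tools}; cost ${response['costUSD']:.4f}"
    )
    return f"{header}\n{response['reply']}"
```

Keeping lane and via in the summary makes it easy to spot when a prompt was routed somewhere unexpected.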

Three-Stage Intent Classifier

Every prompt is classified into a lane before tools are dispatched. The classifier runs three stages in order, falling through only when the prior stage didn't match confidently. This keeps the common case free (keyword) and the edge case smart (router LLM).

[Figure: Three-stage intent classifier flow. The user prompt enters keyword scoring (stage 1, $0); on no keyword hit it falls through to the embedding cache (stage 1.5, $0, cosine ≥ 0.85); on a cache miss it falls through to the grok-3-mini router LLM (stage 2, ~$0.0002). Every stage produces a lane classification, with a via field tagging which stage answered. Successful router-LLM classifications are written through to the cache.]
Stage 1
Keyword scoring

Fast regex/keyword match against per-lane vocabularies. Zero API cost. Returns immediately if confidence is high. Covers the bulk of operational prompts ("open bugs", "failing gate", "scope contract violation", etc.).

Stage 1.5
Embedding cache

Cosine-similarity match (≥ 0.85) against previously-classified prompts. Zero API cost on hit. Uses all-MiniLM-L6-v2 via @xenova/transformers (lazy-loaded peer dep), or a deterministic hash bag-of-words fallback when the package isn't installed. Works fully offline once warm.

Stage 2
Router LLM

Default model: grok-3-mini. Used for ambiguous prompts the cache hasn't seen. Every successful classification is then written through to the cache, so the next similar prompt skips this stage entirely.

Each successful turn carries a via field telling you which stage answered: "keyword", "embedding-cache", or "router-llm". The dashboard's Forge-Master tab summarizes the distribution as {keyword, embedding, router} percentages.
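
The fall-through order can be sketched in a few lines. This is a minimal illustration, not Forge-Master's implementation: the regex vocabularies are invented, "high confidence" is approximated as "exactly one lane matched", the cache stand-in uses exact matching on normalized text instead of cosine similarity, and the write-through is synchronous here rather than asynchronous.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Classification:
    lane: str
    via: str  # "keyword" | "embedding-cache" | "router-llm"


# Hypothetical per-lane vocabularies; the real ones are not listed in this chapter.
LANE_PATTERNS = {
    "operational": re.compile(r"\b(status|open bugs|cost)\b", re.I),
    "troubleshoot": re.compile(r"\b(fail(ed|ing)?|error|why did)\b", re.I),
    "advisory": re.compile(r"\b(should we|trade-off|which approach)\b", re.I),
}


def keyword_stage(prompt: str) -> Optional[str]:
    """Stage 1: return a lane only when exactly one vocabulary matches."""
    hits = [lane for lane, pattern in LANE_PATTERNS.items() if pattern.search(prompt)]
    return hits[0] if len(hits) == 1 else None  # no hit or ambiguous: fall through


class ExactMatchCache:
    """Stand-in for stage 1.5: exact match on normalized text, not cosine similarity."""

    def __init__(self) -> None:
        self._lanes = {}

    @staticmethod
    def _key(prompt: str) -> str:
        return " ".join(prompt.lower().split())

    def lookup(self, prompt: str) -> Optional[str]:
        return self._lanes.get(self._key(prompt))

    def store(self, prompt: str, lane: str) -> None:
        self._lanes[self._key(prompt)] = lane


def classify(prompt: str, cache: ExactMatchCache, router: Callable[[str], str]) -> Classification:
    lane = keyword_stage(prompt)
    if lane is not None:
        return Classification(lane, "keyword")
    lane = cache.lookup(prompt)
    if lane is not None:
        return Classification(lane, "embedding-cache")
    lane = router(prompt)      # stage 2: the only stage with an API cost
    cache.store(prompt, lane)  # write-through so the next similar prompt skips stage 2
    return Classification(lane, "router-llm")
```

Passing the router in as a callable keeps the paid stage easy to stub out when you only want to exercise the free stages.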

The Lanes

Forge-Master classifies into one of these lanes. Each lane has a different default tool allowlist:

Lane | Use case | Quorum-eligible?
operational | Status queries, run lookups, "what's happening"; reads runs, plan status, costs | No (hard-blocked)
troubleshoot | Failure diagnosis; reads logs, watch-live, bugs, traces | No (hard-blocked)
build | "How would I build X"; reads patterns, runbooks, prior plans | No (hard-blocked)
advisory | Open-ended judgment calls: "should we…", "which approach…", "what's the trade-off…" | Yes (default escalation target for quorum advisory)
offtopic | Catch-all when nothing else matches; routed to a polite fallback reply | No

Quorum Advisory Mode v2.78+

For high-stakes decisions in the advisory lane, Forge-Master can fan the prompt out to 2–3 models in parallel and return all replies plus a dissent summary. The human picks the reply. There is no auto-winner selection, because the whole point is to surface disagreement.

Not the same as Quorum Mode. Quorum Advisory (this section) is per-prompt, human-picks-the-winner, scoped to advisory-lane Forge-Master prompts. Quorum Mode is per-slice, reviewer-synthesizes, scoped to pforge run-plan execution. See the side-by-side comparison in Chapter 14 for when to use which.

Activation

Set quorumAdvisory under forgeMaster in .forge.json:

Mode | When quorum fires
"off" (default) | Never. Single-model reply only.
"auto" | Lane is advisory AND prompt was auto-escalated to the high tier AND classifier confidence is medium or above. The conservative trigger.
"always" | Every advisory-lane prompt fires quorum. Highest spend, highest signal.

Quorum is hard-blocked on operational, troubleshoot, and build lanes. Even with "always", those lanes get a single-model reply. Quorum is for judgment, not for lookups.
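
The activation rules reduce to a small predicate. The sketch below is an illustration under assumptions: the parameter names (auto_escalated, tier, confidence) and the low/medium/high confidence scale are hypothetical stand-ins for whatever the classifier actually reports.

```python
# Assumed ordering of classifier confidence levels (not confirmed by this chapter).
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def should_fire_quorum(
    mode: str,
    lane: str,
    auto_escalated: bool = False,
    tier: str = "",
    confidence: str = "low",
) -> bool:
    """Return True when an advisory prompt should fan out to multiple models."""
    if lane != "advisory":
        # operational, troubleshoot, and build are hard-blocked; offtopic is not eligible.
        return False
    if mode == "always":
        return True
    if mode == "auto":
        confident_enough = CONFIDENCE_RANK.get(confidence, 0) >= CONFIDENCE_RANK["medium"]
        return bool(auto_escalated and tier == "high" and confident_enough)
    return False  # "off" or any unrecognized mode
```

Checking the lane first means the hard block holds no matter what the mode is set to.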

Cost preview before dispatch

Before any model is called, the GET /api/forge-master/chat/:sessionId/stream endpoint emits a quorum-estimate SSE event with the projected cost. Studio displays this and lets you cancel before spending. Programmatic clients should listen for the event:

data: {"type":"quorum-estimate","models":3,"estimatedUSD":0.0142,"models":[
  {"name":"claude-opus-4.7","estUSD":0.0061},
  {"name":"gpt-5.3-codex","estUSD":0.0048},
  {"name":"grok-4.20","estUSD":0.0033}
]}
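
A programmatic client could gate dispatch on that estimate. The sketch below assumes the event arrives as a single data: line and that the budget check happens client-side; the function names and the max_usd parameter are hypothetical. The payload above repeats the models key, and Python's json module keeps the last value for a repeated key, so the sketch reads models as the per-model array.

```python
import json
from typing import Optional


def parse_sse_data(line: str) -> Optional[dict]:
    """Decode one SSE 'data:' line into a dict; return None for any other line."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def quorum_within_budget(event: dict, max_usd: float) -> bool:
    """Decide whether a quorum-estimate event fits a caller-chosen budget."""
    if event.get("type") != "quorum-estimate":
        raise ValueError("not a quorum-estimate event")
    models = event.get("models")
    per_model = models if isinstance(models, list) else []
    # Prefer the server's total; fall back to summing the per-model estimates.
    total = event.get("estimatedUSD", sum(m["estUSD"] for m in per_model))
    return total <= max_usd
```

When the check fails, the client would cancel through the approval endpoint; the request body for that call is not documented here, so it is left out of the sketch.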

Dissent extraction

After all replies arrive, Forge-Master runs a keyword-frequency divergence analysis across the reply texts and emits a dissent: { topic, axis } summary. Topic is what the models disagreed about; axis is the dimension of disagreement (timing, scope, model choice, etc.). The dashboard renders this as a one-line summary above the three replies so you can see the disagreement before reading.
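
To make "keyword-frequency divergence" concrete, here is one naive way to score it: compute each term's relative frequency per reply and rank terms by how far that frequency spreads across replies. This is a loose illustration, not Forge-Master's algorithm. The stopword list and the spread metric are assumptions, and the sketch only approximates the topic half of the summary; it makes no attempt to label the axis.

```python
import re
from collections import Counter
from typing import Dict, List

# Small illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "we", "it", "is", "in", "for"}


def term_frequencies(text: str) -> Dict[str, float]:
    """Relative frequency of each non-stopword term in one reply."""
    words = [w for w in re.findall(r"[a-z][a-z-]+", text.lower()) if w not in STOPWORDS]
    total = len(words) or 1
    return {word: count / total for word, count in Counter(words).items()}


def most_divergent_terms(replies: List[str], top_n: int = 3) -> List[str]:
    """Rank terms by the gap between their highest and lowest per-reply frequency."""
    freqs = [term_frequencies(reply) for reply in replies]
    vocab = set().union(*freqs)

    def spread(term: str) -> float:
        values = [f.get(term, 0.0) for f in freqs]
        return max(values) - min(values)

    # Largest spread first; ties broken alphabetically so the output is deterministic.
    return sorted(vocab, key=lambda term: (-spread(term), term))[:top_n]
```

A term that one model leans on heavily and the others never mention rises to the top, which is a rough proxy for what the replies disagree about.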

Partial failure

Quorum dispatch uses Promise.allSettled with a 60s hard timeout per model. If 1 of 3 fails or times out, the remaining replies are returned with a partial: true flag. If all fail, the response is { ok: false, code: "QUORUM_ALL_FAILED" }.
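
The same settle-everything pattern can be sketched in Python, where asyncio.gather with return_exceptions=True plays the role of Promise.allSettled and asyncio.wait_for supplies the per-model timeout. The dispatch_quorum name, the replies list, and its model/reply keys are assumptions for illustration; only partial, ok, and QUORUM_ALL_FAILED come from the behaviour described above.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def dispatch_quorum(
    prompt: str,
    models: Dict[str, Callable[[str], Awaitable[str]]],
    timeout_s: float = 60.0,
) -> dict:
    """Fan a prompt out to every model and keep whichever replies succeed."""
    names = list(models)
    # return_exceptions=True lets one failure or timeout settle without cancelling the rest.
    results = await asyncio.gather(
        *(asyncio.wait_for(models[name](prompt), timeout=timeout_s) for name in names),
        return_exceptions=True,
    )
    replies = [
        {"model": name, "reply": result}
        for name, result in zip(names, results)
        if not isinstance(result, BaseException)
    ]
    if not replies:
        return {"ok": False, "code": "QUORUM_ALL_FAILED"}
    response = {"ok": True, "replies": replies}
    if len(replies) < len(names):
        response["partial"] = True  # at least one model failed or timed out
    return response
```

Because each call has its own timeout, one slow model cannot hold back the replies that have already arrived for longer than the limit.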

REST API + MCP Tool

Method | Endpoint / tool | Description
MCP tool | forge_master_ask | One-shot reasoning. Accepts { message, sessionId? }; returns lane, via, toolCalls[], reply, costUSD.
POST | /api/forge-master/chat | Start a chat session (or continue an existing one with sessionId). Returns { sessionId, ... }. Pair with the SSE stream below to receive incremental tokens.
GET | /api/forge-master/chat/:sessionId/stream | Server-Sent Events stream for the session. Emits classification, quorum-estimate (if advisory triggers), tool-call, tool-result, delta (token chunks), done.
POST | /api/forge-master/chat/:sessionId/approve | Resolve a pending approval prompt mid-stream (used by quorum-estimate cancel, gated tool calls).
GET | /api/forge-master/session/:sessionId | Last ~10 turns for the session, for transcript replay.
GET | /api/forge-master/sessions | Recent sessions list.
GET | /api/forge-master/prompts | Prompt catalog used by the Studio sidebar.
GET | /api/forge-master/capabilities | Server capabilities snapshot (models, tier, advisory mode).
GET | /api/forge-master/cache-stats | Embedding cache liveliness: { size, hitRate, maxSize: 500 }. Use as a health probe.
GET / PUT | /api/forge-master/prefs | Read / write per-project Forge-Master preferences. Schema: { tier, autoEscalate, quorumAdvisory, embeddingFallback }. GET returns current values; PUT writes to .forge/fm-prefs.json.

Configuration

Forge-Master config lives under forgeMaster in .forge.json. All fields are optional; sensible defaults apply:

{
  "forgeMaster": {
    "reasoningModel": "claude-opus-4.6",       // model used for replies in advisory lane
    "routerModel": "grok-3-mini",              // model used by stage-2 intent classifier
    "quorumAdvisory": "auto",                  // "off" | "auto" | "always"
    "embeddingFallback": true,                 // enable stage 1.5 embedding cache
    "discoverExtensionTools": true,            // allow extension-supplied tools to register
    "providers": {
      "githubCopilot": { "model": "gpt-4o" }   // GitHub Models override (zero-key path)
    }
  }
}
Field | Default | What it controls
reasoningModel | model.default (or gpt-4o-mini) | Model used to compose replies in advisory lane. Falls back to .forge.json's top-level model.default.
routerModel | grok-3-mini | Stage-2 intent classifier model. Cheap by design: it's classifying, not reasoning.
quorumAdvisory | "off" | Enables Quorum Advisory Mode in the advisory lane.
embeddingFallback | true | Enables the stage 1.5 embedding cache. Disable to force every cache-miss to the router LLM.
discoverExtensionTools | true | Allow extensions in extensions/ to register tools that Forge-Master can call.
providers.githubCopilot.model | gpt-4o-mini | Model used when routing through GitHub Models (zero-key path with gh auth login).
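
The defaults in the table can be read as a simple resolution step. The sketch below is an illustration, not the actual loader: resolve_forge_master_config is a hypothetical name, and it assumes the top-level model.default is stored as a nested { "model": { "default": ... } } object in the parsed .forge.json.

```python
def resolve_forge_master_config(forge_json: dict) -> dict:
    """Apply the documented defaults to a parsed .forge.json dict."""
    fm = forge_json.get("forgeMaster", {})
    # Assumption: the top-level model.default is nested as {"model": {"default": ...}}.
    top_level_default = forge_json.get("model", {}).get("default")
    copilot = fm.get("providers", {}).get("githubCopilot", {})
    return {
        # An explicit value wins, then the project-wide default, then gpt-4o-mini.
        "reasoningModel": fm.get("reasoningModel") or top_level_default or "gpt-4o-mini",
        "routerModel": fm.get("routerModel", "grok-3-mini"),
        "quorumAdvisory": fm.get("quorumAdvisory", "off"),
        "embeddingFallback": fm.get("embeddingFallback", True),
        "discoverExtensionTools": fm.get("discoverExtensionTools", True),
        "providers": {"githubCopilot": {"model": copilot.get("model", "gpt-4o-mini")}},
    }
```

With an empty config, every field falls back to the default column above, which is why a project with no forgeMaster block still gets a working setup.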

Zero-key setup

The recommended setup path requires no API keys: run gh auth login once and Forge-Master auto-detects your GitHub token, then routes through GitHub Models. GitHub Copilot subscribers get this for free.

Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or XAI_API_KEY only if you want to override the default with a premium model directly. The dashboard Settings → API Keys tab is the GUI equivalent.

Embedding Cache Internals

The stage 1.5 cache is small, opinionated, and zero-config:

  • Persistence: .forge/fm-sessions/embedding-cache.bin (binary Float32Array) plus a JSON metadata sidecar
  • Capacity: 500 entries, LRU eviction
  • Hit threshold: cosine similarity ≥ 0.85 returns cached classification with via: "embedding-cache"
  • Provider: all-MiniLM-L6-v2 via @xenova/transformers (lazy-loaded). When the package isn't installed, the cache uses a deterministic 32-bit hash bag-of-words baseline (hash-bag). Both produce 384-dim vectors that are L2-normalized.
  • Write-through: every successful router-LLM classification is asynchronously written to the cache
  • Liveliness probe: GET /api/forge-master/cache-stats returns { size, hitRate, maxSize: 500 }
Disable for testing: Set embeddingFallback: false in prefs to force every cache-miss to the router LLM. Useful when you're tuning intent vocabularies and want to measure raw stage-2 behavior.
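
The cache mechanics listed above (384-dim L2-normalized vectors, a 0.85 cosine threshold, 500-entry LRU) can be sketched compactly. This is a minimal in-memory illustration, not the shipped code: it uses zlib.crc32 as the 32-bit hash and whitespace tokenization, both of which are assumptions, and it skips the binary persistence and the MiniLM provider entirely.

```python
import math
import zlib
from collections import OrderedDict
from typing import List, Optional

DIM = 384             # vector size stated above
HIT_THRESHOLD = 0.85  # cosine similarity needed for a cache hit
MAX_ENTRIES = 500     # LRU capacity


def hash_bag_embed(text: str) -> List[float]:
    """Hash bag-of-words: bucket each token by a 32-bit hash, then L2-normalize."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0  # crc32 is an assumed hash
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity because both inputs are L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


class EmbeddingCache:
    def __init__(self, max_entries: int = MAX_ENTRIES) -> None:
        self._entries = OrderedDict()  # prompt -> (vector, lane), oldest first
        self._max = max_entries

    def lookup(self, prompt: str) -> Optional[str]:
        query = hash_bag_embed(prompt)
        best_key, best_sim = None, 0.0
        for key, (vec, _lane) in self._entries.items():
            sim = cosine(query, vec)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is None or best_sim < HIT_THRESHOLD:
            return None
        self._entries.move_to_end(best_key)  # a hit refreshes the entry's recency
        return self._entries[best_key][1]

    def store(self, prompt: str, lane: str) -> None:
        self._entries[prompt] = (hash_bag_embed(prompt), lane)
        self._entries.move_to_end(prompt)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

A linear scan is fine at this size: 500 entries of 384 floats is small enough that an index would add complexity without a noticeable speedup.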

Dashboard Studio Tab

Open localhost:3100/dashboard → Studio. Three panels:

  • Prompt gallery: pre-built prompts grouped by intent (operational, troubleshoot, advisory). Click to populate the chat box; edit before sending.
  • Streaming chat: sends POST /api/forge-master/chat to start, then subscribes to GET /api/forge-master/chat/:sessionId/stream. Shows a live classification badge, the tool-call trace as each tool fires, and the reply streaming token by token.
  • Embedding cache tile: shows current size, hit rate, and the LRU capacity (500). When the cache is cold, the tile shows a hint that hits will start once you've used Forge-Master a few times.

Forge-Master turns also surface in the unified Timeline tab as fm-turn events (added v2.82). Each turn carries the lane, the user message (truncated to 200 chars), and the turn number, which is useful for retrospectives.

[Screenshot: Dashboard Forge-Master Studio tab showing the prompt gallery, streaming chat with intent classification badge, and the embedding cache liveliness tile]

CLI

Command | What it does
pforge forge-master status | Health check: server up, cache loaded, last classification
pforge forge-master logs [--tail N] | Tail recent turns from .forge/fm-sessions/*.jsonl

Troubleshooting

Replies say "I can't help with that" for a question I think is reasonable
Likely classified as offtopic. Check the via field in the response: if it says "keyword", the keyword scorer didn't match any lane vocabulary. Rephrase using one of the keyword-rich phrasings ("status of …", "why did … fail", "should we …"), or wait until the embedding cache warms up.
Quorum advisory never fires even though I set "auto"
Auto requires all four: lane = advisory, autoEscalated = true, fromTier = high, confidence ≥ medium. Use "always" to remove the gating during testing, then revert. Note that operational/troubleshoot/build lanes are hard-blocked regardless of mode.
Cache hit rate is stuck at 0%
Three causes: (1) the cache is fresh and hasn't seen similar prompts yet, so give it 10–20 turns; (2) @xenova/transformers isn't installed and the hash-bag fallback isn't matching well, so install the peer dep for better embeddings; (3) embeddingFallback: false in prefs disables the stage entirely.
"NO_REASONING_MODEL" error
No reasoning model configured and no API key found. Either run gh auth login (zero-key path), set ANTHROPIC_API_KEY / OPENAI_API_KEY / XAI_API_KEY, or set forgeMaster.reasoningModel in .forge.json.
Router model classifying everything as offtopic
The router model is too small for your prompt style. Try bumping routerModel from grok-3-mini to grok-4 or gpt-4o-mini. The router runs once per prompt and small models are usually fine, but quirky vocabularies sometimes need more capability.

Further Reading