The four stations of the Plan Forge Shop: Smelt crucible, Forge anvil, Guard watchtower, Learn golden brain.
Chapter 2 · Act I, Smelt

How It Works

Tour of the Forge Shop: four stations, the gates between them, and the sessions that keep them honest.

Three terms to know up front. This chapter uses Scope Contract, validation gate, and slice in nearly every paragraph. Plain-English definitions live in the Glossary, with a full treatment in Chapter 5: Crucible. If these terms are new, skim either one first and the rest of this chapter will land more cleanly.

The Four Stations

Plan Forge is not one step; it's a workshop. Every change to your code flows through four stations, each with its own tools, its own artifacts, and its own gate to the next station.

🪨 Station 1 · Smelt · Intake → Scope Contract
🔨 Station 2 · Forge · Contract → Shipped Code
🛡️ Station 3 · Guard · Deploy Defense (LiveGuard)
🧠 Station 4 · Learn · Memory & Retros

The stations are connected by gates. Smelt won't hand the plan to Forge until the Scope Contract is crisp; Forge won't ship code until slice gates are green; Guard won't approve a deploy until secret-scan and env-drift checks are clean; Learn absorbs everything and feeds it back into Smelt for the next plan.
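As a rough illustration of the gate rule above, the sketch below models each hand-off as a set of named checks that must all pass before work moves on. The check names and the `advance` function are hypothetical, chosen to mirror the prose; they are not Plan Forge APIs.

```python
# Hypothetical model of the gates between stations. The check names mirror
# the prose above; none of this is a real Plan Forge API.
STATION_GATES = {
    "Smelt": ["scope_contract_crisp"],                  # before handing the plan to Forge
    "Forge": ["slice_gates_green"],                     # before shipping code
    "Guard": ["secret_scan_clean", "env_drift_clean"],  # before approving a deploy
    "Learn": [],                                        # feeds back into Smelt
}
NEXT_STATION = {"Smelt": "Forge", "Forge": "Guard", "Guard": "Learn", "Learn": "Smelt"}


def advance(station: str, checks: dict) -> str:
    """Return the next station, or stay put if any gate check is not passing."""
    blocked = [name for name in STATION_GATES[station] if not checks.get(name, False)]
    return station if blocked else NEXT_STATION[station]
```

In this sketch a missing check counts as a failing one, so work stays at the current station until every gate is explicitly green.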

🔗 Want the deep-dive? Each station has its own page on the Shop Tour. This chapter zooms out to show how the stations fit together and what happens between them.
🔁 See also: The Inner Loop, an optional reflective layer that adds reflexion retries, trajectories, auto-skill promotion, adaptive gate synthesis, postmortems, cross-project federation, and a reviewer agent. All opt-in, all Dashboard-configurable.

The Loop That Never Ends

Drawn linearly, Plan Forge looks like a 7-step pipeline. Drawn honestly, it's a closed loop. Every failed test, every regression caught by tempering, and every placeholder spotted by a discovery scan re-enters the Smelt station as new ore: it is auto-smelted into a Crucible idea, hardened into a slice, executed, and re-tested. The loop only pauses when there's nothing left to find.

[Figure: the closed loop. Nodes: DISCOVERY (content audit + route crawl + placeholder regex); CRUCIBLE (forge_crucible_submit, agent); HARDEN (Phase-NN plan + Scope Contract); EXECUTE (slice-by-slice + test gates); TEMPERING (forge_tempering_run); BUG REGISTRY (auto-smelt loop, re-enters Smelt). Edge labels: finds, smelts, scans, files.]
⟲ Closed loop · every failed test re-enters Smelt as new ore
Case study: The Loop That Never Ends — How Rummag Auto-Smelts Its Own Website Bugs shows this loop applied to a real production site audit, with a 4-pass discovery harness feeding the Crucible.
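The loop described above behaves like a worklist: findings are queued, processed, and anything the processing uncovers is queued again. The sketch below is one minimal way to picture that, not Plan Forge's actual implementation. `execute_and_test` is a hypothetical stand-in for the smelt, harden, execute, and temper stages, and `max_rounds` is a safety cap added only for the sketch.

```python
from collections import deque
from typing import Callable, Iterable, List


def run_closed_loop(
    initial_findings: Iterable[str],
    execute_and_test: Callable[[str], List[str]],
    max_rounds: int = 100,
) -> List[str]:
    """Process findings until none remain; new failures re-enter the queue.

    `execute_and_test` is hypothetical: it stands in for smelting a finding,
    hardening it, executing it, and re-testing, and it returns whatever new
    findings that work uncovered.
    """
    ore = deque(initial_findings)
    processed: List[str] = []
    while ore and len(processed) < max_rounds:  # cap keeps the sketch finite
        finding = ore.popleft()
        processed.append(finding)
        ore.extend(execute_and_test(finding))   # failures become new ore
    return processed
```

The loop ends when the queue is empty, which matches the "nothing left to find" condition in the text.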

The 7-Step Pipeline (Inside the Forge)

The Forge station, where raw scope becomes shipped code, runs a 7-step pipeline. Steps 0–2 happen in Smelt, steps 3–5 happen in Forge, and step 6 hands off to Guard and Learn.

Step 0: Specify (Smelt · What & why)
Step 1: Pre-flight (Smelt · Verify setup)
Step 2: Harden (Smelt · Scope contract)
Step 3: Execute (Forge · Slice by slice)
Step 4: Sweep (Forge · No TODOs left)
Step 5: Review (Forge · Drift detection)
Step 6: Ship (Guard + Learn)

You describe what you want, and the AI creates a spec (Step 0, Smelt). A pre-flight check verifies your setup (Step 1, Smelt). The plan gets hardened into a binding scope contract with slices, gates, and forbidden actions (Step 2, Smelt); this is when Smelt hands off to Forge. The AI builds it slice by slice, validated at every boundary (Step 3, Forge). A completeness sweep eliminates stubs and TODOs (Step 4, Forge). A fresh session audits everything (Step 5, Forge). The shipper commits, LiveGuard runs its pre-deploy scan (Step 6, Guard), and OpenBrain captures lessons (Step 6, Learn).

Sessions and Why They Matter

Session 1, Plan (Smelt)
Steps 0–2

Specify, verify, harden. Produces the scope contract.

Session 2, Build (Forge)
Steps 3–4

Execute slices, sweep for completeness.

Session 3, Audit (Forge)
Step 5

Fresh context. Independent review.

Session 4, Ship (Guard + Learn)
Step 6

Commit, LiveGuard scan, capture lessons.

The executor shouldn't self-audit; that's like grading your own exam. Each session starts fresh and loads the same guardrails, but brings independent judgment. Session 3 (Audit) has never seen the code being written: it reads the plan, reads the code, and checks for drift. Session 4 is when Guard and Learn take over: LiveGuard does its pre-deploy scan, and OpenBrain writes the lessons.
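To make the step-to-session mapping concrete, here is a small sketch that records which station and session each step belongs to, and builds a session's starting context from guardrails and on-disk artifacts only. The `Step` type and `fresh_context` function are illustrative assumptions, not part of Plan Forge. The point they show is that nothing from an earlier session's transcript carries over.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    station: str
    session: int


# Mapping taken from the pipeline and session descriptions above.
PIPELINE = [
    Step(0, "Specify", "Smelt", 1),
    Step(1, "Pre-flight", "Smelt", 1),
    Step(2, "Harden", "Smelt", 1),
    Step(3, "Execute", "Forge", 2),
    Step(4, "Sweep", "Forge", 2),
    Step(5, "Review", "Forge", 3),
    Step(6, "Ship", "Guard + Learn", 4),
]


def fresh_context(session: int, guardrails: List[str], artifacts: List[str]) -> dict:
    """Hypothetical: a session starts with shared guardrails and artifacts
    (plan, code) but no transcript from any earlier session."""
    return {
        "session": session,
        "steps": [s.name for s in PIPELINE if s.session == session],
        "guardrails": list(guardrails),
        "artifacts": list(artifacts),
        "prior_transcript": None,  # isolation: nothing inherited
    }
```

For example, `fresh_context(3, guardrails, ["docs/plans/Phase-1-AUTH-PLAN.md", "src/auth/"])` would give the audit session the plan and the code, and nothing about how the code came to be.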

Nested subagents: Within a session, agents can spawn sub-agents for complex tasks; the architecture reviewer can call the security reviewer, for example. This happens automatically; you don't need to configure it.

Why Session Isolation Works

The grading-your-own-exam analogy above is the short version. Three concrete mechanisms make session isolation a structural requirement rather than a stylistic preference:

1. Sunk-cost bias is a property of the context window

The session that wrote the code will defend it. This is not because the model is stubborn, but because the bad code and the proposed fix live in the same token sequence. The model's belief that the code is correct is encoded in the same context that produced it; the model literally cannot evaluate the code from a position of "I have not seen this before." A fresh session reads the same code without any prior commitment to it.

2. Context contamination clouds review judgment

Build sessions accumulate context as they work: rejected approaches, half-considered alternatives, partial refactors. By the time the session finishes, its reasoning is shaped by paths it considered but didn't take. A reviewer in the same session inherits all of that as background noise. A reviewer in a fresh session sees only the final code, against the original plan, with no memory of the rabbit holes.

3. Fresh-context reviews catch blind spots the build session is structurally unable to see

Some bugs are only visible from outside the build session's mental model. A naming inconsistency, a forgotten edge case, or an architectural violation that the build session rationalized in the moment surfaces immediately to a reviewer that didn't participate in the rationalization. The build session is not lying; it cannot see what is invisible from inside its own context.

The 4-session model is not optional polish. Combined feedback from production runs (see the Lessons Learned chapter) shows that single-session execute-and-review consistently misses defects that fresh-session review catches in seconds. The cost of running an extra session is roughly the cost of one model invocation; the cost of shipping a missed defect is measured in incidents.

The v2.18 Temper Guards and Warning Signs system codified the failure modes that emerged from this pattern: the specific shortcuts agents take that produce code that compiles but is architecturally broken. Each instruction file now teaches agents not just what to do but why not to skip it. Session isolation is the structural defense; Temper Guards are the named anti-patterns it catches.

Source material: The 80/20 Wall and Guardrails Lessons Learned. The grading-your-own-exam analogy is adapted from Lesson 3.

The File System

After setup, Plan Forge installs five types of files into your .github/ directory, alongside a master copilot-instructions.md:

Project structure after setup
.github/
├── instructions/          ← Rules (auto-load by file type)
│   ├── architecture-principles.instructions.md
│   ├── security.instructions.md
│   ├── testing.instructions.md
│   ├── database.instructions.md
│   └── ... (14–18 files per preset)
├── agents/                ← Reviewer personas (read-only audit)
│   ├── architecture-reviewer.agent.md
│   ├── security-reviewer.agent.md
│   └── ... (12 agents)
├── prompts/               ← Pipeline templates (attach in chat)
│   ├── step0-specify-feature.prompt.md
│   ├── step2-harden-plan.prompt.md
│   └── ... (7 pipeline + scaffolding)
├── skills/                ← Multi-step procedures (slash commands)
│   ├── security-audit/SKILL.md
│   ├── forge-execute/SKILL.md
│   └── ... (11 skills)
├── hooks/                 ← Lifecycle automation
│   ├── sessionStart.sh
│   └── postToolUse.sh
└── copilot-instructions.md  ← Master config file
| File Type | What It Does | Analogy |
|---|---|---|
| Instruction files | Auto-load based on what file you're editing | The rulebook |
| Agent definitions | Specialized reviewers that audit your code | Expert consultants |
| Pipeline prompts | Step-by-step workflow templates | The recipe |
| Skills | Multi-step executable procedures | Power tools |
| Lifecycle hooks | Run automatically at agent lifecycle points | Safety rails |

How Guardrails Auto-Load

Each instruction file has an applyTo pattern in its YAML frontmatter. When you edit a file that matches the pattern, the instruction file loads automatically into the AI's context:

security.instructions.md, frontmatter
---
description: Security best practices
applyTo: "**/auth/**,**/security/**,**/middleware/**"
---

When you open src/auth/token-validator.ts, the security instruction file loads. When you open src/models/User.ts, the database instruction file loads. No manual action is needed; the AI reads the right rules for the right code.
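The matching itself is done by the editor, but the idea can be sketched in a few lines. The example below assumes common glob semantics (`**` spans directories, `*` stays within one path segment) and treats applyTo as a comma-separated list, as in the frontmatter above. The database pattern is an assumption for illustration, since only the security file's frontmatter is shown here.

```python
import re
from typing import Dict, List


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a simple glob into a regex (assumed semantics, see lead-in)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matching_instructions(path: str, rules: Dict[str, str]) -> List[str]:
    """Return the instruction files whose applyTo patterns match `path`."""
    hits = []
    for filename, apply_to in rules.items():
        patterns = [p.strip() for p in apply_to.split(",")]
        if any(glob_to_regex(p).match(path) for p in patterns):
            hits.append(filename)
    return hits


RULES = {
    "security.instructions.md": "**/auth/**,**/security/**,**/middleware/**",
    # Assumed pattern: the database file's real applyTo is not shown above.
    "database.instructions.md": "**/models/**",
}
```

Under these assumptions, `matching_instructions("src/auth/token-validator.ts", RULES)` selects only the security file, and a path such as `README.md` selects nothing.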

The .forge.json Config

This file stores your project's Plan Forge configuration:

.forge.json
{
  "preset": "dotnet",
  "modelRouting": {
    "default": "claude-sonnet-4.6",
    "execute": "grok-4",
    "review": "claude-opus-4.7"
  },
  "escalationChain": ["grok-4", "claude-opus-4.7", "gpt-5.2-codex"],
  "quorumThreshold": 6
}

Key settings: which preset was used, which models to use for each role (execution vs review), the escalation chain when a model fails, and the complexity threshold for quorum mode.
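As a hedged illustration of how these settings might be consumed, the sketch below loads the same JSON and answers three questions: which model handles a role, which model comes next after a failure, and whether a task qualifies for quorum mode. The function names, the fallback to `default`, and the "at or above the threshold" rule are assumptions made for the example, not documented Plan Forge behavior.

```python
import json
from typing import Optional

CONFIG = json.loads("""
{
  "preset": "dotnet",
  "modelRouting": {
    "default": "claude-sonnet-4.6",
    "execute": "grok-4",
    "review": "claude-opus-4.7"
  },
  "escalationChain": ["grok-4", "claude-opus-4.7", "gpt-5.2-codex"],
  "quorumThreshold": 6
}
""")


def model_for(config: dict, role: str) -> str:
    """Pick the model for a role; assumed to fall back to the default route."""
    routing = config["modelRouting"]
    return routing.get(role, routing["default"])


def escalate(config: dict, failed_model: str) -> Optional[str]:
    """Return the next model in the chain after `failed_model`, if any.

    Assumption: a model outside the chain escalates to the chain's first entry.
    """
    chain = config["escalationChain"]
    if failed_model not in chain:
        return chain[0] if chain else None
    position = chain.index(failed_model)
    return chain[position + 1] if position + 1 < len(chain) else None


def needs_quorum(config: dict, complexity: int) -> bool:
    """Assumption: quorum mode applies at or above the threshold."""
    return complexity >= config["quorumThreshold"]
```

With the config above, `model_for(CONFIG, "review")` picks the review model, and `escalate(CONFIG, "gpt-5.2-codex")` returns `None` because the chain is exhausted.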

Plans Are Markdown

A plan is just a .md file with structure. It lives in docs/plans/ and follows a template. Here's the minimal skeleton:

docs/plans/Phase-1-AUTH-PLAN.md, skeleton
# Phase 1, User Authentication

## Scope Contract
**In Scope**: src/auth/**, src/middleware/auth*, tests/auth/**
**Out of Scope**: frontend, deployment, CI
**Forbidden Actions**: Do NOT modify src/database/migrations/

## MUST Criteria
- [ ] JWT token generation and validation
- [ ] Role-based access control (admin, user)
- [ ] Password hashing with bcrypt

## Execution Slices

### Slice 1, Auth Models + Migration [30 min]
**Tasks**: Create User model, JWT service
**Gate**: `dotnet build` passes, `dotnet test` passes
**Stop if**: Build fails or migration errors

### Slice 2, Auth Middleware [30 min]
**Tasks**: JWT validation middleware, role decorator
**Gate**: `dotnet test`, 6+ tests pass
**Stop if**: Any existing test regresses

The AI reads this contract and follows it literally. Slices are checkpointed: the gate at the end of each slice must pass before proceeding to the next.
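Because the plan is structured Markdown, its slices can also be read mechanically. The sketch below pulls slice headings and their Tasks, Gate, and Stop if fields out of text shaped like the skeleton above. The regexes and the `Slice` type are illustrative assumptions tied to this exact layout; a real plan parser would likely need to be more forgiving.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Patterns assume the exact heading and field layout of the skeleton above.
SLICE_HEADING = re.compile(r"^### Slice (\d+), (.+?) \[(\d+) min\]$")
FIELD = re.compile(r"^\*\*(Tasks|Gate|Stop if)\*\*: (.+)$")


@dataclass
class Slice:
    number: int
    title: str
    minutes: int
    fields: Dict[str, str] = field(default_factory=dict)


def parse_slices(plan_text: str) -> List[Slice]:
    """Collect slices and their fields, in document order."""
    slices: List[Slice] = []
    for raw in plan_text.splitlines():
        line = raw.strip()
        heading = SLICE_HEADING.match(line)
        if heading:
            slices.append(Slice(int(heading[1]), heading[2], int(heading[3])))
            continue
        found = FIELD.match(line)
        if found and slices:  # ignore fields that appear before any slice
            slices[-1].fields[found[1]] = found[2]
    return slices


PLAN = """\
## Execution Slices

### Slice 1, Auth Models + Migration [30 min]
**Tasks**: Create User model, JWT service
**Gate**: `dotnet build` passes, `dotnet test` passes
**Stop if**: Build fails or migration errors

### Slice 2, Auth Middleware [30 min]
**Tasks**: JWT validation middleware, role decorator
**Gate**: `dotnet test`, 6+ tests pass
**Stop if**: Any existing test regresses
"""
```

Calling `parse_slices(PLAN)` yields two `Slice` objects whose `fields["Gate"]` entries hold the gate text, which is the part an executor would act on at each checkpoint.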

Slices, Gates, and Scope

These are the three building blocks of every plan:

| Concept | What It Is | Why It Matters |
|---|---|---|
| Slice | A 30–120 minute chunk of work with a clear goal | Small enough to validate, large enough to be useful. One PR's worth. |
| Gate | A validation check at the end of each slice (build, test, specific assertions) | Catches failures immediately. No silent drift. |
| Scope Contract | What files the AI can touch, what's forbidden, what's out of scope | Prevents "I'll also refactor this unrelated file" creep. |

Stop conditions are the safety valve. If a gate fails or a stop condition triggers, execution halts. The AI doesn't try to work around the failure; it stops and reports what went wrong.
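The halt-on-failure rule can be pictured as a loop that runs each slice, checks its gate, and returns at the first failure. The sketch below is a simplified model under stated assumptions: `SliceRun`, `execute_plan`, and the report shape are hypothetical, and the gate is any callable returning True or False. In practice a gate might wrap a build or test command's exit status.

```python
from typing import Callable, List, NamedTuple


class SliceRun(NamedTuple):
    name: str
    work: Callable[[], None]   # performs the slice's tasks
    gate: Callable[[], bool]   # True when the slice's checks pass


def execute_plan(slices: List[SliceRun]) -> dict:
    """Run slices in order; halt and report at the first failing gate."""
    completed: List[str] = []
    for current in slices:
        current.work()
        if not current.gate():
            # Halt here: later slices never start, and nothing is retried
            # or worked around.
            return {"status": "halted", "failed_slice": current.name,
                    "completed": completed}
        completed.append(current.name)
    return {"status": "done", "failed_slice": None, "completed": completed}
```

The design choice worth noting is that the function returns a report instead of raising or retrying, so the failure is surfaced to whoever is driving the run.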

Three Ways to Run the Pipeline

The same pipeline can run three different ways. Pick the one that matches your tools:

| Approach | How It Works | Best For |
|---|---|---|
| Pipeline Agents | Select the Specifier agent → click handoff buttons through the chain | VS Code + Copilot. Smoothest flow. |
| Prompt Templates | Attach step0-*.prompt.md files in Copilot Chat | Learning the pipeline. You see every prompt. |
| Copy-Paste Prompts | Copy prompts from the runbook into any AI tool | Claude, Cursor, ChatGPT, terminal agents. |

All three produce identical results. The guardrails, validation gates, and pipeline steps are the same; only the delivery mechanism differs.

📄 Full reference: Multi-Agent Setup — GitHub Copilot, capabilities