Chapter 2 · Act I, Smelt

How It Works

Tour of the Forge Shop: four stations, the gates between them, and the sessions that keep them honest.

Three terms to know up front. This chapter uses Scope Contract, validation gate, and slice in nearly every paragraph. Plain-English definitions live in the Glossary and a full treatment in Chapter 5 — Crucible; skim either if these terms are new and the rest of this chapter will land cleaner.

The Four Stations

Plan Forge is not one step; it's a workshop. Every change to your code flows through four stations, each with its own tools, its own artifacts, and its own gate to the next station.

Station	Name	Role
Station 1	🪨 Smelt	Intake → Scope Contract
Station 2	🔨 Forge	Contract → Shipped Code
Station 3	🛡️ Guard	Deploy Defense (LiveGuard)
Station 4	🧠 Learn	Memory & Retros

The stations are connected by gates: Smelt won't hand the plan to Forge until the Scope Contract is crisp; Forge won't ship code until slice gates are green; Guard won't approve a deploy until secret-scan and env-drift checks are clean; Learn absorbs everything and feeds it back into Smelt for the next plan.

🔗 Want the deep-dive? Each station has its own page on the Shop Tour. This chapter zooms out: how the stations fit together and what happens between them.

🔁 See also: The Inner Loop, an optional reflective layer that adds reflexion retries, trajectories, auto-skill promotion, adaptive gate synthesis, postmortems, cross-project federation, and a reviewer agent. All opt-in, all Dashboard-configurable.

The Loop That Never Ends

Drawn linearly, Plan Forge looks like a 7-step pipeline. Drawn honestly, it's a closed loop. Every failed test, every regression caught by tempering, and every placeholder spotted by a discovery scan re-enters the Smelt station as new ore: auto-smelted into a Crucible idea, hardened into a slice, executed, and re-tested. The loop only pauses when there's nothing left to find.

Stage	What Runs	Edge to Next Stage
DISCOVERY	content audit + route crawl + placeholder regex	finds →
CRUCIBLE	forge_crucible_submit (agent)	smelts →
HARDEN	Phase-NN plan + Scope Contract	↓
EXECUTE	slice-by-slice + test gates	scans →
TEMPERING	forge_tempering_run	files →
BUG REGISTRY	auto-smelt loop (re-enters Smelt)	↑ back to the top

⟲ Closed loop · every failed test re-enters Smelt as new ore
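The closed loop can be pictured as a work queue that refills itself. The following is a minimal sketch, not Plan Forge's implementation: `Ore`, `run_loop`, and `fake_execute` are hypothetical names invented for illustration, and `execute` stands in for the harden, execute, and tempering stages combined.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Ore:
    """A raw finding waiting to be smelted (hypothetical type, not a Plan Forge API)."""
    description: str


def run_loop(findings: List[Ore], execute: Callable[[Ore], List[Ore]]) -> List[str]:
    """Process findings until the queue is empty.

    `execute` returns any new findings (failed tests, regressions) that
    working on one ore produced.
    """
    queue = deque(findings)
    processed: List[str] = []
    while queue:                      # the loop pauses only when nothing is left
        ore = queue.popleft()
        new_findings = execute(ore)
        queue.extend(new_findings)    # failures re-enter Smelt as new ore
        processed.append(ore.description)
    return processed


def fake_execute(ore: Ore) -> List[Ore]:
    # Illustrative stand-in: fixing the placeholder exposes one regression.
    if ore.description == "placeholder text on /pricing":
        return [Ore("regression: broken link on /pricing")]
    return []


order = run_loop([Ore("placeholder text on /pricing")], fake_execute)
print(order)
```

One discovery finding turns into two processed items here, because the regression it uncovered was queued as new ore instead of being dropped.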

Case study: The Loop That Never Ends — How Rummag Auto-Smelts Its Own Website Bugs shows this loop applied to a real production site audit, with a 4-pass discovery harness feeding the Crucible.

The 7-Step Pipeline (Inside the Forge)

The Forge station, where raw scope becomes shipped code, sits in the middle of a 7-step pipeline. Steps 0–2 happen in Smelt, steps 3–5 happen in Forge, and step 6 hands off to Guard and Learn.

Step	Name	Station · Purpose
Step 0	Specify	Smelt · What & why
Step 1	Pre-flight	Smelt · Verify setup
Step 2	Harden	Smelt · Scope contract
Step 3	Execute	Forge · Slice by slice
Step 4	Sweep	Forge · No TODOs left
Step 5	Review	Forge · Drift detection
Step 6	Ship	Guard + Learn

You describe what you want (Step 0, Smelt). The AI creates a spec. A pre-flight check verifies your setup (Step 1, Smelt). The plan gets hardened into a binding scope contract with slices, gates, and forbidden actions (Step 2, Smelt); this is when Smelt hands off to Forge. The AI builds it slice by slice, validated at every boundary (Step 3, Forge). A completeness sweep eliminates stubs and TODOs (Step 4, Forge). A fresh session audits everything (Step 5, Forge). The shipper commits, LiveGuard runs its pre-deploy scan (Step 6, Guard), and OpenBrain captures lessons (Step 6, Learn).
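To make the step-to-station mapping concrete, here is a small sketch that models the pipeline as data and derives where one station hands off to the next. The step numbers, names, and stations come from this chapter; the `Step` type and the `handoffs` helper are illustrative assumptions, not part of Plan Forge.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    station: str  # the station that owns this step


# Values copied from the pipeline described in this chapter.
PIPELINE = [
    Step(0, "Specify", "Smelt"),
    Step(1, "Pre-flight", "Smelt"),
    Step(2, "Harden", "Smelt"),
    Step(3, "Execute", "Forge"),
    Step(4, "Sweep", "Forge"),
    Step(5, "Review", "Forge"),
    Step(6, "Ship", "Guard + Learn"),
]


def handoffs(pipeline: List[Step]) -> List[Tuple[str, str, int]]:
    """Return (from_station, to_station, last_step_before_handoff) per boundary."""
    result = []
    for prev, cur in zip(pipeline, pipeline[1:]):
        if prev.station != cur.station:
            result.append((prev.station, cur.station, prev.number))
    return result


boundaries = handoffs(PIPELINE)
for source, target, step in boundaries:
    print(f"{source} hands off to {target} after Step {step}")
```

The two boundaries it reports (after Step 2 and after Step 5) are the points where a station gate has to pass before work moves on.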

Sessions and Why They Matter

Session	Steps	What Happens
Session 1, Plan (Smelt)	Steps 0–2	Specify, verify, harden. Produces the scope contract.
Session 2, Build (Forge)	Steps 3–4	Execute slices, sweep for completeness.
Session 3, Audit (Forge)	Step 5	Fresh context. Independent review.
Session 4, Ship (Guard + Learn)	Step 6	Commit, LiveGuard scan, capture lessons.

The executor shouldn't self-audit; that's like grading your own exam. Each session starts fresh and loads the same guardrails, but brings independent judgment. Session 3 (Audit) has never seen the code being written: it reads the plan, reads the code, and checks for drift. Session 4 is when Guard and Learn take over: LiveGuard does its pre-deploy scan, and OpenBrain writes the lessons.
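A toy model can show what "starts fresh" means in practice. This sketch assumes a session's context is just a list of items; `Session`, `start_session`, and the guardrail names used here are hypothetical simplifications, not how Plan Forge represents sessions internally.

```python
from dataclasses import dataclass, field
from typing import List

# Every session loads the same guardrails (names are illustrative).
GUARDRAILS = ["architecture-principles", "security", "testing"]


@dataclass
class Session:
    name: str
    context: List[str] = field(default_factory=list)


def start_session(name: str, artifacts: List[str]) -> Session:
    """Create a fresh context: guardrails plus only the artifacts handed over."""
    return Session(name, list(GUARDRAILS) + list(artifacts))


# Session 2 (Build) accumulates working notes while it writes the code.
build = start_session("build", ["plan"])
build.context.append("rejected approach: session cookies")
build.context.append("final code")

# Session 3 (Audit) receives the plan and the final code, nothing else.
audit = start_session("audit", ["plan", "final code"])
print(audit.context)
```

The audit session sees the same guardrails, the plan, and the code, but none of the build session's rejected approaches, which is the property the four-session split is designed to guarantee.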

Nested subagents: Within a session, agents can spawn subagents for complex tasks; the architecture reviewer can call the security reviewer, for example. This happens automatically, so you don't need to configure it.

Why Session Isolation Works

The grading-your-own-exam analogy above is the short version. Three concrete mechanisms make session isolation a structural requirement rather than a stylistic preference:

1. Sunk-cost bias is a property of the context window

The session that wrote the code will defend it. Not because the model is stubborn, but because the bad code and the proposed fix live in the same token sequence. The model's belief that the code is correct is encoded in the same context that produced it; the model literally cannot evaluate the code from a position of "I have not seen this before." A fresh session reads the same code without any prior commitment to it.

2. Context contamination clouds review judgment

Build sessions accumulate context as they work: rejected approaches, half-considered alternatives, partial refactors. By the time the session finishes, its reasoning is shaped by paths it considered but didn't take. A reviewer in the same session inherits all of that as background noise. A reviewer in a fresh session sees only the final code, against the original plan, with no memory of the rabbit holes.

3. Fresh-context reviews catch blind spots the build session is structurally unable to see

Some bugs are only visible from outside the build session's mental model. A naming inconsistency, a forgotten edge case, an architectural violation that the build session rationalized in the moment: these surface immediately to a reviewer that didn't participate in the rationalization. The build session is not lying; it cannot see what is invisible from inside its own context.

The 4-session model is not optional polish. Combined feedback from production runs (see the Lessons Learned chapter) shows that single-session execute-and-review consistently misses defects that fresh-session review catches in seconds. The cost of running an extra session is roughly the cost of one model invocation; the cost of shipping a missed defect is measured in incidents.

The v2.18 Temper Guards and Warning Signs system codified the failure modes that emerged from this pattern: the specific shortcuts agents take that produce compiling but architecturally broken code. Each instruction file now teaches agents not just what to do but why not to skip it. Session isolation is the structural defense; Temper Guards are the named anti-patterns it catches.

Source material: The 80/20 Wall and Guardrails Lessons Learned. The grading-your-own-exam analogy is adapted from Lesson 3.

The File System

After setup, Plan Forge installs five types of files into your .github/ directory, alongside the master copilot-instructions.md config file:

Project structure after setup

.github/
├── instructions/          ← Rules (auto-load by file type)
│   ├── architecture-principles.instructions.md
│   ├── security.instructions.md
│   ├── testing.instructions.md
│   ├── database.instructions.md
│   └── ... (14–18 files per preset)
├── agents/                ← Reviewer personas (read-only audit)
│   ├── architecture-reviewer.agent.md
│   ├── security-reviewer.agent.md
│   └── ... (12 agents)
├── prompts/               ← Pipeline templates (attach in chat)
│   ├── step0-specify-feature.prompt.md
│   ├── step2-harden-plan.prompt.md
│   └── ... (7 pipeline + scaffolding)
├── skills/                ← Multi-step procedures (slash commands)
│   ├── security-audit/SKILL.md
│   ├── forge-execute/SKILL.md
│   └── ... (11 skills)
├── hooks/                 ← Lifecycle automation
│   ├── sessionStart.sh
│   └── postToolUse.sh
└── copilot-instructions.md  ← Master config file

File Type	What It Does	Analogy
Instruction files	Auto-load based on what file you're editing	The rulebook
Agent definitions	Specialized reviewers that audit your code	Expert consultants
Pipeline prompts	Step-by-step workflow templates	The recipe
Skills	Multi-step executable procedures	Power tools
Lifecycle hooks	Run automatically at agent lifecycle points	Safety rails

How Guardrails Auto-Load

Each instruction file has an applyTo pattern in its YAML frontmatter. When you edit a file that matches the pattern, the instruction file loads automatically into the AI's context:

security.instructions.md, frontmatter

---
description: Security best practices
applyTo: "**/auth/**,**/security/**,**/middleware/**"
---

When you open src/auth/token-validator.ts, the security instruction file loads. When you open src/models/User.ts, the database instruction file loads. No manual action is needed; the AI reads the right rules for the right code.
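The matching rule can be approximated in a few lines. This is a sketch under stated assumptions: the security pattern is the one from the frontmatter above, the database pattern is an invented example, and Python's `fnmatchcase` only approximates the editor's real glob matching (its `*` also crosses `/`). The `instructions_for` helper is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# applyTo values keyed by instruction file.
APPLY_TO: Dict[str, str] = {
    # Copied from the security.instructions.md frontmatter in this chapter.
    "security.instructions.md": "**/auth/**,**/security/**,**/middleware/**",
    # ASSUMED pattern for illustration; the real file may use a different one.
    "database.instructions.md": "**/models/**,**/migrations/**",
}


def instructions_for(path: str) -> List[str]:
    """Return the instruction files whose applyTo patterns match `path`."""
    matched = []
    for filename, apply_to in APPLY_TO.items():
        # applyTo holds a comma-separated list of glob patterns.
        patterns = [p.strip() for p in apply_to.split(",")]
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            matched.append(filename)
    return matched


print(instructions_for("src/auth/token-validator.ts"))
print(instructions_for("src/models/User.ts"))
print(instructions_for("README.md"))
```

A file that matches no pattern, such as README.md here, loads neither of these instruction files.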

The `.forge.json` Config

This file stores your project's Plan Forge configuration:

.forge.json

{
  "preset": "dotnet",
  "modelRouting": {
    "default": "claude-sonnet-4.6",
    "execute": "grok-4",
    "review": "claude-opus-4.7"
  },
  "escalationChain": ["grok-4", "claude-opus-4.7", "gpt-5.2-codex"],
  "quorumThreshold": 6
}

Key settings: which preset was used, which models to use for each role (execution vs review), the escalation chain when a model fails, and the complexity threshold for quorum mode.
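One way to read these settings is sketched below. The JSON is the sample above; the three helper functions and their semantics are assumptions made for illustration (falling back to `default` for unlisted roles, walking the chain left to right on failure, and enabling quorum when a complexity score reaches the threshold). Plan Forge's actual resolution logic may differ.

```python
import json
from typing import Optional

FORGE_JSON = """
{
  "preset": "dotnet",
  "modelRouting": {
    "default": "claude-sonnet-4.6",
    "execute": "grok-4",
    "review": "claude-opus-4.7"
  },
  "escalationChain": ["grok-4", "claude-opus-4.7", "gpt-5.2-codex"],
  "quorumThreshold": 6
}
"""

config = json.loads(FORGE_JSON)


def model_for(role: str) -> str:
    """Pick the model for a role, falling back to the default (assumed behavior)."""
    routing = config["modelRouting"]
    return routing.get(role, routing["default"])


def escalate(failed_model: str) -> Optional[str]:
    """Return the next model in the chain, or None when the chain is exhausted."""
    chain = config["escalationChain"]
    if failed_model not in chain:
        return chain[0] if chain else None
    index = chain.index(failed_model)
    return chain[index + 1] if index + 1 < len(chain) else None


def needs_quorum(complexity: int) -> bool:
    """ASSUMPTION: quorum mode applies at or above the threshold."""
    return complexity >= config["quorumThreshold"]


print(model_for("execute"), model_for("specify"))
print(escalate("grok-4"), escalate("gpt-5.2-codex"))
print(needs_quorum(5), needs_quorum(6))
```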

Plans Are Markdown

A plan is just a .md file with structure. It lives in docs/plans/ and follows a template. Here's the minimal skeleton:

docs/plans/Phase-1-AUTH-PLAN.md, skeleton

# Phase 1, User Authentication

## Scope Contract
**In Scope**: src/auth/**, src/middleware/auth*, tests/auth/**
**Out of Scope**: frontend, deployment, CI
**Forbidden Actions**: Do NOT modify src/database/migrations/

## MUST Criteria
- [ ] JWT token generation and validation
- [ ] Role-based access control (admin, user)
- [ ] Password hashing with bcrypt

## Execution Slices

### Slice 1, Auth Models + Migration [30 min]
**Tasks**: Create User model, JWT service
**Gate**: `dotnet build` passes, `dotnet test` passes
**Stop if**: Build fails or migration errors

### Slice 2, Auth Middleware [30 min]
**Tasks**: JWT validation middleware, role decorator
**Gate**: `dotnet test`, 6+ tests pass
**Stop if**: Any existing test regresses

The AI reads this contract and follows it literally. Slices are checkpointed: the gate at the end of each slice must pass before proceeding to the next.
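Because the plan is structured markdown, its slices are easy to read mechanically. The sketch below parses the Execution Slices section of the skeleton above with two regular expressions. The `parse_slices` helper and its regexes are illustrative assumptions that fit this skeleton only; they are not Plan Forge's parser.

```python
import re
from typing import Dict, List

PLAN = """\
## Execution Slices

### Slice 1, Auth Models + Migration [30 min]
**Tasks**: Create User model, JWT service
**Gate**: `dotnet build` passes, `dotnet test` passes
**Stop if**: Build fails or migration errors

### Slice 2, Auth Middleware [30 min]
**Tasks**: JWT validation middleware, role decorator
**Gate**: `dotnet test`, 6+ tests pass
**Stop if**: Any existing test regresses
"""

SLICE_HEADING = re.compile(r"^### Slice (\d+), (.+?) \[(\d+) min\]$")
FIELD = re.compile(r"^\*\*(Tasks|Gate|Stop if)\*\*: (.+)$")


def parse_slices(plan: str) -> List[Dict[str, str]]:
    """Collect each slice heading and the bold fields that follow it."""
    slices: List[Dict[str, str]] = []
    for line in plan.splitlines():
        heading = SLICE_HEADING.match(line)
        if heading:
            number, title, minutes = heading.groups()
            slices.append({"number": number, "title": title, "minutes": minutes})
            continue
        found = FIELD.match(line)
        if found and slices:
            slices[-1][found.group(1)] = found.group(2)
    return slices


slices = parse_slices(PLAN)
for item in slices:
    # Backticked text inside the Gate field is treated as a command to run.
    commands = re.findall(r"`([^`]+)`", item["Gate"])
    print(item["number"], item["title"], commands)
```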

Slices, Gates, and Scope

These are the three building blocks of every plan:

Concept	What It Is	Why It Matters
Slice	A 30–120 minute chunk of work with a clear goal	Small enough to validate, large enough to be useful. One PR's worth.
Gate	A validation check at the end of each slice (`build`, `test`, specific assertions)	Catches failures immediately. No silent drift.
Scope Contract	What files the AI can touch, what's forbidden, what's out of scope	Prevents "I'll also refactor this unrelated file" creep.
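A Scope Contract check can be sketched as a filter over the files a slice changed. The patterns below come from the Phase 1 skeleton earlier in this chapter, with `/**` appended to the forbidden directory so it matches files inside it. The `scope_violations` helper is hypothetical, and `fnmatchcase` is only an approximation of real glob matching.

```python
from fnmatch import fnmatchcase
from typing import List

# In Scope patterns from the Phase 1 skeleton.
IN_SCOPE = ["src/auth/**", "src/middleware/auth*", "tests/auth/**"]
# Forbidden directory from the skeleton, written as a glob (assumed form).
FORBIDDEN = ["src/database/migrations/**"]


def scope_violations(changed_files: List[str]) -> List[str]:
    """Return changed files that are forbidden or fall outside In Scope."""
    violations = []
    for path in changed_files:
        forbidden = any(fnmatchcase(path, p) for p in FORBIDDEN)
        in_scope = any(fnmatchcase(path, p) for p in IN_SCOPE)
        if forbidden or not in_scope:
            violations.append(path)
    return violations


changed = [
    "src/auth/JwtService.cs",
    "src/middleware/authMiddleware.cs",
    "src/database/migrations/001_users.sql",
    "README.md",
]
violations = scope_violations(changed)
print(violations)
```

The two flagged files are the forbidden migration and the unrelated README edit; a check like this is what turns "creep" into something a gate can detect.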

Stop conditions are the safety valve. If a gate fails or a stop condition triggers, execution halts. The AI doesn't try to work around the failure; it stops and reports what went wrong.
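The halt-on-failure behavior can be illustrated with a short loop. In this sketch each gate is a plain Python callable standing in for a real check such as a build or test command; `Slice`, `RunReport`, and `run_slices` are hypothetical names, not Plan Forge APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Slice:
    name: str
    work: Callable[[], None]  # performs the slice's tasks
    gate: Callable[[], bool]  # True when the validation checks pass


@dataclass
class RunReport:
    completed: List[str]
    halted_at: Optional[str]  # name of the slice whose gate failed, if any


def run_slices(slices: List[Slice]) -> RunReport:
    """Run slices in order and stop at the first failing gate."""
    completed: List[str] = []
    for current in slices:
        current.work()
        if not current.gate():
            # Halt and report; later slices never start.
            return RunReport(completed, halted_at=current.name)
        completed.append(current.name)
    return RunReport(completed, halted_at=None)


report = run_slices([
    Slice("Slice 1", lambda: None, lambda: True),
    Slice("Slice 2", lambda: None, lambda: False),  # simulated gate failure
    Slice("Slice 3", lambda: None, lambda: True),
])
print(report)
```

Slice 3 never runs in this example, even though its gate would pass: a failed gate ends the run instead of being skipped over.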

Three Ways to Run the Pipeline

The same pipeline can run three different ways. Pick the one that matches your tools:

Approach	How It Works	Best For
Pipeline Agents	Select the Specifier agent → click handoff buttons through the chain	VS Code + Copilot. Smoothest flow.
Prompt Templates	Attach `step0-*.prompt.md` files in Copilot Chat	Learning the pipeline. You see every prompt.
Copy-Paste Prompts	Copy prompts from the runbook into any AI tool	Claude, Cursor, ChatGPT, terminal agents.

All three produce identical results. The guardrails, validation gates, and pipeline steps are the same; only the delivery mechanism differs.

📄 Full reference: Multi-Agent Setup — GitHub Copilot, capabilities
