How It Works
Tour of the Forge Shop: four stations, the gates between them, and the sessions that keep them honest.
The Four Stations
Plan Forge is not one step; it's a workshop. Every change to your code flows through four stations, each with its own tools, its own artifacts, and its own gate to the next station.
The stations are connected by gates: Smelt won't hand the plan to Forge until the Scope Contract is crisp; Forge won't ship code until slice gates are green; Guard won't approve a deploy until secret-scan and env-drift are clean; Learn absorbs everything and feeds it back into Smelt for the next plan.
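The gate rules above can be read as an ordered series of checks. The following is a minimal sketch under stated assumptions, not Plan Forge's actual implementation: the status flag names (`scope_contract_crisp`, `slice_gates_green`, and so on) and the `furthest_station` helper are hypothetical and exist only to illustrate the flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Station order as described in the text; Learn feeds back into Smelt.
STATIONS = ["Smelt", "Forge", "Guard", "Learn"]


@dataclass
class Gate:
    name: str
    check: Callable[[Dict[str, bool]], bool]  # reads hypothetical status flags


# Hypothetical flag names: the real signals are not specified in this document.
GATES = {
    "Smelt": Gate("Scope Contract is crisp",
                  lambda s: s.get("scope_contract_crisp", False)),
    "Forge": Gate("slice gates are green",
                  lambda s: s.get("slice_gates_green", False)),
    "Guard": Gate("secret-scan and env-drift are clean",
                  lambda s: s.get("secret_scan_clean", False)
                  and s.get("env_drift_clean", False)),
}


def furthest_station(status: Dict[str, bool]) -> str:
    """Walk the stations in order and stop at the first gate that fails."""
    for station in STATIONS[:-1]:
        if not GATES[station].check(status):
            return station  # work stays here until this gate passes
    return "Learn"
```

For example, a change whose Scope Contract is crisp but whose slice gates are still red stays in Forge: `furthest_station({"scope_contract_crisp": True})` returns `"Forge"`.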
The Loop That Never Ends
Drawn linearly, Plan Forge looks like a 7-step pipeline. Drawn honestly, it's a closed loop. Every failed test, every regression caught by tempering, and every placeholder spotted by a discovery scan re-enters the Smelt station as new ore, where it is auto-smelted into a Crucible idea, hardened into a slice, executed, and re-tested. The loop only pauses when there's nothing left to find.
[Figure: the closed loop. Labels: route crawl, placeholder regex, submit (agent), Scope Contract, run, test gates, (re-enters Smelt).]
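The closed loop can be pictured as a work queue that keeps refilling itself. This is an illustrative sketch only: `run_loop`, `fake_execute`, and the `max_rounds` safety cap are hypothetical names, and the real system's smelting and hardening steps are collapsed into a single `execute` callback.

```python
from collections import deque
from typing import Callable, Deque, List


def run_loop(initial_ore: List[str],
             execute: Callable[[str], List[str]],
             max_rounds: int = 100) -> List[str]:
    """Process ore until no new findings come back; return the processing order."""
    queue: Deque[str] = deque(initial_ore)
    processed: List[str] = []
    while queue and len(processed) < max_rounds:
        ore = queue.popleft()
        processed.append(ore)
        # Failed tests, regressions, and placeholders re-enter the queue as new ore.
        queue.extend(execute(ore))
    return processed


def fake_execute(ore: str) -> List[str]:
    """Stand-in for plan, build, and test: only the first item produces findings."""
    findings = {"add login": ["fix failing test", "replace TODO stub"]}
    return findings.get(ore, [])


order = run_loop(["add login"], fake_execute)
```

The loop ends when `execute` stops returning findings, which mirrors "the loop only pauses when there's nothing left to find."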
The 7-Step Pipeline (Inside the Forge)
The Forge station, where raw scope becomes shipped code, runs a 7-step pipeline. Steps 0–2 happen in Smelt, steps 3–5 happen in Forge, and step 6 hands off to Guard and Learn.
You describe what you want (Step 0, Smelt). The AI creates a spec. A pre-flight check verifies your setup (Step 1, Smelt). The plan gets hardened into a binding scope contract with slices, gates, and forbidden actions (Step 2, Smelt); this is when Smelt hands off to Forge. The AI builds it slice by slice, validated at every boundary (Step 3, Forge). A completeness sweep eliminates stubs and TODOs (Step 4, Forge). A fresh session audits everything (Step 5, Forge). The shipper commits, LiveGuard runs its pre-deploy scan (Step 6, Guard), and OpenBrain captures lessons (Step 6, Learn).
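As a quick reference, the walkthrough above can be written down as a lookup table. This sketch follows the parenthetical labels in the walkthrough; the `Step` type and the `steps_for` helper are illustrative names, not part of Plan Forge.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    number: int
    station: str   # "Guard/Learn" marks a step shared by two stations
    summary: str


PIPELINE: List[Step] = [
    Step(0, "Smelt", "describe the feature; the AI creates a spec"),
    Step(1, "Smelt", "pre-flight check verifies the setup"),
    Step(2, "Smelt", "harden the plan into a binding scope contract"),
    Step(3, "Forge", "build slice by slice, validated at every boundary"),
    Step(4, "Forge", "completeness sweep for stubs and TODOs"),
    Step(5, "Forge", "fresh-session audit"),
    Step(6, "Guard/Learn", "commit, LiveGuard pre-deploy scan, OpenBrain lessons"),
]


def steps_for(station: str) -> List[int]:
    """Return the step numbers that run in the given station."""
    return [s.number for s in PIPELINE if station in s.station.split("/")]
```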
Sessions and Why They Matter
Session 1: Specify, verify, harden. Produces the scope contract.
Session 2: Execute slices, sweep for completeness.
Session 3: Fresh context. Independent review.
Session 4: Commit, LiveGuard scan, capture lessons.
The executor shouldn't self-audit; that's like grading your own exam. Each session starts fresh and loads the same guardrails, but brings independent judgment. Session 3 (Review) has never seen the code being written: it reads the plan, reads the code, and checks for drift. Session 4 is when Guard and Learn take over: LiveGuard does its pre-deploy scan, and OpenBrain writes the lessons.
Why Session Isolation Works
The grading-your-own-exam analogy above is the short version. Three concrete mechanisms make session isolation a structural requirement rather than a stylistic preference:
1. Sunk-cost bias is a property of the context window
The session that wrote the code will defend it, not because the model is stubborn but because the bad code and the proposed fix live in the same token sequence. The model's belief that the code is correct is encoded in the same context that produced it; the model literally cannot evaluate the code from a position of "I have not seen this before." A fresh session reads the same code without any prior commitment to it.
2. Context contamination clouds review judgment
Build sessions accumulate context as they work: rejected approaches, half-considered alternatives, partial refactors. By the time the session finishes, its reasoning is shaped by paths it considered but didn't take. A reviewer in the same session inherits all of that as background noise. A reviewer in a fresh session sees only the final code, against the original plan, with no memory of the rabbit holes.
3. Fresh-context reviews catch blind spots the build session is structurally unable to see
Some bugs are only visible from outside the build session's mental model. A naming inconsistency, a forgotten edge case, or an architectural violation that the build session rationalized in the moment surfaces immediately to a reviewer that didn't participate in the rationalization. The build session is not lying; it cannot see what is invisible from inside its own context.
The v2.18 Temper Guards and Warning Signs system codified the failure modes that emerged from this pattern: the specific shortcuts agents take that produce compiling but architecturally broken code. Each instruction file now teaches agents not just what to do but why not to skip it. Session isolation is the structural defense; Temper Guards are the named anti-patterns it catches.
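To make the isolation argument concrete, here is a toy model of what each session gets to see. It is a sketch under assumptions, not how Plan Forge manages sessions internally: the `Session` type, `start_session`, and the example context strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    role: str
    guardrails: List[str]
    context: List[str] = field(default_factory=list)


def start_session(role: str, guardrails: List[str], inputs: List[str]) -> Session:
    """Every session starts from the same guardrails plus explicit inputs only."""
    return Session(role=role, guardrails=list(guardrails), context=list(inputs))


guardrails = ["security.instructions.md", "testing.instructions.md"]

# The build session accumulates rejected approaches and partial work as it goes.
build = start_session("execute", guardrails, ["plan.md"])
build.context += ["rejected approach A", "partial refactor B", "final code"]

# The reviewer receives the plan and the final code, never the build transcript.
review = start_session("review", guardrails, ["plan.md", "final code"])
```

The reviewer shares the guardrails with the build session but none of its history, so `"rejected approach A"` never appears in `review.context`.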
Source material: The 80/20 Wall and Guardrails Lessons Learned. The grading-your-own-exam analogy is adapted from Lesson 3.
The File System
After setup, Plan Forge installs five types of files into your .github/ directory, plus a master config file:
.github/
├── instructions/ ← Rules (auto-load by file type)
│   ├── architecture-principles.instructions.md
│   ├── security.instructions.md
│   ├── testing.instructions.md
│   ├── database.instructions.md
│   └── ... (14–18 files per preset)
├── agents/ ← Reviewer personas (read-only audit)
│   ├── architecture-reviewer.agent.md
│   ├── security-reviewer.agent.md
│   └── ... (12 agents)
├── prompts/ ← Pipeline templates (attach in chat)
│   ├── step0-specify-feature.prompt.md
│   ├── step2-harden-plan.prompt.md
│   └── ... (7 pipeline + scaffolding)
├── skills/ ← Multi-step procedures (slash commands)
│   ├── security-audit/SKILL.md
│   ├── forge-execute/SKILL.md
│   └── ... (11 skills)
├── hooks/ ← Lifecycle automation
│   ├── sessionStart.sh
│   └── postToolUse.sh
└── copilot-instructions.md ← Master config file
| File Type | What It Does | Analogy |
|---|---|---|
| Instruction files | Auto-load based on what file you're editing | The rulebook |
| Agent definitions | Specialized reviewers that audit your code | Expert consultants |
| Pipeline prompts | Step-by-step workflow templates | The recipe |
| Skills | Multi-step executable procedures | Power tools |
| Lifecycle hooks | Run automatically at agent lifecycle points | Safety rails |
How Guardrails Auto-Load
Each instruction file has an applyTo pattern in its YAML frontmatter. When you edit a file that matches the pattern, the instruction file loads automatically into the AI's context:
---
description: Security best practices
applyTo: "**/auth/**,**/security/**,**/middleware/**"
---
When you open src/auth/token-validator.ts, the security instruction file loads. When you open src/models/User.ts, the database instruction file loads. No manual action is needed: the AI reads the right rules for the right code.
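The matching idea can be approximated in a few lines. This is a rough sketch, not the editor's actual matcher: it uses Python's `fnmatch`, whose `*` also matches `/`, so edge cases (such as a top-level `auth/` folder) may behave differently from real glob semantics. The `applyTo` value for the database file is a hypothetical pattern, since only the security file's frontmatter is shown above.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def matches_apply_to(path: str, apply_to: str) -> bool:
    """True if the path matches any comma-separated pattern in applyTo."""
    patterns = [p.strip() for p in apply_to.split(",") if p.strip()]
    return any(fnmatchcase(path, p) for p in patterns)


def instructions_for(path: str, rules: Dict[str, str]) -> List[str]:
    """List the instruction files whose applyTo pattern matches the path."""
    return [name for name, apply_to in rules.items()
            if matches_apply_to(path, apply_to)]


RULES = {
    "security.instructions.md": "**/auth/**,**/security/**,**/middleware/**",
    # Hypothetical pattern: the database file's real applyTo is not shown above.
    "database.instructions.md": "**/models/**,**/migrations/**",
}

loaded = instructions_for("src/auth/token-validator.ts", RULES)
```

Under these assumed patterns, `src/auth/token-validator.ts` picks up only the security file and `src/models/User.ts` picks up only the database file.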
The .forge.json Config
This file stores your project's Plan Forge configuration:
{
  "preset": "dotnet",
  "modelRouting": {
    "default": "claude-sonnet-4.6",
    "execute": "grok-4",
    "review": "claude-opus-4.7"
  },
  "escalationChain": ["grok-4", "claude-opus-4.7", "gpt-5.2-codex"],
  "quorumThreshold": 6
}
Key settings: which preset was used, which models to use for each role (execution vs review), the escalation chain when a model fails, and the complexity threshold for quorum mode.
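One plausible reading of these settings is sketched below. The helper names (`model_for`, `next_in_chain`, `needs_quorum`) are hypothetical, and the semantics are assumptions: that an unlisted role falls back to `default`, that escalation moves to the next model in the chain, and that quorum mode applies when a complexity score meets or exceeds the threshold. Plan Forge's real behavior may differ.

```python
import json
from typing import List, Optional

CONFIG_TEXT = """
{
  "preset": "dotnet",
  "modelRouting": {
    "default": "claude-sonnet-4.6",
    "execute": "grok-4",
    "review": "claude-opus-4.7"
  },
  "escalationChain": ["grok-4", "claude-opus-4.7", "gpt-5.2-codex"],
  "quorumThreshold": 6
}
"""


def model_for(config: dict, role: str) -> str:
    """Assumed: roles without an explicit entry use the default model."""
    routing = config["modelRouting"]
    return routing.get(role, routing["default"])


def next_in_chain(config: dict, failed_model: str) -> Optional[str]:
    """Assumed: escalate to the model after the failed one; None at the end."""
    chain: List[str] = config["escalationChain"]
    if failed_model not in chain:
        return chain[0] if chain else None
    i = chain.index(failed_model)
    return chain[i + 1] if i + 1 < len(chain) else None


def needs_quorum(config: dict, complexity: int) -> bool:
    """Assumed: quorum mode applies at or above the threshold."""
    return complexity >= config["quorumThreshold"]


config = json.loads(CONFIG_TEXT)
```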
Plans Are Markdown
A plan is just a .md file with structure. It lives in docs/plans/ and follows a template. Here's the minimal skeleton:
# Phase 1, User Authentication
## Scope Contract
**In Scope**: src/auth/**, src/middleware/auth*, tests/auth/**
**Out of Scope**: frontend, deployment, CI
**Forbidden Actions**: Do NOT modify src/database/migrations/
## MUST Criteria
- [ ] JWT token generation and validation
- [ ] Role-based access control (admin, user)
- [ ] Password hashing with bcrypt
## Execution Slices
### Slice 1, Auth Models + Migration [30 min]
**Tasks**: Create User model, JWT service
**Gate**: `dotnet build` passes, `dotnet test` passes
**Stop if**: Build fails or migration errors
### Slice 2, Auth Middleware [30 min]
**Tasks**: JWT validation middleware, role decorator
**Gate**: `dotnet test`, 6+ tests pass
**Stop if**: Any existing test regresses
The AI reads this contract and follows it literally. Slices are checkpointed: the gate at the end of each slice must pass before the AI proceeds to the next.
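The checkpointing rule amounts to "run a slice, check its gate, stop on the first failure." A minimal sketch, with hypothetical names (`Slice`, `run_slices`) and gates modeled as plain callables; in a real project a gate might instead wrap a build or test command and check its exit code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Slice:
    name: str
    work: Callable[[], None]   # the tasks for this slice
    gate: Callable[[], bool]   # True means the gate passed


def run_slices(slices: List[Slice]) -> Tuple[List[str], Optional[str]]:
    """Run slices in order; stop at the first slice whose gate fails."""
    completed: List[str] = []
    for s in slices:
        s.work()
        if not s.gate():
            return completed, s.name   # later slices are never started
        completed.append(s.name)
    return completed, None


log: List[str] = []
slices = [
    Slice("Slice 1", lambda: log.append("built 1"), lambda: True),
    Slice("Slice 2", lambda: log.append("built 2"), lambda: False),
    Slice("Slice 3", lambda: log.append("built 3"), lambda: True),
]
completed, stopped_at = run_slices(slices)
```

Here Slice 2's gate fails, so Slice 3 is never attempted: only Slice 1 counts as completed, and the run reports Slice 2 as the stopping point.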
Slices, Gates, and Scope
These are the three building blocks of every plan:
| Concept | What It Is | Why It Matters |
|---|---|---|
| Slice | A 30–120 minute chunk of work with a clear goal | Small enough to validate, large enough to be useful. One PR's worth. |
| Gate | A validation check at the end of each slice (build, test, specific assertions) | Catches failures immediately. No silent drift. |
| Scope Contract | What files the AI can touch, what's forbidden, what's out of scope | Prevents "I'll also refactor this unrelated file" creep. |
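A Scope Contract check can be sketched as a path classifier. The in-scope patterns below come from the sample plan earlier on this page; the `check_path` helper, the example file names, and the trailing `**` added to the forbidden directory are assumptions for illustration. Matching uses Python's `fnmatch`, where `*` also crosses `/`, which may differ from the matcher Plan Forge uses.

```python
from fnmatch import fnmatchcase
from typing import List

# Patterns taken from the sample plan's Scope Contract.
IN_SCOPE: List[str] = ["src/auth/**", "src/middleware/auth*", "tests/auth/**"]
# Assumption: a forbidden directory is treated as "everything underneath it".
FORBIDDEN: List[str] = ["src/database/migrations/**"]


def check_path(path: str) -> str:
    """Classify a file path as 'forbidden', 'in scope', or 'out of scope'."""
    if any(fnmatchcase(path, p) for p in FORBIDDEN):
        return "forbidden"       # forbidden actions win over everything else
    if any(fnmatchcase(path, p) for p in IN_SCOPE):
        return "in scope"
    return "out of scope"


verdict = check_path("src/database/migrations/001_init.sql")
```

Checking forbidden patterns first means a path can never be rescued by an overly broad in-scope pattern.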
Three Ways to Run the Pipeline
The same pipeline can run three different ways. Pick the one that matches your tools:
| Approach | How It Works | Best For |
|---|---|---|
| Pipeline Agents | Select the Specifier agent → click handoff buttons through the chain | VS Code + Copilot. Smoothest flow. |
| Prompt Templates | Attach step0-*.prompt.md files in Copilot Chat | Learning the pipeline. You see every prompt. |
| Copy-Paste Prompts | Copy prompts from the runbook into any AI tool | Claude, Cursor, ChatGPT, terminal agents. |
All three produce identical results. The guardrails, validation gates, and pipeline steps are the same; only the delivery mechanism differs.
📄 Full reference: Multi-Agent Setup — GitHub Copilot, capabilities