Chapter 4

Writing Plans That Work

Here's what works and here's what breaks.

Plan Structure

Every hardened plan has these mandatory sections. The plan-hardener agent adds them automatically during Step 2 (or the Crucible interview adds them upstream during Smelt), but you should understand what each does and how to edit them:

| Section | Required? | Purpose |
|---|---|---|
| Scope Contract | Yes | In-scope paths, out-of-scope, forbidden actions |
| MUST Criteria | Yes | Non-negotiable outcomes (checkboxes) |
| SHOULD Criteria | Optional | Best-effort goals |
| Build / Test Commands | Yes (v2.82.1+) | build-command + test-command, required by the Crucible critical-fields gate |
| Execution Slices | Yes | Checkpointed work chunks with gates and per-slice **Files in scope** |
| Branch Strategy | Recommended | Git branch name and merge approach |
| Rollback Plan | Recommended | How to undo if things go wrong |

Field aliases: Per-slice scope can be authored as either **Files:** or **Files in scope** (the latter is what the Crucible/hardener now emit). Validation gates can be authored as either **Validation Gate** or **Exit gate**. The orchestrator parses both. Hand-authored plans following the convention should prefer the Files in scope + Exit gate pair to match generated output.
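To make the alias rule concrete, here is a minimal sketch of how a parser could fold both spellings into one canonical name. The function name, the canonical keys (files_in_scope, exit_gate), and the regex are illustrative assumptions, not the orchestrator's actual parser.

```python
import re

# Assumed alias table built from the field names described above; the real
# orchestrator's parser may be structured differently.
FIELD_ALIASES = {
    "files": "files_in_scope",
    "files in scope": "files_in_scope",
    "validation gate": "exit_gate",
    "exit gate": "exit_gate",
}

# Matches both "**Label**: value" and "**Label:** value".
FIELD_LINE = re.compile(r"^\*\*(?P<label>[^*:]+):?\*\*:?\s*(?P<value>.*)$")


def parse_slice_fields(slice_text: str) -> dict[str, str]:
    """Return canonical field names mapped to their raw values."""
    fields = {}
    for line in slice_text.splitlines():
        match = FIELD_LINE.match(line.strip())
        if not match:
            continue
        label = match.group("label").strip().lower()
        canonical = FIELD_ALIASES.get(label)
        if canonical:  # unknown labels are ignored in this sketch
            fields[canonical] = match.group("value").strip()
    return fields
```

With this shape, a hand-authored slice using **Files:** and **Validation Gate** and a generated slice using **Files in scope** and **Exit gate** parse to the same dictionary.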

CRITICAL_FIELDS Gate (v2.82.1+)

Plans created via the Crucible smelter are now blocked from finalizing until every CRITICAL_FIELD is filled in. This eliminates the entire class of "TBD-laden plans that compile but can't run."

| Field | What it locks down | Example |
|---|---|---|
| build-command | The exact command the orchestrator will run as the build gate per slice | dotnet build |
| test-command | The exact command the orchestrator will run as the test gate | dotnet test |
| scope | In-scope paths (per-slice Files in scope + plan-level scope) | src/services/**, tests/services/** |
| validation-gates | At least one executable gate per slice | dotnet test --filter UserService |
| forbidden-actions | Concrete file patterns or actions that are out-of-bounds | Do NOT modify src/database/migrations/ |
| rollback | How to undo the change cleanly | git revert <commit> or named feature flag |

If any CRITICAL_FIELD is missing, forge_crucible_finalize returns 409 with CRITICAL_FIELDS_MISSING and a criticalGaps[] array pointing at the unresolved fields. The Crucible interview automatically adds a question for each missing field: the feature lane now asks 7 questions (was 6); the tweak lane asks 4 (was 3).
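The gate's behavior can be pictured with a small sketch. Only the field names, the 409 status, the CRITICAL_FIELDS_MISSING code, and the criticalGaps name come from the description above; the function names, the placeholder list, and the rest of the response shape are assumptions for illustration.

```python
CRITICAL_FIELDS = (
    "build-command",
    "test-command",
    "scope",
    "validation-gates",
    "forbidden-actions",
    "rollback",
)

# Assumption: values like these count as "not filled in".
PLACEHOLDERS = {"", "tbd", "todo"}


def find_critical_gaps(plan: dict) -> list[str]:
    """List every critical field that is absent, empty, or a placeholder."""
    gaps = []
    for field in CRITICAL_FIELDS:
        value = plan.get(field)
        if value is None:
            gaps.append(field)
        elif isinstance(value, str):
            if value.strip().lower() in PLACEHOLDERS:
                gaps.append(field)
        elif not value:  # empty list, tuple, or dict
            gaps.append(field)
    return gaps


def finalize(plan: dict) -> dict:
    """Hypothetical stand-in for the finalize call: block while gaps remain."""
    gaps = find_critical_gaps(plan)
    if gaps:
        return {"status": 409, "error": "CRITICAL_FIELDS_MISSING", "criticalGaps": gaps}
    return {"status": 200, "criticalGaps": []}
```

A plan whose test-command is still "TBD" and whose rollback is missing would come back with both names in criticalGaps, which is the list the interview would turn into follow-up questions.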

The build/test commands are inferred from your repo when possible (via inferRepoCommands, which checks package.json, *.csproj, pyproject.toml, Cargo.toml, etc.), so most projects don't have to type them by hand.
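As a rough illustration of marker-file inference, the sketch below looks for a known project file at the repo root and returns a command pair. It is not the real inferRepoCommands: only the dotnet pair appears in this chapter, and the other command pairs, the lookup order, and the root-only search are assumptions.

```python
from pathlib import Path

# Marker glob -> (build command, test command).
# Only "dotnet build" / "dotnet test" come from this chapter; the other
# pairs are common defaults chosen for illustration.
MARKERS = [
    ("*.csproj", ("dotnet build", "dotnet test")),
    ("package.json", ("npm run build", "npm test")),
    ("Cargo.toml", ("cargo build", "cargo test")),
    ("pyproject.toml", ("python -m build", "python -m pytest")),
]


def infer_repo_commands(repo_root):
    """Return build/test commands for the first marker found, else None."""
    root = Path(repo_root)
    for pattern, (build, test) in MARKERS:
        if any(root.glob(pattern)):  # checks the repo root only
            return {"build-command": build, "test-command": test}
    return None
```

Returning None when nothing matches leaves the fields empty, which is the case where an author would have to type the commands by hand.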

Hand-authored plans bypass the gate: If you write the plan yourself in docs/plans/Phase-NN.md instead of using the Crucible, the gate doesn't apply. You should still fill these fields in, because the orchestrator reads build-command and test-command from the plan frontmatter when running gates that don't specify a full command inline.

Writing a Good Scope Contract

The scope contract is the most important section. It tells the AI exactly what files it can touch, and what's off-limits.

Good: Tight Scope

Clear boundaries
## Scope Contract
**In Scope**: src/services/UserService.cs, src/repositories/UserRepository.cs,
              tests/services/UserServiceTests.cs, tests/repositories/UserRepositoryTests.cs
**Out of Scope**: frontend/**, deployment/**, docs/** (except this plan)
**Forbidden Actions**:
- Do NOT modify src/database/migrations/ (migration is a separate phase)
- Do NOT change AppSettings.json connection strings
- Do NOT add NuGet packages without explicit approval

Bad: Loose Scope

Too vague
## Scope Contract
**In Scope**: anything related to users
**Out of Scope**: nothing specific
**Forbidden Actions**: don't break things

Forbidden Actions come from the Crucible interview: The 7th feature-lane question explicitly asks for forbidden actions. Answers like "don't touch the migrations folder" or "don't add new NuGet packages without approval" flow directly into this section as concrete patterns. If you skip the interview and hand-author the plan, write them as file patterns or named actions, not vibes.

"Anything related to users" gives the AI free rein to refactor 20 files. "Don't break things" isn't enforceable. Be specific about paths, and list forbidden actions as concrete file patterns. That's how you get lasagna code (clean layers, each with a purpose) instead of spaghetti where everything touches everything.
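One way to see why concrete patterns matter: a pattern can be checked mechanically against a list of changed files, and "don't break things" cannot. The following is a minimal sketch using Python's standard fnmatch module; the function name and return shape are hypothetical, and this is not how Plan Forge itself enforces scope.

```python
from fnmatch import fnmatchcase


def check_scope(changed_files, in_scope, forbidden):
    """Return (path, reason) for every changed file that breaks the contract.

    Patterns use fnmatch semantics, where "*" also matches "/" characters,
    so "src/services/**" covers nested paths.
    """
    violations = []
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in forbidden):
            violations.append((path, "forbidden"))
        elif not any(fnmatchcase(path, pattern) for pattern in in_scope):
            violations.append((path, "out of scope"))
    return violations
```

Given the tight contract above, a change to a file under src/database/migrations/ is reported as forbidden and a change under frontend/ as out of scope, while the listed service and test files pass silently.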

Slicing Strategy

Before the rules, the worked example. The same feature (add a User Profile endpoint) planned two ways:

Bad: one mega-slice
Slice 1, Add User Profile feature           [≥90 min, unbounded]
  • Database migration
  • Repository
  • Service
  • Controller
  • Tests

When the gate fails, you have no idea which layer broke, the migration can't roll back cleanly without nuking the service work, and the reviewer is reading a 12-file diff with no checkpoint to anchor against.

Good: 4 layered slices
Slice 1, Migration + model                  [30 min]
Slice 2, Repository + unit tests            [45 min]
Slice 3, Service + business-logic tests     [60 min]
Slice 4, Controller + integration tests     [45 min]

Each slice ends at a real checkpoint. A migration failure stops Slice 1 cleanly. A controller bug at Slice 4 doesn't touch the migration in Slice 1. The reviewer reads four small diffs, each scoped to one architectural layer.

Figure 4-1. Slicing strategy. Tight scope (one architectural layer per slice, exact file paths, concrete forbidden actions) compared with loose scope (one mega-slice mixing controllers, services, repositories, and migrations, where a failed test cannot be traced to a layer, mid-slice migrations cannot roll back cleanly, and reviewers cannot audit one boundary at a time). Bottom rule: one layer per slice, scope = exact file paths, forbidden actions = concrete patterns. If you can't write the gate command, the slice is too broad.

Slices are 30–120 minute chunks of work. Each slice should produce a commit-worthy change (the "one PR" rule).

Rules of Thumb

  • One layer per slice: don't mix database migration with API controller in the same slice
  • Build on foundations: create the model/migration first, then the repository, then the service, then the controller
  • Tests with the code: include tests in the same slice as the code they test (not a separate "add tests" slice at the end)
  • 30 minutes minimum: slices shorter than this have too much gate overhead
  • 120 minutes maximum: slices longer than this accumulate too much risk before the next checkpoint
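The two size limits are easy to lint. The sketch below assumes slice lines in the "Slice N, Title [NN min]" shape used in this chapter's examples; the function name and message wording are hypothetical.

```python
import re

# Assumes the "Slice N, Title   [NN min]" line shape from the examples here.
SLICE_LINE = re.compile(
    r"^Slice\s+(?P<num>\d+),\s*(?P<title>.+?)\s*\[(?P<minutes>\d+) min\]\s*$"
)


def lint_slice_sizes(plan_text, minimum=30, maximum=120):
    """Return one message per slice whose estimate falls outside the limits."""
    problems = []
    for line in plan_text.splitlines():
        match = SLICE_LINE.match(line.strip())
        if not match:
            continue  # lines without a plain "[NN min]" estimate are skipped
        minutes = int(match["minutes"])
        if minutes < minimum:
            problems.append(
                f"Slice {match['num']}: {minutes} min is below the {minimum}-minute minimum"
            )
        elif minutes > maximum:
            problems.append(
                f"Slice {match['num']}: {minutes} min is above the {maximum}-minute maximum"
            )
    return problems
```

An unbounded estimate such as "[≥90 min, unbounded]" does not match the pattern and is skipped, so a stricter lint would also want to flag slices with no parseable estimate.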

Example: 6-Slice Plan

Layer-by-layer slicing
Slice 1, Database migration + model           [30 min]
Slice 2, Repository + unit tests               [45 min]
Slice 3, Service layer + business logic tests   [60 min]
Slice 4, API controller + integration tests     [45 min]
Slice 5, Error handling + edge case tests       [30 min]
Slice 6, Documentation + cleanup                [30 min]

Validation Gates

Gates are the quality checkpoints between slices. A gate must be a concrete, executable command, not a human judgment call.

Good Gates

Executable and specific
**Gate**:
  dotnet build                              # zero errors
  dotnet test --filter "UserProfile"        # 6+ tests pass
  grep -rn "string interpolation" src/      # zero hits (security)

Bad Gates

Vague or unenforceable
**Gate**: "tests pass"           ← Which tests? How many?
**Gate**: "code looks clean"     ← Not executable
**Gate**: "review the changes"   ← Human-dependent, blocks automation
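"Executable" here means a machine can decide pass or fail without a person reading anything. A minimal sketch of that idea, assuming a gate passes only when every command exits with status 0 (the function name and result shape are hypothetical, and the orchestrator's real evaluation logic is not described in this chapter):

```python
import subprocess


def run_gate(commands, cwd="."):
    """Run gate commands in order and stop at the first non-zero exit code."""
    for command in commands:
        result = subprocess.run(
            command, shell=True, cwd=cwd, capture_output=True, text=True
        )
        if result.returncode != 0:
            return {
                "passed": False,
                "failed_command": command,
                "exit_code": result.returncode,
            }
    return {"passed": True, "failed_command": None, "exit_code": 0}
```

One consequence of judging by exit status: grep exits with 1 when it finds no matches, so a "zero hits" check would need to be inverted (for example `! grep -rn ... src/` in a POSIX shell) to count as a pass under this model.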

Parallel Execution

Mark slices that can run concurrently with the [P] tag. Add dependency declarations when slices must run in order:

Parallel slices with dependencies
### Slice 1, Database Migration [30 min]
...

### Slice 2, Repository Layer [P] [depends: Slice 1] [scope: src/repos/**]
...

### Slice 3, Service Layer [P] [depends: Slice 1] [scope: src/services/**]
...

### Slice 4, API Controller [depends: Slice 2, Slice 3]
...

Slices 2 and 3 both depend on Slice 1 (the migration) but are independent of each other, so they run in parallel. Slice 4 waits for both to finish. The orchestrator builds a DAG (directed acyclic graph) and schedules accordingly.
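The scheduling idea can be sketched with Python's standard graphlib module, which groups nodes into "waves" whose dependencies are already satisfied. This illustrates the DAG concept only; the function name is hypothetical and the orchestrator's own scheduler may work differently.

```python
from graphlib import TopologicalSorter


def schedule_waves(dependencies):
    """Group slices into waves; slices in the same wave can run concurrently.

    `dependencies` maps each slice to the slices it depends on.
    Raises graphlib.CycleError if the declarations contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unblock dependents
    return waves


plan = {
    "Slice 1": [],
    "Slice 2": ["Slice 1"],
    "Slice 3": ["Slice 1"],
    "Slice 4": ["Slice 2", "Slice 3"],
}
print(schedule_waves(plan))
# [['Slice 1'], ['Slice 2', 'Slice 3'], ['Slice 4']]
```

A circular declaration (Slice 2 depends on Slice 3 and vice versa) raises an error at prepare time instead of deadlocking mid-run, which is the main reason to build the graph up front.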

Figure 4-2. Parallel slice DAG: slices 2 and 3 run concurrently after slice 1, converging on slice 4.

When NOT to parallelize: If two slices modify the same files, they'll conflict. Only use [P] when slices touch different [scope: ...] paths.
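A conservative pre-flight check for this rule is to compare the fixed part of each scope pattern (everything before the first wildcard) and flag any pair where one is a prefix of the other. This is a heuristic sketch with hypothetical function names; it can report false positives and is not Plan Forge's own conflict detection.

```python
from itertools import combinations


def static_prefix(pattern):
    """Return the part of a glob before its first wildcard character."""
    for index, char in enumerate(pattern):
        if char in "*?[":
            return pattern[:index]
    return pattern


def may_overlap(pattern_a, pattern_b):
    """True when the two patterns could match a common path (conservative)."""
    prefix_a, prefix_b = static_prefix(pattern_a), static_prefix(pattern_b)
    return prefix_a.startswith(prefix_b) or prefix_b.startswith(prefix_a)


def find_parallel_conflicts(parallel_scopes):
    """Return pairs of [P] slices whose scope patterns may touch the same files."""
    conflicts = []
    for (name_a, scopes_a), (name_b, scopes_b) in combinations(
        parallel_scopes.items(), 2
    ):
        if any(may_overlap(a, b) for a in scopes_a for b in scopes_b):
            conflicts.append((name_a, name_b))
    return conflicts
```

For the example plan, src/repos/** and src/services/** share no prefix relationship, so Slices 2 and 3 are safe to tag [P]; adding a slice scoped to src/** would conflict with both.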

Stop Conditions

Stop conditions tell the AI when to halt instead of trying to work around a failure:

Good stop conditions
**Stop if**: Build fails with compilation error
**Stop if**: Any existing test regresses (not just new tests)
**Stop if**: Migration produces data loss warning
**Stop if**: Security scan finds HIGH or CRITICAL vulnerability

Without stop conditions, the AI may try to "fix" a build failure by removing code, or skip a failing test by commenting it out. Stop conditions force it to report the problem instead of hiding it.

Plan Status Auto-Rewrite (v3.x+)

A hardened plan carries a status: HARDENED frontmatter field plus a matching > Status: HARDENED quote-block line under the title. When pforge run-plan finishes a plan successfully, the orchestrator's _finalizeRunPlan step calls rewritePlanStatusOnSuccess from pforge-mcp/orchestrator/run-plan/plan-status-update.mjs — atomically rewriting both surfaces to COMPLETE in a single file write.

What this means for plan authors:

  • Don't hand-edit status: COMPLETE at the end of a successful run; the orchestrator does it. Hand-edits made between the last successful slice and the finalize step will be overwritten.
  • A failed or aborted run leaves the status at HARDENED so the next pforge run-plan resumes correctly. The auto-rewrite only fires on the success path.
  • If you want to mark a plan complete by hand (e.g. you finished the last slice outside of pforge run-plan), update both surfaces: the frontmatter field and the quote-block line. The auditor at scripts/audit/plan-status-audit.mjs flags any plan where the two are out of sync.
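To show what "both surfaces" means in practice, here is a minimal sketch that reads and rewrites the two status lines in a plan's text. It assumes the frontmatter line looks like `status: HARDENED` and the quote-block line like `> Status: HARDENED`, exactly as written above; it is not the code in plan-status-update.mjs or plan-status-audit.mjs, and it only transforms a string rather than performing the atomic file write.

```python
import re

# Assumed line shapes, taken from the description above.
FRONTMATTER_STATUS = re.compile(r"^status:[ \t]*(\w+)[ \t]*$", re.MULTILINE)
QUOTE_STATUS = re.compile(r"^>[ \t]*Status:[ \t]*(\w+)[ \t]*$", re.MULTILINE)


def read_statuses(plan_text):
    """Return (frontmatter status, quote-block status); None where missing."""
    frontmatter = FRONTMATTER_STATUS.search(plan_text)
    quote = QUOTE_STATUS.search(plan_text)
    return (
        frontmatter.group(1) if frontmatter else None,
        quote.group(1) if quote else None,
    )


def in_sync(plan_text):
    """True when both surfaces exist and agree."""
    frontmatter, quote = read_statuses(plan_text)
    return frontmatter is not None and frontmatter == quote


def mark_complete(plan_text):
    """Rewrite both surfaces to COMPLETE and return the new text."""
    text = FRONTMATTER_STATUS.sub("status: COMPLETE", plan_text, count=1)
    return QUOTE_STATUS.sub("> Status: COMPLETE", text, count=1)
```

Editing only the frontmatter by hand leaves in_sync returning False, which is the out-of-sync condition the auditor is described as flagging.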

Retro-slice template

The Step-2 plan hardener (step2-harden-plan.prompt.md) requires every hardened plan to end with a retro slice — a final, no-op-by-default slice whose validation gates are the post-hoc record of what was actually shipped:

### Slice N (final): Retro

**Goal**: Capture the run's outcome in the plan itself.

**Files in scope**: docs/plans/Phase-NN-PLAN.md

**Validation Gates**:
- All prior slices show ✅ in the "Slice status" table above
- CHANGELOG.md has an entry for this phase
- forge_diff_classify on the slice's commit range is "non-breaking" or has a documented rationale

The retro slice is not optional. The plan hardener will inject one if missing. Its job is to give the auto-status-rewrite a stable terminal slice to land on, and to give the reviewer-gate persona a single section to read when assessing whether the plan delivered what it promised.

Context Files

Each slice can list which instruction files are relevant. Don't load all 18; load only what's needed:

Targeted context loading
### Slice 1, Database Migration
**Context**: database.instructions.md, security.instructions.md

### Slice 4, API Controller
**Context**: api-patterns.instructions.md, auth.instructions.md, errorhandling.instructions.md

This keeps the AI's context window focused. A database slice doesn't need caching instructions; a controller slice doesn't need migration patterns.

Common Mistakes

| Mistake | What Happens | Fix |
|---|---|---|
| Scope too loose | AI refactors 20 files instead of 3 | List specific file paths, not categories |
| Scope too tight | AI can't create necessary helper files | Include reasonable wildcards: src/services/** |
| No stop conditions | AI works around failures silently | Add "Stop if" to every slice |
| Vague gates | Gate "passes" without actually validating | Use executable commands with expected counts |
| Tests in last slice | 5 slices of code, then discover it's untestable | Include tests alongside each code slice |
| Giant slices | 120+ min of work before first checkpoint | Break into 30–60 min focused chunks |
| Missing rollback | Panic when something breaks in production | Add rollback plan with specific git revert commands |

Plan Templates

Eight language-specific plan examples ship with Plan Forge. Use them as starting points:

| Stack | File | Features Demonstrated |
|---|---|---|
| .NET | Phase-DOTNET-EXAMPLE.md | RLS, Dapper, Blazor, GraphQL, 12 slices |
| TypeScript | Phase-TYPESCRIPT-EXAMPLE.md | Express, Prisma, Vitest |
| Python | Phase-PYTHON-EXAMPLE.md | FastAPI, SQLAlchemy, Pytest |
| Java | Phase-JAVA-EXAMPLE.md | Spring Boot, JPA, JUnit |
| Go | Phase-GO-EXAMPLE.md | Chi router, sqlx, testing |
| Swift | Phase-SWIFT-EXAMPLE.md | Vapor, Fluent, XCTest |
| Rust | Phase-RUST-EXAMPLE.md | Axum, sqlx, Cargo test |
| PHP | Phase-PHP-EXAMPLE.md | Laravel, Eloquent, PHPUnit |

All examples live in docs/plans/examples/.

For a Design Patterns-style catalog of 25+ plan archetypes — database migrations, refactors, multi-service rollouts, bug sweeps, and more, each with a skeleton template — see Appendix Y — Plan Pattern Library.

📄 Full reference: AI-Plan-Hardening-Runbook.md on GitHub