Case Study · April 23, 2026 · 8 min read

The Loop That Never Ends

How TheProject turned a website audit into a self-smelting pipeline — and why every failed test now becomes the next plan slice.

Scott Nichols

Director @ Microsoft

[Image: a forge smith watching a glowing ouroboros of molten metal circle around an anvil, with parchment scrolls of bug reports being drawn into the flame and re-emerging as pristine emerald blueprints.]

Most pipelines end. You ship, you close the PR, you move on. Then two weeks later a customer reports a broken link, a "Coming soon" that never came, or a TODO hiding in the FAQ. You open a ticket. You context-switch. You rebuild the mental model of what should be there.

This post is about what happens when you stop treating bugs as interruptions and start treating them as new ore.

The Project

I'll call it TheProject — a pseudonym for a real production Next.js site I maintain. (The owner didn't clear the real name for publication, but every number and behaviour in this post is from the actual audit.) Marketing pages, a product catalog, a handful of interactive demos. Like most sites that grow organically, it had accumulated the usual rot: placeholder copy that never got replaced, stale /docs routes, console errors nobody noticed, href="#" waiting to be wired up.

I could have sat down with a checklist. Instead, I wired it into Plan Forge's closed loop.

The Loop (Drawn Honestly)

Plan Forge has a 7-step pipeline, but drawing it as a straight line has always been a simplification. The real shape is circular:

[Figure: The Closed Loop. Nodes, in order: Discovery (content audit + route crawl + placeholder regex) → Crucible (forge_crucible_submit, agent) → Harden (Phase-NN plan + Scope Contract) → Execute (slice-by-slice + test gates) → Tempering (forge_tempering_run, re-audit) → Bug Registry (auto-smelt loop, re-enters Discovery). Edge labels: finds, ship green, fails, auto-smelt (new ore).]
Amber arrows = forward path. Green = happy path out of Execute. Rose dashed = the loop closing back on itself.

The back-edges are the point. Discovery finds problems and funnels them into the Crucible. Tempering catches regressions and writes them to the bug registry, which auto-smelts them back into Discovery's next pass. You don't hit a "done" state — you hit a quiet state. And the next deploy starts the loop again.

Pass 1: The Discovery Harness

Before the loop can run, Discovery needs to actually discover things. I built a reusable Node crawler — one file, ~200 lines — that emits structured JSON the Crucible can consume:

  • Crawls / + sitemap.xml + a known-routes list checked into the repo
  • Per route, records: HTTP status, <title>, <h1>, word-count of <main>
  • Placeholder markers: matches against TODO, Coming soon, Lorem, placeholder, mock
  • Broken <a href> count, via a follow-up fetch on each internal link
  • Console errors captured via Playwright
  • Writes everything to .forge/audits/dev-<ts>.json, with findings bucketed by severity

That's Pass 1. The output is boring JSON. But it's structured boring JSON, which is the only kind the Crucible can turn into smelts.
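
A minimal sketch of the per-route record described above, assuming the page HTML has already been fetched. The function names (`scanPlaceholders`, `firstTagText`, `buildRouteRecord`) and the output field names are hypothetical, not the harness's actual schema, and the regex-based tag extraction is a stand-in for the DOM queries a Playwright-based harness would more likely use.

```javascript
// Hypothetical sketch: function and field names are illustrative only.
const PLACEHOLDER_PATTERNS = [
  /\bTODO\b/g,
  /coming soon/gi,
  /\blorem\b/gi,
  /\bplaceholder\b/gi,
  /\bmock\b/gi,
];

// Return every placeholder marker found in the given text.
function scanPlaceholders(text) {
  const hits = [];
  for (const pattern of PLACEHOLDER_PATTERNS) {
    for (const match of text.matchAll(pattern)) {
      hits.push(match[0]);
    }
  }
  return hits;
}

// Inner text of the first <tag>...</tag>, with nested markup stripped.
function firstTagText(html, tag) {
  const match = html.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, "i"));
  if (!match) return null;
  return match[1].replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

// Build one structured record per crawled route.
function buildRouteRecord(route, status, html) {
  const mainText = firstTagText(html, "main") ?? "";
  return {
    route,
    status,
    title: firstTagText(html, "title"),
    h1: firstTagText(html, "h1"),
    mainWordCount: mainText === "" ? 0 : mainText.split(" ").length,
    placeholders: scanPlaceholders(mainText),
  };
}
```

Keeping the record builder a pure function of `(route, status, html)` means the same code can run against a live crawl or a saved fixture, which matters later when the same harness is reused for re-auditing.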

[Figure: The 4-Pass Build. 1. Harness (Node + Playwright); 2. Wrapper (JSON → Crucible); 3. Execute (slices + temper); 4. Auto-smelt (no human triage).]

A discovery harness that emits prose is a report. A discovery harness that emits structured JSON is fuel.

Pass 2: The Crucible Eats the JSON

The second pass is almost embarrassingly simple. I wrote a 30-line wrapper that reads .forge/audits/dev-<ts>.json, groups findings by route and severity, and for each group calls forge_crucible_submit with:

  • Title: "Fix 3 placeholder blocks + 1 broken link on /pricing"
  • Evidence: the raw JSON entries that triggered the smelt
  • Priority: derived from the severity bucket
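
The grouping step can be sketched as below. This is an illustration under stated assumptions: the finding shape (`route`, `severity`, `kind`, `detail`), the severity-to-priority mapping, and the payload field names are all hypothetical, and the sketch stops at building the payload because the post does not show the actual arguments `forge_crucible_submit` accepts.

```javascript
// Hypothetical finding shape: { route, severity, kind, detail }.
// In this sketch, kind is either "placeholder" or "broken_link".
const PRIORITY_BY_SEVERITY = { high: 1, medium: 2, low: 3 }; // assumed mapping

// Group findings that share a route and a severity bucket.
function groupFindings(findings) {
  const groups = new Map();
  for (const finding of findings) {
    const key = `${finding.route}::${finding.severity}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(finding);
  }
  return [...groups.values()];
}

function plural(count, noun) {
  return `${count} ${noun}${count === 1 ? "" : "s"}`;
}

// Turn one group into a title / evidence / priority payload.
function toSubmission(group) {
  const { route, severity } = group[0];
  const placeholders = group.filter((f) => f.kind === "placeholder").length;
  const brokenLinks = group.filter((f) => f.kind === "broken_link").length;

  const parts = [];
  if (placeholders > 0) parts.push(plural(placeholders, "placeholder block"));
  if (brokenLinks > 0) parts.push(plural(brokenLinks, "broken link"));
  if (parts.length === 0) parts.push(plural(group.length, "finding"));

  return {
    title: `Fix ${parts.join(" + ")} on ${route}`,
    evidence: group, // the raw JSON entries that triggered the smelt
    priority: PRIORITY_BY_SEVERITY[severity] ?? 3,
  };
}
```

Each payload returned by `groupFindings(findings).map(toSubmission)` would then be handed to whatever client actually invokes the Crucible tool.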

The Crucible does its usual interview dance — asks clarifying questions, proposes scope, confirms forbidden actions. Then it hands a finalized smelt to Step 2 (Harden), which emits a Phase-NN plan with a Scope Contract.

At this point I haven't written a single line of product code. Everything so far was pipeline plumbing. That plumbing is what makes Passes 3 and 4 free.

Interlude: Not Everything Is a Smelt

The first version of the wrapper made a mistake I want to call out, because it's the mistake that almost sank the loop. It routed every finding through the Crucible. Console errors, 404s, auth redirects, placeholder regex hits — all of it became a proposed smelt for the Crucible to interview. The result was an interview queue 60+ items deep, and half the items were noise the Crucible had no business thinking about.

The fix was to triage findings into three lanes before the Crucible ever sees them.

[Figure: Three-Lane Triage. The discovery harness output (dev-<ts>.json, 69 raw findings) passes through the triage step (wrapper.mjs) into three lanes: Bug Registry (confirmed defect; 4 bugs, B1–B4; auto-smelt → fix → temper), Crucible (scope-ambiguous; 7 patterns, 62 routes; interview → harden → plan), and Filtered (harness-tuning; 58 hits, 16 + 42; auth-307 + seed-404 noise).]
One real audit from TheProject: 69 findings in, 4 defects to the registry, 7 feature patterns to the Crucible, 58 hits filtered. The Crucible only sees scope-ambiguous work.

The bug lane skips the Crucible entirely because bugs aren't ideas — they have evidence, they have scope, they don't need an interview. Route them straight to the registry and let auto-smelt fix them in a single pass. The Crucible gets the lane it was built for: scope-ambiguous feature work that needs hardening before it can be executed.

| Lane | What goes there | This audit | Next step |
|---|---|---|---|
| Bug registry | Confirmed defects with hard evidence: 5xx, 404 on seeded ID, schema reject | 4 bugs (B1–B4 API) | Auto-smelt → fix in one pass → tempering validates |
| Crucible | Feature gaps: empty shells, missing flows, unshipped work | 7 patterns (62 routes) | Interview → harden → Phase-NN plan |
| Filtered out | Methodology-only noise: auth redirects, seed mismatches | 58 hits (16 + 42) | Nothing; harness tuning problem, not a finding |

The third lane — noise — is the one most teams get wrong. A 307 on an auth-gated route isn't a bug, it's the middleware working correctly. A 404 on an /items/:id you never seeded is a test data problem, not a code problem. The harness must filter these before triage, or the Crucible drowns in busywork and the signal-to-noise ratio collapses.

Discovery that cries wolf on auth redirects teaches the Crucible to ignore it. Tune signal-to-noise at the source.
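
The three lanes can be expressed as a small classifier. This is a sketch, not the actual wrapper.mjs: the finding fields (`status`, `authGated`, `seeded`, `schemaReject`) are hypothetical names for evidence the harness would need to record, and the rules only encode the examples given in the table above.

```javascript
// Hypothetical finding shape: { route, status, authGated, seeded, schemaReject }.
function triage(finding) {
  const { status, authGated, seeded, schemaReject } = finding;

  // Lane 3: methodology noise. The site is behaving correctly.
  if (status === 307 && authGated === true) return "filtered";
  if (status === 404 && seeded === false) return "filtered";

  // Lane 1: confirmed defects with hard evidence.
  if (status >= 500 || schemaReject === true) return "bug";
  if (status === 404 && seeded === true) return "bug";

  // Lane 2: everything else is scope-ambiguous and needs an interview.
  return "crucible";
}

// Split a full audit into the three lanes.
function triageAll(findings) {
  const lanes = { bug: [], crucible: [], filtered: [] };
  for (const finding of findings) {
    lanes[triage(finding)].push(finding);
  }
  return lanes;
}
```

The noise rules run first on purpose: a 404 is only a defect once the harness knows the ID was seeded, so the filter has to see the finding before the bug rule does.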

Sequencing matters. In the audit above I'm running the bug lane first: fix 4 known defects, watch tempering validate them, prove the mechanics end-to-end. Only then does the feature lane open, starting with F1 (campaign donate/edit/manage), the first of the feature patterns. If the bug lane fails in that first round, the auto-smelt re-ingests and retries without me, so the loop eats its own mistakes before it ever touches the feature backlog. That ordering is what makes the feature lane safe to run unattended.

Pass 3: Execute and Temper

Step 3 runs the plan slice by slice. Nothing exotic here — the same pforge run-plan workflow Plan Forge uses on itself. The interesting part is what happens after the last slice commits.

Tempering re-runs the discovery harness against the newly-deployed preview URL. If the JSON output is empty for the routes the plan claimed to fix, Tempering reports green and closes the loop. If not — if a placeholder slipped through, or a "fix" introduced a new broken link — Tempering writes the failures to the bug registry.

And the bug registry auto-smelts. No human triage. The exact same pipeline I used to write the fix re-runs to catch the fix's own regressions.
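
A minimal sketch of that tempering check, assuming the re-audit produces one record per route. The record fields (`placeholders`, `brokenLinks`, `consoleErrors`), the plan shape, and the return value are hypothetical; the real `forge_tempering_run` output is not shown in this post.

```javascript
// Hypothetical shapes:
//   plan   = { id, routes: ["/pricing", ...] }
//   record = { route, status, placeholders: [], brokenLinks: [], consoleErrors: [] }
function hasFindings(record) {
  return (
    record.status >= 400 ||
    record.placeholders.length > 0 ||
    record.brokenLinks.length > 0 ||
    record.consoleErrors.length > 0
  );
}

// Green only when every route the plan claimed to fix re-audits clean.
function temper(plan, auditRecords) {
  const claimed = new Set(plan.routes);
  const failures = auditRecords.filter(
    (record) => claimed.has(record.route) && hasFindings(record)
  );
  if (failures.length === 0) {
    return { verdict: "green", bugs: [] };
  }
  // Each failure becomes a bug-registry entry carrying its own evidence.
  return {
    verdict: "fails",
    bugs: failures.map((record) => ({ planId: plan.id, evidence: record })),
  };
}
```

Scoping the check to `plan.routes` keeps a plan accountable only for what it claimed to fix; unrelated findings elsewhere in the audit would surface through the next discovery pass instead.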

Pass 4: The Loop Closes on Itself

The final pass is the one that made me sit up straight. Once the bug registry has auto-smelt enabled, the loop runs without me. I described the discovery harness, wrote the Crucible wrapper, hardened the first plan. After that:

  1. Cron runs the discovery harness nightly
  2. New findings → auto-smelt → new Phase-NN plan
  3. Plan runs on the dev branch
  4. Tempering re-audits
  5. If clean, PR opens; if not, auto-smelt again
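
The five steps above can be sketched as a single driver function. Every step is passed in as a function, so this makes no claim about the real tool calls behind it; the names (`runNightlyLoop`, `discover`, `smelt`, `execute`, `temper`, `openPr`) and the returned state strings are hypothetical, and the `maxRounds` cap is an assumption added here so the sketch cannot retry forever.

```javascript
// Hypothetical driver: all steps are injected by the caller.
async function runNightlyLoop({ discover, smelt, execute, temper, openPr, maxRounds = 3 }) {
  let findings = await discover();          // 1. nightly discovery run
  if (findings.length === 0) {
    return { state: "quiet", rounds: 0 };   // nothing to smelt tonight
  }

  for (let round = 1; round <= maxRounds; round++) {
    const plan = await smelt(findings);     // 2. findings -> new plan
    await execute(plan);                    // 3. plan runs on the dev branch
    findings = await temper(plan);          // 4. re-audit, returns remaining failures
    if (findings.length === 0) {
      await openPr(plan);                   // 5. clean: open a PR
      return { state: "pr-opened", rounds: round };
    }
    // Not clean: the failures become the next round's input.
  }
  return { state: "gave-up", rounds: maxRounds };
}
```

A caller would wire `discover` to the harness, `temper` to the re-audit, and schedule the whole function from cron. Returning "quiet" rather than "done" mirrors the point made earlier: the loop pauses, it does not finish.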

I stopped maintaining a manual TODO list for TheProject two weeks ago. The loop found 23 placeholders I didn't know existed, 7 broken links from a migration last month, and a console error in the checkout flow that had been silently firing for weeks. It's still finding things — slower now, but steady.

What Makes This Work

Four things had to be true for the loop to close:

1. Structured evidence, not prose. The Crucible can't smelt a bug report that says "the pricing page looks weird." It can smelt {"route": "/pricing", "placeholders": ["Coming soon", "TODO: price tiers"], "broken_hrefs": ["#"]}. The discovery harness exists to turn the first into the second.

2. Triage before the Crucible, not after. Findings split into three lanes (bug registry / Crucible / filtered noise) at the wrapper, not inside the Crucible interview. This is the insight from the previous section, and it's the one that took longest to learn.

3. Tempering has to re-audit with the same tool that discovered. If discovery uses regex and tempering uses eyeballs, the loop leaks. If both use the same harness, a fix is only "done" when the same JSON query that found it now returns empty.

4. Auto-smelt is on by default, with a per-project opt-out. You can turn it off, but the moment you do, the loop degrades into a pipeline, and pipelines end. The whole point is that this one doesn't.

The Pattern Generalizes

TheProject is a website, but the pattern isn't website-specific. Replace "discovery harness" with any tool that emits structured findings:

  • A .NET API: Roslyn analyzer runs → emits JSON of violations → Crucible smelts → plan fixes → tempering re-runs the analyzer
  • A Python data pipeline: Great Expectations suite → fails → emits JSON → Crucible smelts → plan adds missing validation → tempering re-runs the suite
  • An infrastructure repo: Azure Resource Graph query → emits compliance gaps → Crucible smelts → Bicep patch → tempering re-queries

Any system where you can express "what correct looks like" as a query that returns empty when everything is fine can live inside this loop.

See It on the Manual Page

The cyclical version of this diagram now lives on the How It Works manual page, right above the 7-step pipeline. The linear view (7 steps, Smelt → Forge → Guard → Learn) is still true — it's what happens inside a single pass. The cyclical view is what happens across passes, and it's the view that matters once auto-smelt is on.

Linear pipelines end when the last slice ships. Cyclical pipelines only pause when there's nothing left to find. In a real production system, that pause is never more than temporary.

Plan Forge is free and open source. MIT licensed. Wire it into your next project and stop maintaining a TODO list by hand.

v2.80 update: The loop described in this post is now a first-class Tempering subsystem. Use forge_tempering_drain (MCP tool) or pforge audit-loop (CLI) to run the audit drain loop programmatically. Activation defaults to off — opt in via .forge.json#audit.mode. See capabilities for the full tool reference.