
Audit Loop

Closed-loop bug discovery: content-audit scan → triage → fix, iterating until convergence or max rounds.

New here? Read this first. The audit loop is Plan Forge's way of finding bugs in a running app and fixing them automatically. Point it at your dev or staging server and it will:
  1. Scan: visit every page/route and record what's broken (404s, blank pages, “Coming soon” placeholders, broken links).
  2. Triage: sort each finding into one of three lanes (fix it now, ask a human, or “I'm not sure”).
  3. Fix: for the “fix it now” lane, spawn a worker to apply the fix, then re-scan.
  4. Repeat: keep going until no new bugs appear (“convergence”) or the round limit is reached.
It works like a tireless QA tester that not only files bugs but also closes them. It's off by default; you have to opt in. Production is permanently off-limits.
Audit loop drain flow: the content-audit scanner produces findings; forge_triage_route classifies each into one of three lanes (bug → forge_bug_register, spec → forge_crucible_submit, classifier → .forge/audits/ artifact); spawnWorker then applies fixes and the loop iterates.
Off by default. The audit loop defaults to off. It never runs automatically unless you explicitly set audit.mode to "auto" or "always" in .forge.json. Production environments are always forbidden.

What It Does

The audit loop is a first-class Tempering subsystem that discovers bugs in a running system. It probes live routes on a dev or staging server, triages the findings into actionable lanes, and iterates until the finding count converges (no new issues are found) or the maximum round limit is reached.

The Three Components

1. Content-Audit Scanner

pforge-mcp/tempering/scanners/content-audit.mjs HTTP-probes a set of routes against a live base URL and emits structured findings: HTTP status, page title, h1, word count, placeholder markers, and client-shell detection for hydrated SPAs.

  • Production guard: Reuses looksLikeProduction() from ui-playwright.mjs and refuses to crawl production URLs unless allowProduction: true is set explicitly. Independently, forbidProduction in the audit config is immutably true.
  • Injectable fetcher: Tests inject a mock fetcher, so the test suite makes no real HTTP requests.
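The probe-and-extract step can be sketched as follows. This is a minimal illustration, not the contents of content-audit.mjs. auditRoutes, looksLikeProductionUrl (a stand-in for looksLikeProduction(), whose real rules are not documented here), the placeholder patterns, and the client-shell heuristic are all hypothetical. The fetcher option mirrors the injectable-fetcher design and defaults to the global fetch available in Node 18+.

```javascript
// Hypothetical content-audit probe; illustrative only, not content-audit.mjs.
const PLACEHOLDER_PATTERNS = [
  ["coming soon", /\bcoming soon\b/i],
  ["TODO", /\bTODO\b/],
  ["lorem ipsum", /\blorem ipsum\b/i], // assumed extra marker
];

// Assumed stand-in for looksLikeProduction(): only local, dev. and staging. hosts pass.
function looksLikeProductionUrl(baseUrl) {
  const host = new URL(baseUrl).hostname;
  const safe = host === "localhost" || host === "127.0.0.1" ||
    host.startsWith("dev.") || host.startsWith("staging.");
  return !safe;
}

// Drop scripts, styles, and tags, leaving roughly the visible text.
function stripTags(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ");
}

function firstMatchText(html, re) {
  const m = html.match(re);
  return m ? stripTags(m[1]).trim() : null;
}

async function auditRoutes(baseUrl, routes, { fetcher = globalThis.fetch, allowProduction = false } = {}) {
  if (looksLikeProductionUrl(baseUrl) && !allowProduction) {
    throw new Error(`Refusing to crawl production-like URL: ${baseUrl}`);
  }
  const findings = [];
  for (const route of routes) {
    const res = await fetcher(new URL(route, baseUrl).toString());
    const html = await res.text();
    const text = firstMatchText(html, /<body[^>]*>([\s\S]*)<\/body>/i) ?? stripTags(html);
    const wordCount = text.split(/\s+/).filter(Boolean).length;
    findings.push({
      route,
      status: res.status,
      title: firstMatchText(html, /<title[^>]*>([\s\S]*?)<\/title>/i),
      h1: firstMatchText(html, /<h1[^>]*>([\s\S]*?)<\/h1>/i),
      wordCount,
      placeholders: PLACEHOLDER_PATTERNS.filter(([, re]) => re.test(text)).map(([label]) => label),
      // Assumed client-shell signal: an empty SPA mount node and almost no text.
      clientShell: /<div id="(?:root|app|__next)">\s*<\/div>/i.test(html) && wordCount < 5,
    });
  }
  return findings;
}
```

A regex-based extractor is enough to show the shape of a finding; a real scanner would more likely use an HTML parser.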

2. Triage Router

In pforge-mcp/tempering/triage.mjs, routeFinding(finding, classifier) routes each finding to one of three lanes:

Lane | Destination | What happens
--- | --- | ---
"bug" | Bug Registry | Finding registered via forge_bug_register
"spec" | Crucible | Finding submitted as a new smelt (feature gap)
"classifier" | Local artifact | Proposal written to .forge/audits/ for human review

Unknown classifier output falls back safely to { lane: "bug", confidence: "low" }, so findings are never dropped.
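Here is a minimal sketch of the fallback rule, assuming the classifier is a plain function that returns { lane, confidence }. The name routeFindingSketch, the confidence vocabulary, and treating a throwing classifier like unknown output are assumptions. The return shape is borrowed from forge_triage_route's { lane, payload, confidence }. None of this is the real triage.mjs.

```javascript
// Hypothetical three-lane router with a safe fallback; not triage.mjs itself.
const LANES = new Set(["bug", "spec", "classifier"]);
const CONFIDENCE_LEVELS = new Set(["high", "medium", "low"]); // assumed vocabulary

function routeFindingSketch(finding, classifier) {
  let verdict;
  try {
    verdict = classifier(finding);
  } catch {
    verdict = null; // assumed: a throwing classifier counts as unknown output
  }
  if (!verdict || !LANES.has(verdict.lane)) {
    // Unknown output falls back to the bug lane, so nothing is dropped.
    return { lane: "bug", payload: finding, confidence: "low" };
  }
  const confidence = CONFIDENCE_LEVELS.has(verdict.confidence) ? verdict.confidence : "low";
  return { lane: verdict.lane, payload: finding, confidence };
}
```

Falling back to the bug lane with low confidence keeps every finding visible in the registry. A human can then demote it later, which is safer than silently discarding it.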

3. Drain Loop

In pforge-mcp/tempering/drain.mjs, runTemperingDrain(opts) orchestrates the full cycle:

  1. Run all registered scanners (content-audit + any others)
  2. Triage each finding through routeFinding()
  3. Apply fixes for bug-lane findings (via injectable spawnWorker)
  4. Re-scan to check if fixes resolved the issues
  5. Repeat until convergence or maxRounds (default 5)
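The five steps above can be sketched as a single async loop. runDrainSketch is not runTemperingDrain(), and several details are assumptions: the scanners, classify, and rounds shapes, findings carrying a stable id, and the convergence rule. Here convergence means a scan that surfaces no finding id that earlier rounds have not already seen. Only maxRounds (default 5), dryRun, and the injectable spawnWorker come from the description.

```javascript
// Hypothetical drain loop; not runTemperingDrain(). Findings are assumed to carry
// a stable `id`, and `classify` is assumed to return a lane name.
async function runDrainSketch({ scanners, classify, spawnWorker, maxRounds = 5, dryRun = false }) {
  const lanes = ["bug", "spec", "classifier"];
  const seen = new Set();
  const rounds = [];
  for (let round = 1; round <= maxRounds; round++) {
    // 1. Run every registered scanner (this re-scans after the previous round's fixes).
    const findings = (await Promise.all(scanners.map((scan) => scan()))).flat();
    // Assumed convergence rule: no ids that earlier rounds had not already seen.
    const fresh = findings.filter((f) => !seen.has(f.id));
    if (fresh.length === 0) return { converged: true, rounds };
    fresh.forEach((f) => seen.add(f.id));
    // 2. Triage; unknown lanes fall back to "bug".
    const bugs = fresh.filter((f) => {
      const lane = classify(f);
      return !lanes.includes(lane) || lane === "bug";
    });
    // 3. Apply fixes for bug-lane findings, unless this is a dry run.
    if (!dryRun) {
      for (const finding of bugs) await spawnWorker(finding);
    }
    rounds.push({ round, newFindings: fresh.length, fixesAttempted: dryRun ? 0 : bugs.length });
  }
  return { converged: false, rounds };
}
```

With this "no new ids" rule, a fix that silently fails does not keep the loop spinning. The finding simply stays in the registry. Comparing finding counts between rounds would be a stricter alternative.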

Activation Surface

Configuration lives in .forge.json#audit:

{
  "audit": {
    "mode": "off",
    "maxRounds": 5,
    "autoThresholds": {
      "minFilesChanged": 5,
      "minDaysSinceLastDrain": 3,
      "requireFindings": true
    },
    "environments": ["dev", "staging"],
    "forbidProduction": true
  }
}
Mode | Behavior
--- | ---
"off" (default) | No automatic drain. Manual only via pforge audit-loop.
"auto" | Evaluates thresholds after plan completion. Fires only if change-surface signals trip.
"always" | Dispatches unconditionally after every plan completion.

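The post-plan activation decision can be sketched as a small predicate. The description does not say whether the autoThresholds combine with AND or OR, or exactly what requireFindings checks. This sketch assumes all thresholds must hold and reads requireFindings as "at least one pending finding". shouldDispatchDrain and the signals object are hypothetical.

```javascript
// Hypothetical post-plan activation check; not the real auto-activate logic.
// Assumptions: in "auto" mode all thresholds must hold, and requireFindings
// means at least one pending finding must exist.
function shouldDispatchDrain(audit, signals) {
  const mode = audit?.mode ?? "off";
  if (mode === "always") return true;    // unconditional after every plan
  if (mode !== "auto") return false;     // "off" (and any unknown value) never auto-fires
  const t = audit.autoThresholds ?? {};
  const enoughChange = signals.filesChanged >= (t.minFilesChanged ?? 0);
  const longEnough = signals.daysSinceLastDrain >= (t.minDaysSinceLastDrain ?? 0);
  const findingsOk = !t.requireFindings || signals.findingCount > 0;
  return enoughChange && longEnough && findingsOk;
}
```

Treating unknown mode strings as "off" keeps a typo in .forge.json from accidentally enabling automatic drains.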
CLI Usage

# Manual one-shot (ignores config, always runs)
pforge audit-loop

# Respect .forge.json#audit config
pforge audit-loop --auto

# Dry run with custom rounds
pforge audit-loop --dry-run --max=3

# Target staging
pforge audit-loop --env=staging

MCP Tools

  • forge_tempering_drain: programmatic access to the drain loop. Accepts project, maxRounds, scanners, dryRun, and env.
  • forge_triage_route: routes a single finding through the classifier and returns { lane, payload, confidence }.
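MCP clients invoke tools with a JSON-RPC 2.0 tools/call request whose params carry the tool name and an arguments object. The sketch below builds such requests for both tools. The tool schemas are not reproduced here, so the argument values, their types, and the finding argument name for forge_triage_route are assumptions.

```javascript
// Build JSON-RPC 2.0 "tools/call" requests as defined by the Model Context Protocol.
// The argument values (and their types) are illustrative assumptions.
let nextId = 1;

function toolCall(name, args) {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

const drainRequest = toolCall("forge_tempering_drain", {
  project: ".",                 // assumed: project root path
  maxRounds: 3,
  scanners: ["content-audit"],  // assumed: scanner names as strings
  dryRun: true,
  env: "staging",
});

const triageRequest = toolCall("forge_triage_route", {
  // Assumed argument name and finding shape.
  finding: { route: "/pricing", status: 200, placeholders: ["coming soon"] },
});

// Over the stdio transport, each message is sent as a single line of JSON.
console.log(JSON.stringify(drainRequest));
```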

Dashboard

The dashboard's audit-loop toggle persists to .forge.json#audit; it is not session-scoped. This matches the pattern used by Forge-Master prefs (.forge/fm-prefs.json) and the quorum advisory toggle.

Discovery Harness Implementation

The discovery harness is the engine that turns a running dev server into a stream of structured findings. It uses a 4-pass build sequence (crawl, wrap, execute, auto-smelt) to close the loop between bug discovery and bug resolution, with no human triage required for bug-lane findings.

Discovery Harness 4-pass build sequence

Pass 1 — Harness (Node + Playwright)

A headless Playwright browser crawls every route exposed by the dev server. For each page the harness records HTTP status, document title, h1 text, word count, placeholder markers (e.g. Coming soon, TODO), broken links, and client-shell detection for hydrated SPAs. Results are written as structured JSON to .forge/audits/.

Representative example: a marketing site with 47 routes produces 12 findings on its first pass: three placeholder headings, two broken anchor links, four pages returning non-200 status codes, and three pages with zero meaningful content.
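Pass 1's per-route extraction can be sketched against Playwright's Page API (page.goto, page.$, page.innerText, page.$$eval, page.title). crawlRoutes and the result shape are hypothetical. The browser is injected so the logic can run against a fake; in real use it would come from chromium.launch() in the playwright package. Writing the JSON to .forge/audits/ is left out.

```javascript
// Hypothetical Pass 1 crawl; `browser` is a Playwright Browser (for example from
// chromium.launch()), injected so the logic can run against a fake in tests.
async function crawlRoutes(browser, baseUrl, routes) {
  const page = await browser.newPage();
  const results = [];
  try {
    for (const route of routes) {
      // goto resolves with a Response (or null) and does not throw on 4xx/5xx.
      const response = await page.goto(new URL(route, baseUrl).toString());
      const h1 = await page.$("h1");                // ElementHandle or null
      const text = await page.innerText("body");     // rendered text of the body
      results.push({
        route,
        status: response ? response.status() : null,
        title: await page.title(),
        h1: h1 ? ((await h1.textContent()) ?? "").trim() : null,
        wordCount: text.split(/\s+/).filter(Boolean).length,
        // Link targets, kept for a later broken-link check.
        links: await page.$$eval("a[href]", (anchors) => anchors.map((a) => a.href)),
      });
    }
  } finally {
    await page.close();
  }
  return results;
}
```

Because the page is rendered in a real browser, the word count reflects content after client-side hydration. A plain HTTP probe would only see the server-sent shell.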

Pass 2 — Wrapper (JSON → Crucible)

Each finding from Pass 1 is transformed into a Crucible smelt via forge_crucible_submit. The wrapper applies severity triage, routing findings through the three-lane classifier (bug, spec, classifier) before packaging them as structured smelt input with enough context for the hardener to produce actionable plan slices.
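The wrapper's packaging step might look like the following sketch. The smelt field names, the severity rules, and the requiresHumanInput flag are assumptions for illustration. The actual forge_crucible_submit input schema is not documented here.

```javascript
// Hypothetical Pass 2 packaging step; field names and severity rules are
// assumptions, not forge_crucible_submit's schema.
function severityOf(finding) {
  if (finding.status >= 500 || finding.clientShell) return "high";
  if (finding.status >= 400 || (finding.brokenLinks ?? []).length > 0) return "medium";
  return "low";
}

function summarize(finding) {
  if (finding.status !== 200) return `HTTP ${finding.status}`;
  if ((finding.placeholders ?? []).length > 0) return `placeholder "${finding.placeholders[0]}"`;
  if (finding.wordCount === 0) return "no meaningful content";
  return "needs review";
}

function toSmeltInput(finding, lane) {
  return {
    lane,
    severity: severityOf(finding),
    requiresHumanInput: lane !== "bug", // only the bug lane runs unattended
    title: `[${lane}] ${finding.route}: ${summarize(finding)}`,
    evidence: { ...finding }, // raw scanner output, as context for the hardener
  };
}
```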

Pass 3 — Execute (Slices + Tempering)

The hardened plan runs slice-by-slice through forge_run_plan. Each slice carries its own validation gate and Tempering re-audit. LiveGuard hooks fire between slices, catching regressions before they compound.
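A slice runner with per-slice gates and a between-slice hook is sketched below. runSlices and the slice shape ({ name, apply, validate, reaudit }) are hypothetical stand-ins for forge_run_plan and LiveGuard, whose interfaces are not documented here. Re-audit failures are collected in the log so that a Pass 4 step could turn them into new smelts.

```javascript
// Hypothetical slice runner standing in for forge_run_plan; each slice is
// { name, apply, validate, reaudit }, and betweenSlices models a LiveGuard-style hook.
async function runSlices(slices, { betweenSlices = async () => true } = {}) {
  const log = [];
  for (let i = 0; i < slices.length; i++) {
    const slice = slices[i];
    await slice.apply();
    const gateOk = await slice.validate();                  // per-slice validation gate
    const failures = gateOk ? await slice.reaudit() : [];   // Tempering-style re-audit
    log.push({ slice: slice.name, gateOk, failures });
    if (!gateOk) return { completed: false, log };          // stop at a failing gate
    const more = i < slices.length - 1;
    // The hook runs only between slices; returning false halts the plan.
    if (more && !(await betweenSlices(slice, log))) return { completed: false, log };
  }
  return { completed: true, log };
}
```

Pass 4 input would then be log.flatMap((entry) => entry.failures).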

Pass 4 — Auto-smelt (Closed Loop)

Any Tempering failures from Pass 3 are converted into new smelts via forge_tempering_drain and re-entered into the bug registry; no human triage is required. The loop iterates until convergence (zero new findings) or until the configured maxRounds limit (default 5) is reached.

Further reading. For a real-world walkthrough of the 4-pass sequence applied to a production Next.js site, see the blog post The Loop That Never Ends.

Three-Lane Triage Funnel

Every finding from the discovery harness gets sorted into one of three lanes by the wrapper before reaching Crucible. Lane assignment determines whether a human ever sees the finding, what shape the resulting plan slice takes, and how the loop closes. The funnel is the difference between an audit that produces 100 PRs nobody reads and an audit that produces 5 PRs that ship.

Three-Lane Triage Funnel

Bug Lane — Auto-smelt to Bug Registry

Findings with high confidence and a clear remediation pattern (broken links, non-200 status codes, placeholder markers, hydration failures) drop into the bug lane. The wrapper packages them as Crucible smelts with severity attached, then the auto-smelt pass converts them into entries in the bug registry. No human triage is required; the loop closes automatically.

Representative example: a 4-pass run finds 8 broken anchor links across the docs. All 8 land in the bug lane as a single batch smelt with severity medium, generate one plan slice that fixes them together, and close themselves out via Tempering re-audit.
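The batching in this example can be sketched as a group-by over bug-lane findings. batchBugFindings and the type field on findings are hypothetical, and severity assignment is left out.

```javascript
// Hypothetical sketch: batch bug-lane findings of the same type into one smelt.
function batchBugFindings(routed) {
  const batches = new Map();
  for (const r of routed) {
    if (r.lane !== "bug") continue;          // only the bug lane is auto-smelted
    const key = r.payload.type;
    if (!batches.has(key)) batches.set(key, []);
    batches.get(key).push(r.payload);
  }
  return [...batches].map(([type, findings]) => ({
    type,
    count: findings.length,
    routes: [...new Set(findings.map((f) => f.route))], // distinct pages touched
    findings,
  }));
}
```

One batch per finding type yields one plan slice per kind of fix, rather than one slice per broken link.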

Spec Lane — Escalate to Human Spec Author

Findings that imply missing or ambiguous spec content (placeholder headings like "Coming soon," pages with zero meaningful content) drop into the spec lane. These can't be auto-fixed because the harness doesn't know what content should be there, only that something is missing. The wrapper escalates them as Crucible smelts requiring human input before they can be hardened into plan slices.

Representative example: the harness finds a route titled "Pricing, Coming soon" with 12 words of body content. Spec lane escalates this to a human as a Crucible smelt requesting a draft of the actual pricing tier copy. The human responds in the Crucible interview funnel, the wrapper hardens the response into a plan slice, and the loop resumes.

Classifier Lane — Refine the Classifier

Findings the classifier can't confidently sort (novel signals, contradictory evidence, low confidence scores) drop into the classifier lane. Rather than guess, the wrapper records the finding plus the classifier's confusion signal as a Crucible smelt targeting the classifier itself. Over time, classifier-lane volume should drop as the classifier learns from each handoff.

Representative example: the harness finds a 200 OK route with full content, but the document title is just "."; the classifier hasn't seen this signal before. The classifier lane creates a smelt asking the maintainer "should pages with single-character titles be flagged as defective?" The answer becomes a new classifier rule for the next run.

Finding-type to lane mapping

Finding type | Default lane | Why
--- | --- | ---
Non-200 HTTP status | Bug | Unambiguous failure; the fix is mechanical
Broken anchor / link | Bug | Target either exists or it doesn't; trivial to verify
Placeholder marker (TODO, Coming soon) | Spec | Implies missing content, not broken content
Zero meaningful content | Spec | Page exists but says nothing; needs human authoring
Hydration failure (SPA crashes without JS) | Bug | Build / config defect, not a content gap
Novel signal / low confidence | Classifier | Classifier can't sort; ask the maintainer
Mixed signals (multiple conflicting findings) | Classifier | Pre-empt a wrong auto-smelt by asking first
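The mapping in the table above can be encoded as a lookup with two escape hatches. An unknown finding type goes to the classifier lane, and so does a page whose findings point at different lanes. The type keys and defaultLane are hypothetical names. A maintainer's answer from the classifier lane would become one more entry in the map.

```javascript
// Hypothetical default lane table; type keys are assumed names.
const DEFAULT_LANES = new Map([
  ["http-status", "bug"],         // non-200 status
  ["broken-link", "bug"],         // broken anchor / link
  ["placeholder", "spec"],        // TODO, Coming soon
  ["empty-content", "spec"],      // zero meaningful content
  ["hydration-failure", "bug"],   // SPA crashes without JS
]);

// Decide the lane for all findings on one page.
function defaultLane(pageFindings) {
  const lanes = new Set(pageFindings.map((f) => DEFAULT_LANES.get(f.type) ?? "classifier"));
  // One agreed lane wins; novel types or conflicting lanes go to the classifier lane.
  return lanes.size === 1 ? [...lanes][0] : "classifier";
}
```

In this sketch, two bug-lane findings on the same page agree and stay in the bug lane. Only disagreement counts as mixed signals.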
What gets auto-smelted. Only the bug lane runs autonomously. Spec and classifier lanes always require a human in the loop, by design. The point of the funnel is to keep humans focused on what only humans can answer (intent, scope, novel signals), not on triaging mechanical defects the harness already understands.

For a worked example of how the bug lane closes a real defect end-to-end, including the multi-model quality patterns that catch issues a single model misses, see Quorum Quality Examples in Chapter 14.

Design Decisions

  • Classifier proposals are local files: Written to .forge/audits/ as JSON artifacts. GitHub PR creation is a deferred enhancement.
  • spawnWorker is injectable: This is consistent with the visual-diff quorum and bug classifier patterns, and spawnWorker is already in the function signature.
  • Production is immutably forbidden: forbidProduction: true cannot be overridden via config; it is hardcoded in auto-activate.mjs.
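The production rule can be sketched as a config resolver that overwrites forbidProduction after merging user settings, so no value in .forge.json can turn it off. resolveAuditConfig, the defaults object, and filtering a "production" entry out of environments are assumptions, not the contents of auto-activate.mjs.

```javascript
// Hypothetical resolver for .forge.json#audit; not auto-activate.mjs.
const AUDIT_DEFAULTS = Object.freeze({ mode: "off", maxRounds: 5, environments: ["dev", "staging"] });

function resolveAuditConfig(userAudit = {}) {
  const merged = { ...AUDIT_DEFAULTS, ...userAudit };
  merged.forbidProduction = true; // applied after the merge, so config cannot override it
  // Assumed: a "production" entry in environments is silently dropped.
  merged.environments = (merged.environments ?? []).filter((env) => env !== "production");
  return Object.freeze(merged);
}
```

Assigning the flag after the spread, rather than listing it among the defaults, is what makes it immune to user config. A default would be overwritten by the spread.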