Audit Loop
Closed-loop bug discovery: content-audit scan → triage → fix, iterating until convergence or max rounds.
- Scan: visit every page/route and record what's broken (404s, blank pages, “Coming soon” placeholders, broken links).
- Triage: sort each finding into one of three lanes: fix it now, ask a human, or I'm not sure.
- Fix: for the “fix it now” lane, spawn a worker to apply the fix, then re-scan.
- Repeat: keep going until no new bugs appear (“convergence”) or the round limit hits.
Off by default. The audit loop never runs automatically unless you explicitly set audit.mode to "auto" or "always" in .forge.json. Production environments are always forbidden.
What It Does
The audit loop is a first-class Tempering subsystem that discovers bugs from a running system. It probes live routes against a dev or staging server, triages the findings into actionable lanes, and iterates until the finding count converges (no new issues found) or the maximum round limit is reached.
The Three Components
1. Content-Audit Scanner
pforge-mcp/tempering/scanners/content-audit.mjs: HTTP-probes a set of routes against a live base URL and emits structured findings: HTTP status, page title, h1, word count, placeholder markers, and client-shell detection for hydrated SPAs.
- Production guard: Reuses looksLikeProduction() from ui-playwright.mjs. Refuses to crawl production URLs unless allowProduction: true is explicitly set (and forbidProduction in config is immutably true).
- Injectable fetcher: Tests use a mock fetcher, so there is no real HTTP in the test suite.
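The guard and the injectable fetcher can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: seemsLikeProduction is a hypothetical stand-in for looksLikeProduction() (whose real heuristic is not shown in this document), and probeRoutes, its option names, and mockFetcher are invented for the example.

```javascript
// Hypothetical stand-in for looksLikeProduction(); the real heuristic lives
// in ui-playwright.mjs and may differ. Anything that is not obviously local,
// dev, staging, or test is treated as production.
function seemsLikeProduction(baseUrl) {
  const host = new URL(baseUrl).hostname;
  const safeHost = /^(localhost|127\.0\.0\.1)$|(^|[.-])(dev|staging|test)([.-]|$)/i;
  return !safeHost.test(host);
}

// Probe each route with an injected fetcher so tests never touch the network.
async function probeRoutes({ baseUrl, routes, fetcher, allowProduction = false }) {
  if (seemsLikeProduction(baseUrl) && !allowProduction) {
    throw new Error(`Refusing to crawl production-looking URL: ${baseUrl}`);
  }
  const findings = [];
  for (const route of routes) {
    const res = await fetcher(new URL(route, baseUrl).href);
    findings.push({ route, status: res.status });
  }
  return findings;
}

// Mock fetcher for tests: one healthy route, everything else returns 404.
const mockFetcher = async (url) => ({ status: url.endsWith("/ok") ? 200 : 404 });
```

Passing the fetcher as an argument keeps the guard logic testable without a server; a real run would pass the global fetch instead of the mock.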
2. Triage Router
In pforge-mcp/tempering/triage.mjs, routeFinding(finding, classifier) routes each finding to one of three lanes:
| Lane | Destination | What happens |
|---|---|---|
| "bug" | Bug Registry | Finding registered via forge_bug_register |
| "spec" | Crucible | Finding submitted as a new smelt (feature gap) |
| "classifier" | Local artifact | Proposal written to .forge/audits/ for human review |
Unknown classifier output falls back safely to { lane: "bug", confidence: "low" }, so findings are never dropped.
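The fail-safe fallback might look like the sketch below. It is not the actual routeFinding implementation: routeFindingSketch and KNOWN_LANES are hypothetical names, the classifier is assumed to be a synchronous function returning { lane, confidence }, and treating a throwing classifier like unknown output is an added assumption.

```javascript
const KNOWN_LANES = new Set(["bug", "spec", "classifier"]);

// Anything the classifier returns that is not a recognised lane is coerced to
// a low-confidence bug, so the finding is still routed somewhere.
function routeFindingSketch(finding, classifier) {
  let verdict;
  try {
    verdict = classifier(finding);
  } catch {
    verdict = undefined; // assumption: a throwing classifier counts as unknown output
  }
  if (!verdict || !KNOWN_LANES.has(verdict.lane)) {
    return { lane: "bug", payload: finding, confidence: "low" };
  }
  return {
    lane: verdict.lane,
    payload: finding,
    confidence: verdict.confidence ?? "low",
  };
}
```

Defaulting to the bug lane rather than discarding the finding trades a possible false positive in the registry for the guarantee that nothing is silently lost.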
3. Drain Loop
In pforge-mcp/tempering/drain.mjs, runTemperingDrain(opts) orchestrates the full cycle:
- Run all registered scanners (content-audit + any others)
- Triage each finding through routeFinding()
- Apply fixes for bug-lane findings (via injectable spawnWorker)
- Re-scan to check if fixes resolved the issues
- Repeat until convergence or maxRounds (default 5)
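The cycle above can be sketched as a bounded loop. This is an illustration under stated assumptions, not the real runTemperingDrain: the function name drainSketch, the option names scanners and route, the result shape, and the idea that each finding carries a stable id are all hypothetical. Convergence is taken to mean a round that surfaces no finding not already seen.

```javascript
// Sketch of a drain loop. `scanners`, `route`, and `spawnWorker` are injected;
// the option names are assumptions, not the real runTemperingDrain signature.
async function drainSketch({ scanners, route, spawnWorker, maxRounds = 5 }) {
  const seen = new Set(); // ids of findings observed in earlier rounds
  const history = [];
  for (let round = 1; round <= maxRounds; round++) {
    // Run every scanner and merge their findings into one list.
    const findings = (await Promise.all(scanners.map((scan) => scan()))).flat();
    const fresh = findings.filter((f) => !seen.has(f.id));
    history.push({ round, total: findings.length, fresh: fresh.length });
    if (fresh.length === 0) {
      return { converged: true, rounds: round, history };
    }
    for (const finding of fresh) {
      seen.add(finding.id);
      const routed = route(finding);
      // Only bug-lane findings are handed to a worker for an automatic fix.
      if (routed.lane === "bug") await spawnWorker(routed);
    }
  }
  return { converged: false, rounds: maxRounds, history };
}
```

Note that under this definition a finding whose fix did not take still counts as "seen", so the loop can converge with unresolved findings; the per-round totals in history make that visible.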
Activation Surface
Configuration lives in .forge.json#audit:
{
  "audit": {
    "mode": "off",
    "maxRounds": 5,
    "autoThresholds": {
      "minFilesChanged": 5,
      "minDaysSinceLastDrain": 3,
      "requireFindings": true
    },
    "environments": ["dev", "staging"],
    "forbidProduction": true
  }
}
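As a rough illustration of how these settings could gate a drain after plan completion, consider the sketch below. The document does not say how the thresholds combine, so this assumes that "auto" requires all of them to be met and that every mode respects the environments list. The function shouldDispatchDrain and the signals object (env, filesChanged, daysSinceLastDrain, openFindings) are hypothetical.

```javascript
// Decide whether a drain should be dispatched after plan completion.
// Assumption: in "auto" mode ALL configured thresholds must be met; the real
// rule in auto-activate.mjs may combine them differently.
function shouldDispatchDrain(audit, signals) {
  // Production is never allowed, and the environment must be listed.
  if (signals.env === "production" || !audit.environments.includes(signals.env)) {
    return false;
  }
  if (audit.mode === "always") return true;
  if (audit.mode !== "auto") return false; // "off" or anything unrecognised
  const t = audit.autoThresholds;
  return (
    signals.filesChanged >= t.minFilesChanged &&
    signals.daysSinceLastDrain >= t.minDaysSinceLastDrain &&
    (!t.requireFindings || signals.openFindings > 0)
  );
}

// The configuration shown above, with the mode switched to "auto".
const exampleAudit = {
  mode: "auto",
  maxRounds: 5,
  autoThresholds: { minFilesChanged: 5, minDaysSinceLastDrain: 3, requireFindings: true },
  environments: ["dev", "staging"],
  forbidProduction: true,
};
```

With this configuration, a plan that touched six files four days after the last drain, with at least one open finding, would trigger a drain on dev; the same plan touching two files would not.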
| Mode | Behavior |
|---|---|
| "off" (default) | No automatic drain. Manual only via pforge audit-loop. |
| "auto" | Evaluates thresholds after plan completion. Fires only if change-surface signals trip. |
| "always" | Dispatches unconditionally after every plan completion. |
CLI Usage
# Manual one-shot (ignores config, always runs)
pforge audit-loop
# Respect .forge.json#audit config
pforge audit-loop --auto
# Dry run with custom rounds
pforge audit-loop --dry-run --max=3
# Target staging
pforge audit-loop --env=staging
MCP Tools
- forge_tempering_drain: programmatic drain loop access. Accepts project, maxRounds, scanners, dryRun, env.
- forge_triage_route: route a single finding through the classifier. Returns { lane, payload, confidence }.
Dashboard
The audit-loop toggle in the dashboard persists to .forge.json#audit; it is not session-scoped. This matches the pattern used by Forge-Master prefs (.forge/fm-prefs.json) and the quorum advisory toggle.
Discovery Harness Implementation
The discovery harness is the engine that turns a running dev server into a stream of structured findings. It uses a 4-pass build sequence (crawl, wrap, execute, auto-smelt) to close the loop between bug discovery and bug resolution with no human triage required.
Pass 1 — Harness (Node + Playwright)
A headless Playwright browser crawls every route exposed by the dev server. For each page the harness records HTTP status, document title, h1 text, word count, placeholder markers (e.g. Coming soon, TODO), broken links, and client-shell detection for hydrated SPAs. Results are written as structured JSON to .forge/audits/.
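To make the recorded fields concrete, here is a minimal sketch that derives a finding from a page's HTML. It is a simplification: the harness described above reads a live Playwright DOM, whereas this uses regular expressions on a string so it runs without a browser. The function summarizePage, the field names, and the marker list are assumptions for illustration.

```javascript
const PLACEHOLDER_MARKERS = ["Coming soon", "TODO"]; // the two markers named in the text

// Derive finding fields from a fetched page. Regex-based extraction is a
// simplification that is good enough for well-formed test fixtures.
function summarizePage(route, status, html) {
  // Return the inner text of the first match, with nested tags removed.
  const pick = (re) => (html.match(re)?.[1] ?? "").replace(/<[^>]*>/g, "").trim();
  // Visible text: drop head, script, and style blocks, then strip remaining tags.
  const text = html
    .replace(/<(script|style|head)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]*>/g, " ");
  const words = text.split(/\s+/).filter(Boolean);
  const lowered = text.toLowerCase();
  return {
    route,
    status,
    title: pick(/<title[^>]*>([\s\S]*?)<\/title>/i),
    h1: pick(/<h1[^>]*>([\s\S]*?)<\/h1>/i),
    wordCount: words.length,
    placeholders: PLACEHOLDER_MARKERS.filter((m) => lowered.includes(m.toLowerCase())),
  };
}
```

A page whose body is little more than a heading and a "Coming soon" line would come back with a very low wordCount and a non-empty placeholders array, which is the kind of signal the later passes triage.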
Representative example: a marketing site with 47 routes produces 12 findings on its first pass: three placeholder headings, two broken anchor links, four pages returning non-200 status codes, and three pages with zero meaningful content.
Pass 2 — Wrapper (JSON → Crucible)
Each finding from Pass 1 is transformed into a Crucible smelt via forge_crucible_submit. The wrapper applies severity triage, routing findings through the three-lane classifier (bug, spec, classifier) before packaging them as structured smelt input with enough context for the hardener to produce actionable plan slices.
Pass 3 — Execute (Slices + Tempering)
The hardened plan runs slice-by-slice through forge_run_plan. Each slice carries its own validation gate and Tempering re-audit. LiveGuard hooks fire between slices, catching regressions before they compound.
Pass 4 — Auto-smelt (Closed Loop)
Any Tempering failures from Pass 3 are converted into new smelts via forge_tempering_drain and re-entered into the bug registry, with no human triage required. The loop iterates until convergence (zero new findings) or the configured maxRounds limit (default 5) is reached.
Three-Lane Triage Funnel
Every finding from the discovery harness gets sorted into one of three lanes by the wrapper before reaching Crucible. Lane assignment determines whether a human ever sees the finding, what shape the resulting plan slice takes, and how the loop closes. The funnel is the difference between an audit that produces 100 PRs nobody reads and an audit that produces 5 PRs that ship.
Bug Lane — Auto-smelt to Bug Registry
Findings with high confidence and a clear remediation pattern (broken links, non-200 status codes, hydration failures) drop into the bug lane. The wrapper packages them as Crucible smelts with severity attached, then the auto-smelt pass converts them into entries in the bug registry. No human triage is required; the loop closes automatically.
Representative example: a 4-pass run finds 8 broken anchor links across the docs. All 8 land in the bug lane as a single batch smelt with severity medium, generate one plan slice that fixes them together, and close themselves out via tempering re-audit.
Spec Lane — Escalate to Human Spec Author
Findings that imply missing or ambiguous spec content (placeholder headings like "Coming soon" or pages with zero meaningful content) drop into the spec lane. These can't be auto-fixed because the harness knows only that something is missing, not what content should be there. The wrapper escalates them as Crucible smelts requiring human input before they can be hardened into plan slices.
Representative example: the harness finds a route titled "Pricing, Coming soon" with 12 words of body content. The spec lane escalates this to a human as a Crucible smelt requesting a draft of the actual pricing tier copy. The human responds in the Crucible interview funnel, the wrapper hardens the response into a plan slice, and the loop resumes.
Classifier Lane — Refine the Classifier
Findings the classifier can't confidently sort (novel signals, contradictory evidence, low confidence scores) drop into the classifier lane. Rather than guess, the wrapper records the finding plus the classifier's confusion signal as a Crucible smelt targeting the classifier itself. Over time, classifier-lane volume should drop as the classifier learns from each handoff.
Representative example: the harness finds a 200 OK route with full content, but the document title is just "."; the classifier hasn't seen this signal before. The classifier lane creates a smelt asking the maintainer "should pages with single-character titles be flagged as defective?" The answer becomes a new classifier rule for the next run.
Finding-type to lane mapping
| Finding type | Default lane | Why |
|---|---|---|
| Non-200 HTTP status | Bug | Unambiguous failure, fix is mechanical |
| Broken anchor / link | Bug | Target either exists or it doesn't; trivial to verify |
| Placeholder marker (TODO, Coming soon) | Spec | Implies missing content, not broken content |
| Zero meaningful content | Spec | Page exists but says nothing, needs human authoring |
| Hydration failure (SPA crashes without JS) | Bug | Build / config defect, not a content gap |
| Novel signal / low confidence | Classifier | Classifier can't sort; ask the maintainer |
| Mixed signals (multiple conflicting findings) | Classifier | Pre-empt a wrong auto-smelt by asking first |
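The mapping above can be expressed as a small lookup. This is a sketch, not the wrapper's actual code: the string keys, the DEFAULT_LANES table, and the defaultLane function are hypothetical names, and "mixed signals" is interpreted here as findings on one page whose default lanes disagree.

```javascript
// Default lane per finding type, transcribed from the table above.
// The string keys are illustrative; the real finding schema may name them differently.
const DEFAULT_LANES = {
  "http-non-200": "bug",
  "broken-link": "bug",
  "placeholder-marker": "spec",
  "zero-content": "spec",
  "hydration-failure": "bug",
};

// Pick a lane for a page given the types of findings recorded on it.
function defaultLane(findingTypes, confidence = "high") {
  // Types missing from the table are novel signals and map to the classifier lane.
  const lanes = new Set(findingTypes.map((t) => DEFAULT_LANES[t] ?? "classifier"));
  // Low confidence, no findings, or disagreeing lanes all go to the classifier lane.
  if (confidence === "low" || lanes.size !== 1) return "classifier";
  return [...lanes][0];
}
```

For instance, a page with both a broken link and a placeholder marker maps to two different lanes, so the sketch sends it to the classifier lane rather than guessing.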
For a worked example of how the bug lane closes a real defect end-to-end, including the multi-model quality patterns that catch issues a single model misses, see Quorum Quality Examples in Chapter 14.
Design Decisions
- Classifier proposals are local files: Written to .forge/audits/ as JSON artifacts. GitHub PR creation is a deferred enhancement.
- spawnWorker is injectable: Consistent with visual-diff quorum and bug classifier patterns. Already in the function signature.
- Production is immutably forbidden: forbidProduction: true cannot be overridden via config; it's hardcoded in auto-activate.mjs.