The Testbed
A separate repo. A library of scenarios. End-to-end proof that the shop still works.
- Why a separate repo? So Plan Forge can break things, commit, revert, and try again, without ever touching your real project.
- Why a library of scenarios? Each scenario is a JSON file describing a known regression (e.g. “agent dropped a test file last release; catch it”). Run them all and you know the forge still holds.
- Who needs this? You don't, day-to-day. The Testbed is mainly for Plan Forge maintainers and platform teams who want regression coverage of the tool itself. Skip ahead unless that's you.
Run with forge_testbed_run. Scenarios: docs/plans/testbed-scenarios/*.json. Findings: docs/plans/testbed-findings/*.json. Requires testbed.path in .forge.json.
Why a Separate Testbed?
Unit tests cover one module; integration tests cover one service. Neither tells you whether the full Plan-Forge pipeline still produces a clean, shippable outcome on a real repo under a real scenario. The Testbed does. It is a second, dedicated repository that Plan Forge treats as a read-write fixture: it replays a scenario against that repository and records the resulting defect log.
Learn-by-Doing: The Reference Testbed
The canonical reference testbed lives at srnichols/plan-forge-testbed. It's a real .NET 10 application called TimeTracker: a billable-hours tracker with Clients, Projects, Time Entries, Billing, Invoices, and Dashboard surfaces, used as the worked example throughout this manual.
If you're learning Plan-Forge by doing, work through it in this order:
1. Backend slices (docs/plans/Phase-1-CLIENTS-CRUD-PLAN.md): see how pforge run-plan drives a four-slice CRUD feature with [P] parallelism, [depends:], [scope:], and validation gates.
2. UI slices (docs/plans/Phase-2-WEB-UI-PLAN.md): Plan-Forge builds a Blazor Server + Microsoft Fluent UI front-end against the existing REST API. The plan demonstrates that pforge produces enterprise-grade UI: layered (page → service interface → repository, never DbContext in components), accessible (WCAG 2.1 AA), and tested (bUnit). This is the proof artifact for "pforge does not vibe-code."
3. Operational scenarios (docs/plans/testbed-scenarios/*.json): the synthetic regressions in the section below, replayed end-to-end via forge_testbed_run.
The .NET preset ships three artifacts that make Step 2 work on any consuming project; they are not testbed-specific:
| Artifact | Path | Purpose |
|---|---|---|
| Instruction file | .github/instructions/blazor-fluent-ui.instructions.md | Auto-loads on *.razor edits. Forbids DbContext in components, mandates code-behind split, lifecycle discipline, accessibility checklist. |
| Reviewer agent | .github/agents/blazor-reviewer.agent.md | Read-only audit of UI changes for layer violations, lifecycle bugs, and Fluent UI misuse. |
| Skill | .github/skills/ui-scaffold/SKILL.md | /ui-scaffold <Entity> --crud generates the page + DTO + service interface + bUnit test in one shot, enforcing the layering rules. |
No DbContext in .razor, every page accessible, every component tested.
Scenario Fixtures
Scenarios are JSON files under docs/plans/testbed-scenarios/. Each one describes:
- Initial state: branch, commit, known-good baseline.
- Instructions: the prompt / plan the agent will execute.
- Expected artifacts: which files must change, which must not.
- Gates: build, test, lint, drift thresholds.
A scenario is idempotent: the Testbed resets the fixture repo to the pinned commit before every run.
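As a rough illustration of the four parts listed above, the sketch below models a scenario as a Python dataclass and loads one from JSON. Every field name in it (initial_state, instructions, expected_artifacts, gates, and their sub-keys) is a hypothetical stand-in chosen for this example. The real schema is whatever the files under docs/plans/testbed-scenarios/ use, and it may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical in-memory shape of a scenario file; not the official schema."""
    name: str
    initial_state: dict       # branch, commit, known-good baseline
    instructions: str         # the prompt / plan the agent will execute
    expected_artifacts: dict  # files that must change, files that must not
    gates: dict = field(default_factory=dict)  # build, test, lint, drift thresholds


REQUIRED_KEYS = ("name", "initial_state", "instructions", "expected_artifacts")


def load_scenario(text: str) -> Scenario:
    """Parse scenario JSON and fail early if a required part is missing."""
    raw = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"scenario is missing: {', '.join(missing)}")
    return Scenario(
        name=raw["name"],
        initial_state=raw["initial_state"],
        instructions=raw["instructions"],
        expected_artifacts=raw["expected_artifacts"],
        gates=raw.get("gates", {}),
    )


# Illustrative content only, modelled on the "agent dropped a test file" example.
EXAMPLE = """
{
  "name": "dropped-test-file",
  "initial_state": {"branch": "main", "commit": "<pinned-sha>"},
  "instructions": "Run the plan without deleting existing tests.",
  "expected_artifacts": {"must_change": ["src/"], "must_not_change": ["tests/"]},
  "gates": {"build": true, "test": true, "lint": true, "max_drift": 0.1}
}
"""

scenario = load_scenario(EXAMPLE)
print(scenario.name, sorted(scenario.gates))
```

Validating the required parts at load time means a malformed fixture fails before the testbed is reset or any agent work starts.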
Anatomy of a Run
forge_testbed_run:
1. Acquires .forge/testbed.lock (one scenario at a time per testbed).
2. Verifies the testbed is clean (ERR_TESTBED_DIRTY if not).
3. Replays the scenario end-to-end in the testbed directory.
4. Captures artifacts, run metrics, and any defects.
5. Writes a finding JSON under docs/plans/testbed-findings/ and emits testbed-scenario-completed.
6. Releases the lock.
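The sequence above can be sketched in a few lines of Python. This is not Plan Forge's implementation, only a minimal model of the lock, check, replay, record, release order. It assumes the lock file lives inside the testbed directory. The names run_scenario, RunError, is_clean, and replay are hypothetical, and the print call is a stand-in for emitting the real testbed-scenario-completed event.

```python
import json
import os
import tempfile
from pathlib import Path


class RunError(Exception):
    """Carries one of the error codes named on this page."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def run_scenario(testbed: Path, findings_dir: Path, name: str, is_clean, replay) -> Path:
    """Sketch of one run; is_clean and replay are caller-supplied callables."""
    lock = testbed / ".forge" / "testbed.lock"  # assumed location
    lock.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RunError("ERR_TESTBED_LOCKED") from None
    os.close(fd)
    try:
        if not is_clean(testbed):
            raise RunError("ERR_TESTBED_DIRTY")
        result = replay(testbed)  # expected to return artifacts, metrics, defects
        findings_dir.mkdir(parents=True, exist_ok=True)
        finding = findings_dir / f"{name}.json"
        finding.write_text(json.dumps(result, indent=2))
        print("testbed-scenario-completed", name)  # stand-in for the real event
        return finding
    finally:
        lock.unlink()  # release the lock even when a step fails


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    out = run_scenario(
        root / "testbed",
        root / "findings",
        "dropped-test-file",
        is_clean=lambda _: True,
        replay=lambda _: {"defects": [], "metrics": {"duration_s": 1.5}},
    )
    print(out.name)
```

Releasing the lock in a finally block is the design point worth noting: a failed gate or a crashed replay should not leave the testbed locked for the next scenario.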
Common Errors
| Code | Meaning | Recovery |
|---|---|---|
| ERR_TESTBED_NOT_FOUND | testbed.path missing or invalid | Set it in .forge.json |
| ERR_TESTBED_DIRTY | Uncommitted changes in the testbed | Commit or stash inside the testbed repo |
| ERR_TESTBED_LOCKED | Another scenario is running | Wait, or remove a stale .forge/testbed.lock |
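To make the three codes concrete, here is a hedged sketch of a pre-run check that returns whichever code applies. It rests on several assumptions: that testbed.path is stored as a nested "testbed": {"path": ...} object in .forge.json, that a relative path is resolved against the project root, and that the lock file sits inside the testbed. The preflight and git_is_dirty helpers are hypothetical names. Only `git status --porcelain` is a real command, used here as one plausible way to detect uncommitted changes.

```python
import json
import subprocess
from pathlib import Path

RECOVERY = {
    "ERR_TESTBED_NOT_FOUND": "Set testbed.path in .forge.json",
    "ERR_TESTBED_DIRTY": "Commit or stash inside the testbed repo",
    "ERR_TESTBED_LOCKED": "Wait, or remove a stale .forge/testbed.lock",
}


def git_is_dirty(repo: Path) -> bool:
    """True when `git status --porcelain` reports any change."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return bool(out.stdout.strip())


def preflight(project_root: Path, is_dirty=git_is_dirty):
    """Return the first matching error code, or None when a run could start."""
    try:
        config = json.loads((project_root / ".forge.json").read_text())
        testbed = Path(config["testbed"]["path"])  # assumed nesting
    except (OSError, ValueError, KeyError, TypeError):
        return "ERR_TESTBED_NOT_FOUND"
    if not testbed.is_absolute():
        testbed = project_root / testbed  # assumed resolution rule
    if not testbed.is_dir():
        return "ERR_TESTBED_NOT_FOUND"
    if (testbed / ".forge" / "testbed.lock").exists():
        return "ERR_TESTBED_LOCKED"
    if is_dirty(testbed):
        return "ERR_TESTBED_DIRTY"
    return None


code = preflight(Path("."), is_dirty=lambda _: False)
print(code, "->", RECOVERY.get(code, "ready to run"))
```

Passing is_dirty as a parameter keeps the check usable in environments without git, and makes each branch easy to exercise in isolation.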
Feedback Into the Loop
Findings with defects feed two consumers:
- Bug Registry: scanner-eligible defects auto-register via forge_bug_register.
- Health DNA: run metrics (duration, gate failures, drift score) feed the daily Health DNA fingerprint.
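The split between the two consumers might look like the sketch below. The finding keys (defects, scanner_eligible, metrics, and the three metric names) are assumptions made for illustration rather than the documented finding format. The register_bug callable is a hypothetical stand-in for whatever actually invokes forge_bug_register.

```python
from typing import Callable


def route_finding(finding: dict, register_bug: Callable[[dict], None]) -> dict:
    """Send eligible defects to the registry and return the metrics for Health DNA."""
    for defect in finding.get("defects", []):
        if defect.get("scanner_eligible"):
            register_bug(defect)  # stand-in for a forge_bug_register call
    metrics = finding.get("metrics", {})
    # Keep only the three metrics this page names; absent ones come back as None.
    return {key: metrics.get(key) for key in ("duration", "gate_failures", "drift_score")}


registered = []
health_sample = route_finding(
    {
        "defects": [
            {"id": "D1", "scanner_eligible": True},
            {"id": "D2", "scanner_eligible": False},
        ],
        "metrics": {"duration": 42.0, "gate_failures": 1, "drift_score": 0.07},
    },
    registered.append,
)
print([d["id"] for d in registered], health_sample)
```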