How the Shop Remembers
The plain-English tour of Plan Forge's upgraded memory system, and the reason a cheaper, faster model can now do work that used to require the expensive one.
- Still three tiers. L1/L2/L3 didn't go away. We forged better tools around them.
- Still one capture call. Your code doesn't change. The shop just remembers more reliably now.
- The payoff is measurable. Drift dropped 64% over 90 days. A 7-slice plan now executes for $0.07 on Sonnet alone, no Opus escalation.
The Four New Pieces
Think of the forge shop. The L1/L2/L3 memory tiers are the workbench, the filing cabinet, and the library across town. They were already there. What we added is the craftsmanship around them:
| Piece | The shop metaphor | What it actually does |
|---|---|---|
| Hallmark | The maker's mark stamped into the metal. It proves who forged it, when, and from what stock. | A small JSON envelope (hallmark/v1) attached to every memory record and artifact. Lets any tool ask "is this still the version I think it is?" and catch drift before it bites. |
| Anvil | The anvil where everything gets struck: solid, reliable, never drops the hammer. | The boundary code that delivers L2 records to OpenBrain (L3). Adds a dead-letter queue, a capability handshake, and a boot-time drain so a network blip never loses a memory. |
| Lattice | The map of the shop: every workbench, every tool, every chain pulley, indexed by where it sits. | A code-graph index over your repo. Splits source into semantic chunks, records who-calls-whom, and answers "show me everyone who calls executeSlice" in milliseconds. |
| forge_sync_memories | The dispatch rider that carries shop news to the wider guild. | A soft-sync that copies decisions/lessons/gotchas from .forge/ into Copilot's own Memory store, so VS Code agents see them automatically next session. |
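To make the Hallmark row concrete, here is a minimal sketch of a content-hashed envelope and a drift check. Only the schema name hallmark/v1 comes from the text above; the field names (source, stampedAt, contentHash) and the helper functions are assumptions for illustration, not Plan Forge's actual envelope.

```python
import hashlib
import json
from datetime import datetime, timezone


def content_hash(payload: dict) -> str:
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp(payload: dict, source: str) -> dict:
    """Wrap a record in a hypothetical hallmark-style envelope."""
    return {
        "hallmark": {
            "schema": "hallmark/v1",  # schema name taken from the text
            "source": source,  # assumed field: who forged it
            "stampedAt": datetime.now(timezone.utc).isoformat(),  # assumed field: when
            "contentHash": content_hash(payload),  # assumed field: from what stock
        },
        "payload": payload,
    }


def verify(record: dict) -> bool:
    """Return True when the payload still matches the hash it was stamped with."""
    return record["hallmark"]["contentHash"] == content_hash(record["payload"])


record = stamp({"type": "gotcha", "text": "quote grep -c inside brace groups"}, "slice-3")
print(verify(record))  # True
record["payload"]["text"] = "edited after the fact"
print(verify(record))  # False: the payload drifted from its stamp
```

Hashing a canonical serialization is one common way to answer "is this still the version I think it is?" without comparing whole records.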
A Day in the Life of a Slice
Here's what happens when pforge run-plan starts executing slice 3 of your plan. Every step touches at least one memory subsystem:
- Preload: The orchestrator calls buildPlanBootContext and emits a memory-preload event with semantic queries derived from the slice's Scope Contract. The agent runtime (Copilot, Claude, Cursor) catches the event and runs search_thoughts against L3 plus a latticeQuery against the code-graph. The agent now knows what prior slices learned and which files are relevant before it reads a single line.
- Execute: The agent edits files. When it hits a tricky pattern ("Windows shell quoting breaks grep -c when piped into a brace group"), it calls capture_thought with type gotcha. The capture path stamps the record with a fresh Hallmark envelope, writes it to L1 (instant) and L2 (durable), and queues it for L3.
- Anvil delivery: A background drainer pulls from .forge/openbrain-queue.jsonl and pushes to OpenBrain. If OpenBrain is down or rejects the schema, the record lands in .forge/openbrain-dlq.jsonl instead of vanishing. The next boot drains the DLQ automatically.
- Verify with Lattice: Before declaring the slice done, the agent runs latticeCallers on every function it touched. If the call graph shows an unexpected caller (a test it forgot about, or a sibling slice's import), the slice gate catches it. This is the step that prevents "I refactored X and didn't realize Y depended on it."
- Sync out: At slice end, forge_sync_memories copies new decisions and lessons into Copilot Memory. Tomorrow's VS Code session sees them in the global memory pane without anyone running anything.
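The Anvil delivery step is the easiest one to picture in code. The sketch below is an assumption-heavy illustration, not the real drainer: the file names echo the ones in the text, but the drain function, its return shape, and the send callback are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def read_jsonl(path: Path) -> list:
    """Read one JSON record per line; a missing file counts as empty."""
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]


def write_jsonl(path: Path, records: list) -> None:
    path.write_text("".join(json.dumps(r) + "\n" for r in records))


def drain(queue: Path, dlq: Path, send: Callable[[dict], None]) -> dict:
    """Deliver queued records; park failures in the DLQ instead of dropping them."""
    delivered, failed = 0, []
    # Replay earlier failures first, then the fresh queue.
    for record in read_jsonl(dlq) + read_jsonl(queue):
        try:
            send(record)
            delivered += 1
        except Exception:
            failed.append(record)
    write_jsonl(queue, [])
    write_jsonl(dlq, failed)
    return {"delivered": delivered, "dlq_depth": len(failed)}


with tempfile.TemporaryDirectory() as tmp:
    queue = Path(tmp) / "openbrain-queue.jsonl"
    dlq = Path(tmp) / "openbrain-dlq.jsonl"
    write_jsonl(queue, [{"id": 1}, {"id": 2}])

    def offline(record: dict) -> None:  # stand-in for an unreachable OpenBrain
        raise ConnectionError("OpenBrain unreachable")

    print(drain(queue, dlq, offline))  # {'delivered': 0, 'dlq_depth': 2}
    print(drain(queue, dlq, lambda record: None))  # {'delivered': 2, 'dlq_depth': 0}
```

The second call plays the role of the boot-time drain: once the destination is reachable again, the parked records are replayed and the DLQ depth returns to zero.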
Why Cheaper, Faster Models Now Punch Above Their Weight
This is the part most teams don't expect.
The classic AI cost equation goes: better model → fewer mistakes → less wasted spend. That's still true, but it ignores a second lever: context quality. A medium-tier model with the right context will routinely outperform a flagship model with vague context. Memory is context, and the memory upgrades make that context dramatically better.
Here's the receipt, measured on this repo over the last 90 days:
| Metric | Before the upgrades | After (current) | What it means |
|---|---|---|---|
| Drift score | 22 | 8 | Architecture decay per session; lower is better (−64%). |
| Sonnet-4.6 success rate | ~78% (estimated) | 91% (332 / 365 slices) | Cheaper model now beats what Opus did a quarter ago. |
| Cost per slice | ~$0.09 | $0.04 | Less re-reading, less back-and-forth, less escalation. ~55% cheaper. |
| Opus escalation rate | Multiple slices per plan | Zero on QA-class plans | The memory-QA plan executed 7 slices for $0.07 on Sonnet alone. |
| OpenBrain DLQ depth | N/A (would have dropped) | 0 (Anvil catches all) | Zero memories lost to transient L3 failures. |
| Telemetry dedup rate | ~0% (no dedup) | 62.5% (10 of 16) | Hallmark's content hash collapses redundant writes. |
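The percentages in the table follow directly from the raw numbers beside them. A quick check, using only figures from the table (the pct_change helper is just for illustration):

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


print(round(pct_change(22, 8), 1))  # -63.6, reported as -64%
print(round(332 / 365 * 100, 1))  # 91.0
print(round(pct_change(0.09, 0.04), 1))  # -55.6, reported as ~55% cheaper
print(round(10 / 16 * 100, 1))  # 62.5
print(round(0.07 / 7, 2))  # 0.01 dollars per slice on the memory-QA plan
```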
How the four pieces compound
- Hallmark means the agent can trust that "lesson learned in slice 2" is exactly what it was when written. No silent schema drift. The cheaper model doesn't waste tokens re-deriving facts it already has.
- Anvil means recall is reliable. Pre-upgrade, a network hiccup could silently drop a memory and the next slice would re-learn the same gotcha. Now the DLQ catches it and the boot drainer replays it.
- Lattice means the agent finds the right files without scanning the whole repo. "Who calls this function?" is a 50ms query instead of a 50-second grep-and-read. Fewer tokens, more accurate edits.
- forge_sync_memories means knowledge crosses session boundaries automatically. The next session's cheaper model starts already knowing what the last session's expensive model figured out.
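Why is "who calls this function?" so cheap once a code-graph exists? A rough sketch: if call edges are stored as (caller, callee) pairs, a reverse index turns the question into a dictionary lookup. The edge tuples, the names runPlan and retrySlice, and the unexpected_callers helper are all hypothetical; the real edges.jsonl schema is not shown in this chapter.

```python
from collections import defaultdict

# Hypothetical edge records: (caller, callee). Assumed shape, for illustration only.
EDGES = [
    ("runPlan", "executeSlice"),
    ("retrySlice", "executeSlice"),
    ("executeSlice", "attachSliceSnapshotRestore"),
]


def build_index(edges):
    """Build forward and reverse adjacency maps from call edges."""
    callers, callees = defaultdict(set), defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
        callees[caller].add(callee)
    return callers, callees


callers, callees = build_index(EDGES)
print(sorted(callers["executeSlice"]))  # ['retrySlice', 'runPlan']
print(sorted(callees["executeSlice"]))  # ['attachSliceSnapshotRestore']


def unexpected_callers(function: str, expected: list) -> list:
    """Callers in the graph that the agent did not account for."""
    return sorted(callers.get(function, set()) - set(expected))


print(unexpected_callers("executeSlice", ["runPlan"]))  # ['retrySlice']
```

A check along these lines is one way a slice gate could flag a dependency the agent forgot about before the slice is declared done.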
Put bluntly: the memory upgrades subsidize the model choice. You can pick Sonnet (or another mid-tier) and let memory carry the load that used to require Opus reasoning. The savings show up in the cost ledger; the quality shows up in the drift score.
Three Commands You Can Run Today
The memory subsystems are exposed through the pforge CLI and the MCP server. Here are the three command groups you'll use most: Lattice queries, health reports, and Copilot sync.
# What does the agent see when it asks "where is snapshot restore handled?"
pforge lattice query "snapshot restore"
# Who calls this function?
pforge lattice callers executeSlice
# What does this function call?
pforge lattice callees attachSliceSnapshotRestore
# Health of every memory surface, L2 files, OpenBrain queue, DLQ, dedup rate, orphans
pforge memory report
# 90-day trend across drift / cost / models / incidents
pforge health-trend --days 90
# Push new decisions / lessons / gotchas into Copilot's own memory store.
# Safe to re-run, dedupes by content hash.
pforge sync-memories
# Dry-run preview (shows what would be written, writes nothing)
pforge sync-memories --dry-run
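"Safe to re-run, dedupes by content hash" can be sketched in a few lines. This is an illustration under stated assumptions, not the pforge implementation: the in-memory store dictionary stands in for Copilot's memory store, and the sync function with its dry_run flag is hypothetical.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """Hash a canonical JSON form so identical records share one key."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync(records: list, store: dict, dry_run: bool = False) -> list:
    """Return the records that are new to the store; write them unless dry_run."""
    written = []
    seen = set(store)
    for record in records:
        key = content_hash(record)
        if key in seen:
            continue  # already synced, or a duplicate within this batch
        seen.add(key)
        written.append(record)
        if not dry_run:
            store[key] = record
    return written


store = {}
lessons = [{"type": "lesson", "text": "run lattice callers before closing a slice"}]
print(len(sync(lessons, store, dry_run=True)), len(store))  # 1 0
print(len(sync(lessons, store)), len(store))  # 1 1
print(len(sync(lessons, store)), len(store))  # 0 1
```

Keying on the content hash is what makes a second run a no-op, and the dry-run path reports the same list without touching the store.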
Where to Look on the Dashboard
The live dashboard (localhost:3100/dashboard) added an Anvil & Lattice tab when these subsystems shipped. From there you can see:
- Anvil panel: current OpenBrain queue depth, DLQ depth, last-drain timestamp, drain success rate over time. A non-zero DLQ depth that doesn't clear within a drain pass is your only "go look at this" signal.
- Lattice panel: index size (chunks, edges, files), last-rebuild timestamp, top-N hottest functions by caller count. Rebuild from here if you've made structural changes outside a plan run.
- Hallmark coverage: percentage of L2 records carrying a _v stamp. Should sit at 100% for newly-written records; older records may show none.
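The Hallmark coverage number is simple to reproduce by hand. A minimal sketch, assuming L2 records are JSON lines and that the stamp shows up as a top-level _v field (the field name is taken from the text; the record shapes and the hallmark_coverage helper are made up for illustration):

```python
import json


def hallmark_coverage(jsonl_text: str) -> float:
    """Percentage of records that carry a _v field."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if not records:
        return 0.0  # no records: report 0 rather than divide by zero
    stamped = sum(1 for record in records if record.get("_v") is not None)
    return stamped / len(records) * 100


sample = "\n".join([
    json.dumps({"_v": 1, "type": "decision"}),
    json.dumps({"_v": 1, "type": "lesson"}),
    json.dumps({"_v": 1, "type": "gotcha"}),
    json.dumps({"type": "lesson"}),  # older record written before stamping existed
])
print(hallmark_coverage(sample))  # 75.0
```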
How the New Pieces Fit the Old Tiers
To make sure the mental model holds, here's the same picture from Chapter 21 with the new pieces drawn in:
┌─────────────────────────────────────────────────────────────────┐
│  Copilot Memory (cross-session, IDE-wide)                       │
│       ▲                                                         │
│       │ forge_sync_memories (additive, hash-deduped)            │
│  ┌────┴──────────────────────────────────────────────────────┐  │
│  │  L3: OpenBrain (pgvector, cross-project)                  │  │
│  │       ▲                                                   │  │
│  │       │ Anvil (DLQ + capability handshake + boot drain)   │  │
│  │  ┌────┴────────────────────────────────────────────────┐  │  │
│  │  │  L2: .forge/*.jsonl (Hallmark-stamped, _v:1)        │  │  │
│  │  │  L1: Hub (in-process, runId-scoped)                 │  │  │
│  │  └─────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Parallel axis (not a tier):                                    │
│    Lattice: .forge/lattice/{chunks,edges}.jsonl                 │
│    (code-graph; queried alongside, not stacked on, memory)      │
└─────────────────────────────────────────────────────────────────┘
L1/L2/L3 are the same tiers. Hallmark adds a contract to what gets written. Anvil hardens the L2 → L3 doorway. forge_sync_memories pushes upward into Copilot. Lattice sits beside everything as a separate code-graph axis the agent queries the same way it queries memory.
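One way to hold the whole picture in your head is to model a single capture call fanning out to the tiers. The toy class below is only a sketch under assumptions: the Shop class, its attribute names, and the in-memory containers are hypothetical stand-ins for the Hub, the .forge/*.jsonl files, and the OpenBrain queue.

```python
from collections import deque


class Shop:
    """Toy model of one capture call reaching L1, L2, and the L3 queue."""

    def __init__(self):
        self.l1 = {}  # in-process, keyed by run id
        self.l2 = []  # stands in for the durable .forge/*.jsonl files
        self.l3_queue = deque()  # waiting for Anvil to deliver to OpenBrain

    def capture(self, run_id: str, record: dict) -> dict:
        stamped = {**record, "_v": 1}  # stand-in for the Hallmark stamp
        self.l1.setdefault(run_id, []).append(stamped)
        self.l2.append(stamped)
        self.l3_queue.append(stamped)
        return stamped


shop = Shop()
shop.capture("run-1", {"type": "gotcha", "text": "quote grep -c inside brace groups"})
print(len(shop.l1["run-1"]), len(shop.l2), len(shop.l3_queue))  # 1 1 1
```

The caller makes one call; where the record ends up is the shop's concern, which matches the "still one capture call" point from the top of the chapter.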
See Also
- Chapter 21 — Memory Architecture, the technical L1/L2/L3 deep-dive these upgrades build on.
- Chapter 13 — Multi-Agent → OpenBrain Connective Tissue, how OpenBrain handoffs work across Claude, Copilot, and Cursor.
- Chapter 25 — Health DNA, the composite health score that now trends visibly better thanks to memory upgrades.
- MCP Server Reference — Memory Tools, the full parameter list for forge_memory_report, forge_hallmark_show, forge_hallmark_verify, and the lattice tools.
- Chapter 8 — CLI Reference, every pforge lattice, pforge memory, and pforge sync-memories subcommand.