Act IV: Learn · Chapter 22

How the Shop Remembers

The plain-English tour of Plan Forge's upgraded memory system, and the reason a cheaper, faster model can now do work that used to require the expensive one.

New here? Start with this. The previous chapter (Memory Architecture) explains the three-tier plumbing (L1 hub, L2 files, L3 OpenBrain). This chapter explains, in plain language, what we added on top: the maker's mark on every record (Hallmark), the safer doorway to the shared brain (Anvil), the code map that lets the agent ask "who calls this function?" (Lattice), and the bridge that hands all of it to Copilot's own memory (forge_sync_memories).
  • Still three tiers. L1/L2/L3 didn't go away. We forged better tools around them.
  • Still one capture call. Your code doesn't change. The shop just remembers more reliably now.
  • The payoff is measurable. Drift dropped 64% over 90 days. A 7-slice plan now executes for $0.07 on Sonnet alone, no Opus escalation.
What's in this chapter: a one-page mental model of the four new pieces, a day-in-the-life walkthrough of a slice, the cheaper/faster-model story with real numbers from this very repo, three commands you can run today, and where to look on the dashboard.

The Four New Pieces

Think of the forge shop. The L1/L2/L3 memory tiers are the workbench, the filing cabinet, and the library across town. They were already there. What we added is the craftsmanship around them:

| Piece | The shop metaphor | What it actually does |
|---|---|---|
| Hallmark | The maker's mark stamped into the metal: proves who forged it, when, and from what stock. | A small JSON envelope (hallmark/v1) attached to every memory record and artifact. Lets any tool ask "is this still the version I think it is?" and catch drift before it bites. |
| Anvil | The anvil where everything gets struck: solid, reliable, never drops the hammer. | The boundary code that delivers L2 records to OpenBrain (L3). Adds a dead-letter queue, a capability handshake, and a boot-time drain so a network blip never loses a memory. |
| Lattice | The map of the shop: every workbench, every tool, every chain pulley, indexed by where it sits. | A code-graph index over your repo. Splits source into semantic chunks, records who calls whom, and answers "show me everyone who calls executeSlice" in milliseconds. |
| forge_sync_memories | The dispatch rider who carries shop news to the wider guild. | A soft sync that copies decisions, lessons, and gotchas from .forge/ into Copilot's own Memory store, so VS Code agents see them automatically next session. |
Why "soft" sync? From our side, Copilot Memory is append-only: we can write new entries, but we can't delete what the user has curated. So the sync is additive only, never destructive. Deduplication is handled by content hash, so re-running is safe.
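Here is a minimal TypeScript sketch of what "additive, hash-deduped" means in practice. It is not the forge_sync_memories implementation, and it rests on these assumptions:

- LocalMemory, AppendOnlyStore, and softSync are hypothetical names.
- The in-memory store stands in for Copilot Memory.
- FNV-1a stands in for whatever content hash the real tool uses.

```typescript
// Minimal sketch of an additive, hash-deduped soft sync. All names are hypothetical.
type LocalMemory = { type: "decision" | "lesson" | "gotcha"; text: string };

// FNV-1a (32-bit): a compact stand-in for a real content hash such as SHA-256.
function contentHash(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Append-only target: this code can add entries but never deletes them.
class AppendOnlyStore {
  private readonly entries = new Map<string, string>(); // content hash -> text
  has(hash: string): boolean {
    return this.entries.has(hash);
  }
  add(hash: string, text: string): void {
    this.entries.set(hash, text);
  }
  get size(): number {
    return this.entries.size;
  }
}

// Returns the hashes that were written (or would be written, when dryRun is true).
function softSync(local: LocalMemory[], store: AppendOnlyStore, dryRun = false): string[] {
  const written: string[] = [];
  for (const m of local) {
    const hash = contentHash(`${m.type}\n${m.text}`);
    if (store.has(hash) || written.includes(hash)) continue; // already synced: skip, never overwrite
    if (!dryRun) store.add(hash, m.text);
    written.push(hash);
  }
  return written;
}

const store = new AppendOnlyStore();
const local: LocalMemory[] = [
  { type: "gotcha", text: "Windows shell quoting breaks grep -c in a brace group" },
  { type: "decision", text: "L2 records are JSONL" },
];
console.log(softSync(local, store, true).length, store.size); // 2 0  (dry run writes nothing)
console.log(softSync(local, store).length, store.size); // 2 2
console.log(softSync(local, store).length, store.size); // 0 2  (re-run is a no-op)
```

The store only ever gains entries, and each entry is keyed by its content hash. So a second run writes nothing, and a dry run can report exactly what a real run would add.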

A Day in the Life of a Slice

Here's what happens when pforge run-plan starts executing slice 3 of your plan. Every step touches at least one memory subsystem:

  1. Preload. The orchestrator calls buildPlanBootContext and emits a memory-preload event with semantic queries derived from the slice's Scope Contract. The agent runtime (Copilot, Claude, Cursor) catches the event and runs search_thoughts against L3 plus a latticeQuery against the code graph. Before it reads a single line, the agent knows what prior slices learned and which files are relevant.
  2. Execute. The agent edits files. When it hits a tricky pattern ("Windows shell quoting breaks grep -c when piped into a brace group"), it calls capture_thought with type gotcha. The capture path stamps the record with a fresh Hallmark envelope, writes it to L1 (instant) and L2 (durable), and queues it for L3.
  3. Anvil delivery. A background drainer pulls from .forge/openbrain-queue.jsonl and pushes to OpenBrain. If OpenBrain is down or rejects the schema, the record lands in .forge/openbrain-dlq.jsonl instead of vanishing, and the next boot drains the DLQ automatically.
  4. Verify with Lattice. Before declaring the slice done, the agent runs latticeCallers on every function it touched. If the call graph shows an unexpected caller (a test it forgot about, or a sibling slice's import), the slice gate catches it. This is the step that prevents "I refactored X and didn't realize Y depended on it."
  5. Sync out. At slice end, forge_sync_memories copies new decisions and lessons into Copilot Memory. Tomorrow's VS Code session sees them in the global memory pane without anyone running anything.
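The capture path in step 2 fans one call out to three places. The sketch below shows that shape under these assumptions:

- CaptureSink is a hypothetical class.
- The envelope fields (schema, contentHash, forgedBy, forgedAt) are chosen to mirror "who, when, from what stock".
- Arrays stand in for the real JSONL files.

None of this is Plan Forge's actual code.

```typescript
// Hypothetical shapes: field names are illustrative assumptions, not Plan Forge's schema.
type ThoughtType = "decision" | "lesson" | "gotcha";

interface HallmarkEnvelope {
  schema: "hallmark/v1";
  contentHash: string; // what it was forged from
  forgedBy: string; // who forged it (here, a slice id)
  forgedAt: string; // when (ISO-8601)
}

interface Thought {
  _v: 1;
  type: ThoughtType;
  text: string;
  hallmark: HallmarkEnvelope;
}

// FNV-1a (32-bit): a compact stand-in for a real content hash such as SHA-256.
function contentHash(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

class CaptureSink {
  readonly l1 = new Map<string, Thought[]>(); // L1: in-process hub, keyed by runId
  readonly l2: string[] = []; // L2: stands in for appending to a .forge/*.jsonl file
  readonly l3Queue: string[] = []; // stands in for .forge/openbrain-queue.jsonl

  capture(runId: string, forgedBy: string, type: ThoughtType, text: string): Thought {
    const thought: Thought = {
      _v: 1,
      type,
      text,
      hallmark: {
        schema: "hallmark/v1",
        contentHash: contentHash(`${type}\n${text}`),
        forgedBy,
        forgedAt: new Date().toISOString(),
      },
    };
    const bucket = this.l1.get(runId) ?? [];
    bucket.push(thought);
    this.l1.set(runId, bucket); // L1: instant
    const line = JSON.stringify(thought);
    this.l2.push(line); // L2: durable record
    this.l3Queue.push(line); // L3: only queued; a separate drainer delivers it
    return thought;
  }
}

const sink = new CaptureSink();
sink.capture("run-42", "slice-3", "gotcha", "Windows shell quoting breaks grep -c when piped into a brace group");
console.log(sink.l1.get("run-42")?.length, sink.l2.length, sink.l3Queue.length); // 1 1 1
```

In this sketch the L3 write is only queued, and delivery is left to a separate drainer as step 3 describes. That keeps an unreachable OpenBrain out of the capture path.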

Why Cheaper, Faster Models Now Punch Above Their Weight

This is the part most teams don't expect.

The classic AI cost equation runs: better model → fewer mistakes → less wasted spend. That's still true, but it ignores a second lever: context quality. A medium-tier model with the right context will routinely outperform a flagship model with vague context. Memory is context, and the memory upgrades make the context dramatically better.

Here's the receipt, measured on this repo over the last 90 days:

| Metric | Before the upgrades | After (current) | What it means |
|---|---|---|---|
| Drift score | 22 | 8 | Architecture decay per session; lower is better. −64%. |
| Sonnet-4.6 success rate | ~78% (estimated) | 91% (332 / 365 slices) | Cheaper model now beats what Opus did a quarter ago. |
| Cost per slice | ~$0.09 | $0.04 | Less re-reading, less back-and-forth, less escalation. ~55% cheaper. |
| Opus escalation rate | Multiple slices per plan | Zero on QA-class plans | The memory-QA plan executed 7 slices for $0.07 on Sonnet alone. |
| OpenBrain DLQ depth | N/A (failed writes were dropped) | 0 (Anvil catches all) | Zero memories lost to transient L3 failures. |
| Telemetry dedup rate | ~0% (no dedup) | 62.5% (10 of 16) | Hallmark's content hash collapses redundant writes. |
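The derived percentages follow directly from the raw numbers in the table, and from the QA receipt's 7 slices for $0.07:

```latex
\begin{aligned}
\Delta_{\text{drift}} &= \frac{8 - 22}{22} \approx -0.636 \quad (\approx -64\%) \\
\text{Sonnet success rate} &= \frac{332}{365} \approx 0.910 \quad (\approx 91\%) \\
\Delta_{\text{cost per slice}} &= \frac{0.04 - 0.09}{0.09} \approx -0.556 \quad (\approx 55.6\%\ \text{cheaper}) \\
\text{dedup rate} &= \frac{10}{16} = 0.625 \quad (62.5\%) \\
\text{QA plan cost per slice} &= \frac{\$0.07}{7} = \$0.01
\end{aligned}
```

The cost baseline is itself approximate (~$0.09), so the resulting ~55% saving is a rounded figure.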

How the four pieces compound

  • Hallmark means the agent can trust that "lesson learned in slice 2" is exactly what it was when written. No silent schema drift. The cheaper model doesn't waste tokens re-deriving facts it already has.
  • Anvil means recall is reliable. Pre-upgrade, a network hiccup could silently drop a memory and the next slice would re-learn the same gotcha. Now the DLQ catches it and the boot drainer replays it.
  • Lattice means the agent finds the right files without scanning the whole repo. "Who calls this function?" is a 50ms query instead of a 50-second grep-and-read. Fewer tokens, more accurate edits.
  • forge_sync_memories means knowledge crosses session boundaries automatically. The next session's cheaper model starts already knowing what the last session's expensive model figured out.
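Hallmark's "is this still the version I think it is?" check can be sketched as stamp-then-verify over a canonical serialization. This TypeScript sketch illustrates the idea, not the hallmark/v1 format itself:

- The envelope (schema, contentHash, forgedAt) is hypothetical.
- FNV-1a is a stand-in hash.

```typescript
// Minimal Hallmark-style stamp/verify sketch. Envelope fields are assumptions.
interface HallmarkEnvelope {
  schema: "hallmark/v1";
  contentHash: string;
  forgedAt: string;
}

interface Stamped<T> {
  body: T;
  hallmark: HallmarkEnvelope;
}

// FNV-1a (32-bit): a compact stand-in for a real content hash such as SHA-256.
function contentHash(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Canonical JSON: object keys sorted at every level, so key order can't change the hash.
function canonical(v: unknown): string {
  if (Array.isArray(v)) return `[${v.map(canonical).join(",")}]`;
  if (v !== null && typeof v === "object") {
    const o = v as Record<string, unknown>;
    return `{${Object.keys(o).sort().map((k) => `${JSON.stringify(k)}:${canonical(o[k])}`).join(",")}}`;
  }
  return JSON.stringify(v);
}

function stamp<T>(body: T, forgedAt: string): Stamped<T> {
  return { body, hallmark: { schema: "hallmark/v1", contentHash: contentHash(canonical(body)), forgedAt } };
}

// Recompute the hash from the body and compare it with the one stamped at write time.
function verify<T>(rec: Stamped<T>): boolean {
  return rec.hallmark.schema === "hallmark/v1" && rec.hallmark.contentHash === contentHash(canonical(rec.body));
}

const lesson = stamp({ slice: 2, type: "lesson", text: "Replay the DLQ at boot" }, new Date().toISOString());
console.log(verify(lesson)); // true
const drifted = { ...lesson, body: { ...lesson.body, text: "Replay the DLQ hourly" } };
console.log(verify(drifted)); // false: the body changed after it was stamped
```

Canonicalizing before hashing matters. Without it, a record rewritten with identical content but a different key order would look like drift.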

Put bluntly: the memory upgrades subsidize the model choice. You can pick Sonnet (or another mid-tier) and let memory carry the load that used to require Opus reasoning. The savings show up in the cost ledger; the quality shows up in the drift score.

The Phase-MEMORY-QA receipt. When we tested the memory upgrades themselves, the QA plan (7 slices, full E2E with mock OpenBrain, lattice callers, hallmark show/verify, backward-compat checks) ran for $0.07 total in ~51 minutes, 100% on Sonnet-4.6, no escalation, zero failed slices. The system QA'd itself with the very upgrades it was QA'ing, and did it for the price of a coffee. That's the loop closing.

Three Commands You Can Run Today

The memory subsystems are exposed through the pforge CLI and the MCP server. Here are the three you'll use most:

1. Search the code graph (Lattice)
# What does the agent see when it asks "where is snapshot restore handled?"
pforge lattice query "snapshot restore"

# Who calls this function?
pforge lattice callers executeSlice

# What does this function call?
pforge lattice callees attachSliceSnapshotRestore
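Why is "who calls this function?" a millisecond query? An edge index like Lattice's edges file can be loaded once into forward and reverse adjacency maps. After that, callers and callees are plain lookups instead of a repo scan.

This is a sketch under assumptions:

- The {from, to} edge shape is hypothetical.
- The runPlan and retrySlice names in the demo are made up.
- executeSlice and attachSliceSnapshotRestore are borrowed from the commands above.

```typescript
// Hypothetical edge record, one JSON object per line: caller -> callee.
interface CallEdge {
  from: string;
  to: string;
}

// Builds forward and reverse adjacency once, so each query is a map lookup.
class CallGraphIndex {
  private readonly callers = new Map<string, Set<string>>();
  private readonly callees = new Map<string, Set<string>>();

  constructor(edges: CallEdge[]) {
    for (const e of edges) {
      CallGraphIndex.link(this.callees, e.from, e.to);
      CallGraphIndex.link(this.callers, e.to, e.from);
    }
  }

  private static link(m: Map<string, Set<string>>, key: string, value: string): void {
    const set = m.get(key) ?? new Set<string>();
    set.add(value); // a Set collapses duplicate edges
    m.set(key, set);
  }

  callersOf(fn: string): string[] {
    const s = this.callers.get(fn);
    return s ? Array.from(s).sort() : [];
  }

  calleesOf(fn: string): string[] {
    const s = this.callees.get(fn);
    return s ? Array.from(s).sort() : [];
  }
}

// Parses JSONL text, skipping blank lines.
function parseEdges(jsonl: string): CallEdge[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as CallEdge);
}

const index = new CallGraphIndex(
  parseEdges(
    [
      '{"from":"runPlan","to":"executeSlice"}',
      '{"from":"retrySlice","to":"executeSlice"}',
      '{"from":"executeSlice","to":"attachSliceSnapshotRestore"}',
    ].join("\n"),
  ),
);
console.log(index.callersOf("executeSlice")); // → ["retrySlice", "runPlan"]
console.log(index.calleesOf("executeSlice")); // → ["attachSliceSnapshotRestore"]
```

The trade-off is the usual one for indexes: queries are fast, but the index goes stale after structural edits. That is why the dashboard offers a rebuild.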
2. Inspect the memory subsystem (any time)
# Health of every memory surface: L2 files, OpenBrain queue, DLQ, dedup rate, orphans
pforge memory report

# 90-day trend across drift / cost / models / incidents
pforge health-trend --days 90
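Numbers like Hallmark coverage and dedup rate reduce to simple ratios over L2 records. The sketch below is a hypothetical illustration, not how pforge memory report computes them:

- Each JSONL line is assumed to have optional _v, type, and text fields.
- A record counts as a duplicate when an earlier record has the same type and text.

```typescript
// Hypothetical L2 line shape: { _v?: number; type?: string; text?: string }.
interface MemoryReport {
  records: number;
  hallmarkCoverage: number; // share of records carrying a _v stamp
  dedupRate: number; // share of records repeating an earlier record's content
}

function memoryReport(jsonl: string): MemoryReport {
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  const seen = new Set<string>();
  let stamped = 0;
  let duplicates = 0;
  for (const line of lines) {
    const rec = JSON.parse(line) as { _v?: number; type?: string; text?: string };
    if (rec._v !== undefined) stamped++;
    // Exact-content key; a real implementation would more likely reuse the Hallmark content hash.
    const key = `${rec.type ?? ""}\n${rec.text ?? ""}`;
    if (seen.has(key)) duplicates++;
    else seen.add(key);
  }
  const n = lines.length;
  return {
    records: n,
    hallmarkCoverage: n === 0 ? 0 : stamped / n,
    dedupRate: n === 0 ? 0 : duplicates / n,
  };
}

const sample = [
  '{"_v":1,"type":"lesson","text":"Replay the DLQ at boot"}',
  '{"_v":1,"type":"lesson","text":"Replay the DLQ at boot"}',
  '{"type":"gotcha","text":"legacy record, written before Hallmark"}',
  '{"_v":1,"type":"decision","text":"L2 records are JSONL"}',
].join("\n");
console.log(memoryReport(sample)); // → { records: 4, hallmarkCoverage: 0.75, dedupRate: 0.25 }
```

Under this definition, 10 duplicates in 16 writes would give the 62.5% figure from the metrics table.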
3. Sync local decisions into Copilot Memory
# Push new decisions / lessons / gotchas into Copilot's own memory store.
# Safe to re-run: dedupes by content hash.
pforge sync-memories

# Dry-run preview (shows what would be written, writes nothing)
pforge sync-memories --dry-run

Where to Look on the Dashboard

The live dashboard (localhost:3100/dashboard) added an Anvil & Lattice tab when these subsystems shipped. From there you can see:

  • Anvil panel: current OpenBrain queue depth, DLQ depth, last-drain timestamp, and drain success rate over time. A non-zero DLQ depth that doesn't clear within a drain pass is your only "go look at this" signal.
  • Lattice panel: index size (chunks, edges, files), last-rebuild timestamp, and the top-N hottest functions by caller count. Rebuild from here if you've made structural changes outside a plan run.
  • Hallmark coverage: the percentage of L2 records carrying a _v stamp. It should sit at 100% for newly written records; older records may carry none.
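The DLQ signal on the Anvil panel comes from how a drain pass treats failures: anything that can't be delivered is parked rather than dropped, and the next pass retries it. A minimal sketch, with these assumptions:

- In-memory arrays replace .forge/openbrain-queue.jsonl and .forge/openbrain-dlq.jsonl.
- A hypothetical send function performs OpenBrain delivery.
- The capability handshake is omitted.

```typescript
// Minimal sketch of an Anvil-style drain pass. `send` is hypothetical and
// rejects when OpenBrain is down or refuses the record.
type Send = (line: string) => Promise<void>;

interface DrainResult {
  delivered: number;
  dlqDepth: number; // still non-zero after a pass = the "go look at this" signal
}

async function drainPass(queue: string[], dlq: string[], send: Send): Promise<DrainResult> {
  // Take the DLQ first (as a boot-time drain would), then everything queued since.
  const pending = dlq.splice(0, dlq.length).concat(queue.splice(0, queue.length));
  let delivered = 0;
  for (const line of pending) {
    try {
      await send(line);
      delivered++;
    } catch {
      dlq.push(line); // never drop a record: park it for the next pass
    }
  }
  return { delivered, dlqDepth: dlq.length };
}

async function demo(): Promise<void> {
  const queue = ['{"id":1}', '{"id":2}'];
  const dlq: string[] = [];
  let openBrainUp = false;
  const send: Send = async () => {
    if (!openBrainUp) throw new Error("OpenBrain unreachable");
  };
  console.log(await drainPass(queue, dlq, send)); // → { delivered: 0, dlqDepth: 2 }
  openBrainUp = true; // the transient failure clears
  console.log(await drainPass(queue, dlq, send)); // → { delivered: 2, dlqDepth: 0 }
}

void demo();
```

After the second pass the DLQ is empty because the failure was transient. A record that keeps failing, for example one whose schema OpenBrain rejects, would stay parked, and its non-zero depth is what the panel surfaces.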

How the New Pieces Fit the Old Tiers

To make sure the mental model holds, here's the same picture from Chapter 21 with the new pieces drawn in:

The memory stack: layered, not replaced
┌─────────────────────────────────────────────────────────────────┐
│  Copilot Memory (cross-session, IDE-wide)                       │
│       ▲                                                         │
│       │ forge_sync_memories  (additive, hash-deduped)           │
│  ┌────┴─────────────────────────────────────────────────────┐   │
│  │  L3: OpenBrain (pgvector, cross-project)                 │   │
│  │       ▲                                                  │   │
│  │       │ Anvil  (DLQ + capability handshake + boot drain) │   │
│  │  ┌────┴─────────────────────────────────────────────┐    │   │
│  │  │  L2: .forge/*.jsonl   (Hallmark-stamped, _v:1)   │    │   │
│  │  │  L1: Hub (in-process, runId-scoped)              │    │   │
│  │  └──────────────────────────────────────────────────┘    │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Parallel axis (not a tier):                                    │
│    Lattice: .forge/lattice/{chunks,edges}.jsonl                 │
│      (code-graph; queried alongside, not stacked on, memory)    │
└─────────────────────────────────────────────────────────────────┘

L1/L2/L3 are the same tiers. Hallmark adds a contract to what gets written. Anvil hardens the L2 → L3 doorway. forge_sync_memories pushes upward into Copilot. Lattice sits beside everything as a separate code-graph axis the agent queries the same way it queries memory.

See Also