Act IV, Learn · Chapter 21

Memory Architecture

Three tiers, one capture path. How Plan Forge remembers what it learned, across slices, across sessions, across plans.

New here? Start with this. When an AI agent ships a slice, it learns things: a tricky bug, a naming convention, a gotcha that took an hour to figure out. Most tools throw that away when the session ends. Plan Forge's memory system writes it down in three places at once so the next slice (or the next agent, or next month's session) starts from where the last one left off.
  • L1 (Hub): fast, in-process, like RAM. Powers the live dashboard.
  • L2 (Files): local .forge/*.jsonl files in your repo. Your project's permanent notebook.
  • L3 (OpenBrain): a shared semantic database. Searchable across projects, agents, and machines.
The same captureMemory() call writes to all three. If any tier fails, the others still succeed; nothing blocks your code.

And around those three tiers, v3.x added four pieces of craftsmanship: Hallmark stamps every record with a provenance envelope (hallmark/v1) so drift is detectable; Anvil hardens the L2→L3 doorway with a dead-letter queue and capability handshake so a network blip never loses a memory; Lattice sits alongside as a code-graph index the agent can query ("who calls this function?"); and forge_sync_memories pushes decisions and lessons up into Copilot's own Memory store so the next IDE session sees them automatically. The plain-English tour with numbers is in Chapter 22 — How the Shop Remembers.

This chapter consolidates the three-tier memory work in one place. Looking for the v3.x upgrades (Hallmark, Anvil, Lattice, forge_sync_memories)? The companion Chapter 22 (How the Shop Remembers) covers them in plain English: it explains what we layered on top of the L1/L2/L3 tiers described here, and shows the cost/quality numbers proving why a cheaper model can now do work that used to require the expensive one.

The Three Tiers

Three-tier memory capture flow: forge_memory_capture call fans out to L1 (hub WebSocket broadcast, instant, ephemeral), L2 (.forge/memory/ files, sync, persistent, gitignored), and L3 (OpenBrain pgvector via async push, cross-project, cross-tool, semantic search). All three readable independently from the read path.
Figure 21-1. Three-tier memory capture flow

Plan Forge separates volatile working memory from durable project memory from cross-project semantic memory. Every captureMemory call writes to all three in a single best-effort pass: no tier blocks the others, and no failure aborts the calling tool.

| Tier | Storage | Lifetime | Read API | What v3 added |
|---|---|---|---|---|
| L1 (Hub) | EventEmitter in hub.mjs + .forge/hub-events.jsonl | Process lifetime + replay file | WebSocket subscribers, forge_watch | Unchanged. Same hub, same broadcast. |
| L2 (Files) | .forge/*.jsonl (memory-captures, gotchas, lessons, decisions, patterns…) | Repository lifetime | forge_memory_report, manual file reads | Hallmark stamps every new record (_v:1) so drift is detectable. |
| L3 (OpenBrain) | pgvector via .forge/openbrain-queue.jsonl drain | Cross-project, cross-session | search_thoughts, semantic recall | Anvil hardens the doorway (DLQ + capability handshake + boot drain). |
| + Lattice | .forge/lattice/{chunks,edges}.jsonl | Repository lifetime (rebuildable) | latticeQuery, latticeCallers, latticeBlast | Parallel axis: a code graph the agent queries alongside memory. |
| ↑ Copilot Memory | Copilot's own Memory store (IDE) | Cross-session, IDE-wide | Copilot reads automatically | forge_sync_memories pushes decisions/lessons upward (additive, hash-deduped). |

One picture, all the pieces. The three tiers didn't go away; we forged better tools around them. For the layered tower diagram showing exactly how Hallmark, Anvil, Lattice, and forge_sync_memories fit on top of L1/L2/L3, see Chapter 22 § How the New Pieces Fit the Old Tiers.

Unified Memory Across Agents

OpenBrain isn't just a per-session scratch pad; it's a shared memory layer that compounds across every AI agent, every IDE, and every session. When Claude captures a gotcha in Slice 2, Copilot reads it in Slice 5 without any manual handoff. When Cursor records a naming convention, Claude's next run already knows it.

OpenBrain cross-agent compounding: Claude, Cursor, and Copilot each write decisions via capture_thought and read prior context via search_thoughts. Knowledge compounds: each slice raises the quality floor for every future agent.
Figure 21-2. OpenBrain cross-agent compounding

How it works — 4 steps

  1. Capture: any agent calls capture_thought({ content, project, source, type }) after a key decision. The record is scoped to your project and the originating slice path.
  2. Fan-out: Plan Forge's L2 + L3 capture path appends the record locally (.forge/openbrain-queue.jsonl) and drains it to OpenBrain asynchronously.
  3. Retrieve: at the start of any slice (or any session), agents call search_thoughts({ query, project, limit }) to surface relevant prior decisions before writing a single line of code.
  4. Compound: each new capture raises the signal quality for every future agent. A convention captured in Phase 1 is still enforced in Phase 40, by a different agent, in a different IDE.
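
To make steps 1 and 3 concrete, here is a minimal sketch of the two requests as an MCP client might send them. MCP tool calls travel as JSON-RPC 2.0 messages with the method tools/call; the argument fields come from the steps above. The buildToolCall helper, the project name my-app, and the example content are illustrative assumptions, not part of Plan Forge or OpenBrain.

```javascript
// Sketch only: builds MCP "tools/call" requests (JSON-RPC 2.0).
// buildToolCall is a hypothetical helper, not a Plan Forge or OpenBrain API.
function buildToolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Step 1 (Capture): record a decision right after making it.
const capture = buildToolCall(1, "capture_thought", {
  content: "Use snake_case for all migration file names", // illustrative content
  project: "my-app",                                      // illustrative project name
  source: "forge_run_plan/slice-executor",
  type: "convention",
});

// Step 3 (Retrieve): ask for prior context before starting a slice.
const search = buildToolCall(2, "search_thoughts", {
  query: "migration naming conventions",
  project: "my-app",
  limit: 5,
});

console.log(JSON.stringify(capture, null, 2));
console.log(JSON.stringify(search, null, 2));
```

Because both calls carry the same project value, a later search from a different agent can surface what an earlier agent captured.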

Agent integration table

| Agent | Capture path | Retrieve path | Notes |
|---|---|---|---|
| Claude | capture_thought MCP tool | search_thoughts MCP tool | Full read/write; memory-preload event on plan start |
| Cursor | capture_thought MCP tool | search_thoughts MCP tool | Background agent and composer mode both supported |
| Copilot | capture_thought MCP tool | search_thoughts MCP tool | Lifecycle hooks (SessionStart) inject prior context automatically |
| Future agents | Any MCP client | Any MCP client | MCP-capable clients connect to the same store |

See also: Multi-Agent → OpenBrain: The Connective Tissue for a deeper dive into how OpenBrain wires the 4-station pipeline together and what happens at each agent handoff.

Concepts in this section were first explored in the blog posts One Framework, Seven AI Agents and From WhatsApp to Shipped PR: The Unified System.

Capture Flow

One write, three destinations. The diagram below traces a single captureMemory({tool, type, body}) call from any tool through the dual-write fan-out:

┌──────────────────────────────────────────────────────────────────────┐
│  Any forge tool, watcher, hook, or skill                             │
│  └─► captureMemory({ tool, type, body, source })                     │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
        ┌──────────────────────────┼──────────────────────────┐
        ▼                          ▼                          ▼
┌──────────────────┐    ┌─────────────────────┐    ┌────────────────────┐
│  L1: Hub         │    │  L2: Files          │    │  L3: OpenBrain     │
│                  │    │                     │    │                    │
│ EventEmitter     │    │ Append _v:1 record  │    │ Append to          │
│   broadcast      │    │   to .forge/        │    │   openbrain-       │
│                  │    │   memory-captures   │    │   queue.jsonl      │
│ → WebSocket      │    │   .jsonl            │    │                    │
│   subscribers    │    │                     │    │ Drain worker:      │
│                  │    │ Tag-route to        │    │   batch → POST     │
│ → hub-events     │    │   gotchas.jsonl,    │    │   → pgvector       │
│   .jsonl replay  │    │   lessons.jsonl,    │    │                    │
│                  │    │   decisions.jsonl…  │    │ Failures → DLQ     │
│ Real-time UI     │    │                     │    │   .jsonl           │
└──────────────────┘    └─────────────────────┘    └────────────────────┘
                                                              │
                                                              ▼
                                                   ┌──────────────────────┐
                                                   │ search_thoughts /    │
                                                   │ buildPlanBootContext │
                                                   │ → preload on plan-   │
                                                   │   start (memory-     │
                                                   │   preload event)     │
                                                   └──────────────────────┘

Every step is wrapped in try/catch. A failed L3 enqueue never blocks the L2 file append; a corrupt L2 file never blocks the L1 broadcast. This is the dual-write pattern: best-effort fan-out with structured telemetry on each branch.
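
The isolation described above can be sketched in a few lines. This is an illustration of the pattern only: captureMemorySketch and the injected tier writers are hypothetical stand-ins, not the real hub, file, or queue code, and the telemetry shape is an assumption.

```javascript
// Sketch only: best-effort fan-out with a try/catch around each tier.
// The tier writers are injected stand-ins, not Plan Forge's real writers.
function captureMemorySketch(record, tiers) {
  const telemetry = {};
  for (const [name, write] of Object.entries(tiers)) {
    try {
      write(record);
      telemetry[name] = { ok: true };
    } catch (err) {
      // A failing tier is recorded and skipped; the remaining tiers still run.
      telemetry[name] = { ok: false, error: err.message };
    }
  }
  return telemetry;
}

// Usage: L2 fails (simulating a corrupt file) while L1 and L3 still succeed.
const hubEvents = [];
const l3Queue = [];
const result = captureMemorySketch(
  { tool: "forge_run_plan", type: "gotcha", body: "Port 5432 is taken in CI", source: "forge_run_plan" },
  {
    l1: (r) => hubEvents.push({ event: "memory-captured", data: r }),
    l2: () => { throw new Error("corrupt memory-captures.jsonl"); },
    l3: (r) => l3Queue.push(JSON.stringify(r)),
  }
);
console.log(result);
```

The function never throws, so the calling tool sees a telemetry object rather than an exception, whichever tier misbehaves.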

L1 — The Hub

The hub is a single EventEmitter instance in pforge-mcp/hub.mjs. Every event (slice start, model choice, tool result, memory capture) flows through it:

  • Subscribers: WebSocket clients (the dashboard), the watcher worker, the OpenBrain drain worker, anything listening for memory-captured
  • Replay file: every event also appends to .forge/hub-events.jsonl so a fresh dashboard can rebuild state on connect
  • Worker capability probe: workers announce which event types they handle so the hub can drop unhandled events early instead of fanning out garbage
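
The capability probe is easiest to see as a small registry. The sketch below is one plausible shape, assuming workers register a callback per event type; createRegistry, announce, and dispatch are hypothetical names, and the real hub builds on EventEmitter rather than on this hand-rolled map.

```javascript
// Sketch only: a capability registry that lets a hub drop unhandled event types early.
// createRegistry, announce, and dispatch are hypothetical names.
function createRegistry() {
  const handlers = new Map(); // event type -> array of worker callbacks

  return {
    // A worker announces which event types it handles.
    announce(types, handle) {
      for (const type of types) {
        if (!handlers.has(type)) handlers.set(type, []);
        handlers.get(type).push(handle);
      }
    },
    // Returns the number of workers that received the event (0 means dropped).
    dispatch(event) {
      const targets = handlers.get(event.type) ?? [];
      for (const handle of targets) handle(event);
      return targets.length;
    },
  };
}

const registry = createRegistry();
const seenByDrainWorker = [];
registry.announce(["memory-captured"], (e) => seenByDrainWorker.push(e));

const delivered = registry.dispatch({ type: "memory-captured", body: "lesson" });
const dropped = registry.dispatch({ type: "cursor-moved" });
console.log(delivered, dropped); // 1 0
```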

L2 — The Files

Every memory file lives under .forge/ as line-delimited JSON. Each record carries a schema version field _v so the format can evolve without breaking older data:

| File | Contents |
|---|---|
| memory-captures.jsonl | Raw capture log, every captureMemory call |
| gotchas.jsonl | Type-routed: type: "gotcha" |
| lessons.jsonl | Type-routed: type: "lesson" |
| decisions.jsonl | Type-routed: type: "decision" |
| patterns.jsonl | Type-routed: type: "pattern" |
| conventions.jsonl | Type-routed: type: "convention" |
| openbrain-queue.jsonl | Pending L3 deliveries (drain worker source) |
| openbrain-dlq.jsonl | Permanently failed L3 deliveries |
| hub-events.jsonl | L1 replay log |
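
A minimal sketch of the type routing implied by the file list above: every capture lands in the raw log, and captures with a known type also get a copy in the matching file. routeCapture is a hypothetical name and the record fields beyond _v and type are illustrative; the real routing logic may differ.

```javascript
// Sketch only: type-based routing, following the file list above.
// routeCapture is a hypothetical name; the real routing logic may differ.
const TYPE_FILES = new Map([
  ["gotcha", "gotchas.jsonl"],
  ["lesson", "lessons.jsonl"],
  ["decision", "decisions.jsonl"],
  ["pattern", "patterns.jsonl"],
  ["convention", "conventions.jsonl"],
]);

function routeCapture(record) {
  // Every capture goes to the raw log; known types also get a routed copy.
  const files = ["memory-captures.jsonl"];
  const routed = TYPE_FILES.get(record.type);
  if (routed) files.push(routed);
  // One JSON object per line, stamped with the schema version field _v.
  const line = JSON.stringify({ _v: 1, ...record }) + "\n";
  return files.map((file) => ({ path: `.forge/${file}`, line }));
}

const writes = routeCapture({ type: "gotcha", body: "Retry on EBUSY", source: "watcher/fs-rule" });
console.log(writes.map((w) => w.path));
```

The function returns the planned appends instead of touching the disk, which keeps the routing rule easy to test on its own.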

The Memory tab in the dashboard renders this exact set as a live KPI strip plus a per-file breakdown; see the dashboard chapter. The data comes from forge_memory_report, also exposed at GET /api/memory/report.

L3 — OpenBrain Bridge

OpenBrain is the cross-project semantic store (pgvector + thought metadata). Plan Forge never writes to it directly during a tool call, because that would couple every tool's latency to the OpenBrain endpoint. Instead, the path goes through the Anvil boundary: a small piece of code that owns delivery, capability negotiation, and failure recovery so the calling tool only ever talks to a local queue.

  1. captureMemory appends one line to .forge/openbrain-queue.jsonl (microseconds, local I/O)
  2. The Anvil drain worker wakes on a timer or hub event, negotiates capabilities with the L3 endpoint, batches pending lines, and POSTs them to OpenBrain
  3. Successes are removed from the queue. Failures retry up to N times, then land in openbrain-dlq.jsonl, the dead-letter queue that the next boot drains automatically
  4. A drain-trend rolling window in forge_memory_report exposes pass/fail/deferred counts so the Memory tab can flag a stuck pipeline
OpenBrain not configured? The queue still fills harmlessly. captureMemory never fails because of L3. When you later set openbrain.endpoint in .forge.json, the next drain pass ships the backlog.
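
The retry-then-dead-letter rule in step 3 can be sketched as a pure function over the pending entries. This is an illustration under stated assumptions: settleBatch, the attempts counter, and MAX_ATTEMPTS = 3 are hypothetical (the text only says "up to N times"), and deliver is a synchronous stand-in for the real batched POST.

```javascript
// Sketch only: one drain pass over the L3 queue.
// settleBatch and MAX_ATTEMPTS are hypothetical; the text only says "up to N times".
const MAX_ATTEMPTS = 3;

function settleBatch(pending, deliver, maxAttempts = MAX_ATTEMPTS) {
  const queue = []; // entries to retry on the next pass
  const dlq = [];   // entries that exhausted their retries
  let delivered = 0;

  for (const entry of pending) {
    let ok = false;
    try {
      ok = deliver(entry) === true;
    } catch {
      ok = false; // a thrown error counts as a failed delivery
    }
    if (ok) {
      delivered += 1; // success: the entry leaves the queue
      continue;
    }
    const attempts = (entry.attempts ?? 0) + 1;
    (attempts >= maxAttempts ? dlq : queue).push({ ...entry, attempts });
  }
  return { delivered, queue, dlq };
}

// Usage: entry "b" always fails, so it moves to the DLQ on its third failed attempt.
const deliver = (entry) => entry.id !== "b";
let state = { queue: [{ id: "a" }, { id: "b" }], dlq: [] };
for (let pass = 1; pass <= 3; pass++) {
  const next = settleBatch(state.queue, deliver);
  state = { queue: next.queue, dlq: state.dlq.concat(next.dlq) };
}
console.log(state);
```

After three passes the queue is empty and the DLQ holds the one entry that never delivered, with its attempt count preserved for later inspection.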

L3 → L1 Preload

When forge_run_plan emits run-started, the orchestrator calls buildPlanBootContext(plan, projectName) to derive a small set of semantic queries the agent should pre-fetch from L3 before slice 1:

  • plan-history hint: keyed off the plan name (e.g. plan Phase-1-AUTH); surfaces prior decisions on the same plan
  • slice-keyword hints: derived from slice titles via the keyword search map (e.g. "database" → database migration patterns, "api" → API endpoint design patterns), deduped and capped at 8

The hints are emitted as a memory-preload hub event. Any agent runtime listening (Copilot, Claude Code, Cursor) can resolve the hints via search_thoughts and seed its working context, eliminating the cold-start "what did we learn last time" gap.
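
As a rough illustration of the hint derivation, the sketch below builds one plan-history hint and a deduped, capped list of keyword hints. It is not the real buildPlanBootContext: derivePreloadHints, the plan shape ({ name, slices: [{ title }] }), and the choice to apply the cap of 8 to keyword hints only are assumptions, and the keyword map holds just the two mappings quoted above.

```javascript
// Sketch only: derive preload hints from a plan.
// derivePreloadHints and the plan shape are assumptions; the map is deliberately partial.
const KEYWORD_QUERIES = new Map([
  ["database", "database migration patterns"],
  ["api", "API endpoint design patterns"],
]);
const MAX_KEYWORD_HINTS = 8;

function derivePreloadHints(plan) {
  const keywordHints = new Set(); // a Set dedupes repeated queries
  for (const slice of plan.slices) {
    for (const word of slice.title.toLowerCase().split(/[^a-z0-9]+/)) {
      const query = KEYWORD_QUERIES.get(word);
      if (query) keywordHints.add(query);
    }
  }
  // Plan-history hint first, then at most MAX_KEYWORD_HINTS keyword hints.
  return [`plan ${plan.name}`, ...[...keywordHints].slice(0, MAX_KEYWORD_HINTS)];
}

const hints = derivePreloadHints({
  name: "Phase-1-AUTH",
  slices: [
    { title: "Database schema for users" },
    { title: "Login API" },
    { title: "Session API and database cleanup" },
  ],
});
console.log(hints);
```

The third slice mentions both keywords again, but the Set keeps each query once, so the plan yields three hints in total.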

Watcher → Memory

The file watcher (Chapter 6, Watcher tab) doesn't just emit FS events; it drives capture. When a file change matches a watcher rule, the watcher composes a buildWatcherSearchPrompt payload and pushes it through the same captureMemory path so the change becomes a first-class L2 record and an L3 query.

This closes the loop where edits made between plan slices used to vanish from memory. Now the watcher feeds L1/L2/L3 just like any tool would.

Source Attribution

Every capture carries a source field with a strict format: <tool> or <tool>/<subsystem>. validateSourceFormat rejects anything else. This means the Memory tab's "by tool" breakdown is always accurate, with no untagged drift.

Examples
// Valid
"forge_run_plan"
"forge_run_plan/slice-executor"
"watcher/fs-rule"
"hook/pre-deploy"

// Rejected (logged, capture still proceeds, source replaced with "unknown")
"My Tool"
"forge_run_plan / slice-executor"   // spaces around slash
""
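
The exact pattern behind validateSourceFormat is not given here, so the following is only a sketch: a regular expression that accepts and rejects the examples above, plus the fallback to "unknown". SOURCE_PATTERN and normalizeSource are hypothetical names, and the allowed character set (lowercase letters, digits, underscore, hyphen) is an assumption.

```javascript
// Sketch only: the real validateSourceFormat pattern is not shown in the text.
// This regex is an assumption that fits the valid and rejected examples above.
const SOURCE_PATTERN = /^[a-z][a-z0-9_-]*(\/[a-z][a-z0-9_-]*)?$/;

function normalizeSource(source) {
  const valid = typeof source === "string" && SOURCE_PATTERN.test(source);
  // Invalid sources are replaced rather than thrown on, so the capture still proceeds.
  return { valid, source: valid ? source : "unknown" };
}

const samples = ["forge_run_plan", "watcher/fs-rule", "My Tool", "forge_run_plan / slice-executor", ""];
for (const s of samples) {
  console.log(JSON.stringify(s), "->", normalizeSource(s).source);
}
```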

Migration: pforge migrate-memory

Schema changes (the _v field bumps) are handled by the migration switch in pforge.ps1 / pforge.sh:

Terminal
# Inspect what would migrate (no writes)
pforge migrate-memory --dry-run

# Apply: rewrites every .forge/*.jsonl record to the latest _v
pforge migrate-memory

# Migration is idempotent: running twice is a no-op

Originals are backed up to .forge/.migration-backup-<timestamp>/ before any rewrite.
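
Idempotence follows naturally when each record is upgraded only while its _v is below the latest version. The sketch below shows that idea on JSONL text; it is not the pforge implementation. LATEST_VERSION, the MIGRATIONS table, and the rule that a record without _v counts as version 0 are all hypothetical.

```javascript
// Sketch only: an idempotent per-record migration over JSONL text.
// LATEST_VERSION, MIGRATIONS, and the v0 -> v1 step are hypothetical.
const LATEST_VERSION = 1;
const MIGRATIONS = {
  // Example step: records without _v are treated as version 0 and get stamped.
  0: (record) => ({ ...record, _v: 1 }),
};

function migrateRecord(record) {
  let current = record;
  // Apply steps until the record is current; up-to-date records pass through unchanged.
  while ((current._v ?? 0) < LATEST_VERSION) {
    current = MIGRATIONS[current._v ?? 0](current);
  }
  return current;
}

function migrateJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.stringify(migrateRecord(JSON.parse(line))))
    .join("\n") + "\n";
}

const before = '{"type":"gotcha","body":"old record"}\n{"_v":1,"type":"lesson","body":"new record"}\n';
const once = migrateJsonl(before);
const twice = migrateJsonl(once);
console.log(once === twice); // true
```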

Telemetry & Reporting

Three helpers in memory.mjs drive everything the dashboard shows:

  • buildCaptureTelemetry(): totals, deduped count, by-tool and by-type histograms (cosine-similarity dedup at write time)
  • buildCacheEntry() + isCacheEntryFresh(): search-result cache with TTL stamping (stampThoughtExpiry) and read-time filtering (filterUnexpiredThoughts)
  • buildMemoryReport(projectDir): assembles the full payload behind forge_memory_report / /api/memory/report: file inventory, version distribution, queue depth, drain trend, orphan detection
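
The cache behaviour in the second bullet (stamp an expiry at write time, filter at read time) can be sketched as follows. These helpers mirror the idea, not the signatures, of the functions in memory.mjs: makeCacheEntry, isFresh, readCache, and the five-minute TTL are assumptions, and the sketch expires whole entries rather than individual thoughts.

```javascript
// Sketch only: TTL stamping and read-time filtering for cached search results.
// Names, signatures, and the TTL value are assumptions, not the memory.mjs API.
const DEFAULT_TTL_MS = 5 * 60 * 1000; // assumed TTL of five minutes

function makeCacheEntry(query, thoughts, now = Date.now(), ttlMs = DEFAULT_TTL_MS) {
  return { query, thoughts, expiresAt: now + ttlMs };
}

function isFresh(entry, now = Date.now()) {
  return now < entry.expiresAt;
}

// Read-time filter: serve only entries that have not expired.
function readCache(cache, query, now = Date.now()) {
  const entry = cache.get(query);
  return entry && isFresh(entry, now) ? entry.thoughts : null;
}

const cache = new Map();
const t0 = 1000000; // fixed clock value so the example is deterministic
cache.set("auth decisions", makeCacheEntry("auth decisions", ["Use PKCE"], t0));

console.log(readCache(cache, "auth decisions", t0 + 1000));           // [ 'Use PKCE' ]
console.log(readCache(cache, "auth decisions", t0 + DEFAULT_TTL_MS)); // null
```

Passing the clock in as a parameter keeps the freshness check deterministic in tests.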

Further Reading

📄 v2.36.0 changelog: View CHANGELOG on GitHub.