[Hero image: two intricate gear assemblies meshing above an anvil; brass clockwork (GitHub primitives) interlocks with glowing amber filigree (Plan Forge orchestration), with ember particles rising from the meshing point.]
Appendix H

GitHub Stack Alignment

The thesis: GitHub ships the agent runtime + integration standards + customization primitives + engagement metrics. Everything above the runtime is the ecosystem's lane. Plan Forge is built for that lane.

Who this page is for: Engineering leaders, platform engineers, and architects evaluating a complete AI-SDLC stack, whether you've already standardized on GitHub Copilot or you're shopping the category fresh.

Companion to: What is Plan Forge? · How it works · Appendix I — Plan Forge on the GitHub Stack (the surface-by-surface technical reference).

Why this combination is the only one in the category

Plan Forge + GitHub Copilot ships four capabilities no other AI-SDLC platform on the market combines today:

  • Three-tier memory so context quality compounds across teams instead of being a per-repo lottery
  • Multi-model quorum eval: Claude, GPT, and Gemini score the same slice independently, then an LLM-as-judge step produces a 0–100 consensus
  • Audit Loop: a scan-triage-fix loop for AI-generated drift, off by default and hard-blocked in production at the schema level
  • Watcher: a second IDE session that tails any in-flight run, read-only by schema (it cannot write to the target)
In a hurry? Read the next three sections and stop: What you get · The picture · The four pillars. Then jump to Try it — on your own. Architects: the lower half of the page is the supporting context.

What you get — the outcomes

Six numbers every AI-SDLC programme is shopping for. Plan Forge surfaces all six on the live dashboard out of the box: no warehouse project, no BI build, no glue code.

  • AI-PR %: share of merged PRs touched by an agent
  • % code by AI: bytes changed by agent vs human, per slice
  • Pass-rate / phase: first-pass success for design / code / review / test
  • RCA MTTR: incident-fired → fix-validated, in hours
  • Drift score: codebase vs architecture, scored per commit
  • $ / merged PR: token spend reconciled against shipped value
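As a rough illustration of how two of these numbers could be derived, the sketch below computes AI-PR % and $ / merged PR from a list of merged-PR records. The `MergedPR` record and its fields are hypothetical stand-ins, not Plan Forge's actual data model, and the cost figure here is a plain average rather than a reconciliation against shipped value.

```python
from dataclasses import dataclass


@dataclass
class MergedPR:
    """Hypothetical record of one merged PR (not Plan Forge's schema)."""
    number: int
    agent_touched: bool    # did an agent author or modify any commit?
    token_cost_usd: float  # token spend attributed to this PR


def ai_pr_percent(prs: list[MergedPR]) -> float:
    """Share of merged PRs touched by an agent, as a percentage."""
    if not prs:
        return 0.0
    touched = sum(1 for pr in prs if pr.agent_touched)
    return 100.0 * touched / len(prs)


def cost_per_merged_pr(prs: list[MergedPR]) -> float:
    """Average token spend per merged PR, in USD."""
    if not prs:
        return 0.0
    return sum(pr.token_cost_usd for pr in prs) / len(prs)


prs = [
    MergedPR(101, True, 1.20),
    MergedPR(102, False, 0.00),
    MergedPR(103, True, 0.60),
    MergedPR(104, True, 0.60),
]
print(ai_pr_percent(prs))                 # 75.0
print(round(cost_per_merged_pr(prs), 2))  # 0.6
```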

The leading-indicator metric leadership usually asks for last, human-intervention frequency, is also captured automatically. Each time a human takes over from an agent, the takeover is recorded; trend lines show whether the harness is getting better or worse. See Health DNA for the full metric catalogue, or the quick reference for the complete dashboard surface.

The picture — harness (orchestration) on substrate (primitives)

Read top-down: outcomes you get, the harness (the orchestration layer Plan Forge provides), the substrate (GitHub Copilot's primitives) it sits on, and the GitHub platform foundation everything inherits.

AI SDLC Stack

End to end — harness on substrate

The first complete AI software-development lifecycle stack: GitHub Copilot below, Plan Forge above, your outcomes on top.

Outcomes: what the platform delivers
AI-PR % · % code by AI · pass-rate per phase · RCA MTTR · drift score · cost per merged PR
  • Plan-aware delivery: scope contract · slice gates
  • AI-aware code review: 20 specialised reviewers
  • Closed-loop RCA & fix: register → diagnose → verify
  • Drift & quality eval: LLM-as-judge · 0–100 score
  • Audit-grade governance: leaderboard + runbooks

Plan Forge: the harness (orchestration)
Open-source · MIT · runs on the GitHub substrate below · the orchestration lane GitHub leaves to the ecosystem

Orchestration
  ▸ Crucible: interview funnel
  ▸ Tempering: quality scorer
  ▸ Inner Loop: competitive worktrees
  ▸ Forge-Master: chat-first router
  ▸ 20 Reviewer Agents · 14 Skills
  ▸ Reflexion retry · Auto-skill library
  ▸ Lifecycle hooks (pre/post slice)
  → Agent Factory · grows itself per slice

Memory · L1 / L2 / L3
  ▸ L1 Hub: live WebSocket
  ▸ L2 Files: .forge/ append-only
  ▸ L3 OpenBrain: pgvector semantic
  ▸ Cross-team federation (read-only)
  ▸ Bridge-and-flush durability
  ▸ search_thoughts · brain_recall
  ▸ Cross-project · cross-session
  → Context quality compounds across teams

Eval & Drift
  ▸ Quorum: Claude · GPT · Gemini
  ▸ 0–100 consensus · LLM-as-judge
  ▸ forge_drift_report (per-commit)
  ▸ forge_health_trend · trajectories
  ▸ forge_estimate_quorum (cost preview)
  ▸ forge_fix_proposal (RCA → PR)
  ▸ % code by AI · MTTR · drift score
  → Quality, not just adoption

Governance & Self-Repair
  ▸ LiveGuard hooks (preDeploy / postSlice)
  ▸ Bug Registry · Incident Capture · MTTR
  ▸ Audit Loop: Coverity-equivalent
  ▸ forge_runbook · Deploy Journal
  ▸ Remote Bridge (Slack / Teams / PD)
  ▸ Watcher: read-only by schema
  ▸ forge_meta_bug_file (self-repair)
  → Approve from your phone · audit-grade

GitHub Copilot: the substrate (primitives)
Multi-model · one IP boundary · one SCIM endpoint · one audit log · per-developer + per-IDE primitives
  • Chat & Edits: in-IDE
  • Copilot CLI: per-slice worker
  • Cloud Agent: issue → PR · @copilot
  • Code Review: PR-native
  • Spaces: curated context
  • Multi-model: Claude · GPT · Gemini
  • MCP + SDK: tool surface
  • Metrics API: adoption + flow

GitHub platform: the foundation
  • GHAS: CodeQL · Autofix · Dependabot
  • GitHub Actions: PR gates · CI/CD
  • Issues · PRs · Projects: system of record · data residency
  • SCIM · Audit Log: single chain · SOC 2 / FedRAMP
  • IP Indemnification: Microsoft Customer Copyright Commitment

Read top-down: the green band is what you ship. The amber band is Plan Forge, the harness (orchestration) that produces those outcomes. The blue band is the GitHub Copilot substrate (primitives) the harness sits on. The slate band is the GitHub platform foundation everything inherits.

The four pillars — what the harness actually does

Plan Forge organises into four pillars. Each card is plain English; the "What's inside & where to read more" notes list the component-level detail and the manual chapters that go deep.

1 · Orchestration

Plans become slices, slices become work, work becomes audited PRs.

An idea is interviewed into a hardened plan. The plan is split into safe-sized slices. Each slice runs in its own worktree, gets reviewed by 20 specialised reviewer agents, and only ships if its validation gate passes. The platform learns from every run and builds new skills automatically.

What's inside & where to read more

Crucible interview funnel · Tempering quality scorer · Inner Loop competitive worktrees · Forge-Master chat-first router · 20 read-only reviewer agents · 14 slash-command skills · Reflexion retry · auto-skill library · lifecycle hooks (pre/post slice).

Crucible · Inner Loop · Forge-Master · Instructions & Agents · Agent Factory recipe · Multi-agent

… and more. Full surface area in the quick reference.

2 · Memory

Context quality compounds across teams instead of being a per-repo lottery.

Three tiers: a live event stream you can watch right now, a deterministic file trail every team can audit and grep, and an optional semantic store that lets one team's lessons surface automatically when another team hits a similar problem. Lessons learned in service A become defaults in service B without anyone filing a knowledge-base article.

What's inside & where to read more

L1 Hub, live WebSocket events · L2 Files, .forge/ append-only audit trail · L3 OpenBrain, pgvector semantic store · cross-team federation (read-only) · bridge-and-flush durability · search_thoughts · brain_recall.
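To make the L2 idea concrete, here is a minimal sketch of an append-only, greppable event trail written as JSON Lines. The file name `events.jsonl`, the event fields, and both helper functions are hypothetical; only the `.forge/` directory and the append-only property come from the text above.

```python
import json
import tempfile
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one event as a single JSON line; existing lines are never rewritten."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")


def grep_events(log_path: Path, needle: str) -> list[dict]:
    """Return every event whose serialized line contains the search string."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if needle in line]


# Use a temporary directory so the sketch does not touch a real repo.
root = Path(tempfile.mkdtemp())
log = root / ".forge" / "events.jsonl"  # hypothetical file name

append_event(log, {"slice": "add-endpoint", "status": "gate-passed"})
append_event(log, {"slice": "migrate-db", "status": "gate-failed"})

matches = grep_events(log, "gate-failed")
print(matches)  # [{'slice': 'migrate-db', 'status': 'gate-failed'}]
```

Because each event is one line of text, ordinary tools such as grep work on the trail without any Plan Forge tooling installed.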

Memory architecture

3 · Eval & Drift

Quality, not just adoption: the half the GitHub Metrics API doesn't cover.

Three frontier models score the same change independently and a reviewer model produces a 0–100 consensus number. Drift from your architecture is measured per commit. RCA outputs become PR proposals, not tickets. Cost is previewed before the run, not after the bill.
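The shape of a quorum result can be sketched as follows. The paragraph above says a reviewer model produces the consensus number; this example swaps in the median of the three scores as a simple stand-in, and the `max_spread` agreement threshold is likewise an assumption for illustration.

```python
from statistics import median


def quorum_consensus(scores: dict[str, float], max_spread: float = 25.0) -> tuple[int, bool]:
    """Combine independent 0-100 scores into one consensus number.

    Stand-in technique: the median replaces the reviewer-model step.
    Returns (consensus, agreed), where agreed is False when the models
    disagree by more than max_spread points.
    """
    for model, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{model} score {score} is outside 0-100")
    values = list(scores.values())
    consensus = round(median(values))
    spread = max(values) - min(values)
    return consensus, spread <= max_spread


result = quorum_consensus({"claude": 82, "gpt": 78, "gemini": 90})
print(result)  # (82, True)
```

A wide spread is a useful signal in its own right: when the judges disagree sharply, the change probably deserves a human look regardless of the consensus value.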

What's inside & where to read more

Quorum (Claude + GPT + Gemini) · 0–100 LLM-as-judge consensus · forge_drift_report per-commit · forge_health_trend with trajectories · forge_estimate_quorum (cancellable cost preview) · forge_fix_proposal (RCA → PR) · % code by AI · MTTR · drift score.

Health DNA · Self-deterministic loop · Dashboard

4 · Governance & Self-Repair

Audit-grade by default. Approve from your phone. The platform reports its own bugs upstream.

Hooks fire before every deploy and after every slice. Bugs deduplicate themselves. A separate read-only watcher tails any in-flight run. When the harness itself misbehaves, it files a structured bug report against its own upstream, so you're never holding the bag alone on a platform issue.

What's inside & where to read more

LiveGuard hooks (preDeploy / postSlice / preAgentHandoff) · Bug Registry with fingerprint dedupe · Incident Capture + MTTR · Audit Loop (scan → triage → spawn-worker fix) · forge_runbook + Deploy Journal · Remote Bridge (Slack / Teams / PagerDuty / Discord / Telegram) · Watcher (read-only by schema) · forge_meta_bug_file self-repair.
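Fingerprint dedupe can be illustrated with a short sketch: normalise the volatile parts of an error message, hash the result, and count repeats under the same key. The normalisation rules, the `BugRegistry` class, and its method names are hypothetical and only show the general technique, not Plan Forge's implementation.

```python
import hashlib
import re


def fingerprint(error_type: str, message: str) -> str:
    """Hash an error after masking volatile details (numbers, case, spacing)."""
    normalized = re.sub(r"\d+", "<n>", message.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    digest = hashlib.sha256(f"{error_type}:{normalized}".encode("utf-8"))
    return digest.hexdigest()[:16]


class BugRegistry:
    """Hypothetical in-memory registry keyed by fingerprint."""

    def __init__(self) -> None:
        self._bugs: dict[str, dict] = {}

    def report(self, error_type: str, message: str) -> tuple[str, int]:
        """Record an occurrence; return (fingerprint, occurrences so far)."""
        fp = fingerprint(error_type, message)
        entry = self._bugs.setdefault(
            fp, {"type": error_type, "first_message": message, "count": 0}
        )
        entry["count"] += 1
        return fp, entry["count"]

    def __len__(self) -> int:
        return len(self._bugs)


registry = BugRegistry()
fp1, n1 = registry.report("TimeoutError", "request 4812 timed out after 30s")
fp2, n2 = registry.report("TimeoutError", "Request 977 timed out after 45s")
print(fp1 == fp2, n2, len(registry))  # True 2 1
```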

What is LiveGuard · LiveGuard dashboard · Audit loop · Bug registry · Watcher · Remote bridge

What we deliberately don't try to do

Discipline matters. A platform that tries to own everything ends up owning nothing well. Plan Forge does not:

  • Replicate the Copilot Metrics API: we add quality metrics; we don't re-implement adoption metrics
  • Embed or fork the Copilot Cloud Agent runtime: we dispatch to it
  • Compete with github/github-mcp-server: we use it, and ship our own MCP server only for orchestration concerns
  • Reinvent AGENTS.md, Skills, or MCP: we adopt the open standards and contribute back when we learn something

If GitHub ships a feature that subsumes a Plan Forge capability, the right answer is to delete the Plan Forge code and use GitHub's. We're explicit about that in the project README.

Try it — on your own, on your own time

Plan Forge is MIT-licensed and open source. There's no sales call, no pilot agreement, no license to procure. If you already have GitHub Copilot and GHAS, you have everything you need to evaluate the full stack against your own repos this afternoon.

  1. Install in one repo. Clone github.com/srnichols/plan-forge, run setup.ps1 -Agent claude (or --agent codex / --agent cursor / --agent copilot). Generate Project Principles + initial instruction files via forge_run_skill /onboarding. Wire action.yml into GitHub Actions for PR-time gates. Walk-through: install + first plan.
  2. Run a real task end-to-end. Take one in-flight ticket through the full pipeline: Crucible → plan → execution → reviewer agents → Bug Registry if you hit one. The trajectory is captured automatically; you can replay it from the dashboard.
  3. Add a second repo, turn on what makes sense for you. Cloud Agent dispatch (--worker copilot-coding-agent) for async bulk work. LiveGuard hooks if you have a deploy pipeline. The Audit Loop if you want a Coverity-style scan over an existing module. Everything is opt-in.
  4. Read the dashboard. The six KPIs from "What you get" populate themselves as you run plans. Compare to your baseline. Decide whether to roll wider on your own schedule.

Cost to evaluate: zero beyond your existing Copilot + GHAS subscription. No new licences, no headcount, no infrastructure, no procurement cycle. Bring your own GHCP partner relationship if you have one; Plan Forge composes on top of whatever Copilot Enterprise tier and support arrangement you already use.

Stuck? File an issue at github.com/srnichols/plan-forge/issues, or open a discussion. Plan Forge ships forge_meta_bug_file precisely so problems with the platform get reported back automatically; you're not on your own.


Architect appendix · supporting context for technical readers

The signal: GitHub said this out loud in April 2026

On April 2, 2026, GitHub shipped the Copilot SDK in public preview. The release notes describe it as "the same production-tested agent runtime that powers GitHub Copilot cloud agent and Copilot CLI" exposed for application developers to embed.

The implication is unmistakable:

GitHub views agent orchestration as something built on top of their primitives, not inside them.

This page documents how Plan Forge composes with the primitives GitHub explicitly leaves to the ecosystem.

What GitHub ships (the substrate — primitives)

| Primitive | What it is | Status (May 2026) |
|---|---|---|
| Copilot Cloud Agent (formerly Coding Agent) | Ephemeral Actions-powered runner. Single repo / single branch / single PR per task. Three modes: research-only, plan-only, branch-only | GA |
| AGENTS.md | Open standard for agent context files | Stewarded by Agentic AI Foundation under the Linux Foundation. 60k+ repos use it. GitHub adopts; does not own |
| Agent Skills | Open standard for agent procedural knowledge | Repo agentskills/agentskills, Apache 2.0, maintained by Anthropic. GitHub adopts |
| Model Context Protocol (MCP) | Open standard for agent-to-tool integration | Linux Foundation project. Maintained by Anthropic et al. GitHub ships github/github-mcp-server (29.5k stars, MIT) as the reference implementation |
| .github/instructions/ | GitHub-native repo customization | GA. Plan Forge ships ~18 instruction files |
| .github/copilot-instructions.md | Repo-wide Copilot context | GA |
| .github/agents/ | Custom agent personas | GA on github.com (preview in JetBrains/Eclipse/Xcode) |
| .github/hooks/ | Lifecycle hooks (preToolUse, postToolUse, sessionStart, etc.) | GA |
| .github/skills/ | Repo-scoped skill definitions | GA |
| GitHub Actions | CI/CD runtime that powers Cloud Agent | GA |
| GitHub Advanced Security (GHAS) | Code scanning, secret scanning, Dependabot | GA |
| Copilot Spaces | Curated context bundles for chat | GA (chat-side; not yet a Cloud Agent execution context) |
| Copilot Metrics API | Adoption + flow metrics (active users, PR throughput, time-to-merge) | GA |
| Copilot SDK | Embed the Cloud Agent runtime in your own app | Public preview, April 2, 2026 |
| Custom properties | Org-level governance primitive | GA |
| Org runner controls + firewall | Cloud Agent runtime governance | GA (April 2026) |

This is a strong, coherent substrate. It is also explicitly just the substrate.

What GitHub deliberately leaves to the ecosystem (the Plan Forge lane)

These are the surfaces GitHub does not ship and shows no sign of shipping. The evidence column draws directly on GitHub's own docs and roadmap:

| Gap | Evidence |
|---|---|
| Hardened plan as versioned artifact with scope contract, slices, validation gates, drift detection | Plan-mode is session-scoped one-shot; no plan file format, no scope contract, no slice persistence |
| Cross-repo / multi-service orchestration | Explicit single-repo limitation: "Copilot can only make changes in the repository specified when you start a task. Copilot cannot make changes across multiple repositories in one run." |
| Multi-model quorum / consensus per task | No built-in mechanism. Single model per session |
| Plan execution harness with per-slice gates and resume-from semantics | copilot-setup-steps.yml is one pre-flight hook; nothing slice-aware |
| Semantic eval harness (test pass rate, regression rate, plan-adherence) | Metrics API explicitly does not measure quality, only adoption + flow |
| Cost prediction per task / per plan before execution | Only post-hoc Actions + premium-request totals |
| Live programmatic watch of an in-flight agent from external tools | Session UI is in-product only; no public stream |
| Cross-org / cross-team fleet console with queue, capacity, SLA visibility | Only per-issue / per-project session UI |
| Pre-merge plan-adherence gates | No first-party concept of "this PR drifted from the approved plan" |
| Agent skills / instructions sync across N repos | Up to consumer (.github-private is the only template mechanism) |
| Multi-tenant cost budgets and prioritization | Not in product |
| A/B comparison of custom agents or models for the same task class | Not in product |
| Cross-team / cross-project semantic memory so lessons compound across pilots | Copilot Spaces is chat-side and repo-scoped; no semantic recall across teams or sessions |
| Closed-loop RCA → fix-proposal → validate-fix pipeline | @copilot on issues + GHAS Autofix are open-loop point features; no native bug registry, no multi-model RCA, no fix validation cycle |
| Coverity-style scan → triage → spawn-worker → fix loop for AI-generated drift | GHAS scans + Autofix on findings only; nothing that spawns a worker per finding and iterates to convergence |
| Deploy-aware lifecycle hooks (preDeploy / postSlice / preAgentHandoff) with severity gates | Existing hooks (preToolUse / postToolUse / sessionStart) are session-scoped; nothing fires before deploys with severity blocking |
| Idea → hardened-plan interview funnel with lane-scoped Q&A | Plan-mode is single-shot session output; no interview funnel, no lane classification, no progressive refinement |
| Pre-flight plan-quality scorer (scope-contract clarity, slice sizing, gate strength, forbidden-actions) | Nothing in product scores plan quality before execution |
| Specialized reviewer agent fleet (20+ read-only personas: arch / security / db / perf / a11y / multi-tenancy / CI-CD / compliance / dependency / observability) | Copilot Code Review is singular and chat-prompted; no first-party persona library |
| Remote-bridge approval flows with resume-on-approve (Slack / Teams / PagerDuty / Telegram / Discord) | GitHub notifications fire one-way; no inline-approve → resume-paused-slice flow |
| Deploy Journal + auto-generated runbook per plan | No first-party concept of "audit record per deploy" or "runbook from this plan" |

… and more. The full capability index lives in the quick reference and the manual book index.

GitHub's positioning is consistent: wrap your tool/data source as an MCP server, layer your customization via the open file standards (AGENTS.md, Skills, instructions), and build your orchestration on top of the SDK. That is exactly the Plan Forge architecture.

How Plan Forge composes with each GitHub primitive

A 16-row reference for architects, mapping each GitHub-native primitive to the Plan Forge surface that consumes it.

Per-primitive composition table (16 rows)

| GitHub primitive | How Plan Forge consumes it | Where in Plan Forge |
|---|---|---|
| Copilot Cloud Agent | Plan Forge dispatches plan slices to CCA via gh issue create --assignee @copilot. Trajectories captured to .forge/trajectories/<plan-slug>.jsonl | pforge-mcp/orchestrator.mjs (--worker copilot-coding-agent mode) |
| AGENTS.md | Plan Forge generates and maintains AGENTS.md alongside .github/copilot-instructions.md so any AGENTS.md-aware agent (Claude Code, Cursor, Codex, Amp, Aider, Gemini CLI, Goose, Windsurf) consumes Plan Forge context | pforge-mcp/server.mjs setup phase |
| .github/instructions/ | Plan Forge ships ~18 instruction files covering architecture, security, testing, database, API, auth, error handling, deployment, performance, observability, version, status reporting, context fuel, self-repair, plan hardening | templates/.github/instructions/ |
| .github/copilot-instructions.md | Plan Forge generates the project-scoped Copilot instructions during setup.ps1 / setup.sh | setup.ps1, setup.sh |
| .github/agents/ | Plan Forge ships 20 custom agent personas (architecture, database, security, deploy, performance, test-runner, API contracts, accessibility, multi-tenancy, CI/CD, observability, dependency, compliance, plus 6 pipeline agents and an audit classifier) | templates/.github/agents/ |
| .github/hooks/ | Plan Forge ships its own lifecycle hooks: PreDeploy, PreCommit, PreAgentHandoff, PostSlice, plus plan-forge.json hook configuration. Distinct from Claude Code's hook names. | templates/.github/hooks/ |
| .github/skills/ | Plan Forge ships 14 skills as /slash-commands: database-migration, staging-deploy, test-sweep, dependency-audit, security-audit, code-review, release-notes, api-doc-gen, onboarding, health-check, forge-execute, audit-loop, plus pipeline skills | templates/.github/skills/ |
| MCP | Plan Forge ships its own MCP server (pforge-mcp) with 102 tools covering planning, execution, eval, observability, cost, memory, search, timeline, notifications. Auto-generates .vscode/mcp.json | pforge-mcp/server.mjs, pforge-mcp/tools.json |
| github/github-mcp-server | Plan Forge documents this as the canonical GitHub-side MCP integration. Plan Forge agents call it via the MCP plumbing they already speak | docs reference, .vscode/mcp.json example |
| GitHub Actions | Plan Forge plans can run as Actions workflows; pforge run-plan is callable from any runner. CCA itself runs in Actions, and Plan Forge plans dispatched via CCA inherit Actions concurrency, runners, and minutes | action.yml |
| GitHub Advanced Security | Plan Forge's forge_secret_scan, forge_dep_watch, and security-audit skill complement GHAS rather than replace it. Plan Forge surfaces GHAS findings into plan-aware bug reports | pforge-mcp/notifications/, dependency-reviewer.agent.md |
| Copilot Spaces | Plan Forge plan files + Scope Contract are the equivalent concept for autonomous execution. Spaces serves chat-side context curation; Plan Forge serves execution-time scope binding | docs reference |
| Copilot Metrics API | Plan Forge does not duplicate it. Plan Forge surfaces the quality metrics (gate failure rates, drift scores, plan-adherence, regressions caught at gate boundary, cost per merged PR) that the Metrics API explicitly does not cover | forge_health_trend, forge_drift_report, forge_cost_report |
| Copilot SDK | Plan Forge does not embed the Copilot runtime. Plan Forge orchestrates across multiple agent runtimes (CCA, Claude Code, Codex, custom workers). The SDK is the right tool when you want to embed a single agent in your app; Plan Forge is the right tool when you want to coordinate many agent runs as a delivery pipeline | architecture reference |
| Custom properties | Plan Forge documents the recommended custom-property schema for governing per-team Plan Forge enablement, plan templates, and budget caps | templates/docs/CUSTOMIZATION.md |
| Org runner controls | Plan Forge dispatched plans inherit the org's runner policy. No conflict, no override needed | docs reference |
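The Cloud Agent row can be pictured with a small sketch that builds the `gh issue create` invocation and the trajectory path named in the table. It only constructs the command and prints it; nothing is executed. The helper names, the example repository, and the issue text are hypothetical, and the real orchestrator may assemble its call differently.

```python
import shlex
from pathlib import Path


def dispatch_command(repo: str, slice_title: str, slice_body: str) -> list[str]:
    """Build (but do not run) a gh command that assigns a new issue to @copilot."""
    return [
        "gh", "issue", "create",
        "--repo", repo,
        "--title", slice_title,
        "--body", slice_body,
        "--assignee", "@copilot",
    ]


def trajectory_path(plan_slug: str) -> Path:
    """Location of the trajectory log for a plan, per the table above."""
    return Path(".forge") / "trajectories" / f"{plan_slug}.jsonl"


cmd = dispatch_command(
    "octo-org/payments",             # hypothetical repository
    "Slice 2: add refund endpoint",  # hypothetical slice title
    "Scope: src/refunds/ only.",     # hypothetical scope note
)
print(shlex.join(cmd))
print(trajectory_path("refunds-v2").as_posix())  # .forge/trajectories/refunds-v2.jsonl
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting problems when slice titles contain spaces or punctuation.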

Why this matters for the consolidation thesis

If your strategic direction is "consolidate on GitHub Enterprise + Copilot Enterprise," Plan Forge reinforces that choice rather than competing with it.

  • Cursor and Sourcegraph Amp are platform-agnostic by design. They work as well on GitLab and Bitbucket as on GitHub. Adopting them does not strengthen your GitHub investment.
  • GitHub Copilot Cloud Agent shipped the substrate but explicitly leaves orchestration to the ecosystem. Without an orchestration layer, the substrate is incomplete for fleet rollouts.
  • Plan Forge is the only project in the comparison set built specifically to extend GitHub primitives in the direction GitHub itself signaled is the ecosystem's lane. The architecture is a deliberate "yes, and" to GitHub's stack.

For Microsoft-shop enterprises pursuing the GitHub-native consolidation thesis, this is the cleanest path: GitHub for the substrate, Plan Forge for the orchestration layer, no third vendor in the picture.

Variations for Microsoft Foundry shops

For customers using Microsoft Foundry (Azure OpenAI, Foundry Agent Service, Foundry Toolboxes), Plan Forge composes additionally with:

  • Azure OpenAI as a first-class LLM provider (alongside GitHub Copilot, Anthropic, OpenAI, xAI). Auth via Entra ID (recommended), API key, or managed identity. Endpoint format https://{resource}.openai.azure.com/openai/v1/. Customer configures deployment names, not model families.
  • Foundry Toolboxes as MCP-compatible endpoints. Plan Forge already speaks MCP; pointing .vscode/mcp.json at a Foundry Toolbox endpoint is config, not code.
  • Foundry App Insights as the OTel sink. Plan Forge OTel traces land in the same dashboards as the customer's Foundry agent runs.
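The first two bullets amount to configuration, which the following sketch tries to make tangible. The endpoint format comes from the text above; the `servers` / `type` / `url` layout for `.vscode/mcp.json` is an assumption about the file's shape, and the resource name, server name, and Toolbox URL are placeholders rather than real endpoints.

```python
import json


def azure_openai_base_url(resource: str) -> str:
    """Fill the documented endpoint format with an Azure resource name."""
    return f"https://{resource}.openai.azure.com/openai/v1/"


def mcp_server_entry(name: str, url: str) -> dict:
    """Assumed shape of an HTTP server entry in .vscode/mcp.json."""
    return {"servers": {name: {"type": "http", "url": url}}}


base_url = azure_openai_base_url("contoso")  # "contoso" is a placeholder resource
print(base_url)  # https://contoso.openai.azure.com/openai/v1/

# Placeholder URL: substitute the endpoint your Foundry Toolbox actually exposes.
config = mcp_server_entry("foundry-toolbox", "https://toolbox.example.invalid/mcp")
print(json.dumps(config, indent=2))
```

Note that the model you call is selected by deployment name, as the first bullet says, so the base URL alone does not determine which model family answers.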

See Reference Architecture — Microsoft Foundry variant for the full picture.

Explore deeper

If the four pillars and the picture earned a closer look, jump straight to the chapters that go deep. Grouped for shoppers, builders, and operators.

… and more. Browse the full manual book index or the quick reference for everything.