Appendix H

GitHub Stack Alignment

The thesis: GitHub ships the agent runtime + integration standards + customization primitives + engagement metrics. Everything above the runtime is the ecosystem's lane. Plan Forge is built for that lane.

Who this page is for: Engineering leaders, platform engineers, and architects evaluating a complete AI-SDLC stack, whether you've already standardized on GitHub Copilot or you're shopping the category fresh.

Companion to: What is Plan Forge? · How it works · Appendix I — Plan Forge on the GitHub Stack (the surface-by-surface technical reference).

Why this combination is the only one in the category

Plan Forge + GitHub Copilot ships four capabilities no other AI-SDLC platform on the market combines today:

Three-tier memory, so context quality compounds across teams instead of being a per-repo lottery
Multi-model quorum eval: Claude, GPT, and Gemini score the same slice independently, and an LLM-as-judge step reduces their scores to a 0–100 consensus
Audit Loop: a scan-triage-fix loop for AI-generated drift, off by default and hard-blocked in production at the schema level
Watcher: a second IDE session that tails any in-flight run, read-only by schema (it cannot write to the target)

In a hurry? Read the next three sections and stop: What you get · The picture · The four pillars. Then jump to Try it — on your own. Architects: the lower half of the page is the supporting context.

What you get — the outcomes

Six numbers every AI-SDLC programme is shopping for. Plan Forge surfaces all six on the live dashboard out of the box: no warehouse project, no BI build, no glue code.

AI-PR %: share of merged PRs touched by an agent
% code by AI: bytes changed by agent vs human, per slice
Pass-rate / phase: first-pass success for design, code, review, and test
RCA MTTR: hours from incident fired to fix validated
Drift score: codebase vs architecture, scored per commit
$ / merged PR: token spend reconciled against shipped value

The leading indicator that leadership usually asks for last, human-intervention frequency, is also captured automatically. Every handover from an agent to a human is recorded, and trend lines show whether the harness is getting better or worse. See Health DNA for the full metric catalogue, or the quick reference for the complete dashboard surface.
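To make three of the ratios above concrete, here is a minimal sketch of how AI-PR %, % code by AI, and $ / merged PR could be derived from per-PR records. The `MergedPR` record shape and the `kpis` function are hypothetical illustrations, not Plan Forge's actual schema or dashboard code.

```python
from dataclasses import dataclass


@dataclass
class MergedPR:
    """One merged pull request (hypothetical record shape, not Plan Forge's schema)."""
    agent_touched: bool    # did an agent contribute to this PR?
    agent_bytes: int       # bytes changed by agents
    human_bytes: int       # bytes changed by humans
    token_cost_usd: float  # token spend attributed to this PR


def kpis(prs: list[MergedPR]) -> dict[str, float]:
    """Compute three headline ratios over a non-empty list of merged PRs."""
    total = len(prs)
    if total == 0:
        raise ValueError("need at least one merged PR")
    agent_bytes = sum(p.agent_bytes for p in prs)
    all_bytes = agent_bytes + sum(p.human_bytes for p in prs)
    return {
        "ai_pr_pct": 100.0 * sum(p.agent_touched for p in prs) / total,
        "code_by_ai_pct": 100.0 * agent_bytes / all_bytes if all_bytes else 0.0,
        "usd_per_merged_pr": sum(p.token_cost_usd for p in prs) / total,
    }


sample = [
    MergedPR(True, 300, 100, 1.50),
    MergedPR(False, 0, 400, 0.00),
    MergedPR(True, 200, 0, 2.50),
    MergedPR(True, 100, 100, 2.00),
]
report = kpis(sample)
```

With this sample, three of four PRs were agent-touched and agents changed 600 of 1,200 bytes, so the sketch reports 75% AI-PR, 50% code by AI, and $1.50 per merged PR.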

The picture — harness (orchestration) on substrate (primitives)

Read top-down: outcomes you get, the harness (the orchestration layer Plan Forge provides), the substrate (GitHub Copilot's primitives) it sits on, and the GitHub platform foundation everything inherits.

Figure: AI SDLC Stack, end to end, harness on substrate. The first complete AI software-development lifecycle stack: GitHub Copilot below, Plan Forge above, your outcomes on top. The green band is what you ship. The amber band is Plan Forge, the harness (orchestration) that produces those outcomes. The blue band is the GitHub Copilot substrate (primitives) the harness sits on. The slate band is the GitHub platform foundation everything inherits.

The four pillars — what the harness actually does

Plan Forge organises into four pillars. Each pillar opens with a plain-English summary; the What's inside list gives the component-level detail and points to the manual chapters that go deep.

1 · Orchestration

Plans become slices, slices become work, work becomes audited PRs.

An idea is interviewed into a hardened plan. The plan is split into safe-sized slices. Each slice runs in its own worktree, gets reviewed by 20 specialised reviewer agents, and only ships if its validation gate passes. The platform learns from every run and builds new skills automatically.
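The slice-and-gate flow described above can be sketched as a loop that ships a slice only when its validation gate passes and that can resume from the slice that failed. This is an illustrative sketch under stated assumptions: `run_plan`, the `Slice` alias, and the lambda gates are hypothetical names, and real gates would run tests or reviewers rather than return constants.

```python
from typing import Callable, Optional

# A slice is a name plus a validation gate; the gate returns True on pass.
# (Hypothetical shape for illustration, not Plan Forge's plan file format.)
Slice = tuple[str, Callable[[], bool]]


def run_plan(slices: list[Slice], resume_from: int = 0) -> tuple[list[str], Optional[int]]:
    """Run slices in order from `resume_from`; stop at the first failed gate.

    Returns (names of shipped slices, index of the failed slice or None).
    """
    shipped: list[str] = []
    for index in range(resume_from, len(slices)):
        name, gate = slices[index]
        if not gate():
            return shipped, index  # nothing after a failed gate ships
        shipped.append(name)
    return shipped, None


plan = [("schema", lambda: True), ("api", lambda: False), ("ui", lambda: True)]
first_run = run_plan(plan)

# After the "api" slice is fixed, resume from the failed index instead of restarting.
plan[1] = ("api", lambda: True)
second_run = run_plan(plan, resume_from=1)
```

The first run ships only `schema` and reports index 1 as the failure; the second run picks up at `api` and completes the plan without re-running the slice that already shipped.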

What's inside & where to read more

Crucible interview funnel · Tempering quality scorer · Inner Loop competitive worktrees · Forge-Master chat-first router · 20 read-only reviewer agents · 14 slash-command skills · Reflexion retry · auto-skill library · lifecycle hooks (pre/post slice).

→ Crucible · Inner Loop · Forge-Master · Instructions & Agents · Agent Factory recipe · Multi-agent

… and more. Full surface area in the quick reference.

2 · Memory

Context quality compounds across teams instead of being a per-repo lottery.

Three tiers: a live event stream you can watch right now, a deterministic file trail every team can audit and grep, and an optional semantic store that lets one team's lessons surface automatically when another team hits a similar problem. Lessons learned in service A become defaults in service B without anyone filing a knowledge-base article.
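The middle tier, a deterministic file trail that teams can audit and grep, can be approximated with an append-only JSON Lines file. The sketch below is an assumption-laden illustration: the file name `events.jsonl`, the event fields, and both helper functions are hypothetical and do not describe the real `.forge/` layout.

```python
import json
import tempfile
from pathlib import Path


def append_event(trail: Path, event: dict) -> None:
    """Append one event as a JSON line; existing lines are never rewritten."""
    with trail.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def grep_events(trail: Path, needle: str) -> list[dict]:
    """Return every recorded event whose raw line contains `needle`."""
    with trail.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if needle in line]


with tempfile.TemporaryDirectory() as tmp:
    trail = Path(tmp) / "events.jsonl"  # hypothetical file name
    append_event(trail, {"slice": "api", "lesson": "pin the ORM version"})
    append_event(trail, {"slice": "ui", "lesson": "lazy-load charts"})
    matches = grep_events(trail, "ORM")
```

Opening the file in append mode is what keeps the trail deterministic: a later reader sees events in the order they were written, and a plain substring search is enough to recover a lesson.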

What's inside & where to read more

L1 Hub, live WebSocket events · L2 Files, .forge/ append-only audit trail · L3 OpenBrain, pgvector semantic store · cross-team federation (read-only) · bridge-and-flush durability · search_thoughts · brain_recall.

→ Memory architecture

3 · Eval & Drift

Quality, not just adoption: the half the GitHub Metrics API doesn't cover.

Three frontier models score the same change independently and a reviewer model produces a 0–100 consensus number. Drift from your architecture is measured per commit. RCA outputs become PR proposals, not tickets. Cost is previewed before the run, not after the bill.
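As a rough illustration of reducing independent per-model scores to one 0–100 number, the sketch below takes the median and flags wide disagreement. Note the substitution: the text above says a reviewer model produces the consensus, whereas this sketch uses plain median aggregation as a stand-in. The function name, the `max_spread` threshold, and the sample scores are all hypothetical.

```python
from statistics import median


def quorum_consensus(scores: dict[str, int], max_spread: int = 20) -> dict:
    """Combine per-model 0-100 scores into one number plus an agreement flag.

    Median aggregation is a stand-in for the reviewer-model step (assumption).
    """
    if not scores:
        raise ValueError("need at least one score")
    for model, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{model} score out of range: {score}")
    values = sorted(scores.values())
    spread = values[-1] - values[0]
    return {
        "consensus": median(values),
        "spread": spread,
        "agreed": spread <= max_spread,  # wide spread suggests a human should look
    }


result = quorum_consensus({"claude": 82, "gpt": 78, "gemini": 90})
```

The median is robust to one outlier model, and reporting the spread alongside it keeps a split verdict from hiding behind a single number.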

What's inside & where to read more

Quorum (Claude + GPT + Gemini) · 0–100 LLM-as-judge consensus · forge_drift_report per-commit · forge_health_trend with trajectories · forge_estimate_quorum (cancellable cost preview) · forge_fix_proposal (RCA → PR) · % code by AI · MTTR · drift score.

→ Health DNA · Self-deterministic loop · Dashboard

4 · Governance & Self-Repair

Audit-grade by default. Approve from your phone. The platform reports its own bugs upstream.

Hooks fire before every deploy and after every slice. Bugs deduplicate themselves. A separate read-only watcher tails any in-flight run. When the harness itself misbehaves, it files a structured bug report against its own upstream, so you're never holding the bag alone on a platform issue.
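One common way to make bugs deduplicate themselves is to fingerprint each error after masking the parts that vary between occurrences. The sketch below assumes that approach; the normalization rule, the 12-character digest, and the `BugRegistry` class are hypothetical and are not taken from Plan Forge's Bug Registry.

```python
import hashlib
import re


def fingerprint(error_type: str, message: str) -> str:
    """Hash an error after masking numbers and hex ids so repeats collide."""
    normalized = re.sub(r"0x[0-9a-f]+|\d+", "#", message.strip().lower())
    digest = hashlib.sha256(f"{error_type}|{normalized}".encode("utf-8"))
    return digest.hexdigest()[:12]


class BugRegistry:
    """In-memory registry keyed by fingerprint (hypothetical shape)."""

    def __init__(self) -> None:
        self.bugs: dict[str, dict] = {}

    def report(self, error_type: str, message: str) -> str:
        """Record an occurrence; a repeat bumps the count instead of adding a bug."""
        key = fingerprint(error_type, message)
        entry = self.bugs.setdefault(
            key, {"type": error_type, "first_seen": message, "count": 0}
        )
        entry["count"] += 1
        return key


registry = BugRegistry()
first = registry.report("TimeoutError", "Gate timed out after 30s on worker 7")
repeat = registry.report("TimeoutError", "Gate timed out after 45s on worker 12")
other = registry.report("ValueError", "Slice 3 has no validation gate")
```

The two timeouts differ only in their numbers, so they share a fingerprint and the registry holds two bugs rather than three.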

What's inside & where to read more

LiveGuard hooks (preDeploy / postSlice / preAgentHandoff) · Bug Registry with fingerprint dedupe · Incident Capture + MTTR · Audit Loop (scan → triage → spawn-worker fix) · forge_runbook + Deploy Journal · Remote Bridge (Slack / Teams / PagerDuty / Discord / Telegram) · Watcher (read-only by schema) · forge_meta_bug_file self-repair.

→ What is LiveGuard · LiveGuard dashboard · Audit loop · Bug registry · Watcher · Remote bridge

What we deliberately don't try to do

Discipline matters. A platform that tries to own everything ends up owning nothing well. Plan Forge does not:

Replicate the Copilot Metrics API: we add quality metrics; we don't re-implement adoption metrics
Embed or fork the Copilot Cloud Agent runtime: we dispatch to it
Compete with github/github-mcp-server: we use it, and we ship our own MCP server only for orchestration concerns
Reinvent AGENTS.md, Skills, or MCP: we adopt the open standards and contribute back when we learn something

If GitHub ships a feature that subsumes a Plan Forge capability, the right answer is to delete the Plan Forge code and use GitHub's. We're explicit about that in the project README.

Try it — on your own, on your own time

Plan Forge is MIT-licensed and open source. There's no sales call, no pilot agreement, no license to procure. If you already have GitHub Copilot and GHAS, you have everything you need to evaluate the full stack against your own repos this afternoon.

Install in one repo. Clone github.com/srnichols/plan-forge, run setup.ps1 -Agent claude (or --agent codex / --agent cursor / --agent copilot). Generate Project Principles + initial instruction files via forge_run_skill /onboarding. Wire action.yml into GitHub Actions for PR-time gates. Walk-through: install + first plan.
Run a real task end-to-end. Take one in-flight ticket through the full pipeline: Crucible → plan → execution → reviewer agents → Bug Registry if you hit one. The trajectory is captured automatically; you can replay it from the dashboard.
Add a second repo, turn on what makes sense for you. Cloud Agent dispatch (--worker copilot-coding-agent) for async bulk work. LiveGuard hooks if you have a deploy pipeline. The Audit Loop if you want a Coverity-style scan over an existing module. Everything is opt-in.
Read the dashboard. The six KPIs from "What you get" populate themselves as you run plans. Compare to your baseline. Decide whether to roll wider on your own schedule.

Cost to evaluate: zero beyond your existing Copilot + GHAS subscription. No new licences, no headcount, no infrastructure, no procurement cycle. Bring your own GHCP partner relationship if you have one; Plan Forge composes on top of whatever Copilot Enterprise tier and support arrangement you already use.

Stuck? File an issue at github.com/srnichols/plan-forge/issues, or open a discussion. Plan Forge ships forge_meta_bug_file precisely so problems with the platform get reported back automatically. You're not on your own.

Architect appendix · supporting context for technical readers

The signal: GitHub said this out loud in April 2026

On April 2, 2026, GitHub shipped the Copilot SDK in public preview. The release notes describe it as "the same production-tested agent runtime that powers GitHub Copilot cloud agent and Copilot CLI" exposed for application developers to embed.

The implication is unmistakable:

GitHub views agent orchestration as something built on top of their primitives, not inside them.

This page documents how Plan Forge composes with the primitives GitHub explicitly leaves to the ecosystem.

What GitHub ships (the substrate — primitives)

Primitive	What it is	Status (May 2026)
Copilot Cloud Agent (formerly Coding Agent)	Ephemeral Actions-powered runner. Single repo / single branch / single PR per task. Three modes: research-only, plan-only, branch-only	GA
AGENTS.md	Open standard for agent context files	Stewarded by Agentic AI Foundation under the Linux Foundation. 60k+ repos use it. GitHub adopts; does not own
Agent Skills	Open standard for agent procedural knowledge	Repo `agentskills/agentskills`, Apache 2.0, maintained by Anthropic. GitHub adopts
Model Context Protocol (MCP)	Open standard for agent-to-tool integration	Linux Foundation project. Maintained by Anthropic et al. GitHub ships `github/github-mcp-server` (29.5k stars, MIT) as the reference implementation
`.github/instructions/`	GitHub-native repo customization	GA. Plan Forge ships ~18 instruction files
`.github/copilot-instructions.md`	Repo-wide Copilot context	GA
`.github/agents/`	Custom agent personas	GA on github.com (preview in JetBrains/Eclipse/Xcode)
`.github/hooks/`	Lifecycle hooks (preToolUse, postToolUse, sessionStart, etc.)	GA
`.github/skills/`	Repo-scoped skill definitions	GA
GitHub Actions	CI/CD runtime that powers Cloud Agent	GA
GitHub Advanced Security (GHAS)	Code scanning, secret scanning, Dependabot	GA
Copilot Spaces	Curated context bundles for chat	GA (chat-side; not yet a Cloud Agent execution context)
Copilot Metrics API	Adoption + flow metrics (active users, PR throughput, time-to-merge)	GA
Copilot SDK	Embed the Cloud Agent runtime in your own app	Public preview, April 2, 2026
Custom properties	Org-level governance primitive	GA
Org runner controls + firewall	Cloud Agent runtime governance	GA (April 2026)

This is a strong, coherent substrate. It is also explicitly just the substrate.

What GitHub deliberately leaves to the ecosystem (the Plan Forge lane)

These are the surfaces GitHub does not ship and shows no sign of shipping. The evidence column draws directly on GitHub's own docs and roadmap:

Gap	Evidence
Hardened plan as versioned artifact with scope contract, slices, validation gates, drift detection	Plan-mode is session-scoped one-shot; no plan file format, no scope contract, no slice persistence
Cross-repo / multi-service orchestration	Explicit single-repo limitation: "Copilot can only make changes in the repository specified when you start a task. Copilot cannot make changes across multiple repositories in one run."
Multi-model quorum / consensus per task	No built-in mechanism. Single model per session
Plan execution harness with per-slice gates and resume-from semantics	`copilot-setup-steps.yml` is one pre-flight hook; nothing slice-aware
Semantic eval harness (test pass rate, regression rate, plan-adherence)	Metrics API explicitly does not measure quality, only adoption + flow
Cost prediction per task / per plan before execution	Only post-hoc Actions + premium-request totals
Live programmatic watch of an in-flight agent from external tools	Session UI is in-product only; no public stream
Cross-org / cross-team fleet console with queue, capacity, SLA visibility	Only per-issue / per-project session UI
Pre-merge plan-adherence gates	No first-party concept of "this PR drifted from the approved plan"
Agent skills / instructions sync across N repos	Up to consumer (`.github-private` is the only template mechanism)
Multi-tenant cost budgets and prioritization	Not in product
A/B comparison of custom agents or models for the same task class	Not in product
Cross-team / cross-project semantic memory so lessons compound across pilots	Copilot Spaces is chat-side and repo-scoped; no semantic recall across teams or sessions
Closed-loop RCA → fix-proposal → validate-fix pipeline	`@copilot` on issues + GHAS Autofix are open-loop point features; no native bug registry, no multi-model RCA, no fix validation cycle
Coverity-style scan → triage → spawn-worker → fix loop for AI-generated drift	GHAS scans + Autofix on findings only; nothing that spawns a worker per finding and iterates to convergence
Deploy-aware lifecycle hooks (preDeploy / postSlice / preAgentHandoff) with severity gates	Existing hooks (preToolUse / postToolUse / sessionStart) are session-scoped; nothing fires before deploys with severity blocking
Idea → hardened-plan interview funnel with lane-scoped Q&A	Plan-mode is single-shot session output; no interview funnel, no lane classification, no progressive refinement
Pre-flight plan-quality scorer (scope-contract clarity, slice sizing, gate strength, forbidden-actions)	Nothing in product scores plan quality before execution
Specialized reviewer agent fleet (20+ read-only personas: arch / security / db / perf / a11y / multi-tenancy / CI-CD / compliance / dependency / observability)	Copilot Code Review is singular and chat-prompted; no first-party persona library
Remote-bridge approval flows with resume-on-approve (Slack / Teams / PagerDuty / Telegram / Discord)	GitHub notifications fire one-way; no inline-approve → resume-paused-slice flow
Deploy Journal + auto-generated runbook per plan	No first-party concept of "audit record per deploy" or "runbook from this plan"
… and more. The full capability index lives in the quick reference and the manual book index.

GitHub's positioning is consistent: wrap your tool/data source as an MCP server, layer your customization via the open file standards (AGENTS.md, Skills, instructions), and build your orchestration on top of the SDK. That is exactly the Plan Forge architecture.

How Plan Forge composes with each GitHub primitive

A 16-row reference for architects mapping each GitHub-native primitive to the Plan Forge surface that consumes it.

Per-primitive composition table (16 rows)

GitHub primitive	How Plan Forge consumes it	Where in Plan Forge
Copilot Cloud Agent	Plan Forge dispatches plan slices to CCA via `gh issue create --assignee @copilot`. Trajectories captured to `.forge/trajectories/<plan-slug>.jsonl`	`pforge-mcp/orchestrator.mjs` (`--worker copilot-coding-agent` mode)
AGENTS.md	Plan Forge generates and maintains AGENTS.md alongside `.github/copilot-instructions.md` so any AGENTS.md-aware agent (Claude Code, Cursor, Codex, Amp, Aider, Gemini CLI, Goose, Windsurf) consumes Plan Forge context	`pforge-mcp/server.mjs` setup phase
`.github/instructions/`	Plan Forge ships ~18 instruction files covering architecture, security, testing, database, API, auth, error handling, deployment, performance, observability, version, status reporting, context fuel, self-repair, plan hardening	`templates/.github/instructions/`
`.github/copilot-instructions.md`	Plan Forge generates the project-scoped Copilot instructions during `setup.ps1` / `setup.sh`	`setup.ps1`, `setup.sh`
`.github/agents/`	Plan Forge ships 20 custom agent personas (architecture, database, security, deploy, performance, test-runner, API contracts, accessibility, multi-tenancy, CI/CD, observability, dependency, compliance, plus 6 pipeline agents and an audit classifier)	`templates/.github/agents/`
`.github/hooks/`	Plan Forge ships its own lifecycle hooks: `PreDeploy`, `PreCommit`, `PreAgentHandoff`, `PostSlice`, plus `plan-forge.json` hook configuration. Distinct from Claude Code's hook names.	`templates/.github/hooks/`
`.github/skills/`	Plan Forge ships 12 named skills as `/` slash-commands: database-migration, staging-deploy, test-sweep, dependency-audit, security-audit, code-review, release-notes, api-doc-gen, onboarding, health-check, forge-execute, audit-loop, plus pipeline skills	`templates/.github/skills/`
MCP	Plan Forge ships its own MCP server (`pforge-mcp`) with 105 tools covering planning, execution, eval, observability, cost, memory, search, timeline, notifications. Auto-generates `.vscode/mcp.json`	`pforge-mcp/server.mjs`, `pforge-mcp/tools.json`
`github/github-mcp-server`	Plan Forge documents this as the canonical GitHub-side MCP integration. Plan Forge agents call it via the MCP plumbing they already speak	docs reference, `.vscode/mcp.json` example
GitHub Actions	Plan Forge plans can run as Actions workflows; `pforge run-plan` is callable from any runner. CCA itself runs in Actions and Plan Forge plans dispatched via CCA inherit Actions concurrency, runners, and minutes	`action.yml`
GitHub Advanced Security	Plan Forge's `forge_secret_scan`, `forge_dep_watch`, and security-audit skill complement GHAS, not replace it. Plan Forge surfaces GHAS findings into plan-aware bug reports	`pforge-mcp/notifications/`, `dependency-reviewer.agent.md`
Copilot Spaces	Plan Forge plan files + Scope Contract are the equivalent concept for autonomous execution. Spaces serves chat-side context curation; Plan Forge serves execution-time scope binding	docs reference
Copilot Metrics API	Plan Forge does not duplicate it. Plan Forge surfaces quality metrics (gate failure rates, drift scores, plan-adherence, regressions caught at gate boundary, cost per merged PR) that the Metrics API explicitly does not	`forge_health_trend`, `forge_drift_report`, `forge_cost_report`
Copilot SDK	Plan Forge does not embed the Copilot runtime. Plan Forge orchestrates across multiple agent runtimes (CCA, Claude Code, Codex, custom workers). The SDK is the right tool when you want to embed a single agent in your app; Plan Forge is the right tool when you want to coordinate many agent runs as a delivery pipeline	architecture reference
Custom properties	Plan Forge documents the recommended custom-property schema for governing per-team Plan Forge enablement, plan templates, and budget caps	`templates/docs/CUSTOMIZATION.md`
Org runner controls	Plan Forge dispatched plans inherit the org's runner policy. No conflict, no override needed	docs reference
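The Copilot Cloud Agent row above names `gh issue create --assignee @copilot` as the dispatch call. A minimal sketch of assembling that invocation for one slice follows. It only builds the argument list and does not run it. The function name, the repository name, and the way a slice is mapped to an issue title and body are hypothetical; the actual logic lives in `pforge-mcp/orchestrator.mjs` and may differ.

```python
def cca_dispatch_command(repo: str, slice_title: str, slice_body: str) -> list[str]:
    """Build (but do not run) a gh invocation that assigns one slice to Copilot.

    How a slice maps to an issue title and body is an assumption of this sketch.
    """
    return [
        "gh", "issue", "create",
        "--repo", repo,
        "--title", slice_title,
        "--body", slice_body,
        "--assignee", "@copilot",
    ]


command = cca_dispatch_command(
    repo="example-org/example-service",  # hypothetical repository
    slice_title="Slice 2: add pagination to /orders",
    slice_body="Scope: src/orders only. Gate: the orders test suite passes.",
)
```

Passing the list to `subprocess.run(command, check=True)` would execute it on a machine with an authenticated `gh` CLI. Keeping the arguments as a list rather than a shell string avoids quoting problems when slice text contains spaces or quotes.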

Why this matters for the consolidation thesis

If your strategic direction is "consolidate on GitHub Enterprise + Copilot Enterprise," Plan Forge reinforces that choice rather than competing with it.

Cursor and Sourcegraph Amp are platform-agnostic by design. They work as well on GitLab and Bitbucket as on GitHub. Adopting them does not strengthen your GitHub investment.
GitHub Copilot Cloud Agent shipped the substrate but explicitly leaves orchestration to the ecosystem. Without an orchestration layer, the substrate is incomplete for fleet rollouts.
Plan Forge is the only project in the comparison set built specifically to extend GitHub primitives in the direction GitHub itself signaled is the ecosystem's lane. The architecture is a deliberate "yes, and" to GitHub's stack.

For Microsoft-shop enterprises pursuing the GitHub-native consolidation thesis, this is the cleanest path: GitHub for the substrate, Plan Forge for the orchestration layer, no third vendor in the picture.

Variations for Microsoft Foundry shops

For customers using Microsoft Foundry (Azure OpenAI, Foundry Agent Service, Foundry Toolboxes), Plan Forge composes additionally with:

Azure OpenAI as a first-class LLM provider (alongside GitHub Copilot, Anthropic, OpenAI, xAI). Auth via Entra ID (recommended), API key, or managed identity. Endpoint format https://{resource}.openai.azure.com/openai/v1/. Customer configures deployment names, not model families.
Foundry Toolboxes as MCP-compatible endpoints. Plan Forge already speaks MCP; pointing .vscode/mcp.json at a Foundry Toolbox endpoint is config, not code.
Foundry App Insights as the OTel sink. Plan Forge OTel traces land in the same dashboards as the customer's Foundry agent runs.
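The Azure OpenAI endpoint format quoted above has a single variable part, the resource name. A small sketch of filling it in is shown here. The helper name and the validation pattern are assumptions for illustration; Azure's actual resource-naming rules may be stricter or looser than this check.

```python
import re


def azure_openai_base_url(resource: str) -> str:
    """Fill the {resource} slot of the endpoint format given in the text.

    The character check is an assumed sanity rule, not Azure's official one.
    """
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9-]*", resource):
        raise ValueError(f"unexpected resource name: {resource!r}")
    return f"https://{resource}.openai.azure.com/openai/v1/"


base_url = azure_openai_base_url("contoso-dev")  # hypothetical resource name
```

Requests sent to this base URL would then name a deployment rather than a model family, which matches the note above that customers configure deployment names.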

See Reference Architecture — Microsoft Foundry variant for the full picture.

Explore deeper

If the four pillars and the picture earned a closer look, browse the full manual book index or the quick reference for the chapters that go deep.