
GitHub Stack Alignment
The thesis: GitHub ships the agent runtime + integration standards + customization primitives + engagement metrics. Everything above the runtime is the ecosystem's lane. Plan Forge is built for that lane.
Who this page is for: Engineering leaders, platform engineers, and architects evaluating a complete AI-SDLC stack, whether you've already standardized on GitHub Copilot or you're shopping the category fresh.
Companion to: What is Plan Forge? · How it works · Appendix I — Plan Forge on the GitHub Stack (the surface-by-surface technical reference).
Plan Forge + GitHub Copilot ships four capabilities no other AI-SDLC platform on the market combines today:
- Three-tier memory so context quality compounds across teams instead of being a per-repo lottery
- Multi-model quorum eval: Claude, GPT, and Gemini score the same slice independently, then an LLM-as-judge produces a 0–100 consensus score
- Audit Loop: a scan-triage-fix loop for AI-generated drift, off by default and hard-blocked in production at the schema level
- Watcher: a second IDE session that tails any in-flight run, read-only by schema (it cannot write to the target)
What you get — the outcomes
Six numbers every AI-SDLC programme is shopping for. Plan Forge surfaces all six on the live dashboard out of the box: no warehouse project, no BI build, no glue code.
Human-intervention frequency, the leading-indicator metric leadership usually asks for last, is also captured automatically. Every instance of a human taking over from an agent is recorded, and trend lines show whether the harness is getting better or worse. See Health DNA for the full metric catalogue, or the quick reference for the complete dashboard surface.
The picture — harness (orchestration) on substrate (primitives)
Read top-down: outcomes you get, the harness (the orchestration layer Plan Forge provides), the substrate (GitHub Copilot's primitives) it sits on, and the GitHub platform foundation everything inherits.
End to end — harness on substrate
The first complete AI software-development lifecycle stack: GitHub Copilot below, Plan Forge above, your outcomes on top.
The four pillars — what the harness actually does
Plan Forge is organised into four pillars. Each is described in plain English, followed by a "What's inside" list with the component-level detail and the manual chapters that go deep.
Plans become slices, slices become work, work becomes audited PRs.
An idea is interviewed into a hardened plan. The plan is split into safe-sized slices. Each slice runs in its own worktree, gets reviewed by 20 specialised reviewer agents, and only ships if its validation gate passes. The platform learns from every run and builds new skills automatically.
What's inside & where to read more
Crucible interview funnel · Tempering quality scorer · Inner Loop competitive worktrees · Forge-Master chat-first router · 20 read-only reviewer agents · 14 slash-command skills · Reflexion retry · auto-skill library · lifecycle hooks (pre/post slice).
→ Crucible · Inner Loop · Forge-Master · Instructions & Agents · Agent Factory recipe · Multi-agent
… and more. Full surface area in the quick reference.
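The slice-and-gate flow described above can be sketched in a few lines. This is a minimal illustration, not Plan Forge's implementation: the `Slice` shape, `run_plan`, and the `resume_from` parameter are hypothetical names, and worktrees and reviewer agents are left out so that only the "ship only if the gate passes" rule remains.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Slice:
    """One safe-sized unit of a plan (hypothetical shape, for illustration)."""
    name: str
    work: Callable[[], None]  # performs the slice's changes
    gate: Callable[[], bool]  # validation gate: True means the slice may ship


def run_plan(slices: List[Slice], resume_from: int = 0) -> Optional[int]:
    """Run slices in order, stopping at the first failed gate.

    Returns None if every gate passed, otherwise the index of the failing
    slice, which can be passed back as resume_from after a fix.
    """
    for index in range(resume_from, len(slices)):
        current = slices[index]
        current.work()
        if not current.gate():
            return index
    return None


# Demo: the "api" gate fails until the (simulated) fix lands.
log = []
state = {"fixed": False}
plan = [
    Slice("schema", lambda: log.append("schema"), lambda: True),
    Slice("api", lambda: log.append("api"), lambda: state["fixed"]),
    Slice("ui", lambda: log.append("ui"), lambda: True),
]
failed_at = run_plan(plan)
state["fixed"] = True
resumed = run_plan(plan, resume_from=failed_at)
```

Returning the failing index keeps the runner stateless: the caller decides when to retry, and earlier slices that already passed are not re-run.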
Context quality compounds across teams instead of being a per-repo lottery.
Three tiers: a live event stream you can watch right now, a deterministic file trail every team can audit and grep, and an optional semantic store that lets one team's lessons surface automatically when another team hits a similar problem. Lessons learned in service A become defaults in service B without anyone filing a knowledge-base article.
What's inside & where to read more
L1 Hub, live WebSocket events · L2 Files, .forge/ append-only audit trail · L3 OpenBrain, pgvector semantic store · cross-team federation (read-only) · bridge-and-flush durability · search_thoughts · brain_recall.
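To make the second tier concrete, here is a minimal sketch of an append-only, greppable file trail. It assumes a JSON Lines layout; the file name, the event fields, and both helper functions are hypothetical, and a temporary directory stands in for `.forge/`.

```python
import json
import tempfile
from pathlib import Path


def append_event(trail: Path, event: dict) -> None:
    """Append one event as a single JSON line; existing lines are never rewritten."""
    with trail.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def grep_trail(trail: Path, needle: str) -> list:
    """Return parsed events whose raw line contains the substring, as grep would."""
    with trail.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if needle in line]


# Hypothetical file name and event fields; a temp directory stands in for .forge/.
trail_path = Path(tempfile.mkdtemp()) / "audit.jsonl"
append_event(trail_path, {"slice": "api", "event": "gate_passed"})
append_event(trail_path, {"slice": "ui", "event": "gate_failed"})
failed = grep_trail(trail_path, "gate_failed")
```

One event per line is what makes the trail deterministic to audit: ordinary text tools can filter it, and opening the file in append mode means earlier records are never modified by a later write.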
Quality, not just adoption: the half the Copilot Metrics API doesn't cover.
Three frontier models score the same change independently and a reviewer model produces a 0–100 consensus number. Drift from your architecture is measured per commit. RCA outputs become PR proposals, not tickets. Cost is previewed before the run, not after the bill.
What's inside & where to read more
Quorum (Claude + GPT + Gemini) · 0–100 LLM-as-judge consensus · forge_drift_report per-commit · forge_health_trend with trajectories · forge_estimate_quorum (cancellable cost preview) · forge_fix_proposal (RCA → PR) · % code by AI · MTTR · drift score.
→ Health DNA · Self-deterministic loop · Dashboard
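As a rough illustration of turning three independent scores into one 0–100 number, the sketch below uses the median as a stand-in for the reviewer model that the text describes. The `consensus` function, the `max_spread` threshold, and the sample scores are all assumptions made for the example.

```python
from statistics import median


def consensus(scores: dict, max_spread: int = 20) -> dict:
    """Combine independent 0-100 scores into one number plus an agreement flag.

    The median is a simple stand-in for a reviewer model; max_spread is a
    hypothetical threshold for flagging disagreement between the judges.
    """
    for model, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{model} score {score} is outside 0-100")
    values = list(scores.values())
    spread = max(values) - min(values)
    return {
        "consensus": round(median(values)),
        "spread": spread,
        "agreed": spread <= max_spread,
    }


# Made-up scores for illustration only.
result = consensus({"claude": 82, "gpt": 78, "gemini": 55})
```

Reporting the spread alongside the consensus matters because a single number hides disagreement: here one model scores far lower than the other two, which is a signal worth surfacing to a human.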
Audit-grade by default. Approve from your phone. The platform reports its own bugs upstream.
Hooks fire before every deploy and after every slice. Bugs deduplicate themselves. A separate read-only watcher tails any in-flight run. When the harness itself misbehaves, it files a structured bug report against its own upstream, so you're never holding the bag alone on a platform issue.
What's inside & where to read more
LiveGuard hooks (preDeploy / postSlice / preAgentHandoff) · Bug Registry with fingerprint dedupe · Incident Capture + MTTR · Audit Loop (scan → triage → spawn-worker fix) · forge_runbook + Deploy Journal · Remote Bridge (Slack / Teams / PagerDuty / Discord / Telegram) · Watcher (read-only by schema) · forge_meta_bug_file self-repair.
→ What is LiveGuard · LiveGuard dashboard · Audit loop · Bug registry · Watcher · Remote bridge
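Fingerprint dedupe, as named in the Bug Registry entry above, can be sketched as hashing a normalised form of the error. The normalisation rule (lower-casing and masking digits), the `fingerprint` function, and the in-memory `BugRegistry` class are illustrative assumptions, not the registry Plan Forge ships.

```python
import hashlib
import re


def fingerprint(error_type: str, message: str) -> str:
    """Hash a normalised error so repeats of the same bug collide."""
    # Mask volatile details (ids, durations) so they do not change the hash.
    normalised = re.sub(r"\d+", "<n>", message.lower()).strip()
    digest = hashlib.sha256(f"{error_type}:{normalised}".encode("utf-8"))
    return digest.hexdigest()[:16]


class BugRegistry:
    """In-memory registry keyed by fingerprint (illustrative only)."""

    def __init__(self) -> None:
        self.bugs = {}

    def report(self, error_type: str, message: str) -> bool:
        """Record a bug; return True if it is new, False if it is a duplicate."""
        key = fingerprint(error_type, message)
        if key in self.bugs:
            self.bugs[key]["count"] += 1
            return False
        self.bugs[key] = {"type": error_type, "message": message, "count": 1}
        return True


registry = BugRegistry()
first = registry.report("TimeoutError", "Request 4812 timed out after 30s")
second = registry.report("TimeoutError", "Request 977 timed out after 45s")
```

The two reports differ only in request id and duration, so they collapse into one entry with a count of two.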
What we deliberately don't try to do
Discipline matters. A platform that tries to own everything ends up owning nothing well. Plan Forge does not:
- Replicate the Copilot Metrics API: we add quality metrics; we don't re-implement adoption metrics
- Embed or fork the Copilot Cloud Agent runtime: we dispatch to it
- Compete with github/github-mcp-server: we use it; we ship our own MCP server only for orchestration concerns
- Reinvent AGENTS.md, Skills, or MCP: we adopt the open standards and contribute back when we learn something
If GitHub ships a feature that subsumes a Plan Forge capability, the right answer is to delete the Plan Forge code and use GitHub's. We're explicit about that in the project README.
Try it — on your own, on your own time
Plan Forge is MIT-licensed and open source. There's no sales call, no pilot agreement, no license to procure. If you already have GitHub Copilot and GHAS, you have everything you need to evaluate the full stack against your own repos this afternoon.
- Install in one repo. Clone github.com/srnichols/plan-forge, run setup.ps1 -Agent claude (or --agent codex / --agent cursor / --agent copilot). Generate Project Principles + initial instruction files via forge_run_skill /onboarding. Wire action.yml into GitHub Actions for PR-time gates. Walk-through: install + first plan.
- Run a real task end-to-end. Take one in-flight ticket through the full pipeline: Crucible → plan → execution → reviewer agents → Bug Registry if you hit one. The trajectory is captured automatically; you can replay it from the dashboard.
- Add a second repo, turn on what makes sense for you. Cloud Agent dispatch (--worker copilot-coding-agent) for async bulk work. LiveGuard hooks if you have a deploy pipeline. The Audit Loop if you want a Coverity-style scan over an existing module. Everything is opt-in.
- Read the dashboard. The six KPIs from "What you get" populate themselves as you run plans. Compare to your baseline. Decide whether to roll wider on your own schedule.
Cost to evaluate: zero beyond your existing Copilot + GHAS subscription. No new licences, no headcount, no infrastructure, no procurement cycle. Bring your own GHCP partner relationship if you have one; Plan Forge composes on top of whatever Copilot Enterprise tier and support arrangement you already use.
Stuck? File an issue at github.com/srnichols/plan-forge/issues, or open a discussion. Plan Forge ships forge_meta_bug_file precisely so that problems with the platform get reported back automatically; you're not on your own.
Architect appendix · supporting context for technical readers
The signal: GitHub said this out loud in April 2026
On April 2, 2026, GitHub shipped the Copilot SDK in public preview. The release notes describe it as "the same production-tested agent runtime that powers GitHub Copilot cloud agent and Copilot CLI" exposed for application developers to embed.
The implication is unmistakable:
GitHub views agent orchestration as something built on top of their primitives, not inside them.
This page documents how Plan Forge composes with the primitives GitHub explicitly leaves to the ecosystem.
What GitHub ships (the substrate — primitives)
| Primitive | What it is | Status (May 2026) |
|---|---|---|
| Copilot Cloud Agent (formerly Coding Agent) | Ephemeral Actions-powered runner. Single repo / single branch / single PR per task. Three modes: research-only, plan-only, branch-only | GA |
| AGENTS.md | Open standard for agent context files | Stewarded by Agentic AI Foundation under the Linux Foundation. 60k+ repos use it. GitHub adopts; does not own |
| Agent Skills | Open standard for agent procedural knowledge | Repo agentskills/agentskills, Apache 2.0, maintained by Anthropic. GitHub adopts |
| Model Context Protocol (MCP) | Open standard for agent-to-tool integration | Linux Foundation project. Maintained by Anthropic et al. GitHub ships github/github-mcp-server (29.5k stars, MIT) as the reference implementation |
| .github/instructions/ | GitHub-native repo customization | GA. Plan Forge ships ~18 instruction files |
| .github/copilot-instructions.md | Repo-wide Copilot context | GA |
| .github/agents/ | Custom agent personas | GA on github.com (preview in JetBrains/Eclipse/Xcode) |
| .github/hooks/ | Lifecycle hooks (preToolUse, postToolUse, sessionStart, etc.) | GA |
| .github/skills/ | Repo-scoped skill definitions | GA |
| GitHub Actions | CI/CD runtime that powers Cloud Agent | GA |
| GitHub Advanced Security (GHAS) | Code scanning, secret scanning, Dependabot | GA |
| Copilot Spaces | Curated context bundles for chat | GA (chat-side; not yet a Cloud Agent execution context) |
| Copilot Metrics API | Adoption + flow metrics (active users, PR throughput, time-to-merge) | GA |
| Copilot SDK | Embed the Cloud Agent runtime in your own app | Public preview, April 2, 2026 |
| Custom properties | Org-level governance primitive | GA |
| Org runner controls + firewall | Cloud Agent runtime governance | GA (April 2026) |
This is a strong, coherent substrate. It is also explicitly just the substrate.
What GitHub deliberately leaves to the ecosystem (the Plan Forge lane)
These are the surfaces GitHub does not ship and shows no sign of shipping. The evidence column draws directly on GitHub's own docs and roadmap:
| Gap | Evidence |
|---|---|
| Hardened plan as versioned artifact with scope contract, slices, validation gates, drift detection | Plan-mode is session-scoped one-shot; no plan file format, no scope contract, no slice persistence |
| Cross-repo / multi-service orchestration | Explicit single-repo limitation: "Copilot can only make changes in the repository specified when you start a task. Copilot cannot make changes across multiple repositories in one run." |
| Multi-model quorum / consensus per task | No built-in mechanism. Single model per session |
| Plan execution harness with per-slice gates and resume-from semantics | copilot-setup-steps.yml is one pre-flight hook; nothing slice-aware |
| Semantic eval harness (test pass rate, regression rate, plan-adherence) | Metrics API explicitly does not measure quality, only adoption + flow |
| Cost prediction per task / per plan before execution | Only post-hoc Actions + premium-request totals |
| Live programmatic watch of an in-flight agent from external tools | Session UI is in-product only; no public stream |
| Cross-org / cross-team fleet console with queue, capacity, SLA visibility | Only per-issue / per-project session UI |
| Pre-merge plan-adherence gates | No first-party concept of "this PR drifted from the approved plan" |
| Agent skills / instructions sync across N repos | Up to consumer (.github-private is the only template mechanism) |
| Multi-tenant cost budgets and prioritization | Not in product |
| A/B comparison of custom agents or models for the same task class | Not in product |
| Cross-team / cross-project semantic memory so lessons compound across pilots | Copilot Spaces is chat-side and repo-scoped; no semantic recall across teams or sessions |
| Closed-loop RCA → fix-proposal → validate-fix pipeline | @copilot on issues + GHAS Autofix are open-loop point features; no native bug registry, no multi-model RCA, no fix validation cycle |
| Coverity-style scan → triage → spawn-worker → fix loop for AI-generated drift | GHAS scans + Autofix on findings only; nothing that spawns a worker per finding and iterates to convergence |
| Deploy-aware lifecycle hooks (preDeploy / postSlice / preAgentHandoff) with severity gates | Existing hooks (preToolUse / postToolUse / sessionStart) are session-scoped; nothing fires before deploys with severity blocking |
| Idea → hardened-plan interview funnel with lane-scoped Q&A | Plan-mode is single-shot session output; no interview funnel, no lane classification, no progressive refinement |
| Pre-flight plan-quality scorer (scope-contract clarity, slice sizing, gate strength, forbidden-actions) | Nothing in product scores plan quality before execution |
| Specialized reviewer agent fleet (20+ read-only personas: arch / security / db / perf / a11y / multi-tenancy / CI-CD / compliance / dependency / observability) | Copilot Code Review is singular and chat-prompted; no first-party persona library |
| Remote-bridge approval flows with resume-on-approve (Slack / Teams / PagerDuty / Telegram / Discord) | GitHub notifications fire one-way; no inline-approve → resume-paused-slice flow |
| Deploy Journal + auto-generated runbook per plan | No first-party concept of "audit record per deploy" or "runbook from this plan" |
| … and more. The full capability index lives in the quick reference and the manual book index. | |
GitHub's positioning is consistent: wrap your tool/data source as an MCP server, layer your customization via the open file standards (AGENTS.md, Skills, instructions), and build your orchestration on top of the SDK. That is exactly the Plan Forge architecture.
How Plan Forge composes with each GitHub primitive
A 16-row reference for architects mapping each GitHub-native primitive to the Plan Forge surface that consumes it.
Per-primitive composition table (16 rows)
| GitHub primitive | How Plan Forge consumes it | Where in Plan Forge |
|---|---|---|
| Copilot Cloud Agent | Plan Forge dispatches plan slices to CCA via gh issue create --assignee @copilot. Trajectories captured to .forge/trajectories/<plan-slug>.jsonl | pforge-mcp/orchestrator.mjs (--worker copilot-coding-agent mode) |
| AGENTS.md | Plan Forge generates and maintains AGENTS.md alongside .github/copilot-instructions.md so any AGENTS.md-aware agent (Claude Code, Cursor, Codex, Amp, Aider, Gemini CLI, Goose, Windsurf) consumes Plan Forge context | pforge-mcp/server.mjs setup phase |
| .github/instructions/ | Plan Forge ships ~18 instruction files covering architecture, security, testing, database, API, auth, error handling, deployment, performance, observability, version, status reporting, context fuel, self-repair, plan hardening | templates/.github/instructions/ |
| .github/copilot-instructions.md | Plan Forge generates the project-scoped Copilot instructions during setup.ps1 / setup.sh | setup.ps1, setup.sh |
| .github/agents/ | Plan Forge ships 20 custom agent personas (architecture, database, security, deploy, performance, test-runner, API contracts, accessibility, multi-tenancy, CI/CD, observability, dependency, compliance, plus 6 pipeline agents and an audit classifier) | templates/.github/agents/ |
| .github/hooks/ | Plan Forge ships its own lifecycle hooks: PreDeploy, PreCommit, PreAgentHandoff, PostSlice, plus plan-forge.json hook configuration. Distinct from Claude Code's hook names. | templates/.github/hooks/ |
| .github/skills/ | Plan Forge ships 11 skills as / slash-commands: database-migration, staging-deploy, test-sweep, dependency-audit, security-audit, code-review, release-notes, api-doc-gen, onboarding, health-check, forge-execute, audit-loop, plus pipeline skills | templates/.github/skills/ |
| MCP | Plan Forge ships its own MCP server (pforge-mcp) with 102 tools covering planning, execution, eval, observability, cost, memory, search, timeline, notifications. Auto-generates .vscode/mcp.json | pforge-mcp/server.mjs, pforge-mcp/tools.json |
| github/github-mcp-server | Plan Forge documents this as the canonical GitHub-side MCP integration. Plan Forge agents call it via the MCP plumbing they already speak | docs reference, .vscode/mcp.json example |
| GitHub Actions | Plan Forge plans can run as Actions workflows; pforge run-plan is callable from any runner. CCA itself runs in Actions and Plan Forge plans dispatched via CCA inherit Actions concurrency, runners, and minutes | action.yml |
| GitHub Advanced Security | Plan Forge's forge_secret_scan, forge_dep_watch, and security-audit skill complement GHAS, not replace it. Plan Forge surfaces GHAS findings into plan-aware bug reports | pforge-mcp/notifications/, dependency-reviewer.agent.md |
| Copilot Spaces | Plan Forge plan files + Scope Contract are the equivalent concept for autonomous execution. Spaces serves chat-side context curation; Plan Forge serves execution-time scope binding | docs reference |
| Copilot Metrics API | Plan Forge does not duplicate it. Plan Forge surfaces quality metrics (gate failure rates, drift scores, plan-adherence, regressions caught at gate boundary, cost per merged PR) that the Metrics API explicitly does not | forge_health_trend, forge_drift_report, forge_cost_report |
| Copilot SDK | Plan Forge does not embed the Copilot runtime. Plan Forge orchestrates across multiple agent runtimes (CCA, Claude Code, Codex, custom workers). The SDK is the right tool when you want to embed a single agent in your app; Plan Forge is the right tool when you want to coordinate many agent runs as a delivery pipeline | architecture reference |
| Custom properties | Plan Forge documents the recommended custom-property schema for governing per-team Plan Forge enablement, plan templates, and budget caps | templates/docs/CUSTOMIZATION.md |
| Org runner controls | Plan Forge dispatched plans inherit the org's runner policy. No conflict, no override needed | docs reference |
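The Cloud Agent row above describes dispatch as `gh issue create --assignee @copilot`. The sketch below builds that command line without running it. The helper names, the repository, and the issue text are hypothetical; the flags used (`--repo`, `--title`, `--body`, `--assignee`) are standard `gh issue create` options. How Plan Forge actually assembles the call inside orchestrator.mjs is not shown in this page.

```python
import shlex


def dispatch_command(repo: str, slice_title: str, slice_body: str) -> list:
    """Build (but do not run) the gh invocation that assigns an issue to Copilot."""
    return [
        "gh", "issue", "create",
        "--repo", repo,
        "--title", slice_title,
        "--body", slice_body,
        "--assignee", "@copilot",
    ]


def trajectory_path(plan_slug: str) -> str:
    """Fill in the trajectory path pattern quoted in the table."""
    return f".forge/trajectories/{plan_slug}.jsonl"


# Hypothetical repository and slice text.
argv = dispatch_command("octo-org/example", "Slice 2: add API", "Scope: src/api only")
printable = shlex.join(argv)  # shell-quoted form, handy for logs
```

Passing `argv` as a list to `subprocess.run(argv, check=True)` would execute it without shell-quoting pitfalls; that step is omitted here because it needs an authenticated `gh` and a real repository.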
Why this matters for the consolidation thesis
If your strategic direction is "consolidate on GitHub Enterprise + Copilot Enterprise," Plan Forge reinforces that choice rather than competing with it.
- Cursor and Sourcegraph Amp are platform-agnostic by design. They work as well on GitLab and Bitbucket as on GitHub. Adopting them does not strengthen your GitHub investment.
- GitHub Copilot Cloud Agent shipped the substrate but explicitly leaves orchestration to the ecosystem. Without an orchestration layer, the substrate is incomplete for fleet rollouts.
- Plan Forge is the only project in the comparison set built specifically to extend GitHub primitives in the direction GitHub itself signaled is the ecosystem's lane. The architecture is a deliberate "yes, and" to GitHub's stack.
For Microsoft-shop enterprises pursuing the GitHub-native consolidation thesis, this is the cleanest path: GitHub for the substrate, Plan Forge for the orchestration layer, no third vendor in the picture.
Variations for Microsoft Foundry shops
For customers using Microsoft Foundry (Azure OpenAI, Foundry Agent Service, Foundry Toolboxes), Plan Forge composes additionally with:
- Azure OpenAI as a first-class LLM provider (alongside GitHub Copilot, Anthropic, OpenAI, xAI). Auth via Entra ID (recommended), API key, or managed identity. Endpoint format https://{resource}.openai.azure.com/openai/v1/. Customer configures deployment names, not model families.
- Foundry Toolboxes as MCP-compatible endpoints. Plan Forge already speaks MCP; pointing .vscode/mcp.json at a Foundry Toolbox endpoint is config, not code.
- Foundry App Insights as the OTel sink. Plan Forge OTel traces land in the same dashboards as the customer's Foundry agent runs.
See Reference Architecture — Microsoft Foundry variant for the full picture.
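As a small illustration of the "config, not code" point, the sketch below fills in the Azure OpenAI endpoint format quoted above and renders a `.vscode/mcp.json` body with one HTTP server entry. The resource name, server name, and Toolbox URL are placeholders, and the `servers` / `type` / `url` layout follows the VS Code MCP configuration format as an assumption; check both against your own environment.

```python
import json


def azure_openai_endpoint(resource: str) -> str:
    """Fill the endpoint format quoted above with a resource name."""
    return f"https://{resource}.openai.azure.com/openai/v1/"


def mcp_config(server_name: str, url: str) -> str:
    """Render a .vscode/mcp.json body with a single HTTP MCP server entry."""
    config = {"servers": {server_name: {"type": "http", "url": url}}}
    return json.dumps(config, indent=2)


# "contoso" is a placeholder resource name.
endpoint = azure_openai_endpoint("contoso")
# Hypothetical Toolbox URL; substitute the endpoint your Foundry project exposes.
rendered = mcp_config("foundry-toolbox", "https://example.invalid/mcp")
```

Writing `rendered` to `.vscode/mcp.json` is the only change needed on the editor side under these assumptions. Authentication (Entra ID, API key, or managed identity) is configured separately and is not covered here.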
Explore deeper
If the four pillars and the picture earned a closer look, jump straight to the chapters that go deep. Grouped for shoppers, builders, and operators.
… and more. Browse the full manual book index or the quick reference for everything.