Appendix K

Enterprise Reference Architecture

One canonical architecture for a 5-team / 1000-developer fleet, plus the Microsoft Foundry composition variant for Azure-tenant deployments.

Audience: Platform architects and security engineers planning a multi-team Plan Forge deployment.

Scope: Generic enterprise architecture (Pattern A) and the Microsoft Foundry composition variant (Pattern B), plus three network/isolation patterns, including the air-gapped option that is a structural differentiator.

Design principles

Three constraints shape every architecture below:

  1. Local-first control plane. The Plan Forge orchestrator runs on the developer's box or a CI runner. There is no Plan Forge SaaS service. Source code does not leave the customer's network unless the customer chooses to call a hosted LLM.
  2. GitHub-native by design. Plan Forge consumes GitHub Issues, Copilot Cloud Agent, Actions, AGENTS.md, MCP, and the github-mcp-server as its substrate. This reinforces a GitHub Enterprise + Copilot Enterprise consolidation rather than competing with it.
  3. Open standards throughout. AGENTS.md (Linux Foundation), MCP (Linux Foundation), Agent Skills (Apache 2.0, Anthropic-maintained), OpenTelemetry gen_ai.* semantic conventions. No proprietary file formats.

Reference architecture A — Generic enterprise (5 teams, 1000 developers)

[Figure: Generic enterprise reference architecture, 5 teams × ~200 developers. Flow: developer workstations → GitHub Enterprise → CI/fleet execution → observability → LLM provider. The Plan Forge orchestrator runs in the customer's network; only LLM inference may cross the boundary, depending on provider choice.]

Component responsibilities

| Component | Owns | Does not own |
| --- | --- | --- |
| Developer workstation | Local plan execution, IDE-time orchestration, the dashboard, all .forge/ artifacts | Multi-team aggregation, long-running compute |
| GitHub Enterprise | Source of truth for repos, issues, PRs. Hosts Copilot Cloud Agent runs. Runs Actions workflows | Plan-level orchestration. Quality / eval / drift detection |
| Actions runners | Long-running plan execution, scheduled pforge run-plan jobs, fleet-scale dispatch | Interactive developer-loop workflows |
| OTel collector + backend | All trace, metric, and log aggregation across teams | Real-time agent control |
| LLM provider | Inference for worker LLM calls | Plan state, scope enforcement, gate validation |

Data flow

  1. Developer (or CI) starts a plan run.
  2. Plan Forge orchestrator reads the plan file, builds the slice DAG, dispatches each slice to the configured worker (Copilot Cloud Agent for GitHub-native runs, Claude Code / Codex CLI for direct runs, etc.).
  3. Worker consumes AGENTS.md + plan slice context + MCP tools. Calls the configured LLM provider for completions.
  4. Plan Forge runs the slice's validation gate. On pass, advances. On fail, retries with reflexion or escalates per plan policy.
  5. Cost, trace, and event data is appended to .forge/runs/<id>/ locally and emitted to the OTel collector for fleet aggregation.
  6. PR is opened (Cloud Agent path) or commit is staged (direct path). Plan-aware diff (pforge diff) checks scope-contract adherence before merge.
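Steps 2 to 4 of the data flow can be sketched as a small loop: order the slices by dependency, hand each one to a worker, then check its gate and retry or escalate. This is a minimal illustration, not Plan Forge's implementation. The names `run_plan`, `run_slice`, `gate`, and `max_attempts` are hypothetical, and the real orchestrator's retry policy (reflexion, escalation rules) is richer than a plain re-run.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_plan(
    slices: dict[str, set[str]],
    run_slice: Callable[[str], None],
    gate: Callable[[str], bool],
    max_attempts: int = 2,
) -> list[str]:
    """Run slices in dependency order and return the IDs that passed.

    `slices` maps a slice ID to the set of slice IDs it depends on.
    All names here are hypothetical, chosen for illustration only.
    """
    completed: list[str] = []
    # static_order() yields each slice only after its dependencies.
    for slice_id in TopologicalSorter(slices).static_order():
        for _attempt in range(max_attempts):
            run_slice(slice_id)      # the worker does the work (step 3)
            if gate(slice_id):       # the validation gate decides (step 4)
                completed.append(slice_id)
                break
        else:
            # Every attempt failed the gate: stop and escalate.
            raise RuntimeError(f"slice {slice_id!r} failed its gate")
    return completed
```

A failed gate blocks every downstream slice, which is why the sketch raises instead of skipping ahead.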

Reference architecture B — Microsoft Foundry variant

For customers running on Microsoft Foundry (Azure OpenAI, Foundry Agent Service, Foundry Toolboxes), Plan Forge composes as the SDLC orchestrator layer above Foundry's model gateway and agent runtime.

[Figure: Microsoft Foundry composition variant. Plan Forge sits above Foundry as the SDLC orchestrator; Foundry sits below as model gateway, and Foundry Agent Service hosts production agents. Both share Foundry Toolbox (MCP) and App Insights (OTel sink). Entra ID and a private VNet support the boundary.]

What sits where

  • Plan Forge above Foundry: Plan Forge is the SDLC orchestrator (specify, plan, harden, execute, validate, ship). Foundry is the model gateway and production agent runtime. Plan Forge is not inside Foundry, not beside Foundry as a peer agent product, but above Foundry as the higher-altitude orchestration layer.
  • Foundry as model provider: Plan Forge talks to AOAI via the OpenAI-compatible endpoint https://{resource}.openai.azure.com/openai/v1/. Auth via Entra ID (recommended), API key, or managed identity. Customer configures deployment names, not model families.
  • Foundry Toolbox as shared MCP surface: Customer's curated, governed, audited tool surface, exposed once via Foundry Toolbox, consumed by Plan Forge in worker sessions and by Foundry agents in production. Single source of truth for org tools.
  • App Insights as OTel sink: Plan Forge emits OTel traces (per the gen_ai.* spec). Pointed at the Foundry-attached Application Insights resource, Plan Forge runs show up in the same dashboards as Foundry agent runs.
  • Plan Forge generates code that deploys to Foundry: A Plan Forge plan can ship a feature that is a Foundry agent. deploy.instructions.md and the skill system include /staging-deploy and similar skills that target Foundry deployment paths.
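For the App Insights point above, shared dashboards depend on both sides using the same gen_ai.* attribute names. The sketch below builds the attribute set for one worker LLM call as a plain dictionary, so it runs without an OpenTelemetry SDK installed. The gen_ai.* keys follow the OpenTelemetry semantic conventions; the function name and the `planforge.slice.id` key are hypothetical and are not taken from Plan Forge.

```python
def gen_ai_span_attributes(
    deployment: str, input_tokens: int, output_tokens: int, slice_id: str
) -> dict[str, object]:
    """Attributes for one LLM call span, keyed by gen_ai.* conventions."""
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": deployment,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Hypothetical custom attribute, not part of the gen_ai.* spec.
        "planforge.slice.id": slice_id,
    }


attrs = gen_ai_span_attributes("eastus-prod-mini", 1200, 300, "slice-03")
```

With an OpenTelemetry SDK in place, a dictionary like this would typically be passed as the span's attributes when the span is started.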

What does not compose

  • Plan Forge workers do not run as Foundry hosted agents. Different lifetimes, different IO models. Plan Forge workers need filesystem/git/terminal; Foundry hosted agents are containerized with VM-isolated sandboxes per session.
  • Plan Forge does not register itself as a Foundry "fleet view" entity. Integration is one-way (Plan Forge writes to App Insights); the single pane of glass for Plan Forge runs is the Plan Forge dashboard.

Auth flow (Entra recommended)

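from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import OpenAI

# Build a callable that returns a fresh Entra ID bearer token on demand.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://ai.azure.com/.default"
)
# Point the standard OpenAI client at the Azure OpenAI v1 endpoint.
# The token provider is passed in place of a static API key.
client = OpenAI(
    base_url="https://YOUR-RESOURCE.openai.azure.com/openai/v1/",
    api_key=token_provider,
)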

Required role assignment on the Foundry resource: Cognitive Services OpenAI User or Cognitive Services OpenAI Contributor.

Friction to design around

  1. Deployment-name vs model-name: Customer says "I'm using gpt-5.4-mini"; Plan Forge needs the deployment name (e.g., eastus-prod-mini).
  2. AOAI quota differs from OpenAI: Fixed tokens-per-minute (TPM) quotas per region per model, plus provisioned throughput units (PTU) for provisioned deployments. A slice estimating 150K tokens against a 100K TPM deployment will throttle mid-run. Plan ahead.
  3. Government cloud: Azure Gov has a reduced model catalog (gpt-5.1, gpt-4.1 family, o3-mini, gpt-4o). Use the power-gov quorum preset (or graceful fallback) when targeting Azure Government.
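The first two friction points can be handled with a small lookup and a preflight calculation. This is a sketch under stated assumptions: the `Deployment` record, `resolve_deployment`, and `minutes_needed` are hypothetical names and do not reflect Plan Forge's configuration schema. The figures reuse the examples from the list above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str       # Azure deployment name the API call must use
    model: str      # model name the customer usually quotes
    tpm_limit: int  # tokens-per-minute quota on this deployment


def resolve_deployment(model: str, deployments: list[Deployment]) -> Deployment:
    """Return the first configured deployment that serves `model`."""
    for deployment in deployments:
        if deployment.model == model:
            return deployment
    raise KeyError(f"no deployment configured for model {model!r}")


def minutes_needed(estimated_tokens: int, deployment: Deployment) -> int:
    """Whole minutes required to fit the estimate under the TPM quota."""
    return math.ceil(estimated_tokens / deployment.tpm_limit)


deployments = [Deployment("eastus-prod-mini", "gpt-5.4-mini", tpm_limit=100_000)]
target = resolve_deployment("gpt-5.4-mini", deployments)
print(target.name, minutes_needed(150_000, target))  # eastus-prod-mini 2
```

A result above 1 means the slice cannot finish inside a single quota window, so it is worth splitting the slice or pacing its calls before the run starts.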

Network and isolation patterns

Pattern 1: Fully cloud-LLM (typical SaaS company)

  • LLM calls go to public Anthropic / OpenAI / GitHub Copilot endpoints
  • Plan Forge runs locally, traces go to cloud-hosted observability
  • Lowest cost, fastest setup, weakest isolation
  • Right for: most non-regulated companies, internal tooling, dev productivity

Pattern 2: Hybrid (Microsoft-shop typical)

  • LLM calls go to Azure OpenAI in customer's tenant via private endpoint
  • Plan Forge runs locally and in customer's Azure DevOps / GitHub Actions
  • Traces to App Insights in same Azure subscription
  • Right for: regulated SaaS, fintech, healthtech with Microsoft preference

Pattern 3: Air-gapped (defense, sovereign cloud, regulated)

  • LLM calls go to on-prem inference (Foundry Local powered by Azure Local, Ollama, vLLM, or similar)
  • Plan Forge runs entirely in-network; no calls leave the boundary
  • OTel collector + backend in-network
  • GitHub Enterprise Server (GHES) instead of cloud
  • Right for: defense, FedRAMP High, IL5/IL6, sovereign cloud customers

Plan Forge is structurally compatible with all three. Pattern 3 is the differentiator: Cursor cannot offer this (control plane in AWS), Sourcegraph Amp explicitly cannot (no self-host, no BYOK), and GitHub Copilot Cloud Agent runs on GitHub-hosted infrastructure. For air-gapped requirements, Plan Forge is structurally the only viable option in the comparison set.
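One way to verify a Pattern 3 deployment is to check that every configured endpoint resolves to a host inside the boundary. The sketch below is illustrative only: the function name, the endpoint map, and the `corp.example` hostnames are hypothetical, and a real air-gap is enforced at the network layer rather than by a configuration check.

```python
from urllib.parse import urlparse


def endpoints_leaving_boundary(
    endpoints: dict[str, str], internal_suffixes: tuple[str, ...]
) -> dict[str, str]:
    """Return the endpoints whose host is outside the internal domains."""
    leaving: dict[str, str] = {}
    for name, url in endpoints.items():
        host = urlparse(url).hostname or ""
        inside = any(
            host == suffix or host.endswith("." + suffix)
            for suffix in internal_suffixes
        )
        if not inside:
            leaving[name] = url
    return leaving


# Hypothetical Pattern 3 configuration: inference, traces, and git in-network.
air_gapped = {
    "llm": "http://vllm.corp.example:8000/v1",
    "otel": "http://otel.corp.example:4318",
    "git": "https://ghes.corp.example",
}
violations = endpoints_leaving_boundary(air_gapped, ("corp.example",))
```

An empty result is the expected outcome for Pattern 3. Under Pattern 1, the same check would flag the public LLM endpoint.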

Capacity planning

Per-team sizing (typical)

For a team of ~50 developers running ~3 plans/day per developer:

| Resource | Estimate |
| --- | --- |
| Plan Forge orchestrator processes | One per active developer, low CPU/memory (Node.js process, dashboard at :3100) |
| GitHub Actions minutes (CCA-dispatched plans) | ~15K min/month (varies wildly by plan complexity) |
| LLM tokens (mixed-mode quorum) | ~50M input + 10M output per team-month at moderate use |
| Storage (.forge/runs/ retention) | ~5GB / team / quarter at typical detail |
| OTel trace volume | ~100K spans / team / day |
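The per-team figures are sized for ~50 developers, while the reference fleet is 1000 developers, so a fleet estimate needs a scaling step. The sketch below assumes usage scales linearly with headcount, which is a simplification and not a claim from the sizing data. The names `PER_TEAM` and `scale_to_fleet` are hypothetical.

```python
# Per-team estimates from the sizing table (team of ~50 developers).
PER_TEAM = {
    "actions_minutes_per_month": 15_000,
    "llm_input_tokens_per_month": 50_000_000,
    "llm_output_tokens_per_month": 10_000_000,
    "storage_gb_per_quarter": 5,
    "otel_spans_per_day": 100_000,
}
TEAM_SIZE = 50


def scale_to_fleet(developers: int) -> dict[str, float]:
    """Scale each per-team estimate linearly by developer count."""
    factor = developers / TEAM_SIZE
    return {resource: value * factor for resource, value in PER_TEAM.items()}


fleet = scale_to_fleet(1000)  # 1000 / 50 = a factor of 20
```

Under that assumption, 1000 developers come to roughly 300K Actions minutes and 1B input plus 200M output tokens per month, 100GB of run storage per quarter, and 2M spans per day. Treat these as order-of-magnitude figures, since the Actions estimate alone varies widely with plan complexity.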

Org-level governance

  • Custom properties on repos to scope which Plan Forge plans are allowed
  • Org runner policies to control which Cloud Agent runners are available
  • Branch protection rules to require Plan Forge gate-passed status before merge
  • Cost budgets in .forge.json per repo or per team

Failure modes and mitigations

| Failure | Detection | Mitigation |
| --- | --- | --- |
| LLM provider outage | OTel error rate spike on gen_ai.* spans | Plan Forge supports multi-provider routing in .forge.json. Failover order configurable per slice |
| AOAI quota exhausted mid-slice | Worker error, gate failure | Preflight quota check (planned), slice retry with backoff, cross-region failover via deployment alias |
| GitHub Actions runner exhaustion | Workflow queue depth, Cloud Agent session pending | Self-hosted runner pool, prioritize critical plans via [P] tag and runner labels |
| Plan drift (PR diverges from approved plan) | pforge diff post-execution | Pre-merge gate fails; reviewer-gate agent flags; review thread opened via forge_review_add |
| Cost runaway (slice loops or model misroutes) | forge_cost_report anomaly, dashboard cost-tile alert | Per-slice workerTimeoutMs cap, forge_alert_triage priority queue, in-loop stuck detector (planned) |
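The outage and quota mitigations share one shape: retry the current provider with backoff, then move to the next one in the failover order. The sketch below illustrates that shape only. `ProviderError`, `call_with_failover`, and the parameter names are hypothetical, and the actual routing configuration lives in .forge.json, whose schema is not shown here.

```python
import time
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails or throttles."""


def call_with_failover(
    providers: Sequence[str],
    call: Callable[[str], str],
    retries_per_provider: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try providers in order and return (provider, result) on success."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider, call(provider)
            except ProviderError as exc:
                last_error = exc
                # Back off exponentially between retries on one provider,
                # but fail over immediately after the final attempt.
                if attempt < retries_per_provider - 1:
                    sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("all providers failed") from last_error
```

Passing `sleep` as a parameter keeps the function testable without real delays. Capping the retries per provider also bounds how much a stuck slice can spend before it fails over or stops.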

Reference deployment timeline

For an enterprise rolling out across 5 teams in 90 days:

| Week | Milestone |
| --- | --- |
| 0 | Stakeholder alignment, pick LLM provider strategy, identify pilot team |
| 1–2 | Pilot team installs Plan Forge, runs first plan against a known-easy feature, baseline cost + cycle time |
| 3–4 | Pilot team runs 5+ plans, refines instruction files, captures lessons |
| 5–6 | Add team 2 + team 3 in parallel; first multi-team observability dashboards |
| 7–8 | Add teams 4 + 5; introduce shared MCP server (Foundry Toolbox or in-house equivalent) |
| 9–10 | Org-wide rollout patterns formalized; cost guardrails; quality KPIs reported up |
| 11–12 | First quarterly review; eval data informs next-quarter planning |

See Appendix M — Fleet Operator Playbook for week-by-week specifics.