Chapter 13

Advanced Execution

Model routing, quorum mode, cost optimization, CI integration, and resume strategies.

Prerequisite refresher: This chapter assumes you know what slices, gates, and scope contracts are (Chapter 2) and have run at least one plan (Chapter 4). If those terms are unfamiliar, start there.

Model Routing

Assign a different model to each role in .forge.json. The principle is the same as on a human team: let the junior do the legwork and the senior do the final check. It costs less and catches more.

.forge.json
{
  "modelRouting": {
    "default": "grok-4",
    "execute": "claude-sonnet-4.6",
    "review": "claude-opus-4.6"
  }
}

Use a fast/cheap model for execution and a more capable model for review. The orchestrator routes each slice to the appropriate model based on its role.
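As a rough illustration of the lookup this implies, the sketch below maps a role to a model and falls back to the "default" entry when a role has no mapping of its own. The function name resolve_model, the fallback behaviour, and the "harden" role are assumptions made for the example; they are not taken from Plan Forge's API.

```python
# Hypothetical sketch: resolve_model() is illustrative, not a Plan Forge function.
ROUTING = {
    "default": "grok-4",
    "execute": "claude-sonnet-4.6",
    "review": "claude-opus-4.6",
}


def resolve_model(role: str, routing: dict = ROUTING) -> str:
    """Return the model mapped to a role.

    Assumption: roles without an explicit entry use the "default" model.
    """
    return routing.get(role, routing["default"])


print(resolve_model("execute"))  # claude-sonnet-4.6
print(resolve_model("review"))   # claude-opus-4.6
print(resolve_model("harden"))   # grok-4 ("harden" is a made-up role with no entry)
```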

Escalation Chains

When a model fails a slice, the orchestrator automatically escalates to the next model in the chain:

.forge.json
{
  "escalationChain": ["grok-4", "claude-opus-4.6", "gpt-5.2-codex"]
}

Model A fails → Model B retries the same slice → Model C retries if B fails too. The orchestrator emits a slice-escalated WebSocket event at each escalation, and no manual intervention is required.

Figure (escalation chain): grok-4 fails and escalates to claude-opus-4.6, which fails and escalates to gpt-5.2-codex, which passes
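The retry loop described above can be sketched as follows. This is a minimal model of the behaviour, not the orchestrator's code: run_with_escalation, the attempt callback, and the on_escalate hook (standing in for the slice-escalated event) are all hypothetical names.

```python
from typing import Callable, List, Optional


def run_with_escalation(
    slice_id: int,
    chain: List[str],
    attempt: Callable[[str, int], bool],
    on_escalate: Optional[Callable[[int, str, str], None]] = None,
) -> Optional[str]:
    """Try each model in chain order; return the first model that passes.

    Returns None when every model in the chain fails the slice.
    """
    for index, model in enumerate(chain):
        if attempt(model, slice_id):
            return model
        has_next = index + 1 < len(chain)
        if has_next and on_escalate is not None:
            # Stand-in for the slice-escalated event mentioned in the text.
            on_escalate(slice_id, model, chain[index + 1])
    return None


chain = ["grok-4", "claude-opus-4.6", "gpt-5.2-codex"]
events = []

# Simulated gate: only the last model passes, as in the figure.
winner = run_with_escalation(
    slice_id=3,
    chain=chain,
    attempt=lambda model, _slice: model == "gpt-5.2-codex",
    on_escalate=lambda s, failed, nxt: events.append((s, failed, nxt)),
)

print(winner)       # gpt-5.2-codex
print(len(events))  # 2
```

Ordering the chain from cheapest to most capable means the expensive models are only paid for when a cheaper one has already failed.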

Quorum Mode

Multi-model consensus for complex slices. Multiple models analyze the same problem independently, then a reviewer synthesizes the best approach.

Figure (quorum flow): dispatch to 3 models, independent analysis, reviewer synthesizes, then execute

Terminal
# Force quorum on all slices
pforge run-plan docs/plans/Phase-7.md --quorum

# Auto-quorum: only trigger for slices whose complexity score meets the threshold (default 6)
pforge run-plan docs/plans/Phase-7.md --quorum=auto

# Custom threshold (1-10, higher = fewer slices use quorum)
pforge run-plan docs/plans/Phase-7.md --quorum=auto --quorum-threshold 8

# Flagship preset (Opus + GPT-5.3-Codex + Grok 4.20, threshold 5)
pforge run-plan docs/plans/Phase-7.md --quorum=power

# Fast preset (Sonnet + GPT-5.4-mini + Grok 4.1 Fast, threshold 7)
pforge run-plan docs/plans/Phase-7.md --quorum=speed

| Setting | Effect | Cost Impact |
| --- | --- | --- |
| --quorum | Every slice gets multi-model consensus | 3× normal cost |
| --quorum=auto | Only slices above complexity threshold | 1.2–1.5× normal cost |
| --quorum=power | Flagship models (Opus + GPT-5.3-Codex + Grok 4.20), threshold 5, 5 min timeout | 3× at threshold 5 |
| --quorum=speed | Fast models (Sonnet + GPT-5.4-mini + Grok 4.1 Fast), threshold 7, 2 min timeout | 1.5× at threshold 7 |
| No flag | Single model per slice | 1× baseline cost |
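To see how the threshold drives cost, here is a back-of-the-envelope sketch. It assumes that a slice uses quorum when its complexity score is at or above the threshold, that a quorum slice costs 3× a single-model slice, and that all slices otherwise cost the same. The function names and the sample scores are invented for illustration.

```python
from typing import List


def uses_quorum(complexity: int, threshold: int = 6) -> bool:
    """Assumed rule: quorum triggers when the score is at or above the threshold."""
    return complexity >= threshold


def blended_cost_multiplier(
    scores: List[int], threshold: int = 6, quorum_factor: float = 3.0
) -> float:
    """Estimate plan cost relative to a run with no quorum.

    Each quorum slice is assumed to cost quorum_factor times a normal slice,
    so the multiplier is 1 + (quorum_factor - 1) * share_of_quorum_slices.
    """
    if not scores:
        return 1.0
    share = sum(uses_quorum(s, threshold) for s in scores) / len(scores)
    return 1.0 + (quorum_factor - 1.0) * share


# Illustrative complexity scores for a ten-slice plan (not real data).
scores = [2, 3, 4, 5, 7, 3, 2, 8, 4, 1]

print(round(blended_cost_multiplier(scores, threshold=6), 2))  # 1.4
print(round(blended_cost_multiplier(scores, threshold=8), 2))  # 1.2
```

Under these assumptions, the 1.2–1.5× range for --quorum=auto corresponds to roughly 10–25% of slices crossing the threshold.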

Cost Optimization

The orchestrator tracks per-model performance in .forge/model-performance.json: success rate, average cost, and duration. It auto-selects the cheapest model whose historical pass rate exceeds 80%.
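The selection rule can be sketched as a filter followed by a minimum. The field names successRate and avgCost are guesses at the shape of the performance file, and the numbers are made up for the example; check the real file before relying on either.

```python
from typing import Dict, Optional

# Illustrative stats only; field names and values are assumptions.
SAMPLE_STATS = {
    "grok-4": {"successRate": 0.78, "avgCost": 0.04},
    "claude-sonnet-4.6": {"successRate": 0.91, "avgCost": 0.09},
    "claude-opus-4.6": {"successRate": 0.96, "avgCost": 0.31},
}


def pick_model(
    stats: Dict[str, Dict[str, float]], min_pass_rate: float = 0.80
) -> Optional[str]:
    """Return the cheapest model whose pass rate is strictly above min_pass_rate."""
    eligible = {
        name: entry
        for name, entry in stats.items()
        if entry["successRate"] > min_pass_rate
    }
    if not eligible:
        return None  # No model clears the bar; the caller must decide what to do.
    return min(eligible, key=lambda name: eligible[name]["avgCost"])


print(pick_model(SAMPLE_STATS))  # claude-sonnet-4.6
```

In this sample the cheapest model is excluded because its pass rate sits below the bar, so the next cheapest one wins.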

  • Preview costs: pforge run-plan --estimate docs/plans/Phase-7.md
  • Review spend: pforge cost or the Dashboard's Cost tab
  • Agent-per-slice routing: Override the model for a slice with the --model flag
  • Reduce context: Use targeted Context: lists per slice (see Chapter 5)

API Key Configuration

API keys for external providers (xAI Grok, OpenAI) are resolved in order: environment variable → .forge/secrets.json → null.

For local development, store keys in the gitignored .forge/secrets.json:

.forge/secrets.json
{
  "XAI_API_KEY": "xai-...",
  "OPENAI_API_KEY": "sk-..."
}

The .forge/ directory is in .gitignore by default, so secrets stored there stay out of commits.
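The resolution order (environment variable, then .forge/secrets.json, then null) can be sketched as below. resolve_api_key is a hypothetical helper, and treating an empty environment variable as unset is an assumption of this sketch rather than documented behaviour.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional, Union


def resolve_api_key(
    name: str,
    secrets_path: Union[str, Path] = ".forge/secrets.json",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Resolve a key from the environment first, then the secrets file, else None."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:  # Assumption: an empty string counts as "not set".
        return value
    try:
        secrets = json.loads(Path(secrets_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # Missing or unreadable file: fall through to null.
    if not isinstance(secrets, dict):
        return None
    return secrets.get(name) or None


# Demo against a throwaway secrets file so no real credentials are read.
with tempfile.TemporaryDirectory() as tmp:
    secrets_file = Path(tmp) / "secrets.json"
    secrets_file.write_text(json.dumps({"XAI_API_KEY": "from-file"}), encoding="utf-8")

    from_env = resolve_api_key("XAI_API_KEY", secrets_file, env={"XAI_API_KEY": "from-env"})
    from_file = resolve_api_key("XAI_API_KEY", secrets_file, env={})
    missing = resolve_api_key("OPENAI_API_KEY", secrets_file, env={})

print(from_env, from_file, missing)  # from-env from-file None
```

Because the environment wins, a CI job can inject keys as environment variables without a secrets file being present.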

CI Integration

Add Plan Forge validation to your GitHub Actions PR workflow:

.github/workflows/plan-forge-validate.yml
- uses: srnichols/plan-forge-validate@v1
  with:
    analyze: true          # Run consistency scoring
    sweep: true            # Check for TODO/FIXME markers
    threshold: 60          # Minimum analyze score to pass

PRs that fail the threshold are blocked from merging. The action validates file counts, checks for unresolved placeholders, and runs pforge analyze.

Cloud Agent Execution

GitHub's Copilot cloud agent works on issues autonomously. Plan Forge integrates via .github/copilot-setup-steps.yml, which provisions the agent with Node.js, guardrails, MCP tools, and smith verification before it starts coding.

Parallel Execution

The orchestrator builds a DAG from [P] tags and [depends: Slice N] declarations. Independent slices run concurrently when workers are available. Merge checkpoints validate that all parallel branches resolved cleanly.

Conflict detection: If two parallel slices modify overlapping [scope:] paths, the orchestrator flags the conflict before execution starts.
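One way to picture the scheduling and the conflict check is the sketch below, which uses Python's standard graphlib module. It simplifies in two labeled ways: it only models the dependency declarations (not the [P] tag itself), and it treats two slices as parallel when they land in the same wave. The function names and the sample plan are hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import combinations
from pathlib import PurePosixPath
from typing import Dict, List, Set, Tuple


def execution_waves(depends: Dict[int, Set[int]]) -> List[List[int]]:
    """Group slices into waves; a slice runs once all of its dependencies are done."""
    sorter = TopologicalSorter(depends)
    sorter.prepare()  # Raises graphlib.CycleError on circular dependencies.
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def paths_overlap(a: str, b: str) -> bool:
    """True when the paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def scope_conflicts(wave: List[int], scopes: Dict[int, List[str]]) -> List[Tuple[int, int]]:
    """Return pairs of slices in one wave whose scope paths overlap."""
    return [
        (first, second)
        for first, second in combinations(wave, 2)
        if any(paths_overlap(x, y) for x in scopes[first] for y in scopes[second])
    ]


# Hypothetical plan: slices 2 and 3 depend on 1; slice 4 depends on both.
depends = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}}
scopes = {1: ["src/db"], 2: ["src/api"], 3: ["src/api/users.py"], 4: ["docs"]}

waves = execution_waves(depends)
print(waves)                               # [[1], [2, 3], [4]]
print(scope_conflicts(waves[1], scopes))   # [(2, 3)]
```

Here slices 2 and 3 could run together by dependency order alone, but their scopes overlap, which is the situation the conflict check is meant to flag before execution starts.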

Resume and Retry

Terminal
# Resume from slice 3 after fixing a failure
pforge run-plan docs/plans/Phase-7.md --resume-from 3

# Dry run — parse and validate without executing
pforge run-plan docs/plans/Phase-7.md --dry-run

When a gate fails, fix the issue manually, then resume. Completed slices are skipped — only remaining slices execute.

OpenBrain Memory

The OpenBrain integration bridges the 3-session model with long-term context. Prior decisions, patterns, and postmortems are automatically searched and injected at the start of each session. After every run, lessons are captured for future phases.

Install via extension: pforge ext add plan-forge-memory

📄 Full reference: capabilities.md, CLI-GUIDE.md → run-plan