Chapter 10 · Reference

Agents, Skills & Hooks

Complete reference for the 14 reviewer agents, 16 slash-command skills (6 shared + 10 per stack), and the lifecycle hook system.

Instruction files catalog: looking for the auto-loading rules, universal files, and domain catalog? See Chapter 10, Instruction Files & Agents.

Agents

14 reviewer agents plus 6 pipeline agents, organized in three categories. Reviewer agents are read-only: they audit code but can't edit files.

Stack-Specific Agents (6)

These vary by preset. Examples for dotnet:

| Agent | Reviews |
| --- | --- |
| architecture-reviewer | Layer separation, dependency direction, SOLID |
| database-reviewer | Query patterns, migrations, connection management |
| deploy-reviewer | Dockerfile, health checks, container optimization |
| performance-reviewer | Hot paths, allocations, async patterns |
| security-reviewer | Input validation, auth, secrets, OWASP |
| test-runner | Test coverage, test patterns, mocking strategy |

Cross-Stack Agents (8)

Shared across all presets, with the same expertise regardless of language:

| Agent | Reviews |
| --- | --- |
| api-contract-reviewer | API versioning, backward compatibility, OpenAPI |
| accessibility-reviewer | WCAG 2.2, semantic HTML, ARIA, keyboard nav |
| multi-tenancy-reviewer | Tenant isolation, data leakage, RLS, cache separation |
| cicd-reviewer | Pipeline safety, secrets, rollback strategies |
| observability-reviewer | Structured logging, distributed tracing, metrics |
| dependency-reviewer | CVEs, outdated packages, license conflicts |
| compliance-reviewer | GDPR, CCPA, SOC2, PII handling, audit logs |
| error-handling-reviewer | Exception hierarchy, error boundaries, ProblemDetails |

Pipeline Agents (6)

Drive the 7-step pipeline with handoff buttons between stages:

| Agent | Pipeline Step | What It Does |
| --- | --- | --- |
| specifier | Step 0 | Interviews you, produces specification |
| preflight | Step 1 | Verifies prerequisites, checks environment readiness |
| plan-hardener | Step 2 | Converts spec into hardened execution contract |
| executor | Step 3 | Executes slices, validates gates |
| reviewer-gate | Step 5 | Independent audit for drift and compliance |
| shipper | Step 6 | Commits, updates roadmap, captures lessons |

Skills

Skills are multi-step procedures the AI runs end-to-end: they read files, write files, run terminal commands, and emit events the dashboard can watch. Unlike agents (which review) and hooks (which gate), skills do work. There are two tiers: shared skills installed across every preset, and stack-specific skills tailored to the chosen language.

The SKILL.md runtime contract

Every skill is a single Markdown file with YAML frontmatter followed by numbered ### N. Step Name sections. The skill-runner parses the file into a step DAG, executes bash blocks per step, and emits lifecycle events to the WebSocket hub. The contract:

| Frontmatter field | Required | Purpose |
| --- | --- | --- |
| name | Yes | Slash-command alias (the file's directory name). name: database-migration → /database-migration. |
| description | Yes | One-paragraph trigger guidance. The classifier matches user prompts against this field. Best practice: include USE FOR and DO NOT USE FOR phrases. |
| argument-hint | Optional | One-line example of the argument shape, surfaced in the slash-command picker. |
| tools | Optional | Allow-list of tools the skill may invoke. Inline (tools: [run_in_terminal, read_file]) or block list. Enforces least-privilege at runtime. |

After the frontmatter, three Markdown sections are recognized by the runner:

  • ## Steps: the main body. Each ### N. Step Name heading defines one step; bash fences inside are executed in order. A step with no bash block is informational and auto-passes.
  • ## Safety Rules: a bullet list of invariants. Surfaced in the dashboard and injected into the skill's context.
  • ## Persistent Memory: optional. A block appended to the skill's L2 capture so OpenBrain remembers cross-run lessons.
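
To make the contract concrete, here is a minimal Python sketch of how a file in this shape could be split into frontmatter and numbered steps. It is not the skill-runner's actual parser: `parse_skill`, the `Step` dataclass, and the naive `key: value` frontmatter handling are assumptions made for illustration, and a real parser would use a YAML library and handle block-list `tools` values.

```python
import re
from dataclasses import dataclass, field

FENCE = "`" * 3  # literal triple backtick, built this way to keep the example fence-safe


@dataclass
class Step:
    number: int
    name: str
    commands: list = field(default_factory=list)  # bash blocks, in document order

    @property
    def informational(self) -> bool:
        # Per the contract, a step with no bash block is informational and auto-passes.
        return not self.commands


def parse_skill(text: str):
    """Split SKILL.md text into (frontmatter dict, list of Step). Hypothetical helper."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        raise ValueError("missing YAML frontmatter")
    frontmatter = {}
    for line in match.group(1).splitlines():
        if ":" not in line:
            continue  # naive: ignores block-list items and blank lines
        key, _, value = line.partition(":")
        frontmatter[key.strip()] = value.strip().strip('"')
    for required in ("name", "description"):
        if required not in frontmatter:
            raise ValueError(f"missing required field: {required}")

    steps, in_bash, block = [], False, []
    for line in text[match.end():].splitlines():
        heading = re.match(r"^### (\d+)\. (.+)$", line)
        if heading and not in_bash:
            steps.append(Step(int(heading.group(1)), heading.group(2).strip()))
        elif line.strip() == FENCE + "bash" and steps and not in_bash:
            in_bash, block = True, []
        elif line.strip() == FENCE and in_bash:
            steps[-1].commands.append("\n".join(block))
            in_bash = False
        elif in_bash:
            block.append(line)
    return frontmatter, steps
```

Calling `parse_skill` on a file's text returns the frontmatter fields and the ordered steps; a step whose `informational` property is true has nothing to execute.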

Two structural patterns are recognized inside step bodies:

  • Conditionals: a blockquote starting with If <condition> → <action>. If the action contains the word skip, the runner aborts the current step's remaining commands. Used for early-exit paths like “If migration fails → rollback & STOP.”
  • Temper Guards: a Markdown table at the end of the file capturing shortcuts authors took that broke the skill. Surfaced in code review and the Plan Forge knowledge graph.
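
The conditional pattern can be sketched as a small recognizer. This is an illustration under stated assumptions, not the runner's implementation: the regular expression, the `parse_conditional` name, and the choice to match "skip" as a whole word regardless of case are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Conditional(NamedTuple):
    condition: str
    action: str
    skips_step: bool  # True when the action contains the word "skip"


# Matches a blockquote line such as: > If <condition> → <action>
_PATTERN = re.compile(r"^>\s*If\s+(.+?)\s*→\s*(.+)$")


def parse_conditional(line: str) -> Optional[Conditional]:
    """Return a Conditional for a matching blockquote line, else None."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    condition, action = match.group(1), match.group(2).strip()
    # Assumption: "skip" is matched as a whole word, case-insensitively.
    skips = re.search(r"\bskip\b", action, re.IGNORECASE) is not None
    return Conditional(condition, action, skips)
```

Under these assumptions, the rollback example quoted above parses as a conditional that does not trigger the skip behaviour, because its action does not contain the word skip.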

Events the runner emits

Every skill execution emits four event types on the hub (cataloged in Appendix V — Skills events):

| Event | When | Payload |
| --- | --- | --- |
| skill-started | Once, at entry | { skillName, stepCount } |
| skill-step-started | Before each step | { skillName, stepNumber, stepName } |
| skill-step-completed | After each step | { skillName, stepNumber, stepName, status, duration } |
| skill-completed | Once, at exit | { skillName, passed, failed, duration } |
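
The ordering of these events is easiest to see in a toy runner loop. The sketch below is an assumption-laden illustration: the `emit` callback, the `"passed"`/`"failed"` status strings, durations in seconds, `passed`/`failed` as step counts, and continuing after a failed step are all choices made here, not documented behaviour.

```python
import time
from typing import Callable, List, Tuple

Emit = Callable[[str, dict], None]


def run_skill(skill_name: str, steps: List[Tuple[str, Callable[[], bool]]], emit: Emit) -> bool:
    """Run steps in order, emitting the four lifecycle events around them."""
    started = time.monotonic()
    emit("skill-started", {"skillName": skill_name, "stepCount": len(steps)})
    passed = failed = 0
    for number, (step_name, action) in enumerate(steps, start=1):
        ident = {"skillName": skill_name, "stepNumber": number, "stepName": step_name}
        emit("skill-step-started", dict(ident))
        step_started = time.monotonic()
        ok = bool(action())
        if ok:
            passed += 1
        else:
            failed += 1
        emit("skill-step-completed", {
            **ident,
            "status": "passed" if ok else "failed",  # assumed status values
            "duration": time.monotonic() - step_started,
        })
    emit("skill-completed", {
        "skillName": skill_name, "passed": passed, "failed": failed,
        "duration": time.monotonic() - started,
    })
    return failed == 0
```

A dashboard-style consumer would pass its own `emit` function; collecting events into a list is enough to check the sequence.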

Three ways to invoke a skill

  • Slash command in chat: /database-migration add user_profiles table. The most common path; works in VS Code Copilot, Claude, Cursor, and Codex once setup.ps1 -Agent <name> has run.
  • MCP tool: forge_run_skill with { name, args }. Returns the same lifecycle events plus a structured result envelope.
  • REST: POST /api/tool/forge_run_skill through the generic dispatcher (see Appendix W). Used by the dashboard and any external integration.
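
For the REST path, a request could be assembled roughly as follows. Only the method and path come from the text above; the base URL, the JSON body mirroring the MCP tool's { name, args } shape, and passing `args` as a single string are assumptions, so check Appendix W before relying on them. The function builds the request without sending it.

```python
import json
import urllib.request


def build_run_skill_request(base_url: str, name: str, args: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST to the generic tool dispatcher."""
    # Assumed body shape: the same { name, args } pair the MCP tool takes.
    body = json.dumps({"name": name, "args": args}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/tool/forge_run_skill",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the host and port depend on where the hub is running.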

The orchestrator can also defer a skill into the decision tray when it wants a human to choose; clients query GET /api/skills/pending and resolve through POST /api/skills/{accept,reject,defer} (full surface in Appendix W — Skills).

Shared skills (every preset)

Six skills ship under presets/shared/skills/ and install regardless of language. These are the cross-cutting workflows.

| Skill | Invocation | What it does | Key tools |
| --- | --- | --- | --- |
| audit-loop | /audit-loop [--max=N --env=dev] | Recursive scan → triage → fix until findings converge to zero. The orchestrator's drain loop, exposed as a one-shot. | forge_tempering_*, forge_bug_register, forge_triage_route |
| forge-execute | /forge-execute | Guided plan execution: list plans → estimate cost → execute → report. The friendly path for new users. | forge_run_plan, forge_estimate_quorum, forge_cost_report |
| forge-quench | /forge-quench <plan> | Final hardening pass before committing a plan: runs validators and the completeness sweep. | forge_validate, forge_sweep |
| forge-troubleshoot | /forge-troubleshoot | Diagnose common Plan Forge issues: missing API keys, stale orchestrator logs, broken hub, hook conflicts. | forge_smith, forge_diagnose |
| health-check | /health-check | Forge diagnostic chain: forge_smith → forge_validate → forge_sweep. Run on a clean checkout before opening a PR. | forge_smith, forge_validate, forge_sweep |
| security-audit (shared variant) | /security-audit | Generic OWASP scan, secrets detection, severity report. Stack presets override with language-specific scanners. | forge_secret_scan, forge_dep_watch |
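
The convergence behaviour described for audit-loop (scan → triage → fix until findings reach zero) can be sketched as a bounded loop. This is a simplified illustration, not the skill's implementation: `scan` and `fix` are hypothetical callbacks, triage is reduced to "fix every finding", and `max_rounds` is assumed to play the role of the --max=N argument.

```python
from typing import Callable, List


def audit_loop(scan: Callable[[], List[str]], fix: Callable[[str], None], max_rounds: int = 5) -> int:
    """Repeat scan -> fix until a scan returns no findings.

    Returns the number of rounds that applied fixes. Raises RuntimeError if
    findings remain after max_rounds, so the loop cannot spin forever.
    """
    for rounds_used in range(max_rounds + 1):
        findings = scan()
        if not findings:
            return rounds_used  # converged: the latest scan is clean
        if rounds_used == max_rounds:
            break  # budget exhausted with findings still open
        for finding in findings:  # triage is reduced to "fix everything" here
            fix(finding)
    raise RuntimeError(f"{len(findings)} finding(s) remain after {max_rounds} rounds")
```

The round cap matters because a fix that introduces a new finding would otherwise keep the loop running indefinitely.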

Stack-specific skills (per language preset)

Ten skills ship per language preset under presets/<stack>/.github/skills/. Skill names are the same across stacks, but each implementation calls the language's idiomatic toolchain: database-migration uses Knex / Prisma for TypeScript, EF Core for .NET, Alembic for Python, GORM for Go, and so on.

| Skill | Invocation | What it does |
| --- | --- | --- |
| api-doc-gen | /api-doc-gen | Generate or update OpenAPI spec, validate spec-to-code consistency. |
| code-review | /code-review | Comprehensive review: architecture, security, testing, patterns. |
| database-migration | /database-migration "<change>" | Generate, review, test locally, deploy to staging, with rollback. Five-step DAG with conditional early-exit on migration failure. |
| dependency-audit | /dependency-audit | Scan for vulnerabilities, outdated packages, license issues. Wraps npm audit / dotnet list package --vulnerable / pip-audit per stack. |
| forge-quench (stack variant) | /forge-quench | Same shape as shared variant, but invokes the stack's linter and test runner. |
| onboarding | /onboarding | Walk a new developer through project setup, architecture, and first task. |
| release-notes | /release-notes "<tag>" | Generate release notes from git history and CHANGELOG. Output formatted for GitHub Release, Slack, or email. |
| security-audit (stack variant) | /security-audit | Language-specific OWASP scan plus shared scanners. Wraps semgrep / bandit / brakeman / govulncheck per stack. |
| staging-deploy | /staging-deploy | Build, push, migrate, deploy, and verify on staging with health-check probe. |
| test-sweep | /test-sweep [category] | Run all test suites (unit, integration, API, E2E) and aggregate results into a summary report. Run before the Review Gate. |

Authoring a new skill

The minimum viable skill is one frontmatter block + one numbered step. Drop it under .github/skills/<name>/SKILL.md and it's available as /<name> in the next chat session. Example:

---
name: deploy-canary
description: "Deploy current branch to canary environment and watch metrics for 10 minutes. USE FOR: gradual rollout. DO NOT USE FOR: hotfixes (use /staging-deploy)."
argument-hint: "[optional: minutes to watch, default 10]"
tools: [run_in_terminal, read_file]
---

# Deploy Canary Skill

## Steps

### 1. Build & Push
```bash
# Tag with the registry prefix so the push below can find the image
docker build -t myregistry/myapp:canary .
docker push myregistry/myapp:canary
```

### 2. Apply
```bash
kubectl set image deployment/myapp myapp=myregistry/myapp:canary -n canary
kubectl rollout status deployment/myapp -n canary --timeout=2m
```

### Conditional: Rollout Failure
> If rollout fails → immediately `kubectl rollout undo`, report the error, and STOP. Do not proceed to watch.

### 3. Watch
```bash
# MINUTES is in minutes (default 10); sleep takes seconds
sleep "$(( ${MINUTES:-10} * 60 ))"
kubectl logs -l app=myapp -n canary --tail=200
```

## Safety Rules
- NEVER deploy from a dirty working tree
- ALWAYS roll back within 60s if 5xx rate exceeds 1%

Authoring guidance:

  • Step granularity: one step = one logical unit the dashboard can show pass/fail for. Keep step count under 8; longer skills should split into sub-skills called from a coordinator.
  • Conditional placement: use blockquote conditionals between steps for early-exit paths. They render distinctly in the dashboard timeline.
  • Tool allow-list: list only the tools the skill actually needs. The runtime warns on unused entries and refuses unlisted tool calls.
  • Idempotency: assume the skill may be re-run after a partial failure. Use IF NOT EXISTS, check-then-act patterns, and explicit cleanup steps.
  • Persistent memory: when the skill learns something cross-cutting (a new failure mode, an env-specific quirk), append it under ## Persistent Memory. The capture is routed to L2 and, if configured, L3 OpenBrain.

Reference reading: every skill in presets/<stack>/.github/skills/ is a worked example. The richest are database-migration (5-step DAG with conditional rollback) and audit-loop (recursive convergence loop).
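
The allow-list behaviour in the authoring guidance above (refuse unlisted tool calls, warn on unused entries) can be pictured with a small gate object. `ToolGate` and its methods are hypothetical names for illustration; how the real runtime reports a refusal or a warning is not specified here.

```python
from typing import Iterable, List, Set


class ToolGate:
    """Tracks which allow-listed tools a skill run actually used."""

    def __init__(self, allowed: Iterable[str]):
        self.allowed: Set[str] = set(allowed)
        self.used: Set[str] = set()

    def request(self, tool: str) -> bool:
        """Return True and record the call if listed, else refuse with False."""
        if tool not in self.allowed:
            return False
        self.used.add(tool)
        return True

    def unused(self) -> List[str]:
        """Allow-list entries never called: candidates for an 'unused' warning."""
        return sorted(self.allowed - self.used)
```

With `tools: [run_in_terminal, read_file]`, a skill that only ever runs terminal commands would leave `read_file` in `unused()`, which is the signal to trim the list.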

Lifecycle Hooks

Hooks run automatically during agent sessions, with no manual activation required:

| Hook | When | What It Enforces |
| --- | --- | --- |
| SessionStart | Session begins | Injects Project Principles, current phase, forbidden patterns |
| PreToolUse | Before file edit | Blocks edits to paths listed in plan's Forbidden Actions |
| PostToolUse | After file edit | Auto-formats, warns on TODO/FIXME/stub markers |
| Stop | Session ends | Warns if code modified but no test run detected |

PostToolUse warning: If you see "⚠ Deferred-work marker detected" after an edit, the AI left a TODO or stub. Address it before moving on; if you don't, the completeness sweep (Step 4) will catch it anyway.
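
The two file-edit hooks amount to a path check before the edit and a marker scan after it. The sketch below shows one way such checks could work; glob matching via `fnmatch`, the exact marker set, and the line-number suffix on the warning are assumptions, and the function names are hypothetical.

```python
import re
from fnmatch import fnmatch
from typing import Iterable, List

# Assumed marker set; the hook table names TODO, FIXME, and stub markers.
_DEFERRED = re.compile(r"\b(TODO|FIXME|STUB)\b")


def pre_tool_use_allows(path: str, forbidden: Iterable[str]) -> bool:
    """PreToolUse: False when the path matches any forbidden glob pattern."""
    return not any(fnmatch(path, pattern) for pattern in forbidden)


def post_tool_use_warnings(file_text: str) -> List[str]:
    """PostToolUse: one warning per line that carries a deferred-work marker."""
    return [
        f"⚠ Deferred-work marker detected (line {number}): {line.strip()}"
        for number, line in enumerate(file_text.splitlines(), start=1)
        if _DEFERRED.search(line)
    ]
```

An edit to `src/legacy/db.py` would be blocked by a forbidden pattern of `src/legacy/*`, while a comment such as `# TODO: handle errors` in the edited file would produce a single warning.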

📄 Full reference: capabilities, Multi-Agent Setup — GitHub Copilot