OrchestKit v6.7.1 — 67 skills, 38 agents, 77 hooks with Opus 4.6 support

Verify

Comprehensive verification with parallel test agents. Use when verifying implementations or validating changes.

Command high

Verify Feature

Comprehensive verification using parallel specialized agents with nuanced grading (0-10 scale) and improvement suggestions.

Quick Start

/ork:verify authentication flow
/ork:verify user profile feature
/ork:verify --scope=backend database migrations

Opus 4.6: Agents use native adaptive thinking (no MCP sequential-thinking needed). Extended 128K output supports comprehensive verification reports.


STEP 0: Verify User Intent with AskUserQuestion

BEFORE creating tasks, clarify verification scope:

AskUserQuestion(
  questions=[{
    "question": "What scope for this verification?",
    "header": "Scope",
    "options": [
      {"label": "Full verification (Recommended)", "description": "All tests + security + code quality + grades"},
      {"label": "Tests only", "description": "Run unit + integration + e2e tests"},
      {"label": "Security audit", "description": "Focus on security vulnerabilities"},
      {"label": "Code quality", "description": "Lint, types, complexity analysis"},
      {"label": "Quick check", "description": "Just run tests, skip detailed analysis"}
    ],
    "multiSelect": false
  }]
)

Based on answer, adjust workflow:

  • Full verification: All 8 phases, all 6 parallel agents
  • Tests only: Run the test phases; skip the security-auditor and frontend-ui-developer agents
  • Security audit: Focus on the security-auditor agent
  • Code quality: Focus on the code-quality-reviewer agent
  • Quick check: Run tests only; skip grading and suggestions
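The scope adjustment above can be sketched as a simple lookup from the selected option to the agent roster (a hypothetical sketch; the keys and roster below are illustrative, not part of the skill's API):

```python
# Illustrative mapping from the STEP 0 answer to the agents to dispatch.
FULL_ROSTER = [
    "code-quality-reviewer", "security-auditor", "test-generator",
    "backend-system-architect", "frontend-ui-developer",
    "python-performance-engineer",
]

SCOPE_AGENTS = {
    "Full verification": FULL_ROSTER,
    "Tests only": ["test-generator"],
    "Security audit": ["security-auditor"],
    "Code quality": ["code-quality-reviewer"],
    "Quick check": ["test-generator"],
}
```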

STEP 0b: Select Orchestration Mode

See Orchestration Mode for env var check logic, Agent Teams vs Task Tool comparison, and mode selection rules.

Choose Agent Teams (mesh -- verifiers share findings) or Task tool (star -- all report to lead) based on the orchestration mode reference.


Task Management (CC 2.1.16)

# Create main verification task
TaskCreate(
  subject="Verify [feature-name] implementation",
  description="Comprehensive verification with nuanced grading",
  activeForm="Verifying [feature-name] implementation"
)

# Create subtasks for the 8-phase process
phases = ["Run code quality checks", "Execute security audit",
          "Verify test coverage", "Validate API", "Check UI/UX",
          "Calculate grades", "Generate suggestions", "Compile report"]
active_forms = ["Running code quality checks", "Executing security audit",
                "Verifying test coverage", "Validating API", "Checking UI/UX",
                "Calculating grades", "Generating suggestions", "Compiling report"]
for phase, active in zip(phases, active_forms):
    TaskCreate(subject=phase, activeForm=active)

8-Phase Workflow

See Verification Phases for complete phase details, agent spawn definitions, Agent Teams alternative, and team teardown.

| Phase | Activities | Output |
|-------|------------|--------|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts | Final report |

Phase 2 Agents (Quick Reference)

| Agent | Focus | Output |
|-------|-------|--------|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |

Launch ALL agents in ONE message with run_in_background=True and max_turns=25.


Grading & Scoring

See Scoring Rubric for composite formula, grade thresholds, verdict criteria, and blocking rules. See Quality Model for dimension weights. See Grading Rubric for per-agent scoring criteria.


Evidence & Test Execution

See Evidence Collection for git commands, test execution patterns, metrics tracking, and post-verification feedback.


Policy-as-Code

See Policy-as-Code for configuration.

Define verification rules in .claude/policies/verification-policy.json:

{
  "thresholds": {
    "composite_minimum": 6.0,
    "security_minimum": 7.0,
    "coverage_minimum": 70
  },
  "blocking_rules": [
    {"dimension": "security", "below": 5.0, "action": "block"}
  ]
}
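A minimal sketch of how these thresholds and blocking rules could be evaluated against dimensional scores (assuming the JSON shape shown above; the function and variable names are illustrative, not part of the skill):

```python
import json

def evaluate_policy(scores, policy):
    """Return a list of blocking violations for a set of scores (sketch)."""
    violations = []
    minimum = policy.get("thresholds", {}).get("composite_minimum")
    if minimum is not None and scores.get("composite", 0.0) < minimum:
        violations.append(f"composite {scores['composite']} < {minimum}")
    for rule in policy.get("blocking_rules", []):
        dim = rule["dimension"]
        if rule.get("action") == "block" and scores.get(dim, 0.0) < rule["below"]:
            violations.append(f"{dim} {scores[dim]} < {rule['below']}")
    return violations

# Same shape as .claude/policies/verification-policy.json above
policy = json.loads("""
{
  "thresholds": {"composite_minimum": 6.0, "security_minimum": 7.0,
                 "coverage_minimum": 70},
  "blocking_rules": [{"dimension": "security", "below": 5.0, "action": "block"}]
}
""")
```

For example, a run scoring composite 7.2 but security 4.5 would be blocked on the security rule alone.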

Report Format

See Report Template for full format. Summary:

# Feature Verification Report

**Composite Score: [N.N]/10** (Grade: [LETTER])

## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**

References

Rules


  • ork:implement - Full implementation with verification
  • ork:review-pr - PR-specific verification
  • run-tests - Detailed test execution
  • ork:quality-gates - Quality gate patterns

Version: 3.1.0 (February 2026)


Rules (2)

Evidence Collection Patterns — HIGH

Evidence Collection Patterns

Phase 1: Context Gathering

Run these commands in parallel in ONE message:

git diff main --stat
git log main..HEAD --oneline
git diff main --name-only | sort -u

Phase 3: Parallel Test Execution

Run backend and frontend tests in parallel:

# PARALLEL - Backend and frontend
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage

Phase 7: Metrics Tracking

Store verification metrics in memory for trend analysis:

mcp__memory__create_entities(entities=[{
  "name": "verification-{date}-{feature}",
  "entityType": "VerificationMetrics",
  "observations": [f"composite_score: {score}", ...]
}])

Query trends: mcp__memory__search_nodes(query="VerificationMetrics")

Phase 8.5: Post-Verification Feedback

After report compilation, send scores to metrics-architect for KPI baseline tracking:

Task(subagent_type="metrics-architect", run_in_background=True, max_turns=15,
     prompt=f"""Receive verification scores for {feature}:

Composite: {composite_score}/10 (Grade: {grade})
Dimensional breakdown:
- Correctness: {scores['correctness']}/10
- Maintainability: {scores['maintainability']}/10
- Performance: {scores['performance']}/10
- Security: {scores['security']}/10
- Scalability: {scores['scalability']}/10
- Testability: {scores['testability']}/10
- Compliance: {scores['compliance']}/10

Update KPI baselines with these scores. Store trend data in memory
for historical comparison. Flag any dimensions that dropped below
their historical average.""")

Scoring Rubric — HIGH

Scoring Rubric

Composite Score

Each agent produces a 0-10 score with decimals for nuance. The composite score is a weighted sum using the weights from Quality Model.

Grade Thresholds

| Grade | Score Range | Verdict |
|-------|-------------|---------|
| A | 9.0-10.0 | READY FOR MERGE |
| B | 7.0-8.9 | READY FOR MERGE |
| C | 5.0-6.9 | IMPROVEMENTS RECOMMENDED |
| D | 3.0-4.9 | BLOCKED |
| F | 0.0-2.9 | BLOCKED |

Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Scoring scale | 0-10 with decimals | Nuanced, not binary |
| Improvement priority | Impact / Effort ratio | Do high-value first |
| Alternative comparison | Optional phase | Only when multiple valid approaches |
| Metrics persistence | Memory MCP | Track trends over time |

Improvement Suggestions

Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.

Blocking Rules

Verification can be blocked by policy-as-code rules. See Policy-as-Code for configuration of composite minimums, dimension minimums, and blocking rules.


References (8)

Alternative Comparison

Alternative Comparison

Evaluate current implementation against alternative approaches.

When to Compare

  • Multiple valid architectures exist
  • User asks "is this the best way?"
  • Major patterns were chosen (ORM vs raw SQL, REST vs GraphQL)
  • Performance/scalability concerns raised

Comparison Criteria

For Each Alternative

| Criterion | Weight | Description |
|-----------|--------|-------------|
| Effort | 30% | Implementation complexity (1-5 scale) |
| Risk | 25% | Technical and operational risk (1-5 scale) |
| Benefit | 45% | Value delivered, performance, maintainability (1-5 scale) |

Migration Cost

| Factor | Estimate |
|--------|----------|
| Code changes | Files/lines affected |
| Data migration | Schema changes, backfill |
| Testing | New test coverage needed |
| Rollback risk | Reversibility |

Decision Matrix Format

| Approach | Effort | Risk | Benefit | Score |
|----------|--------|------|---------|-------|
| Current | N | N | N | (5 - E) * 0.3 + (5 - R) * 0.25 + B * 0.45 |
| Alt A | N | N | N | calculated |
| Alt B | N | N | N | calculated |

Note: Higher effort and risk are bad (invert for scoring), higher benefit is good.

Recommendation Formula:

Score = (5 - Effort) * 0.3 + (5 - Risk) * 0.25 + Benefit * 0.45
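Applied as code, the recommendation formula looks like this (a sketch; the example inputs are invented):

```python
def alternative_score(effort, risk, benefit):
    """Weighted decision-matrix score on the 1-5 scales above.

    Effort and risk are inverted (5 - x) so that lower is better;
    benefit contributes directly.
    """
    return (5 - effort) * 0.3 + (5 - risk) * 0.25 + benefit * 0.45

# e.g. a current approach rated effort 2, risk 2, benefit 4
print(round(alternative_score(2, 2, 4), 2))
```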

Output Template

### Alternative Comparison: [Topic]

**Current Approach:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]

**Alternative A:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]
- Migration effort: [1-5]

**Recommendation:** [Keep current / Switch to Alt A]
**Justification:** [1-2 sentences]

Grading Rubric

Verification Grading Rubric

0-10 scoring criteria for each verification dimension.

Score Levels

| Range | Level | Description |
|-------|-------|-------------|
| 0-3 | Poor | Critical issues, blocks merge |
| 4-6 | Adequate | Functional but needs improvement |
| 7-9 | Good | Ready for merge, minor suggestions |
| 10 | Excellent | Exemplary, reference quality |

Dimension Rubrics

Code Quality (Weight: 20%)

| Score | Criteria |
|-------|----------|
| 10 | Zero lint errors/warnings, strict types, exemplary patterns |
| 8-9 | Zero errors, < 5 warnings, minimal `any`, good patterns |
| 6-7 | 1-3 errors, some warnings, acceptable patterns |
| 4-5 | 4-10 errors, pattern issues, needs refactoring |
| 1-3 | Many errors, poor patterns, high complexity |
| 0 | Lint/type check fails to run |

Security (Weight: 25%)

| Score | Criteria |
|-------|----------|
| 10 | No vulnerabilities, all OWASP compliant, secure by design |
| 8-9 | No critical/high, all OWASP, excellent practices |
| 6-7 | No critical, 1-2 high, most OWASP compliant |
| 4-5 | No critical, 3-5 high, some gaps |
| 1-3 | 1+ critical or many high vulnerabilities |
| 0 | Multiple critical, secrets exposed |

Test Coverage (Weight: 20%)

| Score | Criteria |
|-------|----------|
| 10 | >= 90% coverage, meaningful assertions, edge cases |
| 8-9 | >= 80% coverage, good assertions, critical paths |
| 6-7 | >= 70% coverage (target), basic assertions |
| 4-5 | 50-69% coverage |
| 1-3 | 30-49% coverage |
| 0 | < 30% coverage or tests fail to run |
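The coverage bands above can be expressed as a small mapping function (a sketch; it returns the top of each band for simplicity, whereas a real grader would interpolate within bands):

```python
def coverage_score(coverage_pct, tests_run=True):
    """Map a coverage percentage to the rubric's score band (sketch)."""
    if not tests_run or coverage_pct < 30:
        return 0
    if coverage_pct >= 90:
        return 10
    if coverage_pct >= 80:
        return 9
    if coverage_pct >= 70:
        return 7
    if coverage_pct >= 50:
        return 5
    return 3
```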

API Compliance (Weight: 20%)

| Score | Criteria |
|-------|----------|
| 10 | Perfect REST, RFC 9457 errors, documented, no N+1 |
| 8-9 | Good REST, proper validation, timeout protection |
| 6-7 | Acceptable API, minor inconsistencies |
| 4-5 | Several convention violations |
| 1-3 | Poor API design, missing validation |
| 0 | Broken or insecure endpoints |

UI Compliance (Weight: 15%)

| Score | Criteria |
|-------|----------|
| 10 | React 19 APIs, full Zod, WCAG AAA, exhaustive types |
| 8-9 | Modern patterns, good validation, WCAG AA |
| 6-7 | Acceptable patterns, some validation |
| 4-5 | Dated patterns, missing validation |
| 1-3 | Poor practices, accessibility issues |
| 0 | Broken or inaccessible components |

Grade Interpretation

| Composite | Grade | Verdict |
|-----------|-------|---------|
| 9.0-10.0 | A+ | Ship it |
| 8.0-8.9 | A | Ready for merge |
| 7.0-7.9 | B | Minor improvements optional |
| 6.0-6.9 | C | Consider improvements |
| 5.0-5.9 | D | Improvements recommended |
| < 5.0 | F | Do not merge |
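As a sketch, the composite-to-grade mapping (function name illustrative):

```python
def interpret_composite(composite):
    """Map a composite 0-10 score to (grade, verdict) per the table above."""
    bands = [
        (9.0, "A+", "Ship it"),
        (8.0, "A", "Ready for merge"),
        (7.0, "B", "Minor improvements optional"),
        (6.0, "C", "Consider improvements"),
        (5.0, "D", "Improvements recommended"),
    ]
    for floor, grade, verdict in bands:
        if composite >= floor:
            return grade, verdict
    return "F", "Do not merge"
```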

Orchestration Mode

<!-- SHARED: keep in sync with ../../../assess/references/orchestration-mode.md -->

Orchestration Mode Selection

Shared logic for choosing between Agent Teams and Task tool orchestration in assess/verify skills.

Environment Check

import os

teams_available = os.environ.get("CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS") is not None
force_task_tool = os.environ.get("ORCHESTKIT_FORCE_TASK_TOOL") == "1"

# `scope` is the verification scope selected in STEP 0 ("full", "tests", ...)
if force_task_tool or not teams_available:
    mode = "task_tool"
else:
    # Teams available — use for full multi-dimensional work
    mode = "agent_teams" if scope == "full" else "task_tool"

Decision Rules

  1. CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS set --> Agent Teams mode (for full assessment/verification)
  2. Flag not set --> Task tool mode (default)
  3. Quick/single-dimension scope --> Task tool (regardless of flag)
  4. ORCHESTKIT_FORCE_TASK_TOOL=1 --> Task tool (override)

Agent Teams vs Task Tool

| Aspect | Task Tool (Star) | Agent Teams (Mesh) |
|--------|------------------|--------------------|
| Topology | All agents report to lead | Agents communicate with each other |
| Finding correlation | Lead cross-references after completion | Agents share findings in real-time |
| Cross-domain overlap | Independent scoring | Agents alert each other about overlapping concerns |
| Cost | ~200K tokens | ~500K tokens |
| Best for | Focused/single-dimension work | Full multi-dimensional assessment/verification |

Fallback

If Agent Teams encounters issues mid-execution, fall back to Task tool for remaining work. This is safe because both modes produce the same output format (dimensional scores 0-10).

Context Window Note

For full codebase work (>20 files), use the 1M context window to avoid agent context exhaustion. On 200K context, scope discovery should limit files to prevent overflow.

Policy As Code

Policy-as-Code

Define verification policies as machine-readable configuration.

Policy Structure

version: "1.0"
name: policy-name
description: What this policy enforces

thresholds:
  composite_minimum: 6.0
  coverage_minimum: 70

rules:
  blockers: []    # Fail verification
  warnings: []    # Note but continue
  info: []        # Informational only

Rule Definition

Blocker Rules (Must Pass)

blockers:
  - dimension: security
    condition: below
    value: 5.0
    message: "Security score below minimum"

  - check: critical_vulnerabilities
    condition: above
    value: 0
    message: "Critical vulnerabilities found"

  - check: type_errors
    condition: above
    value: 0
    message: "TypeScript errors must be zero"

Warning Rules (Should Fix)

warnings:
  - dimension: code_quality
    condition: below
    value: 7.0
    message: "Code quality could be improved"

  - check: test_coverage
    condition: below
    value: 80
    message: "Coverage below recommended 80%"

Info Rules (Awareness)

info:
  - check: todo_count
    condition: above
    value: 5
    message: "Multiple TODOs found in code"

Threshold Configuration

| Threshold | Type | Description |
|-----------|------|-------------|
| composite_minimum | float | Overall score minimum (0-10) |
| coverage_minimum | int | Test coverage percentage |
| critical_vulnerabilities | int | Max critical vulns (0) |
| high_vulnerabilities | int | Max high vulns |
| lint_errors | int | Max lint errors (0) |
| type_errors | int | Max type errors (0) |

Custom Rules

custom_rules:
  - name: no_console_log
    pattern: "console\\.log"
    file_glob: "**/*.ts"
    exclude: ["**/*.test.ts"]
    severity: warning
    message: "Remove console.log from production"

Policy Location

Store at: .claude/policies/verification-policy.yaml

Multiple policies: .claude/policies/{name}-policy.yaml

Quality Model

<!-- SHARED: keep in sync with ../../../assess/references/quality-model.md -->

Quality Model

Canonical scoring reference for assess and verify skills. Defines unified dimensions, weights, grade thresholds, and improvement prioritization.

Scoring Dimensions (7 Unified)

| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Correctness | 0.15 | Does it work correctly? Functional accuracy, edge cases handled |
| Maintainability | 0.15 | Easy to understand and modify? Readability, complexity, patterns |
| Performance | 0.12 | Efficient execution? No bottlenecks, resource usage, latency |
| Security | 0.20 | Follows security best practices? OWASP, secrets, CVEs, input validation |
| Scalability | 0.10 | Handles growth? Load patterns, data volume, horizontal scaling |
| Testability | 0.13 | Easy to test? Coverage, test quality, isolation, mocking |
| Compliance | 0.15 | Meets API and UI contracts? Conditional on scope (see below) |

Total: 1.00

Compliance Dimension — Scope Rules

Compliance weight (0.15) applies differently based on project scope:

| Scope | Compliance Covers |
|-------|-------------------|
| Backend-only | API compliance (contracts, schema validation, versioning) |
| Frontend-only | UI compliance (design system, a11y, responsive) |
| Full-stack | API + UI compliance (split evenly: 0.075 each) |
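The weight split can be sketched as a small helper (scope labels are shortened for illustration and are not part of the skill's API):

```python
def compliance_weights(scope):
    """Distribute the 0.15 compliance weight by project scope (sketch)."""
    if scope == "backend":
        return {"api": 0.15}
    if scope == "frontend":
        return {"ui": 0.15}
    return {"api": 0.075, "ui": 0.075}  # full-stack: split evenly
```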

Composite Score

composite = sum(dimension_score * weight for each dimension)

Each dimension is scored 0-10 with decimal precision. Composite is also 0-10.
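For example, the composite calculation with the seven unified weights from the table above (a sketch; the dict keys are illustrative):

```python
# Weights from the Scoring Dimensions table; they must total 1.00.
WEIGHTS = {
    "correctness": 0.15, "maintainability": 0.15, "performance": 0.12,
    "security": 0.20, "scalability": 0.10, "testability": 0.13,
    "compliance": 0.15,
}

def composite_score(scores):
    """Weighted sum of 0-10 dimensional scores; the result is also 0-10."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(scores[d] * w for d, w in WEIGHTS.items()), 2)
```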

Grade Thresholds

| Score | Grade | Verdict | Action |
|-------|-------|---------|--------|
| 9.0-10.0 | A+ | EXCELLENT | Ship it! |
| 8.0-8.9 | A | GOOD | Ready for merge |
| 7.0-7.9 | B | GOOD | Minor improvements optional |
| 6.0-6.9 | C | ADEQUATE | Consider improvements |
| 5.0-5.9 | D | NEEDS WORK | Improvements recommended |
| 0.0-4.9 | F | CRITICAL | Do not merge |

Improvement Prioritization

Effort Scale (1-5)

| Points | Effort | Description |
|--------|--------|-------------|
| 1 | Trivial | < 15 minutes, single file change |
| 2 | Low | 15-60 minutes, few files |
| 3 | Medium | 1-4 hours, moderate scope |
| 4 | High | 4-8 hours, significant refactoring |
| 5 | Major | 1+ days, architectural change |

Impact Scale (1-5)

| Points | Impact | Description |
|--------|--------|-------------|
| 1 | Minimal | Cosmetic, no functional change |
| 2 | Low | Minor improvement, limited scope |
| 3 | Medium | Noticeable quality improvement |
| 4 | High | Significant quality or security gain |
| 5 | Critical | Blocks shipping or fixes major vulnerability |

Priority Formula

priority = impact / effort

Higher ratio = do first.

Quick Wins

Effort <= 2 AND Impact >= 4

Always highlight quick wins at the top of improvement suggestions. These are high-value changes that can be done fast.
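The prioritization and quick-wins rule can be sketched as follows (helper name hypothetical; mutates the suggestion dicts in place for brevity):

```python
def prioritize(suggestions):
    """Rank suggestions by impact/effort, surfacing quick wins first (sketch).

    Each suggestion dict carries 'effort' and 'impact' on the 1-5 scales above.
    """
    for s in suggestions:
        s["priority"] = s["impact"] / s["effort"]
        # Quick win rule from above: Effort <= 2 AND Impact >= 4
        s["quick_win"] = s["effort"] <= 2 and s["impact"] >= 4
    # Quick wins sort first, then by descending priority ratio
    return sorted(suggestions, key=lambda s: (not s["quick_win"], -s["priority"]))
```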

Report Template

Verification Report Template

Copy this template and fill in results from parallel agent verification.

Quick Copy Template

# Feature Verification Report

**Date**: [TODAY'S DATE]
**Branch**: [branch-name]
**Feature**: [feature description]
**Reviewer**: Claude Code with 5 parallel subagents
**Verification Duration**: [X minutes]

---

## Summary

**Status**: [READY FOR MERGE | NEEDS ATTENTION | BLOCKED]

[1-2 sentence summary of verification results]

---

## Agent Results

### 1. Code Quality (code-quality-reviewer)

| Check | Tool | Exit Code | Errors | Warnings | Status |
|-------|------|-----------|--------|----------|--------|
| Backend Lint | Ruff | 0/1 | N | N | PASS/FAIL |
| Backend Types | ty | 0/1 | N | N | PASS/FAIL |
| Frontend Lint | Biome | 0/1 | N | N | PASS/FAIL |
| Frontend Types | tsc | 0/1 | N | N | PASS/FAIL |

**Pattern Compliance:**
- [ ] No `console.log` in production code
- [ ] No `any` types in TypeScript
- [ ] Exhaustive switches with `assertNever`
- [ ] SOLID principles followed
- [ ] Cyclomatic complexity < 10

**Findings:**
- [List any pattern violations]

---

### 2. Security Audit (security-auditor)

| Check | Tool | Critical | High | Medium | Low | Status |
|-------|------|----------|------|--------|-----|--------|
| JS Dependencies | npm audit | N | N | N | N | PASS/BLOCK |
| Python Dependencies | pip-audit | N | N | N | N | PASS/BLOCK |
| Secrets Scan | grep/gitleaks | N/A | N/A | N/A | N | PASS/BLOCK |

**OWASP Top 10 Compliance:**
- [ ] A01: Broken Access Control
- [ ] A02: Cryptographic Failures
- [ ] A03: Injection
- [ ] A04: Insecure Design
- [ ] A05: Security Misconfiguration
- [ ] A06: Vulnerable Components
- [ ] A07: Auth Failures
- [ ] A08: Data Integrity Failures
- [ ] A09: Logging Failures
- [ ] A10: SSRF

**Findings:**
- [List any security issues]

---

### 3. Test Coverage (test-generator)

| Suite | Total | Passed | Failed | Skipped | Coverage | Target | Status |
|-------|-------|--------|--------|---------|----------|--------|--------|
| Backend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| Backend Integration | N | N | N | N | X% | 70% | PASS/FAIL |
| Frontend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| E2E | N | N | N | N | N/A | N/A | PASS/FAIL |

**Test Quality:**
- [ ] Meaningful assertions (not just `assert result`)
- [ ] Edge cases covered (empty, error, timeout)
- [ ] No flaky tests (no sleep, no timing deps)
- [ ] MSW used for API mocking (not jest.mock)

**Coverage Gaps:**
- [List uncovered critical paths]

---

### 4. API Compliance (backend-system-architect)

| Check | Compliant | Issues |
|-------|-----------|--------|
| REST Conventions | Yes/No | [details] |
| Pydantic v2 Validation | Yes/No | [details] |
| RFC 9457 Error Handling | Yes/No | [details] |
| Async Timeout Protection | Yes/No | [details] |
| No N+1 Queries | Yes/No | [details] |

**Findings:**
- [List any API compliance issues]

---

### 5. UI Compliance (frontend-ui-developer)

| Check | Compliant | Issues |
|-------|-----------|--------|
| React 19 APIs (useOptimistic, useFormStatus, use()) | Yes/No | [details] |
| Zod Validation on API Responses | Yes/No | [details] |
| Exhaustive Type Checking | Yes/No | [details] |
| Skeleton Loading States | Yes/No | [details] |
| Prefetching on Navigation | Yes/No | [details] |
| WCAG 2.1 AA Accessibility | Yes/No | [details] |

**Findings:**
- [List any UI compliance issues]

---

## Quality Gates Summary

| Gate | Required | Actual | Status |
|------|----------|--------|--------|
| Test Coverage | >= 70% | X% | PASS/FAIL |
| Security Critical | 0 | N | PASS/FAIL |
| Security High | <= 5 | N | PASS/FAIL |
| Type Errors | 0 | N | PASS/FAIL |
| Lint Errors | 0 | N | PASS/FAIL |

**Overall Gate Status**: [ALL PASS | SOME FAIL]

---

## Blockers (Must Fix Before Merge)

1. [Blocker description with file:line reference]
2. [Blocker description with file:line reference]

---

## Suggestions (Non-Blocking)

1. [Suggestion for improvement]
2. [Suggestion for improvement]

---

## Evidence Artifacts

| Artifact | Location | Generated |
|----------|----------|-----------|
| Test Results | `/tmp/test_results.log` | [timestamp] |
| Coverage Report | `/tmp/coverage.json` | [timestamp] |
| Security Scan | `/tmp/security_audit.json` | [timestamp] |
| Lint Report | `/tmp/lint_results.log` | [timestamp] |
| E2E Screenshot | `/tmp/verification.png` | [timestamp] |

---

## Verification Metadata

- **Agents Used**: 5 (code-quality-reviewer, security-auditor, test-generator, backend-system-architect, frontend-ui-developer)
- **Parallel Execution**: Yes
- **Total Tool Calls**: ~N
- **Context Usage**: ~N tokens

Status Definitions

| Status | Emoji | Meaning | Action Required |
|--------|-------|---------|-----------------|
| READY FOR MERGE | Green | All checks pass, no blockers | Approve PR |
| NEEDS ATTENTION | Yellow | Minor issues found | Review suggestions, optionally fix |
| BLOCKED | Red | Critical issues found | Must fix before merge |

Severity Levels

| Level | Threshold | Action | Blocks Merge |
|-------|-----------|--------|--------------|
| Critical | Any | Fix immediately | YES |
| High | > 5 | Fix before merge | YES |
| Medium | > 20 | Should fix | NO (with justification) |
| Low | > 50 | Nice to have | NO |
| Info | N/A | Informational | NO |

Agent Output JSON Schemas

code-quality-reviewer Output

{
  "linting": {"tool": "ruff|biome", "exit_code": 0, "errors": 0, "warnings": 0},
  "type_check": {"tool": "ty|tsc", "exit_code": 0, "errors": 0},
  "patterns": {"violations": [], "compliance": "PASS|FAIL"},
  "approval": {"status": "APPROVED|NEEDS_FIXES", "blockers": []}
}

security-auditor Output

{
  "scan_summary": {"files_scanned": 100, "vulnerabilities_found": 0},
  "critical": [],
  "high": [],
  "secrets_detected": [],
  "recommendations": [],
  "approval": {"status": "PASS|BLOCK", "blockers": []}
}

test-generator Output

{
  "coverage": {"current": 85, "target": 70, "passed": true},
  "test_summary": {"total": 100, "passed": 98, "failed": 2, "skipped": 0},
  "gaps": ["file:line - reason"],
  "quality_issues": [],
  "approval": {"status": "PASS|FAIL", "blockers": []}
}

backend-system-architect Output

{
  "api_compliance": {"rest_conventions": true, "issues": []},
  "validation": {"pydantic_v2": true, "issues": []},
  "error_handling": {"rfc9457": true, "issues": []},
  "async_safety": {"timeouts": true, "issues": []},
  "approval": {"status": "PASS|FAIL", "blockers": []}
}

frontend-ui-developer Output

{
  "react_19": {"apis_used": ["useOptimistic"], "missing": [], "compliant": true},
  "zod_validation": {"validated_endpoints": 10, "unvalidated": []},
  "type_safety": {"exhaustive_switches": true, "any_types": 0},
  "ux_patterns": {"skeletons": true, "prefetching": true},
  "accessibility": {"wcag_issues": []},
  "approval": {"status": "PASS|FAIL", "blockers": []}
}

Verification Checklist

Verification Checklist

Pre-flight checklist for comprehensive feature verification with parallel agents.

Pre-Verification Setup

Context Gathering

  • Run git diff main --stat to understand change scope
  • Run git log main..HEAD --oneline to see commit history
  • Identify affected domains (backend/frontend/both)
  • Check for any existing failing tests

Task Creation (CC 2.1.16)

  • Create parent verification task
  • Create subtasks for each agent domain
  • Set proper dependencies if needed

Agent Dispatch Checklist

Required Agents (Full-Stack)

| Agent | Launched | Completed | Status |
|-------|----------|-----------|--------|
| code-quality-reviewer | [ ] | [ ] | Pending |
| security-auditor | [ ] | [ ] | Pending |
| test-generator | [ ] | [ ] | Pending |
| backend-system-architect | [ ] | [ ] | Pending |
| frontend-ui-developer | [ ] | [ ] | Pending |

Optional Agents (Add as Needed)

| Condition | Agent | Launched |
|-----------|-------|----------|
| AI/ML features | llm-integrator | [ ] |
| Performance-critical | frontend-performance-engineer | [ ] |
| Database changes | database-engineer | [ ] |

Quality Gate Checklist

Mandatory Gates

| Gate | Threshold | Actual | Pass |
|------|-----------|--------|------|
| Test Coverage | >= 70% | ___% | [ ] |
| Security Critical | 0 | ___ | [ ] |
| Security High | <= 5 | ___ | [ ] |
| Type Errors | 0 | ___ | [ ] |
| Lint Errors | 0 | ___ | [ ] |
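These mandatory gates could be checked mechanically, for example (a sketch; the metric keys are illustrative, not produced by any particular tool):

```python
def check_gates(metrics):
    """Evaluate the mandatory gates using the thresholds above (sketch)."""
    return {
        "coverage": metrics["coverage_pct"] >= 70,
        "security_critical": metrics["critical_vulns"] == 0,
        "security_high": metrics["high_vulns"] <= 5,
        "type_errors": metrics["type_errors"] == 0,
        "lint_errors": metrics["lint_errors"] == 0,
    }
```

A run passes overall only when every gate in the returned dict is true.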

Code Quality Gates

| Check | Status |
|-------|--------|
| No console.log in production | [ ] |
| No `any` types | [ ] |
| Exhaustive switches (assertNever) | [ ] |
| Proper error handling | [ ] |
| No hardcoded secrets | [ ] |

Frontend-Specific Gates (if applicable)

| Check | Status |
|-------|--------|
| React 19 APIs used | [ ] |
| Zod validation on API responses | [ ] |
| Skeleton loading states | [ ] |
| Prefetching on links | [ ] |
| WCAG 2.1 AA compliance | [ ] |

Backend-Specific Gates (if applicable)

| Check | Status |
|-------|--------|
| REST conventions followed | [ ] |
| Pydantic v2 validation | [ ] |
| RFC 9457 error handling | [ ] |
| Async timeout protection | [ ] |
| No N+1 queries | [ ] |

Evidence Collection

Required Evidence

  • Test results with exit code
  • Coverage report (JSON format)
  • Linting results
  • Type checking results
  • Security scan results

Optional Evidence

  • E2E test screenshots
  • Performance benchmarks
  • Bundle size analysis
  • Accessibility audit

Report Generation

Report Sections

  • Summary (READY/NEEDS ATTENTION/BLOCKED)
  • Agent Results (all 5 domains)
  • Quality Gates table
  • Blockers list (if any)
  • Suggestions list
  • Evidence links

Final Steps

  • Update all task statuses to completed
  • Store verification evidence in context
  • Generate final report markdown

Quick Reference: Agent Prompts

code-quality-reviewer

Focus: Lint, type check, anti-patterns, SOLID, complexity

security-auditor

Focus: Dependency audit, secrets, OWASP Top 10, rate limiting

test-generator

Focus: Coverage gaps, test quality, edge cases, flaky tests

backend-system-architect

Focus: REST, Pydantic v2, RFC 9457, async safety, N+1

frontend-ui-developer

Focus: React 19, Zod, exhaustive types, skeletons, prefetch, a11y

Troubleshooting

Agent Not Responding

  1. Check if agent was launched with run_in_background=True
  2. Verify agent name matches exactly
  3. Check for context window limits

Tests Failing

  1. Run tests locally first
  2. Check for missing dependencies
  3. Verify test database state
  4. Look for timing-dependent tests

Coverage Below Threshold

  1. Identify uncovered files
  2. Check for excluded patterns
  3. Focus on critical paths first

Verification Phases

Verification Phases — Detailed Workflow

Phase Overview

| Phase | Activities | Output |
|-------|------------|--------|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts | Final report |

Phase 2: Parallel Agent Dispatch (6 Agents)

Launch ALL agents in ONE message with run_in_background=True and max_turns=25.

| Agent | Focus | Output |
|-------|-------|--------|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |

Use python-performance-engineer for backend-focused verification or frontend-performance-engineer for frontend-focused verification. See Quality Model for Performance (0.12) and Scalability (0.10) weights.

See Grading Rubric for detailed scoring criteria.

Agent Teams Alternative

In Agent Teams mode, form a verification team where agents share findings and coordinate scoring:

TeamCreate(team_name="verify-{feature}", description="Verify {feature}")

Task(subagent_type="code-quality-reviewer", name="quality-verifier",
     team_name="verify-{feature}",
     prompt="""Verify code quality for {feature}. Score 0-10.
     When you find patterns that affect security, message security-verifier.
     When you find untested code paths, message test-verifier.
     Share your quality score with all teammates for composite calculation.""")

Task(subagent_type="security-auditor", name="security-verifier",
     team_name="verify-{feature}",
     prompt="""Security verification for {feature}. Score 0-10.
     When quality-verifier flags security-relevant patterns, investigate deeper.
     When you find vulnerabilities in API endpoints, message api-verifier.
     Share severity findings with test-verifier for test gap analysis.""")

Task(subagent_type="test-generator", name="test-verifier",
     team_name="verify-{feature}",
     prompt="""Verify test coverage for {feature}. Score 0-10.
     When quality-verifier or security-verifier flag untested paths, quantify the gap.
     Run existing tests and report coverage metrics.
     Message the lead with coverage data for composite scoring.""")

Task(subagent_type="backend-system-architect", name="api-verifier",
     team_name="verify-{feature}",
     prompt="""Verify API design and backend patterns for {feature}. Score 0-10.
     When security-verifier flags endpoint issues, validate and score.
     Share API compliance findings with ui-verifier for consistency check.""")

Task(subagent_type="frontend-ui-developer", name="ui-verifier",
     team_name="verify-{feature}",
     prompt="""Verify frontend implementation for {feature}. Score 0-10.
     When api-verifier shares API patterns, verify frontend matches.
     Check React 19 patterns, accessibility, and loading states.
     Share findings with quality-verifier for overall assessment.""")

# Conditional 6th agent — use python-performance-engineer for backend,
# frontend-performance-engineer for frontend
Task(subagent_type="python-performance-engineer", name="perf-verifier",
     team_name="verify-{feature}",
     prompt="""Verify performance and scalability for {feature}. Score 0-10.
     Assess latency, resource usage, caching, and scaling patterns.
     When security-verifier flags resource-intensive endpoints, profile them.
     Share performance findings with api-verifier and quality-verifier.""")

Team teardown after report compilation:

# After composite grading and report generation
SendMessage(type="shutdown_request", recipient="quality-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="security-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="test-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="api-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="ui-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="perf-verifier", content="Verification complete")
TeamDelete()

Fallback: If team formation fails, use standard Phase 2 Task spawns above.

Manual cleanup: If TeamDelete() doesn't terminate all agents, press Ctrl+F twice to force-kill remaining background agents.


Phase 4: Nuanced Grading

See Quality Model for scoring dimensions, weights, and grade interpretation. See Grading Rubric for detailed per-agent scoring criteria.


Phase 5: Improvement Suggestions

Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.


Phase 6: Alternative Comparison (Optional)

See Alternative Comparison for template.

Use when:

  • Multiple valid approaches exist
  • User asked "is this the best way?"
  • Major architectural decisions made

Phase 8: Report Compilation

See Report Template for full format.

# Feature Verification Report

**Composite Score: [N.N]/10** (Grade: [LETTER])

## Top Improvement Suggestions
| # | Suggestion | Effort | Impact | Priority |
|---|------------|--------|--------|----------|
| 1 | [highest] | [N] | [N] | [N.N] |

## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**

Checklists (1)

Verification Checklist

Quick checklist for comprehensive feature verification.

Grading Complete

  • All 5 dimensions rated (0-10 scale)
  • Weights applied correctly (20/25/20/20/15)
  • Composite score calculated
  • Grade letter assigned (A+ to F)
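The composite score in the checklist above is a weighted sum of the five dimension scores using the 20/25/20/20/15 weights. A minimal sketch, with illustrative per-dimension scores (the Quality Model reference is authoritative for weights and grade thresholds):

```python
# Composite score from the five dimensions and their weights
# (20/25/20/20/15, as in the checklist above).
weights = {
    "code_quality": 0.20,
    "security": 0.25,
    "test_coverage": 0.20,
    "api_compliance": 0.20,
    "ui_compliance": 0.15,
}
scores = {  # illustrative per-dimension scores on the 0-10 scale
    "code_quality": 8.0,
    "security": 9.0,
    "test_coverage": 7.0,
    "api_compliance": 8.0,
    "ui_compliance": 6.0,
}
composite = sum(scores[d] * w for d, w in weights.items())
```

Security carries the largest weight (25%), so a weak security score drags the composite down faster than any other dimension.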

Evidence Collected

  • Test results with exit codes
  • Coverage report (JSON)
  • Security scan results
  • Lint/type check output
  • Evidence files linked in report

Improvements Documented

  • Each suggestion has effort estimate (1-5)
  • Each suggestion has impact estimate (1-5)
  • Priority calculated (Impact / Effort)
  • Quick wins identified (low effort, high impact)

Alternatives Considered

  • Current approach scored
  • At least one alternative evaluated
  • Migration cost estimated
  • Recommendation documented

Policy Compliance

  • No blocking rule violations
  • Warning rules acknowledged
  • Thresholds checked (composite, security, coverage)
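The threshold check above reduces to a simple gate. The threshold values below are illustrative, not the shipped policy; see the Policy-as-Code reference for the real rule definitions.

```python
# Sketch of a policy threshold gate over the composite, security,
# and coverage results. Threshold values are illustrative.
thresholds = {"composite": 7.0, "security": 8.0, "coverage": 0.80}
results = {"composite": 7.75, "security": 9.0, "coverage": 0.85}

violations = [name for name, floor in thresholds.items() if results[name] < floor]
verdict = "BLOCKED" if violations else "READY FOR MERGE"
```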

Report Generated

  • All sections filled
  • Verdict assigned (Ready/Recommended/Blocked)
  • Tasks updated to completed