OrchestKit v7.43.0 — 104 skills, 36 agents, 173 hooks · Claude Code 2.1.105+

Verify

Comprehensive verification with parallel test agents. Use when verifying implementations or validating changes.

Type: command · Priority: high

Invoke: /ork:verify

Verify Feature

Comprehensive verification using parallel specialized agents with nuanced grading (0-10 scale) and improvement suggestions.

Quick Start

/ork:verify authentication flow
/ork:verify --model=opus user profile feature
/ork:verify --scope=backend database migrations

Argument Resolution

SCOPE = "$ARGUMENTS"       # Full argument string, e.g., "authentication flow"
SCOPE_TOKEN = "$ARGUMENTS[0]"  # First token for flag detection (e.g., "--scope=backend")
# $ARGUMENTS[0], $ARGUMENTS[1] etc. for indexed access (CC 2.1.59)

# Model override detection (CC 2.1.72)
MODEL_OVERRIDE = None
for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]  # "opus", "sonnet", "haiku"
        SCOPE = SCOPE.replace(token, "").strip()

Pass MODEL_OVERRIDE to all Agent() calls via model=MODEL_OVERRIDE when set. Accepts symbolic names (opus, sonnet, haiku) or full IDs (claude-opus-4-6) per CC 2.1.74.

Opus 4.6: Agents use native adaptive thinking (no MCP sequential-thinking needed). Extended 128K output supports comprehensive verification reports.


STEP 0: Effort-Aware Verification Scaling (CC 2.1.76)

Scale verification depth based on /effort level:

| Effort Level | Phases Run | Agents | Output |
|---|---|---|---|
| low | Run tests only → pass/fail | 0 agents | Quick check |
| medium | Tests + code quality + security | 3 agents | Score + top issues |
| high (default) | All 8 phases + visual capture | 6-7 agents | Full report + grades |

Override: Explicit user selection (e.g., "Full verification") overrides /effort downscaling.
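The scaling rules above can be sketched as a simple dispatch table. This is an illustrative sketch only; the plan keys and field names are assumptions, not part of the skill's API:

```python
# Illustrative effort-aware scaling — rows mirror the table above.
EFFORT_PLANS = {
    "low":    {"phases": ["tests"], "agents": 0, "output": "pass/fail"},
    "medium": {"phases": ["tests", "code_quality", "security"], "agents": 3,
               "output": "score + top issues"},
    "high":   {"phases": ["all"], "agents": 7, "output": "full report + grades"},
}

def resolve_plan(effort: str, explicit_full: bool = False) -> dict:
    """Explicit user selection (e.g. 'Full verification') overrides /effort downscaling."""
    if explicit_full:
        return EFFORT_PLANS["high"]
    return EFFORT_PLANS.get(effort, EFFORT_PLANS["high"])  # unknown level: default to high
```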

STEP 0a: Verify User Intent with AskUserQuestion

BEFORE creating tasks, clarify verification scope:

AskUserQuestion(
  questions=[{
    "question": "What scope for this verification?",
    "header": "Scope",
    "options": [
      {"label": "Full verification (Recommended)", "description": "All tests + security + code quality + visual + grades", "markdown": "```\nFull Verification (10 phases)\n─────────────────────────────\n  7 parallel agents:\n  ┌────────────┐ ┌────────────┐\n  │ Code       │ │ Security   │\n  │ Quality    │ │ Auditor    │\n  ├────────────┤ ├────────────┤\n  │ Test       │ │ Backend    │\n  │ Generator  │ │ Architect  │\n  ├────────────┤ ├────────────┤\n  │ Frontend   │ │ Performance│\n  │ Developer  │ │ Engineer   │\n  ├────────────┤ └────────────┘\n  │ Visual     │\n  │ Capture    │ → gallery.html\n  └────────────┘\n\n    Composite Score (0-10)\n    8 dimensions + Grade\n    + Visual Gallery\n```"},
      {"label": "Tests only", "description": "Run unit + integration + e2e tests", "markdown": "```\nTests Only\n──────────\n  npm test ──▶ Results\n  ┌─────────────────────┐\n  │ Unit tests     ✓/✗  │\n  │ Integration    ✓/✗  │\n  │ E2E            ✓/✗  │\n  │ Coverage       NN%  │\n  └─────────────────────┘\n  Skip: security, quality, UI\n  Output: Pass/fail + coverage\n```"},
      {"label": "Security audit", "description": "Focus on security vulnerabilities", "markdown": "```\nSecurity Audit\n──────────────\n  security-auditor agent:\n  ┌─────────────────────────┐\n  │ OWASP Top 10       ✓/✗ │\n  │ Dependency CVEs    ✓/✗ │\n  │ Secrets scan       ✓/✗ │\n  │ Auth flow review   ✓/✗ │\n  │ Input validation   ✓/✗ │\n  └─────────────────────────┘\n  Output: Security score 0-10\n          + vulnerability list\n```"},
      {"label": "Code quality", "description": "Lint, types, complexity analysis", "markdown": "```\nCode Quality\n────────────\n  code-quality-reviewer agent:\n  ┌─────────────────────────┐\n  │ Lint errors         N   │\n  │ Type coverage       NN% │\n  │ Cyclomatic complex  N.N │\n  │ Dead code           N   │\n  │ Pattern violations  N   │\n  └─────────────────────────┘\n  Output: Quality score 0-10\n          + refactor suggestions\n```"},
      {"label": "Quick check", "description": "Just run tests, skip detailed analysis", "markdown": "```\nQuick Check (~1 min)\n────────────────────\n  Run tests ──▶ Pass/Fail\n\n  Output:\n  ├── Test results\n  ├── Build status\n  └── Lint status\n  No agents, no grading,\n  no report generation\n```"}
    ],
    "multiSelect": true
  }]
)

Based on answer, adjust workflow:

  • Full verification: All 10 phases (8 + 2.5 + 8.5), 7 parallel agents including visual capture
  • Tests only: Skip phases 2 (security), 5 (UI/UX analysis)
  • Security audit: Focus on security-auditor agent
  • Code quality: Focus on code-quality-reviewer agent
  • Quick check: Run tests only, skip grading and suggestions

STEP 0b: Select Orchestration Mode

Load details: Read("${CLAUDE_SKILL_DIR}/references/orchestration-mode.md") for env var check logic, Agent Teams vs Task Tool comparison, and mode selection rules.

Choose Agent Teams (mesh -- verifiers share findings) or Task tool (star -- all report to lead) based on the orchestration mode reference.


MCP Probe + Resume

ToolSearch(query="select:mcp__memory__search_nodes")
Write(".claude/chain/capabilities.json", { memory, timestamp })

Read(".claude/chain/state.json")  # resume if exists

Handoff File

After verification completes, write results:

Write(".claude/chain/verify-results.json", JSON.stringify({
  "phase": "verify", "skill": "verify",
  "timestamp": now(), "status": "completed",
  "outputs": {
    "tests_passed": N, "tests_failed": N,
    "coverage": "87%", "security_scan": "clean"
  }
}))

Regression Monitor (CC 2.1.71)

Optionally schedule post-verification monitoring:

# Guard: Skip cron in headless/CI (CLAUDE_CODE_DISABLE_CRON)
# if env CLAUDE_CODE_DISABLE_CRON is set, run a single check instead
CronCreate(
  schedule="0 8 * * *",
  prompt="Daily regression check: npm test.
    If 7 consecutive passes → CronDelete.
    If failures → alert with details."
)

Task Management (CC 2.1.16)

# 1. Create main verification task
TaskCreate(
  subject="Verify [feature-name] implementation",
  description="Comprehensive verification with nuanced grading",
  activeForm="Verifying [feature-name] implementation"
)

# 2. Create subtasks for 8-phase process
TaskCreate(subject="Run code quality checks", activeForm="Running quality checks")    # id=2
TaskCreate(subject="Execute security audit", activeForm="Running security audit")     # id=3
TaskCreate(subject="Verify test coverage", activeForm="Verifying test coverage")      # id=4
TaskCreate(subject="Validate API", activeForm="Validating API")                       # id=5
TaskCreate(subject="Check UI/UX", activeForm="Checking UI/UX")                       # id=6
TaskCreate(subject="Calculate grades", activeForm="Calculating grades")               # id=7
TaskCreate(subject="Generate suggestions", activeForm="Generating suggestions")       # id=8
TaskCreate(subject="Compile report", activeForm="Compiling report")                   # id=9

# 3. Set dependencies — phases 2-6 run in parallel, 7-9 are sequential
TaskUpdate(taskId="7", addBlockedBy=["2", "3", "4", "5", "6"])  # Grading needs all checks
TaskUpdate(taskId="8", addBlockedBy=["7"])  # Suggestions need grades
TaskUpdate(taskId="9", addBlockedBy=["8"])  # Report needs suggestions

# 4. Before starting each task, verify it's unblocked
task = TaskGet(taskId="2")  # Verify blockedBy is empty

# 5. Update status as you progress
TaskUpdate(taskId="2", status="in_progress")  # When starting
TaskUpdate(taskId="2", status="completed")    # When done — repeat for each subtask

8-Phase Workflow

Load details: Read("${CLAUDE_SKILL_DIR}/references/verification-phases.md") for complete phase details, agent spawn definitions, Agent Teams alternative, and team teardown.

| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 2.5 Visual Capture | Screenshot routes, AI vision eval | Gallery + visual score |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts + gallery.html | Final report |
| 8.5 Agentation Loop | User annotates, ui-feedback fixes | Before/after diffs |

Phase 2 Agents (Quick Reference)

| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |

Launch ALL agents in ONE message with run_in_background=True and max_turns=25.

Progressive Output (CC 2.1.76+)

Output each agent's score as soon as it completes — don't wait for all 6-7 agents.

Focus mode (CC 2.1.101): In focus mode, include the full composite score, all dimension scores, and the verdict in your final message — the user didn't see the incremental outputs.

Security:     8.2/10 — No critical vulnerabilities found
Code Quality: 7.5/10 — 3 complexity hotspots identified
[...remaining agents still running...]

This gives users real-time visibility into multi-agent verification. If any dimension scores below the security_minimum threshold (default 5.0), flag it as a blocker immediately — the user can terminate early without waiting for remaining agents.
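A minimal sketch of that incremental check, assuming the default security_minimum of 5.0 mentioned above (the function name is illustrative):

```python
SECURITY_MINIMUM = 5.0  # default threshold from the text above

def check_incremental(dimension: str, score: float, minimum: float = SECURITY_MINIMUM) -> str:
    """Format one agent's score as it arrives; flag blockers immediately."""
    line = f"{dimension}: {score:.1f}/10"
    if score < minimum:
        line += " -- BLOCKER: below minimum, user may terminate early"
    return line
```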

Monitor + Partial Results (CC 2.1.98)

Use Monitor for streaming test execution output from background scripts:

# Stream test output in real-time instead of waiting for completion
Bash(command="npm test 2>&1", run_in_background=true)
Monitor(pid=test_task_id)  # Each line → notification

Partial results (CC 2.1.98): If a verification agent fails mid-analysis, synthesize partial scores rather than re-spawning:

for agent_result in verification_results:
    if "[PARTIAL RESULT]" in agent_result.output:
        # Extract whatever scores the agent produced before crashing
        partial_score = parse_score(agent_result.output)  # May be incomplete
        scores[agent_result.dimension] = {
            "score": partial_score, "partial": True,
            "note": "Agent crashed — score based on partial analysis"
        }
        # A 4-dimension score is better than no score. Do NOT re-spawn.

Phase 2.5: Visual Capture (NEW — runs in parallel with Phase 2)

Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") for auto-detection, route discovery, screenshot capture, and AI vision evaluation.

Summary: Auto-detects project framework, starts dev server, discovers routes, uses agent-browser to screenshot each route, evaluates with Claude vision, generates self-contained gallery.html with base64-embedded images.

Output: verification-output/{timestamp}/gallery.html — open in browser to see all screenshots with AI evaluations, scores, and annotation diffs.

Graceful degradation: If no frontend detected or server won't start, skips visual capture with a warning — never blocks verification.

Phase 8.5: Agentation Visual Feedback (opt-in)

Load details: Read("${CLAUDE_SKILL_DIR}/references/visual-capture.md") (Phase 8.5 section) for agentation loop workflow.

Trigger: Only when agentation MCP is configured. Offers user the choice to annotate the live UI. ui-feedback agent processes annotations, re-screenshots show before/after.


Grading & Scoring

Load on demand:

  • Read("${CLAUDE_PLUGIN_ROOT}/skills/quality-gates/references/unified-scoring-framework.md") for dimensions, weights, grade thresholds, and improvement prioritization.
  • Read("${CLAUDE_SKILL_DIR}/references/quality-model.md") for verify-specific extensions (Visual dimension).
  • Read("${CLAUDE_SKILL_DIR}/references/grading-rubric.md") for per-agent scoring criteria.


Evidence & Test Execution

Load details: Read("${CLAUDE_SKILL_DIR}/rules/evidence-collection.md") for git commands, test execution patterns, metrics tracking, and post-verification feedback.


Policy-as-Code

Load details: Read("${CLAUDE_SKILL_DIR}/references/policy-as-code.md") for configuration.

Define verification rules in .claude/policies/verification-policy.json:

{
  "thresholds": {
    "composite_minimum": 6.0,
    "security_minimum": 7.0,
    "coverage_minimum": 70
  },
  "blocking_rules": [
    {"dimension": "security", "below": 5.0, "action": "block"}
  ]
}
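A sketch of how these thresholds and blocking rules might be evaluated against agent scores. The function and its return shape are assumptions; only the JSON field names come from the example above:

```python
import json

def evaluate_policy(policy: dict, scores: dict) -> list[str]:
    """Return blocker messages; an empty list means verification may proceed."""
    blockers = []
    t = policy.get("thresholds", {})
    if scores.get("composite", 10.0) < t.get("composite_minimum", 0.0):
        blockers.append("composite below minimum")
    for rule in policy.get("blocking_rules", []):
        dim = rule["dimension"]
        if scores.get(dim, 10.0) < rule["below"] and rule.get("action") == "block":
            blockers.append(f"{dim} score below {rule['below']}")
    return blockers

# The policy from .claude/policies/verification-policy.json above:
policy = json.loads("""{
  "thresholds": {"composite_minimum": 6.0, "security_minimum": 7.0, "coverage_minimum": 70},
  "blocking_rules": [{"dimension": "security", "below": 5.0, "action": "block"}]
}""")
```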

Report Format

Load details: Read("${CLAUDE_SKILL_DIR}/references/report-template.md") for full format. Summary:

# Feature Verification Report

**Composite Score: [N.N]/10** (Grade: [LETTER])

## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**

References

Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):

| File | Content |
|---|---|
| verification-phases.md | 8-phase workflow, agent spawn definitions, Agent Teams mode |
| visual-capture.md | Phase 2.5 + 8.5: screenshot capture, AI vision, gallery generation, agentation loop |
| quality-model.md | Scoring dimensions and weights (8 unified) |
| grading-rubric.md | Per-agent scoring criteria |
| report-template.md | Full report format with visual evidence section |
| alternative-comparison.md | Approach comparison template |
| orchestration-mode.md | Agent Teams vs Task Tool |
| policy-as-code.md | Verification policy configuration |
| verification-checklist.md | Pre-flight checklist |

Rules

Load on demand with Read("${CLAUDE_SKILL_DIR}/rules/<file>"):

| File | Content |
|---|---|
| scoring-rubric.md | Composite scoring, grades, verdicts |
| evidence-collection.md | Evidence gathering and test patterns |

Verification Gate (Cross-Cutting)

Load Read("${CLAUDE_PLUGIN_ROOT}/skills/shared/rules/verification-gate.md") — the minimum 5-step gate that applies to ALL completion claims across all skills. This is non-negotiable: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.

Anti-Sycophancy Protocol

Load Read("${CLAUDE_PLUGIN_ROOT}/skills/shared/rules/anti-sycophancy.md") — all verification agents report findings directly without performative agreement. "Should be fine" is not evidence. "Tests pass (exit 0, 47/47)" is.

Agent Status Protocol

All verification agents MUST report using the standardized protocol: Read("${CLAUDE_PLUGIN_ROOT}/agents/shared/status-protocol.md"). Never report DONE if concerns exist. Never silently produce work you're unsure about.


Agent Coordination

SendMessage (Cross-Agent Findings)

When a security agent finds a critical issue, share it with other verification agents:

SendMessage(to="test-generator", message="Security: SQL injection in user_service.py:88 — add parameterized query test")
SendMessage(to="code-quality-reviewer", message="Security finding at user_service.py:88 — flag in review")

Skill Chain

After verification, chain to commit if all gates pass:

TaskCreate(subject="Commit verified changes", activeForm="Committing", addBlockedBy=[verify_task_id])
# Then: /ork:commit
Related skills:

  • ork:implement - Full implementation with verification
  • ork:review-pr - PR-specific verification
  • testing-unit / testing-integration / testing-e2e - Test execution patterns
  • ork:quality-gates - Quality gate patterns
  • browser-tools - Browser automation for visual capture

Version: 4.2.0 (March 2026) — Added progressive output for incremental agent scores


Rules (2)

Evidence Collection Patterns — HIGH

Evidence Collection Patterns

Phase 1: Context Gathering

Run these commands in parallel in ONE message:

git diff main --stat
git log main..HEAD --oneline
git diff main --name-only | sort -u

Incorrect:

# Sequential — wastes time, no coverage data
cd backend && pytest tests/
cd frontend && npm test

Correct:

# Parallel with coverage — run both in ONE message
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage

Phase 3: Parallel Test Execution

Run backend and frontend tests in parallel:

# PARALLEL - Backend and frontend
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage

Phase 7: Metrics Tracking

Store verification metrics in memory for trend analysis:

mcp__memory__create_entities(entities=[{
  "name": "verification-{date}-{feature}",
  "entityType": "VerificationMetrics",
  "observations": [f"composite_score: {score}", ...]
}])

Query trends: mcp__memory__search_nodes(query="VerificationMetrics")

Phase 2.5: Visual Evidence Collection

Run in parallel with Phase 2 agents. Auto-detects frontend framework and captures screenshots.

Incorrect:

# Manual screenshots with no structure
open http://localhost:3000
# Take manual screenshot...

Correct:

# Automated visual capture with AI evaluation
Agent(
  subagent_type="general-purpose",
  prompt="Visual capture: detect framework, start server, screenshot routes via agent-browser, evaluate with Claude vision, generate gallery.html",
  run_in_background=True
)

Output structure:

verification-output/{timestamp}/
├── screenshots/          (PNGs per route, base64 in gallery)
├── ai-evaluations/       (JSON per screenshot with score + issues)
├── annotations/          (before/after if agentation used)
│   ├── before/
│   └── after/
└── gallery.html          (self-contained, open in browser)

Phase 8.5: Post-Verification Feedback

After report compilation, store verification scores in the memory graph for KPI baseline tracking:

Query trends: mcp__memory__search_nodes(query="VerificationScores")

Scoring Rubric — HIGH

Scoring Rubric

Composite Score

Each agent produces a 0-10 score with decimals for nuance. The composite score is a weighted sum using the weights from Quality Model.
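Sketched below is that weighted sum, using the adjusted weights from the Quality Model reference (the weights are from this document; the function itself is illustrative):

```python
# Adjusted weights from the Quality Model (Visual active); they sum to 1.00.
WEIGHTS = {
    "correctness": 0.14, "maintainability": 0.14, "performance": 0.11,
    "security": 0.18, "scalability": 0.09, "testability": 0.12,
    "compliance": 0.12, "visual": 0.10,
}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension 0-10 scores."""
    return round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 2)
```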

Grade Thresholds

<!-- Canonical source: ../references/quality-model.md — keep in sync -->

| Grade | Score Range | Verdict |
|---|---|---|
| A+ | 9.0-10.0 | EXCELLENT |
| A | 8.0-8.9 | READY FOR MERGE |
| B | 7.0-7.9 | READY FOR MERGE |
| C | 6.0-6.9 | IMPROVEMENTS RECOMMENDED |
| D | 5.0-5.9 | IMPROVEMENTS RECOMMENDED |
| F | 0.0-4.9 | BLOCKED |
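The threshold bands above translate directly into a lookup (an illustrative sketch; the function name is not part of the skill):

```python
def grade(composite: float) -> tuple[str, str]:
    """Map a composite 0-10 score to (grade, verdict) per the thresholds above."""
    bands = [
        (9.0, "A+", "EXCELLENT"),
        (8.0, "A",  "READY FOR MERGE"),
        (7.0, "B",  "READY FOR MERGE"),
        (6.0, "C",  "IMPROVEMENTS RECOMMENDED"),
        (5.0, "D",  "IMPROVEMENTS RECOMMENDED"),
    ]
    for floor, letter, verdict in bands:
        if composite >= floor:
            return letter, verdict
    return "F", "BLOCKED"
```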

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Scoring scale | 0-10 with decimals | Nuanced, not binary |
| Improvement priority | Impact / Effort ratio | Do high-value first |
| Alternative comparison | Optional phase | Only when multiple valid approaches |
| Metrics persistence | Memory MCP | Track trends over time |

Incorrect:

Security: "looks fine"  → 8/10    # No evidence, subjective
Performance: "fast enough" → 7/10  # No benchmarks

Correct:

Security: "11/11 injection tests pass, 13 deny patterns, 0 CVEs" → 9/10
Performance: "p99 latency 142ms (budget: 300ms), 0 N+1 queries" → 8.5/10

Improvement Suggestions

Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.
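The priority ordering can be sketched as a one-line sort (illustrative; the suggestion dict shape is an assumption):

```python
def prioritize(suggestions: list[dict]) -> list[dict]:
    """Sort by priority = impact / effort, highest first (both on 1-5 scales)."""
    return sorted(suggestions, key=lambda s: s["impact"] / s["effort"], reverse=True)
```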

Blocking Rules

Verification can be blocked by policy-as-code rules. See Policy-as-Code for configuration of composite minimums, dimension minimums, and blocking rules.


References (9)

Alternative Comparison

Alternative Comparison

Evaluate current implementation against alternative approaches.

When to Compare

  • Multiple valid architectures exist
  • User asks "is this the best way?"
  • Major patterns were chosen (ORM vs raw SQL, REST vs GraphQL)
  • Performance/scalability concerns raised

Comparison Criteria

For Each Alternative

| Criterion | Weight | Description |
|---|---|---|
| Effort | 30% | Implementation complexity (1-5 scale) |
| Risk | 25% | Technical and operational risk (1-5 scale) |
| Benefit | 45% | Value delivered, performance, maintainability (1-5 scale) |

Migration Cost

| Factor | Estimate |
|---|---|
| Code changes | Files/lines affected |
| Data migration | Schema changes, backfill |
| Testing | New test coverage needed |
| Rollback risk | Reversibility |

Decision Matrix Format

| Approach | Effort | Risk | Benefit | Score |
|---|---|---|---|---|
| Current | N | N | N | (E\*0.3 + R\*0.25 + B\*0.45) |
| Alt A | N | N | N | calculated |
| Alt B | N | N | N | calculated |

Note: Higher effort and risk are bad (invert for scoring), higher benefit is good.

Recommendation Formula:

Score = (5 - Effort) * 0.3 + (5 - Risk) * 0.25 + Benefit * 0.45
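The formula, as runnable code (the inputs are the 1-5 ratings from the criteria table; the function name is illustrative):

```python
def alternative_score(effort: int, risk: int, benefit: int) -> float:
    """Score = (5 - Effort)*0.3 + (5 - Risk)*0.25 + Benefit*0.45.
    Effort and risk are inverted because higher values are worse."""
    return round((5 - effort) * 0.3 + (5 - risk) * 0.25 + benefit * 0.45, 2)
```

For example, a low-effort, low-risk, high-benefit alternative (2, 2, 4) scores 3.45, beating a high-effort option (4, 3, 5) at 3.05.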

Output Template

### Alternative Comparison: [Topic]

**Current Approach:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]

**Alternative A:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]
- Migration effort: [1-5]

**Recommendation:** [Keep current / Switch to Alt A]
**Justification:** [1-2 sentences]

Grading Rubric

Verification Grading Rubric

0-10 scoring criteria for each verification dimension.

Score Levels

| Range | Level | Description |
|---|---|---|
| 0-3 | Poor | Critical issues, blocks merge |
| 4-6 | Adequate | Functional but needs improvement |
| 7-9 | Good | Ready for merge, minor suggestions |
| 10 | Excellent | Exemplary, reference quality |

Dimension Rubrics

<!-- Weights from canonical source: ../references/quality-model.md — keep in sync -->

Correctness (Weight: 14%)

| Score | Criteria |
|---|---|
| 10 | All functional requirements met, edge cases handled, zero regressions |
| 8-9 | Core requirements met, most edge cases handled |
| 6-7 | Core paths work, some edge cases missing |
| 4-5 | Partial functionality, notable gaps |
| 1-3 | Broken core paths |
| 0 | Does not run |

Maintainability (Weight: 14%)

| Score | Criteria |
|---|---|
| 10 | Zero lint errors/warnings, strict types, exemplary patterns, low complexity |
| 8-9 | Zero errors, < 5 warnings, minimal any, good patterns |
| 6-7 | 1-3 errors, some warnings, acceptable patterns |
| 4-5 | 4-10 errors, pattern issues, needs refactoring |
| 1-3 | Many errors, poor patterns, high complexity |
| 0 | Lint/type check fails to run |

Performance (Weight: 11%)

| Score | Criteria |
|---|---|
| 10 | p99 within budget, zero N+1, optimal caching, efficient resource usage |
| 8-9 | Good latency, no N+1, reasonable caching |
| 6-7 | Acceptable latency, minor inefficiencies |
| 4-5 | Notable bottlenecks, missing caching |
| 1-3 | Severe bottlenecks, resource leaks |
| 0 | Unresponsive or crashes under load |

Security (Weight: 18%)

| Score | Criteria |
|---|---|
| 10 | No vulnerabilities, all OWASP compliant, secure by design |
| 8-9 | No critical/high, all OWASP, excellent practices |
| 6-7 | No critical, 1-2 high, most OWASP compliant |
| 4-5 | No critical, 3-5 high, some gaps |
| 1-3 | 1+ critical or many high vulnerabilities |
| 0 | Multiple critical, secrets exposed |

Scalability (Weight: 9%)

| Score | Criteria |
|---|---|
| 10 | Horizontal scaling ready, stateless design, efficient data patterns |
| 8-9 | Good scaling patterns, minor bottlenecks |
| 6-7 | Scales for current needs, some concerns |
| 4-5 | Will hit limits soon, needs rework |
| 1-3 | Single-instance only, monolithic state |
| 0 | Cannot handle production load |

Testability (Weight: 12%)

| Score | Criteria |
|---|---|
| 10 | >= 90% coverage, meaningful assertions, edge cases, no flaky tests |
| 8-9 | >= 80% coverage, good assertions, critical paths |
| 6-7 | >= 70% coverage (target), basic assertions |
| 4-5 | 50-69% coverage |
| 1-3 | 30-49% coverage |
| 0 | < 30% coverage or tests fail to run |

Compliance (Weight: 12%)

| Score | Criteria |
|---|---|
| 10 | Perfect REST/UI contracts, RFC 9457 errors, full Zod, WCAG AA |
| 8-9 | Good conventions, proper validation, accessibility |
| 6-7 | Acceptable patterns, minor inconsistencies |
| 4-5 | Several convention violations |
| 1-3 | Poor API/UI design, missing validation |
| 0 | Broken contracts or inaccessible |

Visual (Weight: 10%)

| Score | Criteria |
|---|---|
| 10 | Pixel-perfect layout, full a11y, complete content, responsive |
| 8-9 | Good layout, minor visual issues, WCAG AA |
| 6-7 | Acceptable layout, some a11y gaps |
| 4-5 | Layout issues, missing content, a11y problems |
| 1-3 | Broken layout, major content missing |
| 0 | Page fails to render |

Note: Visual weight is 0.00 for API-only projects — redistributed proportionally. See Quality Model.
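A sketch of that proportional redistribution, using the adjusted weights listed in this document (the function is illustrative, not the Quality Model's actual code):

```python
# Adjusted weights (Visual active) from the dimension rubrics above.
WEIGHTS = {
    "correctness": 0.14, "maintainability": 0.14, "performance": 0.11,
    "security": 0.18, "scalability": 0.09, "testability": 0.12,
    "compliance": 0.12, "visual": 0.10,
}

def redistribute_without_visual(weights: dict[str, float]) -> dict[str, float]:
    """Drop Visual and scale the remaining weights proportionally back to 1.0."""
    remaining = {d: w for d, w in weights.items() if d != "visual"}
    total = sum(remaining.values())  # 0.90 with the weights above
    return {d: round(w / total, 4) for d, w in remaining.items()}
```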


Grade Interpretation

<!-- Canonical source: quality-model.md — keep in sync -->

| Composite | Grade | Verdict |
|---|---|---|
| 9.0-10.0 | A+ | EXCELLENT |
| 8.0-8.9 | A | READY FOR MERGE |
| 7.0-7.9 | B | READY FOR MERGE |
| 6.0-6.9 | C | IMPROVEMENTS RECOMMENDED |
| 5.0-5.9 | D | IMPROVEMENTS RECOMMENDED |
| 0.0-4.9 | F | BLOCKED |

Orchestration Mode

<!-- SHARED: keep in sync with ../../../assess/references/orchestration-mode.md -->

Orchestration Mode Selection

Shared logic for choosing between Agent Teams and Task tool orchestration in assess/verify skills.

Environment Check

# Agent Teams is GA since CC 2.1.33
import os
force_task_tool = os.environ.get("ORCHESTKIT_FORCE_TASK_TOOL") == "1"

if force_task_tool:
    mode = "task_tool"
else:
    # Teams available by default — use for full multi-dimensional work
    mode = "agent_teams" if scope == "full" else "task_tool"

Decision Rules

  1. Full assessment/verification scope --> Agent Teams mode (GA since CC 2.1.33)
  2. Quick/single-dimension scope --> Task tool mode
  3. ORCHESTKIT_FORCE_TASK_TOOL=1 --> Task tool (override)

Agent Teams vs Task Tool

| Aspect | Task Tool (Star) | Agent Teams (Mesh) |
|---|---|---|
| Topology | All agents report to lead | Agents communicate with each other |
| Finding correlation | Lead cross-references after completion | Agents share findings in real-time |
| Cross-domain overlap | Independent scoring | Agents alert each other about overlapping concerns |
| Cost | ~200K tokens | ~500K tokens |
| Best for | Focused/single-dimension work | Full multi-dimensional assessment/verification |

Fallback

If Agent Teams encounters issues mid-execution, fall back to Task tool for remaining work. This is safe because both modes produce the same output format (dimensional scores 0-10).

Context Window Note

For full codebase work (>20 files), use the 1M context window to avoid agent context exhaustion. On 200K context, scope discovery should limit files to prevent overflow.

Policy As Code

Policy-as-Code

Define verification policies as machine-readable configuration.

Policy Structure

version: "1.0"
name: policy-name
description: What this policy enforces

thresholds:
  composite_minimum: 6.0
  coverage_minimum: 70

rules:
  blockers: []    # Fail verification
  warnings: []    # Note but continue
  info: []        # Informational only

Rule Definition

Blocker Rules (Must Pass)

blockers:
  - dimension: security
    condition: below
    value: 5.0
    message: "Security score below minimum"

  - check: critical_vulnerabilities
    condition: above
    value: 0
    message: "Critical vulnerabilities found"

  - check: type_errors
    condition: above
    value: 0
    message: "TypeScript errors must be zero"

Warning Rules (Should Fix)

warnings:
  - dimension: code_quality
    condition: below
    value: 7.0
    message: "Code quality could be improved"

  - check: test_coverage
    condition: below
    value: 80
    message: "Coverage below recommended 80%"

Info Rules (Awareness)

info:
  - check: todo_count
    condition: above
    value: 5
    message: "Multiple TODOs found in code"

Threshold Configuration

| Threshold | Type | Description |
|---|---|---|
| composite_minimum | float | Overall score minimum (0-10) |
| coverage_minimum | int | Test coverage percentage |
| critical_vulnerabilities | int | Max critical vulns (0) |
| high_vulnerabilities | int | Max high vulns |
| lint_errors | int | Max lint errors (0) |
| type_errors | int | Max type errors (0) |

Custom Rules

custom_rules:
  - name: no_console_log
    pattern: "console\\.log"
    file_glob: "**/*.ts"
    exclude: ["**/*.test.ts"]
    severity: warning
    message: "Remove console.log from production"
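A minimal sketch of how such a custom rule could be applied, assuming glob-style path matching and regex content matching as the rule fields suggest (note that Python's fnmatch treats `**` loosely, so this is illustrative, not a full glob engine):

```python
import fnmatch
import re

def rule_matches(rule: dict, path: str, content: str) -> bool:
    """True when the file path matches the rule's glob (and no exclude glob)
    and the content contains the rule's pattern."""
    if not fnmatch.fnmatch(path, rule["file_glob"]):
        return False
    if any(fnmatch.fnmatch(path, g) for g in rule.get("exclude", [])):
        return False
    return re.search(rule["pattern"], content) is not None

# The no_console_log rule from the YAML above:
rule = {
    "name": "no_console_log", "pattern": r"console\.log",
    "file_glob": "**/*.ts", "exclude": ["**/*.test.ts"],
}
```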

Policy Location

Store at: .claude/policies/verification-policy.yaml

Multiple policies: .claude/policies/{name}-policy.yaml

Quality Model

Quality Model (verify)

Extends the unified scoring framework with Visual as the 8th dimension.

Canonical source: quality-gates/references/unified-scoring-framework.md. Load: Read("${CLAUDE_PLUGIN_ROOT}/skills/quality-gates/references/unified-scoring-framework.md")

verify-Specific Extensions

Visual Dimension (8th)

| Dimension | Weight | What It Measures |
|---|---|---|
| Visual | 0.10 | Layout correctness, a11y, content completeness, responsiveness |

When Visual is active, base dimensions scale: adjusted = base_weight * (1.0 / 1.10). When Visual is skipped (API-only), base weights stay at 1.00.

Dimensions Used (with Visual)

| Dimension | Adjusted Weight |
|---|---|
| Correctness | 0.14 |
| Maintainability | 0.14 |
| Performance | 0.11 |
| Security | 0.18 |
| Scalability | 0.09 |
| Testability | 0.12 |
| Compliance | 0.12 |
| Visual | 0.10 |

See unified framework for grade thresholds, improvement prioritization, effort/impact scales, and blocking rules.

Report Template

Verification Report Template

Copy this template and fill in results from parallel agent verification.

Quick Copy Template

# Feature Verification Report

**Date**: [TODAY'S DATE]
**Branch**: [branch-name]
**Feature**: [feature description]
**Reviewer**: Claude Code with [N] parallel subagents
**Verification Duration**: [X minutes]

---

## Summary

**Status**: [READY FOR MERGE | NEEDS ATTENTION | BLOCKED]

[1-2 sentence summary of verification results]

---

## Agent Results

### 1. Code Quality (code-quality-reviewer)

| Check | Tool | Exit Code | Errors | Warnings | Status |
|-------|------|-----------|--------|----------|--------|
| Backend Lint | Ruff | 0/1 | N | N | PASS/FAIL |
| Backend Types | ty | 0/1 | N | N | PASS/FAIL |
| Frontend Lint | Biome | 0/1 | N | N | PASS/FAIL |
| Frontend Types | tsc | 0/1 | N | N | PASS/FAIL |

**Pattern Compliance:**
- [ ] No `console.log` in production code
- [ ] No `any` types in TypeScript
- [ ] Exhaustive switches with `assertNever`
- [ ] SOLID principles followed
- [ ] Cyclomatic complexity < 10

**Findings:**
- [List any pattern violations]

---

### 2. Security Audit (security-auditor)

| Check | Tool | Critical | High | Medium | Low | Status |
|-------|------|----------|------|--------|-----|--------|
| JS Dependencies | npm audit | N | N | N | N | PASS/BLOCK |
| Python Dependencies | pip-audit | N | N | N | N | PASS/BLOCK |
| Secrets Scan | grep/gitleaks | N/A | N/A | N/A | N | PASS/BLOCK |

**OWASP Top 10 Compliance:**
- [ ] A01: Broken Access Control
- [ ] A02: Cryptographic Failures
- [ ] A03: Injection
- [ ] A04: Insecure Design
- [ ] A05: Security Misconfiguration
- [ ] A06: Vulnerable Components
- [ ] A07: Auth Failures
- [ ] A08: Data Integrity Failures
- [ ] A09: Logging Failures
- [ ] A10: SSRF

**Findings:**
- [List any security issues]

---

### 3. Test Coverage (test-generator)

| Suite | Total | Passed | Failed | Skipped | Coverage | Target | Status |
|-------|-------|--------|--------|---------|----------|--------|--------|
| Backend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| Backend Integration | N | N | N | N | X% | 70% | PASS/FAIL |
| Frontend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| E2E | N | N | N | N | N/A | N/A | PASS/FAIL |

**Test Quality:**
- [ ] Meaningful assertions (not just `assert result`)
- [ ] Edge cases covered (empty, error, timeout)
- [ ] No flaky tests (no sleep, no timing deps)
- [ ] MSW used for API mocking (not jest.mock)

**Coverage Gaps:**
- [List uncovered critical paths]

---

### 4. API Compliance (backend-system-architect)

| Check | Compliant | Issues |
|-------|-----------|--------|
| REST Conventions | Yes/No | [details] |
| Pydantic v2 Validation | Yes/No | [details] |
| RFC 9457 Error Handling | Yes/No | [details] |
| Async Timeout Protection | Yes/No | [details] |
| No N+1 Queries | Yes/No | [details] |

**Findings:**
- [List any API compliance issues]

---

### 5. UI Compliance (frontend-ui-developer)

| Check | Compliant | Issues |
|-------|-----------|--------|
| React 19 APIs (useOptimistic, useFormStatus, use()) | Yes/No | [details] |
| Zod Validation on API Responses | Yes/No | [details] |
| Exhaustive Type Checking | Yes/No | [details] |
| Skeleton Loading States | Yes/No | [details] |
| Prefetching on Navigation | Yes/No | [details] |
| WCAG 2.1 AA Accessibility | Yes/No | [details] |

**Findings:**
- [List any UI compliance issues]

---

## Quality Gates Summary

| Gate | Required | Actual | Status |
|------|----------|--------|--------|
| Test Coverage | >= 70% | X% | PASS/FAIL |
| Security Critical | 0 | N | PASS/FAIL |
| Security High | <= 5 | N | PASS/FAIL |
| Type Errors | 0 | N | PASS/FAIL |
| Lint Errors | 0 | N | PASS/FAIL |

**Overall Gate Status**: [ALL PASS | SOME FAIL]

---

## Blockers (Must Fix Before Merge)

1. [Blocker description with file:line reference]
2. [Blocker description with file:line reference]

---

## Suggestions (Non-Blocking)

1. [Suggestion for improvement]
2. [Suggestion for improvement]

---

## Visual Verification

**Visual Score: [N.N]/10**

| Route | Screenshot | AI Score | Issues | Status |
|-------|-----------|----------|--------|--------|
| / | [thumbnail] | N.N/10 | N | PASS/WARN/FAIL |
| /dashboard | [thumbnail] | N.N/10 | N | PASS/WARN/FAIL |
| /settings | [thumbnail] | N.N/10 | N | PASS/WARN/FAIL |

**Gallery**: Open `verification-output/{timestamp}/gallery.html` for full screenshots with AI evaluations.

### Agentation Annotations (if applicable)

| Annotation | Route | Resolution | Before/After |
|-----------|-------|------------|--------------|
| [user comment] | /dashboard | [fix summary] | [see gallery] |

---

## Evidence Artifacts

| Artifact | Location | Generated |
|----------|----------|-----------|
| Test Results | `/tmp/test_results.log` | [timestamp] |
| Coverage Report | `/tmp/coverage.json` | [timestamp] |
| Security Scan | `/tmp/security_audit.json` | [timestamp] |
| Lint Report | `/tmp/lint_results.log` | [timestamp] |
| Visual Gallery | `verification-output/{timestamp}/gallery.html` | [timestamp] |
| Screenshots | `verification-output/{timestamp}/screenshots/` | [timestamp] |
| AI Evaluations | `verification-output/{timestamp}/ai-evaluations/` | [timestamp] |

---

## Verification Metadata

- **Agents Used**: 7 (code-quality-reviewer, security-auditor, test-generator, backend-system-architect, frontend-ui-developer, python-performance-engineer, visual-capture)
- **Parallel Execution**: Yes
- **Total Tool Calls**: ~N
- **Context Usage**: ~N tokens

Status Definitions

| Status | Emoji | Meaning | Action Required |
|--------|-------|---------|-----------------|
| READY FOR MERGE | Green | All checks pass, no blockers | Approve PR |
| NEEDS ATTENTION | Yellow | Minor issues found | Review suggestions, optionally fix |
| BLOCKED | Red | Critical issues found | Must fix before merge |

Severity Levels

| Level | Threshold | Action | Blocks Merge |
|-------|-----------|--------|--------------|
| Critical | Any | Fix immediately | YES |
| High | > 5 | Fix before merge | YES |
| Medium | > 20 | Should fix | NO (with justification) |
| Low | > 50 | Nice to have | NO |
| Info | N/A | Informational | NO |

Agent Output JSON Schemas

code-quality-reviewer Output

{
  "linting": {"tool": "ruff|biome", "exit_code": 0, "errors": 0, "warnings": 0},
  "type_check": {"tool": "ty|tsc", "exit_code": 0, "errors": 0},
  "patterns": {"violations": [], "compliance": "PASS|FAIL"},
  "approval": {"status": "APPROVED|NEEDS_FIXES", "blockers": []}
}

security-auditor Output

{
  "scan_summary": {"files_scanned": 100, "vulnerabilities_found": 0},
  "critical": [],
  "high": [],
  "secrets_detected": [],
  "recommendations": [],
  "approval": {"status": "PASS|BLOCK", "blockers": []}
}

test-generator Output

{
  "coverage": {"current": 85, "target": 70, "passed": true},
  "test_summary": {"total": 100, "passed": 98, "failed": 2, "skipped": 0},
  "gaps": ["file:line - reason"],
  "quality_issues": [],
  "approval": {"status": "PASS|FAIL", "blockers": []}
}

backend-system-architect Output

{
  "api_compliance": {"rest_conventions": true, "issues": []},
  "validation": {"pydantic_v2": true, "issues": []},
  "error_handling": {"rfc9457": true, "issues": []},
  "async_safety": {"timeouts": true, "issues": []},
  "approval": {"status": "PASS|FAIL", "blockers": []}
}

frontend-ui-developer Output

{
  "react_19": {"apis_used": ["useOptimistic"], "missing": [], "compliant": true},
  "zod_validation": {"validated_endpoints": 10, "unvalidated": []},
  "type_safety": {"exhaustive_switches": true, "any_types": 0},
  "ux_patterns": {"skeletons": true, "prefetching": true},
  "accessibility": {"wcag_issues": []},
  "approval": {"status": "PASS|FAIL", "blockers": []}
}
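Downstream, the lead folds the per-agent `approval` blocks into a single verdict. A minimal sketch assuming the schemas above (`merge_approvals` is a hypothetical helper name, not an OrchestKit API):

```python
def merge_approvals(agent_outputs: dict) -> dict:
    """Combine each agent's approval block into one verdict.
    agent_outputs maps agent name -> parsed JSON matching the schemas above."""
    blockers = []
    for agent, output in agent_outputs.items():
        approval = output.get("approval", {})
        if approval.get("status") not in ("APPROVED", "PASS"):
            found = approval.get("blockers") or ["unspecified blocker"]
            blockers.extend(f"{agent}: {b}" for b in found)
    return {"status": "BLOCKED" if blockers else "READY FOR MERGE",
            "blockers": blockers}
```
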

Verification Checklist


Pre-flight checklist for comprehensive feature verification with parallel agents.

Pre-Verification Setup

Context Gathering

  • Run git diff main --stat to understand change scope
  • Run git log main..HEAD --oneline to see commit history
  • Identify affected domains (backend/frontend/both)
  • Check for any existing failing tests

Task Creation (CC 2.1.16)

  • Create parent verification task
  • Create subtasks for each agent domain
  • Set proper dependencies if needed

Agent Dispatch Checklist

Required Agents (Full-Stack)

| Agent | Launched | Completed | Status |
|-------|----------|-----------|--------|
| code-quality-reviewer | [ ] | [ ] | Pending |
| security-auditor | [ ] | [ ] | Pending |
| test-generator | [ ] | [ ] | Pending |
| backend-system-architect | [ ] | [ ] | Pending |
| frontend-ui-developer | [ ] | [ ] | Pending |

Optional Agents (Add as Needed)

| Condition | Agent | Launched |
|-----------|-------|----------|
| AI/ML features | llm-integrator | [ ] |
| Performance-critical | frontend-performance-engineer | [ ] |
| Database changes | database-engineer | [ ] |

Quality Gate Checklist

Mandatory Gates

| Gate | Threshold | Actual | Pass |
|------|-----------|--------|------|
| Test Coverage | >= 70% | ___% | [ ] |
| Security Critical | 0 | ___ | [ ] |
| Security High | <= 5 | ___ | [ ] |
| Type Errors | 0 | ___ | [ ] |
| Lint Errors | 0 | ___ | [ ] |

Code Quality Gates

| Check | Status |
|-------|--------|
| No console.log in production | [ ] |
| No any types | [ ] |
| Exhaustive switches (assertNever) | [ ] |
| Proper error handling | [ ] |
| No hardcoded secrets | [ ] |

Frontend-Specific Gates (if applicable)

| Check | Status |
|-------|--------|
| React 19 APIs used | [ ] |
| Zod validation on API responses | [ ] |
| Skeleton loading states | [ ] |
| Prefetching on links | [ ] |
| WCAG 2.1 AA compliance | [ ] |

Backend-Specific Gates (if applicable)

| Check | Status |
|-------|--------|
| REST conventions followed | [ ] |
| Pydantic v2 validation | [ ] |
| RFC 9457 error handling | [ ] |
| Async timeout protection | [ ] |
| No N+1 queries | [ ] |

Evidence Collection

Required Evidence

  • Test results with exit code
  • Coverage report (JSON format)
  • Linting results
  • Type checking results
  • Security scan results

Optional Evidence

  • E2E test screenshots
  • Performance benchmarks
  • Bundle size analysis
  • Accessibility audit

Report Generation

Report Sections

  • Summary (READY/NEEDS ATTENTION/BLOCKED)
  • Agent Results (all 5 domains)
  • Quality Gates table
  • Blockers list (if any)
  • Suggestions list
  • Evidence links

Final Steps

  • Update all task statuses to completed
  • Store verification evidence in context
  • Generate final report markdown

Quick Reference: Agent Prompts

code-quality-reviewer

Focus: Lint, type check, anti-patterns, SOLID, complexity

security-auditor

Focus: Dependency audit, secrets, OWASP Top 10, rate limiting

test-generator

Focus: Coverage gaps, test quality, edge cases, flaky tests

backend-system-architect

Focus: REST, Pydantic v2, RFC 9457, async safety, N+1

frontend-ui-developer

Focus: React 19, Zod, exhaustive types, skeletons, prefetch, a11y

Troubleshooting

Agent Not Responding

  1. Check if agent was launched with run_in_background=True
  2. Verify agent name matches exactly
  3. Check for context window limits

Tests Failing

  1. Run tests locally first
  2. Check for missing dependencies
  3. Verify test database state
  4. Look for timing-dependent tests

Coverage Below Threshold

  1. Identify uncovered files
  2. Check for excluded patterns
  3. Focus on critical paths first

Verification Phases

Verification Phases — Detailed Workflow

Phase Overview

| Phase | Activities | Output |
|-------|------------|--------|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 2.5 Visual Capture | Screenshot routes, AI vision eval | Gallery + visual score |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts + gallery.html | Final report |
| 8.5 Agentation Loop | User annotates, ui-feedback fixes | Before/after diffs |

Phase 2: Parallel Agent Dispatch (6 Agents)

Launch ALL agents in ONE message with run_in_background=True and max_turns=25. Pass model=MODEL_OVERRIDE when user specifies --model=opus (CC 2.1.72).

| Agent | Focus | Output |
|-------|-------|--------|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |

Use python-performance-engineer for backend-focused verification or frontend-performance-engineer for frontend-focused verification. See Quality Model for Performance (0.11) and Scalability (0.09) weights.

See Grading Rubric for detailed scoring criteria.

Task Tool Mode (Default)

# PARALLEL — All 6 in ONE message
Agent(
  subagent_type="code-quality-reviewer",
  model=MODEL_OVERRIDE,  # None inherits default; "opus" for thorough verification (CC 2.1.72)
  prompt="""# Cache-optimized: stable content first (CC 2.1.72)
  Verify code quality. Score 0-10.
  Check: lint errors, type coverage, cyclomatic complexity, DRY, SOLID.
  Budget: 15 tool calls max.
  Return: score (0-10), reasoning, evidence, 2-3 improvement suggestions.
  Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
  run_in_background=True, max_turns=25
)
Agent(
  subagent_type="security-auditor",
  model=MODEL_OVERRIDE,
  prompt="""# Cache-optimized: stable content first (CC 2.1.72)
  Security verification. Score 0-10.
  Check: OWASP Top 10, secrets in code, dependency CVEs, auth patterns.
  Budget: 15 tool calls max.
  Return: score (0-10), vulnerabilities found, severity ratings.
  Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
  run_in_background=True, max_turns=25
)
Agent(
  subagent_type="test-generator",
  model=MODEL_OVERRIDE,
  prompt="""# Cache-optimized: stable content first (CC 2.1.72)
  Verify test coverage. Score 0-10.
  Check: test existence, type matching, quality, edge cases, coverage %.
  Run existing tests and report results.
  Budget: 15 tool calls max.
  Return: score (0-10), coverage %, gaps identified.
  Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
  run_in_background=True, max_turns=25
)
Agent(
  subagent_type="backend-system-architect",
  model=MODEL_OVERRIDE,
  prompt="""# Cache-optimized: stable content first (CC 2.1.72)
  Verify API design and backend patterns. Score 0-10.
  Check: REST conventions, async patterns, transaction boundaries, error handling.
  Budget: 15 tool calls max.
  Return: score (0-10), pattern compliance, issues found.
  Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
  run_in_background=True, max_turns=25
)
Agent(
  subagent_type="frontend-ui-developer",
  model=MODEL_OVERRIDE,
  prompt="""# Cache-optimized: stable content first (CC 2.1.72)
  Verify frontend implementation. Score 0-10.
  Check: React 19 patterns, Zod schemas, accessibility (WCAG 2.1 AA), loading states.
  Budget: 15 tool calls max.
  Return: score (0-10), pattern compliance, a11y issues.
  Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
  run_in_background=True, max_turns=25
)
Agent(
  subagent_type="python-performance-engineer",
  model=MODEL_OVERRIDE,
  prompt="""# Cache-optimized: stable content first (CC 2.1.72)
  Verify performance and scalability. Score 0-10.
  Check: latency hotspots, N+1 queries, resource usage, caching, scaling patterns.
  Budget: 15 tool calls max.
  Return: score (0-10), bottlenecks found, optimization suggestions.
  Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
  run_in_background=True, max_turns=25
)

Agent Teams Alternative

In Agent Teams mode, form a verification team where agents share findings and coordinate scoring:

TeamCreate(team_name="verify-{feature}", description="Verify {feature}")

Agent(subagent_type="code-quality-reviewer", name="quality-verifier",
     team_name="verify-{feature}", model=MODEL_OVERRIDE,
     prompt="""# Cache-optimized: stable content first (CC 2.1.72)
     Verify code quality. Score 0-10.
     When you find patterns that affect security, message security-verifier.
     When you find untested code paths, message test-verifier.
     Share your quality score with all teammates for composite calculation.
     Feature: {feature}.""")

Agent(subagent_type="security-auditor", name="security-verifier",
     team_name="verify-{feature}", model=MODEL_OVERRIDE,
     prompt="""# Cache-optimized: stable content first (CC 2.1.72)
     Security verification. Score 0-10.
     When quality-verifier flags security-relevant patterns, investigate deeper.
     When you find vulnerabilities in API endpoints, message api-verifier.
     Share severity findings with test-verifier for test gap analysis.
     Feature: {feature}.""")

Agent(subagent_type="test-generator", name="test-verifier",
     team_name="verify-{feature}", model=MODEL_OVERRIDE,
     prompt="""# Cache-optimized: stable content first (CC 2.1.72)
     Verify test coverage. Score 0-10.
     When quality-verifier or security-verifier flag untested paths, quantify the gap.
     Run existing tests and report coverage metrics.
     Message the lead with coverage data for composite scoring.
     Feature: {feature}.""")

Agent(subagent_type="backend-system-architect", name="api-verifier",
     team_name="verify-{feature}", model=MODEL_OVERRIDE,
     prompt="""# Cache-optimized: stable content first (CC 2.1.72)
     Verify API design and backend patterns. Score 0-10.
     When security-verifier flags endpoint issues, validate and score.
     Share API compliance findings with ui-verifier for consistency check.
     Feature: {feature}.""")

Agent(subagent_type="frontend-ui-developer", name="ui-verifier",
     team_name="verify-{feature}", model=MODEL_OVERRIDE,
     prompt="""# Cache-optimized: stable content first (CC 2.1.72)
     Verify frontend implementation. Score 0-10.
     When api-verifier shares API patterns, verify frontend matches.
     Check React 19 patterns, accessibility, and loading states.
     Share findings with quality-verifier for overall assessment.
     Feature: {feature}.""")

# Conditional 6th agent — use python-performance-engineer for backend,
# frontend-performance-engineer for frontend
Agent(subagent_type="python-performance-engineer", name="perf-verifier",
     team_name="verify-{feature}", model=MODEL_OVERRIDE,
     prompt="""# Cache-optimized: stable content first (CC 2.1.72)
     Verify performance and scalability. Score 0-10.
     Assess latency, resource usage, caching, and scaling patterns.
     When security-verifier flags resource-intensive endpoints, profile them.
     Share performance findings with api-verifier and quality-verifier.
     Feature: {feature}.""")

Team teardown after report compilation:

# After composite grading and report generation
SendMessage(type="shutdown_request", recipient="quality-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="security-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="test-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="api-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="ui-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="perf-verifier", content="Verification complete")
TeamDelete()

# Worktree cleanup (CC 2.1.72)
ExitWorktree(action="keep")

Fallback: If team formation fails, use standard Phase 2 Task spawns above.

Manual cleanup: If TeamDelete() doesn't terminate all agents, press Ctrl+F twice to force-kill remaining background agents.


Phase 2.5: Visual Capture (Parallel with Phase 2)

Runs as a 7th parallel agent alongside the 6 verification agents. See Visual Capture for full details.

# Launch IN THE SAME MESSAGE as Phase 2 agents
Agent(
  subagent_type="general-purpose",
  description="Visual capture and AI evaluation",
  prompt="""Visual verification capture for: {feature}
  1. Detect project type from package.json
  2. Start dev server (auto-detect framework)
  3. Discover routes (framework-aware scan)
  4. Use agent-browser to screenshot each route (max 20)
  5. Read each screenshot PNG for AI vision evaluation
  6. Score layout, accessibility, content completeness (0-10 per route)
  7. Read gallery template from ${CLAUDE_SKILL_DIR}/assets/gallery-template.html
  8. Generate gallery.html with base64-embedded screenshots
  9. Write to verification-output/{timestamp}/gallery.html
  10. Kill dev server

  If no frontend detected, write skip notice and exit.
  If server fails to start, write warning and exit.
  Never block — graceful degradation only.""",
  run_in_background=True, max_turns=30
)

Output: verification-output/{timestamp}/ folder with screenshots, AI evaluations (JSON), and gallery.html.


Phase 8.5: Agentation Visual Feedback (Opt-In)

Trigger: Only when agentation MCP is configured in .mcp.json. Runs AFTER Phase 8 report compilation.

# Check agentation availability
ToolSearch(query="select:mcp__agentation__agentation_get_all_pending")

# If available, offer user choice
AskUserQuestion(questions=[{
  "question": "Agentation detected. Annotate the live UI before finalizing?",
  "header": "Visual Feedback Loop",
  "options": [
    {"label": "Yes", "description": "I'll mark issues, ui-feedback agent fixes them, gallery updates with before/after"},
    {"label": "Skip", "description": "Finalize with current screenshots"}
  ]
}])

# If yes: watch → acknowledge → dispatch ui-feedback → re-screenshot → update gallery
# Max 3 rounds (configurable in verification-config.yaml)

Phase 4: Nuanced Grading

See Quality Model for scoring dimensions, weights, and grade interpretation. See Grading Rubric for detailed per-agent scoring criteria.
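As a sketch, the composite calculation using the Grading Rubric's dimension weights (Correctness 14%, Maintainability 14%, Performance 11%, Security 18%, Scalability 9%, Testability 12%, Compliance 12%, Visual 10%). The letter-grade cutoffs below are illustrative only; the Quality Model defines the authoritative A+ to F bands:

```python
WEIGHTS = {
    "correctness": 0.14, "maintainability": 0.14, "performance": 0.11,
    "security": 0.18, "scalability": 0.09, "testability": 0.12,
    "compliance": 0.12, "visual": 0.10,
}  # sums to 1.00

def composite_score(scores: dict) -> float:
    """Weighted average of per-dimension 0-10 scores."""
    return round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 1)

def grade(score: float) -> str:
    # Illustrative cutoffs; see Quality Model for the real grade bands.
    for cutoff, letter in ((9.0, "A"), (8.0, "B"), (7.0, "C"), (6.0, "D")):
        if score >= cutoff:
            return letter
    return "F"
```
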


Phase 5: Improvement Suggestions

Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.
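A minimal sketch of that ranking (the `text`/`impact`/`effort` field names are assumed for illustration):

```python
def prioritize(suggestions: list) -> list:
    """Rank suggestions by priority = impact / effort (both on a 1-5 scale).
    Quick wins (low effort, high impact) naturally sort to the top."""
    for s in suggestions:
        s["priority"] = round(s["impact"] / s["effort"], 1)
    return sorted(suggestions, key=lambda s: s["priority"], reverse=True)
```
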


Phase 6: Alternative Comparison (Optional)

See Alternative Comparison for template.

Use when:

  • Multiple valid approaches exist
  • User asked "is this the best way?"
  • Major architectural decisions made

Phase 8: Report Compilation

See Report Template for full format.

# Feature Verification Report

**Composite Score: [N.N]/10** (Grade: [LETTER])

## Top Improvement Suggestions
| # | Suggestion | Effort | Impact | Priority |
|---|------------|--------|--------|----------|
| 1 | [highest] | [N] | [N] | [N.N] |

## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**

Visual Capture

Visual Capture — Phase 2.5

Visual verification that produces browsable screenshot evidence with AI evaluation.

Architecture

Phase 2 agents (parallel)
         |
    Phase 2.5 (runs IN PARALLEL with Phase 2 agents)
         |
         v
┌─────────────────────────────────────────────────┐
│  1. Detect project type (package.json scan)      │
│  2. Start dev server (framework-aware)           │
│  3. Wait for server ready (poll localhost)        │
│  4. Discover routes (framework-aware)            │
│  5. agent-browser: navigate + screenshot each    │
│  6. Claude vision: evaluate each screenshot      │
│  7. Generate gallery.html (self-contained)       │
│  8. Stop dev server                              │
└─────────────────────────────────────────────────┘

Step 1: Project Type Detection

Scan codebase to determine framework and dev server command:

# PARALLEL — detect framework signals
Grep(pattern="\"next\":", glob="package.json", output_mode="content")
Grep(pattern="\"vite\":", glob="package.json", output_mode="content")
Grep(pattern="\"react-scripts\":", glob="package.json", output_mode="content")
Grep(pattern="\"vue\":", glob="package.json", output_mode="content")
Grep(pattern="\"nuxt\":", glob="package.json", output_mode="content")
Grep(pattern="\"@angular/core\":", glob="package.json", output_mode="content")
Glob(pattern="**/manage.py")
Glob(pattern="**/main.py")
Glob(pattern="**/app.py")
Glob(pattern="**/index.html")

Detection Matrix

| Signal | Framework | Start Command | Default Port |
|--------|-----------|---------------|--------------|
| "next": in package.json | Next.js | npm run dev | 3000 |
| "vite": in package.json | Vite | npm run dev | 5173 |
| "react-scripts": | CRA | npm start | 3000 |
| "vue": + no vite | Vue CLI | npm run serve | 8080 |
| "nuxt": | Nuxt | npm run dev | 3000 |
| "@angular/core": | Angular | npx ng serve | 4200 |
| manage.py exists | Django | python manage.py runserver | 8000 |
| main.py/app.py + FastAPI | FastAPI | uvicorn app:app | 8000 |
| index.html only | Static | npx serve . | 3000 |
| None of the above | Skip visual capture | N/A | N/A |
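The package.json half of this matrix can be sketched as follows; signal order matters so that Nuxt and Vite win over a bare "vue" dependency, and the helper name is illustrative:

```python
import json
from pathlib import Path

# Ordered subset of the detection matrix; first matching signal wins.
PKG_SIGNALS = [
    ("next", ("Next.js", "npm run dev", 3000)),
    ("vite", ("Vite", "npm run dev", 5173)),
    ("react-scripts", ("CRA", "npm start", 3000)),
    ("nuxt", ("Nuxt", "npm run dev", 3000)),
    ("@angular/core", ("Angular", "npx ng serve", 4200)),
    ("vue", ("Vue CLI", "npm run serve", 8080)),  # only when vite/nuxt absent
]

def detect_framework(project_dir: str):
    """Return (framework, start_command, port), or None to skip visual capture."""
    pkg = Path(project_dir) / "package.json"
    if pkg.exists():
        data = json.loads(pkg.read_text())
        deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
        for signal, result in PKG_SIGNALS:
            if signal in deps:
                return result
    if (Path(project_dir) / "manage.py").exists():
        return ("Django", "python manage.py runserver", 8000)
    return None
```
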

Override via Config

If .claude/verification-config.yaml exists with a visual section, use those settings instead of auto-detection.

Step 2: Start Dev Server

Bash(
  command=f"{start_command} &",
  description="Start dev server for visual capture",
  run_in_background=True
)

Wait for server readiness:

Bash(command=f"for i in $(seq 1 30); do curl -s http://localhost:{port} > /dev/null && exit 0; sleep 1; done; exit 1",
     description="Wait for dev server to be ready (max 30s)")

If server fails to start: Skip visual capture with a warning in the report. Do NOT block verification.

Step 3: Route Discovery

Next.js App Router

Glob(pattern="**/app/**/page.{tsx,jsx,ts,js}")
# Extract route from file path: app/dashboard/page.tsx → /dashboard

Next.js Pages Router

Glob(pattern="**/pages/**/*.{tsx,jsx,ts,js}")
# Exclude _app, _document, _error, api/
# Extract route: pages/about.tsx → /about
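Both extractions can be sketched with one helper (the function name is illustrative; the exclusion rules follow the comments above):

```python
import re

def path_to_route(file_path: str):
    """Derive a URL route from a Next.js source path (App or Pages router)."""
    # App Router: app/dashboard/page.tsx -> /dashboard, app/page.tsx -> /
    m = re.search(r"app/(.*?)/?page\.(tsx|jsx|ts|js)$", file_path)
    if m:
        return "/" + m.group(1).rstrip("/") if m.group(1) else "/"
    # Pages Router: pages/about.tsx -> /about; skip _app, _document, _error, api/
    m = re.search(r"pages/(.+)\.(tsx|jsx|ts|js)$", file_path)
    if m:
        name = m.group(1)
        if name.startswith("_") or name.startswith("api/"):
            return None
        return "/" if name == "index" else "/" + name
    return None
```
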

React Router

Grep(pattern="<Route.*path=[\"']([^\"']+)", glob="**/*.{tsx,jsx}", output_mode="content")

FastAPI / Express

Grep(pattern="@(app|router)\\.(get|post)\\([\"'](/[^\"']*)", glob="**/*.py", output_mode="content")
Grep(pattern="(app|router)\\.(get|post)\\([\"'](/[^\"']*)", glob="**/*.{ts,js}", output_mode="content")

Fallback

If no routes discovered, screenshot just the root URL: http://localhost:{port}/

Max Routes

Cap at 20 routes to keep gallery manageable and generation fast. Prioritize:

  1. Root /
  2. Routes matching changed files (from Phase 1 git diff)
  3. Routes with most sub-routes (likely important sections)
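The cap-and-prioritize step above can be sketched as follows; the changed-file match is a crude substring heuristic, offered as an assumption rather than the skill's actual logic:

```python
def select_routes(routes: list, changed_files: list, cap: int = 20) -> list:
    """Cap discovered routes at `cap`: root first, then routes whose path
    appears in the Phase 1 git diff, then sections with most sub-routes."""
    def rank(route: str):
        # Crude heuristic: route segment appears somewhere in a changed path.
        touched = any(route.strip("/") and route.strip("/") in f
                      for f in changed_files)
        children = sum(1 for r in routes
                       if r != route and r.startswith(route.rstrip("/") + "/"))
        return (route != "/", not touched, -children)  # False sorts first
    return sorted(routes, key=rank)[:cap]
```
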

Step 4: Screenshot Capture

Use agent-browser to navigate and screenshot each route:

# For each route:
# 1. Navigate
agent-browser navigate http://localhost:{port}{route_path}
# 2. Wait for content
agent-browser wait-for-network-idle
# 3. Capture
agent-browser screenshot --full-page --path verification-output/{timestamp}/screenshots/{idx}-{slug}.png

Auth-Protected Routes

If verification-config.yaml specifies auth:

# Login first
agent-browser navigate http://localhost:{port}/login
agent-browser fill "#email" "test@example.com"
agent-browser fill "#password" "test123"
agent-browser click "button[type=submit]"
agent-browser wait-for-navigation
# Then screenshot protected routes

Viewport Options

Default: 1280x720. If mobile: true in config, also capture at 375x812.

Step 5: AI Vision Evaluation

For each screenshot, use Claude's vision (Read tool on PNG) with a structured evaluation prompt:

Read(file_path=f"verification-output/{timestamp}/screenshots/{idx}-{slug}.png")

Then evaluate using this prompt template (include it in the visual capture agent's instructions):

Evaluate this screenshot of route "{route_path}" against these 6 criteria.
For EACH criterion, provide a severity (ok/warning/error) and specific observation.
Do NOT use generic "looks good" — cite what you actually see.

1. LAYOUT: Overflow, alignment, spacing, responsive grid. Check: content cut off? Overlapping elements? Scroll needed?
2. NAVIGATION: Is nav present and functional? Sidebar, breadcrumbs, TOC visible? Active state correct?
3. CONTENT: Text readable? Headings hierarchical? Data populated (not placeholder/loading)? Counts/numbers accurate?
4. ACCESSIBILITY: Contrast sufficient? Focus indicators visible? Text size adequate? Color-only information?
5. INTERACTIVITY: Buttons/links styled consistently? Hover/focus states? Forms labeled? CTAs discoverable?
6. BRANDING: Consistent with site theme? Dark/light mode correct? Typography matches design system?

Output as JSON array — exactly 6 items, one per criterion:
[{"severity": "ok|warning|error", "message": "CRITERION: specific observation with evidence"}]
Score 0-10 based on: no errors = 9+, 1-2 warnings = 7-8, one error = 5-6, multiple errors = <5.

Per-route evaluation output (6+ items, never a single line):

{
  "route": "/dashboard",
  "score": 7.5,
  "evaluation": [
    {"severity": "ok", "message": "LAYOUT: Content within viewport, no horizontal overflow, grid columns align properly"},
    {"severity": "ok", "message": "NAVIGATION: Sidebar present with 8 sections, 'Dashboard' correctly highlighted as active"},
    {"severity": "warning", "message": "CONTENT: Stats show '79 skills' but should be '89 skills' — stale count detected"},
    {"severity": "ok", "message": "ACCESSIBILITY: Body text ~16px on dark bg (#e6edf3 on #0d1117), contrast ratio ~13:1, passes WCAG AAA"},
    {"severity": "warning", "message": "INTERACTIVITY: Code block copy buttons present but no visible hover state change"},
    {"severity": "ok", "message": "BRANDING: Dark theme consistent, green accent (#3fb950) used for active states, monospace for code"}
  ]
}
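The scoring rule from the prompt template can be sketched numerically; the exact sub-values within each band are illustrative assumptions:

```python
def score_route(evaluation: list) -> float:
    """Map the 6 severity items to a 0-10 score per the rubric:
    no issues -> 9+, 1-2 warnings -> 7-8, one error -> 5-6, multiple -> <5."""
    errors = sum(1 for e in evaluation if e["severity"] == "error")
    warnings = sum(1 for e in evaluation if e["severity"] == "warning")
    if errors >= 2:
        return max(0.0, 4.5 - (errors - 2))
    if errors == 1:
        return 5.5
    if warnings >= 3:
        return 6.5
    if warnings:
        return 8.0 - 0.5 * (warnings - 1)  # 1 warning -> 8.0, 2 -> 7.5
    return 9.5
```
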

Cross-Route Summary

After evaluating all routes, synthesize a summary object for the gallery:

# Build summary from all per-route evaluations
summary = {
  "total_routes": len(routes),
  "avg_score": round(sum(r.score for r in routes) / len(routes), 1),
  "pass_count": len([r for r in routes if r.score >= 7]),
  "warn_count": len([r for r in routes if 5 <= r.score < 7]),
  "fail_count": len([r for r in routes if r.score < 5]),
  "common_issues": [  # Issues appearing on 2+ routes
    {"count": 3, "message": "Stale skill count (79 instead of 89) on 3/5 pages"},
    {"count": 2, "message": "Code block copy buttons lack hover state feedback"}
  ],
  "strengths": [  # Positive patterns across routes
    "Consistent dark theme and typography across all pages",
    "Sidebar navigation present and correctly highlights active page"
  ]
}

Include this summary in GALLERY_JSON alongside routes.

Step 6: Gallery Generation

Read the gallery template:

Read(file_path="${CLAUDE_SKILL_DIR}/assets/gallery-template.html")

Build the GALLERY_JSON data structure:

{
  "branch": "feat/new-feature",
  "date": "2026-03-10",
  "timestamp": "2026-03-10T14:30:00Z",
  "compositeScore": 8.2,
  "visualScore": 7.8,
  "routes": [
    {
      "id": "homepage",
      "name": "Homepage",
      "path": "/",
      "screenshot": "data:image/png;base64,...",
      "score": 8.5,
      "evaluation": [
        {"severity": "ok", "message": "Layout consistent"},
        {"severity": "warning", "message": "Hero image loading slowly"}
      ],
      "annotations": [],
      "apiResponse": null
    }
  ]
}

Base64 encoding: Convert each PNG to base64 for self-contained HTML:

base64 < screenshots/01-homepage.png  # stdin form works on both GNU and BSD/macOS base64

Size guard: If total HTML > 10MB, compress screenshots (downscale or reduce PNG quality) or reduce to the top 10 routes.
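The embed-plus-size-guard step can be sketched as follows; the per-route `file`/`score` keys and the helper name are illustrative:

```python
import base64
from pathlib import Path

MAX_HTML_BYTES = 10 * 1024 * 1024  # 10MB size guard

def embed_screenshots(routes: list, shot_dir: str) -> list:
    """Inline each route's PNG as a data URI; if the payload exceeds the
    size guard, keep only the 10 highest-scoring routes."""
    total = 0
    for r in routes:
        png = Path(shot_dir, r["file"]).read_bytes()
        r["screenshot"] = ("data:image/png;base64,"
                           + base64.b64encode(png).decode("ascii"))
        total += len(r["screenshot"])
    if total > MAX_HTML_BYTES:
        routes = sorted(routes, key=lambda r: r["score"], reverse=True)[:10]
    return routes
```
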

Write the final gallery:

Write(file_path=f"verification-output/{timestamp}/gallery.html", content=rendered_html)

Step 7: Cleanup

# Kill dev server
Bash(command="kill $(lsof -ti :PORT) 2>/dev/null || true", description="Stop dev server")

Phase 8.5: Agentation Loop (Opt-In)

Trigger: Only when agentation MCP is configured in .mcp.json.

# Check if agentation is available
ToolSearch(query="select:mcp__agentation__agentation_get_all_pending")

If available, offer the user:

AskUserQuestion(questions=[{
  "question": "Agentation is configured. Want to annotate the UI before finalizing?",
  "header": "Visual Feedback Loop",
  "options": [
    {"label": "Yes, let me annotate", "description": "I'll mark issues on the live UI, then ui-feedback agent fixes them"},
    {"label": "Skip", "description": "Finalize gallery with current screenshots"}
  ]
}])

If yes:

# 1. Watch for annotations
mcp__agentation__agentation_get_all_pending()

# 2. For each annotation:
mcp__agentation__agentation_acknowledge(annotationId=id)

# 3. Dispatch ui-feedback agent
Agent(subagent_type="ork:ui-feedback",
  prompt="Process agentation annotation: {annotation}. Fix the issue, then resolve.",
  run_in_background=True)

# 4. After fixes, re-screenshot affected routes
# 5. Save before/after pairs
# 6. Update gallery with annotation diffs

Max Rounds

Default 3 rounds of annotate-fix-verify. Configurable in verification-config.yaml.

Graceful Degradation

| Failure | Behavior |
|---------|----------|
| No frontend detected | Skip visual capture, log info in report |
| Dev server won't start | Skip visual capture with warning |
| agent-browser unavailable | Skip screenshots, try curl for API-only |
| Screenshot fails on a route | Skip that route, continue with others |
| Base64 output too large | Compress or reduce route count |
| Agentation not configured | Skip Layer 2 entirely (no prompt) |
| Auth flow fails | Skip protected routes, screenshot public only |

Checklists (1)

Verification Checklist


Quick checklist for comprehensive feature verification.

Grading Complete

  • All 5 dimensions rated (0-10 scale)
  • Weights applied correctly (20/25/20/20/15)
  • Composite score calculated
  • Grade letter assigned (A+ to F)

Evidence Collected

  • Test results with exit codes
  • Coverage report (JSON)
  • Security scan results
  • Lint/type check output
  • Evidence files linked in report

Improvements Documented

  • Each suggestion has effort estimate (1-5)
  • Each suggestion has impact estimate (1-5)
  • Priority calculated (Impact / Effort)
  • Quick wins identified (low effort, high impact)

Alternatives Considered

  • Current approach scored
  • At least one alternative evaluated
  • Migration cost estimated
  • Recommendation documented

Policy Compliance

  • No blocking rule violations
  • Warning rules acknowledged
  • Thresholds checked (composite, security, coverage)

Report Generated

  • All sections filled
  • Verdict assigned (Ready/Recommended/Blocked)
  • Tasks updated to completed