Verify
Comprehensive verification with parallel test agents. Use when verifying implementations or validating changes.
Verify Feature
Comprehensive verification using parallel specialized agents with nuanced grading (0-10 scale) and improvement suggestions.
Quick Start
/ork:verify authentication flow
/ork:verify user profile feature
/ork:verify --scope=backend database migrations

Opus 4.6: Agents use native adaptive thinking (no MCP sequential-thinking needed). Extended 128K output supports comprehensive verification reports.
STEP 0: Verify User Intent with AskUserQuestion
BEFORE creating tasks, clarify verification scope:
AskUserQuestion(
    questions=[{
        "question": "What scope for this verification?",
        "header": "Scope",
        "options": [
            {"label": "Full verification (Recommended)", "description": "All tests + security + code quality + grades"},
            {"label": "Tests only", "description": "Run unit + integration + e2e tests"},
            {"label": "Security audit", "description": "Focus on security vulnerabilities"},
            {"label": "Code quality", "description": "Lint, types, complexity analysis"},
            {"label": "Quick check", "description": "Just run tests, skip detailed analysis"}
        ],
        "multiSelect": False
    }]
)

Based on answer, adjust workflow:
- Full verification: All 8 phases, all 6 parallel agents
- Tests only: Skip phases 2 (security), 5 (UI/UX analysis)
- Security audit: Focus on security-auditor agent
- Code quality: Focus on code-quality-reviewer agent
- Quick check: Run tests only, skip grading and suggestions
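As an illustrative sketch (the dictionary and function below are assumptions for clarity, not part of the skill's API), the scope-to-workflow mapping above can be expressed as a lookup keyed on the user's answer:

```python
# Hypothetical mapping from the scope answer to workflow adjustments.
# skip_phases names match the 8-phase subtask subjects used later.
SCOPE_PLAN = {
    "Full verification": {"skip_phases": [], "grading": True},
    "Tests only": {"skip_phases": ["Execute security audit", "Check UI/UX"],
                   "grading": True},
    "Security audit": {"skip_phases": [], "grading": True,
                       "focus_agent": "security-auditor"},
    "Code quality": {"skip_phases": [], "grading": True,
                     "focus_agent": "code-quality-reviewer"},
    "Quick check": {"skip_phases": ["Calculate grades", "Generate suggestions"],
                    "grading": False},
}

def plan_for(answer: str) -> dict:
    """Look up workflow adjustments, defaulting to full verification."""
    return SCOPE_PLAN.get(answer, SCOPE_PLAN["Full verification"])
```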
STEP 0b: Select Orchestration Mode
See Orchestration Mode for env var check logic, Agent Teams vs Task Tool comparison, and mode selection rules.
Choose Agent Teams (mesh -- verifiers share findings) or Task tool (star -- all report to lead) based on the orchestration mode reference.
Task Management (CC 2.1.16)
# Create main verification task
TaskCreate(
subject="Verify [feature-name] implementation",
description="Comprehensive verification with nuanced grading",
activeForm="Verifying [feature-name] implementation"
)
# Create subtasks for 8-phase process
# Each phase carries its own present-continuous form; a naive
# f"{phase}ing" would produce "Run code quality checksing".
phases = [("Run code quality checks", "Running code quality checks"),
          ("Execute security audit", "Executing security audit"),
          ("Verify test coverage", "Verifying test coverage"),
          ("Validate API", "Validating API"),
          ("Check UI/UX", "Checking UI/UX"),
          ("Calculate grades", "Calculating grades"),
          ("Generate suggestions", "Generating suggestions"),
          ("Compile report", "Compiling report")]
for subject, active_form in phases:
    TaskCreate(subject=subject, activeForm=active_form)

8-Phase Workflow
See Verification Phases for complete phase details, agent spawn definitions, Agent Teams alternative, and team teardown.
| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts | Final report |
Phase 2 Agents (Quick Reference)
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |
Launch ALL agents in ONE message with run_in_background=True and max_turns=25.
Grading & Scoring
See Scoring Rubric for composite formula, grade thresholds, verdict criteria, and blocking rules. See Quality Model for dimension weights. See Grading Rubric for per-agent scoring criteria.
Evidence & Test Execution
See Evidence Collection for git commands, test execution patterns, metrics tracking, and post-verification feedback.
Policy-as-Code
See Policy-as-Code for configuration.
Define verification rules in .claude/policies/verification-policy.json:
{
"thresholds": {
"composite_minimum": 6.0,
"security_minimum": 7.0,
"coverage_minimum": 70
},
"blocking_rules": [
{"dimension": "security", "below": 5.0, "action": "block"}
]
}

Report Format
See Report Template for full format. Summary:
# Feature Verification Report
**Composite Score: [N.N]/10** (Grade: [LETTER])
## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**

References
- Verification Phases -- 8-phase workflow, agent spawn definitions, Agent Teams mode
- Quality Model -- Scoring dimensions and weights
- Grading Rubric -- Per-agent scoring criteria
- Report Template -- Full report format
- Alternative Comparison -- Approach comparison template
- Orchestration Mode -- Agent Teams vs Task Tool
- Policy-as-Code -- Verification policy configuration
- Verification Checklist -- Pre-flight checklist
Rules
- Scoring Rubric -- Composite scoring, grades, verdicts
- Evidence Collection -- Evidence gathering and test patterns
Related Skills
- ork:implement - Full implementation with verification
- ork:review-pr - PR-specific verification
- run-tests - Detailed test execution
- ork:quality-gates - Quality gate patterns
Version: 3.1.0 (February 2026)
Rules (2)
Evidence Collection Patterns — HIGH
Evidence Collection Patterns
Phase 1: Context Gathering
Run these commands in parallel in ONE message:
git diff main --stat
git log main..HEAD --oneline
git diff main --name-only | sort -u

Phase 3: Parallel Test Execution
Run backend and frontend tests in parallel:
# PARALLEL - Backend and frontend
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage

Phase 7: Metrics Tracking
Store verification metrics in memory for trend analysis:
mcp__memory__create_entities(entities=[{
"name": "verification-{date}-{feature}",
"entityType": "VerificationMetrics",
"observations": [f"composite_score: {score}", ...]
}])

Query trends: mcp__memory__search_nodes(query="VerificationMetrics")
Phase 8.5: Post-Verification Feedback
After report compilation, send scores to metrics-architect for KPI baseline tracking:
Task(subagent_type="metrics-architect", run_in_background=True, max_turns=15,
prompt=f"""Receive verification scores for {feature}:
Composite: {composite_score}/10 (Grade: {grade})
Dimensional breakdown:
- Correctness: {scores['correctness']}/10
- Maintainability: {scores['maintainability']}/10
- Performance: {scores['performance']}/10
- Security: {scores['security']}/10
- Scalability: {scores['scalability']}/10
- Testability: {scores['testability']}/10
- Compliance: {scores['compliance']}/10
Update KPI baselines with these scores. Store trend data in memory
for historical comparison. Flag any dimensions that dropped below
their historical average.""")

Scoring Rubric — HIGH
Scoring Rubric
Composite Score
Each agent produces a 0-10 score with decimals for nuance. The composite score is a weighted sum using the weights from Quality Model.
Grade Thresholds
| Grade | Score Range | Verdict |
|---|---|---|
| A | 9.0-10.0 | READY FOR MERGE |
| B | 7.0-8.9 | READY FOR MERGE |
| C | 5.0-6.9 | IMPROVEMENTS RECOMMENDED |
| D | 3.0-4.9 | BLOCKED |
| F | 0.0-2.9 | BLOCKED |
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Scoring scale | 0-10 with decimals | Nuanced, not binary |
| Improvement priority | Impact / Effort ratio | Do high-value first |
| Alternative comparison | Optional phase | Only when multiple valid approaches |
| Metrics persistence | Memory MCP | Track trends over time |
Improvement Suggestions
Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.
Blocking Rules
Verification can be blocked by policy-as-code rules. See Policy-as-Code for configuration of composite minimums, dimension minimums, and blocking rules.
References (8)
Alternative Comparison
Alternative Comparison
Evaluate current implementation against alternative approaches.
When to Compare
- Multiple valid architectures exist
- User asks "is this the best way?"
- Major patterns were chosen (ORM vs raw SQL, REST vs GraphQL)
- Performance/scalability concerns raised
Comparison Criteria
For Each Alternative
| Criterion | Weight | Description |
|---|---|---|
| Effort | 30% | Implementation complexity (1-5 scale) |
| Risk | 25% | Technical and operational risk (1-5 scale) |
| Benefit | 45% | Value delivered, performance, maintainability (1-5 scale) |
Migration Cost
| Factor | Estimate |
|---|---|
| Code changes | Files/lines affected |
| Data migration | Schema changes, backfill |
| Testing | New test coverage needed |
| Rollback risk | Reversibility |
Decision Matrix Format
| Approach | Effort | Risk | Benefit | Score |
|---|---|---|---|---|
| Current | N | N | N | ((5-E)*0.3 + (5-R)*0.25 + B*0.45) |
| Alt A | N | N | N | calculated |
| Alt B | N | N | N | calculated |
Note: Higher effort and risk are bad (invert for scoring), higher benefit is good.
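The inversion described in the note can be sketched as a small helper (illustrative only; the function name and rounding are assumptions):

```python
def alt_score(effort: int, risk: int, benefit: int) -> float:
    """Score one approach on the 1-5 scales above.

    Effort and risk are inverted (lower is better); benefit counts directly,
    using the 0.3 / 0.25 / 0.45 weights from the criteria table.
    """
    return round((5 - effort) * 0.3 + (5 - risk) * 0.25 + benefit * 0.45, 2)

# A low-effort, low-risk, high-benefit alternative scores well:
# alt_score(2, 2, 4) -> 3.45
```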
Recommendation Formula:
Score = (5 - Effort) * 0.3 + (5 - Risk) * 0.25 + Benefit * 0.45

Output Template
### Alternative Comparison: [Topic]
**Current Approach:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]
**Alternative A:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]
- Migration effort: [1-5]
**Recommendation:** [Keep current / Switch to Alt A]
**Justification:** [1-2 sentences]

Grading Rubric
Verification Grading Rubric
0-10 scoring criteria for each verification dimension.
Score Levels
| Range | Level | Description |
|---|---|---|
| 0-3 | Poor | Critical issues, blocks merge |
| 4-6 | Adequate | Functional but needs improvement |
| 7-9 | Good | Ready for merge, minor suggestions |
| 10 | Excellent | Exemplary, reference quality |
Dimension Rubrics
Code Quality (Weight: 20%)
| Score | Criteria |
|---|---|
| 10 | Zero lint errors/warnings, strict types, exemplary patterns |
| 8-9 | Zero errors, < 5 warnings, minimal any, good patterns |
| 6-7 | 1-3 errors, some warnings, acceptable patterns |
| 4-5 | 4-10 errors, pattern issues, needs refactoring |
| 1-3 | Many errors, poor patterns, high complexity |
| 0 | Lint/type check fails to run |
Security (Weight: 25%)
| Score | Criteria |
|---|---|
| 10 | No vulnerabilities, all OWASP compliant, secure by design |
| 8-9 | No critical/high, all OWASP, excellent practices |
| 6-7 | No critical, 1-2 high, most OWASP compliant |
| 4-5 | No critical, 3-5 high, some gaps |
| 1-3 | 1+ critical or many high vulnerabilities |
| 0 | Multiple critical, secrets exposed |
Test Coverage (Weight: 20%)
| Score | Criteria |
|---|---|
| 10 | >= 90% coverage, meaningful assertions, edge cases |
| 8-9 | >= 80% coverage, good assertions, critical paths |
| 6-7 | >= 70% coverage (target), basic assertions |
| 4-5 | 50-69% coverage |
| 1-3 | 30-49% coverage |
| 0 | < 30% coverage or tests fail to run |
API Compliance (Weight: 20%)
| Score | Criteria |
|---|---|
| 10 | Perfect REST, RFC 9457 errors, documented, no N+1 |
| 8-9 | Good REST, proper validation, timeout protection |
| 6-7 | Acceptable API, minor inconsistencies |
| 4-5 | Several convention violations |
| 1-3 | Poor API design, missing validation |
| 0 | Broken or insecure endpoints |
UI Compliance (Weight: 15%)
| Score | Criteria |
|---|---|
| 10 | React 19 APIs, full Zod, WCAG AAA, exhaustive types |
| 8-9 | Modern patterns, good validation, WCAG AA |
| 6-7 | Acceptable patterns, some validation |
| 4-5 | Dated patterns, missing validation |
| 1-3 | Poor practices, accessibility issues |
| 0 | Broken or inaccessible components |
Grade Interpretation
| Composite | Grade | Verdict |
|---|---|---|
| 9.0-10.0 | A+ | Ship it |
| 8.0-8.9 | A | Ready for merge |
| 7.0-7.9 | B | Minor improvements optional |
| 6.0-6.9 | C | Consider improvements |
| 5.0-5.9 | D | Improvements recommended |
| < 5.0 | F | Do not merge |
Orchestration Mode
<!-- SHARED: keep in sync with ../../../assess/references/orchestration-mode.md -->
Orchestration Mode Selection
Shared logic for choosing between Agent Teams and Task tool orchestration in assess/verify skills.
Environment Check
import os

teams_available = os.environ.get("CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS") is not None
force_task_tool = os.environ.get("ORCHESTKIT_FORCE_TASK_TOOL") == "1"

if force_task_tool or not teams_available:
    mode = "task_tool"
else:
    # Teams available — use for full multi-dimensional work
    mode = "agent_teams" if scope == "full" else "task_tool"

Decision Rules
- CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS set --> Agent Teams mode (for full assessment/verification)
- Flag not set --> Task tool mode (default)
- Quick/single-dimension scope --> Task tool (regardless of flag)
- ORCHESTKIT_FORCE_TASK_TOOL=1 --> Task tool (override)
Agent Teams vs Task Tool
| Aspect | Task Tool (Star) | Agent Teams (Mesh) |
|---|---|---|
| Topology | All agents report to lead | Agents communicate with each other |
| Finding correlation | Lead cross-references after completion | Agents share findings in real-time |
| Cross-domain overlap | Independent scoring | Agents alert each other about overlapping concerns |
| Cost | ~200K tokens | ~500K tokens |
| Best for | Focused/single-dimension work | Full multi-dimensional assessment/verification |
Fallback
If Agent Teams encounters issues mid-execution, fall back to Task tool for remaining work. This is safe because both modes produce the same output format (dimensional scores 0-10).
Context Window Note
For full codebase work (>20 files), use the 1M context window to avoid agent context exhaustion. On 200K context, scope discovery should limit files to prevent overflow.
Policy As Code
Policy-as-Code
Define verification policies as machine-readable configuration.
Policy Structure
version: "1.0"
name: policy-name
description: What this policy enforces
thresholds:
  composite_minimum: 6.0
  coverage_minimum: 70
rules:
  blockers: []   # Fail verification
  warnings: []   # Note but continue
  info: []       # Informational only

Rule Definition
Blocker Rules (Must Pass)
blockers:
  - dimension: security
    condition: below
    value: 5.0
    message: "Security score below minimum"
  - check: critical_vulnerabilities
    condition: above
    value: 0
    message: "Critical vulnerabilities found"
  - check: type_errors
    condition: above
    value: 0
    message: "TypeScript errors must be zero"

Warning Rules (Should Fix)
warnings:
  - dimension: code_quality
    condition: below
    value: 7.0
    message: "Code quality could be improved"
  - check: test_coverage
    condition: below
    value: 80
    message: "Coverage below recommended 80%"

Info Rules (Awareness)
info:
  - check: todo_count
    condition: above
    value: 5
    message: "Multiple TODOs found in code"

Threshold Configuration
| Threshold | Type | Description |
|---|---|---|
| composite_minimum | float | Overall score minimum (0-10) |
| coverage_minimum | int | Test coverage percentage |
| critical_vulnerabilities | int | Max critical vulns (0) |
| high_vulnerabilities | int | Max high vulns |
| lint_errors | int | Max lint errors (0) |
| type_errors | int | Max type errors (0) |
Custom Rules
custom_rules:
  - name: no_console_log
    pattern: "console\\.log"
    file_glob: "**/*.ts"
    exclude: ["**/*.test.ts"]
    severity: warning
    message: "Remove console.log from production"

Policy Location
Store at: .claude/policies/verification-policy.yaml
Multiple policies: .claude/policies/\{name\}-policy.yaml
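As a sketch of how such a policy might be enforced (the function, its signature, and the `results` shape are illustrative assumptions, not part of this skill), a loaded policy dict can be checked against observed verification values:

```python
def evaluate_rules(policy: dict, results: dict) -> dict:
    """Partition violated policy rules by severity.

    `results` maps dimension/check names (e.g. "security",
    "test_coverage") to observed values from verification.
    """
    violated: dict[str, list[str]] = {}
    for severity, rules in policy.get("rules", {}).items():
        violated[severity] = []
        for rule in rules:
            key = rule.get("dimension") or rule.get("check")
            if key not in results:
                continue  # no data for this rule; skip rather than fail
            value = results[key]
            hit = (rule["condition"] == "below" and value < rule["value"]) or \
                  (rule["condition"] == "above" and value > rule["value"])
            if hit:
                violated[severity].append(rule["message"])
    return violated
```

Verification fails when `violated["blockers"]` is non-empty; warnings and info entries are reported but do not block.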
Quality Model
<!-- SHARED: keep in sync with ../../../assess/references/quality-model.md -->
Quality Model
Canonical scoring reference for assess and verify skills. Defines unified dimensions, weights, grade thresholds, and improvement prioritization.
Scoring Dimensions (7 Unified)
| Dimension | Weight | What It Measures |
|---|---|---|
| Correctness | 0.15 | Does it work correctly? Functional accuracy, edge cases handled |
| Maintainability | 0.15 | Easy to understand and modify? Readability, complexity, patterns |
| Performance | 0.12 | Efficient execution? No bottlenecks, resource usage, latency |
| Security | 0.20 | Follows security best practices? OWASP, secrets, CVEs, input validation |
| Scalability | 0.10 | Handles growth? Load patterns, data volume, horizontal scaling |
| Testability | 0.13 | Easy to test? Coverage, test quality, isolation, mocking |
| Compliance | 0.15 | Meets API and UI contracts? Conditional on scope (see below) |
Total: 1.00
Compliance Dimension — Scope Rules
Compliance weight (0.15) applies differently based on project scope:
| Scope | Compliance Covers |
|---|---|
| Backend-only | API compliance (contracts, schema validation, versioning) |
| Frontend-only | UI compliance (design system, a11y, responsive) |
| Full-stack | API + UI compliance (split evenly: 0.075 each) |
Composite Score
composite = sum(dimension_score * weight for each dimension)

Each dimension is scored 0-10 with decimal precision. Composite is also 0-10.
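As a concrete sketch (helper names are illustrative), the composite calculation with the weights from the table above:

```python
WEIGHTS = {
    "correctness": 0.15, "maintainability": 0.15, "performance": 0.12,
    "security": 0.20, "scalability": 0.10, "testability": 0.13,
    "compliance": 0.15,
}  # sums to 1.00

def composite(scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 dimension scores; the result is also 0-10."""
    return round(sum(scores[d] * w for d, w in WEIGHTS.items()), 1)

# Uniform scores pass through unchanged, since the weights sum to 1.0.
```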
Grade Thresholds
| Score | Grade | Verdict | Action |
|---|---|---|---|
| 9.0-10.0 | A+ | EXCELLENT | Ship it! |
| 8.0-8.9 | A | GOOD | Ready for merge |
| 7.0-7.9 | B | GOOD | Minor improvements optional |
| 6.0-6.9 | C | ADEQUATE | Consider improvements |
| 5.0-5.9 | D | NEEDS WORK | Improvements recommended |
| 0.0-4.9 | F | CRITICAL | Do not merge |
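The thresholds above can be expressed as a minimal helper (a sketch; the function name is an assumption):

```python
def grade(composite: float) -> tuple[str, str]:
    """Map a 0-10 composite score to (letter, verdict) per the table above."""
    bands = [(9.0, "A+", "EXCELLENT"), (8.0, "A", "GOOD"), (7.0, "B", "GOOD"),
             (6.0, "C", "ADEQUATE"), (5.0, "D", "NEEDS WORK")]
    for floor, letter, verdict in bands:
        if composite >= floor:
            return letter, verdict
    return "F", "CRITICAL"
```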
Improvement Prioritization
Effort Scale (1-5)
| Points | Effort | Description |
|---|---|---|
| 1 | Trivial | < 15 minutes, single file change |
| 2 | Low | 15-60 minutes, few files |
| 3 | Medium | 1-4 hours, moderate scope |
| 4 | High | 4-8 hours, significant refactoring |
| 5 | Major | 1+ days, architectural change |
Impact Scale (1-5)
| Points | Impact | Description |
|---|---|---|
| 1 | Minimal | Cosmetic, no functional change |
| 2 | Low | Minor improvement, limited scope |
| 3 | Medium | Noticeable quality improvement |
| 4 | High | Significant quality or security gain |
| 5 | Critical | Blocks shipping or fixes major vulnerability |
Priority Formula
priority = impact / effort

Higher ratio = do first.
Quick Wins
Effort <= 2 AND Impact >= 4
Always highlight quick wins at the top of improvement suggestions. These are high-value changes that can be done fast.
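The prioritization and quick-win rules above can be sketched as (illustrative helper; the suggestion dict shape is an assumption):

```python
def prioritize(suggestions: list[dict]) -> list[dict]:
    """Annotate suggestions with priority and quick-win flag, quick wins first.

    Each suggestion needs "effort" and "impact" on the 1-5 scales above.
    """
    for s in suggestions:
        s["priority"] = round(s["impact"] / s["effort"], 2)
        s["quick_win"] = s["effort"] <= 2 and s["impact"] >= 4
    # Quick wins sort to the top; within each group, highest ratio first.
    return sorted(suggestions, key=lambda s: (not s["quick_win"], -s["priority"]))
```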
Report Template
Verification Report Template
Copy this template and fill in results from parallel agent verification.
Quick Copy Template
# Feature Verification Report
**Date**: [TODAY'S DATE]
**Branch**: [branch-name]
**Feature**: [feature description]
**Reviewer**: Claude Code with 5 parallel subagents
**Verification Duration**: [X minutes]
---
## Summary
**Status**: [READY FOR MERGE | NEEDS ATTENTION | BLOCKED]
[1-2 sentence summary of verification results]
---
## Agent Results
### 1. Code Quality (code-quality-reviewer)
| Check | Tool | Exit Code | Errors | Warnings | Status |
|-------|------|-----------|--------|----------|--------|
| Backend Lint | Ruff | 0/1 | N | N | PASS/FAIL |
| Backend Types | ty | 0/1 | N | N | PASS/FAIL |
| Frontend Lint | Biome | 0/1 | N | N | PASS/FAIL |
| Frontend Types | tsc | 0/1 | N | N | PASS/FAIL |
**Pattern Compliance:**
- [ ] No `console.log` in production code
- [ ] No `any` types in TypeScript
- [ ] Exhaustive switches with `assertNever`
- [ ] SOLID principles followed
- [ ] Cyclomatic complexity < 10
**Findings:**
- [List any pattern violations]
---
### 2. Security Audit (security-auditor)
| Check | Tool | Critical | High | Medium | Low | Status |
|-------|------|----------|------|--------|-----|--------|
| JS Dependencies | npm audit | N | N | N | N | PASS/BLOCK |
| Python Dependencies | pip-audit | N | N | N | N | PASS/BLOCK |
| Secrets Scan | grep/gitleaks | N/A | N/A | N/A | N | PASS/BLOCK |
**OWASP Top 10 Compliance:**
- [ ] A01: Broken Access Control
- [ ] A02: Cryptographic Failures
- [ ] A03: Injection
- [ ] A04: Insecure Design
- [ ] A05: Security Misconfiguration
- [ ] A06: Vulnerable Components
- [ ] A07: Auth Failures
- [ ] A08: Data Integrity Failures
- [ ] A09: Logging Failures
- [ ] A10: SSRF
**Findings:**
- [List any security issues]
---
### 3. Test Coverage (test-generator)
| Suite | Total | Passed | Failed | Skipped | Coverage | Target | Status |
|-------|-------|--------|--------|---------|----------|--------|--------|
| Backend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| Backend Integration | N | N | N | N | X% | 70% | PASS/FAIL |
| Frontend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| E2E | N | N | N | N | N/A | N/A | PASS/FAIL |
**Test Quality:**
- [ ] Meaningful assertions (not just `assert result`)
- [ ] Edge cases covered (empty, error, timeout)
- [ ] No flaky tests (no sleep, no timing deps)
- [ ] MSW used for API mocking (not jest.mock)
**Coverage Gaps:**
- [List uncovered critical paths]
---
### 4. API Compliance (backend-system-architect)
| Check | Compliant | Issues |
|-------|-----------|--------|
| REST Conventions | Yes/No | [details] |
| Pydantic v2 Validation | Yes/No | [details] |
| RFC 9457 Error Handling | Yes/No | [details] |
| Async Timeout Protection | Yes/No | [details] |
| No N+1 Queries | Yes/No | [details] |
**Findings:**
- [List any API compliance issues]
---
### 5. UI Compliance (frontend-ui-developer)
| Check | Compliant | Issues |
|-------|-----------|--------|
| React 19 APIs (useOptimistic, useFormStatus, use()) | Yes/No | [details] |
| Zod Validation on API Responses | Yes/No | [details] |
| Exhaustive Type Checking | Yes/No | [details] |
| Skeleton Loading States | Yes/No | [details] |
| Prefetching on Navigation | Yes/No | [details] |
| WCAG 2.1 AA Accessibility | Yes/No | [details] |
**Findings:**
- [List any UI compliance issues]
---
## Quality Gates Summary
| Gate | Required | Actual | Status |
|------|----------|--------|--------|
| Test Coverage | >= 70% | X% | PASS/FAIL |
| Security Critical | 0 | N | PASS/FAIL |
| Security High | <= 5 | N | PASS/FAIL |
| Type Errors | 0 | N | PASS/FAIL |
| Lint Errors | 0 | N | PASS/FAIL |
**Overall Gate Status**: [ALL PASS | SOME FAIL]
---
## Blockers (Must Fix Before Merge)
1. [Blocker description with file:line reference]
2. [Blocker description with file:line reference]
---
## Suggestions (Non-Blocking)
1. [Suggestion for improvement]
2. [Suggestion for improvement]
---
## Evidence Artifacts
| Artifact | Location | Generated |
|----------|----------|-----------|
| Test Results | `/tmp/test_results.log` | [timestamp] |
| Coverage Report | `/tmp/coverage.json` | [timestamp] |
| Security Scan | `/tmp/security_audit.json` | [timestamp] |
| Lint Report | `/tmp/lint_results.log` | [timestamp] |
| E2E Screenshot | `/tmp/verification.png` | [timestamp] |
---
## Verification Metadata
- **Agents Used**: 5 (code-quality-reviewer, security-auditor, test-generator, backend-system-architect, frontend-ui-developer)
- **Parallel Execution**: Yes
- **Total Tool Calls**: ~N
- **Context Usage**: ~N tokens

Status Definitions
| Status | Emoji | Meaning | Action Required |
|---|---|---|---|
| READY FOR MERGE | Green | All checks pass, no blockers | Approve PR |
| NEEDS ATTENTION | Yellow | Minor issues found | Review suggestions, optionally fix |
| BLOCKED | Red | Critical issues found | Must fix before merge |
Severity Levels
| Level | Threshold | Action | Blocks Merge |
|---|---|---|---|
| Critical | Any | Fix immediately | YES |
| High | > 5 | Fix before merge | YES |
| Medium | > 20 | Should fix | NO (with justification) |
| Low | > 50 | Nice to have | NO |
| Info | N/A | Informational | NO |
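The blocking column of the severity table reduces to a one-line check (a sketch; the function name and `counts` shape are assumptions):

```python
def merge_blocked(counts: dict[str, int]) -> bool:
    """True when the severity table requires blocking the merge:
    any critical finding, or more than 5 high findings."""
    return counts.get("critical", 0) > 0 or counts.get("high", 0) > 5
```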
Agent Output JSON Schemas
code-quality-reviewer Output
{
"linting": {"tool": "ruff|biome", "exit_code": 0, "errors": 0, "warnings": 0},
"type_check": {"tool": "ty|tsc", "exit_code": 0, "errors": 0},
"patterns": {"violations": [], "compliance": "PASS|FAIL"},
"approval": {"status": "APPROVED|NEEDS_FIXES", "blockers": []}
}

security-auditor Output
{
"scan_summary": {"files_scanned": 100, "vulnerabilities_found": 0},
"critical": [],
"high": [],
"secrets_detected": [],
"recommendations": [],
"approval": {"status": "PASS|BLOCK", "blockers": []}
}

test-generator Output
{
"coverage": {"current": 85, "target": 70, "passed": true},
"test_summary": {"total": 100, "passed": 98, "failed": 2, "skipped": 0},
"gaps": ["file:line - reason"],
"quality_issues": [],
"approval": {"status": "PASS|FAIL", "blockers": []}
}

backend-system-architect Output
{
"api_compliance": {"rest_conventions": true, "issues": []},
"validation": {"pydantic_v2": true, "issues": []},
"error_handling": {"rfc9457": true, "issues": []},
"async_safety": {"timeouts": true, "issues": []},
"approval": {"status": "PASS|FAIL", "blockers": []}
}

frontend-ui-developer Output
{
"react_19": {"apis_used": ["useOptimistic"], "missing": [], "compliant": true},
"zod_validation": {"validated_endpoints": 10, "unvalidated": []},
"type_safety": {"exhaustive_switches": true, "any_types": 0},
"ux_patterns": {"skeletons": true, "prefetching": true},
"accessibility": {"wcag_issues": []},
"approval": {"status": "PASS|FAIL", "blockers": []}
}

Verification Checklist
Verification Checklist
Pre-flight checklist for comprehensive feature verification with parallel agents.
Pre-Verification Setup
Context Gathering
- Run `git diff main --stat` to understand change scope
- Run `git log main..HEAD --oneline` to see commit history
- Identify affected domains (backend/frontend/both)
- Check for any existing failing tests
Task Creation (CC 2.1.16)
- Create parent verification task
- Create subtasks for each agent domain
- Set proper dependencies if needed
Agent Dispatch Checklist
Required Agents (Full-Stack)
| Agent | Launched | Completed | Status |
|---|---|---|---|
| code-quality-reviewer | [ ] | [ ] | Pending |
| security-auditor | [ ] | [ ] | Pending |
| test-generator | [ ] | [ ] | Pending |
| backend-system-architect | [ ] | [ ] | Pending |
| frontend-ui-developer | [ ] | [ ] | Pending |
Optional Agents (Add as Needed)
| Condition | Agent | Launched |
|---|---|---|
| AI/ML features | llm-integrator | [ ] |
| Performance-critical | frontend-performance-engineer | [ ] |
| Database changes | database-engineer | [ ] |
Quality Gate Checklist
Mandatory Gates
| Gate | Threshold | Actual | Pass |
|---|---|---|---|
| Test Coverage | >= 70% | ___% | [ ] |
| Security Critical | 0 | ___ | [ ] |
| Security High | <= 5 | ___ | [ ] |
| Type Errors | 0 | ___ | [ ] |
| Lint Errors | 0 | ___ | [ ] |
Code Quality Gates
| Check | Status |
|---|---|
| No console.log in production | [ ] |
| No `any` types | [ ] |
| Exhaustive switches (assertNever) | [ ] |
| Proper error handling | [ ] |
| No hardcoded secrets | [ ] |
Frontend-Specific Gates (if applicable)
| Check | Status |
|---|---|
| React 19 APIs used | [ ] |
| Zod validation on API responses | [ ] |
| Skeleton loading states | [ ] |
| Prefetching on links | [ ] |
| WCAG 2.1 AA compliance | [ ] |
Backend-Specific Gates (if applicable)
| Check | Status |
|---|---|
| REST conventions followed | [ ] |
| Pydantic v2 validation | [ ] |
| RFC 9457 error handling | [ ] |
| Async timeout protection | [ ] |
| No N+1 queries | [ ] |
Evidence Collection
Required Evidence
- Test results with exit code
- Coverage report (JSON format)
- Linting results
- Type checking results
- Security scan results
Optional Evidence
- E2E test screenshots
- Performance benchmarks
- Bundle size analysis
- Accessibility audit
Report Generation
Report Sections
- Summary (READY/NEEDS ATTENTION/BLOCKED)
- Agent Results (all 5 domains)
- Quality Gates table
- Blockers list (if any)
- Suggestions list
- Evidence links
Final Steps
- Update all task statuses to completed
- Store verification evidence in context
- Generate final report markdown
Quick Reference: Agent Prompts
code-quality-reviewer
Focus: Lint, type check, anti-patterns, SOLID, complexity
security-auditor
Focus: Dependency audit, secrets, OWASP Top 10, rate limiting
test-generator
Focus: Coverage gaps, test quality, edge cases, flaky tests
backend-system-architect
Focus: REST, Pydantic v2, RFC 9457, async safety, N+1
frontend-ui-developer
Focus: React 19, Zod, exhaustive types, skeletons, prefetch, a11y
Troubleshooting
Agent Not Responding
- Check if agent was launched with `run_in_background=True`
- Verify agent name matches exactly
- Check for context window limits
Tests Failing
- Run tests locally first
- Check for missing dependencies
- Verify test database state
- Look for timing-dependent tests
Coverage Below Threshold
- Identify uncovered files
- Check for excluded patterns
- Focus on critical paths first
Verification Phases
Verification Phases — Detailed Workflow
Phase Overview
| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts | Final report |
Phase 2: Parallel Agent Dispatch (6 Agents)
Launch ALL agents in ONE message with run_in_background=True and max_turns=25.
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |
Use python-performance-engineer for backend-focused verification or frontend-performance-engineer for frontend-focused verification. See Quality Model for Performance (0.12) and Scalability (0.10) weights.
See Grading Rubric for detailed scoring criteria.
Agent Teams Alternative
In Agent Teams mode, form a verification team where agents share findings and coordinate scoring:
TeamCreate(team_name="verify-{feature}", description="Verify {feature}")
Task(subagent_type="code-quality-reviewer", name="quality-verifier",
team_name="verify-{feature}",
prompt="""Verify code quality for {feature}. Score 0-10.
When you find patterns that affect security, message security-verifier.
When you find untested code paths, message test-verifier.
Share your quality score with all teammates for composite calculation.""")
Task(subagent_type="security-auditor", name="security-verifier",
team_name="verify-{feature}",
prompt="""Security verification for {feature}. Score 0-10.
When quality-verifier flags security-relevant patterns, investigate deeper.
When you find vulnerabilities in API endpoints, message api-verifier.
Share severity findings with test-verifier for test gap analysis.""")
Task(subagent_type="test-generator", name="test-verifier",
team_name="verify-{feature}",
prompt="""Verify test coverage for {feature}. Score 0-10.
When quality-verifier or security-verifier flag untested paths, quantify the gap.
Run existing tests and report coverage metrics.
Message the lead with coverage data for composite scoring.""")
Task(subagent_type="backend-system-architect", name="api-verifier",
team_name="verify-{feature}",
prompt="""Verify API design and backend patterns for {feature}. Score 0-10.
When security-verifier flags endpoint issues, validate and score.
Share API compliance findings with ui-verifier for consistency check.""")
Task(subagent_type="frontend-ui-developer", name="ui-verifier",
team_name="verify-{feature}",
prompt="""Verify frontend implementation for {feature}. Score 0-10.
When api-verifier shares API patterns, verify frontend matches.
Check React 19 patterns, accessibility, and loading states.
Share findings with quality-verifier for overall assessment.""")
# Conditional 6th agent — use python-performance-engineer for backend,
# frontend-performance-engineer for frontend
Task(subagent_type="python-performance-engineer", name="perf-verifier",
team_name="verify-{feature}",
prompt="""Verify performance and scalability for {feature}. Score 0-10.
Assess latency, resource usage, caching, and scaling patterns.
When security-verifier flags resource-intensive endpoints, profile them.
Share performance findings with api-verifier and quality-verifier.""")

Team teardown after report compilation:
# After composite grading and report generation
SendMessage(type="shutdown_request", recipient="quality-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="security-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="test-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="api-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="ui-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="perf-verifier", content="Verification complete")
TeamDelete()

Fallback: If team formation fails, use standard Phase 2 Task spawns above.
Manual cleanup: If `TeamDelete()` doesn't terminate all agents, press `Ctrl+F` twice to force-kill remaining background agents.
Phase 4: Nuanced Grading
See Quality Model for scoring dimensions, weights, and grade interpretation. See Grading Rubric for detailed per-agent scoring criteria.
Phase 5: Improvement Suggestions
Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.
Phase 6: Alternative Comparison (Optional)
See Alternative Comparison for template.
Use when:
- Multiple valid approaches exist
- User asked "is this the best way?"
- Major architectural decisions made
Phase 8: Report Compilation
See Report Template for full format.
# Feature Verification Report
**Composite Score: [N.N]/10** (Grade: [LETTER])
## Top Improvement Suggestions
| # | Suggestion | Effort | Impact | Priority |
|---|------------|--------|--------|----------|
| 1 | [highest] | [N] | [N] | [N.N] |
## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**

Checklists (1)
Verification Checklist
Verification Checklist
Quick checklist for comprehensive feature verification.
Grading Complete
- All 5 dimensions rated (0-10 scale)
- Weights applied correctly (20/25/20/20/15)
- Composite score calculated
- Grade letter assigned (A+ to F)
Evidence Collected
- Test results with exit codes
- Coverage report (JSON)
- Security scan results
- Lint/type check output
- Evidence files linked in report
Improvements Documented
- Each suggestion has effort estimate (1-5)
- Each suggestion has impact estimate (1-5)
- Priority calculated (Impact / Effort)
- Quick wins identified (low effort, high impact)
Alternatives Considered
- Current approach scored
- At least one alternative evaluated
- Migration cost estimated
- Recommendation documented
Policy Compliance
- No blocking rule violations
- Warning rules acknowledged
- Thresholds checked (composite, security, coverage)
Report Generated
- All sections filled
- Verdict assigned (Ready/Recommended/Blocked)
- Tasks updated to completed