Verify
Comprehensive verification with parallel test agents. Use when verifying implementations or validating changes.
/ork:verify - Verify Feature
Comprehensive verification using parallel specialized agents with nuanced grading (0-10 scale) and improvement suggestions.
Quick Start
/ork:verify authentication flow
/ork:verify --model=opus user profile feature
/ork:verify --scope=backend database migrations
Argument Resolution
SCOPE = "$ARGUMENTS" # Full argument string, e.g., "authentication flow"
SCOPE_TOKEN = "$ARGUMENTS[0]" # First token for flag detection (e.g., "--scope=backend")
# $ARGUMENTS[0], $ARGUMENTS[1] etc. for indexed access (CC 2.1.59)
# Model override detection (CC 2.1.72)
MODEL_OVERRIDE = None
for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]  # "opus", "sonnet", "haiku"
        SCOPE = SCOPE.replace(token, "").strip()
Pass MODEL_OVERRIDE to all Agent() calls via model=MODEL_OVERRIDE when set. Accepts symbolic names (opus, sonnet, haiku) or full IDs (claude-opus-4-6) per CC 2.1.74.
Opus 4.6: Agents use native adaptive thinking (no MCP sequential-thinking needed). Extended 128K output supports comprehensive verification reports.
STEP 0: Effort-Aware Verification Scaling (CC 2.1.76)
Scale verification depth based on /effort level:
| Effort Level | Phases Run | Agents | Output |
|---|---|---|---|
| low | Run tests only → pass/fail | 0 agents | Quick check |
| medium | Tests + code quality + security | 3 agents | Score + top issues |
| high (default) | All 8 phases + visual capture | 6-7 agents | Full report + grades |
Override: Explicit user selection (e.g., "Full verification") overrides /effort downscaling.
STEP 0a: Verify User Intent with AskUserQuestion
BEFORE creating tasks, clarify verification scope:
AskUserQuestion(
questions=[{
"question": "What scope for this verification?",
"header": "Scope",
"options": [
{"label": "Full verification (Recommended)", "description": "All tests + security + code quality + visual + grades", "markdown": "```\nFull Verification (10 phases)\n─────────────────────────────\n 7 parallel agents:\n ┌────────────┐ ┌────────────┐\n │ Code │ │ Security │\n │ Quality │ │ Auditor │\n ├────────────┤ ├────────────┤\n │ Test │ │ Backend │\n │ Generator │ │ Architect │\n ├────────────┤ ├────────────┤\n │ Frontend │ │ Performance│\n │ Developer │ │ Engineer │\n ├────────────┤ └────────────┘\n │ Visual │\n │ Capture │ → gallery.html\n └────────────┘\n ▼\n Composite Score (0-10)\n 8 dimensions + Grade\n + Visual Gallery\n```"},
{"label": "Tests only", "description": "Run unit + integration + e2e tests", "markdown": "```\nTests Only\n──────────\n npm test ──▶ Results\n ┌─────────────────────┐\n │ Unit tests ✓/✗ │\n │ Integration ✓/✗ │\n │ E2E ✓/✗ │\n │ Coverage NN% │\n └─────────────────────┘\n Skip: security, quality, UI\n Output: Pass/fail + coverage\n```"},
{"label": "Security audit", "description": "Focus on security vulnerabilities", "markdown": "```\nSecurity Audit\n──────────────\n security-auditor agent:\n ┌─────────────────────────┐\n │ OWASP Top 10 ✓/✗ │\n │ Dependency CVEs ✓/✗ │\n │ Secrets scan ✓/✗ │\n │ Auth flow review ✓/✗ │\n │ Input validation ✓/✗ │\n └─────────────────────────┘\n Output: Security score 0-10\n + vulnerability list\n```"},
{"label": "Code quality", "description": "Lint, types, complexity analysis", "markdown": "```\nCode Quality\n────────────\n code-quality-reviewer agent:\n ┌─────────────────────────┐\n │ Lint errors N │\n │ Type coverage NN% │\n │ Cyclomatic complex N.N │\n │ Dead code N │\n │ Pattern violations N │\n └─────────────────────────┘\n Output: Quality score 0-10\n + refactor suggestions\n```"},
{"label": "Quick check", "description": "Just run tests, skip detailed analysis", "markdown": "```\nQuick Check (~1 min)\n────────────────────\n Run tests ──▶ Pass/Fail\n\n Output:\n ├── Test results\n ├── Build status\n └── Lint status\n No agents, no grading,\n no report generation\n```"}
],
"multiSelect": true
}]
)
Based on answer, adjust workflow:
- Full verification: All 10 phases (8 + 2.5 + 8.5), 7 parallel agents including visual capture
- Tests only: Skip phases 2 (security), 5 (UI/UX analysis)
- Security audit: Focus on security-auditor agent
- Code quality: Focus on code-quality-reviewer agent
- Quick check: Run tests only, skip grading and suggestions
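One way to encode the answer-to-workflow mapping above as data (the phase lists and keys are illustrative, derived loosely from the bullets; this is a sketch, not the canonical config):

```python
# Map each AskUserQuestion answer to the phases and agents it enables.
WORKFLOWS = {
    "Full verification": {"phases": [1, 2, 2.5, 3, 4, 5, 6, 7, 8, 8.5], "agents": 7, "grading": True},
    "Tests only":        {"phases": [1, 3, 4, 6, 7, 8],                 "agents": 0, "grading": True},
    "Security audit":    {"phases": [1, 2, 4, 8],                       "agents": 1, "grading": True},
    "Code quality":      {"phases": [1, 2, 4, 8],                       "agents": 1, "grading": True},
    "Quick check":       {"phases": [3],                                "agents": 0, "grading": False},
}

def plan(answers: list[str]) -> dict:
    """Union the selected workflows (multiSelect=true allows several)."""
    phases: set = set()
    agents, grading = 0, False
    for answer in answers:
        cfg = WORKFLOWS[answer]
        phases |= set(cfg["phases"])
        agents = max(agents, cfg["agents"])
        grading = grading or cfg["grading"]
    return {"phases": sorted(phases), "agents": agents, "grading": grading}
```

With `multiSelect` the union keeps the most expansive choice, e.g. "Tests only" plus "Security audit" still grades.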
STEP 0b: Select Orchestration Mode
Load details: Read("$\{CLAUDE_SKILL_DIR\}/references/orchestration-mode.md") for env var check logic, Agent Teams vs Task Tool comparison, and mode selection rules.
Choose Agent Teams (mesh -- verifiers share findings) or Task tool (star -- all report to lead) based on the orchestration mode reference.
MCP Probe + Resume
ToolSearch(query="select:mcp__memory__search_nodes")
Write(".claude/chain/capabilities.json", { memory, timestamp })
Read(".claude/chain/state.json") # resume if existsHandoff File
After verification completes, write results:
Write(".claude/chain/verify-results.json", JSON.stringify({
"phase": "verify", "skill": "verify",
"timestamp": now(), "status": "completed",
"outputs": {
"tests_passed": N, "tests_failed": N,
"coverage": "87%", "security_scan": "clean"
}
}))
Regression Monitor (CC 2.1.71)
Optionally schedule post-verification monitoring:
# Guard: Skip cron in headless/CI (CLAUDE_CODE_DISABLE_CRON)
# if env CLAUDE_CODE_DISABLE_CRON is set, run a single check instead
CronCreate(
schedule="0 8 * * *",
prompt="Daily regression check: npm test.
If 7 consecutive passes → CronDelete.
If failures → alert with details."
)
Task Management (CC 2.1.16)
# 1. Create main verification task
TaskCreate(
subject="Verify [feature-name] implementation",
description="Comprehensive verification with nuanced grading",
activeForm="Verifying [feature-name] implementation"
)
# 2. Create subtasks for 8-phase process
TaskCreate(subject="Run code quality checks", activeForm="Running quality checks") # id=2
TaskCreate(subject="Execute security audit", activeForm="Running security audit") # id=3
TaskCreate(subject="Verify test coverage", activeForm="Verifying test coverage") # id=4
TaskCreate(subject="Validate API", activeForm="Validating API") # id=5
TaskCreate(subject="Check UI/UX", activeForm="Checking UI/UX") # id=6
TaskCreate(subject="Calculate grades", activeForm="Calculating grades") # id=7
TaskCreate(subject="Generate suggestions", activeForm="Generating suggestions") # id=8
TaskCreate(subject="Compile report", activeForm="Compiling report") # id=9
# 3. Set dependencies — phases 2-6 run in parallel, 7-9 are sequential
TaskUpdate(taskId="7", addBlockedBy=["2", "3", "4", "5", "6"]) # Grading needs all checks
TaskUpdate(taskId="8", addBlockedBy=["7"]) # Suggestions need grades
TaskUpdate(taskId="9", addBlockedBy=["8"]) # Report needs suggestions
# 4. Before starting each task, verify it's unblocked
task = TaskGet(taskId="2") # Verify blockedBy is empty
# 5. Update status as you progress
TaskUpdate(taskId="2", status="in_progress") # When starting
TaskUpdate(taskId="2", status="completed") # When done — repeat for each subtask8-Phase Workflow
Load details: Read("$\{CLAUDE_SKILL_DIR\}/references/verification-phases.md") for complete phase details, agent spawn definitions, Agent Teams alternative, and team teardown.
| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 2.5 Visual Capture | Screenshot routes, AI vision eval | Gallery + visual score |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts + gallery.html | Final report |
| 8.5 Agentation Loop | User annotates, ui-feedback fixes | Before/after diffs |
Phase 2 Agents (Quick Reference)
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |
Launch ALL agents in ONE message with run_in_background=True and max_turns=25.
Progressive Output (CC 2.1.76+)
Output each agent's score as soon as it completes — don't wait for all 6-7 agents.
Focus mode (CC 2.1.101): In focus mode, include the full composite score, all dimension scores, and the verdict in your final message — the user didn't see the incremental outputs.
Security: 8.2/10 — No critical vulnerabilities found
Code Quality: 7.5/10 — 3 complexity hotspots identified
[...remaining agents still running...]
This gives users real-time visibility into multi-agent verification. If any dimension scores below its blocking threshold (5.0 for security, per the default policy), flag it as a blocker immediately — the user can terminate early without waiting for remaining agents.
Monitor + Partial Results (CC 2.1.98)
Use Monitor for streaming test execution output from background scripts:
# Stream test output in real-time instead of waiting for completion
Bash(command="npm test 2>&1", run_in_background=true)
Monitor(pid=test_task_id) # Each line → notification
Partial results (CC 2.1.98): If a verification agent fails mid-analysis, synthesize partial scores rather than re-spawning:
for agent_result in verification_results:
    if "[PARTIAL RESULT]" in agent_result.output:
        # Extract whatever scores the agent produced before crashing
        partial_score = parse_score(agent_result.output) # May be incomplete
        scores[agent_result.dimension] = {
            "score": partial_score, "partial": True,
            "note": "Agent crashed — score based on partial analysis"
        }
# A 4-dimension score is better than no score. Do NOT re-spawn.
Phase 2.5: Visual Capture (NEW — runs in parallel with Phase 2)
Load details: Read("$\{CLAUDE_SKILL_DIR\}/references/visual-capture.md") for auto-detection, route discovery, screenshot capture, and AI vision evaluation.
Summary: Auto-detects project framework, starts dev server, discovers routes, uses agent-browser to screenshot each route, evaluates with Claude vision, generates self-contained gallery.html with base64-embedded images.
Output: verification-output/\{timestamp\}/gallery.html — open in browser to see all screenshots with AI evaluations, scores, and annotation diffs.
Graceful degradation: If no frontend detected or server won't start, skips visual capture with a warning — never blocks verification.
Phase 8.5: Agentation Visual Feedback (opt-in)
Load details: Read("$\{CLAUDE_SKILL_DIR\}/references/visual-capture.md") (Phase 8.5 section) for agentation loop workflow.
Trigger: Only when agentation MCP is configured. Offers user the choice to annotate the live UI. ui-feedback agent processes annotations, re-screenshots show before/after.
Grading & Scoring
Load Read("$\{CLAUDE_PLUGIN_ROOT\}/skills/quality-gates/references/unified-scoring-framework.md") for dimensions, weights, grade thresholds, and improvement prioritization. Load Read("$\{CLAUDE_SKILL_DIR\}/references/quality-model.md") for verify-specific extensions (Visual dimension). Load Read("$\{CLAUDE_SKILL_DIR\}/references/grading-rubric.md") for per-agent scoring criteria.
Evidence & Test Execution
Load details: Read("$\{CLAUDE_SKILL_DIR\}/rules/evidence-collection.md") for git commands, test execution patterns, metrics tracking, and post-verification feedback.
Policy-as-Code
Load details: Read("$\{CLAUDE_SKILL_DIR\}/references/policy-as-code.md") for configuration.
Define verification rules in .claude/policies/verification-policy.json:
{
"thresholds": {
"composite_minimum": 6.0,
"security_minimum": 7.0,
"coverage_minimum": 70
},
"blocking_rules": [
{"dimension": "security", "below": 5.0, "action": "block"}
]
}
Report Format
Load details: Read("$\{CLAUDE_SKILL_DIR\}/references/report-template.md") for full format. Summary:
# Feature Verification Report
**Composite Score: [N.N]/10** (Grade: [LETTER])
## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**
References
Load on demand with Read("$\{CLAUDE_SKILL_DIR\}/references/<file>"):
| File | Content |
|---|---|
| verification-phases.md | 8-phase workflow, agent spawn definitions, Agent Teams mode |
| visual-capture.md | Phase 2.5 + 8.5: screenshot capture, AI vision, gallery generation, agentation loop |
| quality-model.md | Scoring dimensions and weights (8 unified) |
| grading-rubric.md | Per-agent scoring criteria |
| report-template.md | Full report format with visual evidence section |
| alternative-comparison.md | Approach comparison template |
| orchestration-mode.md | Agent Teams vs Task Tool |
| policy-as-code.md | Verification policy configuration |
| verification-checklist.md | Pre-flight checklist |
Rules
Load on demand with Read("$\{CLAUDE_SKILL_DIR\}/rules/<file>"):
| File | Content |
|---|---|
| scoring-rubric.md | Composite scoring, grades, verdicts |
| evidence-collection.md | Evidence gathering and test patterns |
Verification Gate (Cross-Cutting)
Load Read("$\{CLAUDE_PLUGIN_ROOT\}/skills/shared/rules/verification-gate.md") — the minimum 5-step gate that applies to ALL completion claims across all skills. This is non-negotiable: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
Anti-Sycophancy Protocol
Load Read("$\{CLAUDE_PLUGIN_ROOT\}/skills/shared/rules/anti-sycophancy.md") — all verification agents report findings directly without performative agreement. "Should be fine" is not evidence. "Tests pass (exit 0, 47/47)" is.
Agent Status Protocol
All verification agents MUST report using the standardized protocol: Read("$\{CLAUDE_PLUGIN_ROOT\}/agents/shared/status-protocol.md"). Never report DONE if concerns exist. Never silently produce work you're unsure about.
Agent Coordination
SendMessage (Cross-Agent Findings)
When a security agent finds a critical issue, share it with other verification agents:
SendMessage(to="test-generator", message="Security: SQL injection in user_service.py:88 — add parameterized query test")
SendMessage(to="code-quality-reviewer", message="Security finding at user_service.py:88 — flag in review")Skill Chain
After verification, chain to commit if all gates pass:
TaskCreate(subject="Commit verified changes", activeForm="Committing", addBlockedBy=[verify_task_id])
# Then: /ork:commit
Related Skills
- ork:implement - Full implementation with verification
- ork:review-pr - PR-specific verification
- testing-unit / testing-integration / testing-e2e - Test execution patterns
- ork:quality-gates - Quality gate patterns
- browser-tools - Browser automation for visual capture
Version: 4.2.0 (March 2026) — Added progressive output for incremental agent scores
Rules (2)
Evidence Collection Patterns — HIGH
Evidence Collection Patterns
Phase 1: Context Gathering
Run these commands in parallel in ONE message:
git diff main --stat
git log main..HEAD --oneline
git diff main --name-only | sort -u
Incorrect:
# Sequential — wastes time, no coverage data
cd backend && pytest tests/
cd frontend && npm test
Correct:
# Parallel with coverage — run both in ONE message
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage
Phase 3: Parallel Test Execution
Run backend and frontend tests in parallel:
# PARALLEL - Backend and frontend
cd backend && poetry run pytest tests/ -v --cov=app --cov-report=json
cd frontend && npm run test -- --coverage
Phase 7: Metrics Tracking
Store verification metrics in memory for trend analysis:
mcp__memory__create_entities(entities=[{
"name": "verification-{date}-{feature}",
"entityType": "VerificationMetrics",
"observations": [f"composite_score: {score}", ...]
}])
Query trends: mcp__memory__search_nodes(query="VerificationMetrics")
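Once per-run composite scores are stored, trend analysis can be as simple as comparing the latest score to a trailing baseline (a sketch; the 0.2 delta cutoff and window size are illustrative):

```python
def trend(composite_scores: list[float], window: int = 3) -> str:
    """Label the latest score against the mean of up to `window` prior runs."""
    if len(composite_scores) < 2:
        return "insufficient-history"
    prior = composite_scores[-(window + 1):-1]
    baseline = sum(prior) / len(prior)
    delta = composite_scores[-1] - baseline
    if delta > 0.2:
        return "improving"
    if delta < -0.2:
        return "regressing"
    return "stable"
```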
Phase 2.5: Visual Evidence Collection
Run in parallel with Phase 2 agents. Auto-detects frontend framework and captures screenshots.
Incorrect:
# Manual screenshots with no structure
open http://localhost:3000
# Take manual screenshot...
Correct:
# Automated visual capture with AI evaluation
Agent(
subagent_type="general-purpose",
prompt="Visual capture: detect framework, start server, screenshot routes via agent-browser, evaluate with Claude vision, generate gallery.html",
run_in_background=True
)
Output structure:
verification-output/{timestamp}/
├── screenshots/ (PNGs per route, base64 in gallery)
├── ai-evaluations/ (JSON per screenshot with score + issues)
├── annotations/ (before/after if agentation used)
│ ├── before/
│ └── after/
└── gallery.html (self-contained, open in browser)
Phase 8.5: Post-Verification Feedback
After report compilation, store verification scores in the memory graph for KPI baseline tracking:
Query trends: mcp__memory__search_nodes(query="VerificationScores")
Scoring Rubric — HIGH
Scoring Rubric
Composite Score
Each agent produces a 0-10 score with decimals for nuance. The composite score is a weighted sum using the weights from Quality Model.
Grade Thresholds
<!-- Canonical source: ../references/quality-model.md — keep in sync -->
| Grade | Score Range | Verdict |
|---|---|---|
| A+ | 9.0-10.0 | EXCELLENT |
| A | 8.0-8.9 | READY FOR MERGE |
| B | 7.0-7.9 | READY FOR MERGE |
| C | 6.0-6.9 | IMPROVEMENTS RECOMMENDED |
| D | 5.0-5.9 | IMPROVEMENTS RECOMMENDED |
| F | 0.0-4.9 | BLOCKED |
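The two steps above (weighted sum, then grade lookup against the thresholds) can be sketched as follows; weights are the 8-dimension set from the Quality Model, shown here for illustration:

```python
WEIGHTS = {
    "correctness": 0.14, "maintainability": 0.14, "performance": 0.11,
    "security": 0.18, "scalability": 0.09, "testability": 0.12,
    "compliance": 0.12, "visual": 0.10,
}

# Grade floors, checked highest first (matches the table above).
GRADES = [(9.0, "A+"), (8.0, "A"), (7.0, "B"), (6.0, "C"), (5.0, "D"), (0.0, "F")]

def composite(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension 0-10 scores."""
    return round(sum(scores[d] * w for d, w in WEIGHTS.items()), 2)

def grade(score: float) -> str:
    """Map a composite score to a letter grade using the floors above."""
    return next(letter for floor, letter in GRADES if score >= floor)
```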
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Scoring scale | 0-10 with decimals | Nuanced, not binary |
| Improvement priority | Impact / Effort ratio | Do high-value first |
| Alternative comparison | Optional phase | Only when multiple valid approaches |
| Metrics persistence | Memory MCP | Track trends over time |
Incorrect:
Security: "looks fine" → 8/10 # No evidence, subjective
Performance: "fast enough" → 7/10 # No benchmarksCorrect:
Security: "11/11 injection tests pass, 13 deny patterns, 0 CVEs" → 9/10
Performance: "p99 latency 142ms (budget: 300ms), 0 N+1 queries" → 8.5/10Improvement Suggestions
Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.
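Prioritization by impact over effort can be sketched as (the dict shape is illustrative; see the Quality Model for the canonical quick-wins formula):

```python
def prioritize(suggestions: list[dict]) -> list[dict]:
    """Sort suggestions by impact/effort ratio, highest value first.

    Each suggestion carries effort (1-5) and impact (1-5);
    priority = impact / effort, so cheap high-impact fixes rise to the top.
    """
    for s in suggestions:
        s["priority"] = round(s["impact"] / s["effort"], 2)
    return sorted(suggestions, key=lambda s: s["priority"], reverse=True)
```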
Blocking Rules
Verification can be blocked by policy-as-code rules. See Policy-as-Code for configuration of composite minimums, dimension minimums, and blocking rules.
References (9)
Alternative Comparison
Alternative Comparison
Evaluate current implementation against alternative approaches.
When to Compare
- Multiple valid architectures exist
- User asks "is this the best way?"
- Major patterns were chosen (ORM vs raw SQL, REST vs GraphQL)
- Performance/scalability concerns raised
Comparison Criteria
For Each Alternative
| Criterion | Weight | Description |
|---|---|---|
| Effort | 30% | Implementation complexity (1-5 scale) |
| Risk | 25% | Technical and operational risk (1-5 scale) |
| Benefit | 45% | Value delivered, performance, maintainability (1-5 scale) |
Migration Cost
| Factor | Estimate |
|---|---|
| Code changes | Files/lines affected |
| Data migration | Schema changes, backfill |
| Testing | New test coverage needed |
| Rollback risk | Reversibility |
Decision Matrix Format
| Approach | Effort | Risk | Benefit | Score |
|---|---|---|---|---|
| Current | N | N | N | (5-E)*0.3 + (5-R)*0.25 + B*0.45 |
| Alt A | N | N | N | calculated |
| Alt B | N | N | N | calculated |
Note: Higher effort and risk are bad (invert for scoring), higher benefit is good.
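Following that note, a row score inverts effort and risk before weighting (a sketch of the recommendation formula given below):

```python
def approach_score(effort: int, risk: int, benefit: int) -> float:
    """Decision-matrix score on the 1-5 scales; higher is better.

    Effort and risk are inverted (5 - value) so that low-effort,
    low-risk, high-benefit approaches score highest.
    """
    return round((5 - effort) * 0.3 + (5 - risk) * 0.25 + benefit * 0.45, 2)
```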
Recommendation Formula:
Score = (5 - Effort) * 0.3 + (5 - Risk) * 0.25 + Benefit * 0.45
Output Template
### Alternative Comparison: [Topic]
**Current Approach:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]
**Alternative A:** [description]
- Score: N/10
- Pros: [strengths]
- Cons: [weaknesses]
- Migration effort: [1-5]
**Recommendation:** [Keep current / Switch to Alt A]
**Justification:** [1-2 sentences]
Grading Rubric
Verification Grading Rubric
0-10 scoring criteria for each verification dimension.
Score Levels
| Range | Level | Description |
|---|---|---|
| 0-3 | Poor | Critical issues, blocks merge |
| 4-6 | Adequate | Functional but needs improvement |
| 7-9 | Good | Ready for merge, minor suggestions |
| 10 | Excellent | Exemplary, reference quality |
Dimension Rubrics
<!-- Weights from canonical source: ../references/quality-model.md — keep in sync -->
Correctness (Weight: 14%)
| Score | Criteria |
|---|---|
| 10 | All functional requirements met, edge cases handled, zero regressions |
| 8-9 | Core requirements met, most edge cases handled |
| 6-7 | Core paths work, some edge cases missing |
| 4-5 | Partial functionality, notable gaps |
| 1-3 | Broken core paths |
| 0 | Does not run |
Maintainability (Weight: 14%)
| Score | Criteria |
|---|---|
| 10 | Zero lint errors/warnings, strict types, exemplary patterns, low complexity |
| 8-9 | Zero errors, < 5 warnings, minimal any, good patterns |
| 6-7 | 1-3 errors, some warnings, acceptable patterns |
| 4-5 | 4-10 errors, pattern issues, needs refactoring |
| 1-3 | Many errors, poor patterns, high complexity |
| 0 | Lint/type check fails to run |
Performance (Weight: 11%)
| Score | Criteria |
|---|---|
| 10 | p99 within budget, zero N+1, optimal caching, efficient resource usage |
| 8-9 | Good latency, no N+1, reasonable caching |
| 6-7 | Acceptable latency, minor inefficiencies |
| 4-5 | Notable bottlenecks, missing caching |
| 1-3 | Severe bottlenecks, resource leaks |
| 0 | Unresponsive or crashes under load |
Security (Weight: 18%)
| Score | Criteria |
|---|---|
| 10 | No vulnerabilities, all OWASP compliant, secure by design |
| 8-9 | No critical/high, all OWASP, excellent practices |
| 6-7 | No critical, 1-2 high, most OWASP compliant |
| 4-5 | No critical, 3-5 high, some gaps |
| 1-3 | 1+ critical or many high vulnerabilities |
| 0 | Multiple critical, secrets exposed |
Scalability (Weight: 9%)
| Score | Criteria |
|---|---|
| 10 | Horizontal scaling ready, stateless design, efficient data patterns |
| 8-9 | Good scaling patterns, minor bottlenecks |
| 6-7 | Scales for current needs, some concerns |
| 4-5 | Will hit limits soon, needs rework |
| 1-3 | Single-instance only, monolithic state |
| 0 | Cannot handle production load |
Testability (Weight: 12%)
| Score | Criteria |
|---|---|
| 10 | >= 90% coverage, meaningful assertions, edge cases, no flaky tests |
| 8-9 | >= 80% coverage, good assertions, critical paths |
| 6-7 | >= 70% coverage (target), basic assertions |
| 4-5 | 50-69% coverage |
| 1-3 | 30-49% coverage |
| 0 | < 30% coverage or tests fail to run |
Compliance (Weight: 12%)
| Score | Criteria |
|---|---|
| 10 | Perfect REST/UI contracts, RFC 9457 errors, full Zod, WCAG AA |
| 8-9 | Good conventions, proper validation, accessibility |
| 6-7 | Acceptable patterns, minor inconsistencies |
| 4-5 | Several convention violations |
| 1-3 | Poor API/UI design, missing validation |
| 0 | Broken contracts or inaccessible |
Visual (Weight: 10%)
| Score | Criteria |
|---|---|
| 10 | Pixel-perfect layout, full a11y, complete content, responsive |
| 8-9 | Good layout, minor visual issues, WCAG AA |
| 6-7 | Acceptable layout, some a11y gaps |
| 4-5 | Layout issues, missing content, a11y problems |
| 1-3 | Broken layout, major content missing |
| 0 | Page fails to render |
Note: Visual weight is 0.00 for API-only projects — redistributed proportionally. See Quality Model.
Grade Interpretation
<!-- Canonical source: quality-model.md — keep in sync -->
| Composite | Grade | Verdict |
|---|---|---|
| 9.0-10.0 | A+ | EXCELLENT |
| 8.0-8.9 | A | READY FOR MERGE |
| 7.0-7.9 | B | READY FOR MERGE |
| 6.0-6.9 | C | IMPROVEMENTS RECOMMENDED |
| 5.0-5.9 | D | IMPROVEMENTS RECOMMENDED |
| 0.0-4.9 | F | BLOCKED |
Orchestration Mode
<!-- SHARED: keep in sync with ../../../assess/references/orchestration-mode.md -->
Orchestration Mode Selection
Shared logic for choosing between Agent Teams and Task tool orchestration in assess/verify skills.
Environment Check
# Agent Teams is GA since CC 2.1.33
import os
force_task_tool = os.environ.get("ORCHESTKIT_FORCE_TASK_TOOL") == "1"
if force_task_tool:
    mode = "task_tool"
else:
    # Teams available by default — use for full multi-dimensional work
    mode = "agent_teams" if scope == "full" else "task_tool"
Decision Rules
- Full assessment/verification scope --> Agent Teams mode (GA since CC 2.1.33)
- Quick/single-dimension scope --> Task tool mode
- ORCHESTKIT_FORCE_TASK_TOOL=1 --> Task tool (override)
Agent Teams vs Task Tool
| Aspect | Task Tool (Star) | Agent Teams (Mesh) |
|---|---|---|
| Topology | All agents report to lead | Agents communicate with each other |
| Finding correlation | Lead cross-references after completion | Agents share findings in real-time |
| Cross-domain overlap | Independent scoring | Agents alert each other about overlapping concerns |
| Cost | ~200K tokens | ~500K tokens |
| Best for | Focused/single-dimension work | Full multi-dimensional assessment/verification |
Fallback
If Agent Teams encounters issues mid-execution, fall back to Task tool for remaining work. This is safe because both modes produce the same output format (dimensional scores 0-10).
Context Window Note
For full codebase work (>20 files), use the 1M context window to avoid agent context exhaustion. On 200K context, scope discovery should limit files to prevent overflow.
Policy As Code
Policy-as-Code
Define verification policies as machine-readable configuration.
Policy Structure
version: "1.0"
name: policy-name
description: What this policy enforces
thresholds:
  composite_minimum: 6.0
  coverage_minimum: 70
rules:
  blockers: []  # Fail verification
  warnings: []  # Note but continue
  info: []      # Informational only
Rule Definition
Blocker Rules (Must Pass)
blockers:
  - dimension: security
    condition: below
    value: 5.0
    message: "Security score below minimum"
  - check: critical_vulnerabilities
    condition: above
    value: 0
    message: "Critical vulnerabilities found"
  - check: type_errors
    condition: above
    value: 0
    message: "TypeScript errors must be zero"
Warning Rules (Should Fix)
warnings:
  - dimension: code_quality
    condition: below
    value: 7.0
    message: "Code quality could be improved"
  - check: test_coverage
    condition: below
    value: 80
    message: "Coverage below recommended 80%"
Info Rules (Awareness)
info:
  - check: todo_count
    condition: above
    value: 5
    message: "Multiple TODOs found in code"
Threshold Configuration
| Threshold | Type | Description |
|---|---|---|
| composite_minimum | float | Overall score minimum (0-10) |
| coverage_minimum | int | Test coverage percentage |
| critical_vulnerabilities | int | Max critical vulns (0) |
| high_vulnerabilities | int | Max high vulns |
| lint_errors | int | Max lint errors (0) |
| type_errors | int | Max type errors (0) |
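A minimal evaluator for the rule shapes above (field names follow the YAML; the function itself is a sketch, not the plugin's implementation):

```python
def evaluate_rules(rules: list[dict], metrics: dict[str, float]) -> list[str]:
    """Return the message of every rule whose condition is met."""
    fired = []
    for rule in rules:
        # Rules target either a scored dimension or a raw check metric.
        key = rule.get("dimension") or rule.get("check")
        actual = metrics.get(key)
        if actual is None:
            continue  # metric not collected this run
        if rule["condition"] == "below" and actual < rule["value"]:
            fired.append(rule["message"])
        elif rule["condition"] == "above" and actual > rule["value"]:
            fired.append(rule["message"])
    return fired
```

Run it once per severity tier: any fired blocker fails verification, fired warnings are reported but non-blocking.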
Custom Rules
custom_rules:
  - name: no_console_log
    pattern: "console\\.log"
    file_glob: "**/*.ts"
    exclude: ["**/*.test.ts"]
    severity: warning
    message: "Remove console.log from production"
Policy Location
Store at: .claude/policies/verification-policy.yaml
Multiple policies: .claude/policies/\{name\}-policy.yaml
Quality Model
Quality Model (verify)
Extends the unified scoring framework with Visual as the 8th dimension.
Canonical source:
quality-gates/references/unified-scoring-framework.md
Load: Read("$\{CLAUDE_PLUGIN_ROOT\}/skills/quality-gates/references/unified-scoring-framework.md")
verify-Specific Extensions
Visual Dimension (8th)
| Dimension | Weight | What It Measures |
|---|---|---|
| Visual | 0.10 | Layout correctness, a11y, content completeness, responsiveness |
When Visual is active, each base weight scales: adjusted = base_weight * (1.0 / 1.10).
When Visual is skipped (API-only projects), the base weights keep their canonical values, which sum to 1.00.
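The scaling rule can be sketched generically (dimension names here are placeholders; the rule is exactly adjusted = base * 1/(1 + visual_weight), with Visual added at its own weight):

```python
def adjust_weights(base: dict[str, float], visual_weight: float = 0.10) -> dict[str, float]:
    """Shrink base weights by 1/(1 + visual_weight) to make room for Visual.

    Pass visual_weight=0.0 for API-only projects, which leaves the
    base weights untouched and adds no Visual entry.
    """
    scale = 1.0 / (1.0 + visual_weight)
    adjusted = {dim: round(w * scale, 3) for dim, w in base.items()}
    if visual_weight:
        adjusted["visual"] = visual_weight
    return adjusted
```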
Dimensions Used (with Visual)
| Dimension | Adjusted Weight |
|---|---|
| Correctness | 0.14 |
| Maintainability | 0.14 |
| Performance | 0.11 |
| Security | 0.18 |
| Scalability | 0.09 |
| Testability | 0.12 |
| Compliance | 0.12 |
| Visual | 0.10 |
See unified framework for grade thresholds, improvement prioritization, effort/impact scales, and blocking rules.
Report Template
Verification Report Template
Copy this template and fill in results from parallel agent verification.
Quick Copy Template
# Feature Verification Report
**Date**: [TODAY'S DATE]
**Branch**: [branch-name]
**Feature**: [feature description]
**Reviewer**: Claude Code with parallel verification subagents
**Verification Duration**: [X minutes]
---
## Summary
**Status**: [READY FOR MERGE | NEEDS ATTENTION | BLOCKED]
[1-2 sentence summary of verification results]
---
## Agent Results
### 1. Code Quality (code-quality-reviewer)
| Check | Tool | Exit Code | Errors | Warnings | Status |
|-------|------|-----------|--------|----------|--------|
| Backend Lint | Ruff | 0/1 | N | N | PASS/FAIL |
| Backend Types | ty | 0/1 | N | N | PASS/FAIL |
| Frontend Lint | Biome | 0/1 | N | N | PASS/FAIL |
| Frontend Types | tsc | 0/1 | N | N | PASS/FAIL |
**Pattern Compliance:**
- [ ] No `console.log` in production code
- [ ] No `any` types in TypeScript
- [ ] Exhaustive switches with `assertNever`
- [ ] SOLID principles followed
- [ ] Cyclomatic complexity < 10
**Findings:**
- [List any pattern violations]
---
### 2. Security Audit (security-auditor)
| Check | Tool | Critical | High | Medium | Low | Status |
|-------|------|----------|------|--------|-----|--------|
| JS Dependencies | npm audit | N | N | N | N | PASS/BLOCK |
| Python Dependencies | pip-audit | N | N | N | N | PASS/BLOCK |
| Secrets Scan | grep/gitleaks | N/A | N/A | N/A | N | PASS/BLOCK |
**OWASP Top 10 Compliance:**
- [ ] A01: Broken Access Control
- [ ] A02: Cryptographic Failures
- [ ] A03: Injection
- [ ] A04: Insecure Design
- [ ] A05: Security Misconfiguration
- [ ] A06: Vulnerable Components
- [ ] A07: Auth Failures
- [ ] A08: Data Integrity Failures
- [ ] A09: Logging Failures
- [ ] A10: SSRF
**Findings:**
- [List any security issues]
---
### 3. Test Coverage (test-generator)
| Suite | Total | Passed | Failed | Skipped | Coverage | Target | Status |
|-------|-------|--------|--------|---------|----------|--------|--------|
| Backend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| Backend Integration | N | N | N | N | X% | 70% | PASS/FAIL |
| Frontend Unit | N | N | N | N | X% | 70% | PASS/FAIL |
| E2E | N | N | N | N | N/A | N/A | PASS/FAIL |
**Test Quality:**
- [ ] Meaningful assertions (not just `assert result`)
- [ ] Edge cases covered (empty, error, timeout)
- [ ] No flaky tests (no sleep, no timing deps)
- [ ] MSW used for API mocking (not jest.mock)
**Coverage Gaps:**
- [List uncovered critical paths]
---
### 4. API Compliance (backend-system-architect)
| Check | Compliant | Issues |
|-------|-----------|--------|
| REST Conventions | Yes/No | [details] |
| Pydantic v2 Validation | Yes/No | [details] |
| RFC 9457 Error Handling | Yes/No | [details] |
| Async Timeout Protection | Yes/No | [details] |
| No N+1 Queries | Yes/No | [details] |
**Findings:**
- [List any API compliance issues]
---
### 5. UI Compliance (frontend-ui-developer)
| Check | Compliant | Issues |
|-------|-----------|--------|
| React 19 APIs (useOptimistic, useFormStatus, use()) | Yes/No | [details] |
| Zod Validation on API Responses | Yes/No | [details] |
| Exhaustive Type Checking | Yes/No | [details] |
| Skeleton Loading States | Yes/No | [details] |
| Prefetching on Navigation | Yes/No | [details] |
| WCAG 2.1 AA Accessibility | Yes/No | [details] |
**Findings:**
- [List any UI compliance issues]
---
## Quality Gates Summary
| Gate | Required | Actual | Status |
|------|----------|--------|--------|
| Test Coverage | >= 70% | X% | PASS/FAIL |
| Security Critical | 0 | N | PASS/FAIL |
| Security High | <= 5 | N | PASS/FAIL |
| Type Errors | 0 | N | PASS/FAIL |
| Lint Errors | 0 | N | PASS/FAIL |
**Overall Gate Status**: [ALL PASS | SOME FAIL]
---
## Blockers (Must Fix Before Merge)
1. [Blocker description with file:line reference]
2. [Blocker description with file:line reference]
---
## Suggestions (Non-Blocking)
1. [Suggestion for improvement]
2. [Suggestion for improvement]
---
## Visual Verification
**Visual Score: [N.N]/10**
| Route | Screenshot | AI Score | Issues | Status |
|-------|-----------|----------|--------|--------|
| / | [thumbnail] | N.N/10 | N | PASS/WARN/FAIL |
| /dashboard | [thumbnail] | N.N/10 | N | PASS/WARN/FAIL |
| /settings | [thumbnail] | N.N/10 | N | PASS/WARN/FAIL |
**Gallery**: Open `verification-output/{timestamp}/gallery.html` for full screenshots with AI evaluations.
### Agentation Annotations (if applicable)
| Annotation | Route | Resolution | Before/After |
|-----------|-------|------------|--------------|
| [user comment] | /dashboard | [fix summary] | [see gallery] |
---
## Evidence Artifacts
| Artifact | Location | Generated |
|----------|----------|-----------|
| Test Results | `/tmp/test_results.log` | [timestamp] |
| Coverage Report | `/tmp/coverage.json` | [timestamp] |
| Security Scan | `/tmp/security_audit.json` | [timestamp] |
| Lint Report | `/tmp/lint_results.log` | [timestamp] |
| Visual Gallery | `verification-output/{timestamp}/gallery.html` | [timestamp] |
| Screenshots | `verification-output/{timestamp}/screenshots/` | [timestamp] |
| AI Evaluations | `verification-output/{timestamp}/ai-evaluations/` | [timestamp] |
---
## Verification Metadata
- **Agents Used**: 7 (code-quality-reviewer, security-auditor, test-generator, backend-system-architect, frontend-ui-developer, python-performance-engineer, visual-capture)
- **Parallel Execution**: Yes
- **Total Tool Calls**: ~N
- **Context Usage**: ~N tokens
Status Definitions
| Status | Emoji | Meaning | Action Required |
|---|---|---|---|
| READY FOR MERGE | Green | All checks pass, no blockers | Approve PR |
| NEEDS ATTENTION | Yellow | Minor issues found | Review suggestions, optionally fix |
| BLOCKED | Red | Critical issues found | Must fix before merge |

Severity Levels
| Level | Threshold | Action | Blocks Merge |
|---|---|---|---|
| Critical | Any | Fix immediately | YES |
| High | > 5 | Fix before merge | YES |
| Medium | > 20 | Should fix | NO (with justification) |
| Low | > 50 | Nice to have | NO |
| Info | N/A | Informational | NO |
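The severity thresholds reduce to a single merge-blocking predicate. A minimal sketch, assuming per-severity counts from the security scan:

```python
def blocks_merge(counts: dict) -> bool:
    """Apply the severity thresholds above: any critical blocks,
    more than 5 high blocks; medium/low/info never block on their own."""
    if counts.get("critical", 0) > 0:
        return True
    if counts.get("high", 0) > 5:
        return True
    return False
```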
Agent Output JSON Schemas
code-quality-reviewer Output
{
"linting": {"tool": "ruff|biome", "exit_code": 0, "errors": 0, "warnings": 0},
"type_check": {"tool": "ty|tsc", "exit_code": 0, "errors": 0},
"patterns": {"violations": [], "compliance": "PASS|FAIL"},
"approval": {"status": "APPROVED|NEEDS_FIXES", "blockers": []}
}
security-auditor Output
{
"scan_summary": {"files_scanned": 100, "vulnerabilities_found": 0},
"critical": [],
"high": [],
"secrets_detected": [],
"recommendations": [],
"approval": {"status": "PASS|BLOCK", "blockers": []}
}
test-generator Output
{
"coverage": {"current": 85, "target": 70, "passed": true},
"test_summary": {"total": 100, "passed": 98, "failed": 2, "skipped": 0},
"gaps": ["file:line - reason"],
"quality_issues": [],
"approval": {"status": "PASS|FAIL", "blockers": []}
}
backend-system-architect Output
{
"api_compliance": {"rest_conventions": true, "issues": []},
"validation": {"pydantic_v2": true, "issues": []},
"error_handling": {"rfc9457": true, "issues": []},
"async_safety": {"timeouts": true, "issues": []},
"approval": {"status": "PASS|FAIL", "blockers": []}
}
frontend-ui-developer Output
{
"react_19": {"apis_used": ["useOptimistic"], "missing": [], "compliant": true},
"zod_validation": {"validated_endpoints": 10, "unvalidated": []},
"type_safety": {"exhaustive_switches": true, "any_types": 0},
"ux_patterns": {"skeletons": true, "prefetching": true},
"accessibility": {"wcag_issues": []},
"approval": {"status": "PASS|FAIL", "blockers": []}
}
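Every schema above ends with a shared `approval` object, so a coordinator can aggregate verdicts uniformly. A minimal sketch, assuming each agent's JSON output has already been parsed into a dict:

```python
def aggregate_approvals(agent_outputs: dict) -> dict:
    """Collect blockers across agents; any non-approving status blocks the merge."""
    approving = {"APPROVED", "PASS"}
    blockers = []
    for agent, output in agent_outputs.items():
        approval = output.get("approval", {})
        if approval.get("status") not in approving:
            blockers.extend(f"{agent}: {b}" for b in approval.get("blockers", ["unspecified"]))
    return {"status": "BLOCKED" if blockers else "READY", "blockers": blockers}
```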
Verification Checklist
Pre-flight checklist for comprehensive feature verification with parallel agents.
Pre-Verification Setup
Context Gathering
- Run `git diff main --stat` to understand change scope
- Run `git log main..HEAD --oneline` to see commit history
- Identify affected domains (backend/frontend/both)
- Check for any existing failing tests
Task Creation (CC 2.1.16)
- Create parent verification task
- Create subtasks for each agent domain
- Set proper dependencies if needed
Agent Dispatch Checklist
Required Agents (Full-Stack)
| Agent | Launched | Completed | Status |
|---|---|---|---|
| code-quality-reviewer | [ ] | [ ] | Pending |
| security-auditor | [ ] | [ ] | Pending |
| test-generator | [ ] | [ ] | Pending |
| backend-system-architect | [ ] | [ ] | Pending |
| frontend-ui-developer | [ ] | [ ] | Pending |
Optional Agents (Add as Needed)
| Condition | Agent | Launched |
|---|---|---|
| AI/ML features | llm-integrator | [ ] |
| Performance-critical | frontend-performance-engineer | [ ] |
| Database changes | database-engineer | [ ] |
Quality Gate Checklist
Mandatory Gates
| Gate | Threshold | Actual | Pass |
|---|---|---|---|
| Test Coverage | >= 70% | ___% | [ ] |
| Security Critical | 0 | ___ | [ ] |
| Security High | <= 5 | ___ | [ ] |
| Type Errors | 0 | ___ | [ ] |
| Lint Errors | 0 | ___ | [ ] |
Code Quality Gates
| Check | Status |
|---|---|
| No console.log in production | [ ] |
| No `any` types | [ ] |
| Exhaustive switches (assertNever) | [ ] |
| Proper error handling | [ ] |
| No hardcoded secrets | [ ] |
Frontend-Specific Gates (if applicable)
| Check | Status |
|---|---|
| React 19 APIs used | [ ] |
| Zod validation on API responses | [ ] |
| Skeleton loading states | [ ] |
| Prefetching on links | [ ] |
| WCAG 2.1 AA compliance | [ ] |
Backend-Specific Gates (if applicable)
| Check | Status |
|---|---|
| REST conventions followed | [ ] |
| Pydantic v2 validation | [ ] |
| RFC 9457 error handling | [ ] |
| Async timeout protection | [ ] |
| No N+1 queries | [ ] |
Evidence Collection
Required Evidence
- Test results with exit code
- Coverage report (JSON format)
- Linting results
- Type checking results
- Security scan results
Optional Evidence
- E2E test screenshots
- Performance benchmarks
- Bundle size analysis
- Accessibility audit
Report Generation
Report Sections
- Summary (READY/NEEDS ATTENTION/BLOCKED)
- Agent Results (all 5 domains)
- Quality Gates table
- Blockers list (if any)
- Suggestions list
- Evidence links
Final Steps
- Update all task statuses to completed
- Store verification evidence in context
- Generate final report markdown
Quick Reference: Agent Prompts
code-quality-reviewer
Focus: Lint, type check, anti-patterns, SOLID, complexity
security-auditor
Focus: Dependency audit, secrets, OWASP Top 10, rate limiting
test-generator
Focus: Coverage gaps, test quality, edge cases, flaky tests
backend-system-architect
Focus: REST, Pydantic v2, RFC 9457, async safety, N+1
frontend-ui-developer
Focus: React 19, Zod, exhaustive types, skeletons, prefetch, a11y
Troubleshooting
Agent Not Responding
- Check if agent was launched with `run_in_background=True`
- Verify agent name matches exactly
- Check for context window limits
Tests Failing
- Run tests locally first
- Check for missing dependencies
- Verify test database state
- Look for timing-dependent tests
Coverage Below Threshold
- Identify uncovered files
- Check for excluded patterns
- Focus on critical paths first
Verification Phases
Verification Phases — Detailed Workflow
Phase Overview
| Phase | Activities | Output |
|---|---|---|
| 1. Context Gathering | Git diff, commit history | Changes summary |
| 2. Parallel Agent Dispatch | 6 agents evaluate | 0-10 scores |
| 2.5 Visual Capture | Screenshot routes, AI vision eval | Gallery + visual score |
| 3. Test Execution | Backend + frontend tests | Coverage data |
| 4. Nuanced Grading | Composite score calculation | Grade (A-F) |
| 5. Improvement Suggestions | Effort vs impact analysis | Prioritized list |
| 6. Alternative Comparison | Compare approaches (optional) | Recommendation |
| 7. Metrics Tracking | Trend analysis | Historical data |
| 8. Report Compilation | Evidence artifacts + gallery.html | Final report |
| 8.5 Agentation Loop | User annotates, ui-feedback fixes | Before/after diffs |
Phase 2: Parallel Agent Dispatch (6 Agents)
Launch ALL agents in ONE message with run_in_background=True and max_turns=25. Pass model=MODEL_OVERRIDE when user specifies --model=opus (CC 2.1.72).
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Lint, types, patterns | Quality 0-10 |
| security-auditor | OWASP, secrets, CVEs | Security 0-10 |
| test-generator | Coverage, test quality | Coverage 0-10 |
| backend-system-architect | API design, async | API 0-10 |
| frontend-ui-developer | React 19, Zod, a11y | UI 0-10 |
| python-performance-engineer | Latency, resources, scaling | Performance 0-10 |
Use python-performance-engineer for backend-focused verification or frontend-performance-engineer for frontend-focused verification. See Quality Model for Performance (0.11) and Scalability (0.09) weights.
See Grading Rubric for detailed scoring criteria.
Task Tool Mode (Default)
# PARALLEL — All 6 in ONE message
Agent(
subagent_type="code-quality-reviewer",
model=MODEL_OVERRIDE, # None inherits default; "opus" for thorough verification (CC 2.1.72)
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify code quality. Score 0-10.
Check: lint errors, type coverage, cyclomatic complexity, DRY, SOLID.
Budget: 15 tool calls max.
Return: score (0-10), reasoning, evidence, 2-3 improvement suggestions.
Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
run_in_background=True, max_turns=25
)
Agent(
subagent_type="security-auditor",
model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Security verification. Score 0-10.
Check: OWASP Top 10, secrets in code, dependency CVEs, auth patterns.
Budget: 15 tool calls max.
Return: score (0-10), vulnerabilities found, severity ratings.
Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
run_in_background=True, max_turns=25
)
Agent(
subagent_type="test-generator",
model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify test coverage. Score 0-10.
Check: test existence, type matching, quality, edge cases, coverage %.
Run existing tests and report results.
Budget: 15 tool calls max.
Return: score (0-10), coverage %, gaps identified.
Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
run_in_background=True, max_turns=25
)
Agent(
subagent_type="backend-system-architect",
model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify API design and backend patterns. Score 0-10.
Check: REST conventions, async patterns, transaction boundaries, error handling.
Budget: 15 tool calls max.
Return: score (0-10), pattern compliance, issues found.
Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
run_in_background=True, max_turns=25
)
Agent(
subagent_type="frontend-ui-developer",
model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify frontend implementation. Score 0-10.
Check: React 19 patterns, Zod schemas, accessibility (WCAG 2.1 AA), loading states.
Budget: 15 tool calls max.
Return: score (0-10), pattern compliance, a11y issues.
Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
run_in_background=True, max_turns=25
)
Agent(
subagent_type="python-performance-engineer",
model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify performance and scalability. Score 0-10.
Check: latency hotspots, N+1 queries, resource usage, caching, scaling patterns.
Budget: 15 tool calls max.
Return: score (0-10), bottlenecks found, optimization suggestions.
Feature: {feature}. Scope: ONLY review files in {scope_files}.""",
run_in_background=True, max_turns=25
)
Agent Teams Alternative
In Agent Teams mode, form a verification team where agents share findings and coordinate scoring:
TeamCreate(team_name="verify-{feature}", description="Verify {feature}")
Agent(subagent_type="code-quality-reviewer", name="quality-verifier",
team_name="verify-{feature}", model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify code quality. Score 0-10.
When you find patterns that affect security, message security-verifier.
When you find untested code paths, message test-verifier.
Share your quality score with all teammates for composite calculation.
Feature: {feature}.""")
Agent(subagent_type="security-auditor", name="security-verifier",
team_name="verify-{feature}", model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Security verification. Score 0-10.
When quality-verifier flags security-relevant patterns, investigate deeper.
When you find vulnerabilities in API endpoints, message api-verifier.
Share severity findings with test-verifier for test gap analysis.
Feature: {feature}.""")
Agent(subagent_type="test-generator", name="test-verifier",
team_name="verify-{feature}", model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify test coverage. Score 0-10.
When quality-verifier or security-verifier flag untested paths, quantify the gap.
Run existing tests and report coverage metrics.
Message the lead with coverage data for composite scoring.
Feature: {feature}.""")
Agent(subagent_type="backend-system-architect", name="api-verifier",
team_name="verify-{feature}", model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify API design and backend patterns. Score 0-10.
When security-verifier flags endpoint issues, validate and score.
Share API compliance findings with ui-verifier for consistency check.
Feature: {feature}.""")
Agent(subagent_type="frontend-ui-developer", name="ui-verifier",
team_name="verify-{feature}", model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify frontend implementation. Score 0-10.
When api-verifier shares API patterns, verify frontend matches.
Check React 19 patterns, accessibility, and loading states.
Share findings with quality-verifier for overall assessment.
Feature: {feature}.""")
# Conditional 6th agent — use python-performance-engineer for backend,
# frontend-performance-engineer for frontend
Agent(subagent_type="python-performance-engineer", name="perf-verifier",
team_name="verify-{feature}", model=MODEL_OVERRIDE,
prompt="""# Cache-optimized: stable content first (CC 2.1.72)
Verify performance and scalability. Score 0-10.
Assess latency, resource usage, caching, and scaling patterns.
When security-verifier flags resource-intensive endpoints, profile them.
Share performance findings with api-verifier and quality-verifier.
Feature: {feature}.""")
Team teardown after report compilation:
# After composite grading and report generation
SendMessage(type="shutdown_request", recipient="quality-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="security-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="test-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="api-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="ui-verifier", content="Verification complete")
SendMessage(type="shutdown_request", recipient="perf-verifier", content="Verification complete")
TeamDelete()
# Worktree cleanup (CC 2.1.72)
ExitWorktree(action="keep")
Fallback: If team formation fails, use standard Phase 2 Task spawns above.
Manual cleanup: If `TeamDelete()` doesn't terminate all agents, press `Ctrl+F` twice to force-kill remaining background agents.
Phase 2.5: Visual Capture (Parallel with Phase 2)
Runs as a 7th parallel agent alongside the 6 verification agents. See Visual Capture for full details.
# Launch IN THE SAME MESSAGE as Phase 2 agents
Agent(
subagent_type="general-purpose",
description="Visual capture and AI evaluation",
prompt="""Visual verification capture for: {feature}
1. Detect project type from package.json
2. Start dev server (auto-detect framework)
3. Discover routes (framework-aware scan)
4. Use agent-browser to screenshot each route (max 20)
5. Read each screenshot PNG for AI vision evaluation
6. Score layout, accessibility, content completeness (0-10 per route)
7. Read gallery template from ${CLAUDE_SKILL_DIR}/assets/gallery-template.html
8. Generate gallery.html with base64-embedded screenshots
9. Write to verification-output/{timestamp}/gallery.html
10. Kill dev server
If no frontend detected, write skip notice and exit.
If server fails to start, write warning and exit.
Never block — graceful degradation only.""",
run_in_background=True, max_turns=30
)
Output: `verification-output/{timestamp}/` folder with screenshots, AI evaluations (JSON), and gallery.html.
Phase 8.5: Agentation Visual Feedback (Opt-In)
Trigger: Only when agentation MCP is configured in .mcp.json. Runs AFTER Phase 8 report compilation.
# Check agentation availability
ToolSearch(query="select:mcp__agentation__agentation_get_all_pending")
# If available, offer user choice
AskUserQuestion(questions=[{
"question": "Agentation detected. Annotate the live UI before finalizing?",
"header": "Visual Feedback Loop",
"options": [
{"label": "Yes", "description": "I'll mark issues, ui-feedback agent fixes them, gallery updates with before/after"},
{"label": "Skip", "description": "Finalize with current screenshots"}
]
}])
# If yes: watch → acknowledge → dispatch ui-feedback → re-screenshot → update gallery
# Max 3 rounds (configurable in verification-config.yaml)
Phase 4: Nuanced Grading
See Quality Model for scoring dimensions, weights, and grade interpretation. See Grading Rubric for detailed per-agent scoring criteria.
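The composite calculation itself is a weighted average over dimension scores. A minimal sketch — the weights and grade bands below are illustrative, not the canonical values; those live in the Quality Model:

```python
def composite_score(scores: dict, weights: dict) -> tuple:
    """Weighted average of dimension scores (0-10) plus a letter grade.
    Weights and grade bands are illustrative placeholders."""
    total = sum(scores[d] * w for d, w in weights.items())
    score = round(total / sum(weights.values()), 1)
    for threshold, grade in [(9.0, "A"), (8.0, "B"), (7.0, "C"), (6.0, "D")]:
        if score >= threshold:
            return score, grade
    return score, "F"
```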
Phase 5: Improvement Suggestions
Each suggestion includes effort (1-5) and impact (1-5) with priority = impact/effort. See Quality Model for scale definitions and quick wins formula.
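The priority formula can be sketched directly from the definition (the quick-win cutoffs below are illustrative; see the Quality Model for the canonical formula):

```python
def prioritize(suggestions: list) -> list:
    """Sort suggestions by priority = impact / effort, flagging quick wins
    (illustratively: low effort <= 2, high impact >= 4)."""
    for s in suggestions:
        s["priority"] = round(s["impact"] / s["effort"], 1)
        s["quick_win"] = s["effort"] <= 2 and s["impact"] >= 4
    return sorted(suggestions, key=lambda s: s["priority"], reverse=True)
```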
Phase 6: Alternative Comparison (Optional)
See Alternative Comparison for template.
Use when:
- Multiple valid approaches exist
- User asked "is this the best way?"
- Major architectural decisions made
Phase 8: Report Compilation
See Report Template for full format.
# Feature Verification Report
**Composite Score: [N.N]/10** (Grade: [LETTER])
## Top Improvement Suggestions
| # | Suggestion | Effort | Impact | Priority |
|---|------------|--------|--------|----------|
| 1 | [highest] | [N] | [N] | [N.N] |
## Verdict
**[READY FOR MERGE | IMPROVEMENTS RECOMMENDED | BLOCKED]**
Visual Capture
Visual Capture — Phase 2.5
Visual verification that produces browsable screenshot evidence with AI evaluation.
Architecture
Phase 2 agents (parallel)
|
Phase 2.5 (runs IN PARALLEL with Phase 2 agents)
|
v
┌─────────────────────────────────────────────────┐
│ 1. Detect project type (package.json scan) │
│ 2. Start dev server (framework-aware) │
│ 3. Wait for server ready (poll localhost) │
│ 4. Discover routes (framework-aware) │
│ 5. agent-browser: navigate + screenshot each │
│ 6. Claude vision: evaluate each screenshot │
│ 7. Generate gallery.html (self-contained) │
│ 8. Stop dev server │
└─────────────────────────────────────────────────┘
Step 1: Project Type Detection
Scan codebase to determine framework and dev server command:
# PARALLEL — detect framework signals
Grep(pattern="\"next\":", glob="package.json", output_mode="content")
Grep(pattern="\"vite\":", glob="package.json", output_mode="content")
Grep(pattern="\"react-scripts\":", glob="package.json", output_mode="content")
Grep(pattern="\"vue\":", glob="package.json", output_mode="content")
Grep(pattern="\"nuxt\":", glob="package.json", output_mode="content")
Grep(pattern="\"@angular/core\":", glob="package.json", output_mode="content")
Glob(pattern="**/manage.py")
Glob(pattern="**/main.py")
Glob(pattern="**/app.py")
Glob(pattern="**/index.html")
Detection Matrix
| Signal | Framework | Start Command | Default Port |
|---|---|---|---|
| `"next":` in package.json | Next.js | `npm run dev` | 3000 |
| `"vite":` in package.json | Vite | `npm run dev` | 5173 |
| `"react-scripts":` | CRA | `npm start` | 3000 |
| `"vue":` + no vite | Vue CLI | `npm run serve` | 8080 |
| `"nuxt":` | Nuxt | `npm run dev` | 3000 |
| `"@angular/core":` | Angular | `npx ng serve` | 4200 |
| `manage.py` exists | Django | `python manage.py runserver` | 8000 |
| `main.py`/`app.py` + FastAPI | FastAPI | `uvicorn app:app` | 8000 |
| `index.html` only | Static | `npx serve .` | 3000 |
| None of the above | Skip visual capture | N/A | N/A |
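The package.json half of the matrix can be expressed as an ordered lookup — order matters because a project can match several signals (a Nuxt project also depends on vue, for example). A sketch, with commands and ports mirroring the table; the file-based signals (Django, FastAPI, static) are detected via the Glob scans above and are out of scope here:

```python
# Ordered: first matching signal wins (Next/Vite/Nuxt before the generic "vue")
DETECTION_MATRIX = [
    ('"next":', ("Next.js", "npm run dev", 3000)),
    ('"vite":', ("Vite", "npm run dev", 5173)),
    ('"react-scripts":', ("CRA", "npm start", 3000)),
    ('"nuxt":', ("Nuxt", "npm run dev", 3000)),
    ('"@angular/core":', ("Angular", "npx ng serve", 4200)),
    ('"vue":', ("Vue CLI", "npm run serve", 8080)),  # only reached if vite absent
]

def detect_framework(package_json_text: str):
    """Return (framework, start_command, default_port), or None to skip visual capture."""
    for signal, config in DETECTION_MATRIX:
        if signal in package_json_text:
            return config
    return None
```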
Override via Config
If .claude/verification-config.yaml exists with a visual section, use those settings instead of auto-detection.
Step 2: Start Dev Server
Bash(
command=f"{start_command} &",
description="Start dev server for visual capture",
run_in_background=True
)
Wait for server readiness:
Bash(command=f"for i in $(seq 1 30); do curl -s http://localhost:{port} > /dev/null && exit 0; sleep 1; done; exit 1",
description="Wait for dev server to be ready (max 30s)")
If server fails to start: Skip visual capture with a warning in the report. Do NOT block verification.
Step 3: Route Discovery
Next.js App Router
Glob(pattern="**/app/**/page.{tsx,jsx,ts,js}")
# Extract route from file path: app/dashboard/page.tsx → /dashboard
Next.js Pages Router
Glob(pattern="**/pages/**/*.{tsx,jsx,ts,js}")
# Exclude _app, _document, _error, api/
# Extract route: pages/about.tsx → /about
React Router
Grep(pattern="<Route.*path=[\"']([^\"']+)", glob="**/*.{tsx,jsx}", output_mode="content")
FastAPI / Express
Grep(pattern="@(app|router)\\.(get|post)\\([\"'](/[^\"']*)", glob="**/*.py", output_mode="content")
Grep(pattern="(app|router)\\.(get|post)\\([\"'](/[^\"']*)", glob="**/*.{ts,js}", output_mode="content")
Fallback
If no routes discovered, screenshot just the root URL: `http://localhost:{port}/`
Max Routes
Cap at 20 routes to keep gallery manageable and generation fast. Prioritize:
- Root `/`
- Routes matching changed files (from Phase 1 git diff)
- Routes with most sub-routes (likely important sections)
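The path-to-route mapping for the App Router case can be sketched as follows (a simplification: route groups, dynamic segments, and parallel routes are not handled):

```python
import re

def app_route_from_path(file_path: str) -> str:
    """Map a Next.js App Router page file to its URL route,
    e.g. app/dashboard/page.tsx -> /dashboard."""
    match = re.search(r"app/(.*?)page\.(tsx|jsx|ts|js)$", file_path)
    if not match:
        return "/"
    segment = match.group(1).strip("/")
    return "/" + segment if segment else "/"
```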
Step 4: Screenshot Capture
Use agent-browser to navigate and screenshot each route:
# For each route:
# 1. Navigate
agent-browser navigate http://localhost:{port}{route_path}
# 2. Wait for content
agent-browser wait-for-network-idle
# 3. Capture
agent-browser screenshot --full-page --path verification-output/{timestamp}/screenshots/{idx}-{slug}.png
Auth-Protected Routes
If verification-config.yaml specifies auth:
# Login first
agent-browser navigate http://localhost:{port}/login
agent-browser fill "#email" "test@example.com"
agent-browser fill "#password" "test123"
agent-browser click "button[type=submit]"
agent-browser wait-for-navigation
# Then screenshot protected routes
Viewport Options
Default: 1280x720. If mobile: true in config, also capture at 375x812.
Step 5: AI Vision Evaluation
For each screenshot, use Claude's vision (Read tool on PNG) with a structured evaluation prompt:
Read(file_path=f"verification-output/{timestamp}/screenshots/(unknown)")
Then evaluate using this prompt template (include it in the visual capture agent's instructions):
Evaluate this screenshot of route "{route_path}" against these 6 criteria.
For EACH criterion, provide a severity (ok/warning/error) and specific observation.
Do NOT use generic "looks good" — cite what you actually see.
1. LAYOUT: Overflow, alignment, spacing, responsive grid. Check: content cut off? Overlapping elements? Scroll needed?
2. NAVIGATION: Is nav present and functional? Sidebar, breadcrumbs, TOC visible? Active state correct?
3. CONTENT: Text readable? Headings hierarchical? Data populated (not placeholder/loading)? Counts/numbers accurate?
4. ACCESSIBILITY: Contrast sufficient? Focus indicators visible? Text size adequate? Color-only information?
5. INTERACTIVITY: Buttons/links styled consistently? Hover/focus states? Forms labeled? CTAs discoverable?
6. BRANDING: Consistent with site theme? Dark/light mode correct? Typography matches design system?
Output as JSON array — exactly 6 items, one per criterion:
[{"severity": "ok|warning|error", "message": "CRITERION: specific observation with evidence"}]
Score 0-10 based on: 0 errors=9+, 1-2 warnings=7-8, 1 error=5-6, multiple errors=<5.
Per-route evaluation output (6+ items, never a single line):
{
"route": "/dashboard",
"score": 7.5,
"evaluation": [
{"severity": "ok", "message": "LAYOUT: Content within viewport, no horizontal overflow, grid columns align properly"},
{"severity": "ok", "message": "NAVIGATION: Sidebar present with 8 sections, 'Dashboard' correctly highlighted as active"},
{"severity": "warning", "message": "CONTENT: Stats show '79 skills' but should be '89 skills' — stale count detected"},
{"severity": "ok", "message": "ACCESSIBILITY: Body text ~16px on dark bg (#e6edf3 on #0d1117), contrast ratio ~13:1, passes WCAG AAA"},
{"severity": "warning", "message": "INTERACTIVITY: Code block copy buttons present but no visible hover state change"},
{"severity": "ok", "message": "BRANDING: Dark theme consistent, green accent (#3fb950) used for active states, monospace for code"}
]
}
Cross-Route Summary
After evaluating all routes, synthesize a summary object for the gallery:
# Build summary from all per-route evaluations
summary = {
"total_routes": len(routes),
"avg_score": round(sum(r.score for r in routes) / len(routes), 1),
"pass_count": len([r for r in routes if r.score >= 7]),
"warn_count": len([r for r in routes if 5 <= r.score < 7]),
"fail_count": len([r for r in routes if r.score < 5]),
"common_issues": [ # Issues appearing on 2+ routes
{"count": 3, "message": "Stale skill count (79 instead of 89) on 3/5 pages"},
{"count": 2, "message": "Code block copy buttons lack hover state feedback"}
],
"strengths": [ # Positive patterns across routes
"Consistent dark theme and typography across all pages",
"Sidebar navigation present and correctly highlights active page"
]
}
Include this summary in GALLERY_JSON alongside routes.
Step 6: Gallery Generation
Read the gallery template:
Read(file_path="${CLAUDE_SKILL_DIR}/assets/gallery-template.html")
Build the GALLERY_JSON data structure:
{
"branch": "feat/new-feature",
"date": "2026-03-10",
"timestamp": "2026-03-10T14:30:00Z",
"compositeScore": 8.2,
"visualScore": 7.8,
"routes": [
{
"id": "homepage",
"name": "Homepage",
"path": "/",
"screenshot": "data:image/png;base64,...",
"score": 8.5,
"evaluation": [
{"severity": "ok", "message": "Layout consistent"},
{"severity": "warning", "message": "Hero image loading slowly"}
],
"annotations": [],
"apiResponse": null
}
]
}
Base64 encoding: Convert each PNG to base64 for self-contained HTML:
base64 -i screenshots/01-homepage.png
Size guard: If total HTML > 10MB, use maxDiffPixelRatio compression or reduce to top 10 routes.
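A sketch of the embedding step with the size guard applied (the 10MB cap and top-10 fallback mirror the guard above; assumes screenshot paths are pre-sorted by route priority):

```python
import base64
from pathlib import Path

MAX_HTML_BYTES = 10 * 1024 * 1024  # 10MB size guard

def embed_screenshots(png_paths: list) -> list:
    """Return data: URIs for each PNG; drop to the top 10 routes if the
    combined payload would exceed the size guard."""
    def encode(paths):
        return [
            "data:image/png;base64," + base64.b64encode(Path(p).read_bytes()).decode("ascii")
            for p in paths
        ]
    uris = encode(png_paths)
    if sum(len(u) for u in uris) > MAX_HTML_BYTES:
        uris = encode(png_paths[:10])  # assumes priority order from route discovery
    return uris
```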
Write the final gallery:
Write(file_path=f"verification-output/{timestamp}/gallery.html", content=rendered_html)
Step 7: Cleanup
# Kill dev server
Bash(command="kill $(lsof -ti :PORT) 2>/dev/null || true", description="Stop dev server")
Phase 8.5: Agentation Loop (Opt-In)
Trigger: Only when agentation MCP is configured in .mcp.json.
# Check if agentation is available
ToolSearch(query="select:mcp__agentation__agentation_get_all_pending")
If available, offer the user:
AskUserQuestion(questions=[{
"question": "Agentation is configured. Want to annotate the UI before finalizing?",
"header": "Visual Feedback Loop",
"options": [
{"label": "Yes, let me annotate", "description": "I'll mark issues on the live UI, then ui-feedback agent fixes them"},
{"label": "Skip", "description": "Finalize gallery with current screenshots"}
]
}])
If yes:
# 1. Watch for annotations
mcp__agentation__agentation_get_all_pending()
# 2. For each annotation:
mcp__agentation__agentation_acknowledge(annotationId=id)
# 3. Dispatch ui-feedback agent
Agent(subagent_type="ork:ui-feedback",
prompt="Process agentation annotation: {annotation}. Fix the issue, then resolve.",
run_in_background=True)
# 4. After fixes, re-screenshot affected routes
# 5. Save before/after pairs
# 6. Update gallery with annotation diffs
Max Rounds
Default 3 rounds of annotate-fix-verify. Configurable in verification-config.yaml.
Graceful Degradation
| Failure | Behavior |
|---|---|
| No frontend detected | Skip visual capture, log info in report |
| Dev server won't start | Skip visual capture with warning |
| agent-browser unavailable | Skip screenshots, try curl for API-only |
| Screenshot fails on a route | Skip that route, continue with others |
| Base64 output too large | Compress or reduce route count |
| Agentation not configured | Skip Layer 2 entirely (no prompt) |
| Auth flow fails | Skip protected routes, screenshot public only |
Verification Checklist
Quick checklist for comprehensive feature verification.
Grading Complete
- All 5 dimensions rated (0-10 scale)
- Weights applied correctly (20/25/20/20/15)
- Composite score calculated
- Grade letter assigned (A+ to F)
Evidence Collected
- Test results with exit codes
- Coverage report (JSON)
- Security scan results
- Lint/type check output
- Evidence files linked in report
Improvements Documented
- Each suggestion has effort estimate (1-5)
- Each suggestion has impact estimate (1-5)
- Priority calculated (Impact / Effort)
- Quick wins identified (low effort, high impact)
Alternatives Considered
- Current approach scored
- At least one alternative evaluated
- Migration cost estimated
- Recommendation documented
Policy Compliance
- No blocking rule violations
- Warning rules acknowledged
- Thresholds checked (composite, security, coverage)
Report Generated
- All sections filled
- Verdict assigned (Ready/Recommended/Blocked)
- Tasks updated to completed