Cover
Generate and run comprehensive test suites — unit tests, integration tests with real services (testcontainers/docker-compose), and Playwright E2E tests. Analyzes coverage gaps, spawns parallel test-generator agents per tier, runs tests, and heals failures (max 3 iterations). Use when generating tests for existing code, improving coverage after implementation, or creating a full test suite from scratch. Chains naturally after /ork:implement. Do NOT use for verifying/grading existing tests (use /ork:verify) or running tests without generation (use npm test directly).
Related Skills
- testing-unit
- testing-integration
- testing-e2e
- testing-perf
- testing-llm
- chain-patterns
- memory
- quality-gates
Cover — Test Suite Generator
Generate comprehensive test suites for existing code with real-service integration testing and automated failure healing.
Quick Start
/ork:cover authentication flow
/ork:cover --model=opus payment processing
/ork:cover --tier=unit,integration user service
/ork:cover --real-services checkout pipeline

Argument Resolution
SCOPE = "$ARGUMENTS" # e.g., "authentication flow"
# Flag parsing
MODEL_OVERRIDE = None
TIERS = ["unit", "integration", "e2e"] # default: all three
REAL_SERVICES = False
for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]
        SCOPE = SCOPE.replace(token, "").strip()
    elif token.startswith("--tier="):
        TIERS = token.split("=", 1)[1].split(",")
        SCOPE = SCOPE.replace(token, "").strip()
    elif token == "--real-services":
        REAL_SERVICES = True
        SCOPE = SCOPE.replace(token, "").strip()

Step -1: MCP Probe + Resume Check
# Probe MCPs (parallel):
ToolSearch(query="select:mcp__memory__search_nodes")
ToolSearch(query="select:mcp__context7__resolve-library-id")
Write(".claude/chain/capabilities.json", {
"memory": <true if found>,
"context7": <true if found>,
"skill": "cover",
"timestamp": now()
})
# Resume check:
Read(".claude/chain/state.json")
# If exists and skill == "cover": resume from current_phase
# Otherwise: initialize state

Step 0: Scope & Tier Selection
AskUserQuestion(
questions=[
{
"question": "What test tiers should I generate?",
"header": "Test Tiers",
"options": [
{"label": "Full coverage (Recommended)", "description": "Unit + Integration (real services) + E2E", "markdown": "```\nFull Coverage\n─────────────\n Unit Integration E2E\n ┌─────────┐ ┌─────────────┐ ┌──────────┐\n │ AAA │ │ Real DB │ │Playwright│\n │ Mocks │ │ Real APIs │ │Page obj │\n │ Factory │ │ Testcontain │ │A11y │\n └─────────┘ └─────────────┘ └──────────┘\n 3 parallel test-generator agents\n```"},
{"label": "Unit + Integration", "description": "Skip E2E, focus on logic and service boundaries", "markdown": "```\nUnit + Integration\n──────────────────\n Unit tests for business logic\n Integration tests at API boundaries\n Real services if docker-compose found\n Skip: browser automation\n```"},
{"label": "Unit only", "description": "Fast isolated tests for business logic", "markdown": "```\nUnit Only (~2 min)\n──────────────────\n AAA pattern tests\n MSW/VCR mocking\n Factory-based data\n Coverage gap analysis\n Skip: real services, browser\n```"},
{"label": "Integration only", "description": "API boundary and real-service tests", "markdown": "```\nIntegration Only\n────────────────\n API endpoint tests (Supertest/httpx)\n Database tests (real or in-memory)\n Contract tests (Pact)\n Testcontainers if available\n```"},
{"label": "E2E only", "description": "Playwright browser tests", "markdown": "```\nE2E Only\n────────\n Playwright page objects\n User flow tests\n Visual regression\n Accessibility (axe-core)\n```"}
],
"multiSelect": false
},
{
"question": "Healing strategy for failing tests?",
"header": "Failure Handling",
"options": [
{"label": "Auto-heal (Recommended)", "description": "Fix failing tests up to 3 iterations"},
{"label": "Generate only", "description": "Write tests, report failures, don't fix"},
{"label": "Strict", "description": "All tests must pass or abort"}
],
"multiSelect": false
}
]
)

Override TIERS based on selection. Skip this step if the --tier= flag was provided.
Task Management (MANDATORY)
TaskCreate(
subject=f"Cover: {SCOPE}",
description="Generate comprehensive test suite with real-service testing",
activeForm=f"Generating tests for {SCOPE}"
)
# Subtasks per phase
TaskCreate(subject="Discover scope and detect frameworks", activeForm="Discovering test scope")
TaskCreate(subject="Analyze coverage gaps", activeForm="Analyzing coverage gaps")
TaskCreate(subject="Generate tests (parallel per tier)", activeForm="Generating tests")
TaskCreate(subject="Execute generated tests", activeForm="Running tests")
TaskCreate(subject="Heal failing tests", activeForm="Healing test failures")
TaskCreate(subject="Generate coverage report", activeForm="Generating report")

6-Phase Workflow
| Phase | Activities | Output |
|---|---|---|
| 1. Discovery | Detect frameworks, scan scope, find untested code | Framework map, file list |
| 2. Coverage Analysis | Run existing tests, map gaps per tier | Coverage baseline, gap map |
| 3. Generation | Parallel test-generator agents per tier | Test files created |
| 4. Execution | Run all generated tests | Pass/fail results |
| 5. Heal | Fix failures, re-run (max 3 iterations) | Green test suite |
| 6. Report | Coverage delta, test count, summary | Coverage report |
Phase Handoffs
| After Phase | Handoff File | Key Outputs |
|---|---|---|
| 1. Discovery | 01-cover-discovery.json | Frameworks, scope files, tier plan |
| 2. Analysis | 02-cover-analysis.json | Baseline coverage, gap map |
| 3. Generation | 03-cover-generation.json | Files created, test count per tier |
| 5. Heal | 05-cover-healed.json | Final pass/fail, iterations used |
Phase 1: Discovery
Detect the project's test infrastructure and scope the work.
# PARALLEL — all in ONE message:
# 1. Framework detection (hook handles this, but also scan manually)
Grep(pattern="vitest|jest|mocha|playwright|cypress", glob="package.json", output_mode="content")
Grep(pattern="pytest|unittest|hypothesis", glob="pyproject.toml", output_mode="content")
Grep(pattern="pytest|unittest|hypothesis", glob="requirements*.txt", output_mode="content")
# 2. Real-service infrastructure
Glob(pattern="**/docker-compose*.yml")
Glob(pattern="**/testcontainers*")
Grep(pattern="testcontainers", glob="**/package.json", output_mode="content")
Grep(pattern="testcontainers", glob="**/requirements*.txt", output_mode="content")
# 3. Existing test structure
Glob(pattern="**/tests/**/*.test.*")
Glob(pattern="**/tests/**/*.spec.*")
Glob(pattern="**/__tests__/**/*")
Glob(pattern="**/test_*.py")
# 4. Scope files (what to test)
# If SCOPE specified, find matching source files
Grep(pattern=SCOPE, output_mode="files_with_matches")

Real-service decision:
- docker-compose*.yml found → integration tests use real services
- testcontainers in deps → use testcontainers for isolated service instances
- Neither found + --real-services flag → error: "No docker-compose or testcontainers found. Install testcontainers or remove --real-services flag."
- Neither found, no flag → integration tests use mocks (MSW/VCR)
Load real-service detection details: Read("${CLAUDE_SKILL_DIR}/references/real-service-detection.md")
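The framework-detection grep above can be post-processed with a small helper. This is an illustrative sketch; the priority order is an assumption:

```python
import json

# Priority order is an assumption: prefer vitest over jest when both appear.
KNOWN_FRAMEWORKS = ["vitest", "jest", "mocha", "playwright", "cypress"]

def detect_frameworks(package_json_text: str) -> list[str]:
    """Return test frameworks named in package.json dependencies, in priority order."""
    pkg = json.loads(package_json_text)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    return [fw for fw in KNOWN_FRAMEWORKS if any(fw in name for name in deps)]

sample = json.dumps({"devDependencies": {"vitest": "^2.0.0", "@playwright/test": "^1.48.0"}})
```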
Phase 2: Coverage Analysis
Run existing tests and identify gaps.
# Detect and run coverage command
# TypeScript: npx vitest run --coverage --reporter=json
# Python: pytest --cov=<scope> --cov-report=json
# Go: go test -coverprofile=coverage.out ./...
# Parse coverage output to identify:
# 1. Files with 0% coverage (priority targets)
# 2. Files below threshold (default 70%)
# 3. Uncovered functions/methods
# 4. Untested edge cases (error paths, boundary conditions)

Output the coverage baseline to the user immediately (progressive output).
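The gap analysis can be sketched against coverage.py's JSON report shape (a files → summary → percent_covered mapping); the threshold default and helper name are illustrative:

```python
import json

def find_gaps(coverage_json: str, threshold: float = 70.0) -> dict:
    """Split files from a coverage.py JSON report into untested and below-threshold."""
    report = json.loads(coverage_json)
    untested, below = [], []
    for path, data in report.get("files", {}).items():
        pct = data["summary"]["percent_covered"]
        if pct == 0:
            untested.append(path)        # priority targets
        elif pct < threshold:
            below.append((path, pct))    # needs more tests
    return {"untested": untested, "below_threshold": below}

sample = json.dumps({"files": {
    "src/auth.py":  {"summary": {"percent_covered": 0}},
    "src/cart.py":  {"summary": {"percent_covered": 45.0}},
    "src/utils.py": {"summary": {"percent_covered": 92.5}},
}})
```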
Phase 3: Generation (Parallel Agents)
Spawn test-generator agents per tier. Launch ALL in ONE message with run_in_background=true.
# Unit tests agent
if "unit" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate unit tests for: {SCOPE}
Coverage gaps: {gap_map.unit_gaps}
Framework: {detected_framework}
Existing tests: {existing_test_files}
Focus on:
- AAA pattern (Arrange-Act-Assert)
- Parametrized tests for multiple inputs
- MSW/VCR for HTTP mocking (never mock fetch directly)
- Factory-based test data (FactoryBoy/faker-js)
- Edge cases: empty input, errors, timeouts, boundary values
- Target: 90%+ business logic coverage""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

# Integration tests agent
if "integration" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate integration tests for: {SCOPE}
Coverage gaps: {gap_map.integration_gaps}
Framework: {detected_framework}
Real services available: {real_service_infra}
Focus on:
- API endpoint tests (Supertest/httpx)
- Database tests with {'real DB via testcontainers/docker-compose' if real_services else 'in-memory/mocked DB'}
- Contract tests (Pact) for service boundaries
- Zod/Pydantic schema validation at edges
- Fresh state per test (transaction rollback or cleanup)
- Target: all API endpoints and service boundaries""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

# E2E tests agent
if "e2e" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate E2E tests for: {SCOPE}
Framework: Playwright
Routes/pages: {discovered_routes}
Focus on:
- Semantic locators (getByRole > getByLabel > getByTestId)
- Page Object Model for complex pages
- User flow tests (happy path + error paths)
- Accessibility tests (axe-core WCAG 2.2 AA)
- Visual regression (toHaveScreenshot)
- No hardcoded waits (use auto-wait)""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

Output each agent's results as soon as it returns — don't wait for all agents. This lets users see generated tests incrementally.
Partial results (CC 2.1.76): If an agent is killed (timeout, context limit), its response is tagged [PARTIAL RESULT]. Include partial tests but flag them in Phase 4.
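The report-as-they-complete pattern can be sketched with concurrent.futures. This is a stand-in: the real agents run as background subagents, not threads, and the per-tier counts here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for a test-generator agent call.
def generate_tier(tier: str) -> dict:
    return {"tier": tier, "tests_written": {"unit": 23, "integration": 12, "e2e": 8}[tier]}

results = []
with ThreadPoolExecutor() as pool:
    futures = {pool.submit(generate_tier, t): t for t in ["unit", "integration", "e2e"]}
    for future in as_completed(futures):   # surface each tier as soon as it finishes
        results.append(future.result())
```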
Phase 4: Execution
Run all generated tests and collect results.
# Run test commands per tier (PARALLEL if independent):
# Unit: npx vitest run tests/unit/ OR pytest tests/unit/
# Integration: npx vitest run tests/integration/ OR pytest tests/integration/
# E2E: npx playwright test
# Collect: pass count, fail count, error details, coverage delta

Phase 5: Heal Loop
Fix failing tests iteratively. Max 3 iterations to prevent infinite loops.
for iteration in range(3):
    if all_tests_pass:
        break
    # For each failing test:
    # 1. Read the test file and the source code it tests
    # 2. Analyze the failure (assertion error? import error? timeout?)
    # 3. Fix the test (not the source code — tests only)
    # 4. Re-run the fixed tests

# Common fixes:
# - Wrong assertions (expected value mismatch)
# - Missing imports or setup
# - Stale selectors in E2E tests
# - Race conditions (add proper waits)
# - Mock configuration errors

Load heal strategy details: Read("${CLAUDE_SKILL_DIR}/references/heal-loop-strategy.md")
Boundary: heal fixes TESTS, not source code. If a test fails because the source code has a bug, report it — don't silently fix production code.
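A minimal sketch of that boundary inside the heal loop: tests flagged as exposing source bugs are carried straight to the report, never "fixed". The attempt_fix callable and the failure-dict shape are hypothetical:

```python
def heal_loop(failing, attempt_fix, max_iterations=3):
    """Run up to max_iterations of test-only fixes; source bugs are never healed.

    `failing` maps test name -> {"source_bug": bool}; `attempt_fix` returns
    True when a test-side fix succeeded.
    """
    iterations = 0
    for _ in range(max_iterations):
        healable = [t for t, info in failing.items() if not info["source_bug"]]
        if not healable:
            break  # everything left needs a production-code fix: report it
        iterations += 1
        for test in healable:
            if attempt_fix(test):
                del failing[test]
    return {"iterations": iterations, "unresolved": sorted(failing)}
```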
Phase 6: Report
Generate coverage report with before/after comparison.
Coverage Report: {SCOPE}
═══════════════════════════
Baseline → After
────────────────
Unit: 67.2% → 91.3% (+24.1%)
Integration: 42.0% → 78.5% (+36.5%)
E2E: 0.0% → 65.0% (+65.0%)
Overall: 48.4% → 82.1% (+33.7%)
Tests Generated
───────────────
Unit: 23 tests (18 pass, 5 healed)
Integration: 12 tests (10 pass, 2 healed)
E2E: 8 tests (8 pass)
Total: 43 tests
Heal Iterations: 2/3
Files Created
─────────────
tests/unit/services/test_auth.py
tests/unit/services/test_payment.py
tests/integration/api/test_users.py
tests/integration/api/test_checkout.py
tests/e2e/checkout.spec.ts
tests/e2e/pages/CheckoutPage.ts
Real Services Used: PostgreSQL (testcontainers), Redis (docker-compose)
Remaining Gaps
──────────────
- src/services/notification.ts (0% — no tests generated, out of scope)
- src/utils/crypto.ts (45% — edge cases not covered)
Next Steps
──────────
/ork:verify {SCOPE} # Grade the implementation + tests
/ork:commit            # Commit generated tests

Key Principles
- Tests only — never modify production source code, only generate test files
- Real services when available — prefer testcontainers/docker-compose over mocks for integration tests because mock/prod divergence causes silent failures in production
- Parallel generation — spawn one test-generator agent per tier in ONE message
- Heal, don't loop forever — max 3 iterations, then report remaining failures
- Progressive output — show results as each agent completes
- Factory over fixtures — use FactoryBoy/faker-js for test data, not hardcoded values
- Mock at network level — MSW/VCR, never mock fetch/axios directly
Related Skills
- ork:implement — generates tests during implementation (Phase 5); use /ork:cover after for deeper coverage
- ork:verify — grades existing tests 0-10; chain: implement → cover → verify
- testing-unit / testing-integration / testing-e2e — knowledge skills loaded by test-generator agents
- ork:commit — commit generated test files
References
Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):
| File | Content |
|---|---|
| real-service-detection.md | Docker-compose/testcontainers detection, service startup, teardown |
| heal-loop-strategy.md | Failure classification, fix patterns, iteration budget |
| coverage-report-template.md | Report format, delta calculation, gap analysis |
Version: 1.0.0 (March 2026) — Initial release
References (3)
Coverage Report Template
Format for the Phase 6 report output.
Report Structure
# Coverage Report: {SCOPE}
## Summary
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| Unit coverage | {N}% | {N}% | +{N}% |
| Integration coverage | {N}% | {N}% | +{N}% |
| E2E coverage | {N}% | {N}% | +{N}% |
| **Overall** | **{N}%** | **{N}%** | **+{N}%** |
## Tests Generated
| Tier | Count | Pass | Healed | Failed |
|------|-------|------|--------|--------|
| Unit | {N} | {N} | {N} | {N} |
| Integration | {N} | {N} | {N} | {N} |
| E2E | {N} | {N} | {N} | {N} |
| **Total** | **{N}** | **{N}** | **{N}** | **{N}** |
Heal iterations used: {N}/3
## Files Created
{list of test files created, grouped by tier}
## Real Services Used
{list of services started via docker-compose or testcontainers, or "None (mocks only)"}
## Remaining Gaps
{files or functions still below coverage threshold, with reasons}
## Failures (if any)
{tests that could not be healed after 3 iterations, with failure reason and suggested fix}
## Next Steps
- `/ork:verify {SCOPE}` — grade the implementation + tests
- `/ork:commit` — commit generated test files
- Fix source bugs detected during test generation (if any)

Delta Calculation
# Before: run coverage with existing tests only
baseline = run_coverage(existing_tests)
# After: run coverage with existing + generated tests
final = run_coverage(existing_tests + generated_tests)
# Delta per file
for file in scope_files:
    delta = final[file] - baseline[file]
# Report files with biggest delta first

Coverage Tool Commands
| Stack | Command | Output |
|---|---|---|
| Vitest | npx vitest run --coverage --reporter=json | coverage/coverage-final.json |
| Jest | npx jest --coverage --json | coverage/coverage-final.json |
| pytest | pytest --cov={scope} --cov-report=json | coverage.json |
| Go | go test -coverprofile=coverage.out ./... | coverage.out |
| Playwright | Coverage via Istanbul instrumentation | coverage/ dir |
Thresholds
| Tier | Target | Minimum |
|---|---|---|
| Unit (business logic) | 90% | 70% |
| Integration (API boundaries) | 80% | 60% |
| E2E (critical user flows) | N/A | Key flows covered |
| Overall | 80% | 70% |
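A sketch of enforcing the minimums above as a gate; tier keys and the return format are illustrative:

```python
# Thresholds from the table above; falling below "minimum" fails the gate,
# "target" is aspirational.
THRESHOLDS = {
    "unit":        {"target": 90, "minimum": 70},
    "integration": {"target": 80, "minimum": 60},
    "overall":     {"target": 80, "minimum": 70},
}

def check_thresholds(coverage: dict) -> list[str]:
    """Return human-readable violations for tiers below their minimum."""
    violations = []
    for tier, pct in coverage.items():
        limits = THRESHOLDS.get(tier)
        if limits and pct < limits["minimum"]:
            violations.append(f"{tier}: {pct}% < minimum {limits['minimum']}%")
    return violations
```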
Heal Loop Strategy
Fix failing generated tests iteratively. Max 3 iterations to prevent infinite loops.
Failure Classification
| Category | Example | Fix Strategy |
|---|---|---|
| Assertion error | expected 200, got 201 | Update expected value after verifying source behavior |
| Import error | Cannot find module './auth' | Fix import path, check tsconfig/conftest |
| Setup error | Connection refused | Add missing service setup, check fixture scope |
| Timeout | Test exceeded 5000ms | Add proper waits (Playwright: auto-wait; API: increase timeout) |
| Selector stale | Element not found: [data-testid="submit"] | Switch to semantic locator (getByRole) |
| Type error | Property 'id' does not exist | Fix type assertion or factory output |
| Flaky | Passes sometimes, fails others | Remove timing deps, add deterministic waits |
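The table above can be collapsed into a first-pass classifier; the regex patterns and strategy labels are illustrative:

```python
import re

# First match wins; unknown failures fall through to manual review.
RULES = [
    (r"Cannot find module|ImportError", "fix_import"),
    (r"expected .+ (?:got|received)", "update_assertion"),
    (r"Timed? ?out|exceeded \d+ms", "add_wait"),
    (r"Element not found|locator", "fix_selector"),
]

def classify_failure(message: str) -> str:
    """Map a raw test-failure message to a heal strategy label."""
    for pattern, strategy in RULES:
        if re.search(pattern, message, re.IGNORECASE):
            return strategy
    return "manual_review"
```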
Iteration Budget
Iteration 1: Fix obvious errors (imports, assertions, setup)
Iteration 2: Fix interaction errors (selectors, timing, state)
Iteration 3: Fix remaining edge cases or mark as known-failing

After 3 iterations, any still-failing tests are reported with:
- Failure reason
- File and line number
- Suggested manual fix
Fix Rules
- Never modify source code — only fix test files
- Read source before fixing — understand the actual behavior
- Prefer updating assertions over adding workarounds
- Don't suppress errors — if a test exposes a real bug, report it
- Keep tests deterministic — no Date.now(), no Math.random() without seeding
Source Bug Detection
If a test failure reveals a real bug in source code:
[SOURCE BUG DETECTED]
File: src/services/payment.ts:45
Issue: calculateTotal() doesn't handle negative quantities
Test: tests/unit/test_payment.ts:23 — test_negative_quantity
Action: Test is CORRECT. Source code needs fixing.
Skipping this test in heal loop.
Report to user for manual resolution.

Flaky Test Prevention
Generated tests must avoid:
- setTimeout/sleep for synchronization
- Shared mutable state between tests
- Order-dependent test execution
- Hard-coded ports or file paths
- Time-sensitive assertions (Date.now())
Instead use:
- Playwright auto-wait and waitFor assertions
- Fresh fixtures per test (function scope)
- Dynamic port allocation
- Relative paths and temp directories
- Frozen time (vi.useFakeTimers / freezegun)
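As a Python sketch of the frozen-time rule, the clock can be pinned with unittest.mock for the duration of a test (freezegun and vi.useFakeTimers give the same guarantee in their ecosystems); make_receipt is a hypothetical function under test:

```python
import time
from unittest import mock

def make_receipt() -> dict:
    """Timestamp a receipt with the current epoch second."""
    return {"issued_at": int(time.time())}

# Flaky: asserting against the real clock races the test runner.
# Deterministic: freeze time.time for the duration of the call.
with mock.patch("time.time", return_value=1_700_000_000.0):
    receipt = make_receipt()
```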
Real-Service Detection
Detect and configure real-service infrastructure for integration tests. Real services catch bugs that mocks miss — mock/prod divergence caused silent failures in past incidents.
Detection (Phase 1)
# PARALLEL — scan for infrastructure:
Glob(pattern="**/docker-compose*.yml")
Glob(pattern="**/docker-compose.test.yml")
Grep(pattern="testcontainers", glob="**/package.json", output_mode="content")
Grep(pattern="testcontainers", glob="**/requirements*.txt", output_mode="content")
Grep(pattern="testcontainers", glob="**/pyproject.toml", output_mode="content")

Decision Matrix
| Found | Strategy | Test Speed |
|---|---|---|
| docker-compose.test.yml | Start services via compose, run tests, tear down | ~30s startup |
| docker-compose.yml (no test variant) | Use with --profile test if available | ~30s startup |
| testcontainers in deps | Per-test isolated containers | ~5s per container |
| Neither + --real-services flag | Error: suggest installing testcontainers | N/A |
| Neither, no flag | Fall back to mocks (MSW/VCR) | ~0s |
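The matrix can be collapsed into one decision function (a sketch; the labels mirror the rows above):

```python
def choose_strategy(has_compose_test: bool, has_compose: bool,
                    has_testcontainers: bool, real_services_flag: bool) -> str:
    """Mirror the decision matrix above; returns a strategy label."""
    if has_compose_test:
        return "docker-compose.test.yml"
    if has_compose:
        return "docker-compose --profile test"
    if has_testcontainers:
        return "testcontainers"
    if real_services_flag:
        raise RuntimeError(
            "No docker-compose or testcontainers found. "
            "Install testcontainers or remove --real-services flag."
        )
    return "mocks (MSW/VCR)"
```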
Docker-Compose Workflow
# Start services before tests
docker compose -f docker-compose.test.yml up -d --wait
# Run integration tests
npm run test:integration # or pytest tests/integration/
# Tear down after tests
docker compose -f docker-compose.test.yml down -v

Services to look for in compose files:
- postgres/mysql/mongodb — database
- redis — cache/sessions
- rabbitmq/kafka — message queues
- elasticsearch/opensearch — search
- minio/localstack — S3-compatible storage
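To guarantee the teardown step even when the test command fails, the compose workflow above can be wrapped in try/finally. A sketch; the injectable runner parameter exists only so the guarantee can be exercised without Docker installed:

```python
import subprocess

COMPOSE_FILE = "docker-compose.test.yml"

def run_with_services(test_cmd, runner=subprocess.run):
    """Start compose services, run tests, and always tear down (even on failure)."""
    runner(["docker", "compose", "-f", COMPOSE_FILE, "up", "-d", "--wait"], check=True)
    try:
        return runner(test_cmd, check=False).returncode
    finally:
        runner(["docker", "compose", "-f", COMPOSE_FILE, "down", "-v"], check=True)
```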
Testcontainers Workflow
TypeScript
import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';

let container: StartedPostgreSqlContainer;

beforeAll(async () => {
  container = await new PostgreSqlContainer()
    .withDatabase('test_db')
    .start();
  process.env.DATABASE_URL = container.getConnectionUri();
}, 30_000);

afterAll(async () => {
  await container.stop();
});

Python
import pytest
from testcontainers.postgres import PostgresContainer
@pytest.fixture(scope="session")
def postgres():
    with PostgresContainer("postgres:16-alpine") as pg:
        yield pg.get_connection_url()

Environment Variables
When real services are running, set connection strings:
| Service | Env Variable | Source |
|---|---|---|
| PostgreSQL | DATABASE_URL | container.getConnectionUri() |
| Redis | REDIS_URL | redis://localhost:{mapped_port} |
| MongoDB | MONGODB_URI | container.getConnectionString() |
Cleanup
Always tear down services after tests:
- Docker-compose: docker compose down -v (removes volumes)
- Testcontainers: container.stop() in afterAll/fixture teardown
- Database state: transaction rollback or truncate between tests
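The transaction-rollback option from the list above can be sketched with a wrapper that opens a transaction per test and always rolls it back. sqlite3 stands in for the real database:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def fresh_state(conn):
    """Run a test inside a transaction and roll it back afterwards."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.rollback()

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

with fresh_state(conn) as c:
    c.execute("INSERT INTO users (name) VALUES ('alice')")
    inside = c.execute("SELECT COUNT(*) FROM users").fetchone()[0]

after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The row is visible inside the wrapped block and gone afterwards, so each test starts from a clean table.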