OrchestKit v7.11.1 — 93 skills, 33 agents, 105 hooks · Claude Code 2.1.76+

Cover

Generate and run comprehensive test suites — unit tests, integration tests with real services (testcontainers/docker-compose), and Playwright E2E tests. Analyzes coverage gaps, spawns parallel test-generator agents per tier, runs tests, and heals failures (max 3 iterations). Use when generating tests for existing code, improving coverage after implementation, or creating a full test suite from scratch. Chains naturally after /ork:implement. Do NOT use for verifying/grading existing tests (use /ork:verify) or running tests without generation (use npm test directly).


Cover — Test Suite Generator

Generate comprehensive test suites for existing code with real-service integration testing and automated failure healing.

Quick Start

/ork:cover authentication flow
/ork:cover --model=opus payment processing
/ork:cover --tier=unit,integration user service
/ork:cover --real-services checkout pipeline

Argument Resolution

SCOPE = "$ARGUMENTS"  # e.g., "authentication flow"

# Flag parsing
MODEL_OVERRIDE = None
TIERS = ["unit", "integration", "e2e"]  # default: all three
REAL_SERVICES = False

for token in "$ARGUMENTS".split():
    if token.startswith("--model="):
        MODEL_OVERRIDE = token.split("=", 1)[1]
        SCOPE = SCOPE.replace(token, "").strip()
    elif token.startswith("--tier="):
        TIERS = token.split("=", 1)[1].split(",")
        SCOPE = SCOPE.replace(token, "").strip()
    elif token == "--real-services":
        REAL_SERVICES = True
        SCOPE = SCOPE.replace(token, "").strip()
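The parsing above can be sketched as a standalone function. This is an illustrative helper (not part of the skill); it collects scope tokens instead of string-replacing flags out of `SCOPE`, which avoids leftover double spaces:

```python
def parse_cover_args(arguments: str):
    """Split a /ork:cover argument string into (scope, model, tiers, real_services)."""
    model_override = None
    tiers = ["unit", "integration", "e2e"]  # default: all three
    real_services = False
    scope_tokens = []

    for token in arguments.split():
        if token.startswith("--model="):
            model_override = token.split("=", 1)[1]
        elif token.startswith("--tier="):
            tiers = token.split("=", 1)[1].split(",")
        elif token == "--real-services":
            real_services = True
        else:
            scope_tokens.append(token)  # everything else is scope text

    return " ".join(scope_tokens), model_override, tiers, real_services
```

For example, `parse_cover_args("--model=opus payment processing")` yields the scope `"payment processing"` with the `opus` model override and all three tiers.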

Step -1: MCP Probe + Resume Check

# Probe MCPs (parallel):
ToolSearch(query="select:mcp__memory__search_nodes")
ToolSearch(query="select:mcp__context7__resolve-library-id")

Write(".claude/chain/capabilities.json", {
  "memory": <true if found>,
  "context7": <true if found>,
  "skill": "cover",
  "timestamp": now()
})

# Resume check:
Read(".claude/chain/state.json")
# If exists and skill == "cover": resume from current_phase
# Otherwise: initialize state

Step 0: Scope & Tier Selection

AskUserQuestion(
  questions=[
    {
      "question": "What test tiers should I generate?",
      "header": "Test Tiers",
      "options": [
        {"label": "Full coverage (Recommended)", "description": "Unit + Integration (real services) + E2E", "markdown": "```\nFull Coverage\n─────────────\n  Unit            Integration       E2E\n  ┌─────────┐    ┌─────────────┐  ┌──────────┐\n  │ AAA     │    │ Real DB     │  │Playwright│\n  │ Mocks   │    │ Real APIs   │  │Page obj  │\n  │ Factory │    │ Testcontain │  │A11y      │\n  └─────────┘    └─────────────┘  └──────────┘\n  3 parallel test-generator agents\n```"},
        {"label": "Unit + Integration", "description": "Skip E2E, focus on logic and service boundaries", "markdown": "```\nUnit + Integration\n──────────────────\n  Unit tests for business logic\n  Integration tests at API boundaries\n  Real services if docker-compose found\n  Skip: browser automation\n```"},
        {"label": "Unit only", "description": "Fast isolated tests for business logic", "markdown": "```\nUnit Only (~2 min)\n──────────────────\n  AAA pattern tests\n  MSW/VCR mocking\n  Factory-based data\n  Coverage gap analysis\n  Skip: real services, browser\n```"},
        {"label": "Integration only", "description": "API boundary and real-service tests", "markdown": "```\nIntegration Only\n────────────────\n  API endpoint tests (Supertest/httpx)\n  Database tests (real or in-memory)\n  Contract tests (Pact)\n  Testcontainers if available\n```"},
        {"label": "E2E only", "description": "Playwright browser tests", "markdown": "```\nE2E Only\n────────\n  Playwright page objects\n  User flow tests\n  Visual regression\n  Accessibility (axe-core)\n```"}
      ],
      "multiSelect": false
    },
    {
      "question": "Healing strategy for failing tests?",
      "header": "Failure Handling",
      "options": [
        {"label": "Auto-heal (Recommended)", "description": "Fix failing tests up to 3 iterations"},
        {"label": "Generate only", "description": "Write tests, report failures, don't fix"},
        {"label": "Strict", "description": "All tests must pass or abort"}
      ],
      "multiSelect": false
    }
  ]
)

Override TIERS based on selection. Skip this step if --tier= flag was provided.


Task Management (MANDATORY)

TaskCreate(
  subject=f"Cover: {SCOPE}",
  description="Generate comprehensive test suite with real-service testing",
  activeForm=f"Generating tests for {SCOPE}"
)

# Subtasks per phase
TaskCreate(subject="Discover scope and detect frameworks", activeForm="Discovering test scope")
TaskCreate(subject="Analyze coverage gaps", activeForm="Analyzing coverage gaps")
TaskCreate(subject="Generate tests (parallel per tier)", activeForm="Generating tests")
TaskCreate(subject="Execute generated tests", activeForm="Running tests")
TaskCreate(subject="Heal failing tests", activeForm="Healing test failures")
TaskCreate(subject="Generate coverage report", activeForm="Generating report")

6-Phase Workflow

| Phase | Activities | Output |
|-------|------------|--------|
| 1. Discovery | Detect frameworks, scan scope, find untested code | Framework map, file list |
| 2. Coverage Analysis | Run existing tests, map gaps per tier | Coverage baseline, gap map |
| 3. Generation | Parallel test-generator agents per tier | Test files created |
| 4. Execution | Run all generated tests | Pass/fail results |
| 5. Heal | Fix failures, re-run (max 3 iterations) | Green test suite |
| 6. Report | Coverage delta, test count, summary | Coverage report |

Phase Handoffs

| After Phase | Handoff File | Key Outputs |
|-------------|--------------|-------------|
| 1. Discovery | 01-cover-discovery.json | Frameworks, scope files, tier plan |
| 2. Analysis | 02-cover-analysis.json | Baseline coverage, gap map |
| 3. Generation | 03-cover-generation.json | Files created, test count per tier |
| 5. Heal | 05-cover-healed.json | Final pass/fail, iterations used |

Phase 1: Discovery

Detect the project's test infrastructure and scope the work.

# PARALLEL — all in ONE message:
# 1. Framework detection (hook handles this, but also scan manually)
Grep(pattern="vitest|jest|mocha|playwright|cypress", glob="package.json", output_mode="content")
Grep(pattern="pytest|unittest|hypothesis", glob="pyproject.toml", output_mode="content")
Grep(pattern="pytest|unittest|hypothesis", glob="requirements*.txt", output_mode="content")

# 2. Real-service infrastructure
Glob(pattern="**/docker-compose*.yml")
Glob(pattern="**/testcontainers*")
Grep(pattern="testcontainers", glob="**/package.json", output_mode="content")
Grep(pattern="testcontainers", glob="**/requirements*.txt", output_mode="content")

# 3. Existing test structure
Glob(pattern="**/tests/**/*.test.*")
Glob(pattern="**/tests/**/*.spec.*")
Glob(pattern="**/__tests__/**/*")
Glob(pattern="**/test_*.py")

# 4. Scope files (what to test)
# If SCOPE specified, find matching source files
Grep(pattern=SCOPE, output_mode="files_with_matches")

Real-service decision:

  • docker-compose*.yml found → integration tests use real services
  • testcontainers in deps → use testcontainers for isolated service instances
  • Neither found + --real-services flag → error: "No docker-compose or testcontainers found. Install testcontainers or remove --real-services flag."
  • Neither found, no flag → integration tests use mocks (MSW/VCR)
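The decision above can be sketched as a small function. The helper name and the compose-before-testcontainers precedence are illustrative assumptions, not skill internals:

```python
def choose_integration_strategy(has_compose: bool,
                                has_testcontainers: bool,
                                real_services_flag: bool) -> str:
    if has_compose:
        return "docker-compose"    # real services via compose
    if has_testcontainers:
        return "testcontainers"    # isolated per-test service instances
    if real_services_flag:
        # --real-services was requested but no infrastructure exists
        raise RuntimeError("No docker-compose or testcontainers found. "
                           "Install testcontainers or remove --real-services flag.")
    return "mocks"                 # fall back to MSW/VCR network-level mocks
```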

Load real-service detection details: Read("${CLAUDE_SKILL_DIR}/references/real-service-detection.md")

Phase 2: Coverage Analysis

Run existing tests and identify gaps.

# Detect and run coverage command
# TypeScript: npx vitest run --coverage --reporter=json
# Python: pytest --cov=<scope> --cov-report=json
# Go: go test -coverprofile=coverage.out ./...

# Parse coverage output to identify:
# 1. Files with 0% coverage (priority targets)
# 2. Files below threshold (default 70%)
# 3. Uncovered functions/methods
# 4. Untested edge cases (error paths, boundary conditions)

Output coverage baseline to user immediately (progressive output).
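The gap classification can be sketched over a parsed coverage summary. The flat `{file: percent}` shape is a simplifying assumption (real reporter JSON is nested per-statement/branch):

```python
def find_gaps(coverage: dict, threshold: float = 70.0):
    """Split files into zero-coverage targets and below-threshold files (worst first)."""
    untested = [f for f, pct in coverage.items() if pct == 0.0]
    below = sorted(((f, pct) for f, pct in coverage.items()
                    if 0.0 < pct < threshold),
                   key=lambda fp: fp[1])
    return untested, below
```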

Phase 3: Generation (Parallel Agents)

Spawn test-generator agents per tier. Launch ALL in ONE message with run_in_background=true.

# Unit tests agent
if "unit" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate unit tests for: {SCOPE}
        Coverage gaps: {gap_map.unit_gaps}
        Framework: {detected_framework}
        Existing tests: {existing_test_files}

        Focus on:
        - AAA pattern (Arrange-Act-Assert)
        - Parametrized tests for multiple inputs
        - MSW/VCR for HTTP mocking (never mock fetch directly)
        - Factory-based test data (FactoryBoy/faker-js)
        - Edge cases: empty input, errors, timeouts, boundary values
        - Target: 90%+ business logic coverage""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

# Integration tests agent
if "integration" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate integration tests for: {SCOPE}
        Coverage gaps: {gap_map.integration_gaps}
        Framework: {detected_framework}
        Real services available: {real_service_infra}

        Focus on:
        - API endpoint tests (Supertest/httpx)
        - Database tests with {'real DB via testcontainers/docker-compose' if real_services else 'in-memory/mocked DB'}
        - Contract tests (Pact) for service boundaries
        - Zod/Pydantic schema validation at edges
        - Fresh state per test (transaction rollback or cleanup)
        - Target: all API endpoints and service boundaries""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

# E2E tests agent
if "e2e" in TIERS:
    Agent(
        subagent_type="test-generator",
        prompt=f"""Generate E2E tests for: {SCOPE}
        Framework: Playwright
        Routes/pages: {discovered_routes}

        Focus on:
        - Semantic locators (getByRole > getByLabel > getByTestId)
        - Page Object Model for complex pages
        - User flow tests (happy path + error paths)
        - Accessibility tests (axe-core WCAG 2.2 AA)
        - Visual regression (toHaveScreenshot)
        - No hardcoded waits (use auto-wait)""",
        isolation="worktree",
        run_in_background=True,
        max_turns=50,
        model=MODEL_OVERRIDE
    )

Output each agent's results as soon as it returns — don't wait for all agents. This lets users see generated tests incrementally.

Partial results (CC 2.1.76): If an agent is killed (timeout, context limit), its response is tagged [PARTIAL RESULT]. Include partial tests but flag them in Phase 4.

Phase 4: Execution

Run all generated tests and collect results.

# Run test commands per tier (PARALLEL if independent):
# Unit: npx vitest run tests/unit/ OR pytest tests/unit/
# Integration: npx vitest run tests/integration/ OR pytest tests/integration/
# E2E: npx playwright test

# Collect: pass count, fail count, error details, coverage delta

Phase 5: Heal Loop

Fix failing tests iteratively. Max 3 iterations to prevent infinite loops.

for iteration in range(3):
    if all_tests_pass:
        break

    # For each failing test:
    # 1. Read the test file and the source code it tests
    # 2. Analyze the failure (assertion error? import error? timeout?)
    # 3. Fix the test (not the source code — tests only)
    # 4. Re-run the fixed tests

    # Common fixes:
    # - Wrong assertions (expected value mismatch)
    # - Missing imports or setup
    # - Stale selectors in E2E tests
    # - Race conditions (add proper waits)
    # - Mock configuration errors

Load heal strategy details: Read("${CLAUDE_SKILL_DIR}/references/heal-loop-strategy.md")

Boundary: heal fixes TESTS, not source code. If a test fails because the source code has a bug, report it — don't silently fix production code.

Phase 6: Report

Generate coverage report with before/after comparison.

Coverage Report: {SCOPE}
═══════════════════════════

Baseline → After
────────────────
  Unit:        67.2% → 91.3% (+24.1%)
  Integration: 42.0% → 78.5% (+36.5%)
  E2E:          0.0% → 65.0% (+65.0%)
  Overall:     48.4% → 82.1% (+33.7%)

Tests Generated
───────────────
  Unit:        23 tests (18 pass, 5 healed)
  Integration: 12 tests (10 pass, 2 healed)
  E2E:          8 tests (8 pass)
  Total:       43 tests

Heal Iterations: 2/3

Files Created
─────────────
  tests/unit/services/test_auth.py
  tests/unit/services/test_payment.py
  tests/integration/api/test_users.py
  tests/integration/api/test_checkout.py
  tests/e2e/checkout.spec.ts
  tests/e2e/pages/CheckoutPage.ts

Real Services Used: PostgreSQL (testcontainers), Redis (docker-compose)

Remaining Gaps
──────────────
  - src/services/notification.ts (0% — no tests generated, out of scope)
  - src/utils/crypto.ts (45% — edge cases not covered)

Next Steps
──────────
  /ork:verify {SCOPE}    # Grade the implementation + tests
  /ork:commit             # Commit generated tests

Key Principles

  • Tests only — never modify production source code, only generate test files
  • Real services when available — prefer testcontainers/docker-compose over mocks for integration tests because mock/prod divergence causes silent failures in production
  • Parallel generation — spawn one test-generator agent per tier in ONE message
  • Heal, don't loop forever — max 3 iterations, then report remaining failures
  • Progressive output — show results as each agent completes
  • Factory over fixtures — use FactoryBoy/faker-js for test data, not hardcoded values
  • Mock at network level — MSW/VCR, never mock fetch/axios directly

Related Skills

  • ork:implement — generates tests during implementation (Phase 5); use /ork:cover after for deeper coverage
  • ork:verify — grades existing tests 0-10; chain: implement → cover → verify
  • testing-unit / testing-integration / testing-e2e — knowledge skills loaded by test-generator agents
  • ork:commit — commit generated test files

References

Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):

| File | Content |
|------|---------|
| real-service-detection.md | Docker-compose/testcontainers detection, service startup, teardown |
| heal-loop-strategy.md | Failure classification, fix patterns, iteration budget |
| coverage-report-template.md | Report format, delta calculation, gap analysis |

Version: 1.0.0 (March 2026) — Initial release


References (3)

Coverage Report Template


Format for the Phase 6 report output.

Report Structure

# Coverage Report: {SCOPE}

## Summary

| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| Unit coverage | {N}% | {N}% | +{N}% |
| Integration coverage | {N}% | {N}% | +{N}% |
| E2E coverage | {N}% | {N}% | +{N}% |
| **Overall** | **{N}%** | **{N}%** | **+{N}%** |

## Tests Generated

| Tier | Count | Pass | Healed | Failed |
|------|-------|------|--------|--------|
| Unit | {N} | {N} | {N} | {N} |
| Integration | {N} | {N} | {N} | {N} |
| E2E | {N} | {N} | {N} | {N} |
| **Total** | **{N}** | **{N}** | **{N}** | **{N}** |

Heal iterations used: {N}/3

## Files Created

{list of test files created, grouped by tier}

## Real Services Used

{list of services started via docker-compose or testcontainers, or "None (mocks only)"}

## Remaining Gaps

{files or functions still below coverage threshold, with reasons}

## Failures (if any)

{tests that could not be healed after 3 iterations, with failure reason and suggested fix}

## Next Steps

- `/ork:verify {SCOPE}` — grade the implementation + tests
- `/ork:commit` — commit generated test files
- Fix source bugs detected during test generation (if any)

Delta Calculation

# Before: run coverage with existing tests only
baseline = run_coverage(existing_tests)

# After: run coverage with existing + generated tests
final = run_coverage(existing_tests + generated_tests)

# Delta per file
for file in scope_files:
    delta = final[file] - baseline[file]
    # Report files with biggest delta first
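The delta pseudocode above can be made concrete. Here `baseline` and `final` are assumed to be `{file: percent}` dicts (the output shape of the hypothetical `run_coverage` helper):

```python
def coverage_delta(baseline: dict, final: dict):
    """Per-file coverage delta, sorted with the biggest gains first."""
    files = set(baseline) | set(final)
    deltas = {f: final.get(f, 0.0) - baseline.get(f, 0.0) for f in files}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```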

Coverage Tool Commands

| Stack | Command | Output |
|-------|---------|--------|
| Vitest | `npx vitest run --coverage --reporter=json` | coverage/coverage-final.json |
| Jest | `npx jest --coverage --json` | coverage/coverage-final.json |
| pytest | `pytest --cov={scope} --cov-report=json` | coverage.json |
| Go | `go test -coverprofile=coverage.out ./...` | coverage.out |
| Playwright | Coverage via Istanbul instrumentation | coverage/ dir |

Thresholds

| Tier | Target | Minimum |
|------|--------|---------|
| Unit (business logic) | 90% | 70% |
| Integration (API boundaries) | 80% | 60% |
| E2E (critical user flows) | N/A | Key flows covered |
| Overall | 80% | 70% |

Heal Loop Strategy


Fix failing generated tests iteratively. Max 3 iterations to prevent infinite loops.

Failure Classification

| Category | Example | Fix Strategy |
|----------|---------|--------------|
| Assertion error | `expected 200, got 201` | Update expected value after verifying source behavior |
| Import error | `Cannot find module './auth'` | Fix import path, check tsconfig/conftest |
| Setup error | `Connection refused` | Add missing service setup, check fixture scope |
| Timeout | `Test exceeded 5000ms` | Add proper waits (Playwright: auto-wait; API: increase timeout) |
| Selector stale | `Element not found: [data-testid="submit"]` | Switch to semantic locator (getByRole) |
| Type error | `Property 'id' does not exist` | Fix type assertion or factory output |
| Flaky | Passes sometimes, fails others | Remove timing deps, add deterministic waits |
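A first-pass classifier over failure messages can be sketched as a pattern table. The patterns below are illustrative, not an exhaustive mapping:

```python
import re

# Ordered rules: first match wins; the assertion pattern goes last
# because its keywords also appear inside other error messages.
RULES = [
    (r"Cannot find module|ImportError|ModuleNotFoundError", "import"),
    (r"Connection refused|ECONNREFUSED", "setup"),
    (r"exceeded .*ms|TimeoutError", "timeout"),
    (r"Element not found|locator.*not found", "selector"),
    (r"Property .* does not exist|TypeError", "type"),
    (r"expected .* got|AssertionError", "assertion"),
]

def classify_failure(message: str) -> str:
    for pattern, category in RULES:
        if re.search(pattern, message):
            return category
    return "unknown"  # candidates for the flaky bucket after re-runs
```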

Iteration Budget

Iteration 1: Fix obvious errors (imports, assertions, setup)
Iteration 2: Fix interaction errors (selectors, timing, state)
Iteration 3: Fix remaining edge cases or mark as known-failing

After 3 iterations, any still-failing tests are reported with:

  • Failure reason
  • File and line number
  • Suggested manual fix

Fix Rules

  1. Never modify source code — only fix test files
  2. Read source before fixing — understand the actual behavior
  3. Prefer updating assertions over adding workarounds
  4. Don't suppress errors — if a test exposes a real bug, report it
  5. Keep tests deterministic — no Date.now(), no Math.random() without seeding

Source Bug Detection

If a test failure reveals a real bug in source code:

[SOURCE BUG DETECTED]
File: src/services/payment.ts:45
Issue: calculateTotal() doesn't handle negative quantities
Test: tests/unit/test_payment.ts:23 — test_negative_quantity
Action: Test is CORRECT. Source code needs fixing.
         Skipping this test in heal loop.
         Report to user for manual resolution.

Flaky Test Prevention

Generated tests must avoid:

  • setTimeout/sleep for synchronization
  • Shared mutable state between tests
  • Order-dependent test execution
  • Hard-coded ports or file paths
  • Time-sensitive assertions (Date.now())

Instead use:

  • Playwright auto-wait and waitFor assertions
  • Fresh fixtures per test (function scope)
  • Dynamic port allocation
  • Relative paths and temp directories
  • Frozen time (vi.useFakeTimers / freezegun)
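Two of these rules — seeded randomness and frozen/injected clocks — look like this in Python terms. `make_order_id` is a hypothetical function under test, shown only to illustrate the pattern:

```python
import random
from datetime import datetime, timezone

def make_order_id(rng: random.Random, now: datetime) -> str:
    """Build an ID from an injected RNG and clock so tests stay reproducible."""
    return f"{now:%Y%m%d}-{rng.randint(1000, 9999)}"

# In a test, both inputs are fixed — no Date.now(), no unseeded randomness:
FROZEN = datetime(2026, 3, 1, tzinfo=timezone.utc)

def test_order_id_is_deterministic():
    a = make_order_id(random.Random(42), FROZEN)
    b = make_order_id(random.Random(42), FROZEN)
    assert a == b
    assert a.startswith("20260301-")
```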

Real Service Detection


Detect and configure real-service infrastructure for integration tests. Real services catch bugs that mocks miss — mock/prod divergence caused silent failures in past incidents.

Detection (Phase 1)

# PARALLEL — scan for infrastructure:
Glob(pattern="**/docker-compose*.yml")
Glob(pattern="**/docker-compose.test.yml")
Grep(pattern="testcontainers", glob="**/package.json", output_mode="content")
Grep(pattern="testcontainers", glob="**/requirements*.txt", output_mode="content")
Grep(pattern="testcontainers", glob="**/pyproject.toml", output_mode="content")

Decision Matrix

| Found | Strategy | Test Speed |
|-------|----------|------------|
| docker-compose.test.yml | Start services via compose, run tests, tear down | ~30s startup |
| docker-compose.yml (no test variant) | Use with `--profile test` if available | ~30s startup |
| testcontainers in deps | Per-test isolated containers | ~5s per container |
| Neither + --real-services flag | Error: suggest installing testcontainers | N/A |
| Neither, no flag | Fall back to mocks (MSW/VCR) | ~0s |

Docker-Compose Workflow

# Start services before tests
docker compose -f docker-compose.test.yml up -d --wait

# Run integration tests
npm run test:integration  # or pytest tests/integration/

# Tear down after tests
docker compose -f docker-compose.test.yml down -v
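The three shell steps above can be wrapped so teardown always runs, even when the test command fails. This is a sketch; the `run` parameter is injectable purely as a test seam and defaults to `subprocess.run`:

```python
import subprocess

def run_with_compose(test_cmd, compose_file="docker-compose.test.yml",
                     run=subprocess.run):
    """Start services, run the tests, and always tear down (mirroring `down -v`)."""
    run(["docker", "compose", "-f", compose_file, "up", "-d", "--wait"],
        check=True)
    try:
        return run(test_cmd).returncode
    finally:
        # Remove containers and volumes regardless of test outcome
        run(["docker", "compose", "-f", compose_file, "down", "-v"], check=True)
```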

Services to look for in compose files:

  • postgres / mysql / mongodb — database
  • redis — cache/sessions
  • rabbitmq / kafka — message queues
  • elasticsearch / opensearch — search
  • minio / localstack — S3-compatible storage

Testcontainers Workflow

TypeScript

import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';

let container: StartedPostgreSqlContainer;

beforeAll(async () => {
  container = await new PostgreSqlContainer()
    .withDatabase('test_db')
    .start();
  process.env.DATABASE_URL = container.getConnectionUri();
}, 30_000);

afterAll(async () => {
  await container.stop();
});

Python

import pytest
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def postgres():
    with PostgresContainer("postgres:16-alpine") as pg:
        yield pg.get_connection_url()

Environment Variables

When real services are running, set connection strings:

| Service | Env Variable | Source |
|---------|--------------|--------|
| PostgreSQL | DATABASE_URL | container.getConnectionUri() |
| Redis | REDIS_URL | redis://localhost:{mapped_port} |
| MongoDB | MONGODB_URI | container.getConnectionString() |

Cleanup

Always tear down services after tests:

  • Docker-compose: docker compose down -v (removes volumes)
  • Testcontainers: container.stop() in afterAll/fixture teardown
  • Database state: transaction rollback or truncate between tests
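The "transaction rollback between tests" option can be sketched with stdlib sqlite3 so it runs anywhere; the same shape applies to a real Postgres test connection:

```python
import sqlite3

def fresh_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.commit()  # committed schema survives later rollbacks
    return conn

def run_in_rollback(conn, test_fn):
    """Run test_fn(conn), then roll back its writes so the next test starts clean."""
    try:
        test_fn(conn)
    finally:
        conn.rollback()
```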