Testing Patterns
Redirect — testing-patterns was split into 5 focused sub-skills. Use when looking for testing-patterns, writing tests, or test automation. Redirects to testing-unit, testing-e2e, testing-integration, testing-llm, or testing-perf.
Auto-activated — this skill loads automatically when Claude detects matching context.
Testing Patterns (Redirect)
This skill was split into 5 focused sub-skills in v7.2.0. Use the appropriate sub-skill below.
Sub-Skills
| Sub-Skill | Focus | When to Use |
|---|---|---|
| ork:testing-unit | Unit tests, AAA pattern, fixtures, mocking, factories | Isolated business logic tests |
| ork:testing-e2e | Playwright, page objects, visual regression, a11y | Browser-based end-to-end tests |
| ork:testing-integration | API endpoints, database, contract testing | Cross-boundary integration tests |
| ork:testing-llm | LLM mocking, DeepEval/RAGAS, structured output | AI/ML evaluation and testing |
| ork:testing-perf | k6, Locust, pytest-xdist, benchmark | Performance and load testing |
Quick Reference
/ork:testing-unit # Unit testing patterns
/ork:testing-e2e # End-to-end with Playwright
/ork:testing-integration # API and database integration
/ork:testing-llm # LLM evaluation patterns
/ork:testing-perf # Performance and load testing
Related Skills
- ork:testing-unit — Unit testing: AAA pattern, fixtures, mocking, factories
- ork:testing-e2e — E2E testing: Playwright, page objects, visual regression
- ork:testing-integration — Integration testing: API endpoints, database, contracts
- ork:testing-llm — LLM testing: mock responses, DeepEval/RAGAS evaluation
- ork:testing-perf — Performance testing: k6, Locust, pytest-xdist
Rules (2)
Assertions must test observable behavior, not implementation details — MEDIUM
Assert Behavior, Not Implementation
Why
Tests that assert implementation details (internal method calls, private state, specific SQL queries) break on every refactor even when the behavior is unchanged. This creates a maintenance burden where developers spend more time fixing tests than writing features — and the tests never catch real bugs.
Rule
Assertions must verify:
- Return values and output (what the function produces)
- Side effects on public interfaces (what changed externally)
- Error behavior (what happens when things go wrong)
Assertions must NOT verify:
- Internal method call counts or order
- Private state or internal data structures
- Specific implementation mechanisms (SQL text, HTTP internals)
Incorrect — testing implementation details
test("creates user", async () => {
  const spy = jest.spyOn(db, "query");
  await userService.create({ name: "Alice", email: "alice@test.com" });

  // Asserts exact SQL — breaks if column order changes or ORM upgrades
  expect(spy).toHaveBeenCalledWith(
    "INSERT INTO users (name, email) VALUES ($1, $2)",
    ["Alice", "alice@test.com"]
  );

  // Asserts call count — breaks if implementation adds a cache write
  expect(spy).toHaveBeenCalledTimes(1);
});

def test_process_order(mocker):
    mock_validate = mocker.patch.object(order_service, "_validate_items")
    mock_calc = mocker.patch.object(order_service, "_calculate_total")
    order_service.process(order)
    # Asserts internal method calls — breaks on any refactor
    mock_validate.assert_called_once_with(order.items)
    mock_calc.assert_called_once_with(order.items, order.discount)
Correct — testing observable behavior
test("creates user and returns it with generated id", async () => {
  const result = await userService.create({
    name: "Alice",
    email: "alice@test.com"
  });

  // Assert return value (behavior)
  expect(result.id).toBeDefined();
  expect(result.name).toBe("Alice");
  expect(result.email).toBe("alice@test.com");

  // Assert side effect via public interface (behavior)
  const fetched = await userService.findById(result.id);
  expect(fetched?.name).toBe("Alice");
});

def test_process_order_calculates_total():
    order = Order(
        items=[Item(price=10, qty=2), Item(price=5, qty=1)],
        discount=0.1,
    )
    result = order_service.process(order)
    # Assert output (behavior)
    assert result.total == 22.50  # (10*2 + 5*1) * 0.9
    assert result.status == "confirmed"
Assertion Quality Checklist
// BAD: Asserts HOW (implementation)
expect(mockDb.query).toHaveBeenCalledWith(/* specific SQL */);
expect(internalCache.size).toBe(1);
expect(privateMethod).toHaveBeenCalledTimes(3);

// GOOD: Asserts WHAT (behavior)
expect(result.id).toBeDefined(); // Output value
expect(await service.findById(id)).toBeTruthy(); // Side effect
expect(() => service.create(invalid)).toThrow(); // Error behavior
When Mocks Are Acceptable
| Scenario | Mock OK | Assert On |
|---|---|---|
| External API calls | Yes — mock the HTTP client | Response handling behavior |
| Database in unit tests | Yes — mock the repository | Return values and errors |
| Internal private methods | No — do not mock | Test via public interface |
| Time-dependent logic | Yes — mock Date.now | Correct output for given time |
| File system | Yes — mock fs calls | Correct file content written |
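The time-dependent row above can be sketched in pytest. A minimal sketch, where `greeting` is a hypothetical function under test: instead of patching the global clock, the timestamp is injected as a parameter, and the assertions check the output for a given time rather than how the time was obtained.

```python
import datetime

def greeting(now: datetime.datetime) -> str:
    """Hypothetical production function: time-of-day greeting."""
    return "Good morning" if now.hour < 12 else "Good afternoon"

def test_greeting_before_noon():
    # "Mock" the clock by injecting a fixed time; assert on the OUTPUT,
    # not on how the function reads the time.
    fixed = datetime.datetime(2024, 1, 1, 9, 0)
    assert greeting(fixed) == "Good morning"

def test_greeting_after_noon():
    fixed = datetime.datetime(2024, 1, 1, 15, 0)
    assert greeting(fixed) == "Good afternoon"
```

Injecting the time as an argument (or mocking `Date.now`/`datetime.now` at the boundary) keeps the test deterministic while the assertion stays on behavior.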
Tests must not depend on execution order or shared mutable state between test cases — HIGH
Test Isolation — No Shared State or Order Dependency
Why
Tests that share mutable state or depend on execution order are the #1 source of flaky tests. They pass when run sequentially in a specific order but fail when parallelized, shuffled, or run individually. Debugging flaky tests wastes more time than writing isolated tests.
Rule
Every test must:
- Set up its own state in beforeEach / setUp / the arrange phase
- Clean up after itself (or use fresh fixtures)
- Pass when run in isolation (test.only / -k test_name)
- Pass when run in any order (shuffle mode)
Incorrect — tests share mutable state
// Shared state between tests — order dependent
describe("UserService", () => {
  const users: User[] = []; // Shared mutable array

  test("creates a user", () => {
    users.push({ id: 1, name: "Alice" });
    expect(users).toHaveLength(1);
  });

  test("finds the user", () => {
    // DEPENDS on "creates a user" running first
    const found = users.find(u => u.name === "Alice");
    expect(found).toBeDefined();
  });

  test("deletes the user", () => {
    // DEPENDS on both previous tests
    users.splice(0, 1);
    expect(users).toHaveLength(0);
  });
});

# Module-level state — tests pollute each other
_cache = {}

def test_add_to_cache():
    _cache["key"] = "value"
    assert _cache["key"] == "value"

def test_cache_is_empty():
    # FAILS if test_add_to_cache runs first
    assert len(_cache) == 0
Correct — each test owns its state
describe("UserService", () => {
  let service: UserService;
  let db: TestDatabase;

  beforeEach(() => {
    db = new TestDatabase(); // Fresh DB per test
    service = new UserService(db); // Fresh service per test
  });

  afterEach(() => {
    db.cleanup();
  });

  test("creates a user", async () => {
    const user = await service.create({ name: "Alice" });
    expect(user.id).toBeDefined();
  });

  test("finds a user by name", async () => {
    // Arrange: create its own user, does not rely on other tests
    await service.create({ name: "Bob" });

    // Act
    const found = await service.findByName("Bob");

    // Assert
    expect(found?.name).toBe("Bob");
  });
});

import pytest
@pytest.fixture
def cache():
    """Fresh cache for each test."""
    return {}

def test_add_to_cache(cache):
    cache["key"] = "value"
    assert cache["key"] == "value"

def test_cache_starts_empty(cache):
    # Always passes — gets its own empty cache
    assert len(cache) == 0
Verification
# Run tests in random order to detect order dependency
# Jest
jest --randomize
# pytest
pytest -p randomly
# Run single test in isolation
jest --testNamePattern="finds a user"
pytest -k "test_cache_starts_empty"
Common Isolation Violations
| Violation | Fix |
|---|---|
| Shared array/object between tests | Move to beforeEach |
| Database not reset between tests | Use transactions + rollback |
| Global singleton with state | Reset in beforeEach or use DI |
| File system side effects | Use temp directories per test |
| Environment variable mutations | Save/restore in setup/teardown |
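Two of the fixes above (temp directories per test, env var save/restore) come built in with pytest's `tmp_path` and `monkeypatch` fixtures. A minimal sketch, where `write_report` is a hypothetical helper under test:

```python
import os

def write_report(directory: str, content: str) -> str:
    """Hypothetical helper under test: writes report.txt into `directory`."""
    path = os.path.join(directory, "report.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path

def test_write_report(tmp_path, monkeypatch):
    # tmp_path: a fresh per-test directory, so no files leak between tests
    # monkeypatch.setenv: the change is reverted automatically on teardown
    monkeypatch.setenv("REPORT_MODE", "test")
    path = write_report(str(tmp_path), "ok")
    with open(path, encoding="utf-8") as f:
        assert f.read() == "ok"
```

Because both fixtures clean up after themselves, this test passes in any order and in parallel without touching shared state.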
Testing LLM
LLM and AI testing patterns — mock responses, evaluation with DeepEval/RAGAS, structured output validation, and agentic test patterns (generator, healer, planner). Use when testing AI features, validating LLM outputs, or building evaluation pipelines.
Testing Perf
Performance and load testing patterns — k6 load tests, Locust stress tests, pytest execution optimization (xdist parallel, plugins), test type classification, and performance benchmarking. Use when writing load tests, optimizing test execution speed, or setting up pytest infrastructure.