OrchestKit v6.7.1 — 67 skills, 38 agents, 77 hooks with Opus 4.6 support

Fix Issue

Fixes GitHub issues with parallel analysis. Use to debug errors, resolve regressions, fix bugs, or triage issues.

Command medium

Fix Issue

Systematic issue resolution with hypothesis-based root cause analysis, similar issue detection, and prevention recommendations.

Quick Start

/ork:fix-issue 123
/ork:fix-issue 456

Opus 4.6: Root cause analysis uses native adaptive thinking. Dynamic token budgets scale with context window for thorough investigation.

STEP 0: Verify User Intent

BEFORE creating tasks, clarify fix approach using AskUserQuestion. See rules/evidence-gathering.md for the full prompt template and workflow adjustments per approach (Proper fix, Quick fix, Investigate first, Hotfix).

STEP 0b: Select Orchestration Mode

Choose Agent Teams (mesh) or Task tool (star). See references/agent-selection.md for the selection criteria, cost comparison, and task creation patterns.

Workflow Overview

| Phase | Activities | Output |
|-------|-----------|--------|
| 1. Understand Issue | Read GitHub issue details | Problem statement |
| 2. Similar Issue Detection | Search for related past issues | Related issues list |
| 3. Hypothesis Formation | Form hypotheses with confidence scores | Ranked hypotheses |
| 4. Root Cause Analysis | 5 parallel agents investigate | Confirmed root cause |
| 5. Fix Design | Design approach based on RCA | Fix specification |
| 6. Implementation | Apply fix with tests | Working code |
| 7. Validation | Verify fix resolves issue | Evidence |
| 8. Prevention | How to prevent recurrence | Prevention plan |
| 9. Runbook | Create/update runbook entry | Runbook |
| 10. Lessons Learned | Capture knowledge | Persisted learnings |
| 11. Commit and PR | Create PR with fix | Merged PR |

Full phase details: See references/fix-phases.md for bash commands, templates, and procedures for each phase.

Critical Constraints

  • Feature branch MANDATORY -- NEVER commit directly to main or dev
  • Regression test MANDATORY -- write failing test BEFORE implementing fix
  • Prevention required -- at least one of: automated test, validation rule, or process check
  • Make minimal, focused changes; DO NOT over-engineer

CC 2.1.49 Enhancements

See references/cc-enhancements.md for session resume, task metrics, tool guidance, worktree isolation, and adaptive thinking.

Rules Quick Reference

| Rule | Impact | What It Covers |
|------|--------|----------------|
| evidence-gathering | HIGH | User intent verification, confidence scale, key decisions |
| rca-five-whys | HIGH | 5 Whys iterative causal analysis |
| rca-fishbone | MEDIUM | Ishikawa diagram, multi-factor analysis |
| rca-fault-tree | MEDIUM | Fault tree analysis, AND/OR gates, critical systems |

Related Skills

  • ork:commit - Commit issue fixes
  • debug-investigator - Debug complex issues
  • ork:issue-progress-tracking - Auto-updates from commits
  • ork:remember - Store lessons learned

References


Version: 2.1.0 (February 2026)


Rules (4)

Evidence Gathering — HIGH

Evidence Gathering Patterns

Verify User Intent (STEP 0)

BEFORE creating tasks, clarify fix approach with AskUserQuestion:

AskUserQuestion(
  questions=[{
    "question": "What approach for this fix?",
    "header": "Approach",
    "options": [
      {"label": "Proper fix (Recommended)", "description": "Full RCA, tests, prevention recommendations"},
      {"label": "Quick fix", "description": "Minimal fix to resolve the immediate issue"},
      {"label": "Investigate first", "description": "Understand the issue before deciding on approach"},
      {"label": "Hotfix", "description": "Emergency patch, minimal testing"}
    ],
    "multiSelect": false
  }]
)

Based on answer, adjust workflow:

  • Proper fix: All 11 phases, parallel agents for RCA
  • Quick fix: Skip phases 8-10 (prevention, runbook, lessons)
  • Investigate first: Only phases 1-4 (understand, search, hypotheses, analyze)
  • Hotfix: Minimal phases, skip similar issue search

Hypothesis Confidence Scale

| Confidence | Meaning |
|------------|---------|
| 90-100% | Near certain |
| 70-89% | Highly likely |
| 50-69% | Probable |
| 30-49% | Possible |
| 0-29% | Unlikely |

Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Feature branch | MANDATORY | Never commit to main/dev directly |
| Regression test | MANDATORY | Fix without test is incomplete |
| Hypothesis confidence | 0-100% scale | Quantifies certainty |
| Similar issue search | Before hypothesis | Leverage past solutions |
| Prevention analysis | Mandatory phase | Break recurring issue cycle |
| Runbook generation | Template-based | Consistent documentation |

Map all failure paths with fault tree analysis to prevent recurring system failures — MEDIUM

Fault Tree Analysis (FTA)

Top-down, deductive analysis mapping all paths to a failure using boolean logic (AND/OR gates). Best for critical systems and exhaustive failure analysis.

FTA Symbols

| Symbol | Meaning |
|--------|---------|
| TOP | Top event — the failure being analyzed |
| AND | All inputs must occur for output |
| OR | Any input causes output |
| Basic Event | Root cause (leaf node) |
| Undeveloped | Needs further analysis |

Example: Authentication Failure

                USER CANNOT
                AUTHENTICATE
                     |
                   [OR]
        +------------+------------+
        |            |            |
    Invalid      Auth Service   Account
   Credentials     Down         Locked
        |            |
      [OR]         [OR]
    +---+---+    +---+---+
    |   |   |    |   |   |
   Wrong Expired Token DB  Redis External
   Pass  Token  Invalid Down Down  Auth

Building a Fault Tree

  1. Define top event — the failure to analyze
  2. Ask "what causes this?" — list immediate causes
  3. Classify as AND/OR — do ALL causes need to happen, or ANY one?
  4. Decompose each cause — repeat until reaching basic events
  5. Identify minimal cut sets — smallest combinations that cause failure
  6. Prioritize by probability — most likely paths first

Minimal Cut Sets

The smallest set of basic events that together cause the top event:

Top: User Cannot Authenticate (OR gate)
  Cut Set 1: {Wrong Password}         — single point of failure
  Cut Set 2: {Expired Token}          — single point of failure
  Cut Set 3: {DB Down}                — single point of failure
  Cut Set 4: {Account Locked}         — single point of failure

Single-event cut sets indicate no redundancy — add defense-in-depth.
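The cut-set procedure above can be sketched in Python. The tuple-based tree encoding (`("OR", [children])` / `("AND", [children])`, with strings as basic events) is illustrative, not an OrchestKit API:

```python
from itertools import product

def cut_sets(node):
    """Return the list of cut sets (sets of basic events) for a fault tree node.

    A node is either a basic-event string, or a tuple:
      ("OR",  [children]) — any child causes the output
      ("AND", [children]) — all children must occur
    """
    if isinstance(node, str):
        return [{node}]
    gate, children = node
    child_sets = [cut_sets(c) for c in children]
    if gate == "OR":
        # Any child's cut set is already a cut set of the parent.
        return [s for sets in child_sets for s in sets]
    # AND: pick one cut set per child and union them.
    return [set().union(*combo) for combo in product(*child_sets)]

def minimal_cut_sets(node):
    """Drop any cut set that strictly contains another (keep minimal ones)."""
    sets = cut_sets(node)
    return [s for s in sets if not any(o < s for o in sets)]

# The authentication example above: OR gates all the way down.
tree = ("OR", [
    ("OR", ["Wrong Password", "Expired Token", "Token Invalid"]),   # invalid credentials
    ("OR", ["DB Down", "Redis Down", "External Auth Down"]),        # auth service down
    "Account Locked",
])

mcs = minimal_cut_sets(tree)
# Every cut set is a single basic event → no redundancy anywhere.
assert all(len(s) == 1 for s in mcs)
```

An AND gate changes the picture: `minimal_cut_sets(("AND", ["A", ("OR", ["B", "C"])]))` yields two-event cut sets, i.e. no single failure brings the system down.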

When to Use FTA

| Scenario | Use FTA? |
|----------|----------|
| Safety-critical system failure | Yes |
| Need exhaustive failure path mapping | Yes |
| Complex multi-component failure | Yes |
| Simple linear bug | No — use 5 Whys |
| Multiple contributing factors | Maybe — Fishbone first |
| Regulatory compliance analysis | Yes |
| Post-incident for serious outages | Yes |

Incorrect — stopping at high-level causes without decomposition:

USER CANNOT AUTHENTICATE
         |
       [OR]
    +----+----+
    |         |
Auth Service  Account
   Down       Locked

Correct — decompose to basic events with AND/OR gates:

                USER CANNOT
                AUTHENTICATE
                     |
                   [OR]
        +------------+------------+
        |            |            |
    Invalid      Auth Service   Account
   Credentials     Down         Locked
        |            |
      [OR]         [OR]
    +---+---+    +---+---+
    |   |   |    |   |   |
   Wrong Expired Token DB  Redis External
   Pass  Token  Invalid Down Down  Auth

Minimal Cut Sets identified:
  {Wrong Password}, {Expired Token}, {DB Down}, {Account Locked}
  → All single-event cuts = no redundancy, needs defense-in-depth

Key Rules

  • Start from the top event (failure) and work downward
  • Every gate must be classified as AND (all required) or OR (any sufficient)
  • Decompose until reaching basic events (actionable root causes)
  • Identify minimal cut sets to find the most vulnerable paths
  • Single-event cut sets indicate missing redundancy
  • Use for critical systems where exhaustive analysis is justified

Analyze multi-factor problems with fishbone diagrams to avoid single-cause fixation — MEDIUM

Fishbone Diagram (Ishikawa)

Visualize multiple potential causes organized by category. Best for problems with several contributing factors.

Software-Specific Categories

                    +-------------+
          Code -----+             |
                    |             |
 Infrastructure ----+             +---- BUG/INCIDENT
                    |             |
   Dependencies ----+             |
                    |             |
   Configuration ---+             |
                    |             |
        Process ----+             |
                    |             |
        People -----+             |
                    +-------------+

Example: API Latency Spike

| Category | Potential Causes |
|----------|------------------|
| Code | N+1 query, missing index, sync blocking call |
| Infrastructure | DB connection pool exhausted, network saturation, insufficient RAM |
| Dependencies | External API slow, Redis timeout, CDN cache miss |
| Configuration | Wrong pool size, missing timeout, debug logging on |
| Process | No load testing, no perf regression CI |
| People | Unfamiliarity with query optimizer, missing review |

Fishbone Process

  1. Define the problem clearly (the fish head)
  2. Identify major categories (the bones) — use software categories above
  3. Brainstorm causes for each category
  4. Analyze relationships between causes across categories
  5. Prioritize most likely root causes by evidence
  6. Verify with data, metrics, or targeted testing
  7. Take action on confirmed causes
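The brainstorm-then-prioritize steps can be sketched as plain data, reusing the latency example. The cause list and `confirmed` flags below are hypothetical illustration, not OrchestKit output:

```python
# Causes brainstormed per software-specific category, tagged with whether
# evidence has confirmed them (hypothetical data for the latency example).
fishbone = {
    "Code":           [("N+1 query in user endpoint", True),
                       ("Sync blocking call to external API", False)],
    "Infrastructure": [("DB connection pool exhausted", True),
                       ("Network saturation", False)],
    "Dependencies":   [("Redis timeout increased", False)],
    "Configuration":  [("Connection pool size too small", True)],
    "Process":        [("No load testing in CI", True)],
    "People":         [],
}

# Prioritize: keep only evidence-confirmed causes, grouped by category.
confirmed = {cat: [c for c, ok in causes if ok]
             for cat, causes in fishbone.items()
             if any(ok for _, ok in causes)}
```

Cross-category interactions then become visible by reading `confirmed` side by side (here: Code + Configuration combine into pool exhaustion).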

When to Use Fishbone

| Scenario | Use Fishbone? |
|----------|---------------|
| Multiple things went wrong | Yes |
| Problem has one clear cause | No — use 5 Whys |
| Team brainstorming session | Yes |
| Safety-critical failure analysis | No — use Fault Tree |
| Recurring issue with no clear pattern | Yes |

Incorrect — jumping to one cause without category analysis:

### API Latency Spike Analysis

**Root Cause:** N+1 query in user endpoint
**Fix:** Add query optimization

Correct — fishbone analysis across all categories:

### API Latency Spike — Fishbone Analysis

**Code:**
- N+1 query in user endpoint (CONFIRMED via query log)
- Sync blocking call to external API

**Infrastructure:**
- DB connection pool exhausted (CONFIRMED: 0 available connections)
- Network saturation (ruled out: 20% utilization)

**Dependencies:**
- Redis timeout increased (ruled out: within SLA)

**Configuration:**
- Connection pool size too small (CONFIRMED: 10 max, need 50)

**Process:**
- No load testing in CI (process gap)

**Root Causes (cross-category):**
1. N+1 query (Code) + small pool (Config) = exhaustion
2. Missing load tests (Process) = undetected before prod

**Actions:**
- Fix N+1 query immediately
- Increase pool size 10 → 50
- Add load tests to CI

Key Rules

  • Use software-specific categories (Code, Infrastructure, Dependencies, Configuration, Process, People)
  • Brainstorm causes per category before analyzing relationships
  • Look for cross-category interactions (e.g., code + config)
  • Prioritize by evidence, not by assumption
  • Verify top candidates with data or experiments before committing to a fix

Apply the 5 Whys technique to reach root causes instead of fixing symptoms — HIGH

5 Whys Technique

Iteratively ask "why" to drill down from symptom to root cause. Simple, fast, and effective for linear causal chains.

Process

Problem Statement: [Clear description of the issue]
    |
    v
Why #1: [First level cause]
    |
    v
Why #2: [Deeper cause]
    |
    v
Why #3: [Even deeper]
    |
    v
Why #4: [Getting to root]
    |
    v
Why #5: [Root cause identified]
    |
    v
Action: [Fix that addresses root cause]
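A minimal sketch of the chain as data, enforcing the rule that every "why" carries evidence. The `Why` structure is illustrative, not an OrchestKit API:

```python
from dataclasses import dataclass

@dataclass
class Why:
    cause: str
    evidence: str  # logs, metrics, or code reference backing this step

def validate_chain(problem: str, chain: list[Why]) -> None:
    """Reject chains where any step lacks supporting evidence."""
    for i, step in enumerate(chain, 1):
        if not step.evidence.strip():
            raise ValueError(f"Why #{i} ({step.cause!r}) has no evidence")

chain = [
    Why("App server ran out of memory and crashed", "OOM error in logs"),
    Why("Memory leak in image processing service", "Memory +2GB/hour in metrics"),
    Why("Image buffers not released after processing", "Missing .dispose() calls"),
]
validate_chain("Website was down for 2 hours", chain)
root_cause = chain[-1].cause  # the fix targets this, not the first symptom
```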

Example: Production Outage

**Problem:** Website was down for 2 hours

**Why 1:** The application server ran out of memory and crashed.
**Why 2:** A memory leak in the image processing service accumulated over time.
**Why 3:** The service wasn't releasing image buffers after processing.
**Why 4:** The cleanup code had a bug introduced in last week's release.
**Why 5:** We don't have automated memory leak detection in our test suite.

**Root Cause:** Missing automated memory leak testing
**Action:** Add memory profiling to CI pipeline, add cleanup tests

Best Practices

| Do | Don't |
|----|-------|
| Base answers on evidence | Guess or assume |
| Stay focused on one causal chain | Branch too early |
| Keep asking until actionable | Stop at symptoms |
| Involve people closest to issue | Assign blame |
| Document your reasoning | Skip steps |

When 5 Whys Falls Short

  • Multiple contributing factors — use Fishbone diagram instead
  • Complex system interactions — use Fault Tree Analysis
  • Organizational/process issues — needs broader systemic analysis
  • Concurrent failures — 5 Whys assumes linear causation

Incorrect — stopping at symptom without root cause:

**Problem:** Website was down for 2 hours

**Why 1:** The application server crashed.
**Action:** Restart the server

Correct — drilling down to root cause with 5 Whys:

**Problem:** Website was down for 2 hours

**Why 1:** The application server ran out of memory and crashed.
  Evidence: Out-of-memory error in logs

**Why 2:** A memory leak in the image processing service accumulated over time.
  Evidence: Memory usage increased 2GB/hour in metrics

**Why 3:** The service wasn't releasing image buffers after processing.
  Evidence: Code review shows missing .dispose() calls

**Why 4:** The cleanup code had a bug introduced in last week's release.
  Evidence: Git blame + diff shows removal of cleanup in PR #234

**Why 5:** We don't have automated memory leak detection in our test suite.
  Evidence: No memory profiling in CI pipeline

**Root Cause:** Missing automated memory leak testing
**Actions:**
- Add memory profiling to CI pipeline
- Add cleanup tests for image processing
- Revert PR #234's cleanup removal

Key Rules

  • Always start with a clear, specific problem statement
  • Each "why" must be supported by evidence (logs, metrics, code)
  • Stop when you reach an actionable root cause (not always exactly 5)
  • The fix should address the root cause, not the symptom
  • Document the full chain for knowledge sharing

References (7)

Agent Selection

Agent Selection & Orchestration Mode

Orchestration Mode Selection

Choose Agent Teams (mesh -- RCA agents share hypotheses) or Task tool (star -- all report to lead):

  1. CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 -> Agent Teams mode
  2. Agent Teams unavailable -> Task tool mode (default)
  3. Otherwise: Complex cross-cutting bugs (backend + frontend + tests involved) -> recommend Agent Teams; Focused bugs (single domain) -> Task tool

| Aspect | Task Tool | Agent Teams |
|--------|-----------|-------------|
| Hypothesis sharing | Lead relays between agents | Investigators share hypotheses in real-time |
| Conflicting evidence | Lead resolves | Investigators debate directly |
| Cost | ~250K tokens | ~600K tokens |
| Best for | Single-domain bugs | Cross-cutting bugs with multiple hypotheses |

Fallback: If Agent Teams encounters issues, fall back to Task tool for remaining investigation.
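The three selection criteria can be sketched as a small function. The environment variable name comes from this reference; representing the bug's scope as a `domains` set is an assumption:

```python
import os

def select_mode(agent_teams_available: bool, domains: set[str]) -> str:
    """Pick an orchestration mode per the three criteria above."""
    if os.environ.get("CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS") == "1":
        return "agent-teams"      # explicit opt-in wins
    if not agent_teams_available:
        return "task-tool"        # default fallback
    # Cross-cutting bugs (multiple domains) benefit from mesh collaboration.
    return "agent-teams" if len(domains) > 1 else "task-tool"
```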

RCA Agent Roster (Phase 4)

Launch ALL 5 agents in parallel with run_in_background=True and max_turns=25:

| # | Agent | Role |
|---|-------|------|
| 1 | debug-investigator | Root cause tracing |
| 2 | debug-investigator | Impact analysis |
| 3 | backend-system-architect | Backend fix design |
| 4 | frontend-ui-developer | Frontend fix design |
| 5 | test-generator | Test requirements |

Each agent outputs structured JSON with findings and SUMMARY line.

Task Management (CC 2.1.16)

# Create main fix task
TaskCreate(
  subject="Fix issue #{number}",
  description="Systematic issue resolution with hypothesis-based RCA",
  activeForm="Fixing issue #{number}"
)

# Create subtasks for 11-phase process
phases = ["Understand issue", "Search similar issues", "Form hypotheses",
          "Analyze root cause", "Design fix", "Implement fix", "Validate fix",
          "Generate prevention", "Create runbook", "Capture lessons", "Commit and PR"]
for phase in phases:
    TaskCreate(subject=phase, activeForm=f"Working on: {phase}")

Agent Teams RCA

Agent Teams RCA Workflow

In Agent Teams mode, form an investigation team where RCA agents share hypotheses and evidence in real-time:

TeamCreate(team_name="fix-issue-{number}", description="RCA for issue #{number}")

Task(subagent_type="debug-investigator", name="root-cause-tracer",
     team_name="fix-issue-{number}",
     prompt="""Trace the root cause for issue #{number}: {issue description}
     Hypotheses: {hypothesis list from Phase 3}
     Test each hypothesis. When you find evidence supporting or refuting a hypothesis,
     message impact-analyst and the relevant domain expert (backend-expert or frontend-expert).
     If you find conflicting evidence, share it with ALL teammates for debate.""")

Task(subagent_type="debug-investigator", name="impact-analyst",
     team_name="fix-issue-{number}",
     prompt="""Analyze the impact and blast radius for issue #{number}.
     When root-cause-tracer shares evidence, assess how many code paths are affected.
     Message test-planner with affected paths so they can plan regression tests.
     If the impact is larger than expected, message the lead immediately.""")

Task(subagent_type="backend-system-architect", name="backend-expert",
     team_name="fix-issue-{number}",
     prompt="""Investigate backend aspects of issue #{number}.
     When root-cause-tracer shares backend-related hypotheses, design the fix approach.
     Message frontend-expert if the fix affects API contracts.
     Share fix design with test-planner for test requirements.""")

Task(subagent_type="frontend-ui-developer", name="frontend-expert",
     team_name="fix-issue-{number}",
     prompt="""Investigate frontend aspects of issue #{number}.
     When root-cause-tracer shares frontend-related hypotheses, design the fix approach.
     If backend-expert changes API contracts, adapt the frontend fix accordingly.
     Share component changes with test-planner.""")

Task(subagent_type="test-generator", name="test-planner",
     team_name="fix-issue-{number}",
     prompt="""Plan regression tests for issue #{number}.
     When root-cause-tracer confirms the root cause, write a failing test that reproduces it.
     When backend-expert or frontend-expert share fix designs, plan verification tests.
     Start with the regression test BEFORE the fix is applied (TDD approach).""")

Team teardown after fix is implemented and validated:

SendMessage(type="shutdown_request", recipient="root-cause-tracer", content="Fix validated")
SendMessage(type="shutdown_request", recipient="impact-analyst", content="Fix validated")
SendMessage(type="shutdown_request", recipient="backend-expert", content="Fix validated")
SendMessage(type="shutdown_request", recipient="frontend-expert", content="Fix validated")
SendMessage(type="shutdown_request", recipient="test-planner", content="Fix validated")
TeamDelete()

Fallback: If team formation fails, use standard Phase 4 Task spawns.

CC Enhancements

CC 2.1.27+ Enhancements for Fix Issue

Session Resume with PR Context

When you create a PR for the fix, the session is automatically linked:

# Later: Resume with full PR context
claude --from-pr 789

Task Metrics (CC 2.1.30)

Track RCA efficiency across the 5 parallel agents:

## Phase 4 Metrics (Root Cause Analysis)
| Agent | Tokens | Tools | Duration |
|-------|--------|-------|----------|
| debug-investigator #1 | 520 | 12 | 18s |
| debug-investigator #2 | 480 | 10 | 15s |
| backend-system-architect | 390 | 8 | 12s |

**Root cause found in:** 45s total

Tool Guidance (CC 2.1.31)

When investigating root cause:

| Task | Use | Avoid |
|------|-----|-------|
| Read logs/files | Read(file_path=...) | bash cat |
| Search for errors | Grep(pattern="ERROR") | bash grep |
| Find affected files | Glob(pattern="**/*.py") | bash find |
| Check git history | Bash git log/diff | n/a (git needs bash) |

Session Resume Hints (CC 2.1.31)

Before ending fix sessions, capture investigation context:

/ork:remember Issue #$ARGUMENTS RCA findings:
  Root cause: [one line]
  Confirmed by: [key evidence]
  Fix status: [implemented/pending]
  Prevention: [recommendation]

Resume later:

claude                              # Shows resume hint
/ork:memory search "issue $ARGUMENTS"  # Loads your findings

Fix Phases

Fix Issue: 11-Phase Workflow

Detailed procedures for each phase of the fix-issue workflow.


Phase 1: Understand the Issue

gh issue view $ARGUMENTS --json title,body,labels,assignees,comments
gh pr list --search "issue:$ARGUMENTS"
gh issue view $ARGUMENTS --comments

Start Work ceremony (from issue-progress-tracking): move issue to in-progress, comment on issue, ensure branch is named issue/N-description.


Phase 2: Similar Issue Detection

See Similar Issue Search for patterns.

gh issue list --search "[key error message]" --state all
mcp__memory__search_nodes(query="issue [error type] fix")

| Similar Issue | Similarity | Status | Relevant? |
|---------------|------------|--------|-----------|
| #101 | 85% | Closed | Yes |

Determine: Regression? Variant? New issue?


Phase 3: Hypothesis Formation

See Hypothesis-Based RCA for confidence scoring.

## Hypothesis 1: [Brief name]
**Confidence:** [0-100]%
**Description:** [What might cause the issue]
**Test:** [How to verify]

| Confidence | Meaning |
|------------|---------|
| 90-100% | Near certain |
| 70-89% | Highly likely |
| 50-69% | Probable |
| 30-49% | Possible |
| 0-29% | Unlikely |

Phase 4: Root Cause Analysis (5 Agents)

Launch ALL 5 agents in parallel with run_in_background=True and max_turns=25:

  1. debug-investigator: Root cause tracing
  2. debug-investigator: Impact analysis
  3. backend-system-architect: Backend fix design
  4. frontend-ui-developer: Frontend fix design
  5. test-generator: Test requirements

Each agent outputs structured JSON with findings and SUMMARY line.

Agent Teams Alternative

See agent-teams-rca.md for Agent Teams root cause analysis workflow.


Phase 5: Fix Design

## Fix Design for Issue #$ARGUMENTS

### Root Cause (Confirmed)
[Description]

### Proposed Fix
[Approach]

### Files to Modify
| File | Change | Reason |
|------|--------|--------|
| [file] | MODIFY | [why] |

### Risks
- [Risk 1]

### Rollback Plan
[How to revert]

Phase 6: Implementation

CRITICAL: Feature Branch Required

NEVER commit directly to main or dev. Always create a feature branch:

# Determine base branch
BASE_BRANCH=$(git remote show origin | grep 'HEAD branch' | cut -d: -f2 | tr -d ' ')

# Create feature branch (MANDATORY)
git checkout "$BASE_BRANCH" && git pull origin "$BASE_BRANCH"
git checkout -b issue/$ARGUMENTS-fix

CRITICAL: Regression Test Required

A fix without a test is incomplete. Add test BEFORE implementing fix:

# 1. Write test that reproduces the bug (should FAIL)
# 2. Implement the fix
# 3. Verify test now PASSES

Guidelines:

  • Make minimal, focused changes
  • Add proper error handling
  • Add regression test FIRST (MANDATORY)
  • DO NOT over-engineer
  • DO NOT commit directly to protected branches

Phase 7: Validation

# Backend
poetry run ruff format --check app/
poetry run pytest tests/unit/ -v --tb=short

# Frontend
npm run lint && npm run typecheck && npm run test

Phase 8: Prevention Recommendations

CRITICAL: Prevention must include at least one of:

  1. Automated test - CI catches similar issues (PREFERRED)
  2. Validation rule - Schema/lint rule prevents bad state
  3. Process check - Review checklist item

See Prevention Patterns for full template.

| Category | Examples | Effectiveness |
|----------|----------|---------------|
| Automated test | Unit/integration test in CI | HIGH - catches before merge |
| Validation rule | Schema check, lint rule | HIGH - catches on save/commit |
| Architecture | Better error boundaries | MEDIUM |
| Process | Review checklist item | LOW - human-dependent |

Phase 9: Runbook Generation

# Runbook: [Issue Type]

## Symptoms
- [Observable symptom]

## Diagnosis Steps
1. Check [X] by running: `[command]`

## Resolution Steps
1. [Step 1]

## Prevention
- [How to prevent]

Store in memory for future reference.


Phase 10: Lessons Learned

mcp__memory__create_entities(entities=[{
  "name": "lessons-issue-$ARGUMENTS",
  "entityType": "LessonsLearned",
  "observations": [
    "root_cause: [brief]",
    "key_learning: [most important]",
    "prevention: [recommendation]"
  ]
}])

Phase 11: Commit and PR

git add .
git commit -m "fix(#$ARGUMENTS): [Brief description]

Root cause: [one line]
Prevention: [recommendation]"

git push -u origin issue/$ARGUMENTS-fix
gh pr create --base dev --title "fix(#$ARGUMENTS): [description]"

Hypothesis Rca

Hypothesis-Based Root Cause Analysis

Scientific method for identifying root causes with quantified confidence.

The Scientific Method for RCA

1. Observe symptoms
2. Form hypotheses
3. Gather evidence
4. Test hypotheses
5. Confirm or reject
6. Repeat until root cause found

Hypothesis Template

## Hypothesis: [Brief name]
**Confidence:** [0-100]%

**Description:**
[What might be causing the issue]

**Evidence For:**
- [Supporting evidence 1]
- [Supporting evidence 2]

**Evidence Against:**
- [Contradicting evidence 1]

**Test Plan:**
1. [Step to verify/refute]
2. [Expected outcome if true]

Confidence Score Guidelines

| Score | Meaning | Evidence Required |
|-------|---------|-------------------|
| 90-100% | Near certain | Reproduction + multiple strong evidence |
| 70-89% | Highly likely | Clear evidence, logical chain |
| 50-69% | Probable | Some evidence, plausible mechanism |
| 30-49% | Possible | Limited evidence, needs investigation |
| 0-29% | Unlikely | Weak evidence, backup hypothesis |

Evidence Classification

| Type | Weight | Examples |
|------|--------|----------|
| Reproduction | +30% | Consistent reproduction steps |
| Code trace | +20% | Stack trace to specific line |
| Timing correlation | +15% | Issue appeared after deployment X |
| Log evidence | +15% | Error messages match hypothesis |
| Similar patterns | +10% | Same error in related code |
| User report | +5% | Consistent user descriptions |

Contradicting Evidence

| Evidence | Weight |
|----------|--------|
| Hypothesis disproven by test | -40% |
| Works in same conditions | -25% |
| Unrelated timing | -15% |
| No supporting logs | -10% |
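A rough sketch of how the positive and negative weights combine into an updated confidence score, clamped to the 0-100% scale. The dictionary key names are assumptions; the weights come from the two tables above:

```python
# Weights from the tables above: supporting evidence adds, contradicting subtracts.
EVIDENCE_WEIGHTS = {
    "reproduction": 30, "code_trace": 20, "timing_correlation": 15,
    "log_evidence": 15, "similar_patterns": 10, "user_report": 5,
    "disproven_by_test": -40, "works_in_same_conditions": -25,
    "unrelated_timing": -15, "no_supporting_logs": -10,
}

def update_confidence(prior: int, evidence: list[str]) -> int:
    """Apply evidence weights to a prior confidence, clamped to 0-100."""
    score = prior + sum(EVIDENCE_WEIGHTS[e] for e in evidence)
    return max(0, min(100, score))

# Race-condition hypothesis: starts at 65%, gains a code trace → 85%.
assert update_confidence(65, ["code_trace"]) == 85
```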

Multiple Hypothesis Comparison

| Hypothesis | Initial | After Test | Status |
|------------|---------|------------|--------|
| Race condition | 65% | 85% | INVESTIGATING |
| Null reference | 40% | 15% | REJECTED |
| Cache stale | 30% | 30% | ON HOLD |

Best Practices

  1. Start with 3+ hypotheses - Avoid tunnel vision
  2. Test highest confidence first - Efficient investigation
  3. Update scores after each test - Track progress
  4. Document rejected hypotheses - Prevent repeated investigation
  5. Look for evidence against - Avoid confirmation bias

Prevention Patterns

Prevention Patterns

Strategies to prevent issue recurrence by category.

Code-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Null/undefined | Optional chaining, nullish coalescing |
| Type errors | Strict TypeScript, runtime validation |
| Input validation | Zod schemas at boundaries |
| Error handling | Result types, explicit error states |
| Race conditions | Locks, atomic operations, idempotency |
| Memory leaks | Cleanup in useEffect, WeakRef |

// Before: Vulnerable
const name = user.profile.name;

// After: Defensive
const name = user?.profile?.name ?? 'Unknown';

Architecture-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Cascading failures | Circuit breakers |
| Network instability | Retry with backoff |
| Data inconsistency | Transactions, saga pattern |
| Timeout issues | Request deadlines, cancellation |
| Resource exhaustion | Rate limiting, pooling |

# Circuit breaker example
@circuit_breaker(failure_threshold=5, recovery_timeout=30)
async def external_api_call():
    ...
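A minimal sketch of such a decorator for synchronous calls, assuming a simple count-and-timeout policy. This is illustration only; production code would normally use an established resilience library rather than hand-rolling the state machine:

```python
import functools
import time

def circuit_breaker(failure_threshold=5, recovery_timeout=30):
    """Open the circuit after N consecutive failures; retry after a timeout."""
    def decorator(func):
        state = {"failures": 0, "opened_at": None}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if state["opened_at"] is not None:
                if time.monotonic() - state["opened_at"] < recovery_timeout:
                    raise RuntimeError("circuit open: call rejected")
                state["opened_at"] = None  # half-open: allow one trial call
            try:
                result = func(*args, **kwargs)
            except Exception:
                state["failures"] += 1
                if state["failures"] >= failure_threshold:
                    state["opened_at"] = time.monotonic()  # trip the breaker
                raise
            state["failures"] = 0  # success resets the counter
            return result
        return wrapper
    return decorator
```

While open, calls fail fast with `RuntimeError` instead of hammering the failing dependency; after `recovery_timeout` one trial call is let through.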

Process-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Logic errors | Mandatory PR review |
| Missing tests | Coverage requirements (>80%) |
| Regression | Required regression test before fix |
| Knowledge gaps | ADR for decisions |
| Onboarding issues | Runbook documentation |

Tooling-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Style issues | ESLint/Ruff rules |
| Type errors | Pre-commit type check |
| Security vulnerabilities | Dependency scanning in CI |
| Format inconsistency | Auto-format on save |
| Secrets in code | Pre-commit secret detection |

# .pre-commit-config.yaml
- repo: local
  hooks:
    - id: type-check
      name: TypeScript check
      entry: npx tsc --noEmit
      language: system

Prevention Priority Matrix

| Effort | Impact | Priority |
|--------|--------|----------|
| Low | High | Immediate |
| Low | Low | Backlog |
| High | High | Sprint planning |
| High | Low | Skip |

Similar Issue Search

Find related past issues to leverage previous solutions and detect regressions.

GitHub Issue Search Patterns

# Search by error message
gh issue list --search "TypeError: Cannot read property" --state all

# Search by component/file
gh issue list --search "UserService" --state all --json number,title,state

# Search by label
gh issue list --label "bug" --state closed --limit 20

# Combined search
gh issue list --search "auth login 401" --state all --json number,title,closedAt

Memory/Knowledge Graph Queries

# Search for past fixes
mcp__memory__search_nodes(query="fix authentication error")

# Search by error type
mcp__memory__search_nodes(query="TypeError resolution")

# Search by component
mcp__memory__search_nodes(query="UserService bug")

Stack Trace Similarity Matching

Match by:

  1. Exception type - Same error class
  2. File/line - Same code location
  3. Call stack depth - Similar execution path
  4. Error message pattern - Regex match on message

Similarity Assessment Criteria

| Factor | Weight | High Match |
|--------|--------|------------|
| Same exception type | 30% | Exact match |
| Same file | 25% | Same file involved |
| Similar error message | 20% | >80% string similarity |
| Same component | 15% | Same service/module |
| Recent (< 30 days) | 10% | Recently resolved |
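The weights can be folded into a single similarity score in [0, 1]. The factor key names are assumptions; each per-factor match score (0.0-1.0) is supplied by the caller, e.g. a string-similarity ratio for the error message:

```python
# Weights from the assessment table above (sum to 1.0).
SIMILARITY_WEIGHTS = {
    "exception_type": 0.30, "same_file": 0.25, "message_similarity": 0.20,
    "same_component": 0.15, "recent": 0.10,
}

def similarity(factors: dict[str, float]) -> float:
    """Weighted similarity in [0, 1]; each factor score is 0.0-1.0."""
    return sum(SIMILARITY_WEIGHTS[k] * v for k, v in factors.items())

score = similarity({"exception_type": 1.0, "same_file": 1.0,
                    "message_similarity": 0.9, "same_component": 1.0,
                    "recent": 0.0})
# 0.30 + 0.25 + 0.18 + 0.15 + 0.0 = 0.88 → above the 80% reuse threshold
```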

When to Reuse vs Investigate Fresh

Reuse Previous Solution When:

  • Similarity > 80%
  • Same root cause confirmed
  • Fix is still applicable
  • No code changes since fix

Investigate Fresh When:

  • Similarity < 60%
  • Context has changed significantly
  • Previous fix may be incomplete
  • New dependencies involved

Issue Classification

| Type | Action |
|------|--------|
| Regression | Same issue, fix reverted or bypassed |
| Variant | Similar pattern, different trigger |
| New | No similar issues found |
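Combining the reuse thresholds with this classification, a rough triage sketch. The 0.8/0.6 cutoffs map to the 80%/60% figures above; the `fix_reverted` flag and the exact mapping of high-similarity matches to regression vs. variant are assumptions:

```python
def classify_issue(best_similarity: float, fix_reverted: bool = False) -> str:
    """Rough triage into regression / variant / new, per the thresholds above."""
    if best_similarity > 0.8:
        # Same issue resurfacing: a regression if the old fix no longer holds.
        return "regression" if fix_reverted else "variant"
    if best_similarity >= 0.6:
        return "variant"   # similar pattern, different trigger
    return "new"           # below 60%: investigate fresh
```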

Checklists (1)

Fix Complete Checklist

Fix Complete Checklist

Verify all aspects of issue resolution before closing.

Root Cause Analysis

  • Root cause identified with confidence >= 70%
  • Hypotheses documented (at least 2 considered)
  • Evidence for/against documented
  • Similar issues checked

Fix Verification

  • Regression test added
  • All existing tests pass
  • Fix manually verified
  • Edge cases covered

Prevention

  • Prevention recommendation documented
  • At least one prevention measure implemented or ticketed
  • Runbook entry created/updated

Knowledge Capture

  • Lessons learned stored in memory
  • RCA report generated (for high/critical issues)
  • Related issues linked

PR/Commit

  • Commit message includes issue number
  • Commit message describes root cause
  • PR links to issue with "Fixes #N"

Final Verification

# Quick verification commands
git log -1 --oneline  # Check commit message
gh pr checks          # Check CI status
gh issue view [N]     # Verify issue linked