OrchestKit v6.7.1 — 67 skills, 38 agents, 77 hooks with Opus 4.6 support

Fix Issue

Fixes GitHub issues with parallel analysis. Use to debug errors, resolve regressions, fix bugs, or triage issues.

Command medium

Fix Issue

Systematic issue resolution with hypothesis-based root cause analysis, similar issue detection, and prevention recommendations.

Quick Start

/ork:fix-issue 123
/ork:fix-issue 456

Opus 4.6: Root cause analysis uses native adaptive thinking. Dynamic token budgets scale with context window for thorough investigation.

STEP 0: Verify User Intent

BEFORE creating tasks, clarify fix approach using AskUserQuestion. See rules/evidence-gathering.md for the full prompt template and workflow adjustments per approach (Proper fix, Quick fix, Investigate first, Hotfix).

STEP 0b: Select Orchestration Mode

Choose Agent Teams (mesh) or Task tool (star). See references/agent-selection.md for the selection criteria, cost comparison, and task creation patterns.

Workflow Overview

| Phase | Activities | Output |
|-------|-----------|--------|
| 1. Understand Issue | Read GitHub issue details | Problem statement |
| 2. Similar Issue Detection | Search for related past issues | Related issues list |
| 3. Hypothesis Formation | Form hypotheses with confidence scores | Ranked hypotheses |
| 4. Root Cause Analysis | 5 parallel agents investigate | Confirmed root cause |
| 5. Fix Design | Design approach based on RCA | Fix specification |
| 6. Implementation | Apply fix with tests | Working code |
| 7. Validation | Verify fix resolves issue | Evidence |
| 8. Prevention | How to prevent recurrence | Prevention plan |
| 9. Runbook | Create/update runbook entry | Runbook |
| 10. Lessons Learned | Capture knowledge | Persisted learnings |
| 11. Commit and PR | Create PR with fix | Merged PR |

Full phase details: See references/fix-phases.md for bash commands, templates, and procedures for each phase.

Critical Constraints

  • Feature branch MANDATORY -- NEVER commit directly to main or dev
  • Regression test MANDATORY -- write failing test BEFORE implementing fix
  • Prevention required -- at least one of: automated test, validation rule, or process check
  • Make minimal, focused changes; DO NOT over-engineer

CC 2.1.49 Enhancements

See references/cc-enhancements.md for session resume, task metrics, tool guidance, worktree isolation, and adaptive thinking.

Rules Quick Reference

| Rule | Impact | What It Covers |
|------|--------|----------------|
| evidence-gathering | HIGH | User intent verification, confidence scale, key decisions |
| rca-five-whys | HIGH | 5 Whys iterative causal analysis |
| rca-fishbone | MEDIUM | Ishikawa diagram, multi-factor analysis |
| rca-fault-tree | MEDIUM | Fault tree analysis, AND/OR gates, critical systems |

Related Skills

  • ork:commit - Commit issue fixes
  • debug-investigator - Debug complex issues
  • ork:issue-progress-tracking - Auto-updates from commits
  • ork:remember - Store lessons learned

References


Version: 2.1.0 (February 2026)


Rules (4)

Evidence Gathering — HIGH

Evidence Gathering Patterns

Verify User Intent (STEP 0)

BEFORE creating tasks, clarify fix approach with AskUserQuestion:

AskUserQuestion(
  questions=[{
    "question": "What approach for this fix?",
    "header": "Approach",
    "options": [
      {"label": "Proper fix (Recommended)", "description": "Full RCA, tests, prevention recommendations"},
      {"label": "Quick fix", "description": "Minimal fix to resolve the immediate issue"},
      {"label": "Investigate first", "description": "Understand the issue before deciding on approach"},
      {"label": "Hotfix", "description": "Emergency patch, minimal testing"}
    ],
    "multiSelect": false
  }]
)

Based on answer, adjust workflow:

  • Proper fix: All 11 phases, parallel agents for RCA
  • Quick fix: Skip phases 8-10 (prevention, runbook, lessons)
  • Investigate first: Only phases 1-4 (understand, search, hypotheses, analyze)
  • Hotfix: Minimal phases, skip similar issue search

Hypothesis Confidence Scale

| Confidence | Meaning |
|------------|---------|
| 90-100% | Near certain |
| 70-89% | Highly likely |
| 50-69% | Probable |
| 30-49% | Possible |
| 0-29% | Unlikely |

Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Feature branch | MANDATORY | Never commit to main/dev directly |
| Regression test | MANDATORY | Fix without test is incomplete |
| Hypothesis confidence | 0-100% scale | Quantifies certainty |
| Similar issue search | Before hypothesis | Leverage past solutions |
| Prevention analysis | Mandatory phase | Break recurring issue cycle |
| Runbook generation | Template-based | Consistent documentation |

Map all failure paths with fault tree analysis to prevent recurring system failures — MEDIUM

Fault Tree Analysis (FTA)

Top-down, deductive analysis mapping all paths to a failure using boolean logic (AND/OR gates). Best for critical systems and exhaustive failure analysis.

FTA Symbols

| Symbol | Meaning |
|--------|---------|
| TOP | Top event — the failure being analyzed |
| AND | All inputs must occur for output |
| OR | Any input causes output |
| Basic Event | Root cause (leaf node) |
| Undeveloped | Needs further analysis |

Example: Authentication Failure

                USER CANNOT
                AUTHENTICATE
                     |
                   [OR]
        +------------+------------+
        |            |            |
    Invalid      Auth Service   Account
   Credentials     Down         Locked
        |            |
      [OR]         [OR]
    +---+---+    +---+---+
    |   |   |    |   |   |
   Wrong Expired Token DB  Redis External
   Pass  Token  Invalid Down Down  Auth

Building a Fault Tree

  1. Define top event — the failure to analyze
  2. Ask "what causes this?" — list immediate causes
  3. Classify as AND/OR — do ALL causes need to happen, or ANY one?
  4. Decompose each cause — repeat until reaching basic events
  5. Identify minimal cut sets — smallest combinations that cause failure
  6. Prioritize by probability — most likely paths first

Minimal Cut Sets

The smallest set of basic events that together cause the top event:

Top: User Cannot Authenticate (OR gate)
  Cut Set 1: {Wrong Password}         — single point of failure
  Cut Set 2: {Expired Token}          — single point of failure
  Cut Set 3: {DB Down}                — single point of failure
  Cut Set 4: {Account Locked}         — single point of failure

Single-event cut sets indicate no redundancy — add defense-in-depth.
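The cut-set procedure above can be sketched in Python. The tuple-based tree encoding (`("OR", [children])` / `("AND", [children])`, with strings as basic events) is illustrative, not an OrchestKit API:

```python
from itertools import product

def cut_sets(node):
    """Return the list of cut sets (sets of basic events) for a fault tree node.

    A node is either a basic-event string, or a tuple:
      ("OR",  [children]) — any child causes the output
      ("AND", [children]) — all children must occur
    """
    if isinstance(node, str):
        return [{node}]
    gate, children = node
    child_sets = [cut_sets(c) for c in children]
    if gate == "OR":
        # Any child's cut set is already a cut set of the parent.
        return [s for sets in child_sets for s in sets]
    # AND: pick one cut set per child and union them.
    return [set().union(*combo) for combo in product(*child_sets)]

def minimal_cut_sets(node):
    """Drop any cut set that strictly contains another (keep minimal ones)."""
    sets = cut_sets(node)
    return [s for s in sets if not any(o < s for o in sets)]

# The authentication example above: OR gates all the way down.
tree = ("OR", [
    ("OR", ["Wrong Password", "Expired Token", "Token Invalid"]),   # invalid credentials
    ("OR", ["DB Down", "Redis Down", "External Auth Down"]),        # auth service down
    "Account Locked",
])

mcs = minimal_cut_sets(tree)
# Every cut set is a single basic event → no redundancy anywhere.
assert all(len(s) == 1 for s in mcs)
```

An AND gate changes the picture: `minimal_cut_sets(("AND", ["A", ("OR", ["B", "C"])]))` yields two-event cut sets, i.e. no single failure brings the system down.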

When to Use FTA

| Scenario | Use FTA? |
|----------|----------|
| Safety-critical system failure | Yes |
| Need exhaustive failure path mapping | Yes |
| Complex multi-component failure | Yes |
| Simple linear bug | No — use 5 Whys |
| Multiple contributing factors | Maybe — Fishbone first |
| Regulatory compliance analysis | Yes |
| Post-incident for serious outages | Yes |

Incorrect — stopping at high-level causes without decomposition:

USER CANNOT AUTHENTICATE
         |
       [OR]
    +----+----+
    |         |
Auth Service  Account
   Down       Locked

Correct — decompose to basic events with AND/OR gates:

                USER CANNOT
                AUTHENTICATE
                     |
                   [OR]
        +------------+------------+
        |            |            |
    Invalid      Auth Service   Account
   Credentials     Down         Locked
        |            |
      [OR]         [OR]
    +---+---+    +---+---+
    |   |   |    |   |   |
   Wrong Expired Token DB  Redis External
   Pass  Token  Invalid Down Down  Auth

Minimal Cut Sets identified:
  {Wrong Password}, {Expired Token}, {DB Down}, {Account Locked}
  → All single-event cuts = no redundancy, needs defense-in-depth

Key Rules

  • Start from the top event (failure) and work downward
  • Every gate must be classified as AND (all required) or OR (any sufficient)
  • Decompose until reaching basic events (actionable root causes)
  • Identify minimal cut sets to find the most vulnerable paths
  • Single-event cut sets indicate missing redundancy
  • Use for critical systems where exhaustive analysis is justified

Analyze multi-factor problems with fishbone diagrams to avoid single-cause fixation — MEDIUM

Fishbone Diagram (Ishikawa)

Visualize multiple potential causes organized by category. Best for problems with several contributing factors.

Software-Specific Categories

                    +-------------+
          Code -----+             |
                    |             |
 Infrastructure ----+             +---- BUG/INCIDENT
                    |             |
   Dependencies ----+             |
                    |             |
   Configuration ---+             |
                    |             |
        Process ----+             |
                    |             |
        People -----+             |
                    +-------------+

Example: API Latency Spike

| Category | Potential Causes |
|----------|------------------|
| Code | N+1 query, missing index, sync blocking call |
| Infrastructure | DB connection pool exhausted, network saturation, insufficient RAM |
| Dependencies | External API slow, Redis timeout, CDN cache miss |
| Configuration | Wrong pool size, missing timeout, debug logging on |
| Process | No load testing, no perf regression CI |
| People | Unfamiliarity with query optimizer, missing review |

Fishbone Process

  1. Define the problem clearly (the fish head)
  2. Identify major categories (the bones) — use software categories above
  3. Brainstorm causes for each category
  4. Analyze relationships between causes across categories
  5. Prioritize most likely root causes by evidence
  6. Verify with data, metrics, or targeted testing
  7. Take action on confirmed causes
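The brainstorm-then-prioritize steps can be sketched as plain data, reusing the latency example. The cause list and `confirmed` flags below are hypothetical illustration, not OrchestKit output:

```python
# Causes brainstormed per software-specific category, tagged with whether
# evidence has confirmed them (hypothetical data for the latency example).
fishbone = {
    "Code":           [("N+1 query in user endpoint", True),
                       ("Sync blocking call to external API", False)],
    "Infrastructure": [("DB connection pool exhausted", True),
                       ("Network saturation", False)],
    "Dependencies":   [("Redis timeout increased", False)],
    "Configuration":  [("Connection pool size too small", True)],
    "Process":        [("No load testing in CI", True)],
    "People":         [],
}

# Prioritize: keep only evidence-confirmed causes, grouped by category.
confirmed = {cat: [c for c, ok in causes if ok]
             for cat, causes in fishbone.items()
             if any(ok for _, ok in causes)}
```

Cross-category interactions then become visible by reading `confirmed` side by side (here: Code + Configuration combine into pool exhaustion).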

When to Use Fishbone

| Scenario | Use Fishbone? |
|----------|---------------|
| Multiple things went wrong | Yes |
| Problem has one clear cause | No — use 5 Whys |
| Team brainstorming session | Yes |
| Safety-critical failure analysis | No — use Fault Tree |
| Recurring issue with no clear pattern | Yes |

Incorrect — jumping to one cause without category analysis:

### API Latency Spike Analysis

**Root Cause:** N+1 query in user endpoint
**Fix:** Add query optimization

Correct — fishbone analysis across all categories:

### API Latency Spike — Fishbone Analysis

**Code:**
- N+1 query in user endpoint (CONFIRMED via query log)
- Sync blocking call to external API

**Infrastructure:**
- DB connection pool exhausted (CONFIRMED: 0 available connections)
- Network saturation (ruled out: 20% utilization)

**Dependencies:**
- Redis timeout increased (ruled out: within SLA)

**Configuration:**
- Connection pool size too small (CONFIRMED: 10 max, need 50)

**Process:**
- No load testing in CI (process gap)

**Root Causes (cross-category):**
1. N+1 query (Code) + small pool (Config) = exhaustion
2. Missing load tests (Process) = undetected before prod

**Actions:**
- Fix N+1 query immediately
- Increase pool size 10 → 50
- Add load tests to CI

Key Rules

  • Use software-specific categories (Code, Infrastructure, Dependencies, Configuration, Process, People)
  • Brainstorm causes per category before analyzing relationships
  • Look for cross-category interactions (e.g., code + config)
  • Prioritize by evidence, not by assumption
  • Verify top candidates with data or experiments before committing to a fix

Apply the 5 Whys technique to reach root causes instead of fixing symptoms — HIGH

5 Whys Technique

Iteratively ask "why" to drill down from symptom to root cause. Simple, fast, and effective for linear causal chains.

Process

Problem Statement: [Clear description of the issue]
    |
    v
Why #1: [First level cause]
    |
    v
Why #2: [Deeper cause]
    |
    v
Why #3: [Even deeper]
    |
    v
Why #4: [Getting to root]
    |
    v
Why #5: [Root cause identified]
    |
    v
Action: [Fix that addresses root cause]
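A minimal sketch of the chain as data, enforcing the rule that every "why" carries evidence. The `Why` structure is illustrative, not an OrchestKit API:

```python
from dataclasses import dataclass

@dataclass
class Why:
    cause: str
    evidence: str  # logs, metrics, or code reference backing this step

def validate_chain(problem: str, chain: list[Why]) -> None:
    """Reject chains where any step lacks supporting evidence."""
    for i, step in enumerate(chain, 1):
        if not step.evidence.strip():
            raise ValueError(f"Why #{i} ({step.cause!r}) has no evidence")

chain = [
    Why("App server ran out of memory and crashed", "OOM error in logs"),
    Why("Memory leak in image processing service", "Memory +2GB/hour in metrics"),
    Why("Image buffers not released after processing", "Missing .dispose() calls"),
]
validate_chain("Website was down for 2 hours", chain)
root_cause = chain[-1].cause  # the fix targets this, not the first symptom
```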

Example: Production Outage

**Problem:** Website was down for 2 hours

**Why 1:** The application server ran out of memory and crashed.
**Why 2:** A memory leak in the image processing service accumulated over time.
**Why 3:** The service wasn't releasing image buffers after processing.
**Why 4:** The cleanup code had a bug introduced in last week's release.
**Why 5:** We don't have automated memory leak detection in our test suite.

**Root Cause:** Missing automated memory leak testing
**Action:** Add memory profiling to CI pipeline, add cleanup tests

Best Practices

| Do | Don't |
|----|-------|
| Base answers on evidence | Guess or assume |
| Stay focused on one causal chain | Branch too early |
| Keep asking until actionable | Stop at symptoms |
| Involve people closest to issue | Assign blame |
| Document your reasoning | Skip steps |

When 5 Whys Falls Short

  • Multiple contributing factors — use Fishbone diagram instead
  • Complex system interactions — use Fault Tree Analysis
  • Organizational/process issues — needs broader systemic analysis
  • Concurrent failures — 5 Whys assumes linear causation

Incorrect — stopping at symptom without root cause:

**Problem:** Website was down for 2 hours

**Why 1:** The application server crashed.
**Action:** Restart the server

Correct — drilling down to root cause with 5 Whys:

**Problem:** Website was down for 2 hours

**Why 1:** The application server ran out of memory and crashed.
  Evidence: Out-of-memory error in logs

**Why 2:** A memory leak in the image processing service accumulated over time.
  Evidence: Memory usage increased 2GB/hour in metrics

**Why 3:** The service wasn't releasing image buffers after processing.
  Evidence: Code review shows missing .dispose() calls

**Why 4:** The cleanup code had a bug introduced in last week's release.
  Evidence: Git blame + diff shows removal of cleanup in PR #234

**Why 5:** We don't have automated memory leak detection in our test suite.
  Evidence: No memory profiling in CI pipeline

**Root Cause:** Missing automated memory leak testing
**Actions:**
- Add memory profiling to CI pipeline
- Add cleanup tests for image processing
- Revert PR #234's cleanup removal

Key Rules

  • Always start with a clear, specific problem statement
  • Each "why" must be supported by evidence (logs, metrics, code)
  • Stop when you reach an actionable root cause (not always exactly 5)
  • The fix should address the root cause, not the symptom
  • Document the full chain for knowledge sharing

References (7)

Agent Selection

Agent Selection & Orchestration Mode

Orchestration Mode Selection

Choose Agent Teams (mesh -- RCA agents share hypotheses) or Task tool (star -- all report to lead):

  1. CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 -> Agent Teams mode
  2. Agent Teams unavailable -> Task tool mode (default)
  3. Otherwise: Complex cross-cutting bugs (backend + frontend + tests involved) -> recommend Agent Teams; Focused bugs (single domain) -> Task tool

| Aspect | Task Tool | Agent Teams |
|--------|-----------|-------------|
| Hypothesis sharing | Lead relays between agents | Investigators share hypotheses in real-time |
| Conflicting evidence | Lead resolves | Investigators debate directly |
| Cost | ~250K tokens | ~600K tokens |
| Best for | Single-domain bugs | Cross-cutting bugs with multiple hypotheses |

Fallback: If Agent Teams encounters issues, fall back to Task tool for remaining investigation.
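The three selection criteria can be sketched as a small function. The environment variable name comes from this reference; representing the bug's scope as a `domains` set is an assumption:

```python
import os

def select_mode(agent_teams_available: bool, domains: set[str]) -> str:
    """Pick an orchestration mode per the three criteria above."""
    if os.environ.get("CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS") == "1":
        return "agent-teams"      # explicit opt-in wins
    if not agent_teams_available:
        return "task-tool"        # default fallback
    # Cross-cutting bugs (multiple domains) benefit from mesh collaboration.
    return "agent-teams" if len(domains) > 1 else "task-tool"
```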

RCA Agent Roster (Phase 4)

Launch ALL 5 agents in parallel with run_in_background=True and max_turns=25:

| # | Agent | Role |
|---|-------|------|
| 1 | debug-investigator | Root cause tracing |
| 2 | debug-investigator | Impact analysis |
| 3 | backend-system-architect | Backend fix design |
| 4 | frontend-ui-developer | Frontend fix design |
| 5 | test-generator | Test requirements |

Each agent outputs structured JSON with findings and SUMMARY line.

Task Management (CC 2.1.16)

# Create main fix task
TaskCreate(
  subject="Fix issue #{number}",
  description="Systematic issue resolution with hypothesis-based RCA",
  activeForm="Fixing issue #{number}"
)

# Create subtasks for 11-phase process
phases = ["Understand issue", "Search similar issues", "Form hypotheses",
          "Analyze root cause", "Design fix", "Implement fix", "Validate fix",
          "Generate prevention", "Create runbook", "Capture lessons", "Commit and PR"]
for phase in phases:
    TaskCreate(subject=phase, activeForm=f"Working on: {phase}")

Agent Teams RCA

Agent Teams RCA Workflow

In Agent Teams mode, form an investigation team where RCA agents share hypotheses and evidence in real-time:

TeamCreate(team_name="fix-issue-{number}", description="RCA for issue #{number}")

Task(subagent_type="debug-investigator", name="root-cause-tracer",
     team_name="fix-issue-{number}",
     prompt="""Trace the root cause for issue #{number}: {issue description}
     Hypotheses: {hypothesis list from Phase 3}
     Test each hypothesis. When you find evidence supporting or refuting a hypothesis,
     message impact-analyst and the relevant domain expert (backend-expert or frontend-expert).
     If you find conflicting evidence, share it with ALL teammates for debate.""")

Task(subagent_type="debug-investigator", name="impact-analyst",
     team_name="fix-issue-{number}",
     prompt="""Analyze the impact and blast radius for issue #{number}.
     When root-cause-tracer shares evidence, assess how many code paths are affected.
     Message test-planner with affected paths so they can plan regression tests.
     If the impact is larger than expected, message the lead immediately.""")

Task(subagent_type="backend-system-architect", name="backend-expert",
     team_name="fix-issue-{number}",
     prompt="""Investigate backend aspects of issue #{number}.
     When root-cause-tracer shares backend-related hypotheses, design the fix approach.
     Message frontend-expert if the fix affects API contracts.
     Share fix design with test-planner for test requirements.""")

Task(subagent_type="frontend-ui-developer", name="frontend-expert",
     team_name="fix-issue-{number}",
     prompt="""Investigate frontend aspects of issue #{number}.
     When root-cause-tracer shares frontend-related hypotheses, design the fix approach.
     If backend-expert changes API contracts, adapt the frontend fix accordingly.
     Share component changes with test-planner.""")

Task(subagent_type="test-generator", name="test-planner",
     team_name="fix-issue-{number}",
     prompt="""Plan regression tests for issue #{number}.
     When root-cause-tracer confirms the root cause, write a failing test that reproduces it.
     When backend-expert or frontend-expert share fix designs, plan verification tests.
     Start with the regression test BEFORE the fix is applied (TDD approach).""")

Team teardown after fix is implemented and validated:

SendMessage(type="shutdown_request", recipient="root-cause-tracer", content="Fix validated")
SendMessage(type="shutdown_request", recipient="impact-analyst", content="Fix validated")
SendMessage(type="shutdown_request", recipient="backend-expert", content="Fix validated")
SendMessage(type="shutdown_request", recipient="frontend-expert", content="Fix validated")
SendMessage(type="shutdown_request", recipient="test-planner", content="Fix validated")
TeamDelete()

Fallback: If team formation fails, use standard Phase 4 Task spawns.

CC Enhancements

CC 2.1.27+ Enhancements for Fix Issue

Session Resume with PR Context

When you create a PR for the fix, the session is automatically linked:

# Later: Resume with full PR context
claude --from-pr 789

Task Metrics (CC 2.1.30)

Track RCA efficiency across the 5 parallel agents:

## Phase 4 Metrics (Root Cause Analysis)
| Agent | Tokens | Tools | Duration |
|-------|--------|-------|----------|
| debug-investigator #1 | 520 | 12 | 18s |
| debug-investigator #2 | 480 | 10 | 15s |
| backend-system-architect | 390 | 8 | 12s |

**Root cause found in:** 45s total

Tool Guidance (CC 2.1.31)

When investigating root cause:

| Task | Use | Avoid |
|------|-----|-------|
| Read logs/files | Read(file_path=...) | bash cat |
| Search for errors | Grep(pattern="ERROR") | bash grep |
| Find affected files | Glob(pattern="**/*.py") | bash find |
| Check git history | Bash git log/diff | n/a (git needs bash) |

Session Resume Hints (CC 2.1.31)

Before ending fix sessions, capture investigation context:

/ork:remember Issue #$ARGUMENTS RCA findings:
  Root cause: [one line]
  Confirmed by: [key evidence]
  Fix status: [implemented/pending]
  Prevention: [recommendation]

Resume later:

claude                              # Shows resume hint
/ork:memory search "issue $ARGUMENTS"  # Loads your findings

Fix Phases

Fix Issue: 11-Phase Workflow

Detailed procedures for each phase of the fix-issue workflow.


Phase 1: Understand the Issue

gh issue view $ARGUMENTS --json title,body,labels,assignees,comments
gh pr list --search "issue:$ARGUMENTS"
gh issue view $ARGUMENTS --comments

Start Work ceremony (from issue-progress-tracking): move issue to in-progress, comment on issue, ensure branch is named issue/N-description.


Phase 2: Similar Issue Detection

See Similar Issue Search for patterns.

gh issue list --search "[key error message]" --state all
mcp__memory__search_nodes(query="issue [error type] fix")

| Similar Issue | Similarity | Status | Relevant? |
|---------------|------------|--------|-----------|
| #101 | 85% | Closed | Yes |

Determine: Regression? Variant? New issue?


Phase 3: Hypothesis Formation

See Hypothesis-Based RCA for confidence scoring.

## Hypothesis 1: [Brief name]
**Confidence:** [0-100]%
**Description:** [What might cause the issue]
**Test:** [How to verify]

| Confidence | Meaning |
|------------|---------|
| 90-100% | Near certain |
| 70-89% | Highly likely |
| 50-69% | Probable |
| 30-49% | Possible |
| 0-29% | Unlikely |

Phase 4: Root Cause Analysis (5 Agents)

Launch ALL 5 agents in parallel with run_in_background=True and max_turns=25:

  1. debug-investigator: Root cause tracing
  2. debug-investigator: Impact analysis
  3. backend-system-architect: Backend fix design
  4. frontend-ui-developer: Frontend fix design
  5. test-generator: Test requirements

Each agent outputs structured JSON with findings and SUMMARY line.

Agent Teams Alternative

See agent-teams-rca.md for Agent Teams root cause analysis workflow.


Phase 5: Fix Design

## Fix Design for Issue #$ARGUMENTS

### Root Cause (Confirmed)
[Description]

### Proposed Fix
[Approach]

### Files to Modify
| File | Change | Reason |
|------|--------|--------|
| [file] | MODIFY | [why] |

### Risks
- [Risk 1]

### Rollback Plan
[How to revert]

Phase 6: Implementation

CRITICAL: Feature Branch Required

NEVER commit directly to main or dev. Always create a feature branch:

# Determine base branch
BASE_BRANCH=$(git remote show origin | grep 'HEAD branch' | cut -d: -f2 | tr -d ' ')

# Create feature branch (MANDATORY)
git checkout "$BASE_BRANCH" && git pull origin "$BASE_BRANCH"
git checkout -b issue/$ARGUMENTS-fix

CRITICAL: Regression Test Required

A fix without a test is incomplete. Add test BEFORE implementing fix:

# 1. Write test that reproduces the bug (should FAIL)
# 2. Implement the fix
# 3. Verify test now PASSES

Guidelines:

  • Make minimal, focused changes
  • Add proper error handling
  • Add regression test FIRST (MANDATORY)
  • DO NOT over-engineer
  • DO NOT commit directly to protected branches

Phase 7: Validation

# Backend
poetry run ruff format --check app/
poetry run pytest tests/unit/ -v --tb=short

# Frontend
npm run lint && npm run typecheck && npm run test

Phase 8: Prevention Recommendations

CRITICAL: Prevention must include at least one of:

  1. Automated test - CI catches similar issues (PREFERRED)
  2. Validation rule - Schema/lint rule prevents bad state
  3. Process check - Review checklist item

See Prevention Patterns for full template.

| Category | Examples | Effectiveness |
|----------|----------|---------------|
| Automated test | Unit/integration test in CI | HIGH - catches before merge |
| Validation rule | Schema check, lint rule | HIGH - catches on save/commit |
| Architecture | Better error boundaries | MEDIUM |
| Process | Review checklist item | LOW - human-dependent |

Phase 9: Runbook Generation

# Runbook: [Issue Type]

## Symptoms
- [Observable symptom]

## Diagnosis Steps
1. Check [X] by running: `[command]`

## Resolution Steps
1. [Step 1]

## Prevention
- [How to prevent]

Store in memory for future reference.


Phase 10: Lessons Learned

mcp__memory__create_entities(entities=[{
  "name": "lessons-issue-$ARGUMENTS",
  "entityType": "LessonsLearned",
  "observations": [
    "root_cause: [brief]",
    "key_learning: [most important]",
    "prevention: [recommendation]"
  ]
}])

Phase 11: Commit and PR

git add .
git commit -m "fix(#$ARGUMENTS): [Brief description]

Root cause: [one line]
Prevention: [recommendation]"

git push -u origin issue/$ARGUMENTS-fix
gh pr create --base dev --title "fix(#$ARGUMENTS): [description]"

Hypothesis Rca

Hypothesis-Based Root Cause Analysis

Scientific method for identifying root causes with quantified confidence.

The Scientific Method for RCA

1. Observe symptoms
2. Form hypotheses
3. Gather evidence
4. Test hypotheses
5. Confirm or reject
6. Repeat until root cause found

Hypothesis Template

## Hypothesis: [Brief name]
**Confidence:** [0-100]%

**Description:**
[What might be causing the issue]

**Evidence For:**
- [Supporting evidence 1]
- [Supporting evidence 2]

**Evidence Against:**
- [Contradicting evidence 1]

**Test Plan:**
1. [Step to verify/refute]
2. [Expected outcome if true]

Confidence Score Guidelines

| Score | Meaning | Evidence Required |
|-------|---------|-------------------|
| 90-100% | Near certain | Reproduction + multiple strong evidence |
| 70-89% | Highly likely | Clear evidence, logical chain |
| 50-69% | Probable | Some evidence, plausible mechanism |
| 30-49% | Possible | Limited evidence, needs investigation |
| 0-29% | Unlikely | Weak evidence, backup hypothesis |

Evidence Classification

| Type | Weight | Examples |
|------|--------|----------|
| Reproduction | +30% | Consistent reproduction steps |
| Code trace | +20% | Stack trace to specific line |
| Timing correlation | +15% | Issue appeared after deployment X |
| Log evidence | +15% | Error messages match hypothesis |
| Similar patterns | +10% | Same error in related code |
| User report | +5% | Consistent user descriptions |

Contradicting Evidence

| Evidence | Weight |
|----------|--------|
| Hypothesis disproven by test | -40% |
| Works in same conditions | -25% |
| Unrelated timing | -15% |
| No supporting logs | -10% |
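A rough sketch of how the positive and negative weights combine into an updated confidence score, clamped to the 0-100% scale. The dictionary key names are assumptions; the weights come from the two tables above:

```python
# Weights from the tables above: supporting evidence adds, contradicting subtracts.
EVIDENCE_WEIGHTS = {
    "reproduction": 30, "code_trace": 20, "timing_correlation": 15,
    "log_evidence": 15, "similar_patterns": 10, "user_report": 5,
    "disproven_by_test": -40, "works_in_same_conditions": -25,
    "unrelated_timing": -15, "no_supporting_logs": -10,
}

def update_confidence(prior: int, evidence: list[str]) -> int:
    """Apply evidence weights to a prior confidence, clamped to 0-100."""
    score = prior + sum(EVIDENCE_WEIGHTS[e] for e in evidence)
    return max(0, min(100, score))

# Race-condition hypothesis: starts at 65%, gains a code trace → 85%.
assert update_confidence(65, ["code_trace"]) == 85
```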

Multiple Hypothesis Comparison

| Hypothesis | Initial | After Test | Status |
|------------|---------|------------|--------|
| Race condition | 65% | 85% | INVESTIGATING |
| Null reference | 40% | 15% | REJECTED |
| Cache stale | 30% | 30% | ON HOLD |

Best Practices

  1. Start with 3+ hypotheses - Avoid tunnel vision
  2. Test highest confidence first - Efficient investigation
  3. Update scores after each test - Track progress
  4. Document rejected hypotheses - Prevent repeated investigation
  5. Look for evidence against - Avoid confirmation bias

Prevention Patterns

Prevention Patterns

Strategies to prevent issue recurrence by category.

Code-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Null/undefined | Optional chaining, nullish coalescing |
| Type errors | Strict TypeScript, runtime validation |
| Input validation | Zod schemas at boundaries |
| Error handling | Result types, explicit error states |
| Race conditions | Locks, atomic operations, idempotency |
| Memory leaks | Cleanup in useEffect, WeakRef |

// Before: Vulnerable
const name = user.profile.name;

// After: Defensive
const name = user?.profile?.name ?? 'Unknown';

Architecture-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Cascading failures | Circuit breakers |
| Network instability | Retry with backoff |
| Data inconsistency | Transactions, saga pattern |
| Timeout issues | Request deadlines, cancellation |
| Resource exhaustion | Rate limiting, pooling |

# Circuit breaker example
@circuit_breaker(failure_threshold=5, recovery_timeout=30)
async def external_api_call():
    ...
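A minimal sketch of such a decorator for synchronous calls, assuming a simple count-and-timeout policy. This is illustration only; production code would normally use an established resilience library rather than hand-rolling the state machine:

```python
import functools
import time

def circuit_breaker(failure_threshold=5, recovery_timeout=30):
    """Open the circuit after N consecutive failures; retry after a timeout."""
    def decorator(func):
        state = {"failures": 0, "opened_at": None}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if state["opened_at"] is not None:
                if time.monotonic() - state["opened_at"] < recovery_timeout:
                    raise RuntimeError("circuit open: call rejected")
                state["opened_at"] = None  # half-open: allow one trial call
            try:
                result = func(*args, **kwargs)
            except Exception:
                state["failures"] += 1
                if state["failures"] >= failure_threshold:
                    state["opened_at"] = time.monotonic()  # trip the breaker
                raise
            state["failures"] = 0  # success resets the counter
            return result
        return wrapper
    return decorator
```

While open, calls fail fast with `RuntimeError` instead of hammering the failing dependency; after `recovery_timeout` one trial call is let through.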

Process-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Logic errors | Mandatory PR review |
| Missing tests | Coverage requirements (>80%) |
| Regression | Required regression test before fix |
| Knowledge gaps | ADR for decisions |
| Onboarding issues | Runbook documentation |

Tooling-Level Prevention

| Issue Type | Prevention Pattern |
|------------|--------------------|
| Style issues | ESLint/Ruff rules |
| Type errors | Pre-commit type check |
| Security vulnerabilities | Dependency scanning in CI |
| Format inconsistency | Auto-format on save |
| Secrets in code | Pre-commit secret detection |

# .pre-commit-config.yaml
- repo: local
  hooks:
    - id: type-check
      name: TypeScript check
      entry: npx tsc --noEmit
      language: system

Prevention Priority Matrix

| Effort | Impact | Priority |
|--------|--------|----------|
| Low | High | Immediate |
| Low | Low | Backlog |
| High | High | Sprint planning |
| High | Low | Skip |

Similar Issue Search

Find related past issues to leverage previous solutions and detect regressions.

GitHub Issue Search Patterns

# Search by error message
gh issue list --search "TypeError: Cannot read property" --state all

# Search by component/file
gh issue list --search "UserService" --state all --json number,title,state

# Search by label
gh issue list --label "bug" --state closed --limit 20

# Combined search
gh issue list --search "auth login 401" --state all --json number,title,closedAt

Memory/Knowledge Graph Queries

# Search for past fixes
mcp__memory__search_nodes(query="fix authentication error")

# Search by error type
mcp__memory__search_nodes(query="TypeError resolution")

# Search by component
mcp__memory__search_nodes(query="UserService bug")

Stack Trace Similarity Matching

Match by:

  1. Exception type - Same error class
  2. File/line - Same code location
  3. Call stack depth - Similar execution path
  4. Error message pattern - Regex match on message

Similarity Assessment Criteria

| Factor | Weight | High Match |
|--------|--------|------------|
| Same exception type | 30% | Exact match |
| Same file | 25% | Same file involved |
| Similar error message | 20% | >80% string similarity |
| Same component | 15% | Same service/module |
| Recent (< 30 days) | 10% | Recently resolved |
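The weights can be folded into a single similarity score in [0, 1]. The factor key names are assumptions; each per-factor match score (0.0-1.0) is supplied by the caller, e.g. a string-similarity ratio for the error message:

```python
# Weights from the assessment table above (sum to 1.0).
SIMILARITY_WEIGHTS = {
    "exception_type": 0.30, "same_file": 0.25, "message_similarity": 0.20,
    "same_component": 0.15, "recent": 0.10,
}

def similarity(factors: dict[str, float]) -> float:
    """Weighted similarity in [0, 1]; each factor score is 0.0-1.0."""
    return sum(SIMILARITY_WEIGHTS[k] * v for k, v in factors.items())

score = similarity({"exception_type": 1.0, "same_file": 1.0,
                    "message_similarity": 0.9, "same_component": 1.0,
                    "recent": 0.0})
# 0.30 + 0.25 + 0.18 + 0.15 + 0.0 = 0.88 → above the 80% reuse threshold
```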

When to Reuse vs Investigate Fresh

Reuse Previous Solution When:

  • Similarity > 80%
  • Same root cause confirmed
  • Fix is still applicable
  • No code changes since fix

Investigate Fresh When:

  • Similarity < 60%
  • Context has changed significantly
  • Previous fix may be incomplete
  • New dependencies involved

Issue Classification

| Type | Action |
|------|--------|
| Regression | Same issue, fix reverted or bypassed |
| Variant | Similar pattern, different trigger |
| New | No similar issues found |
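Combining the reuse thresholds with this classification, a rough triage sketch. The 0.8/0.6 cutoffs map to the 80%/60% figures above; the `fix_reverted` flag and the exact mapping of high-similarity matches to regression vs. variant are assumptions:

```python
def classify_issue(best_similarity: float, fix_reverted: bool = False) -> str:
    """Rough triage into regression / variant / new, per the thresholds above."""
    if best_similarity > 0.8:
        # Same issue resurfacing: a regression if the old fix no longer holds.
        return "regression" if fix_reverted else "variant"
    if best_similarity >= 0.6:
        return "variant"   # similar pattern, different trigger
    return "new"           # below 60%: investigate fresh
```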

Checklists (1)

Fix Complete Checklist

Fix Complete Checklist

Verify all aspects of issue resolution before closing.

Root Cause Analysis

  • Root cause identified with confidence >= 70%
  • Hypotheses documented (at least 2 considered)
  • Evidence for/against documented
  • Similar issues checked

Fix Verification

  • Regression test added
  • All existing tests pass
  • Fix manually verified
  • Edge cases covered

Prevention

  • Prevention recommendation documented
  • At least one prevention measure implemented or ticketed
  • Runbook entry created/updated

Knowledge Capture

  • Lessons learned stored in memory
  • RCA report generated (for high/critical issues)
  • Related issues linked

PR/Commit

  • Commit message includes issue number
  • Commit message describes root cause
  • PR links to issue with "Fixes #N"

Final Verification

# Quick verification commands
git log -1 --oneline  # Check commit message
gh pr checks          # Check CI status
gh issue view [N]     # Verify issue linked