OrchestKit v6.7.1 — 67 skills, 38 agents, 77 hooks with Opus 4.6 support

Skill Evolution

Analyzes skill usage patterns and suggests improvements. Use when reviewing skill performance, applying auto-suggested changes, or rolling back versions.

Command medium

Skill Evolution Manager

Enables skills to automatically improve based on usage patterns, user edits, and success rates. Provides version control with safe rollback capability.

Overview

  • Reviewing how skills are performing across sessions
  • Identifying patterns in user edits to skill outputs
  • Applying learned improvements to skill templates
  • Rolling back problematic skill changes
  • Tracking skill version history and success rates

Quick Reference

| Command | Description |
| --- | --- |
| /ork:skill-evolution | Show evolution report for all skills |
| /ork:skill-evolution analyze <skill-id> | Analyze specific skill patterns |
| /ork:skill-evolution evolve <skill-id> | Review and apply suggestions |
| /ork:skill-evolution history <skill-id> | Show version history |
| /ork:skill-evolution rollback <skill-id> <version> | Restore previous version |

How It Works

The skill evolution system operates in three phases:

COLLECT                    ANALYZE                    ACT
───────                    ───────                    ───
┌─────────────┐           ┌─────────────┐           ┌─────────────┐
│ PostTool    │──────────▶│ Evolution   │──────────▶│ /ork:skill- │
│ Edit        │  patterns │ Analyzer    │ suggest   │ evolution   │
│ Tracker     │           │ Engine      │           │ command     │
└─────────────┘           └─────────────┘           └─────────────┘
     │                          │                          │
     ▼                          ▼                          ▼
┌─────────────┐           ┌─────────────┐           ┌─────────────┐
│ edit-       │           │ evolution-  │           │ versions/   │
│ patterns.   │           │ registry.   │           │ snapshots   │
│ jsonl       │           │ json        │           │             │
└─────────────┘           └─────────────┘           └─────────────┘

See Pattern Detection Heuristics for tracked edit patterns and detection regexes. See Confidence Scoring for suggestion thresholds.


Subcommands

Each subcommand is documented with implementation details, shell commands, and sample output in the Evolution Commands Reference.

Report (Default)

/ork:skill-evolution — Shows evolution report for all tracked skills with usage counts, success rates, and pending suggestions.

Analyze

/ork:skill-evolution analyze <skill-id> — Deep-dives into edit patterns for a specific skill, showing frequency, sample counts, and confidence scores.

Evolve

/ork:skill-evolution evolve <skill-id> — Interactive review of improvement suggestions. Uses AskUserQuestion for each suggestion (Apply / Skip / Reject). Creates version snapshot before applying.

History

/ork:skill-evolution history <skill-id> — Shows version history with performance metrics per version.

Rollback

/ork:skill-evolution rollback <skill-id> <version> — Restores a previous version after confirmation. Current version is backed up automatically.


Data Files

| File | Purpose | Format |
| --- | --- | --- |
| .claude/feedback/edit-patterns.jsonl | Raw edit pattern events | JSONL (append-only) |
| .claude/feedback/evolution-registry.json | Aggregated suggestions | JSON |
| .claude/feedback/metrics.json | Skill usage metrics | JSON |
| skills/<cat>/<name>/versions/ | Version snapshots | Directory |
| skills/<cat>/<name>/versions/manifest.json | Version metadata | JSON |

Auto-Evolution Safety

See Auto-Evolution Triggers for full safety mechanisms, health monitoring, and trigger criteria.

Key safeguards: version snapshots before changes, auto-alert on >20% success rate drop, human review required, rejected suggestions never re-suggested.


References

Rules


Related Skills

  • ork:configure - Configure OrchestKit settings
  • ork:doctor - Diagnose OrchestKit issues
  • feedback-dashboard - View comprehensive feedback metrics

Rules (3)

Auto-Evolution Triggers — HIGH

Auto-Evolution Safety & Trigger Criteria

Safety Mechanisms

  1. Version Snapshots: Always created before changes
  2. Rollback Triggers: Auto-alert if success rate drops >20%
  3. Human Review: High-confidence suggestions require approval
  4. Rejection Memory: Rejected suggestions are never re-suggested

Health Monitoring

The system monitors skill health and can trigger warnings:

WARNING: api-design-framework success rate dropped from 94% to 71%
Consider: /ork:skill-evolution rollback api-design-framework 1.1.0

When Auto-Evolution Activates

  • Pattern frequency exceeds the Add Threshold (70%)
  • At least Minimum Samples (5) uses have been recorded
  • No prior rejection for the same pattern on the same skill
  • Current skill version success rate is stable (no recent drops)
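
The four conditions above can be read as a single predicate. What follows is a minimal Python sketch under stated assumptions, not the engine's actual code: the `PatternStats` fields and the `is_eligible` name are hypothetical, and only the 70% and 5-sample defaults come from this page. Whether the 70% boundary is inclusive is also an assumption.

```python
from dataclasses import dataclass

ADD_THRESHOLD = 0.70  # documented default (Add Threshold)
MIN_SAMPLES = 5       # documented default (Minimum Samples)


@dataclass
class PatternStats:
    """Hypothetical per-skill, per-pattern summary; field names are assumptions."""
    pattern_count: int         # uses in which the pattern was detected
    total_uses: int            # total recorded uses of the skill
    previously_rejected: bool  # the pattern was rejected earlier for this skill
    success_rate_stable: bool  # no recent success-rate drop on the current version


def is_eligible(stats: PatternStats) -> bool:
    """Return True only when all four listed conditions hold."""
    # Checking the sample count first also avoids dividing by zero.
    if stats.total_uses < MIN_SAMPLES:
        return False
    frequency = stats.pattern_count / stats.total_uses
    return (
        frequency >= ADD_THRESHOLD  # inclusive boundary is an assumption
        and not stats.previously_rejected
        and stats.success_rate_stable
    )
```

With the sample numbers used elsewhere on this page (132 of 156 uses), `is_eligible(PatternStats(132, 156, False, True))` is true, and it flips to false as soon as the pattern has a prior rejection.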

When Rollback Is Triggered

  • Success rate drops more than 20% after an evolution
  • Alert is surfaced in the next report or analyze invocation
  • User is prompted to rollback via AskUserQuestion

Confidence Scoring — HIGH

Confidence Scoring & Suggestion Thresholds

Thresholds

| Threshold | Default | Description |
| --- | --- | --- |
| Minimum Samples | 5 | Uses before generating suggestions |
| Add Threshold | 70% | Frequency to suggest adding pattern |
| Auto-Apply Confidence | 85% | Confidence for auto-application |
| Rollback Trigger | -20% | Success rate drop to trigger rollback |

Confidence Calculation

Confidence is calculated as the ratio of uses in which a pattern is applied to total uses:

confidence = pattern_frequency / total_uses
  • Below 70%: Pattern tracked but no suggestion generated
  • 70%-84%: Suggestion generated, requires human approval via evolve subcommand
  • 85%+: Auto-apply eligible (still requires human confirmation via AskUserQuestion)
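
The three bands above map naturally onto a small lookup. This is an illustrative Python sketch only; the function name and the returned labels are hypothetical, and the cut-offs are the documented 70% and 85% defaults.

```python
def suggestion_tier(confidence: float) -> str:
    """Map a confidence value in [0, 1] to one of the three documented bands."""
    if confidence >= 0.85:
        # Eligible for auto-apply, but still confirmed by a human.
        return "auto_apply_eligible"
    if confidence >= 0.70:
        # A suggestion is generated and reviewed through the evolve subcommand.
        return "needs_review"
    # The pattern keeps being tracked; no suggestion is generated.
    return "tracked_only"
```

For example, `suggestion_tier(0.72)` returns `"needs_review"`, while `suggestion_tier(0.56)` returns `"tracked_only"`.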

Suggestion States

Suggestions progress through: pending → applied | rejected

  • Applied: Pattern added to skill template, version bumped
  • Rejected: Marked in registry, never re-suggested for this skill
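
The state progression can be pictured as a tiny state machine driven by the three choices offered during review (Apply / Skip / Reject). The Python sketch below is an assumption-laden illustration: the enum, the `next_state` helper, and the idea that applied and rejected are terminal states are not confirmed by this page.

```python
from enum import Enum


class SuggestionState(Enum):
    PENDING = "pending"
    APPLIED = "applied"
    REJECTED = "rejected"


# Decision labels match the Apply / Skip / Reject options of the evolve subcommand.
_TRANSITIONS = {
    "Apply": SuggestionState.APPLIED,
    "Skip": SuggestionState.PENDING,   # stays pending, so it is asked again later
    "Reject": SuggestionState.REJECTED,
}


def next_state(current: SuggestionState, decision: str) -> SuggestionState:
    """Return the state after a user decision; only pending suggestions can move."""
    if current is not SuggestionState.PENDING:
        # Assumption: applied and rejected suggestions are terminal.
        raise ValueError(f"suggestion is already {current.value}")
    if decision not in _TRANSITIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    return _TRANSITIONS[decision]
```

Treating Skip as a self-transition keeps the suggestion in the pending pool, which matches the "ask again later" behavior described for that option.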

Pattern Detection Heuristics — HIGH

Edit Pattern Detection Heuristics

The system tracks these common edit patterns users apply after skill output:

| Pattern | Description | Detection Regex |
| --- | --- | --- |
| add_pagination | User adds pagination to API responses | limit.*offset, cursor.*pagination |
| add_rate_limiting | User adds rate limiting | rate.?limit, throttl |
| add_error_handling | User adds try/catch blocks | try.*catch, except |
| add_types | User adds TypeScript/Python types | interface\s, Optional |
| add_validation | User adds input validation | validate, Pydantic, Zod |
| add_logging | User adds logging/observability | logger\., console.log |
| remove_comments | User removes generated comments | Pattern removal detection |
| add_auth_check | User adds authentication checks | @auth, @require_auth |

How Detection Works

The PostTool Edit Tracker hook monitors file edits after skill invocations. When a user edits skill output, the edit is classified against the patterns above using regex matching. Results are appended to .claude/feedback/edit-patterns.jsonl.
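
To make the classification step concrete, here is a minimal Python sketch. It assumes the comma-separated entries in the table are alternatives, copies the regexes as written, and leaves out remove_comments because no regex is given for it. The record layout (`skill_id`, `patterns`, `timestamp`) and both function names are hypothetical; the real hook's JSONL schema is not shown on this page.

```python
import json
import re
from datetime import datetime, timezone
from typing import List, Optional

# Regexes copied from the table; alternatives are joined with "|".
EDIT_PATTERNS = {
    "add_pagination": r"limit.*offset|cursor.*pagination",
    "add_rate_limiting": r"rate.?limit|throttl",
    "add_error_handling": r"try.*catch|except",
    "add_types": r"interface\s|Optional",
    "add_validation": r"validate|Pydantic|Zod",
    "add_logging": r"logger\.|console.log",
    "add_auth_check": r"@auth|@require_auth",
}


def classify_edit(text: str) -> List[str]:
    """Return the names of all patterns whose regex matches the edited text."""
    return [name for name, rx in EDIT_PATTERNS.items() if re.search(rx, text)]


def to_jsonl_record(skill_id: str, text: str) -> Optional[str]:
    """Build one JSONL line for an edit, or None when no pattern matches."""
    patterns = classify_edit(text)
    if not patterns:
        return None
    record = {
        "skill_id": skill_id,  # hypothetical field names
        "patterns": patterns,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record)
```

A caller would append the returned line, plus a newline, to the edit-patterns file. Because the file is append-only, each edit produces at most one new line and earlier records are never rewritten.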


References (3)

Evolution Analysis

Evolution Analysis Methodology

Reference guide for understanding how the skill evolution system analyzes patterns and generates suggestions.

Pattern Detection Algorithm

1. Data Collection (PostTool Hook)

When a Write or Edit tool is used after a skill was recently loaded:

IF skill_loaded_within(5_minutes) AND tool IN (Write, Edit):
    content = get_edit_content()
    patterns = detect_patterns(content)
    IF patterns.length > 0:
        log_to_edit_patterns_jsonl(skill_id, patterns)
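
The gating condition in the pseudocode above could look like this in Python. It is a sketch, assuming the hook knows when the skill was last loaded; `should_track` and its parameters are hypothetical names, and treating the 5-minute boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

TRACKING_WINDOW = timedelta(minutes=5)  # "skill_loaded_within(5_minutes)"
TRACKED_TOOLS = {"Write", "Edit"}       # only these tools count as edits


def should_track(tool_name: str,
                 skill_loaded_at: Optional[datetime],
                 now: datetime) -> bool:
    """Return True when an edit should be attributed to a recently loaded skill."""
    if skill_loaded_at is None:
        return False  # no skill was loaded in this session
    elapsed = now - skill_loaded_at
    return (
        tool_name in TRACKED_TOOLS
        and timedelta(0) <= elapsed <= TRACKING_WINDOW
    )
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.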

2. Pattern Matching

The system uses regex patterns to categorize edits:

declare -A PATTERN_DETECTORS=(
    ["add_pagination"]="limit.*offset|page.*size|cursor.*pagination|Paginated"
    ["add_rate_limiting"]="rate.?limit|throttl|RateLimiter|requests.?per"
    ["add_caching"]="@cache|cache_key|TTL|redis|memcache|@cached"
    ["add_retry_logic"]="retry|backoff|max_attempts|tenacity|Retry"
    ["add_error_handling"]="try.*catch|except|raise.*Exception|throw.*Error"
    ["add_validation"]="validate|Validator|@validate|Pydantic|Zod|yup"
    ["add_logging"]="logger\.|logging\.|console\.log|winston|pino"
    ["add_types"]=": *(str|int|bool|List|Dict|Optional)|interface\s|type\s.*="
    ["add_auth_check"]="@auth|@require_auth|isAuthenticated|requiresAuth"
    ["add_test_case"]="def test_|it\(|describe\(|expect\(|@pytest"
)

3. Frequency Calculation

For each skill with sufficient usage:

frequency = pattern_count / total_skill_uses

4. Confidence Scoring

Confidence combines frequency with sample size:

confidence = frequency × min(samples / 20, 1.0)

This means:

  • 100% frequency with 5 samples = 0.25 confidence (needs more data)
  • 100% frequency with 20+ samples = 1.0 confidence (high certainty)
  • 70% frequency with 15 samples = 0.53 confidence (moderate)
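
The formula and the three cases above can be checked with a few lines of Python. The function and parameter names are illustrative; the constant 20 comes from the formula itself.

```python
def confidence(frequency: float, samples: int, full_weight_samples: int = 20) -> float:
    """Scale frequency by sample size; the weight is capped at 1.0 from 20 samples on."""
    return frequency * min(samples / full_weight_samples, 1.0)


print(confidence(1.0, 5))    # 100% frequency, 5 samples
print(confidence(1.0, 20))   # 100% frequency, 20 samples
print(confidence(0.70, 15))  # 70% frequency, 15 samples
```

The first two calls give 0.25 and 1.0. The third gives approximately 0.525, which the list above rounds to 0.53. The cap means that extra samples beyond 20 no longer raise confidence; only a higher frequency does.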

Suggestion Thresholds

| Metric | Threshold | Purpose |
| --- | --- | --- |
| MIN_SAMPLES | 5 | Prevent premature suggestions |
| ADD_THRESHOLD | 0.70 | 70%+ users add = suggest adding |
| REMOVE_THRESHOLD | 0.70 | 70%+ users remove = suggest removing |
| AUTO_APPLY_CONFIDENCE | 0.85 | Auto-apply if very high confidence |

Suggestion Types

Add Suggestions

Generated when users frequently add similar content:

{
  "type": "add",
  "target": "template",
  "pattern": "add_pagination",
  "reason": "85% of users add pagination after using this skill"
}

Remove Suggestions

Generated when users frequently remove generated content:

{
  "type": "remove",
  "target": "template",
  "pattern": "remove_comments",
  "reason": "72% of users remove docstrings from generated code"
}

Analysis Best Practices

  1. Wait for sufficient data: Don't act on suggestions until MIN_SAMPLES reached
  2. Review high-confidence first: Focus on suggestions with confidence > 0.80
  3. Consider context: A pattern may be added for specific use cases only
  4. Monitor after changes: Track success rate changes after evolution

Interpreting Results

High-Value Improvements

  • Frequency > 80%, Confidence > 0.70
  • Pattern is universally applicable
  • Easy to add to skill template

Conditional Improvements

  • Frequency 50-80%
  • May be context-dependent
  • Consider adding as optional reference

Skip/Investigate

  • Frequency < 50%
  • Might be edge case or user preference
  • Review individual edit patterns for context
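
The three buckets above can be expressed as a small triage helper. This Python sketch is illustrative only: the function name and labels are hypothetical, and the page does not say how to treat a pattern with frequency above 80% but confidence at or below 0.70, so the sketch assumes it falls into the conditional bucket.

```python
def triage(frequency: float, confidence: float) -> str:
    """Sort a detected pattern into one of the three interpretation buckets."""
    if frequency > 0.80 and confidence > 0.70:
        return "high_value"
    if frequency >= 0.50:
        # Also covers high frequency with low confidence (an assumption).
        return "conditional"
    return "skip_or_investigate"
```

For instance, `triage(0.85, 0.93)` returns `"high_value"`, while `triage(0.45, 0.56)` returns `"skip_or_investigate"`.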

Evolution Commands

Evolution Subcommand Reference

Detailed implementation and sample output for each subcommand.


Subcommand: Report (Default)

Usage: /ork:skill-evolution

Shows evolution report for all tracked skills.

Implementation

# Run the evolution engine report
"${CLAUDE_PROJECT_DIR}/.claude/scripts/evolution-engine.sh" report

Sample Output

Skill Evolution Report
══════════════════════════════════════════════════════════════

Skills Summary:
┌────────────────────────────┬─────────┬─────────┬───────────┬────────────┐
│ Skill                      │ Uses    │ Success │ Avg Edits │ Suggestions│
├────────────────────────────┼─────────┼─────────┼───────────┼────────────┤
│ api-design-framework       │     156 │     94% │       1.8 │          2 │
│ database-schema-designer   │      89 │     91% │       2.1 │          1 │
│ fastapi-patterns           │      67 │     88% │       2.4 │          3 │
└────────────────────────────┴─────────┴─────────┴───────────┴────────────┘

Summary:
  Skills tracked: 3
  Total uses: 312
  Overall success rate: 91%

Top Pending Suggestions:
1. 93% | api-design-framework | add add_pagination
2. 88% | api-design-framework | add add_rate_limiting
3. 85% | fastapi-patterns | add add_error_handling

Subcommand: Analyze

Usage: /ork:skill-evolution analyze <skill-id>

Analyzes edit patterns for a specific skill.

Implementation

# Run analysis for specific skill
"${CLAUDE_PROJECT_DIR}/.claude/scripts/evolution-engine.sh" analyze "$SKILL_ID"

Sample Output

Skill Analysis: api-design-framework
────────────────────────────────────
Uses: 156 | Success: 94% | Avg Edits: 1.8

Edit Patterns Detected:
┌──────────────────────────┬─────────┬──────────┬────────────┐
│ Pattern                  │ Freq    │ Samples  │ Confidence │
├──────────────────────────┼─────────┼──────────┼────────────┤
│ add_pagination           │    85%  │ 132/156  │       0.93 │
│ add_rate_limiting        │    72%  │ 112/156  │       0.88 │
│ add_error_handling       │    45%  │  70/156  │       0.56 │
└──────────────────────────┴─────────┴──────────┴────────────┘

Pending Suggestions:
1. 93% conf: ADD add_pagination to template
2. 88% conf: ADD add_rate_limiting to template

Run `/ork:skill-evolution evolve api-design-framework` to review

Subcommand: Evolve

Usage: /ork:skill-evolution evolve <skill-id>

Interactive review and application of improvement suggestions.

Implementation

  1. Get Suggestions:
SUGGESTIONS=$("${CLAUDE_PROJECT_DIR}/.claude/scripts/evolution-engine.sh" suggest "$SKILL_ID")
  2. For Each Suggestion, Present Interactive Options:

Use AskUserQuestion to let the user decide on each suggestion:

{
  "questions": [{
    "question": "Apply suggestion: ADD add_pagination to template? (93% confidence, 132/156 users add this)",
    "header": "Evolution",
    "options": [
      {"label": "Apply", "description": "Add this pattern to the skill template"},
      {"label": "Skip", "description": "Skip for now, ask again later"},
      {"label": "Reject", "description": "Never suggest this again"}
    ],
    "multiSelect": false
  }]
}
  3. On Apply:

    • Create version snapshot first
    • Apply the suggestion to skill files
    • Update evolution registry
  4. On Reject:

    • Mark suggestion as rejected in registry
    • Will not be suggested again

Applying Suggestions

When a user accepts a suggestion, the implementation depends on the suggestion type:

For add suggestions to templates:

  • Add the pattern to the skill's template files
  • Update SKILL.md with new guidance

For add suggestions to references:

  • Create new reference file in references/ directory

For remove suggestions:

  • Remove the identified content
  • Archive in version snapshot first

Subcommand: History

Usage: /ork:skill-evolution history <skill-id>

Shows version history with performance metrics.

Implementation

# Run version manager list
"${CLAUDE_PROJECT_DIR}/.claude/scripts/version-manager.sh" list "$SKILL_ID"

Sample Output

Version History: api-design-framework
══════════════════════════════════════════════════════════════

Current Version: 1.2.0

┌─────────┬────────────┬─────────┬───────┬───────────┬────────────────────────────┐
│ Version │ Date       │ Success │ Uses  │ Avg Edits │ Changelog                  │
├─────────┼────────────┼─────────┼───────┼───────────┼────────────────────────────┤
│ 1.2.0   │ 2026-01-14 │    94%  │   156 │       1.8 │ Added pagination pattern   │
│ 1.1.0   │ 2026-01-05 │    89%  │    80 │       2.3 │ Added error handling ref   │
│ 1.0.0   │ 2025-11-01 │    78%  │    45 │       3.2 │ Initial release            │
└─────────┴────────────┴─────────┴───────┴───────────┴────────────────────────────┘

Subcommand: Rollback

Usage: /ork:skill-evolution rollback <skill-id> <version>

Restores a skill to a previous version.

Implementation

  1. Confirm with User:

Use AskUserQuestion for confirmation:

{
  "questions": [{
    "question": "Rollback api-design-framework from 1.2.0 to 1.0.0? Current version will be backed up.",
    "header": "Rollback",
    "options": [
      {"label": "Confirm Rollback", "description": "Restore version 1.0.0"},
      {"label": "Cancel", "description": "Keep current version"}
    ],
    "multiSelect": false
  }]
}
  2. On Confirm:
"${CLAUDE_PROJECT_DIR}/.claude/scripts/version-manager.sh" restore "$SKILL_ID" "$VERSION"
  3. Report Result:
Restored api-design-framework to version 1.0.0
Previous version backed up to: versions/.backup-1.2.0-1736867234

Version Management

Version Management Guide

Reference guide for managing skill versions with safe rollback capability.

Version Structure

Each skill can have versioned snapshots stored in:

skills/<category>/<skill-name>/
├── SKILL.md                 # Current version and metadata
├── references/              # Current references
├── scripts/                 # Current templates
└── versions/
    ├── manifest.json        # Version history metadata
    ├── 1.0.0/
    │   ├── SKILL.md
    │   ├── references/
    │   └── CHANGELOG.md
    └── 1.1.0/
        ├── SKILL.md
        ├── references/
        └── CHANGELOG.md

Manifest Schema

The manifest.json tracks version history:

{
  "$schema": "../../../../../../.claude/schemas/skill-evolution.schema.json",
  "skillId": "api-design-framework",
  "currentVersion": "1.2.0",
  "versions": [
    {
      "version": "1.0.0",
      "date": "2025-11-01",
      "successRate": 0.78,
      "uses": 45,
      "avgEdits": 3.2,
      "changelog": "Initial release"
    },
    {
      "version": "1.1.0",
      "date": "2026-01-05",
      "successRate": 0.89,
      "uses": 80,
      "avgEdits": 1.8,
      "changelog": "Added pagination pattern (85% users added manually)"
    }
  ],
  "suggestions": [],
  "editPatterns": {},
  "lastAnalyzed": "2026-01-14T10:30:00Z"
}
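
As one way to use the manifest, the Python sketch below picks the recorded version with the highest success rate, which can help when choosing a rollback target. It works on a trimmed copy of the example above; `best_version` and its `min_uses` guard are hypothetical and not part of version-manager.sh.

```python
import json

# Trimmed copy of the example manifest (only the fields used here).
MANIFEST_JSON = """
{
  "skillId": "api-design-framework",
  "currentVersion": "1.2.0",
  "versions": [
    {"version": "1.0.0", "successRate": 0.78, "uses": 45},
    {"version": "1.1.0", "successRate": 0.89, "uses": 80}
  ]
}
"""


def best_version(manifest: dict, min_uses: int = 5) -> str:
    """Return the version with the highest success rate among well-sampled entries."""
    candidates = [v for v in manifest["versions"] if v["uses"] >= min_uses]
    if not candidates:
        raise ValueError("no version has enough recorded uses")
    return max(candidates, key=lambda v: v["successRate"])["version"]


manifest = json.loads(MANIFEST_JSON)
print(best_version(manifest))
```

The `min_uses` guard is there so that a version with only a handful of uses and a lucky success rate is not preferred over a well-exercised one.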

Versioning Workflow

Creating a Version

  1. Before making changes, create a version snapshot:

    version-manager.sh create <skill-id> "Description of changes"
  2. The system:

    • Bumps version number (patch by default)
    • Copies current files to versions/<new-version>/
    • Records current metrics in manifest
    • Creates CHANGELOG.md

Comparing Versions

Compare two versions to see what changed:

version-manager.sh diff <skill-id> 1.0.0 1.1.0

Shows:

  • File differences (unified diff)
  • Metrics comparison (success rate, uses, avg edits)

Restoring a Version

If a change causes problems, roll back:

version-manager.sh restore <skill-id> <version>

The system:

  1. Backs up current version to .backup-<version>-<timestamp>
  2. Copies snapshot files to skill root
  3. Updates manifest with rollback entry
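
The three restore steps can be sketched in Python as follows. This is not the version-manager.sh implementation. It assumes a snapshot holds SKILL.md and references/, that the backup directory lives under versions/, and that "updating the manifest" means setting currentVersion; the real script may record more. The demo at the end runs entirely inside a temporary directory.

```python
import json
import shutil
import tempfile
import time
from pathlib import Path

SNAPSHOT_ITEMS = ("SKILL.md", "references")  # assumption: contents of a snapshot


def _copy_items(src_root: Path, dst_root: Path) -> None:
    """Copy each snapshot item from src_root to dst_root, replacing old copies."""
    for name in SNAPSHOT_ITEMS:
        src, dst = src_root / name, dst_root / name
        if src.is_dir():
            shutil.rmtree(dst, ignore_errors=True)
            shutil.copytree(src, dst)
        elif src.is_file():
            shutil.copy2(src, dst)


def restore_version(skill_dir: Path, version: str) -> Path:
    """Restore a snapshot into the skill root and return the backup directory."""
    versions = skill_dir / "versions"
    snapshot = versions / version
    if not snapshot.is_dir():
        raise FileNotFoundError(f"no snapshot for version {version}")
    manifest_path = versions / "manifest.json"
    manifest = json.loads(manifest_path.read_text())

    # Step 1: back up the current files to .backup-<version>-<timestamp>.
    backup = versions / f".backup-{manifest['currentVersion']}-{int(time.time())}"
    backup.mkdir()
    _copy_items(skill_dir, backup)

    # Step 2: copy the snapshot files to the skill root.
    _copy_items(snapshot, skill_dir)

    # Step 3: record the rollback in the manifest (simplified to currentVersion).
    manifest["currentVersion"] = version
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return backup


# Demo in a throwaway directory: roll back from 1.2.0 to 1.0.0.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "api-design-framework"
    (skill / "versions" / "1.0.0").mkdir(parents=True)
    (skill / "SKILL.md").write_text("v1.2.0 content")
    (skill / "versions" / "1.0.0" / "SKILL.md").write_text("v1.0.0 content")
    (skill / "versions" / "manifest.json").write_text(
        json.dumps({"currentVersion": "1.2.0"})
    )
    backup_dir = restore_version(skill, "1.0.0")
    restored_text = (skill / "SKILL.md").read_text()
    backup_text = (backup_dir / "SKILL.md").read_text()
    new_current = json.loads(
        (skill / "versions" / "manifest.json").read_text()
    )["currentVersion"]
```

Taking the backup before touching the skill root is the important ordering: if the copy in step 2 fails partway, the pre-rollback files are still available in the backup directory.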

Automatic Safety Checks

Rollback Triggers

The system monitors for:

| Trigger | Threshold | Action |
| --- | --- | --- |
| Success rate drop | -20% | Warning + rollback suggestion |
| Avg edits increase | +50% | Warning (users fighting skill) |
| Consecutive failures | 5+ | Alert to review |
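
A compact way to read the three triggers is as independent checks that each contribute an alert. The Python sketch below is hypothetical (function name, parameters, and alert labels are not from the source). It assumes the success-rate drop is an absolute difference between rates, which is how the check_skill_health function in this guide compares them, and that the average-edits increase is relative to the baseline.

```python
from typing import List


def check_triggers(baseline_success: float,
                   current_success: float,
                   baseline_avg_edits: float,
                   current_avg_edits: float,
                   consecutive_failures: int) -> List[str]:
    """Return the names of all rollback triggers that currently fire."""
    alerts = []
    # Success rate drop: absolute difference greater than 0.20 (assumption).
    if baseline_success - current_success > 0.20:
        alerts.append("success_rate_drop")
    # Avg edits increase: more than 50% above the baseline (relative, assumption).
    if baseline_avg_edits > 0 and (
        (current_avg_edits - baseline_avg_edits) / baseline_avg_edits > 0.50
    ):
        alerts.append("avg_edits_increase")
    # Consecutive failures: 5 or more in a row.
    if consecutive_failures >= 5:
        alerts.append("consecutive_failures")
    return alerts
```

With the drop from 94% to 71% used in the health warning example on this page, `check_triggers(0.94, 0.71, 1.8, 1.9, 0)` returns `["success_rate_drop"]`.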

Health Check Integration

The PostTool hooks monitor skill health:

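# Warn when a skill's recent success rate falls more than 0.20 below its version baseline.
# get_recent_success_rate and get_version_baseline are helpers not shown here.
check_skill_health() {
    local skill_id="$1"
    local current_rate baseline_rate

    # Declare first, then assign, so `local` does not mask a helper's exit status.
    current_rate=$(get_recent_success_rate "$skill_id" 10)
    baseline_rate=$(get_version_baseline "$skill_id")

    # bash has no floating-point arithmetic, so bc evaluates the comparison (prints 1 or 0).
    if (( $(echo "$baseline_rate - $current_rate > 0.20" | bc -l) )); then
        echo "WARNING: $skill_id dropped from ${baseline_rate} to ${current_rate}"
    fi
}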

Best Practices

When to Create Versions

  • Before applying evolution suggestions
  • Before major skill modifications
  • After validating improvements work well
  • At regular intervals (weekly/monthly) for active skills

Version Naming

Use semantic versioning:

  • Major (2.0.0): Breaking changes to skill behavior
  • Minor (1.1.0): New features/patterns added
  • Patch (1.0.1): Bug fixes, minor improvements
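
For illustration, the three bump levels can be written as a small Python helper. `bump_version` is a hypothetical name; the patch default mirrors the "patch by default" behavior described under Creating a Version, and the sketch handles plain MAJOR.MINOR.PATCH strings only (no pre-release or build suffixes).

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Return the next semantic version after bumping the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"          # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"    # new pattern or feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

For example, `bump_version("1.0.0")` gives `"1.0.1"`, `bump_version("1.0.1", "minor")` gives `"1.1.0"`, and `bump_version("1.1.0", "major")` gives `"2.0.0"`.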

Cleanup Policy

  • Keep last 5 versions minimum
  • Archive versions older than 90 days
  • Never delete versions with good metrics (baseline references)

Metrics Interpretation

Success Rate Trends

| Pattern | Interpretation |
| --- | --- |
| Increasing | Evolution working well |
| Stable | Skill mature and effective |
| Decreasing | Investigate recent changes |

Average Edits Trends

| Pattern | Interpretation |
| --- | --- |
| Decreasing | Skill producing better output |
| Stable | Consistent quality |
| Increasing | Users modifying more (skill may need updates) |

Recovery Scenarios

Accidental Breaking Change

# 1. Check history
version-manager.sh list <skill-id>

# 2. Find last good version
version-manager.sh metrics <skill-id>

# 3. Restore
version-manager.sh restore <skill-id> 1.1.0

Gradual Degradation

# 1. Compare versions
version-manager.sh diff <skill-id> 1.0.0 1.2.0

# 2. Identify problematic changes
# 3. Create new version fixing issues