OrchestKit v6.7.1 — 67 skills, 38 agents, 77 hooks with Opus 4.6 support

Review a Pull Request

AI-powered code review with 6 parallel specialized agents that catch security, performance, and quality issues.

The /ork:review-pr command runs a multi-agent code review against any pull request. Six specialized agents analyze the diff in parallel, each focused on a different dimension of code quality. The results are synthesized into a single, actionable PR comment. This cookbook walks through reviewing a payment processing PR.

Scenario

A teammate opens PR #42 that adds a Stripe payment processing endpoint to your FastAPI backend. The PR is 420 lines across 9 files: a new router, service layer, webhook handler, database migration, and tests. You want a thorough review before merging code that handles real money.

What You'll Use

| Component | Type | Role |
|---|---|---|
| /ork:review-pr | Command skill | Orchestrates the multi-agent review |
| code-quality-reviewer | Agent | Style, patterns, complexity |
| security-auditor | Agent | Injection, XSS, secrets, OWASP |
| test-generator | Agent | Missing test coverage |
| performance-engineer | Agent | N+1 queries, bundle size, latency |
| backend-system-architect | Agent | API design, error handling, contracts |
| accessibility-specialist | Agent | A11y issues (if frontend changes present) |
| pr-size-warning | Hook | Detects large PRs and warns about review difficulty |
| security-command-audit | Hook | Logs security findings |
| skill-suggester | Hook | Injects payment and Stripe reference skills |

Step 1: Start the Review

/ork:review-pr 42

OrchestKit fetches the PR metadata from GitHub using gh pr view 42 and asks for your review focus:

PR #42: "Add Stripe payment processing endpoint"
  Author: @teammate  |  +312 / -108  |  9 files  |  base: main

Review focus:

  [1] Full        — All 6 agents: security, quality, tests, performance, design, a11y
  [2] Security    — Deep security-only scan (2 agents)
  [3] Performance — Latency, queries, resource usage
  [4] Quick       — Quality + tests only (fast, 2 agents)

> 1

Selecting Full launches all six agents against the PR diff.
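The metadata fetch in this step can be reproduced outside OrchestKit. The following is a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated. The function names are hypothetical and chosen for illustration; this is not OrchestKit's own implementation.

```python
import json
import subprocess

# JSON fields requested from `gh pr view --json`.
PR_FIELDS = ["title", "author", "additions", "deletions", "changedFiles", "baseRefName"]


def build_pr_view_command(pr_number: int) -> list[str]:
    """Build the gh invocation that returns PR metadata as JSON."""
    return ["gh", "pr", "view", str(pr_number), "--json", ",".join(PR_FIELDS)]


def summarize_pr(pr_number: int, raw_json: str) -> str:
    """Format gh's JSON output like the header shown above (illustrative only)."""
    pr = json.loads(raw_json)
    return (
        f'PR #{pr_number}: "{pr["title"]}"\n'
        f'  Author: @{pr["author"]["login"]}  |  '
        f'+{pr["additions"]} / -{pr["deletions"]}  |  '
        f'{pr["changedFiles"]} files  |  base: {pr["baseRefName"]}'
    )


def fetch_pr_summary(pr_number: int) -> str:
    """Run gh and return the formatted summary (requires gh on PATH)."""
    result = subprocess.run(
        build_pr_view_command(pr_number), capture_output=True, text=True, check=True
    )
    return summarize_pr(pr_number, result.stdout)
```

Keeping the command builder and the formatter separate from the subprocess call makes both easy to test without network access.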


Step 2: Six Parallel Agents Analyze the Diff

/ork:review-pr 42 (Full)
    |
    |  Fetches diff: gh pr diff 42
    |  Fetches files: 9 changed files
    |
    |---> code-quality-reviewer ----------------------------+
    |      Skills: code-review-playbook, clean-architecture  |
    |      Focus: naming, complexity, DRY, patterns          |
    |                                                        |
    |---> security-auditor ---------------------------------+|
    |      Skills: owasp-top-10, defense-in-depth,          ||
    |              input-validation, security-scanning       ||
    |      Focus: injection, auth bypass, exposed secrets    ||
    |                                                       ||
    |---> test-generator -----------------------------------+||
    |      Skills: pytest-advanced, integration-testing     |||
    |      Focus: missing coverage, edge cases, mocks       |||
    |                                                      |||
    |---> performance-engineer ----------------------------+|||
    |      Skills: performance-optimization, caching-strategies  ||||
    |      Focus: N+1 queries, connection pooling, latency ||||
    |                                                     ||||
    |---> backend-system-architect -----------------------+|||||
    |      Skills: api-design-framework,                 ||||||
    |              error-handling-rfc9457                 ||||||
    |      Focus: API contracts, error responses         ||||||
    |                                                   ||||||
    +---> accessibility-specialist ---------------------+|||||||
           Skills: wcag-patterns, aria-guidelines       ||||||||
           Focus: a11y (skipped -- no frontend files)   ||||||||
                                                        ||||||||
    <----------- Results synthesized <------------------++++++++

Each agent receives the parts of the diff relevant to its focus (see "How the Review Diff is Distributed" below) plus relevant skill context. The accessibility-specialist detects that all 9 changed files are backend Python files, reports "no frontend changes detected", and completes in under a second. The remaining five agents work in parallel.
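The fan-out and synthesis pattern in the diagram can be sketched with asyncio. This is an illustrative model only: the agent names come from this page, but the coroutine body, the result shape, and the frontend-file check are assumptions, not OrchestKit internals.

```python
import asyncio

AGENTS = [
    "code-quality-reviewer",
    "security-auditor",
    "test-generator",
    "performance-engineer",
    "backend-system-architect",
    "accessibility-specialist",
]

# Hypothetical list of extensions treated as frontend files.
FRONTEND_SUFFIXES = (".tsx", ".jsx", ".html", ".css")


async def run_agent(name: str, diff: str, has_frontend: bool) -> dict:
    """Stand-in for one agent; a real agent would analyze the diff it is given."""
    if name == "accessibility-specialist" and not has_frontend:
        return {"agent": name, "status": "skipped", "note": "no frontend changes detected"}
    await asyncio.sleep(0)  # placeholder for the actual analysis
    return {"agent": name, "status": "done", "note": f"analyzed {len(diff)} characters"}


async def run_review(diff: str, changed_files: list[str]) -> list[dict]:
    """Launch all agents concurrently and collect results in AGENTS order."""
    has_frontend = any(f.endswith(FRONTEND_SUFFIXES) for f in changed_files)
    tasks = [run_agent(name, diff, has_frontend) for name in AGENTS]
    return await asyncio.gather(*tasks)
```

Calling asyncio.run(run_review(diff_text, changed_files)) returns one result per agent. asyncio.gather preserves the order of the tasks it was given, so the results line up with AGENTS.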

The pr-size-warning hook fires immediately when it counts 420 changed lines. It injects a note into the review context: "Large PR (420 lines). Consider whether this should have been split into smaller PRs." This note appears in the final review output.
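The size check itself is simple arithmetic over the PR's additions and deletions. A minimal sketch follows. The threshold of 400 lines is an assumption for illustration, because this page does not state the value the hook uses.

```python
from typing import Optional

# Hypothetical threshold; the real pr-size-warning threshold is not documented here.
LARGE_PR_THRESHOLD = 400


def pr_size_warning(
    additions: int, deletions: int, threshold: int = LARGE_PR_THRESHOLD
) -> Optional[str]:
    """Return a warning note for large PRs, or None when the PR is small enough."""
    changed = additions + deletions
    if changed < threshold:
        return None
    return (
        f"Large PR ({changed} lines). Consider whether this should have been "
        "split into smaller PRs."
    )
```

For the example PR, pr_size_warning(312, 108) counts 420 changed lines and returns the note quoted above.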


Step 3: Review Results

After all agents complete (typically 30-90 seconds for a full review), OrchestKit synthesizes their findings into a structured report:

PR #42 Review — "Add Stripe payment processing endpoint"
=====================================================

Quality:      8/10
Security:     6/10   <- 1 P0, 2 P1 findings
Tests:        7/10   3 missing test cases identified
Performance:  9/10   No N+1 queries detected
Design:       8/10   1 API contract suggestion
A11y:         --     No frontend changes

------------------------------------------------------

SECURITY (security-auditor)

  [P0] Webhook signature not verified
       File: app/payments/webhook.py:34
       The Stripe webhook handler processes events without
       verifying the signature header. An attacker could forge
       webhook payloads to credit accounts.
       Fix: Use stripe.Webhook.construct_event() with your
       webhook signing secret.

  [P1] Stripe secret key in default parameter
       File: app/payments/config.py:12
       STRIPE_SECRET_KEY has a default value of "sk_test_..."
       in the config. This will leak to version control.
       Fix: Remove the default. Require it via env variable
       with no fallback.

  [P1] No idempotency key on charge creation
       File: app/payments/service.py:67
       stripe.PaymentIntent.create() is called without an
       idempotency_key. Network retries could create duplicate
       charges.
       Fix: Accept an idempotency key from the client or
       generate one from the order ID.

------------------------------------------------------

QUALITY (code-quality-reviewer)

  [Suggestion] Extract Stripe client initialization
       File: app/payments/service.py:15-22
       Stripe is initialized inline in three methods. Extract
       to a shared dependency or lifespan handler.

  [Suggestion] PaymentService has 8 methods (complexity: medium)
       Consider splitting webhook handling into a separate
       WebhookService class.

------------------------------------------------------

TESTS (test-generator)

  Missing test cases:
    1. Webhook with invalid signature -> should return 400
    2. Duplicate payment intent (idempotency) -> should not double-charge
    3. Stripe API timeout -> should return 502 with retry-after

  Coverage estimate: 74% (target: 85%)

------------------------------------------------------

DESIGN (backend-system-architect)

  [Suggestion] POST /payments/charge returns 200 on success
       Recommend 201 Created with a Location header pointing
       to the payment resource: /payments/{payment_id}

------------------------------------------------------

PR SIZE WARNING: 420 lines across 9 files. Consider splitting
webhook handling into a separate PR for easier review.
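The two P1 fixes in the report can be sketched with the standard library alone. The helper names and the namespace URL below are hypothetical. The derived key is intended to be passed as the idempotency_key argument to stripe.PaymentIntent.create(), as the finding suggests.

```python
import os
import uuid
from typing import Mapping, Optional


def load_stripe_secret_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read STRIPE_SECRET_KEY with no fallback, failing fast when it is missing."""
    env = os.environ if env is None else env
    key = env.get("STRIPE_SECRET_KEY")
    if not key:
        raise RuntimeError("STRIPE_SECRET_KEY is not set")
    return key


# Hypothetical namespace so keys from this service do not collide with other UUIDs.
_PAYMENTS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/payments")


def idempotency_key_for_order(order_id: str) -> str:
    """Derive a stable key from the order ID so retries reuse the same key."""
    return str(uuid.uuid5(_PAYMENTS_NAMESPACE, f"order:{order_id}"))
```

Because the key is deterministic, a network retry for the same order sends the same key, which lets Stripe recognize the request as a repeat instead of creating a second charge.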

Step 4: Post to GitHub

OrchestKit asks whether to post the review as a PR comment:

Post this review as a comment on PR #42? [Y/n]
> Y

Review posted to PR #42
  https://github.com/your-org/your-repo/pull/42#issuecomment-1234567

The comment uses GitHub markdown with collapsible sections for each category, so it does not overwhelm the PR conversation. Security P0 findings are always expanded and highlighted.
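Collapsible sections in a GitHub comment are built from HTML details and summary elements. The sketch below shows one way to produce them. The rule that a section starts expanded when it contains a "[P0]" finding is an assumption modeled on the behavior described above, not OrchestKit's actual renderer.

```python
def render_section(title: str, body: str, expanded: bool = False) -> str:
    """Wrap one review category in a collapsible <details> block."""
    open_attr = " open" if expanded else ""
    # Blank lines around the body let GitHub render markdown inside <details>.
    return f"<details{open_attr}>\n<summary>{title}</summary>\n\n{body}\n\n</details>"


def render_comment(sections: dict[str, str]) -> str:
    """Join all sections; any section containing a P0 finding starts expanded."""
    blocks = [
        render_section(title, body, expanded="[P0]" in body)
        for title, body in sections.items()
    ]
    return "\n\n".join(blocks)
```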

P0 security findings block the review with a "Changes Requested" status. P1 findings are flagged but do not block. P2 findings are informational suggestions. The webhook signature vulnerability in this example is a P0 -- the PR should not merge until it is fixed.
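To make the P0 concrete, here is a standard-library sketch of the kind of check that stripe.Webhook.construct_event() performs, based on Stripe's documented signing scheme: an HMAC-SHA256 over "timestamp.payload", compared against the v1 values in the Stripe-Signature header. It is for illustration only. In the application, prefer the library call named in the finding. The function names and the 300-second tolerance are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def compute_signature(payload: bytes, timestamp: int, secret: str) -> str:
    """HMAC-SHA256 of '<timestamp>.<payload>' keyed with the signing secret."""
    signed_payload = f"{timestamp}.".encode() + payload
    return hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()


def verify_signature(
    payload: bytes,
    header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Return True only for a fresh payload signed with the expected secret."""
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # reject stale timestamps to limit replay attacks
    expected = compute_signature(payload, int(timestamp), secret).encode()
    # compare_digest avoids leaking information through comparison timing.
    return any(hmac.compare_digest(expected, c.encode()) for c in candidates)
```

A handler that skips this check accepts any JSON body as a genuine event, which is why the finding blocks the merge.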


Behind the Scenes

How the Review Diff is Distributed

OrchestKit does not send the entire diff to every agent. It routes files according to each agent's focus:

| Agent | Files Received | Rationale |
|---|---|---|
| security-auditor | All 9 files | Security must see everything |
| code-quality-reviewer | All 9 files | Style applies everywhere |
| test-generator | Test files + source files they test | Needs both to assess coverage |
| performance-engineer | Service + migration files | Where query and latency issues live |
| backend-system-architect | Router + schema files | API surface area |
| accessibility-specialist | Frontend files only | Skipped when none exist |
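The routing in the table can be approximated with glob patterns per agent. The patterns and the example file paths below are hypothetical. In particular, the test-generator rule is simplified to "tests plus application source" rather than matching each test to the source file it covers.

```python
from fnmatch import fnmatch

# Hypothetical patterns; fnmatch's "*" also matches "/" so "*" means every file.
ROUTES = {
    "security-auditor": ["*"],
    "code-quality-reviewer": ["*"],
    "test-generator": ["tests/*", "app/*"],
    "performance-engineer": ["*service*", "*migration*"],
    "backend-system-architect": ["*router*", "*schema*"],
    "accessibility-specialist": ["*.tsx", "*.jsx", "*.html", "*.css"],
}


def route_files(changed_files: list[str]) -> dict[str, list[str]]:
    """Map each agent to the changed files matching any of its patterns."""
    return {
        agent: [f for f in changed_files if any(fnmatch(f, p) for p in patterns)]
        for agent, patterns in ROUTES.items()
    }
```

An agent whose list comes back empty, such as accessibility-specialist on a backend-only PR, can be skipped without running.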

Hooks That Fired

| Hook | When | What It Did |
|---|---|---|
| pr-size-warning | PR fetched | Detected 420 lines, injected size warning into review |
| skill-suggester | Review started | Detected "payment" and "Stripe" keywords, injected input-validation and api-design-framework reference skills |
| security-command-audit | After security scan | Logged the P0 finding to session metrics and audit trail |
| auto-remember-continuity | Review complete | Stored "PR #42 has unverified webhook signatures" in memory for follow-up |

Security Severity Levels

Each security finding is assigned a severity that maps to review actions:

| Severity | Meaning | Review Action |
|---|---|---|
| P0 | Exploitable vulnerability | Changes Requested -- blocks merge |
| P1 | Real issue, requires specific conditions | Flagged -- fix before production |
| P2 | Hardening suggestion | Informational -- consider improving |
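The mapping from severity to review action amounts to a lookup plus one blocking rule. A minimal sketch, with hypothetical function names; the action strings are taken from the table above.

```python
REVIEW_ACTIONS = {
    "P0": "Changes Requested -- blocks merge",
    "P1": "Flagged -- fix before production",
    "P2": "Informational -- consider improving",
}


def blocks_merge(severities: list[str]) -> bool:
    """A review blocks the merge only when at least one finding is P0."""
    return "P0" in severities


def summarize_actions(severities: list[str]) -> list[str]:
    """List the action for each finding, most severe first."""
    return [f"{s}: {REVIEW_ACTIONS[s]}" for s in sorted(severities)]
```

For the example PR, the security findings are one P0 and two P1s, so blocks_merge(["P0", "P1", "P1"]) is True. Once the P0 is fixed, the remaining P1s are flagged without blocking.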

Skills Auto-Injected

Each agent received its standard skill set plus context-specific skills detected by the skill-suggester hook:

  • code-review-playbook -- Structured review methodology
  • owasp-top-10 -- Common web vulnerability patterns
  • defense-in-depth -- Layered security approach
  • api-design-framework -- REST conventions, status codes, error formats
  • error-handling-rfc9457 -- Problem Details standard
  • pytest-advanced -- Test patterns, fixtures, parametrize
  • performance-optimization -- N+1 detection, bundle analysis
  • clean-architecture -- Separation of concerns, dependency boundaries

Tips

Use "Quick" for small PRs. If the PR is under 100 lines and touches well-tested code, the Quick review (quality + tests only) runs in under 15 seconds. Save the Full review for complex or sensitive changes.

Re-review after fixes. After the author pushes fixes for the P0 finding, run /ork:review-pr 42 again. OrchestKit fetches the updated diff and verifies the fixes are correct. It also checks that the fix did not introduce new issues.

Security findings use severity levels intentionally. P0 means "do not merge" -- the code has a vulnerability that could be exploited. P1 means "fix before production" -- the issue is real but requires specific conditions. P2 means "consider improving" -- a hardening suggestion rather than a vulnerability.

Combine with /ork:verify for pre-merge confidence. Run /ork:review-pr for the review comment, then /ork:verify locally to confirm tests pass and the security scan is clean. This catches issues that a diff-only review cannot detect, such as tests that pass individually but fail together.
