OrchestKit v6.7.1 — 67 skills, 38 agents, 77 hooks with Opus 4.6 support

Web Research Workflow

Unified decision tree for web research and competitive monitoring. Auto-selects WebFetch, Tavily, or agent-browser based on target site characteristics and available API keys. Includes competitor page tracking, snapshot diffing, and change alerting. Use when researching web content, scraping, extracting raw markdown, capturing documentation, or monitoring competitor changes.

Reference low

Primary Agent: web-research-analyst

Web Research Workflow

Unified approach for web content research that automatically selects the right tool for each situation.

Quick Decision Tree

URL to research
      │
      ▼
┌─────────────────┐
│ 1. Try WebFetch │ ← Fast, free, no overhead
│    (always try) │
└─────────────────┘
      │
Content OK? ──Yes──► Parse and return
      │
      No (empty/partial/<500 chars)
      │
      ▼
┌────────────────────────┐
│ 2. TAVILY_API_KEY set? │
└────────────────────────┘
     │          │
    Yes         No ──► Skip to step 3
     │
     ▼
┌───────────────────────────┐
│ Tavily search/extract/    │ ← Raw markdown, batch URLs
│ crawl/research            │
└───────────────────────────┘
      │
Content OK? ──Yes──► Parse and return
      │
      No (JS-rendered/auth-required)
      │
      ▼
┌──────────────────────┐
│ 3. Use agent-browser │ ← Full browser, last resort
└──────────────────────┘
  │
  ├─ SPA (react/vue/angular) ──► wait --load networkidle
  ├─ Login required ──► auth flow + state save
  ├─ Dynamic content ──► wait --text "Expected"
  └─ Multi-page ──► crawl pattern

Tavily Enhanced Research

When TAVILY_API_KEY is set, Tavily provides a powerful middle tier between WebFetch and agent-browser. It returns raw markdown content, supports batch URL extraction, and offers semantic search with relevance scoring. If TAVILY_API_KEY is not set, the 3-tier tree collapses to 2-tier (WebFetch → agent-browser) automatically.

See Tool Selection for when-to-use-what tables, escalation heuristics, SPA detection patterns, and cost awareness.

See Tavily API Reference for Search, Extract, Map, Crawl, and Research endpoint examples and options.

Browser Patterns

For content requiring JavaScript rendering, authentication, or multi-page crawling, fall back to agent-browser.

See Browser Patterns for auto-fallback, authentication flow, multi-page research patterns, best practices, and troubleshooting.

Competitive Monitoring

Track competitor websites for changes in pricing, features, positioning, and content.

See Competitor Page Monitoring for snapshot capture, structured data extraction, and change classification.

See Change Detection & Discovery for diff detection, structured comparison, Tavily site discovery, and CI automation.

Change Classification

Severity   Examples                                        Action
Critical   Price increase/decrease, major feature change   Immediate alert
High       New feature added, feature removed              Review required
Medium     Copy changes, positioning shift                 Note for analysis
Low        Typos, minor styling                            Log only

Integration with Agents

This skill is used by:

  • web-research-analyst - Primary user
  • market-intelligence - Competitor research
  • product-strategist - Deep competitive analysis
  • ux-researcher - Design system capture
  • documentation-specialist - API doc extraction
  • browser-content-capture - Detailed browser patterns
  • agent-browser - CLI reference

Version: 1.3.0 (February 2026)


Rules (4)

Use fallback browser automation patterns when WebFetch and Tavily are insufficient — MEDIUM

Browser Patterns

Pattern 1: Auto-Fallback

Try WebFetch first, fall back to browser if needed:

# Step 1: Try WebFetch
WebFetch(url="https://example.com", prompt="Extract main content")

# If result is empty, partial, or contains "Loading..." indicators:
# Step 2: Fall back to browser
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser get text body

Pattern 2: Authentication Flow

For login-protected content:

# 1. Navigate to login
agent-browser open https://app.example.com/login
agent-browser snapshot -i

# 2. Fill credentials (use refs from snapshot)
agent-browser fill @e1 "$EMAIL"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3  # Submit button

# 3. Wait for redirect
agent-browser wait --url "**/dashboard"

# 4. Save session for reuse
agent-browser state save /tmp/session-example.json

# 5. Later: restore session
agent-browser state load /tmp/session-example.json
agent-browser open https://app.example.com/protected-page

Pattern 3: Multi-Page Research

For documentation sites or multi-page content:

# 1. Get navigation links
agent-browser open https://docs.example.com
agent-browser snapshot -i

# 2. Extract all doc links
LINKS=$(agent-browser eval "JSON.stringify(
  Array.from(document.querySelectorAll('nav a'))
    .map(a => a.href)
    .filter(h => h.includes('/docs/'))
)")

# 3. Iterate with rate limiting
for link in $(echo "$LINKS" | jq -r '.[]' | head -20); do
  agent-browser open "$link"
  agent-browser wait --load networkidle
  agent-browser get text article > "/tmp/doc-$(basename "$link").txt"
  sleep 2  # Rate limit
done

Best Practices

Use Appropriate Waits

# For SPAs with API calls
agent-browser wait --load networkidle

# For specific content
agent-browser wait --text "Expected content"

# For elements
agent-browser wait @e5

Respect Rate Limits

# Add delays between requests
sleep 2

# Use session isolation for parallel work
agent-browser --session site1 open https://site1.com
agent-browser --session site2 open https://site2.com

Cache Results

# Save extracted content to avoid re-scraping
agent-browser get text body > /tmp/cache/example-com.txt
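A freshness check can decide when a cached extraction is safe to reuse. This is a sketch: the 24-hour TTL, the cache path, and the function name `cache_is_fresh` are illustrative assumptions, not part of agent-browser.

```shell
# Reuse a cached extraction unless it is older than the TTL.
# The 24h default TTL and the cache path are assumptions; tune per site.
cache_is_fresh() {
  cache_file="$1"
  max_age="${2:-86400}"  # seconds; default 24h
  [ -f "$cache_file" ] || return 1
  # stat -c %Y works on GNU; stat -f %m is the BSD/macOS fallback
  mtime=$(stat -c %Y "$cache_file" 2>/dev/null || stat -f %m "$cache_file")
  age=$(( $(date +%s) - mtime ))
  [ "$age" -le "$max_age" ]
}

if cache_is_fresh /tmp/cache/example-com.txt; then
  echo "cache hit: skip re-scrape"
else
  echo "cache miss: scrape and refresh"
fi
```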

Troubleshooting

Issue                          Solution
Empty content from WebFetch    Try Tavily extract, then agent-browser
WebFetch returns <500 chars    Escalate to Tavily extract if API key set
Partial content                Use wait --text "Expected" for specific content
Need batch URL extraction      Tavily extract (up to 20 URLs at once)
403 Forbidden                  May need authentication flow (agent-browser)
CAPTCHA                        Manual intervention required
Rate limited                   Add delays, reduce request frequency
Content in iframe              Use agent-browser frame @e1 then extract
No TAVILY_API_KEY              Skip Tavily tier, use WebFetch → agent-browser

Incorrect — Browser without waiting for content:

agent-browser open https://spa-app.com
agent-browser get text body
# Returns empty - content not loaded yet

Correct — Wait for network idle before extracting:

agent-browser open https://spa-app.com
agent-browser wait --load networkidle
agent-browser get text body
# Content fully loaded

Automate change detection and site discovery to catch competitor changes immediately — HIGH

Change Detection and Site Discovery

Detect differences between snapshots and discover competitor pages for monitoring.

Incorrect -- manual visual comparison:

# Open two browser tabs and visually compare
# No structured diff, easy to miss changes
# No record of what changed or when

Correct -- automated diff with structured comparison:

# Text diff between latest two snapshots
LATEST=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.txt | head -1)
PREVIOUS=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.txt | head -2 | tail -1)

diff -u "$PREVIOUS" "$LATEST" > \
  .competitive-intel/diffs/competitor-pricing-$(date +%Y%m%d).diff

# Structured JSON comparison for pricing changes
LATEST_JSON=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.json | head -1)
PREVIOUS_JSON=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.json | head -2 | tail -1)

jq -s '
  .[0] as $old | .[1] as $new |
  {
    price_changes: [
      $new[] | . as $tier |
      ($old[] | select(.name == $tier.name)) as $old_tier |
      select($old_tier.price != $tier.price) |
      {name: .name, old_price: $old_tier.price, new_price: .price}
    ],
    new_tiers: [$new[] | select(.name as $n | $old | map(.name) | index($n) | not)],
    removed_tiers: [$old[] | select(.name as $n | $new | map(.name) | index($n) | not)]
  }
' "$PREVIOUS_JSON" "$LATEST_JSON"

Site discovery with Tavily (when TAVILY_API_KEY is set):

# Crawl competitor site -- discovers URLs and extracts content
# Use Tavily crawl API with include_paths filter
# Save results to .competitive-intel/snapshots/ directory
# See web-research-workflow SKILL.md for full Tavily crawl examples
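One way to make the crawl step concrete is to build the request payload with jq (which guarantees valid JSON) before sending it. The competitor URL and the include_paths patterns below are illustrative; the curl call is left commented out because it needs network access and TAVILY_API_KEY.

```shell
# Build a Tavily crawl payload; URL and path filters are example values.
PAYLOAD=$(jq -n '{
  url: "https://competitor.com",
  max_depth: 2,
  limit: 50,
  include_raw_content: "markdown",
  include_paths: ["/pricing*", "/features*"]
}')
echo "$PAYLOAD"
# curl -s -X POST 'https://api.tavily.com/crawl' \
#   -H 'Content-Type: application/json' \
#   -H "Authorization: Bearer $TAVILY_API_KEY" \
#   -d "$PAYLOAD" > .competitive-intel/snapshots/crawl-$(date +%Y%m%d).json
```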

Automated monitoring via CI:

# .github/workflows/competitive-monitor.yml
name: Competitive Monitor
on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 9 AM
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g agent-browser
      - run: agent-browser install
      - run: ./scripts/run-competitive-monitor.sh
      - uses: actions/upload-artifact@v4
        with:
          name: competitive-intel
          path: .competitive-intel/

Key rules:

  • Run structured JSON diffs, not just text diffs, to detect price and feature changes precisely
  • Use Tavily crawl for initial site discovery when API key is available, fall back to manual URL lists
  • Store diffs with timestamps for change history and trend analysis
  • Automate monitoring in CI with daily schedules for consistent coverage

Monitor competitor pages for pricing and feature changes to enable timely strategic response — HIGH

Competitor Page Monitoring

Capture point-in-time snapshots of competitor pages for pricing, features, and positioning tracking.

Incorrect -- manual one-off checks with no history:

# Check competitor pricing once, no record kept
# No snapshot, no history, no comparison baseline

Correct -- structured snapshot capture with change classification:

# Create snapshots directory
mkdir -p .competitive-intel/snapshots

# Capture text content
agent-browser open https://competitor.com/pricing
agent-browser wait --load networkidle
agent-browser get text body > \
  .competitive-intel/snapshots/competitor-pricing-$(date +%Y%m%d).txt

# Extract structured pricing data
agent-browser eval "JSON.stringify(
  Array.from(document.querySelectorAll('.pricing-tier')).map(tier => ({
    name: tier.querySelector('h3')?.innerText,
    price: tier.querySelector('.price')?.innerText,
    features: Array.from(tier.querySelectorAll('li')).map(li => li.innerText)
  }))
)" > .competitive-intel/snapshots/competitor-pricing-$(date +%Y%m%d).json

agent-browser close

Change classification:

Severity   Examples                                        Action
Critical   Price increase/decrease, major feature change   Immediate alert
High       New feature added, feature removed              Review required
Medium     Copy changes, positioning shift                 Note for analysis
Low        Typos, minor styling                            Log only
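The classification can be mechanized as a simple lookup. The category names (price_change, feature_add, and so on) are hypothetical labels that an upstream diff-analysis step might emit; only the severity/action pairs come from the table above.

```shell
# Map a change category to severity:action, mirroring the table above.
# Category names are hypothetical labels for an upstream diff step.
classify_change() {
  case "$1" in
    price_change|major_feature_change) echo "critical:immediate-alert" ;;
    feature_add|feature_remove)        echo "high:review-required" ;;
    copy_change|positioning_shift)     echo "medium:note-for-analysis" ;;
    typo|styling)                      echo "low:log-only" ;;
    *)                                 echo "unknown:manual-review" ;;
  esac
}

classify_change price_change   # critical:immediate-alert
classify_change copy_change    # medium:note-for-analysis
```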

Config file for monitoring targets:

{
  "monitors": [
    {
      "name": "Competitor A Pricing",
      "url": "https://competitor-a.com/pricing",
      "frequency": "daily",
      "selectors": {
        "pricing": ".pricing-tier",
        "features": ".feature-list li"
      },
      "alerts": {
        "price_change": "critical",
        "new_feature": "high",
        "copy_change": "low"
      }
    }
  ],
  "storage": ".competitive-intel",
  "retention_days": 90
}
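A monitoring runner can iterate this config with jq. The file name (monitors.json, written to /tmp here so the sketch is self-contained) is an assumption; the capture step itself is reduced to an echo.

```shell
# Write a minimal copy of the config above, then iterate its monitors.
cat > /tmp/monitors.json <<'EOF'
{"monitors":[{"name":"Competitor A Pricing","url":"https://competitor-a.com/pricing","frequency":"daily"}]}
EOF

jq -c '.monitors[]' /tmp/monitors.json | while IFS= read -r mon; do
  name=$(echo "$mon" | jq -r '.name')
  url=$(echo "$mon" | jq -r '.url')
  # Real runner would snapshot $url here (agent-browser or Tavily extract)
  echo "capture: $name -> $url"
done
```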

Key rules:

  • Always save both text and structured (JSON) snapshots for each capture
  • Use consistent date-stamped filenames for chronological comparison
  • Classify changes by severity to prioritize strategic response
  • Store findings in knowledge graph for cross-session persistence

Choose between WebFetch, Tavily, and browser tools using the decision matrix — HIGH

Tool Selection Rules

When to Use Tavily over WebFetch

  • WebFetch returned <500 chars (likely incomplete)
  • You need raw markdown content (not Haiku-summarized)
  • Batch extracting content from multiple URLs
  • Semantic search with relevance scoring
  • Site discovery/crawling (map API)

When to Skip Tavily and Go to agent-browser

  • Content requires JavaScript rendering (SPAs)
  • Authentication/login is required
  • Interactive elements need clicking
  • Content is behind CAPTCHAs

Scenario-Based Selection

Scenario                           Tool             Why
Static HTML page                   WebFetch         Fast, no browser needed
Public API docs                    WebFetch         Usually server-rendered
GitHub README                      WebFetch         Static content
Batch URL extraction (2-20 URLs)   Tavily Extract   Parallel, raw markdown
Semantic search with content       Tavily Search    Relevance-scored results
Site crawl/discovery (URLs only)   Tavily Map       Finds all URLs on a domain
Full site content extraction       Tavily Crawl     Crawl + extract in one call
Competitor page deep content       Tavily Extract   Full markdown, not summarized
Deep multi-source research         Tavily Research  Synthesized report with citations (beta)
React/Vue/Angular app              agent-browser    Needs JS execution
Interactive pricing page           agent-browser    Dynamic content
Login-protected content            agent-browser    Needs session state
Swagger UI                         agent-browser    Client-rendered
Documentation with sidebar nav     agent-browser    Client-side routing

Detection Heuristics

Content likely needs browser if WebFetch returns:

  • Empty or very short content (< 500 chars)
  • Contains <noscript> tags
  • Contains "Loading...", "Please wait", "JavaScript required"
  • Contains only <div id="root"></div> or <div id="app"></div>
  • Returns 403/401 (may need auth)
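The bullets above can be sketched as a predicate over the fetched HTML. This is a heuristic approximation, not an exhaustive detector; the function name and 500-character threshold follow the text above.

```shell
# Returns 0 (true) when fetched content likely needs a real browser.
needs_browser() {
  html="$1"
  [ "${#html}" -lt 500 ] && return 0                        # thin content
  case "$html" in
    *"<noscript>"*|*"Loading..."*|*"JavaScript required"*) return 0 ;;
  esac
  printf '%s' "$html" | grep -Eq '<div id="(root|app)"></div>' && return 0
  return 1
}

needs_browser '<div id="root"></div>' && echo "escalate to agent-browser"
```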

SPA Detection

Known patterns that always need browser:

# URL patterns suggesting SPA
app.* | dashboard.* | portal.* | console.*

# Framework indicators in initial HTML
"__NEXT_DATA__" Next.js (may work with WebFetch)
"window.__NUXT__" Nuxt.js (may work with WebFetch)
"ng-app" Angular (needs browser)
"data-reactroot" React (needs browser)
"data-v-" Vue (needs browser)

Escalation Heuristic

# Auto-escalate from WebFetch to Tavily (pseudocode: WebFetch is the
# agent tool, written here as if it were a shell command)
CONTENT=$(WebFetch url="$URL" prompt="Extract main content")
CHAR_COUNT=${#CONTENT}

if [ "$CHAR_COUNT" -lt 500 ] && [ -n "$TAVILY_API_KEY" ]; then
  # WebFetch returned thin content — try Tavily extract
  RESULT=$(curl -s -X POST 'https://api.tavily.com/extract' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $TAVILY_API_KEY" \
    -d "{\"urls\":[\"$URL\"]}")
  # Parse result...
fi

Cost Awareness

API              Cost                Notes
Tavily Search    1 credit/search     advanced depth = 2 credits
Tavily Extract   1 credit/5 URLs     Batch up to 20 URLs
Tavily Map       1 credit/10 pages   Good for site discovery
Tavily Crawl     1 credit/5 pages    Full site extraction in one call
Tavily Research  5 credits/query     Deep multi-source synthesis (beta)
WebFetch         Free                Always try first
agent-browser    Free (compute)      Slowest, most capable
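The per-page rates translate into quick budget estimates. Rounding a partial batch up to a whole credit is an assumption about how Tavily bills, so treat this as an upper-bound sketch.

```shell
# Estimate crawl credits from the table above: 1 credit per 5 pages,
# rounding partial batches up (a billing assumption).
crawl_credits() {
  pages="$1"
  echo $(( (pages + 4) / 5 ))
}

crawl_credits 37   # a 37-page crawl costs about 8 credits
```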

Graceful Degradation

If TAVILY_API_KEY is not set, the 3-tier tree collapses to the original 2-tier (WebFetch → agent-browser). No configuration needed — agents check for the env var before attempting Tavily calls.
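The collapse can be expressed as a tier list keyed on the env var; the function name `select_tiers` is illustrative, but the check mirrors what agents do before attempting Tavily calls.

```shell
# Ordered tool list, collapsing to 2 tiers when the key is absent.
select_tiers() {
  if [ -n "${TAVILY_API_KEY:-}" ]; then
    echo "WebFetch Tavily agent-browser"
  else
    echo "WebFetch agent-browser"
  fi
}

TAVILY_API_KEY="tvly-xxxx"
select_tiers   # WebFetch Tavily agent-browser
unset TAVILY_API_KEY
select_tiers   # WebFetch agent-browser
```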

Incorrect — Always using browser for everything:

agent-browser open https://github.com/owner/repo/blob/main/README.md
agent-browser wait --load networkidle
agent-browser get text body
# Slow, unnecessary - static content

Correct — Use WebFetch for static content:

WebFetch(
  url="https://github.com/owner/repo/blob/main/README.md",
  prompt="Extract README content"
)
# Fast, no browser needed

References (1)

Tavily Api

Tavily API Reference

Returns relevance-scored results with raw markdown content:

curl -s -X POST 'https://api.tavily.com/search' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "query": "your search query",
    "search_depth": "advanced",
    "max_results": 5,
    "include_raw_content": "markdown"
  }' | python3 -m json.tool

Options:

  • search_depth: "basic" (fast) or "advanced" (thorough, 2x cost)
  • topic: "general" (default) or "news" or "finance"
  • include_domains: ["example.com"] — restrict to specific sites
  • exclude_domains: ["reddit.com"] — filter out sites
  • days: 3 — limit to recent results (news/finance)
  • include_raw_content: "markdown" — get full page content

Extract (Batch URL Content)

Extract raw content from up to 20 URLs at once:

curl -s -X POST 'https://api.tavily.com/extract' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "urls": [
      "https://docs.example.com/guide",
      "https://competitor.com/pricing"
    ]
  }' | python3 -m json.tool

Returns markdown content for each URL. Use when you have specific URLs and need full content.
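Parsing the response is a jq one-liner. A canned response stands in for the live call here; the results[].url and results[].raw_content field names follow Tavily's documented response shape, but verify them against your API version.

```shell
# Canned extract response (stands in for a live API call).
RESPONSE='{"results":[{"url":"https://docs.example.com/guide","raw_content":"# Guide\nIntro text"}],"failed_results":[]}'

# Summarize what came back
echo "$RESPONSE" | jq -r '.results[] | "\(.url): \(.raw_content | length) chars"'

# Save each page to its own markdown file, slugging the URL for the name
echo "$RESPONSE" | jq -c '.results[]' | while IFS= read -r page; do
  slug=$(echo "$page" | jq -r '.url' | sed 's|[^A-Za-z0-9]|-|g')
  echo "$page" | jq -r '.raw_content' > "/tmp/extract-$slug.md"
done
```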

Map (Site Discovery)

Discover all URLs on a site before extracting:

curl -s -X POST 'https://api.tavily.com/map' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "limit": 50
  }' | python3 -m json.tool

Useful for documentation sites and competitor sitemaps. Combine with extract for full crawl.
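Chaining map into extract is mostly a jq transform: slice the discovered URLs to the 20-URL extract batch limit and wrap them in a payload. The map response shape ({"results": ["url", ...]}) is an assumption; check it against your Tavily API version. The curl call is commented out since it needs network access.

```shell
# Canned map response (stands in for a live /map call).
MAP_RESPONSE='{"results":["https://docs.example.com/a","https://docs.example.com/b","https://docs.example.com/c"]}'

# Batch the first 20 URLs into one extract payload
EXTRACT_PAYLOAD=$(echo "$MAP_RESPONSE" | jq '{urls: .results[:20]}')
echo "$EXTRACT_PAYLOAD"
# curl -s -X POST 'https://api.tavily.com/extract' \
#   -H 'Content-Type: application/json' \
#   -H "Authorization: Bearer $TAVILY_API_KEY" \
#   -d "$EXTRACT_PAYLOAD"
```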

Crawl (Multi-Page Content Extraction)

Crawl an entire site and extract content from all discovered pages in one call. Replaces the manual map+extract two-step workflow:

curl -s -X POST 'https://api.tavily.com/crawl' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "url": "https://docs.example.com",
    "max_depth": 2,
    "limit": 50,
    "include_raw_content": "markdown"
  }' | python3 -m json.tool

Options:

  • max_depth: 2 — how many link-hops from the seed URL
  • limit: 50 — max pages to crawl
  • include_raw_content: "markdown" — get full page content (not just snippets)
  • exclude_paths: ["/blog/*"] — skip certain URL patterns
  • include_paths: ["/docs/*"] — restrict to certain URL patterns

When to use Crawl vs Map+Extract:

  • Use Crawl when you want content from an entire site section (docs, changelog, pricing)
  • Use Map when you only need URL discovery without content
  • Use Extract when you already have specific URLs

Research (Deep Research — Beta)

Multi-step research agent that searches, reads, and synthesizes across multiple sources. Returns a comprehensive report with citations:

curl -s -X POST 'https://api.tavily.com/research' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "query": "comparison of vector databases for RAG in 2026",
    "max_results": 10,
    "report_type": "research_report"
  }' | python3 -m json.tool

Options:

  • report_type: "research_report" (default) | "outline_report" | "detailed_report"
  • max_results: number of sources to analyze (more = deeper but slower)

Note: The /research endpoint is in beta. Falls back gracefully to /search + /extract if unavailable. Best for deep competitive analysis, market research, and technical comparisons.
