Web Research Workflow
Unified decision tree for web research and competitive monitoring. Auto-selects WebFetch, Tavily, or agent-browser based on target site characteristics and available API keys. Includes competitor page tracking, snapshot diffing, and change alerting. Use when researching web content, scraping, extracting raw markdown, capturing documentation, or monitoring competitor changes.
Primary Agent: web-research-analyst
Web Research Workflow
Unified approach for web content research that automatically selects the right tool for each situation.
Quick Decision Tree
URL to research
│
▼
┌─────────────────┐
│ 1. Try WebFetch │ ← Fast, free, no overhead
│ (always try) │
└─────────────────┘
│
Content OK? ──Yes──► Parse and return
│
No (empty/partial/<500 chars)
│
▼
┌───────────────────────┐
│ 2. TAVILY_API_KEY set?│
└───────────────────────┘
│ │
Yes No ──► Skip to step 3
│
▼
┌───────────────────────────┐
│ Tavily search/extract/ │ ← Raw markdown, batch URLs
│ crawl/research │
└───────────────────────────┘
│
Content OK? ──Yes──► Parse and return
│
No (JS-rendered/auth-required)
│
▼
┌─────────────────────┐
│ 3. Use agent-browser │ ← Full browser, last resort
└─────────────────────┘
│
└─ Multi-page ──► crawl patternTavily Enhanced Research
When TAVILY_API_KEY is set, Tavily provides a powerful middle tier between WebFetch and agent-browser. It returns raw markdown content, supports batch URL extraction, and offers semantic search with relevance scoring. If TAVILY_API_KEY is not set, the 3-tier tree collapses to 2-tier (WebFetch → agent-browser) automatically.
See Tool Selection for when-to-use-what tables, escalation heuristics, SPA detection patterns, and cost awareness.
See Tavily API Reference for Search, Extract, Map, Crawl, and Research endpoint examples and options.
Browser Patterns
For content requiring JavaScript rendering, authentication, or multi-page crawling, fall back to agent-browser.
See Browser Patterns for auto-fallback, authentication flow, multi-page research patterns, best practices, and troubleshooting.
Competitive Monitoring
Track competitor websites for changes in pricing, features, positioning, and content.
See Competitor Page Monitoring for snapshot capture, structured data extraction, and change classification.
See Change Detection & Discovery for diff detection, structured comparison, Tavily site discovery, and CI automation.
Change Classification
| Severity | Examples | Action |
|---|---|---|
| Critical | Price increase/decrease, major feature change | Immediate alert |
| High | New feature added, feature removed | Review required |
| Medium | Copy changes, positioning shift | Note for analysis |
| Low | Typos, minor styling | Log only |
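The severity tiers in this table map onto a small helper that downstream alerting can call. A minimal sketch in shell; the change-type names are illustrative assumptions, not a fixed schema:

```shell
#!/usr/bin/env sh
# Map a detected change type to a severity tier per the table above.
classify_change() {
  case "$1" in
    price_change|major_feature_change) echo "critical" ;;  # Immediate alert
    feature_added|feature_removed)     echo "high" ;;      # Review required
    copy_change|positioning_shift)     echo "medium" ;;    # Note for analysis
    *)                                 echo "low" ;;       # Log only
  esac
}

classify_change price_change   # critical
classify_change feature_added  # high
```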
Integration with Agents
This skill is used by:
- web-research-analyst: primary user
- market-intelligence: competitor research
- product-strategist: deep competitive analysis
- ux-researcher: design system capture
- documentation-specialist: API doc extraction
Related Skills
- browser-content-capture: detailed browser patterns
- agent-browser: CLI reference
Version: 1.3.0 (February 2026)
Rules (4)
Use fallback browser automation patterns when WebFetch and Tavily are insufficient — MEDIUM
Browser Patterns
Pattern 1: Auto-Fallback
Try WebFetch first, fall back to browser if needed:
# Step 1: Try WebFetch
WebFetch(url="https://example.com", prompt="Extract main content")
# If result is empty, partial, or contains "Loading..." indicators:
# Step 2: Fall back to browser
agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser get text body
Pattern 2: Authentication Flow
For login-protected content:
# 1. Navigate to login
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# 2. Fill credentials (use refs from snapshot)
agent-browser fill @e1 "$EMAIL"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3 # Submit button
# 3. Wait for redirect
agent-browser wait --url "**/dashboard"
# 4. Save session for reuse
agent-browser state save /tmp/session-example.json
# 5. Later: restore session
agent-browser state load /tmp/session-example.json
agent-browser open https://app.example.com/protected-page
Pattern 3: Multi-Page Research
For documentation sites or multi-page content:
# 1. Get navigation links
agent-browser open https://docs.example.com
agent-browser snapshot -i
# 2. Extract all doc links
LINKS=$(agent-browser eval "JSON.stringify(
  Array.from(document.querySelectorAll('nav a'))
    .map(a => a.href)
    .filter(h => h.includes('/docs/'))
)")
# 3. Iterate with rate limiting
for link in $(echo "$LINKS" | jq -r '.[]' | head -20); do
  agent-browser open "$link"
  agent-browser wait --load networkidle
  agent-browser get text article > "/tmp/doc-$(basename $link).txt"
  sleep 2  # Rate limit
done
Best Practices
Use Appropriate Waits
# For SPAs with API calls
agent-browser wait --load networkidle
# For specific content
agent-browser wait --text "Expected content"
# For elements
agent-browser wait @e5
Respect Rate Limits
# Add delays between requests
sleep 2
# Use session isolation for parallel work
agent-browser --session site1 open https://site1.com
agent-browser --session site2 open https://site2.com
Cache Results
# Save extracted content to avoid re-scraping
agent-browser get text body > /tmp/cache/example-com.txt
Troubleshooting
| Issue | Solution |
|---|---|
| Empty content from WebFetch | Try Tavily extract, then agent-browser |
| WebFetch returns <500 chars | Escalate to Tavily extract if API key set |
| Partial content | Use wait --text "Expected" for specific content |
| Need batch URL extraction | Tavily extract (up to 20 URLs at once) |
| 403 Forbidden | May need authentication flow (agent-browser) |
| CAPTCHA | Manual intervention required |
| Rate limited | Add delays, reduce request frequency |
| Content in iframe | Use agent-browser frame @e1 then extract |
| No TAVILY_API_KEY | Skip Tavily tier, use WebFetch → agent-browser |
Incorrect — Browser without waiting for content:
agent-browser open https://spa-app.com
agent-browser get text body
# Returns empty - content not loaded yet
Correct — Wait for network idle before extracting:
agent-browser open https://spa-app.com
agent-browser wait --load networkidle
agent-browser get text body
# Content fully loaded
Automate change detection and site discovery to catch competitor changes immediately — HIGH
Change Detection and Site Discovery
Detect differences between snapshots and discover competitor pages for monitoring.
Incorrect -- manual visual comparison:
# Open two browser tabs and visually compare
# No structured diff, easy to miss changes
# No record of what changed or when
Correct -- automated diff with structured comparison:
# Text diff between latest two snapshots
LATEST=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.txt | head -1)
PREVIOUS=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.txt | head -2 | tail -1)
diff -u "$PREVIOUS" "$LATEST" > \
.competitive-intel/diffs/competitor-pricing-$(date +%Y%m%d).diff
# Structured JSON comparison for pricing changes
LATEST_JSON=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.json | head -1)
PREVIOUS_JSON=$(ls -t .competitive-intel/snapshots/competitor-pricing-*.json | head -2 | tail -1)
jq -s '
  .[0] as $old | .[1] as $new |
  {
    price_changes: [
      $new[] | . as $tier |
      ($old[] | select(.name == $tier.name)) as $old_tier |
      select($old_tier.price != $tier.price) |
      {name: .name, old_price: $old_tier.price, new_price: .price}
    ],
    new_tiers: [$new[] | select(.name as $n | $old | map(.name) | index($n) | not)],
    removed_tiers: [$old[] | select(.name as $n | $new | map(.name) | index($n) | not)]
  }
' "$PREVIOUS_JSON" "$LATEST_JSON"
Site discovery with Tavily (when TAVILY_API_KEY is set):
# Crawl competitor site -- discovers URLs and extracts content
# Use Tavily crawl API with include_paths filter
# Save results to .competitive-intel/snapshots/ directory
# See web-research-workflow SKILL.md for full Tavily crawl examples
Automated monitoring via CI:
# .github/workflows/competitive-monitor.yml
name: Competitive Monitor
on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 9 AM
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g agent-browser
      - run: agent-browser install
      - run: ./scripts/run-competitive-monitor.sh
      - uses: actions/upload-artifact@v4
        with:
          name: competitive-intel
          path: .competitive-intel/
Key rules:
- Run structured JSON diffs, not just text diffs, to detect price and feature changes precisely
- Use Tavily crawl for initial site discovery when API key is available, fall back to manual URL lists
- Store diffs with timestamps for change history and trend analysis
- Automate monitoring in CI with daily schedules for consistent coverage
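The CI workflow above invokes ./scripts/run-competitive-monitor.sh without showing it. Below is a minimal sketch of the diff step such a script could run, assuming the date-stamped .txt snapshots captured elsewhere in this skill; the function name and paths are illustrative:

```shell
#!/usr/bin/env sh
# Diff the two most recent snapshots for one monitored page.
# Usage: detect_changes <snapshot-dir> <prefix>   e.g. competitor-pricing
detect_changes() {
  dir=$1; prefix=$2
  latest=$(ls -t "$dir/$prefix"-*.txt 2>/dev/null | head -1)
  previous=$(ls -t "$dir/$prefix"-*.txt 2>/dev/null | head -2 | tail -1)
  # Fewer than two snapshots: nothing to compare yet
  if [ -z "$latest" ] || [ "$latest" = "$previous" ]; then
    echo "no-baseline"
    return 0
  fi
  if diff -u "$previous" "$latest" > "$dir/$prefix-latest.diff"; then
    echo "unchanged"
  else
    # Non-empty diff: hand off to severity classification / alerting
    echo "changed"
  fi
}
```

The "changed" result would then feed the severity classification table above.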
Monitor competitor pages for pricing and feature changes to enable timely strategic response — HIGH
Competitor Page Monitoring
Capture point-in-time snapshots of competitor pages for pricing, features, and positioning tracking.
Incorrect -- manual one-off checks with no history:
# Check competitor pricing once, no record kept
# No snapshot, no history, no comparison baseline
Correct -- structured snapshot capture with change classification:
# Create snapshots directory
mkdir -p .competitive-intel/snapshots
# Capture text content
agent-browser open https://competitor.com/pricing
agent-browser wait --load networkidle
agent-browser get text body > \
.competitive-intel/snapshots/competitor-pricing-$(date +%Y%m%d).txt
# Extract structured pricing data
agent-browser eval "JSON.stringify(
  Array.from(document.querySelectorAll('.pricing-tier')).map(tier => ({
    name: tier.querySelector('h3')?.innerText,
    price: tier.querySelector('.price')?.innerText,
    features: Array.from(tier.querySelectorAll('li')).map(li => li.innerText)
  }))
)" > .competitive-intel/snapshots/competitor-pricing-$(date +%Y%m%d).json
agent-browser close
Change classification:
| Severity | Examples | Action |
|---|---|---|
| Critical | Price increase/decrease, major feature change | Immediate alert |
| High | New feature added, feature removed | Review required |
| Medium | Copy changes, positioning shift | Note for analysis |
| Low | Typos, minor styling | Log only |
Config file for monitoring targets:
{
  "monitors": [
    {
      "name": "Competitor A Pricing",
      "url": "https://competitor-a.com/pricing",
      "frequency": "daily",
      "selectors": {
        "pricing": ".pricing-tier",
        "features": ".feature-list li"
      },
      "alerts": {
        "price_change": "critical",
        "new_feature": "high",
        "copy_change": "low"
      }
    }
  ],
  "storage": ".competitive-intel",
  "retention_days": 90
}
Key rules:
- Always save both text and structured (JSON) snapshots for each capture
- Use consistent date-stamped filenames for chronological comparison
- Classify changes by severity to prioritize strategic response
- Store findings in knowledge graph for cross-session persistence
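The retention_days value in the config above implies periodic cleanup of old snapshots. A minimal pruning sketch, with the directory layout assumed from the config's storage setting:

```shell
#!/usr/bin/env sh
# Enforce the retention window from the monitoring config.
STORAGE=.competitive-intel
RETENTION_DAYS=90

prune_snapshots() {
  # Delete snapshot and diff files older than the window
  for sub in snapshots diffs; do
    dir="$STORAGE/$sub"
    [ -d "$dir" ] || continue
    find "$dir" -type f -mtime "+$RETENTION_DAYS" -delete
  done
}

prune_snapshots
```

Run it at the end of the CI monitor job so history stays bounded while keeping roughly a quarter of diffs for trend analysis.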
Choose between WebFetch, Tavily, and browser tools using the decision matrix — HIGH
Tool Selection Rules
When to Use Tavily over WebFetch
- WebFetch returned <500 chars (likely incomplete)
- You need raw markdown content (not Haiku-summarized)
- Batch extracting content from multiple URLs
- Semantic search with relevance scoring
- Site discovery/crawling (map API)
When to Skip Tavily and Go to agent-browser
- Content requires JavaScript rendering (SPAs)
- Authentication/login is required
- Interactive elements need clicking
- Content is behind CAPTCHAs
Scenario-Based Selection
| Scenario | Tool | Why |
|---|---|---|
| Static HTML page | WebFetch | Fast, no browser needed |
| Public API docs | WebFetch | Usually server-rendered |
| GitHub README | WebFetch | Static content |
| Batch URL extraction (2-20 URLs) | Tavily Extract | Parallel, raw markdown |
| Semantic search with content | Tavily Search | Relevance-scored results |
| Site crawl/discovery (URLs only) | Tavily Map | Finds all URLs on a domain |
| Full site content extraction | Tavily Crawl | Crawl + extract in one call |
| Competitor page deep content | Tavily Extract | Full markdown, not summarized |
| Deep multi-source research | Tavily Research | Synthesized report with citations (beta) |
| React/Vue/Angular app | agent-browser | Needs JS execution |
| Interactive pricing page | agent-browser | Dynamic content |
| Login-protected content | agent-browser | Needs session state |
| Swagger UI | agent-browser | Client-rendered |
| Documentation with sidebar nav | agent-browser | Client-side routing |
Detection Heuristics
Content likely needs browser if WebFetch returns:
- Empty or very short content (< 500 chars)
- Contains <noscript> tags
- Contains "Loading...", "Please wait", "JavaScript required"
- Contains only <div id="root"></div> or <div id="app"></div>
- Returns 403/401 (may need auth)
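These heuristics can be folded into a single check on whatever HTML WebFetch returned. A sketch; the markers and the 500-character threshold come from the list above:

```shell
#!/usr/bin/env sh
# Decide whether fetched HTML likely needs a real browser.
needs_browser() {
  html=$1
  # Thin content: under 500 characters
  if [ "${#html}" -lt 500 ]; then
    echo yes
    return
  fi
  case "$html" in
    *'<noscript>'*|*'Loading...'*|*'JavaScript required'*|\
    *'<div id="root"></div>'*|*'<div id="app"></div>'*)
      echo yes ;;  # SPA shell or JS-gated content
    *)
      echo no ;;   # Looks like server-rendered HTML
  esac
}
```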
SPA Detection
Known patterns that always need browser:
# URL patterns suggesting SPA
app.* | dashboard.* | portal.* | console.*
# Framework indicators in initial HTML
"__NEXT_DATA__" → Next.js (may work with WebFetch)
"window.__NUXT__" → Nuxt.js (may work with WebFetch)
"ng-app" → Angular (needs browser)
"data-reactroot" → React (needs browser)
"data-v-" → Vue (needs browser)
Escalation Heuristic
# Auto-escalate from WebFetch to Tavily
CONTENT=$(WebFetch url="$URL" prompt="Extract main content")
CHAR_COUNT=${#CONTENT}
if [ "$CHAR_COUNT" -lt 500 ] && [ -n "$TAVILY_API_KEY" ]; then
  # WebFetch returned thin content — try Tavily extract
  RESULT=$(curl -s -X POST 'https://api.tavily.com/extract' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $TAVILY_API_KEY" \
    -d "{\"urls\":[\"$URL\"]}")
  # Parse result...
fi
Cost Awareness
| API | Cost | Notes |
|---|---|---|
| Tavily Search | 1 credit/search | advanced depth = 2 credits |
| Tavily Extract | 1 credit/5 URLs | Batch up to 20 URLs |
| Tavily Map | 1 credit/10 pages | Good for site discovery |
| Tavily Crawl | 1 credit/5 pages | Full site extraction in one call |
| Tavily Research | 5 credits/query | Deep multi-source synthesis (beta) |
| WebFetch | Free | Always try first |
| agent-browser | Free (compute) | Slowest, most capable |
Graceful Degradation
If TAVILY_API_KEY is not set, the 3-tier tree collapses to the original 2-tier (WebFetch → agent-browser). No configuration needed — agents check for the env var before attempting Tavily calls.
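The tier choice is small enough to sketch as a function; the tier names map to the tools in the decision tree:

```shell
#!/usr/bin/env sh
# Pick the next tier after inspecting a WebFetch result.
# Tavily is only an option when its API key is present.
next_tier() {
  char_count=$1
  if [ "$char_count" -ge 500 ]; then
    echo "done"           # WebFetch content was sufficient
  elif [ -n "$TAVILY_API_KEY" ]; then
    echo "tavily"         # 3-tier: escalate to Tavily extract
  else
    echo "agent-browser"  # 2-tier: straight to the browser
  fi
}
```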
Incorrect — Always using browser for everything:
agent-browser open https://github.com/owner/repo/blob/main/README.md
agent-browser wait --load networkidle
agent-browser get text body
# Slow, unnecessary - static content
Correct — Use WebFetch for static content:
WebFetch(
url="https://github.com/owner/repo/blob/main/README.md",
prompt="Extract README content"
)
# Fast, no browser needed
References (1)
Tavily API
Tavily API Reference
Search (Semantic Web Search)
Returns relevance-scored results with raw markdown content:
curl -s -X POST 'https://api.tavily.com/search' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $TAVILY_API_KEY" \
-d '{
  "query": "your search query",
  "search_depth": "advanced",
  "max_results": 5,
  "include_raw_content": "markdown"
}' | python3 -m json.tool
Options:
- search_depth: "basic" (fast) or "advanced" (thorough, 2x cost)
- topic: "general" (default), "news", or "finance"
- include_domains: ["example.com"] — restrict to specific sites
- exclude_domains: ["reddit.com"] — filter out sites
- days: 3 — limit to recent results (news/finance)
- include_raw_content: "markdown" — get full page content
Extract (Batch URL Content)
Extract raw content from up to 20 URLs at once:
curl -s -X POST 'https://api.tavily.com/extract' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $TAVILY_API_KEY" \
-d '{
  "urls": [
    "https://docs.example.com/guide",
    "https://competitor.com/pricing"
  ]
}' | python3 -m json.tool
Returns markdown content for each URL. Use when you have specific URLs and need full content.
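A sketch of handling the extract response, assuming the results / raw_content / failed_results field names used here; verify the shape against the live response:

```shell
#!/usr/bin/env sh
# Split a Tavily extract response into per-URL markdown files.
# Assumed shape: {"results":[{"url":..., "raw_content":...}], "failed_results":[...]}
parse_extract() {
  response=$1; outdir=$2
  mkdir -p "$outdir"
  echo "$response" | jq -c '.results[]' | while IFS= read -r item; do
    url=$(echo "$item" | jq -r '.url')
    # Derive a filesystem-safe slug from the URL
    slug=$(echo "$url" | sed 's|^https://||; s|^http://||; s|[/?&]|-|g')
    echo "$item" | jq -r '.raw_content' > "$outdir/$slug.md"
  done
  # Surface anything Tavily could not fetch
  echo "$response" | jq '.failed_results // []'
}
```

Pairing this with the date-stamped snapshot directories used elsewhere in this skill gives the same artifact layout as the agent-browser capture path.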
Map (Site Discovery)
Discover all URLs on a site before extracting:
curl -s -X POST 'https://api.tavily.com/map' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $TAVILY_API_KEY" \
-d '{
  "url": "https://docs.example.com",
  "max_depth": 2,
  "limit": 50
}' | python3 -m json.tool
Useful for documentation sites and competitor sitemaps. Combine with extract for full crawl.
Crawl (Multi-Page Content Extraction)
Crawl an entire site and extract content from all discovered pages in one call. Replaces the manual map+extract two-step workflow:
curl -s -X POST 'https://api.tavily.com/crawl' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $TAVILY_API_KEY" \
-d '{
  "url": "https://docs.example.com",
  "max_depth": 2,
  "limit": 50,
  "include_raw_content": "markdown"
}' | python3 -m json.tool
Options:
- max_depth: 2 — how many link-hops from the seed URL
- limit: 50 — max pages to crawl
- include_raw_content: "markdown" — get full page content (not just snippets)
- exclude_paths: ["/blog/*"] — skip certain URL patterns
- include_paths: ["/docs/*"] — restrict to certain URL patterns
When to use Crawl vs Map+Extract:
- Use Crawl when you want content from an entire site section (docs, changelog, pricing)
- Use Map when you only need URL discovery without content
- Use Extract when you already have specific URLs
Research (Deep Research — Beta)
Multi-step research agent that searches, reads, and synthesizes across multiple sources. Returns a comprehensive report with citations:
curl -s -X POST 'https://api.tavily.com/research' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $TAVILY_API_KEY" \
-d '{
  "query": "comparison of vector databases for RAG in 2026",
  "max_results": 10,
  "report_type": "research_report"
}' | python3 -m json.tool
Options:
- report_type: "research_report" (default) | "outline_report" | "detailed_report"
- max_results: number of sources to analyze (more = deeper but slower)
Note: The /research endpoint is in beta. Falls back gracefully to /search + /extract if unavailable. Best for deep competitive analysis, market research, and technical comparisons.