Supacrawler vs BeautifulSoup: Local Performance Benchmarks

We benchmarked BeautifulSoup + requests against Supacrawler for static content scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.

See benchmark code: BeautifulSoup vs Supacrawler benchmark.

Identical Retry Logic

To ensure a completely fair comparison, we implemented the exact same retry and error handling logic in both systems. This is crucial because Supacrawler's production service has sophisticated retry mechanisms that could give it an unfair advantage if not matched in the test.

Supacrawler Service (internal/core/scrape/service.go):

maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
    if attempt > 0 {
        d := time.Duration(1<<(attempt-1)) * time.Second  // 1s before attempt 2, 2s before attempt 3
        time.Sleep(d)
    }
    // ... scraping logic
}

BeautifulSoup Benchmark (test notebook):

for attempt in range(max_retries):
    try:
        if attempt > 0:
            backoff = 1 << (attempt - 1)  # 1s before attempt 2, 2s before attempt 3
            time.sleep(backoff)
        response = session.get(url, timeout=10)  # Same 10s timeout
        # ... scraping logic
    except Exception as e:
        if is_retryable_error(e) and attempt < max_retries - 1:
            continue
        raise  # Non-retryable error, or attempts exhausted
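
The loop above can be packaged as a small helper and exercised without any network access. This is a minimal sketch rather than the notebook's actual code: `fetch_with_retry`, its parameters, and the fake `flaky_fetch` are hypothetical names introduced for illustration, and the `sleep` argument is injectable only so the backoff delays can be recorded instead of waited out.

```python
import time


def fetch_with_retry(fetch, is_retryable, max_attempts=3, sleep=time.sleep):
    """Call fetch() up to max_attempts times with exponential backoff."""
    for attempt in range(max_attempts):
        if attempt > 0:
            # Delay doubles each time: 1s before attempt 2, 2s before attempt 3.
            sleep(1 << (attempt - 1))
        try:
            return fetch()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise  # Give up: non-retryable error or attempts exhausted.


# Usage: a fake fetch (hypothetical) that times out twice, then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "<html>ok</html>"


delays = []  # Records the requested backoff delays instead of sleeping.
result = fetch_with_retry(
    flaky_fetch,
    lambda exc: isinstance(exc, TimeoutError),
    sleep=delays.append,
)
print(result, delays)
```

With three attempts there are only two waits, so the recorded delays are 1 and 2 seconds; a 4-second delay would appear only if a fourth attempt were allowed.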

Critical Setup Details:

JavaScript Rendering: BeautifulSoup cannot execute JavaScript, so we disabled JavaScript rendering in Supacrawler to ensure a fair comparison
Retry Logic: Both make up to 3 attempts with exponential backoff between them (1s, then 2s)
Timeouts: Both use 10-second timeouts matching Supacrawler's HTTP client
Error Classification: Both retry only on 429, 503, and timeouts, not on 403 or 404
User Agent: Both use identical browser user agent strings

This setup ensures we're comparing like-for-like: static HTML scraping with identical error handling.
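
The error classification rule (retry on 429, 503, and timeouts, but not on 403 or 404) could look roughly like the following. This is a sketch under assumptions: the body of `is_retryable_error` is not shown in the benchmark excerpt, and `HTTPStatusError` is a hypothetical stand-in for whatever exception the HTTP client raises on a bad status.

```python
import socket

# Status codes treated as transient, per the setup described above.
RETRYABLE_STATUS = {429, 503}


class HTTPStatusError(Exception):
    """Hypothetical exception carrying an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable_error(exc):
    """Return True only for timeouts and transient HTTP statuses."""
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return True
    if isinstance(exc, HTTPStatusError):
        return exc.status_code in RETRYABLE_STATUS
    return False  # Everything else (403, 404, parse errors) fails fast.


for code in (429, 503, 403, 404):
    print(code, is_retryable_error(HTTPStatusError(code)))
```

Failing fast on 403 and 404 keeps a blocked or missing page from inflating the measured time with pointless backoff waits.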

Why BeautifulSoup Is Sometimes Faster

The Trade-off: BeautifulSoup extracts raw HTML text while Supacrawler automatically cleans and structures the content into LLM-ready markdown. This explains the performance differences:

BeautifulSoup Raw Output:

Supabase | The Postgres Development Platform.Product Developers Solutions PricingDocsBlog88.3KSign inStart your projectOpen main menuBuild in a weekendScale to millionsSupabase is the Postgres develop...

Supacrawler Clean Output:

# Build in a weekend, Scale to millions
Supabase is the Postgres development platform.
Start your project with a Postgres database, Authentication, instant APIs, Edge Functions, Realtime subscriptions...

Supacrawler is purpose-built for LLMs and does significant additional processing: content cleaning, markdown conversion, metadata extraction, and noise removal. This creates overhead but delivers production-ready data.
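
To give a feel for where that overhead comes from, here is a toy illustration of noise removal plus a minimal markdown conversion, using only Python's standard library. It is not Supacrawler's pipeline, which is not shown in this post; the class `MarkdownishExtractor`, the helper `to_markdownish`, the `SKIP_TAGS` set, and the sample HTML are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Tags whose text is treated as noise in this toy example (an assumption).
SKIP_TAGS = {"script", "style", "nav", "footer"}


class MarkdownishExtractor(HTMLParser):
    """Collects visible text, skipping noisy tags and marking <h1> as '# '."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.prefix = ""     # "# " while inside an <h1>
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "h1":
            self.prefix = "# "

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "h1":
            self.prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.lines.append(self.prefix + text)


def to_markdownish(html):
    parser = MarkdownishExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


sample = (
    "<nav>Pricing Docs Blog</nav>"
    "<h1>Build in a weekend</h1>"
    "<p>Supabase is the Postgres development platform.</p>"
    "<footer>Sign in</footer>"
)
print(to_markdownish(sample))
```

Even this tiny version has to track tag context for every text node, which plain text extraction skips entirely.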

Benchmark Results

Single Page Scrape (https://supabase.com):

Tool	Time	Content Quality	Processing
BeautifulSoup	0.36s	Raw HTML text	None
Supacrawler	0.20s	Clean Markdown	Full cleanup

Supacrawler is roughly 1.8x faster (0.20s vs 0.36s) and provides significantly higher data quality. Note that results vary more for pages fetched without launching Chromium; the multi-page results below illustrate this.

Multi-Page Crawling (50 pages each):

Site	BeautifulSoup	Supacrawler	Winner
nodejs.org/docs	2.18s/page	1.31s/page	Supacrawler (1.7x faster)
docs.python.org	0.07s/page	0.14s/page	BeautifulSoup (2x faster)
go.dev/doc	0.50s/page	0.34s/page	Supacrawler (1.5x faster)

Pattern Analysis: On heavy content sites (Node.js docs), Supacrawler's optimized pipeline performs better despite the extra processing. On lightweight sites (Python docs), BeautifulSoup's minimal overhead wins. For JavaScript-heavy sites, only Supacrawler works.
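
The "Winner" column follows directly from the per-page times. The short sketch below recomputes it from the figures in the table; the `results` dictionary and the `speedup` helper are illustrative names, not part of the benchmark code.

```python
# Per-page times in seconds, copied from the table above:
# site -> (BeautifulSoup, Supacrawler)
results = {
    "nodejs.org/docs": (2.18, 1.31),
    "docs.python.org": (0.07, 0.14),
    "go.dev/doc": (0.50, 0.34),
}


def speedup(bs_time, sc_time):
    """Return (winner, ratio), where ratio is slower time / faster time."""
    if bs_time > sc_time:
        return "Supacrawler", bs_time / sc_time
    return "BeautifulSoup", sc_time / bs_time


for site, (bs_time, sc_time) in results.items():
    winner, ratio = speedup(bs_time, sc_time)
    print(f"{site}: {winner} ({ratio:.1f}x faster)")
```

Rounded to one decimal place, the ratios come out to 1.7x, 2.0x, and 1.5x, matching the table.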

When to Choose Each Tool

Choose BeautifulSoup when:

You need minimal overhead on lightweight static HTML pages
You're comfortable with manual content cleaning
You're parsing local HTML files
You have a simple one-off scraping task

Choose Supacrawler when:

You need LLM-ready, clean markdown output
You're scraping JavaScript-heavy sites
You want built-in retry logic and error handling
You need production-scale reliability and infrastructure
You want rich metadata extraction

See more benchmarks: Supacrawler vs Selenium and Supacrawler vs Playwright
