Supacrawler vs BeautifulSoup: Local Performance Benchmarks

We benchmarked BeautifulSoup + requests against Supacrawler for static content scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.

See benchmark code: BeautifulSoup vs Supacrawler benchmark.

Identical Retry Logic

To ensure a completely fair comparison, we implemented the exact same retry and error handling logic in both systems. This is crucial because Supacrawler's production service has sophisticated retry mechanisms that could give it an unfair advantage if not matched in the test.

Supacrawler Service (internal/core/scrape/service.go):

maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
    if attempt > 0 {
        // Backoff before each retry: 1s, then 2s (4s would need a 4th attempt)
        d := time.Duration(1<<(attempt-1)) * time.Second
        time.Sleep(d)
    }
    // ... scraping logic
}

BeautifulSoup Benchmark (test notebook):

for attempt in range(max_retries):
    try:
        if attempt > 0:
            backoff = 1 << (attempt - 1)  # 1s, then 2s
            time.sleep(backoff)
        response = session.get(url, timeout=10)  # Same 10s timeout
        # ... scraping logic
        break  # success: stop retrying
    except Exception as e:
        if is_retryable_error(e) and attempt < max_retries - 1:
            continue
        raise  # non-retryable error, or attempts exhausted
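
As a quick check of the schedule this loop produces, the sleep durations can be computed directly. This is a small illustrative helper (backoff_delays is a hypothetical name, not part of either codebase). With three attempts only the 1s and 2s waits are reached; a 4s wait would require a fourth attempt.

```python
def backoff_delays(max_retries: int) -> list[int]:
    """Seconds slept before each retry, i.e. before attempts 1..max_retries-1."""
    return [1 << (attempt - 1) for attempt in range(1, max_retries)]


print(backoff_delays(3))  # [1, 2]
print(backoff_delays(4))  # [1, 2, 4]
```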

Critical Setup Details:

  • JavaScript Rendering: BeautifulSoup cannot execute JavaScript, so we used render_js=False for Supacrawler to ensure a fair comparison
  • Retry Logic: Both make up to 3 attempts with exponential backoff between them (1s, then 2s)
  • Timeouts: Both use 10-second timeouts matching Supacrawler's HTTP client
  • Error Classification: Both retry only on 429, 503, and timeouts, not on 403 or 404
  • User Agent: Both use identical browser user agent strings

This setup ensures we're comparing like-for-like: static HTML scraping with identical error handling.
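
The error classification described above could look roughly like the following. This is a minimal sketch, not the notebook's actual helper: the benchmark uses requests, whereas this version relies only on standard-library exception types so it runs without dependencies. The body of is_retryable_error and the RETRYABLE_STATUS name are assumptions for illustration.

```python
import urllib.error

# Assumption: only these HTTP statuses are treated as transient.
RETRYABLE_STATUS = {429, 503}


def is_retryable_error(exc: Exception) -> bool:
    """Return True for timeouts and for HTTP 429/503, False otherwise."""
    if isinstance(exc, TimeoutError):
        return True
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code in RETRYABLE_STATUS
    return False


not_found = urllib.error.HTTPError("https://example.com", 404, "Not Found", None, None)
print(is_retryable_error(TimeoutError()))  # True
print(is_retryable_error(not_found))       # False
```

Keeping 403 and 404 out of the retryable set means a blocked or missing page fails immediately instead of adding backoff time to the measurement.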

Why BeautifulSoup Is Sometimes Faster

The Trade-off: BeautifulSoup extracts raw HTML text while Supacrawler automatically cleans and structures the content into LLM-ready markdown. This explains the performance differences:

BeautifulSoup Raw Output:

Supabase | The Postgres Development Platform.Product Developers Solutions PricingDocsBlog88.3KSign inStart your projectOpen main menuBuild in a weekendScale to millionsSupabase is the Postgres develop...

Supacrawler Clean Output:

# Build in a weekend, Scale to millions
Supabase is the Postgres development platform.
Start your project with a Postgres database, Authentication, instant APIs, Edge Functions, Realtime subscriptions...

Supacrawler is purpose-built for LLMs and does significant additional processing: content cleaning, markdown conversion, metadata extraction, and noise removal. This creates overhead but delivers production-ready data.
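
To make the difference concrete, here is a toy illustration of raw text extraction versus a cleaning pass. It is a sketch using only Python's standard html.parser, and it is not Supacrawler's pipeline or BeautifulSoup itself. The sample HTML, the class names, and the rule of dropping nav elements are all assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Hypothetical sample page: navigation noise followed by real content.
SAMPLE_HTML = (
    "<nav><a>Docs</a><a>Blog</a></nav>"
    "<h1>Build in a weekend</h1>"
    "<p>Supabase is the Postgres development platform.</p>"
)


class RawText(HTMLParser):
    """Concatenate every text node, similar in spirit to a plain text dump."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


class SimpleMarkdown(HTMLParser):
    """Toy cleaner: skip <nav>, turn <h1> into a heading, keep <p> text."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.skip_depth = 0   # > 0 while inside <nav>
        self.current = None   # tag whose text is being collected

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.skip_depth += 1
        elif tag in ("h1", "p") and self.skip_depth == 0:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == "nav" and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.skip_depth or self.current is None:
            return
        prefix = "# " if self.current == "h1" else ""
        self.lines.append(prefix + data.strip())


def raw_text(html: str) -> str:
    parser = RawText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def to_markdown(html: str) -> str:
    parser = SimpleMarkdown()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


print(raw_text(SAMPLE_HTML))
print(to_markdown(SAMPLE_HTML))
```

The second pass does more work per page (tracking tags and filtering noise), which is the kind of overhead the trade-off above refers to.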

Benchmark Results

Single Page Scrape (https://supabase.com):

| Tool | Time | Content Quality | Processing |
| --- | --- | --- | --- |
| BeautifulSoup | 0.36s | Raw HTML text | None |
| Supacrawler | 0.20s | Clean Markdown | Full cleanup |

Supacrawler is 1.83x faster and provides significantly higher data quality. Note that results vary more when pages are fetched without launching Chromium; the multi-page results below show this.

Multi-Page Crawling (50 pages each):

| Site | BeautifulSoup | Supacrawler | Winner |
| --- | --- | --- | --- |
| nodejs.org/docs | 2.18s/page | 1.31s/page | Supacrawler (1.7x faster) |
| docs.python.org | 0.07s/page | 0.14s/page | BeautifulSoup (2x faster) |
| go.dev/doc | 0.50s/page | 0.34s/page | Supacrawler (1.5x faster) |

Pattern Analysis: On content-heavy sites (Node.js docs), Supacrawler's optimized pipeline performs better despite the extra processing. On lightweight sites (Python docs), BeautifulSoup's minimal overhead wins. For JavaScript-heavy sites, only Supacrawler works.
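
The speedup factors quoted in this section can be reproduced from the per-page timings. The sketch below copies the rounded numbers from the tables; the speedup helper is a hypothetical name. Because the published timings are rounded to two decimals, a ratio computed this way can differ slightly from one computed on the unrounded measurements.

```python
# (BeautifulSoup seconds, Supacrawler seconds), copied from the tables above.
TIMINGS = {
    "supabase.com (single page)": (0.36, 0.20),
    "nodejs.org/docs": (2.18, 1.31),
    "docs.python.org": (0.07, 0.14),
    "go.dev/doc": (0.50, 0.34),
}


def speedup(bs_seconds: float, sc_seconds: float) -> tuple[str, float]:
    """Return the faster tool and how many times faster it is (ratio >= 1)."""
    if sc_seconds <= bs_seconds:
        return "Supacrawler", bs_seconds / sc_seconds
    return "BeautifulSoup", sc_seconds / bs_seconds


for site, (bs, sc) in TIMINGS.items():
    winner, ratio = speedup(bs, sc)
    print(f"{site}: {winner} is {ratio:.1f}x faster")
```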

When to Choose Each Tool

Choose BeautifulSoup when:

  • You need maximum speed for static HTML extraction
  • You're comfortable with manual content cleaning
  • You're parsing local HTML files
  • You have a simple one-off scraping task

Choose Supacrawler when:

  • You need LLM-ready, clean markdown output
  • You're scraping JavaScript-heavy sites
  • You want built-in retry logic and error handling
  • You need production-scale reliability and infrastructure
  • You want rich metadata extraction

See more benchmarks: Supacrawler vs Selenium and Supacrawler vs Playwright

By Supacrawler Team
Published on September 12, 2025