
Supacrawler vs Playwright: Local Python Performance Benchmarks

We benchmarked Playwright against Supacrawler for JavaScript-heavy web scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.

See benchmark code: Playwright vs Supacrawler benchmark.

Identical Retry Logic with JavaScript Rendering

This comparison is a critical fairness test because both systems render JavaScript, and Playwright was built specifically for browser automation. We set render_js=True for Supacrawler to keep the comparison apples to apples.

Supacrawler Service (internal/core/scrape/service.go):

```go
maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
	if attempt > 0 {
		// Exponential backoff between attempts: 1s, then 2s
		d := time.Duration(1<<(attempt-1)) * time.Second
		time.Sleep(d)
	}
	// JavaScript rendering with the Playwright backend
	result, err = s.scrapeWithPlaywright(params.Url, includeHTML, format)
	if err == nil {
		break // stop retrying after a successful scrape
	}
}
```

Playwright Benchmark (test notebook):

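```python
import asyncio

attempt, success = 0, False
while attempt < max_retries and not success:
    page = None
    try:
        page = await context.new_page()
        await page.goto(url, wait_until='networkidle', timeout=30000)
        # ... scraping logic
        success = True
    except Exception as e:
        attempt += 1
        if attempt < max_retries:
            backoff = 2 ** (attempt - 1)  # 1s, then 2s
            await asyncio.sleep(backoff)
    finally:
        if page is not None:
            await page.close()  # don't leak a tab on each attempt
```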

Critical Setup Details:

  • JavaScript Rendering: Both use Playwright for full browser automation
  • Network Wait: Both wait for the networkidle state (no network connections for at least 500 ms) before extracting content
  • Retry Logic: Both make up to 3 attempts with exponential backoff between them (1s, then 2s)
  • Timeouts: Both use 30-second timeouts for heavy JavaScript sites
  • Error Classification: Both only retry on retryable browser errors
  • Environment: Mac M4, 24GB RAM, Chromium headless mode
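Both excerpts sleep only between attempts, so three attempts produce two waits (1s, then 2s); a 4s wait would only occur on a fourth attempt. The following is a minimal, self-contained asyncio sketch of the same policy with a pluggable `sleep`, so the schedule can be checked without actually waiting. `retry_with_backoff` and `flaky_page_load` are hypothetical names for illustration, not part of Supacrawler or Playwright.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call op() up to max_attempts times, backing off exponentially between attempts."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        if attempt > 0:
            # Wait base_delay * 2**(attempt - 1): 1s, 2s, 4s, ... with the defaults
            await sleep(base_delay * 2 ** (attempt - 1))
        try:
            return await op()
        except Exception as exc:  # a real scraper would retry only transient errors
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


async def demo():
    delays = []
    calls = 0

    async def fake_sleep(seconds: float) -> None:
        delays.append(seconds)  # record the wait instead of sleeping

    async def flaky_page_load() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:  # fail twice, then succeed on the third attempt
            raise TimeoutError("simulated navigation timeout")
        return "page content"

    result = await retry_with_backoff(flaky_page_load, sleep=fake_sleep)
    return result, delays


print(asyncio.run(demo()))  # ('page content', [1.0, 2.0])
```

Injecting `sleep` keeps the backoff logic testable; in production the default `asyncio.sleep` is used.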

Why Supacrawler Outperforms Playwright

Architecture Advantage: Supacrawler uses a Go-based streaming worker pool, while the Python benchmark processes pages in a sequential async loop:

Supacrawler's Go Streaming Architecture (internal/core/crawl/service.go):

```go
// Worker pool with concurrent processing
maxWorkers := 10 // default worker count
if renderJs {
	maxWorkers = 2 // fewer workers for JavaScript workloads
}
// Each worker pulls URLs from linksCh and streams results as they complete
worker := func(id int) {
	defer wg.Done()
	for u := range linksCh {
		res, err := s.scrapeWithFreshOption(ctx, u, includeHTML, renderJs, fresh)
		if err != nil {
			continue // skip failed pages (error handling elided)
		}
		// ... build pageContent from res (elided)
		pageChan <- &PageResult{URL: u, PageContent: &pageContent}
	}
}
// Start the workers; goroutines process URLs in parallel
for i := 0; i < maxWorkers; i++ {
	wg.Add(1)
	go worker(i + 1)
}
```
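The same streaming worker-pool pattern can be sketched with asyncio to show how it behaves: a fixed number of workers drain a shared queue of URLs and push each result onto an output queue the moment it is ready, so fast pages are not held back by slow ones. This is a hypothetical illustration (`stream_worker_pool` and `fake_scrape` are made-up names), not a port of Supacrawler's code.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, Tuple


async def stream_worker_pool(
    urls: Iterable[str],
    scrape: Callable[[str], Awaitable[str]],
    max_workers: int,
) -> AsyncIterator[Tuple[str, str]]:
    """Yield (url, content) pairs in completion order, using max_workers workers."""
    links: asyncio.Queue = asyncio.Queue()
    results: asyncio.Queue = asyncio.Queue()
    for url in urls:
        links.put_nowait(url)

    async def worker() -> None:
        while True:
            try:
                url = links.get_nowait()  # next URL, or stop when none are left
            except asyncio.QueueEmpty:
                break
            try:
                content = await scrape(url)
            except Exception:
                continue  # skip failed pages
            await results.put((url, content))  # stream the result immediately
        await results.put(None)  # sentinel: this worker has finished

    tasks = [asyncio.create_task(worker()) for _ in range(max_workers)]
    finished = 0
    while finished < max_workers:
        item = await results.get()
        if item is None:
            finished += 1
        else:
            yield item
    await asyncio.gather(*tasks)


async def demo() -> list:
    async def fake_scrape(url: str) -> str:
        await asyncio.sleep(0.2 if url.endswith("slow") else 0.01)  # simulated load time
        return f"content of {url}"

    urls = ["https://a.test/slow", "https://a.test/1", "https://a.test/2"]
    return [url async for url, _ in stream_worker_pool(urls, fake_scrape, max_workers=2)]


print(asyncio.run(demo()))  # the two fast pages stream out before the slow one
```

The `None` sentinels tell the consumer when every worker has exited, playing the role that the `sync.WaitGroup` plays in the Go excerpt.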

Playwright's Python Async Processing:

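```python
# Async, but URLs are still processed one at a time: each iteration awaits
# page.goto() to completion before the next URL starts, so the semaphore
# never actually limits anything
for url in urls:
    async with semaphore:  # intended to limit concurrency
        page = await context.new_page()
        await page.goto(url)  # waits until this page has loaded
        # ... extract data
        await page.close()
```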
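Because each iteration of that loop awaits the page load before taking the next URL, the semaphore never has more than one holder. A small stdlib-only simulation makes this visible by counting how many page loads are in flight at once. `fake_goto` is a hypothetical stand-in for `page.goto()`, and scheduling every URL up front with `asyncio.gather` is one common way to let the semaphore actually cap concurrency.

```python
import asyncio


class InFlightCounter:
    """Tracks how many simulated page loads are running at the same time."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    async def fake_goto(self, url: str) -> None:
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0.01)  # stand-in for page.goto() network time
        self.current -= 1


async def sequential(urls, limit: int) -> int:
    counter, semaphore = InFlightCounter(), asyncio.Semaphore(limit)
    for url in urls:  # same shape as the loop above
        async with semaphore:
            await counter.fake_goto(url)
    return counter.peak


async def concurrent(urls, limit: int) -> int:
    counter, semaphore = InFlightCounter(), asyncio.Semaphore(limit)

    async def one(url: str) -> None:
        async with semaphore:  # now the semaphore really caps concurrency
            await counter.fake_goto(url)

    await asyncio.gather(*(one(u) for u in urls))
    return counter.peak


urls = [f"https://example.com/page/{i}" for i in range(6)]
print(asyncio.run(sequential(urls, limit=3)))  # 1
print(asyncio.run(concurrent(urls, limit=3)))  # 3
```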

Key Technical Differences:

  1. Concurrency Model: Go goroutines scheduled across all CPU cores vs a single-threaded Python asyncio event loop
  2. Browser Management: Optimized cloud Playwright vs local browser overhead
  3. Memory Efficiency: Go's efficient memory model vs Python async overhead
  4. Network Stack: Supacrawler's optimized network handling vs Playwright protocol overhead
  5. Streaming: Real-time result streaming vs batch processing

Benchmark Results

Single Page Scrape (https://example.com) - JavaScript Rendering:

| Tool | Time | Browser Management | Architecture | Resource Usage |
|---|---|---|---|---|
| Playwright | 7.58s | Local Chromium | Python async | High CPU/Memory |
| Supacrawler | 1.21s | Cloud managed | Go concurrent | Zero local |

Supacrawler is 6.3x faster despite using the same Playwright engine internally.

Multi-Page Crawling Performance:

| Test Scenario | Supacrawler | Playwright | Performance Gain |
|---|---|---|---|
| Single Page | 1.21s | 7.58s | 6.3x faster |
| 5 Pages | 5.08s (1.02s/page) | 164.39s (32.88s/page) | 32.4x faster |
| 50 Pages (avg) | 0.69s/page | 47.2s/page | 68.4x faster |
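Each gain in the table above is the slower time divided by the faster one, and the per-page figures are totals divided by the page count. A quick recomputation from the published numbers (no new data) confirms the arithmetic:

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast run is, rounded to one decimal place."""
    return round(slow_seconds / fast_seconds, 1)


# Figures copied from the table above (seconds)
single_page = speedup(7.58, 1.21)
five_pages_total = speedup(164.39, 5.08)
five_pages_per_page = (round(5.08 / 5, 2), round(164.39 / 5, 2))
fifty_pages = speedup(47.2, 0.69)

print(single_page, five_pages_total, five_pages_per_page, fifty_pages)
# 6.3 32.4 (1.02, 32.88) 68.4
```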

Large-Scale Testing (50 pages each site):

| Website | Pages | Playwright Avg | Supacrawler Avg | Performance Gain |
|---|---|---|---|---|
| supabase.com | 50 | 37.43s/page | 0.65s/page | 57.9x faster |
| docs.python.org | 50 | 55.51s/page | 0.71s/page | 78.6x faster |
| ai.google.dev | 50 | 28.67s/page | 0.73s/page | 39.3x faster |

The Content Quality Trade-off

Playwright Raw Output:

Supabase | The Postgres Development Platform.Product Developers
Solutions PricingDocsBlog88.3KSign inStart your projectOpen main
menu Build in a weekendScale to millions...

Supacrawler LLM-Ready Output:

# Build in a weekend, Scale to millions
Supabase is the Postgres development platform.
Start your project with a Postgres database, Authentication,
instant APIs, Edge Functions, Realtime subscriptions...

Supacrawler automatically removes navigation, ads, and boilerplate content while preserving the structured content in clean markdown, all while being dramatically faster than Playwright.
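Supacrawler's actual extraction rules are not described here. As a rough illustration of the idea only, a standard-library `html.parser` sketch can drop text inside typical boilerplate elements and emit headings, paragraphs, and list items as markdown. The `SKIP_TAGS` set and the `html_to_markdown` helper are assumptions for illustration; real pages need far more heuristics than this.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}  # assumed boilerplate
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = {"p", "li", *HEADING_PREFIX}


class MarkdownExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.blocks = []  # finished markdown blocks
        self.prefix = ""  # markdown prefix for the open block
        self.parts = None  # text collected for the open block, or None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.prefix = HEADING_PREFIX.get(tag, "- " if tag == "li" else "")
            self.parts = []

    def handle_data(self, data):
        if self.skip_depth == 0 and self.parts is not None:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.parts is not None:
            text = " ".join("".join(self.parts).split())  # collapse whitespace
            if text:
                self.blocks.append(self.prefix + text)
            self.parts = None


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


page = """
<html><body>
  <nav><a href="/">Product</a> <a href="/pricing">Pricing</a> Sign in</nav>
  <main>
    <h1>Build in a weekend, Scale to millions</h1>
    <p>Supabase is the Postgres development platform.</p>
  </main>
  <footer>Copyright Supabase</footer>
</body></html>
"""
print(html_to_markdown(page))
```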

Why the Performance Gap is So Large

The dramatic performance difference (up to 78.6x faster) comes from several factors:

1. Browser Pool Management: Supacrawler maintains optimized browser pools in the cloud vs Playwright's local browser startup/teardown overhead.

2. Network Optimization: Supacrawler's Go-based network stack vs the messaging overhead of Playwright's browser protocol (Playwright drives Chromium over the Chrome DevTools Protocol, not WebDriver).

3. Concurrent Architecture: Go goroutines running in parallel vs Python's single-threaded asyncio event loop.

4. Infrastructure: Purpose-built scraping infrastructure vs general browser automation framework.
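The browser-pool point above is the classic object-pool pattern: pay the launch cost once, then lend the same instances to many pages. A minimal asyncio sketch, with a hypothetical `FakeBrowser` standing in for a real headless browser, shows a pool of two serving ten pages with only two launches. It illustrates the general idea, not how Supacrawler's cloud pool is actually built.

```python
import asyncio


class FakeBrowser:
    """Hypothetical stand-in for an expensive-to-launch headless browser."""

    launches = 0

    def __init__(self) -> None:
        FakeBrowser.launches += 1  # count every (slow) startup

    async def scrape(self, url: str) -> str:
        await asyncio.sleep(0)  # stand-in for page navigation
        return f"content of {url}"


class BrowserPool:
    """Launch `size` browsers once and lend them out, instead of one per page."""

    def __init__(self, size: int) -> None:
        self._idle: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            self._idle.put_nowait(FakeBrowser())

    async def scrape(self, url: str) -> str:
        browser = await self._idle.get()  # wait for a free browser
        try:
            return await browser.scrape(url)
        finally:
            self._idle.put_nowait(browser)  # hand it back for the next page


async def main() -> int:
    pool = BrowserPool(size=2)
    urls = [f"https://example.com/{i}" for i in range(10)]
    await asyncio.gather(*(pool.scrape(u) for u in urls))
    return FakeBrowser.launches


print(asyncio.run(main()))  # 2 launches for 10 pages
```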

When to Choose Each Tool

Choose Playwright when:

  • You need full browser automation (clicking, form filling, testing)
  • You're building end-to-end testing suites
  • You need precise control over browser interactions
  • You're comfortable managing local browser infrastructure

Choose Supacrawler when:

  • You need high-performance web scraping at scale
  • You want LLM-ready, clean markdown output
  • You're building production data extraction pipelines
  • You need zero infrastructure management
  • Performance and reliability are critical

The massive performance advantage comes from Supacrawler's purpose-built architecture: Go's concurrency model, streaming worker pools, optimized browser management, and cloud infrastructure vs Playwright's Python async processing and local browser overhead.


Compare with other tools: Supacrawler vs Selenium | Supacrawler vs BeautifulSoup | Supacrawler vs Firecrawl

By Supacrawler Team
Published on September 12, 2025