Supacrawler vs Playwright: Local Python Performance Benchmarks

We benchmarked Playwright against Supacrawler for JavaScript-heavy web scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.

See benchmark code: Playwright vs Supacrawler benchmark.

Identical Retry Logic with JavaScript Rendering

This is a critical fairness test because both systems render JavaScript, and Playwright was designed specifically for browser automation. We used `` for Supacrawler to ensure an apples-to-apples comparison.

Supacrawler Service (internal/core/scrape/service.go):

maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
    if attempt > 0 {
        // Exponential backoff: 1s before the second attempt, 2s before the third
        d := time.Duration(1<<(attempt-1)) * time.Second
        time.Sleep(d)
    }
    // JavaScript rendering with Playwright backend
    result, err = s.scrapeWithPlaywright(params.Url, includeHTML, format)
    if err == nil {
        break  // success: stop retrying
    }
}

Playwright Benchmark (test notebook):

while attempt < max_retries and not success:
    try:
        page = await context.new_page()
        await page.goto(url, wait_until='networkidle', timeout=30000)
        # ... scraping logic
        success = True  # exit the loop once the page has been scraped
    except Exception as e:
        attempt += 1
        if attempt < max_retries:
            backoff = 2 ** (attempt - 1)  # 1s after the first failure, 2s after the second
            await asyncio.sleep(backoff)

Critical Setup Details:

JavaScript Rendering: Both use Playwright for full browser automation
Network Wait: Both wait for networkidle state ensuring JavaScript completion
Retry Logic: Both make up to 3 attempts with exponential backoff (1s, then 2s, between attempts)
Timeouts: Both use 30-second timeouts for heavy JavaScript sites
Error Classification: Both only retry on retryable browser errors
Environment: Mac M4, 24GB RAM, Chromium headless mode
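
The retry policy described above can be sketched as a small standalone helper. This is a minimal illustration, not the benchmark's actual code: `RetryableError`, `backoff_delays`, and `with_retries` are hypothetical names, and the `sleep` parameter exists only so the waits can be stubbed out.

```python
import asyncio


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a navigation timeout)."""


def backoff_delays(max_attempts: int, base: float = 1.0) -> list[float]:
    """Delays slept between attempts: base * 2**0, base * 2**1, ..."""
    return [base * 2 ** i for i in range(max_attempts - 1)]


async def with_retries(task, max_attempts: int = 3, sleep=asyncio.sleep):
    """Run the zero-argument coroutine function `task`, retrying only on RetryableError."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return await task()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await sleep(delays[attempt])
```

With three attempts there are only two waits, which is why the helper returns two delays. Any exception other than `RetryableError` propagates immediately, which mirrors the "only retry on retryable errors" rule.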

Why Supacrawler Outperforms Playwright

Architecture Advantage: Supacrawler uses a Go-based streaming worker pool, whereas the Playwright benchmark processes pages one after another in a Python async loop:

Supacrawler's Go Streaming Architecture (`internal/core/crawl/service.go`):

// Worker pool with concurrent processing
maxWorkers := 10  // 20 for non-JS, 2 for JS rendering  
if{
    maxWorkers = 2  // Optimized for JavaScript workloads
}

// Stream results as they complete
worker := func(id int) {
    for u := range linksCh {
        res, err := s.scrapeWithFreshOption(ctx, u, includeHTML, fresh)
        // Process immediately, stream results
        pageChan <- &PageResult{URL: u, PageContent: &pageContent}
    }
}

// Concurrent workers process URLs in parallel
for i := 0; i < maxWorkers; i++ {
    wg.Add(1)
    go worker(i + 1)  // Go goroutines for true concurrency
}

Playwright's Python Async Processing:

# Async but still sequential processing bottleneck
for url in urls:
    async with semaphore:  # Limit concurrency
        page = await context.new_page()
        await page.goto(url)  # Block until complete
        # Extract data sequentially
        await page.close()

Key Technical Differences:

Concurrency Model: Go goroutines running in parallel vs a single-threaded Python asyncio loop that awaits each page in turn
Browser Management: Optimized cloud Playwright vs local browser overhead
Memory Efficiency: Go's efficient memory model vs Python async overhead
Network Stack: Supacrawler's optimized network handling vs Playwright protocol overhead
Streaming: Real-time result streaming vs batch processing
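
To make the worker-pool-with-streaming pattern concrete, here is a minimal Python sketch of the same shape as the Go excerpt: a fixed number of workers pull URLs from a queue and push each result out as soon as it is ready. It is an illustration only, not Supacrawler's implementation. `fetch` is a hypothetical stand-in that sleeps instead of rendering a page, and `crawl` is an assumed name.

```python
import asyncio


async def fetch(url: str) -> str:
    """Stand-in for a real page scrape; sleeps briefly instead of hitting the network."""
    await asyncio.sleep(0.01)
    return f"content of {url}"


async def crawl(urls, max_workers: int = 2):
    """Yield (url, content) pairs as soon as each worker finishes a page."""
    links: asyncio.Queue = asyncio.Queue()
    results: asyncio.Queue = asyncio.Queue()
    for u in urls:
        links.put_nowait(u)

    async def worker():
        while True:
            try:
                url = links.get_nowait()
            except asyncio.QueueEmpty:
                return  # no URLs left for this worker
            await results.put((url, await fetch(url)))

    workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
    for _ in range(len(urls)):
        yield await results.get()  # stream each result as it completes
    await asyncio.gather(*workers)


async def main():
    return [pair async for pair in crawl(["a", "b", "c", "d", "e"], max_workers=2)]
```

Results arrive in completion order rather than input order, which is the trade-off of streaming. A real version would also need error handling inside `worker`, since a failed fetch here would leave the consumer waiting.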

Benchmark Results

Single Page Scrape (https://example.com) - JavaScript Rendering:

Tool	Time	Browser Management	Architecture	Resource Usage
Playwright	7.58s	Local Chromium	Python async	High CPU/Memory
Supacrawler	1.21s	Cloud managed	Go concurrent	Zero local

Supacrawler is 6.3x faster despite using the same Playwright engine internally.

Multi-Page Crawling Performance:

Test Scenario	Supacrawler	Playwright	Performance Gain
Single Page	1.21s	7.58s	6.3x faster
5 Pages	5.08s (1.02s/page)	164.39s (32.88s/page)	32.4x faster
50 Pages (avg)	0.69s/page	47.2s/page	68.4x faster

Large-Scale Testing (50 pages each site):

Website	Pages	Playwright Avg	Supacrawler Avg	Performance Gain
supabase.com	50	37.43s/page	0.65s/page	57.9x faster
docs.python.org	50	55.51s/page	0.71s/page	78.6x faster
ai.google.dev	50	28.67s/page	0.73s/page	39.3x faster
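
The "Performance Gain" column is the ratio of Playwright time to Supacrawler time. A quick sketch of that arithmetic, using the single-page and 5-page figures from the tables above (the helper names `speedup` and `per_page` are ours):

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is, rounded to one decimal."""
    return round(baseline_s / candidate_s, 1)


def per_page(total_s: float, pages: int) -> float:
    """Average seconds per page, rounded to two decimals."""
    return round(total_s / pages, 2)


print(speedup(7.58, 1.21))                       # single page -> 6.3
print(speedup(164.39, 5.08))                     # 5 pages -> 32.4
print(per_page(164.39, 5), per_page(5.08, 5))    # -> 32.88 1.02
```

Recomputing the per-site rows from the rounded per-page averages gives slightly different ratios (for example, 37.43 / 0.65 is about 57.6). That would be consistent with the published gains having been computed from unrounded timings, though this is an assumption on our part.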

The Content Quality Trade-off

Playwright Raw Output:

Supabase | The Postgres Development Platform.Product Developers 
Solutions PricingDocsBlog88.3KSign inStart your projectOpen main 
menu Build in a weekendScale to millions...

Supacrawler LLM-Ready Output:

# Build in a weekend, Scale to millions

Supabase is the Postgres development platform.

Start your project with a Postgres database, Authentication, 
instant APIs, Edge Functions, Realtime subscriptions...

Supacrawler automatically removes navigation, ads, and boilerplate content while preserving structured content in clean markdown format, all while being dramatically faster than Playwright.
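
To give a feel for what this kind of cleanup involves, here is a minimal sketch using only Python's standard library. It is not Supacrawler's actual extraction logic: the tag lists and the `MarkdownExtractor` and `html_to_markdown` names are assumptions chosen for illustration. It drops everything inside navigation-style elements and emits headings, paragraphs, and list items as markdown.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = set(HEADING_PREFIX) | {"p", "li"}


class MarkdownExtractor(HTMLParser):
    """Collects headings, paragraphs, and list items, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # >0 while inside a skipped element
        self.current_tag = None  # block tag currently being collected
        self.buffer = []         # text fragments of the current block
        self.blocks = []         # finished markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.current_tag = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == self.current_tag:
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text:
                prefix = HEADING_PREFIX.get(tag, "- " if tag == "li" else "")
                self.blocks.append(prefix + text)
            self.current_tag = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current_tag is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

A production cleaner would need far more than a tag blocklist (ad detection, nested blocks, links, tables), but the sketch shows why the raw Playwright text above runs menu labels together while the cleaned output does not.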

Why the Performance Gap is So Large

The dramatic performance difference (up to 78.6x faster) comes from several factors:

1. Browser Pool Management: Supacrawler maintains optimized browser pools in the cloud vs Playwright's local browser startup/teardown overhead.

2. Network Optimization: Supacrawler's Go-based network stack vs the overhead of Playwright's browser-control protocol (the Chrome DevTools Protocol for Chromium, not WebDriver).

3. Concurrent Architecture: Go goroutines running in parallel vs a single-threaded Python asyncio loop that, in this benchmark, awaits each page in turn.

4. Infrastructure: Purpose-built scraping infrastructure vs general browser automation framework.
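
The first factor, reusing a pool of browsers instead of paying startup cost per page, can be illustrated with a small simulation. This is a sketch under stated assumptions, not Supacrawler's pool: `FakeBrowser` is a hypothetical stand-in that only counts launches, and `BrowserPool` and `scrape_all` are assumed names.

```python
import asyncio


class FakeBrowser:
    """Hypothetical stand-in for a browser; counts how many were launched."""
    launched = 0

    @classmethod
    async def launch(cls):
        await asyncio.sleep(0.01)  # simulated startup cost
        cls.launched += 1
        return cls()

    async def render(self, url: str) -> str:
        return f"rendered {url}"


class BrowserPool:
    """Launches `size` browsers once and lends them out for each page."""

    def __init__(self, size: int):
        self.size = size
        self.idle: asyncio.Queue = asyncio.Queue()

    async def start(self):
        for _ in range(self.size):
            self.idle.put_nowait(await FakeBrowser.launch())

    async def render(self, url: str) -> str:
        browser = await self.idle.get()    # wait for a free browser
        try:
            return await browser.render(url)
        finally:
            self.idle.put_nowait(browser)  # return it for reuse


async def scrape_all(urls, pool_size: int = 2):
    pool = BrowserPool(pool_size)
    await pool.start()
    return await asyncio.gather(*(pool.render(u) for u in urls))
```

Rendering six URLs through a pool of two launches two browsers rather than six, so the startup cost is amortized across pages.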

When to Choose Each Tool

Choose Playwright when:

You need full browser automation (clicking, form filling, testing)
You're building end-to-end testing suites
You need precise control over browser interactions
You're comfortable managing local browser infrastructure

Choose Supacrawler when:

You need high-performance web scraping at scale
You want LLM-ready, clean markdown output
You're building production data extraction pipelines
You need zero infrastructure management
Performance and reliability are critical

The massive performance advantage comes from Supacrawler's purpose-built architecture: Go's concurrency model, streaming worker pools, optimized browser management, and cloud infrastructure vs Playwright's Python async processing and local browser overhead.

Compare with other tools: Supacrawler vs Selenium | Supacrawler vs BeautifulSoup | Supacrawler vs Firecrawl
