Supacrawler vs Playwright: Local Python Performance Benchmarks
We benchmarked Playwright against Supacrawler for JavaScript-heavy web scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.
See benchmark code: Playwright vs Supacrawler benchmark.
Identical Retry Logic with JavaScript Rendering
This is a critical fairness test because both systems render JavaScript, and Playwright was designed specifically for browser automation. We set render_js=True for Supacrawler to ensure an apples-to-apples comparison.
Supacrawler Service (internal/core/scrape/service.go):

    maxRetries := 3
    for attempt := 0; attempt < maxRetries; attempt++ {
        if attempt > 0 {
            // Exponential backoff: 1s before the second attempt, 2s before the third
            d := time.Duration(1<<(attempt-1)) * time.Second
            time.Sleep(d)
        }
        // JavaScript rendering with Playwright backend
        result, err = s.scrapeWithPlaywright(params.Url, includeHTML, format)
    }

Playwright Benchmark (test notebook):

    while attempt < max_retries and not success:
        try:
            page = await context.new_page()
            await page.goto(url, wait_until='networkidle', timeout=30000)
            # ... scraping logic
        except Exception as e:
            attempt += 1
            if attempt < max_retries:
                # Exponential backoff: 1s after the first failure, 2s after the second
                backoff = 2 ** (attempt - 1)
                await asyncio.sleep(backoff)

Critical Setup Details:
- JavaScript Rendering: Both use Playwright for full browser automation
- Network Wait: Both wait for the networkidle state, ensuring JavaScript completion
- Retry Logic: Both make up to 3 attempts with exponential backoff (1s, then 2s between attempts)
- Timeouts: Both use 30-second timeouts for heavy JavaScript sites
- Error Classification: Both only retry on retryable browser errors
- Environment: Mac M4, 24GB RAM, Chromium headless mode
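The shared retry policy can be sketched as a small standalone helper. This is a minimal illustration, not the benchmark's actual code: `fetch_with_retry`, `fetch`, `base_delay`, and the injectable `sleep` parameter are hypothetical names introduced here. With three attempts, the helper waits 1s before the second attempt and 2s before the third.

```python
import asyncio


async def fetch_with_retry(fetch, url, max_attempts=3, base_delay=1.0,
                           sleep=asyncio.sleep):
    """Call `fetch(url)` up to `max_attempts` times, doubling the wait each time.

    `fetch` is a hypothetical async callable standing in for the scraping logic.
    """
    last_error = None
    for attempt in range(max_attempts):
        if attempt > 0:
            # Wait base_delay * 1, 2, 4, ... before attempts 2, 3, 4, ...
            await sleep(base_delay * 2 ** (attempt - 1))
        try:
            return await fetch(url)
        except Exception as exc:
            # Assumption: every error is retried here; a real benchmark
            # would retry only on errors it classifies as retryable.
            last_error = exc
    raise last_error
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets the backoff schedule be checked without actually waiting.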
Why Supacrawler Outperforms Playwright
Architecture Advantage: Supacrawler uses a Go-based streaming worker pool vs Playwright's Python async sequential processing:
Supacrawler's Go Streaming Architecture (internal/core/crawl/service.go):

    // Worker pool with concurrent processing
    maxWorkers := 10 // 20 for non-JS, 2 for JS rendering
    if renderJs {
        maxWorkers = 2 // Optimized for JavaScript workloads
    }

    // Stream results as they complete
    worker := func(id int) {
        for u := range linksCh {
            res, err := s.scrapeWithFreshOption(ctx, u, includeHTML, renderJs, fresh)
            // Process immediately, stream results
            pageChan <- &PageResult{URL: u, PageContent: &pageContent}
        }
    }

    // Concurrent workers process URLs in parallel
    for i := 0; i < maxWorkers; i++ {
        wg.Add(1)
        go worker(i + 1) // Go goroutines for true concurrency
    }

Playwright's Python Async Processing:

    # Async but still sequential processing bottleneck
    for url in urls:
        async with semaphore:  # Limit concurrency
            page = await context.new_page()
            await page.goto(url)  # Block until complete
            # Extract data sequentially
            await page.close()

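For readers more familiar with Python than Go, the streaming worker-pool pattern from the Go excerpt can be sketched with asyncio. This illustrates the pattern only and says nothing about relative speed. `crawl` and `scrape` are hypothetical names, and `max_workers=2` simply mirrors the JavaScript-rendering setting in the Go excerpt.

```python
import asyncio


async def crawl(urls, scrape, max_workers=2):
    """Scrape `urls` with `max_workers` concurrent workers.

    Yields (url, result, error) tuples as soon as each page finishes.
    `scrape` is a hypothetical async callable standing in for page rendering.
    """
    url_queue = asyncio.Queue()
    for url in urls:
        url_queue.put_nowait(url)
    results = asyncio.Queue()

    async def worker():
        while True:
            try:
                url = url_queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # no URLs left for this worker
            try:
                await results.put((url, await scrape(url), None))
            except Exception as exc:
                await results.put((url, None, exc))

    workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
    for _ in range(len(urls)):
        # Each URL produces exactly one result; hand it over as soon as it is ready.
        yield await results.get()
    await asyncio.gather(*workers)
```

A caller consumes results with `async for url, result, error in crawl(urls, scrape): ...`, so processing can begin before the slowest page has finished.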
Key Technical Differences:
- Concurrency Model: Go goroutines scheduled across CPU cores vs Python asyncio on a single-threaded event loop
- Browser Management: Optimized cloud Playwright vs local browser overhead
- Memory Efficiency: Go's efficient memory model vs Python async overhead
- Network Stack: Supacrawler's optimized network handling vs Playwright protocol overhead
- Streaming: Real-time result streaming vs batch processing
Benchmark Results
Single Page Scrape (https://example.com) - JavaScript Rendering:
Tool | Time | Browser Management | Architecture | Resource Usage |
---|---|---|---|---|
Playwright | 7.58s | Local Chromium | Python async | High CPU/Memory |
Supacrawler | 1.21s | Cloud managed | Go concurrent | Zero local |
Supacrawler is 6.3x faster despite using the same Playwright engine internally.
Multi-Page Crawling Performance:
Test Scenario | Supacrawler | Playwright | Performance Gain |
---|---|---|---|
Single Page | 1.21s | 7.58s | 6.3x faster |
5 Pages | 5.08s (1.02s/page) | 164.39s (32.88s/page) | 32.4x faster |
50 Pages (avg) | 0.69s/page | 47.2s/page | 68.4x faster |
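The per-page averages and speedup factors in this table follow directly from the measured times. A short sketch of the arithmetic, using only the numbers in the table (the helper names `speedup` and `per_page` are illustrative):

```python
def speedup(playwright_s: float, supacrawler_s: float) -> float:
    """How many times faster Supacrawler was, rounded to one decimal place."""
    return round(playwright_s / supacrawler_s, 1)


def per_page(total_s: float, pages: int) -> float:
    """Average seconds per page, rounded to two decimal places."""
    return round(total_s / pages, 2)


# (Playwright seconds, Supacrawler seconds) for each row of the table
rows = {
    "Single Page": (7.58, 1.21),
    "5 Pages": (164.39, 5.08),
    "50 Pages (avg, per page)": (47.2, 0.69),
}

for name, (playwright_s, supacrawler_s) in rows.items():
    print(f"{name}: {speedup(playwright_s, supacrawler_s)}x faster")

print(per_page(164.39, 5), per_page(5.08, 5))  # per-page averages for the 5-page run
```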
Large-Scale Testing (50 pages each site):
Website | Pages | Playwright Avg | Supacrawler Avg | Performance Gain |
---|---|---|---|---|
supabase.com | 50 | 37.43s/page | 0.65s/page | 57.9x faster |
docs.python.org | 50 | 55.51s/page | 0.71s/page | 78.6x faster |
ai.google.dev | 50 | 28.67s/page | 0.73s/page | 39.3x faster |
The Content Quality Trade-off
Playwright Raw Output:
Supabase | The Postgres Development Platform.Product DevelopersSolutions PricingDocsBlog88.3KSign inStart your projectOpen mainmenu Build in a weekendScale to millions...
Supacrawler LLM-Ready Output:

    # Build in a weekend, Scale to millions
    Supabase is the Postgres development platform.
    Start your project with a Postgres database, Authentication,
    instant APIs, Edge Functions, Realtime subscriptions...

Supacrawler automatically removes navigation, ads, and boilerplate content while preserving structured content in clean markdown format - all while being dramatically faster than Playwright.
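To make the idea of boilerplate removal concrete, here is a minimal sketch using only Python's standard library. It is not Supacrawler's implementation, which the source does not show. The set of skipped tags, the `MarkdownExtractor` class, and the `html_to_markdown` function are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: text inside these elements counts as boilerplate in this sketch.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = set(HEADING_PREFIX) | {"p"}


class MarkdownExtractor(HTMLParser):
    """Collects headings and paragraphs, ignoring text inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a boilerplate element
        self.prefix = None    # markdown prefix of the block being read, if any
        self.buffer = []      # text pieces of the current block
        self.blocks = []      # finished markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.prefix = HEADING_PREFIX.get(tag, "")
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS and self.prefix is not None:
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Return headings and paragraphs from `html` as markdown blocks."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Calling `html_to_markdown(page_html)` on a page with a navigation bar and footer returns only the headings and paragraphs outside those elements. A production extractor would also need to handle links, lists, tables, and sites that do not use semantic tags.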
Why the Performance Gap is So Large
The dramatic performance difference (up to 78.6x faster) comes from several factors:
1. Browser Pool Management: Supacrawler maintains optimized browser pools in the cloud vs Playwright's local browser startup/teardown overhead.
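2. Network Optimization: Supacrawler's Go-based network stack vs the overhead of the Chrome DevTools Protocol connection that Playwright uses to drive local Chromium.
3. Concurrent Architecture: Go goroutines run in parallel across CPU cores, while Python's asyncio schedules all coroutines on a single-threaded event loop.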
4. Infrastructure: Purpose-built scraping infrastructure vs general browser automation framework.
When to Choose Each Tool
Choose Playwright when:
- You need full browser automation (clicking, form filling, testing)
- You're building end-to-end testing suites
- You need precise control over browser interactions
- You're comfortable managing local browser infrastructure
Choose Supacrawler when:
- You need high-performance web scraping at scale
- You want LLM-ready, clean markdown output
- You're building production data extraction pipelines
- You need zero infrastructure management
- Performance and reliability are critical
The massive performance advantage comes from Supacrawler's purpose-built architecture: Go's concurrency model, streaming worker pools, optimized browser management, and cloud infrastructure vs Playwright's Python async processing and local browser overhead.
Compare with other tools: Supacrawler vs Selenium | Supacrawler vs BeautifulSoup | Supacrawler vs Firecrawl