5-Minute Guide: Your First Web Scrape with Supacrawler
Web scraping should be one of the simplest tasks for a developer. You have a URL. You want the text. It sounds like a problem that should take about thirty seconds to solve.
And yet, it never is.
The moment you try, you fall into a rabbit hole of complexity. You need to install a library like BeautifulSoup or Cheerio. Then you need a way to fetch the page, so you add an HTTP client. But what if the site is built with React or Vue? Now you need a full headless browser like Puppeteer or Playwright, which means managing a heavyweight Chrome instance on your server. Before you’ve even extracted a single headline, you have a dozen dependencies and a brittle script that will break the moment a class name changes.
This is the frustrating reality that drives most developers away from small, interesting projects. The "work about the work" is so overwhelming that the original, simple idea dies.
What if we could get rid of all of it? What if we could go back to the original, thirty-second version of the dream?
Your First Scrape: A Single Command
For demonstration purposes, let's scrape the official Gemini API documentation page from Google. All you need is your API key (you get 500 free credits when you sign up) and a terminal.
Here is the command:
Note: If you prefer using our SDKs, install them first; see the Install the SDKs guide to get set up.
```bash
curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://ai.google.dev/gemini-api/docs" \
  -d format="markdown"
```
After a few seconds you receive the output:
{"url": "https://ai.google.dev/gemini-api/docs","content": "# Gemini Developer API\n\n[Get a Gemini API Key](https://aistudio.google.com/apikey)\n\nGet a Gemini API key and make your first API request in minutes.\n\n## Python\n```\nfrom google import genai\n\nclient = genai.Client(api_key=\"YOUR_API_KEY\")\n\nresponse = client.models.generate_content(\n model=\"gemini-2.0-flash\",\n contents=\"Explain how AI works\",\n)\n\nprint(response.text)\n```","title": "Gemini API | Google AI for Developers","metadata": {"title": "Gemini API | Google AI for Developers","description": "Gemini Developer API Docs and API Reference","ogTitle": "Gemini API | Google AI for Developers","ogImage": "https://ai.google.dev/static/site-assets/images/share-gemini-api.png","ogSiteName": "Google AI for Developers","language": "en"}}
That's it. There is no step two. The response is a simple JSON object containing the clean content of the page, ready to use. When you made that API call, the frustrating work we talked about earlier, fetching the page, rendering any JavaScript, and turning the HTML into clean Markdown, happened for you on our servers.
The important thing is that you didn't have to think about any of it. The complexity was handled. You were left with the simple result you wanted in the first place.
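If you'd rather make the same request from a script than from the terminal, here is a minimal sketch in Python using the requests library. It mirrors the endpoint, query parameters, and Authorization header from the curl command above; the API key is a placeholder.

```python
# Minimal sketch: the same scrape request made from Python with requests.
# Endpoint, parameters, and header mirror the curl command above.
import requests

API_KEY = "YOUR_API_KEY"

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={
        "url": "https://ai.google.dev/gemini-api/docs",
        "format": "markdown",
    },
    timeout=30,
)
response.raise_for_status()

data = response.json()
print(data["title"])
print(data["content"][:500])  # first 500 characters of the clean Markdown
```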
What's Next?
You now have a reliable way to turn any URL into clean, machine-readable content. This simple primitive is the foundation for thousands of interesting projects.
You could build:
- An AI-powered tool that summarizes articles.
- A script that archives your favorite blog posts (sketched below).
- A service that feeds content into a vector database for RAG applications.
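
To make the second idea concrete, here is a minimal sketch of an archiving script. It reuses the same /scrape endpoint and response fields shown earlier; the list of URLs and the output folder are placeholders you would swap for your own.

```python
# Minimal sketch: archive a list of blog posts as local Markdown files,
# using the same /scrape endpoint and response fields shown above.
# The URLs below are placeholders; replace them with your own favorites.
import pathlib
import requests

API_KEY = "YOUR_API_KEY"
FAVORITE_POSTS = [
    "https://example.com/post-one",
    "https://example.com/post-two",
]

archive = pathlib.Path("archive")
archive.mkdir(exist_ok=True)

for url in FAVORITE_POSTS:
    resp = requests.get(
        "https://api.supacrawler.com/api/v1/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"url": url, "format": "markdown"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()

    # Build a filesystem-safe file name from the page title.
    safe_title = "".join(c if c.isalnum() or c in " -_" else "_" for c in data["title"]).strip()
    (archive / f"{safe_title}.md").write_text(data["content"], encoding="utf-8")
    print(f"Saved {url} -> {safe_title}.md")
```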
The goal of Supacrawler is to make the first step of any data project so simple that you actually start it. We handle the friction so you can focus on building.
Ready to try it yourself? Sign up to get your free API key and run your first scrape in the next five minutes.