
Navigating the Grey: Scaling from Single Worker to Multi-VM Undetectable Scrapers

From single scripts to multi-VM fleets: practical techniques to scale undetectable scrapers.

Type: Talk
Category: Developer Tools
Level: Intermediate
Duration: 120 min
Language: Taglish
Tags: web scraping, Playwright, curl_cffi, CAPTCHA, proxies, OCR, asynchronous, bot evasion

Abstract

A practical guide to scaling web scrapers from a single worker to multi-VM fleets. Covers execution models, scraping frameworks, headed vs headless trade-offs, bot evasion techniques, and real-world challenges like CAPTCHAs and geo-blocking. Includes code samples and a hands-on challenge with seven progressively difficult targets.

Outline

  1. Execution Models
  2. Scraping Frameworks
  3. Headed vs Headless
  4. Extension & API Scraping
  5. Agentic Scraping
  6. OCR Extraction
  7. Paid Web Searching
  8. Bot Evasion Techniques
  9. Scraping at Scale
  10. Scraping Legalities
  11. Q&A Session

Key takeaways

  • Compare synchronous vs asynchronous scraping and when to use each
  • Choose the right scraping framework for your workload (Requests, Scrapy, Playwright, curl_cffi)
  • Apply bot evasion techniques: fingerprints, proxies, cookie prewarming, and human-like behavior
  • Scale scrapers across multiple VMs while balancing concurrency, success rate, and cost
  • Navigate legal and technical obstacles like CAPTCHAs, geo-blocking, and rate limits
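
To make the first takeaway concrete, here is a minimal sketch of sequential versus bounded-concurrent fetching with Python's asyncio. It is not taken from the talk: `fake_fetch`, the example.com URLs, and the `max_in_flight` limit are illustrative assumptions, and the "request" is a simulated delay rather than a real network call.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for a network request: sleeps, then returns a fake body."""
    await asyncio.sleep(delay)
    return f"body of {url}"


async def scrape_sequential(urls):
    # One request at a time: total time grows linearly with the number of URLs.
    return [await fake_fetch(u) for u in urls]


async def scrape_concurrent(urls, max_in_flight: int = 5):
    # The semaphore caps how many requests are in flight at once,
    # so concurrency stays bounded instead of hitting the target all at once.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(u):
        async with sem:
            return await fake_fetch(u)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


def timed(coro_fn, urls):
    """Run an async scraper to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(urls))
    return results, time.perf_counter() - start


# Placeholder URLs; nothing is actually requested.
URLS = [f"https://example.com/page/{i}" for i in range(10)]
seq_results, seq_time = timed(scrape_sequential, URLS)
con_results, con_time = timed(scrape_concurrent, URLS)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With I/O-bound work like this, the concurrent version should finish in roughly the time of two batches rather than ten requests. Against a real site, the in-flight limit is also the knob that trades throughput against rate limits and block risk.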

Delivered once