Navigating the Grey: Scaling from Single Worker to Multi-VM Undetectable Scrapers
From single scripts to multi-VM fleets: practical techniques to scale undetectable scrapers.

- Type: Talk
- Category: Developer Tools
- Level: Intermediate
- Duration: 120 min
- Language: Taglish
Tags: web scraping, Playwright, curl_cffi, CAPTCHA, proxies, OCR, asynchronous, bot evasion
Abstract
A practical guide to scaling web scrapers from a single worker to multi-VM fleets. Covers execution models, scraping frameworks, headed vs headless trade-offs, bot evasion techniques, and real-world challenges like CAPTCHAs and geo-blocking. Includes code samples and a hands-on challenge with seven progressively difficult targets.
Outline
- 01 Execution Models
- 02 Scraping Frameworks
- 03 Headed vs Headless
- 04 Extension & API Scraping
- 05 Agentic Scraping
- 06 OCR Extraction
- 07 Paid Web Searching
- 08 Bot Evasion Techniques
- 09 Scraping at Scale
- 10 Scraping Legalities
- 11 Q&A Session
Key takeaways
- Compare synchronous vs asynchronous scraping and when to use each
- Choose the right scraping framework for your workload (Requests, Scrapy, Playwright, curl_cffi)
- Apply bot evasion techniques: fingerprints, proxies, cookie prewarming, and human-like behavior
- Scale scrapers across multiple VMs while balancing concurrency, success rate, and cost
- Navigate legal and technical obstacles like CAPTCHAs, geo-blocking, and rate limits
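The first takeaway, comparing synchronous and asynchronous scraping, can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not code from the talk: `fake_fetch` is a hypothetical placeholder that sleeps instead of making a request (a real scraper would use an async HTTP client there), and the example.com URLs and the concurrency limit of 5 are arbitrary. The semaphore shows the concurrency knob mentioned in the scaling takeaway.

```python
import asyncio
import time


# Hypothetical stand-in for a network request: it sleeps instead of fetching,
# so the timing difference between execution models shows up offline.
async def fake_fetch(url: str, delay: float = 0.1) -> str:
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def scrape_sequential(urls: list[str]) -> list[str]:
    # One request at a time: total time is roughly len(urls) * delay.
    results = []
    for url in urls:
        results.append(await fake_fetch(url))
    return results


async def scrape_concurrent(urls: list[str], max_concurrency: int = 5) -> list[str]:
    # The semaphore caps how many requests are in flight at once. Raising it
    # speeds things up but puts more load on the target (and raises the odds
    # of hitting rate limits).
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with sem:
            return await fake_fetch(url)

    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


async def main() -> None:
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    for name, fn in [("sequential", scrape_sequential), ("concurrent", scrape_concurrent)]:
        start = time.perf_counter()
        pages = await fn(urls)
        print(f"{name}: {len(pages)} pages in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With these defaults the sequential run takes about ten times the per-request delay. The bounded concurrent run takes about two, because ten URLs are processed five at a time. The same pattern carries over to real clients: the semaphore limit is what you tune per worker before adding more workers or VMs.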
Slides
Delivered once
| Event | Organizer | Date | Reach |
|---|---|---|---|
| DataMasters Episode 7 (Discord) | Data Engineering Pilipinas | Jun 6, 2026 | 20 |