Navigating the Grey: Scaling from Single Worker to Multi-VM Undetectable Scrapers

From single scripts to multi-VM fleets: practical techniques to scale undetectable scrapers.

Type: Talk
Category: Developer Tools
Level: Intermediate
Duration: 120 min
Language: Taglish

Tags: web scraping, Playwright, curl_cffi, CAPTCHA, proxies, OCR, asynchronous, bot evasion

Abstract

A practical guide to scaling web scrapers from a single worker to multi-VM fleets. Covers execution models, scraping frameworks, headed vs headless trade-offs, bot evasion techniques, and real-world challenges like CAPTCHAs and geo-blocking. Includes code samples and a hands-on challenge with seven progressively difficult targets.

Outline

01 Execution Models
02 Scraping Frameworks
03 Headed vs Headless
04 Extension & API Scraping
05 Agentic Scraping
06 OCR Extraction
07 Paid Web Searching
08 Bot Evasion Techniques
09 Scraping at Scale
10 Scraping Legalities
11 Q&A Session

Key takeaways

Compare synchronous vs asynchronous scraping and when to use each
Choose the right scraping framework for your workload (Requests, Scrapy, Playwright, curl_cffi)
Apply bot evasion techniques: fingerprints, proxies, cookie prewarming, and human-like behavior
Scale scrapers across multiple VMs while balancing concurrency, success rate, and cost
Navigate legal and technical obstacles like CAPTCHAs, geo-blocking, and rate limits
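
The first takeaway, synchronous vs asynchronous scraping, can be illustrated with a small sketch. It is not taken from the talk's slides: `fake_fetch`, `crawl_sequential`, `crawl_concurrent`, the example.com URLs, and the 0.05 s delay are all hypothetical, and the "request" is simulated with a sleep so that nothing touches a real site. It only shows the general shape of the trade-off: an async crawler with a concurrency cap finishes the same work in less wall-clock time than a one-at-a-time loop.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an HTTP request: sleeps instead of hitting the network."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl_sequential(urls):
    """Fetch one URL at a time, the way a plain synchronous loop would."""
    return [await fake_fetch(u) for u in urls]


async def crawl_concurrent(urls, max_concurrency: int = 5):
    """Fetch URLs concurrently, with a semaphore capping in-flight requests."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(u):
        async with sem:
            return await fake_fetch(u)

    # gather preserves the order of the input URLs in its result list
    return await asyncio.gather(*(bounded(u) for u in urls))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


# Assumed example targets; replace with URLs you are permitted to scrape.
URLS = [f"https://example.com/page/{i}" for i in range(10)]

if __name__ == "__main__":
    _, t_seq = timed(crawl_sequential(URLS))
    _, t_con = timed(crawl_concurrent(URLS, max_concurrency=5))
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

In a real scraper the semaphore limit is the knob that trades throughput against rate limits and block risk, which is the same balance the "concurrency, success rate, and cost" takeaway describes at fleet scale.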

Delivered once

Event: DataMasters Episode 7 (Discord)
Organizer: Data Engineering Pilipinas
Date: Jun 6, 2026
Reach: 20
