Name: Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems
Start: 2025-05-24T17:15:00+0200
End: 2025-05-24T18:30:00+0200

Saturday May 24, 2025 5:15pm - 6:30pm CEST

Z2.10

Scraped data can often be the backbone of an investigation, but some websites are more difficult to scrape than others. This session will cover best practices for dealing with tricky sites, including coping with captchas, IP blocks, and browser fingerprinting.

This is an advanced session aimed at people who already have experience of writing code to scrape websites and want to move up to the next level: participants will leave with an understanding of how to approach hard-to-scrape websites, plus the tradeoffs and costs of these approaches.

Speakers

Max Harlow

Bloomberg News

Max Harlow is a data reporter at Bloomberg News in London. He uses data and documents to cover topics including money in politics, corporate sleaze and international trade. He formerly worked at the Financial Times. He also runs Journocoders, a community group for journalists to develop...


Data skills, Hands on