Friday May 23, 2025 3:00pm - 4:15pm CEST
Playwright is a next-generation browser automation tool that allows you to use Python or JavaScript to scrape almost any web page. It can assist in downloading pages of government documents, capturing tweets before they get deleted, or simply breaking past the cookie consent banner. Beyond the basics, it can also easily take screenshots, monitor and log network requests, and even fit right into your traditional BeautifulSoup scraping approach.

We'll look at:

- Installing Playwright
- Accessing elements on the page
- Interacting with web pages (clicking, navigating, filling out forms)
- Taking screenshots
- Sending pages to traditional scraping tools like BeautifulSoup
- Common patterns including pagination and CAPTCHA breaking
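
As a rough illustration of how several of these topics fit together, here is a minimal sketch (not the session's actual material). It assumes `pip install playwright beautifulsoup4` followed by `playwright install chromium` has been run. The names `scrape` and `absolute_links`, the `h1` selector, and the `page.png` output path are hypothetical choices made for this example.

```python
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve relative hrefs against the page URL, skipping empty values."""
    return [urljoin(base_url, h) for h in hrefs if h]


def scrape(url, screenshot_path="page.png"):
    """Open a page, read its first heading, take a screenshot, collect links."""
    # Imported here so the helper above works even without these packages.
    from bs4 import BeautifulSoup
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Accessing an element: text of the first <h1> (assumes the page has one)
        title = page.locator("h1").first.inner_text()

        # Taking a screenshot of the whole page
        page.screenshot(path=screenshot_path, full_page=True)

        # Grab the rendered HTML to hand over to BeautifulSoup
        html = page.content()
        browser.close()

    soup = BeautifulSoup(html, "html.parser")
    hrefs = [a.get("href") for a in soup.find_all("a")]
    return {"title": title, "links": absolute_links(url, hrefs)}
```

Calling `scrape("https://example.com")` from a plain Python script should return the heading text and a list of absolute link URLs. Clicking and form filling follow the same pattern through `page.locator(...).click()` and `page.locator(...).fill(...)`. Inside a Jupyter notebook the synchronous API generally will not run because an event loop is already active, so the equivalent `playwright.async_api` calls with `await` are the likely choice there.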

For those of you familiar with tackling similar problems using Selenium: Playwright is a similar tool with a better interface, better install/upgrade process, and ten times the usability. It might be time to upgrade!

Participants should have a basic knowledge of Python and HTML, but we'll also cover how to breeze past those basics with AI assistance. To fully participate, participants should have Jupyter installed. Additional software and installation tips will be available at https://github.com/jsoma/dataharvest25-playwright-scraping
Speakers
Nicu Calcea

Senior Data Investigator, Global Witness
I’m a journalist with 14 years of experience in media, specialising in data reporting. Currently based in London.
Jonathan Soma

Professor, Columbia University
Jonathan Soma is Knight Chair in Data Journalism at Columbia Journalism School, where he directs both the Data Journalism MS and the summer intensive Lede Program. His courses there cover everything from basic Python and analysis to ai2html and machine learning. Right now he's very...
Z2.09
