Venue: Z2.09
Friday, May 23
 

1:15pm CEST

Mapping independence: why and how newsrooms should host and style their own maps
Friday May 23, 2025 1:15pm - 2:30pm CEST
Tired of Google Maps' branding and tracking? In this session, we’ll explore how newsrooms can host their own map tiles using open-source tools such as Protomaps and OpenFreeMap. We’ll look at how to reduce costs, protect reader privacy, and gain full editorial control — alongside a hands-on workshop on customizing map styles with Maputnik. No prior knowledge is required for the workshop, though some experience working with maps will be a bonus, especially for the self-hosting part. The workshop takes place entirely in a browser. After this session, attendees will be familiar with alternatives to commercial map providers and know the foundations of map styling and self-hosting, empowering them to create maps that better serve their reporting and storytelling.
Z2.09

3:00pm CEST

Browser puppetry: Playwright for dynamic website scraping
Friday May 23, 2025 3:00pm - 4:15pm CEST
Playwright is a next-generation browser automation tool that allows you to use Python or JavaScript to scrape almost any web page. It can assist in downloading pages of government documents, capturing tweets before they get deleted, or simply breaking past the cookie consent banner. Beyond the basics, it can also easily take screenshots, monitor and log network requests, and even fit right into your traditional BeautifulSoup scraping approach.

We'll look at:

- Installing Playwright
- Accessing elements on the page
- Interacting with web pages (clicking, navigating, filling out forms)
- Taking screenshots
- Sending pages to traditional scraping tools like BeautifulSoup
- Common patterns including pagination and CAPTCHA breaking
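As a rough illustration of how a few of these topics fit together (a sketch written for this listing, not taken from the session materials), the Python below loads a page in headless Chromium, optionally saves a screenshot, and hands the rendered HTML to BeautifulSoup. The helper names `fetch_rendered_html` and `extract_links` are hypothetical, and the sketch assumes the `playwright` and `beautifulsoup4` packages are installed.

```python
from typing import List, Optional, Tuple
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_rendered_html(url: str, screenshot_path: Optional[str] = None) -> str:
    """Load a page in headless Chromium and return its rendered HTML.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so extract_links() can be used without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        # Wait until network activity settles so scripted content has rendered.
        page.wait_for_load_state("networkidle")
        if screenshot_path:
            page.screenshot(path=screenshot_path, full_page=True)
        html = page.content()
        browser.close()
    return html


def extract_links(html: str, base_url: str = "") -> List[Tuple[str, str]]:
    """Return (link text, absolute URL) pairs for every <a href> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.find_all("a", href=True)
    ]
```

A call such as `extract_links(fetch_rendered_html(url), url)` would chain the two steps. Running the browser step also requires a one-time `playwright install chromium` after installing the package.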

For those of you familiar with tackling similar problems using Selenium: Playwright is a similar tool with a better interface, better install/upgrade process, and ten times the usability. It might be time to upgrade!

Participants should have a basic knowledge of Python and HTML, but we'll also cover how to breeze past those basics with AI assistance. To fully participate, participants should have Jupyter installed. Additional software and installation tips will be available at https://github.com/jsoma/dataharvest25-playwright-scraping
Z2.09

4:45pm CEST

Finding connections: transform your document collections into a graph visualisation
Friday May 23, 2025 4:45pm - 6:00pm CEST
A high-level overview of the GraphRAG ecosystem. We aim to show:
- What GraphRAG is, and how it works
- How to prepare your documents
- How to build your own graph
- How to interface with your graph using Python

These techniques can be used to gain a visual overview of the contents of document sets, and to find information in the documents without having to rely on keywords. Put simply, they let you make sense of large numbers of documents without having to read them all.

In the session you'll be making a graph from arbitrary documents, visualizing it, and using it to answer questions.

It will help if you've heard of RAG before, but this is not a prerequisite. To follow along, all you need is a browser and an e-mail address.
Z2.09
 