Loading…
Venue: Z2.09 clear filter
Friday, May 23
 

1:15pm CEST

Mapping independence: why and how newsrooms should host and style their own maps
Friday May 23, 2025 1:15pm - 2:30pm CEST
Tired of Google Maps' branding and tracking? In this session, we’ll explore how newsrooms can host their own map tiles using open-source tools such as Protomaps and OpenFreeMap. We’ll look at how to reduce costs, protect reader privacy, and gain full editorial control — alongside a hands-on workshop on customizing map styles with Maputnik. No prior knowledge is required for the workshop, though some experience working with maps will be a bonus, especially for the self-hosting part. The workshop takes place entirely in a browser. After this session, attendees will be familiar with alternatives to commercial map providers and know the foundations of map styling and self-hosting, empowering them to create maps that better serve their reporting and storytelling.
Friday May 23, 2025 1:15pm - 2:30pm CEST
Z2.09

3:00pm CEST

Browser puppetry: Playwright for dynamic website scraping
Friday May 23, 2025 3:00pm - 4:15pm CEST
Playwright is a next-generation browser automation tool that allows you to use Python or JavaScript to scrape almost any web page. It can assist in downloading pages of government documents, capturing tweets before they get deleted, or simply breaking past the cookie consent banner. Beyond the basics, it can also easily take screenshots, monitor and log network requests, and even fit right into your traditional BeautifulSoup scraping approach.

We'll look at:

- Installing Playwright
- Accessing elements on the page
- Interacting with web pages (clicking, navigating, filling out forms)
- Taking screenshots
- Sending pages to traditional scraping tools like BeautifulSoup
- Common patterns including pagination and CAPTCHA breaking

For those of you familiar with tackling similar problems using Selenium: Playwright is a similar tool with a better interface, better install/upgrade process, and ten times the usability. It might be time to upgrade!

Participants should have a basic knowledge of Python and HTML, but we'll also cover how to breeze past those basics with AI assistance. To fully participate, participants should have Jupyter installed. Additional software and installation tips will be available at https://github.com/jsoma/dataharvest25-playwright-scraping
Friday May 23, 2025 3:00pm - 4:15pm CEST
Z2.09

4:45pm CEST

Finding connections: transform your document collections into a graph visualisation
Friday May 23, 2025 4:45pm - 6:00pm CEST
A high level overview of the GraphRAG ecosystem. We aim to show:
- What GraphRAG is, and how it works
- How to prepare your documents
- How to build your own graph.
- How to interface with your graph using Python

These techniques can be used to gain a visual overview of the contents of document sets, and to find information in the documents without having to rely on keywords. Put simply, it enables you to make sense of large amounts of documents without having to read them all.

In the session you'll be making a graph from arbitrary documents, visualizing it, and using it to answer questions.

It will help if you've heard of RAG before, but this is not a prerequisite. To follow along, all you need is a browser and an e-mail address.
Friday May 23, 2025 4:45pm - 6:00pm CEST
Z2.09
 
Saturday, May 24
 

9:30am CEST

Python Without the pain: write code with LLMs
Saturday May 24, 2025 9:30am - 10:45am CEST
In an age where data is an important part of impactful storytelling, journalists need tools that enable them to work with it effectively. Python is a powerful resource for analyzing and visualizing data, but it can be intimidating for those without a technical background. This workshop breaks down those barriers, showing how AI tools like ChatGPT can make coding basics approachable and accessible. By equipping journalists with these skills, the workshop aims to empower them to create richer, data-driven stories and visualisations without relying heavily on external technical support.

The session will start with an overview of Python and how AI-assisted coding works, showcasing how these tools can simplify technical challenges, followed by real-life examples. Afterward, participants will dive into a hands-on session using Jupyter Notebook to practice running and adapting Python scripts. By the end, they’ll feel more confident tackling technical problems independently.

Participants are encouraged to have Python (with Jupyter Notebook) installed on their devices, or a Google Collab environment ready. You will also need a ChatGPT account set up before attending. While familiarity with Python is helpful, it’s not required.
Saturday May 24, 2025 9:30am - 10:45am CEST
Z2.09

11:15am CEST

Making maps with code
Saturday May 24, 2025 11:15am - 12:30pm CEST
Data journalists have traditionally thought of maps and spatial calculations as a job for special mapping software, like QGIS. But it's often more efficient to do GIS work in the script in which you perform the rest of your analysis.

In this class, you will see how easy it is to work with maps within your code.

The class will be taught in R, so some familiarity is recommended, but the skills are generic to all languages.
Saturday May 24, 2025 11:15am - 12:30pm CEST
Z2.09

1:45pm CEST

Spreadsheets with superpowers: LLMs for data extraction and classification
Saturday May 24, 2025 1:45pm - 3:00pm CEST
Lots of data and investigative journalism takes place in spreadsheets. Frequently, we want to  perform a task for every row in our spreadsheet. For instance, we may have cells containing:

- Quotes from a speech by a European politician that we want to classify into “Pro-EU”, “Anti-EU” or “Neutral”

- Company annual reports from which we want to extract the ultimate controlling party

- Political ads which we want to sift according to whether they mention immigration, directly or indirectly

In this session, participants will learn to write a custom AppScript function in Google Sheets that will enable them to apply Large Language Models (LLMs) from OpenAI and Anthropic to their spreadsheet data.

By the end, attendees will be able to write a formula like =LLM(A1, “gpt-4o”, “Is this text about immigration?”), then drag it down to apply it to hundreds of rows at once. This will enable us to apply the astonishing natural language capabilities of LLMs en masse to cells within our spreadsheet.

Attendees will acquire the following skills:

- Using AppScript to write custom functions in Google Sheets

- Using LLMs via APIs

- Some basic LLM prompting techniques and tips

- Understanding when an LLM is likely to be reliable (when its output is based entirely on data within the spreadsheet) and when it is more likely to hallucinate (when its output draws on its own limited knowledge of the world)
Saturday May 24, 2025 1:45pm - 3:00pm CEST
Z2.09

3:30pm CEST

Start looking: finding patterns in data with your eyes 👀
Saturday May 24, 2025 3:30pm - 4:45pm CEST
You've just obtained a big dataset. Where do you begin? How do you find the story buried within the rows and columns?

In this session, you will learn how to quickly become familiar with your data by making a series of charts that will illustrate not just the contents of your data but unveil patterns that can help guide your reporting.

This class will be taught in R, so some familiarity is recommended, but the skills are generic to all languages.
Speakers
Saturday May 24, 2025 3:30pm - 4:45pm CEST
Z2.09

5:15pm CEST

AI cookbook 🥧: 6 recipes for the modern journalist
Saturday May 24, 2025 5:15pm - 6:30pm CEST
What if you could harness AI to automate repetitive tasks, extract meaningful insights from complex datasets, or even assist in storytelling? In this session, you’ll learn how to create practical, customizable workflows—“AI recipes”—designed to tackle real newsroom challenges.

Drawing inspiration from cutting-edge techniques in AI agent design, we’ll guide you through building tools that can annotate maps, analyze documents, and much more. Whether you’re a data journalist, editor, or simply curious about the potential of AI, this session will provide hands-on insights to integrate AI agents into your work.
Speakers
Saturday May 24, 2025 5:15pm - 6:30pm CEST
Z2.09
 
Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.