Type: Data skills
Thursday, May 22
 

10:00am CEST

Masterclass: Hack your way into a big dataset with R (Masterclass ticket needed)
Thursday May 22, 2025 10:00am - 12:00pm CEST

A separate ticket is required to attend this masterclass. If you would like to attend but haven't yet purchased a ticket, please contact us at info@dataharvest.eu

R is one of the most useful programming languages in data journalism. You may have heard of it, maybe even tried it a little and found the learning curve too steep. If so, this session is for you.

We are going to spend the day looking at the European Environment Agency’s EPRTR (European Pollutant Release and Transfer Register) data – it’s a lot of data, and some of it is quite messy. It contains dozens, probably hundreds, of potential lines of investigation to be explored – and that’s what we’re going to do.

By the end of the day, you will know how to import data in an R environment, filter it, reshape it, and interrogate it. You will be able to make some basic graphs. Above all, you will be on the way to finding stories in the day’s chosen data, and be able to take your script away and use it again, or adapt it to other datasets. And, we hope, you will have the beginnings of a story idea.

We will assume that you are familiar with spreadsheets, but that you have no knowledge of R. You will not need to install anything – everything will be run on cloud instances of R.

If you’re already advanced with R, it is still worth coming along to use and share what you know, to support others, and to learn something new.

(If you already have a dataset you want to work with – bring that too!)
Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Thursday May 22, 2025 10:00am - 12:00pm CEST
Z1.16

1:00pm CEST

Masterclass: Hack your way into a big dataset with R (Masterclass ticket needed)
Thursday May 22, 2025 1:00pm - 3:00pm CEST
Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Thursday May 22, 2025 1:00pm - 3:00pm CEST
Z1.16

3:30pm CEST

Masterclass: Hack your way into a big dataset with R (Masterclass ticket needed)
Thursday May 22, 2025 3:30pm - 5:00pm CEST

Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Thursday May 22, 2025 3:30pm - 5:00pm CEST
Z1.16
 
Friday, May 23
 

1:15pm CEST

Beyond the pixels: the power of raster data in QGIS
Friday May 23, 2025 1:15pm - 2:30pm CEST
Manipulating and analyzing raster data can be intimidating, as it often appears more complex than vector data. However, raster data, such as satellite imagery or forest loss information, is essential for environmental and geographic storytelling. Among other things, it enables journalists to assess vegetation health, visualize floods or droughts, and calculate deforested areas, even when true-colour satellite imagery is obscured by clouds.

In this hands-on session, participants will learn the key functions in QGIS needed to work with raster data. This includes loading raster layers, managing projections, setting band combinations (such as false color) for analysis, styling raster layers to enhance visibility, and performing raster calculations.
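
As an illustration of what a raster calculation does, a vegetation index can be computed pixel by pixel from two bands. The session description does not name a specific index; NDVI, the function name and the sample values below are assumptions chosen for this sketch, and the code is plain Python rather than the QGIS Raster Calculator.

```python
def ndvi(nir, red):
    """Return per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equal-sized bands."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # Avoid division by zero where both bands are empty (e.g. no-data pixels).
            row.append((n - r) / total if total else 0.0)
        result.append(row)
    return result


# Hypothetical 2x2 reflectance values, not real imagery.
nir_band = [[0.5, 0.25], [0.0, 0.75]]
red_band = [[0.5, 0.75], [0.0, 0.25]]
print(ndvi(nir_band, red_band))
```

Values near 1 usually suggest dense, healthy vegetation, while values near 0 or below suggest bare ground, water or built surfaces.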

To attend this session, participants should have basic QGIS skills.

Before the session, please install QGIS on your laptops and make sure it is working properly. Download from: https://www.qgis.org/en/site/forusers/download.html

If you encounter any issues during installation, this guide may help: https://www.qgis.org/resources/installation-guide/
Friday May 23, 2025 1:15pm - 2:30pm CEST
Z2.08

1:15pm CEST

Mapping independence: why and how newsrooms should host and style their own maps
Friday May 23, 2025 1:15pm - 2:30pm CEST
Tired of Google Maps' branding and tracking? In this session, we’ll explore how newsrooms can host their own map tiles using open-source tools such as Protomaps and OpenFreeMap. We’ll look at how to reduce costs, protect reader privacy, and gain full editorial control — alongside a hands-on workshop on customizing map styles with Maputnik. No prior knowledge is required for the workshop, though some experience working with maps will be a bonus, especially for the self-hosting part. The workshop takes place entirely in a browser. After this session, attendees will be familiar with alternatives to commercial map providers and know the foundations of map styling and self-hosting, empowering them to create maps that better serve their reporting and storytelling.
Friday May 23, 2025 1:15pm - 2:30pm CEST
Z2.09

1:15pm CEST

🤖 How LLMs can classify thousands of records in minutes
Friday May 23, 2025 1:15pm - 2:30pm CEST
You find yourself staring at a dataset with tens or hundreds of thousands of rows. Maybe you want to get up-to-date FOIA contact details for all government departments in your country, or to find out which political donors have links to the fossil fuels industry. What do you do?

Large Language Models (LLMs) can help journalists automate simple research and classification tasks that would take an unreasonably long time to do manually.

In this session, we'll outline how Global Witness has used LLMs, search engines and web scraping to help us identify fossil fuel lobbyists at COP29. These techniques can be applied to other investigations and research tasks.

The workshop will cover:

- An interactive classification demo
- Some basic tips on setting up a research/classification project
- The challenges of doing AI research at scale and how to address them
- Using more advanced tools

After attending this session, you will be able to take an existing dataset and automatically augment it with new data, opening up the potential for new stories and investigations.

If you want to follow along with the classification demo, you'll need to be able to run Jupyter Notebooks on your device or have a Google account. A basic understanding of Python would be useful, but we won't be writing any new code.
Friday May 23, 2025 1:15pm - 2:30pm CEST
Z2.10

3:00pm CEST

AI in code editors: save time when writing code
Friday May 23, 2025 3:00pm - 4:15pm CEST
In this session we will explore AI-assisted code editors, and go over the obscure but necessary features that enable you to write web scrapers easily. We'll walk through examples of how to approach unfamiliar pages and web technologies, and how these editors are already being used to speed up development substantially.

To get the most out of this session, you should have basic knowledge of web scraping in Python.
Friday May 23, 2025 3:00pm - 4:15pm CEST
Z2.08

3:00pm CEST

Browser puppetry: Playwright for dynamic website scraping
Friday May 23, 2025 3:00pm - 4:15pm CEST
Playwright is a next-generation browser automation tool that allows you to use Python or JavaScript to scrape almost any web page. It can assist in downloading pages of government documents, capturing tweets before they get deleted, or simply breaking past the cookie consent banner. Beyond the basics, it can also easily take screenshots, monitor and log network requests, and even fit right into your traditional BeautifulSoup scraping approach.

We'll look at:

- Installing Playwright
- Accessing elements on the page
- Interacting with web pages (clicking, navigating, filling out forms)
- Taking screenshots
- Sending pages to traditional scraping tools like BeautifulSoup
- Common patterns including pagination and CAPTCHA breaking

For those of you familiar with tackling similar problems using Selenium: Playwright is a similar tool with a better interface, better install/upgrade process, and ten times the usability. It might be time to upgrade!

Participants should have a basic knowledge of Python and HTML, but we'll also cover how to breeze past those basics with AI assistance. To fully participate, participants should have Jupyter installed. Additional software and installation tips will be available at https://github.com/jsoma/dataharvest25-playwright-scraping
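
A minimal sketch of the load, screenshot and hand-off steps listed above might look like the following. It is not taken from the session materials: the function names are hypothetical, and the standard library's `HTMLParser` stands in for BeautifulSoup so that the parsing half runs without extra installs. The first function needs Playwright and a Chromium browser installed.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, screenshot_path=None):
    """Load a page in headless Chromium and return its HTML after JavaScript has run."""
    # Imported here so the parsing helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        if screenshot_path:
            page.screenshot(path=screenshot_path, full_page=True)
        html = page.content()
        browser.close()
    return html


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (a stand-in for a BeautifulSoup query)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

In use, `extract_links(fetch_rendered_html(some_url))` would list the links on a page that only appear once its scripts have run.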
Friday May 23, 2025 3:00pm - 4:15pm CEST
Z2.09

3:00pm CEST

Cracking the code: how to use RegEx in your investigations
Friday May 23, 2025 3:00pm - 4:15pm CEST
When you unlock the power of regular expressions (RegEx) you supercharge your spreadsheet!

Participants will learn how to extract hidden patterns from text, clean messy datasets, and automate repetitive tasks using the RegEx formulas within Google Sheets (but this session is also a good intro if you want to apply it in other code).

Through practical examples—like extracting donations, cleaning salutation-heavy lists, and extracting postcodes—you will leave with the confidence to apply RegEx in your day-to-day data work.
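
For those who want to carry the same ideas into code, here is a rough Python sketch of the three examples mentioned above. The patterns, function names and sample strings are illustrative assumptions, not the session's own formulas, and the postcode pattern assumes UK-style postcodes.

```python
import re

# Hypothetical patterns; real data usually needs these adjusted.
AMOUNT = re.compile(r"[£€$]\s?(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)")
SALUTATION = re.compile(r"^(?:Mr|Mrs|Ms|Dr|Prof)\.?\s+", re.IGNORECASE)
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")


def extract_amount(text):
    """Return the first currency amount in the text as a number, or None."""
    match = AMOUNT.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def strip_salutation(name):
    """Remove a leading title such as 'Dr.' or 'Mrs' from a name."""
    return SALUTATION.sub("", name.strip())


def extract_postcode(text):
    """Return the first UK-style postcode found, or None."""
    match = UK_POSTCODE.search(text.upper())
    return match.group(0) if match else None


print(extract_amount("Donation of £12,500.00 received"))
print(strip_salutation("Dr. Jane Example"))
print(extract_postcode("Office at sw1a 1aa, London"))
```

The same patterns can be tried in Google Sheets functions such as REGEXEXTRACT and REGEXREPLACE, although small syntax differences between regex engines may apply.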

Attendees will receive a RegEx cheat sheet (customisable for their own use) and a practical demo spreadsheet to take their skills to the next level. No prior experience of RegEx is required, but you should be comfortable writing formulas in Google Sheets/Excel. I will be sharing a Google Sheet containing the data for which you will require a Google account.
Friday May 23, 2025 3:00pm - 4:15pm CEST
Z2.10

4:45pm CEST

Data Magic made simple: three ways to crunch numbers in spreadsheets
Friday May 23, 2025 4:45pm - 6:00pm CEST
We know that thousands of lines in a dataset can be intimidating, especially if you’re not a programmer. Spreadsheets can do the heavy lifting — and mastering them is easier than you expect!

In this session, we will walk you through three different ways to dive into data using nothing but spreadsheet tools. Along the way, we’ll show you how to cross-check your calculations, ensuring your findings are accurate and reliable. Whether you’re a complete beginner or have already used spreadsheets in your work, you’ll leave with practical skills to handle data confidently without ever touching a line of code. Bring your laptop and join us to discover how easy and powerful data analysis can be!
Friday May 23, 2025 4:45pm - 6:00pm CEST
Z2.08

4:45pm CEST

Finding connections: transform your document collections into a graph visualisation
Friday May 23, 2025 4:45pm - 6:00pm CEST
A high level overview of the GraphRAG ecosystem. We aim to show:
- What GraphRAG is, and how it works
- How to prepare your documents
- How to build your own graph
- How to interface with your graph using Python

These techniques can be used to gain a visual overview of the contents of document sets, and to find information in the documents without having to rely on keywords. Put simply, it enables you to make sense of large amounts of documents without having to read them all.

In the session you'll be making a graph from arbitrary documents, visualizing it, and using it to answer questions.

It will help if you've heard of RAG before, but this is not a prerequisite. To follow along, all you need is a browser and an e-mail address.
Friday May 23, 2025 4:45pm - 6:00pm CEST
Z2.09

4:45pm CEST

Investigating built expansion on protected natural areas 🌳: learn the basics of PostGIS
Friday May 23, 2025 4:45pm - 6:00pm CEST
Working with geospatial datasets is often critical to investigations dealing with where things are located, where an event occurred, land ownership, and the environment.

In this workshop, participants will learn the basics of using PostGIS to query and join geospatial datasets using the Arena+ "Europe in Grey" investigation as a case study.

We’ll run through a brief overview of PostGIS, the spatial data types, indexes, and functions it adds to Postgres, and a few of its strengths and weaknesses as a tool.

Participants will connect to an existing database containing a dataset of built expansion in Europe and a few datasets used in the Arena investigation. They’ll be guided through writing queries to reproduce some of the investigation’s results, answering questions like:
Which European country has lost the greatest proportion of its wild areas since 2018?
Which protected areas have had the most building on them?

To take part in this workshop, participants should feel comfortable writing basic SQL queries.
Participants should bring their laptops with either DBeaver or their favorite SQL client tool installed.
Friday May 23, 2025 4:45pm - 6:00pm CEST
Z2.10
 
Saturday, May 24
 

9:30am CEST

From source to chart 📈 : Using free tools to automate your data flows
Saturday May 24, 2025 9:30am - 10:45am CEST
DESCRIPTION:

For monitoring purposes and to inform our reporting, it’s helpful to keep an eye on trends over time in some datasets, for example when tracking the progress of an mpox outbreak, monitoring pre-election polls or investigating migration trends.

This is where automation comes in handy: with the data being automatically grabbed from the source, reconfigured and channeled into a chart for visualization, allowing data journalists and their colleagues to notice report-worthy trends early on.

To establish such a workflow, we have been relying on free tools like Datawrapper and GitHub Actions to run a Python script.

Participants will be guided through setting up such a workflow step-by-step.

LEARNING OBJECTIVES:

🚀 In this session you will learn how to...
- collect data from a URL
- parse the data into the needed format using Python's `pandas` library
- use Python's `datawrapper` library to create a chart
- set up the script to run automatically on GitHub Actions
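
The first two steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the session's script: it uses only the standard library in place of `pandas` so that it runs without extra installs, the function names, column names and sample rows are made up, and the chart and scheduling steps are left out.

```python
import csv
import io
from urllib.request import urlopen


def fetch_csv(url):
    """Download the raw CSV text from a source URL."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def to_chart_rows(csv_text, date_col, value_col):
    """Keep two columns, drop rows with missing values, and sort by date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        value = (record.get(value_col) or "").strip()
        if not value:
            continue  # skip rows the source left empty
        rows.append((record[date_col], float(value)))
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(rows)


# Made-up sample standing in for what fetch_csv(url) would return.
sample = "date,cases,region\n2025-01-08,12,A\n2025-01-01,7,A\n2025-01-15,,A\n"
print(to_chart_rows(sample, "date", "cases"))
```

Keeping the download and the reshaping in separate functions makes it easier to check the parsing on a saved sample before the script is scheduled to run unattended.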

🔍 PREREQUISITES & TOOLS:

- ✅ Datawrapper API token
- ✅ GitHub account (if you want to automate the chart update)
- optional: Code text editor (if you prefer working with your own code for automation), such as Atom or Sublime Text
- optional: Distill browser plugin (if you want to update on click)
- optional: basic understanding of Python/coding helpful, but not required
Saturday May 24, 2025 9:30am - 10:45am CEST
Z2.08

9:30am CEST

Python Without the pain: write code with LLMs
Saturday May 24, 2025 9:30am - 10:45am CEST
In an age where data is an important part of impactful storytelling, journalists need tools that enable them to work with it effectively. Python is a powerful resource for analyzing and visualizing data, but it can be intimidating for those without a technical background. This workshop breaks down those barriers, showing how AI tools like ChatGPT can make coding basics approachable and accessible. By equipping journalists with these skills, the workshop aims to empower them to create richer, data-driven stories and visualisations without relying heavily on external technical support.

The session will start with an overview of Python and how AI-assisted coding works, showcasing how these tools can simplify technical challenges, followed by real-life examples. Afterward, participants will dive into a hands-on session using Jupyter Notebook to practice running and adapting Python scripts. By the end, they’ll feel more confident tackling technical problems independently.

Participants are encouraged to have Python (with Jupyter Notebook) installed on their devices, or a Google Colab environment ready. You will also need a ChatGPT account set up before attending. While familiarity with Python is helpful, it’s not required.
Saturday May 24, 2025 9:30am - 10:45am CEST
Z2.09

9:30am CEST

Teaching LLMs to build your Machine Learning Models
Saturday May 24, 2025 9:30am - 10:45am CEST
In this practical session, participants will learn how LLMs like ChatGPT can assist in writing machine learning code for journalistic investigations.

We’ll start by prompting ChatGPT to generate code for analyzing a small dataset. Then, we’ll apply the code to a larger dataset locally. After attending this session, participants will be able to train and use this machine learning model on their own.

This method was used by Frontstory.pl to analyze thousands of messages on Telegram and reveal the scale of drug trafficking activity in Poland.

To follow along, participants should be comfortable using Python and Jupyter Notebook.
Saturday May 24, 2025 9:30am - 10:45am CEST
Z2.10

11:15am CEST

Make your own investigative application with minimal code
Saturday May 24, 2025 11:15am - 12:30pm CEST
Getting the most out of your investigative data requires more than ad-hoc scripting. Deep research requires a persistent state, data model, user tagging, collaboration, access management, and automatic updates. In short, you need a research application. In this session, you'll learn to use an open source, low-code application to turn a simple ETL into a complex app.

To follow along comfortably, you should be familiar with basic Python scripting and REST APIs.
Saturday May 24, 2025 11:15am - 12:30pm CEST
Z2.08

11:15am CEST

Making maps with code
Saturday May 24, 2025 11:15am - 12:30pm CEST
Data journalists have traditionally thought of maps and spatial calculations as a job for special mapping software, like QGIS. But it's often more efficient to do GIS work in the script in which you perform the rest of your analysis.

In this class, you will see how easy it is to work with maps within your code.

The class will be taught in R, so some familiarity is recommended, but the skills are generic to all languages.
Saturday May 24, 2025 11:15am - 12:30pm CEST
Z2.09

11:15am CEST

More than just the Wayback Machine: how to investigate deleted and archived content
Saturday May 24, 2025 11:15am - 12:30pm CEST
Even among investigative journalists, web archives tend to be underrated – and undertaught. This hands-on session introduces journalists to powerful techniques for using web archives.
Participants will learn how to recover deleted or hidden content and archive key material from platforms like Instagram and X.
Using real-world examples, we’ll demonstrate how these skills can strengthen reporting across a wide range of stories, from everyday reporting to investigative longreads.
After this session, you will be able to retrieve archived content, recover deleted posts (not necessarily the same things!), and preserve online material using advanced web archiving tools and techniques. We will teach participants how to tweak the URL and use the asterisk, and we will demonstrate why the "Golden Hour" of archiving is so important in breaking news situations.
No prior experience is required—just an interest in digital sleuthing and a willingness to explore new tools.
Please bring a laptop with you, preferably with the Chrome browser installed.
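
As a rough illustration of tweaking the URL and using the asterisk, the sketch below builds three kinds of Wayback Machine address. The helper names and the example domain are hypothetical, and the patterns reflect commonly used Wayback Machine URL forms rather than anything specific to the session.

```python
WAYBACK = "https://web.archive.org/web"


def capture_url(target, timestamp):
    """Address of the capture nearest to a timestamp (YYYYMMDDhhmmss, may be shortened)."""
    return f"{WAYBACK}/{timestamp}/{target}"


def calendar_url(target):
    """Address of the overview listing every capture of one exact URL."""
    return f"{WAYBACK}/*/{target}"


def prefix_url(prefix):
    """Address listing every archived URL that starts with the given prefix."""
    return f"{WAYBACK}/*/{prefix.rstrip('*')}*"


print(capture_url("example.com/press/", "20240101"))
print(calendar_url("example.com/press/"))
print(prefix_url("example.com/press/"))
```

The trailing asterisk in the last form is what can surface pages that are no longer linked from the live site, since it lists everything archived under that path.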
Saturday May 24, 2025 11:15am - 12:30pm CEST
Z2.10

1:45pm CEST

Extract all relations in Game of Thrones using AI and Python
Saturday May 24, 2025 1:45pm - 3:00pm CEST
AI models are great tools for structuring unstructured data, especially if you know some basic Python. These language models can be used to find and classify all arguments in a bunch of documents, compile statistics on political debates or hunt for greenwashing in corporate reports. In this session we will extract all relations in the Game of Thrones series and plot them as a network map. Come along if you want to know who slept with whom! (Here is an example of what it looks like: https://lasseedfast.se/got/ ). After this session, you will be able to use AI in combination with Python to systematically extract pieces of information from a large volume of data.

To follow along, participants need a basic understanding of Python.
Install Ollama and download one of the models to follow along on your own laptop.
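
One way to picture the step from model output to network map is sketched below. It assumes, purely for illustration, that the model has been asked to reply with a JSON list of objects with `source`, `relation` and `target` keys; that reply format, the function names and the placeholder characters are not from the session materials.

```python
import json
from collections import defaultdict


def parse_relations(reply):
    """Turn a model reply (expected: JSON list of relation objects) into edge tuples."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # malformed reply: return nothing rather than guess
    if not isinstance(items, list):
        return []
    edges = []
    for item in items:
        # Keep only entries that carry all three expected keys.
        if isinstance(item, dict) and {"source", "relation", "target"} <= item.keys():
            edges.append((item["source"], item["relation"], item["target"]))
    return edges


def build_network(edges):
    """Build an undirected adjacency map: each name points to its connected names."""
    network = defaultdict(set)
    for source, _relation, target in edges:
        network[source].add(target)
        network[target].add(source)
    return dict(network)


# Placeholder reply with one complete and one incomplete entry.
reply = '[{"source": "A", "relation": "sibling of", "target": "B"}, {"source": "A"}]'
print(build_network(parse_relations(reply)))
```

Validating each entry before it becomes an edge matters because language models do not always return well-formed or complete output.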
Saturday May 24, 2025 1:45pm - 3:00pm CEST
Z2.08

1:45pm CEST

Spreadsheets with superpowers: LLMs for data extraction and classification
Saturday May 24, 2025 1:45pm - 3:00pm CEST
Lots of data and investigative journalism takes place in spreadsheets. Frequently, we want to perform a task for every row in our spreadsheet. For instance, we may have cells containing:

- Quotes from a speech by a European politician that we want to classify into “Pro-EU”, “Anti-EU” or “Neutral”

- Company annual reports from which we want to extract the ultimate controlling party

- Political ads which we want to sift according to whether they mention immigration, directly or indirectly

In this session, participants will learn to write a custom Apps Script function in Google Sheets that will enable them to apply Large Language Models (LLMs) from OpenAI and Anthropic to their spreadsheet data.

By the end, attendees will be able to write a formula like =LLM(A1, "gpt-4o", "Is this text about immigration?"), then drag it down to apply it to hundreds of rows at once. This will enable us to apply the astonishing natural language capabilities of LLMs en masse to cells within our spreadsheet.

Attendees will acquire the following skills:

- Using Apps Script to write custom functions in Google Sheets

- Using LLMs via APIs

- Some basic LLM prompting techniques and tips

- Understanding when an LLM is likely to be reliable (when its output is based entirely on data within the spreadsheet) and when it is more likely to hallucinate (when its output draws on its own limited knowledge of the world)
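
The per-row pattern described above can be sketched in Python, although the session itself works in Google Sheets. Everything named here is an assumption for illustration: `ask_llm` is a hypothetical callable that takes a prompt string and returns the model's reply, and `fake_llm` is a stand-in so the example runs without an API key.

```python
ALLOWED = ("Pro-EU", "Anti-EU", "Neutral")


def build_prompt(text, labels=ALLOWED):
    """Ask for one label only, based solely on the text in the cell."""
    return (
        "Classify the text using only the information it contains.\n"
        f"Answer with exactly one of: {', '.join(labels)}.\n\n"
        f"Text: {text}"
    )


def classify(text, ask_llm, labels=ALLOWED):
    """Return one allowed label, or 'UNCLEAR' if the reply is anything else."""
    reply = ask_llm(build_prompt(text, labels)).strip()
    for label in labels:
        if reply.lower() == label.lower():
            return label
    return "UNCLEAR"  # flag for manual review instead of guessing


def fake_llm(prompt):
    """Stand-in for a real model call; always gives the same answer."""
    return " neutral "


rows = ["First quote from a speech", "Second quote from a speech"]
print([classify(row, fake_llm) for row in rows])
```

Restricting the answer to a fixed set of labels and routing everything else to manual review is one way to keep unexpected replies out of the results.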
Saturday May 24, 2025 1:45pm - 3:00pm CEST
Z2.09

1:45pm CEST

Together at last: R and Python united in the Positron IDE
Saturday May 24, 2025 1:45pm - 3:00pm CEST
For years, data journalists have been forced to choose between learning R or Python in order to do data analysis with a scripted language. This meant the choice of IDE (integrated development environment – the app for writing and managing scripts and files) was always a defining decision.

R users mostly turned to RStudio to maintain R, run scripts, make plots, etc. Python users have had a variety of options (Google Colab, Jupyter, Anaconda, etc.) to manage their scripts and projects.

Now there’s a program built to handle both languages in parallel (but not quite simultaneously!) - it's called Positron.

In this session we will introduce you to the Positron program. We will show you the interface, and how to get started with your usual coding language, before working through some scenarios where being able to move quickly from one language to the other is desirable. (And if you have examples of times when you’ve needed this facility, please bring them to this session)

You will ideally have some experience of R or Python, and some appetite for using the other language, perhaps even on deadline. If you want to follow along in the session, install Positron beforehand from https://positron.posit.co/
Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Saturday May 24, 2025 1:45pm - 3:00pm CEST
Z2.10

3:30pm CEST

Protests, TikTok, and More 🎥: Analyzing images and videos with AI
Saturday May 24, 2025 3:30pm - 4:45pm CEST
Learn how AI can sort through images and video to help you wrangle footage from protests and riots (NYT), analyze trends on TikTok (Washington Post), keep an eye on your local school board meetings (Hearst), measure the effects of congestion pricing (Bloomberg), and a hundred other tidbits for when the cameras might be rolling.

With a little Python and a dash of foundational knowledge, this session will tackle downloading videos, building and evaluating transcripts, splitting scenes, categorizing images, and detecting/counting/tracking objects.

Participants will get the most out of this session if they have a working knowledge of Python. To follow along, you should have Jupyter installed on your computer or a Google account to use Google Colab. Additional materials and installation tips will be available at https://github.com/jsoma/dataharvest25-ai-images-video
Saturday May 24, 2025 3:30pm - 4:45pm CEST
Z2.08

3:30pm CEST

Start looking: finding patterns in data with your eyes 👀
Saturday May 24, 2025 3:30pm - 4:45pm CEST
You've just obtained a big dataset. Where do you begin? How do you find the story buried within the rows and columns?

In this session, you will learn how to quickly become familiar with your data by making a series of charts that will illustrate not just the contents of your data but unveil patterns that can help guide your reporting.

This class will be taught in R, so some familiarity is recommended, but the skills are generic to all languages.
Saturday May 24, 2025 3:30pm - 4:45pm CEST
Z2.09

3:30pm CEST

💡Streamlit for building tools and collaborating with non-coders
Saturday May 24, 2025 3:30pm - 4:45pm CEST
With Streamlit, you can set up a web page in just a few lines of Python code to share your findings with your team or audience – or to collect information from them. Use it to swiftly try out an idea for publication before asking your IT department to develop it, or to let a colleague make use of a Python-scripted tool you've written. Or build yourself a chatbot to help navigate your own research, local and safe on your computer.
In this session, we’ll cover the basics of Streamlit and build a page where users can upload a PDF along with some information, send it to a Python function for processing, and display the results. More advanced users will learn how to build an LLM-powered chatbot.
Streamlit is a Python library, so you should have a basic understanding of Python. You also need to be the admin of your computer, or at least have permission to start a local web server on it. If you want to build a chatbot, you’ll need to install Ollama (ollama.com) and download a model such as Gemma3 (ollama.com/library/gemma3) before the session starts.
Saturday May 24, 2025 3:30pm - 4:45pm CEST
Z2.10

5:15pm CEST

Advanced prompting for investigations
Saturday May 24, 2025 5:15pm - 6:30pm CEST
How can journalists use metaprompting to let AI build system instructions for assistants that empower their investigations? Through examples of iterative metaprompting, Rune Ytreberg will show you how to master the art of efficient prompting. You can use metaprompting to build AI assistants that unlock the black box of AI.

After attending this session, you will be able to use advanced prompting techniques (metaprompting) to have a large language model (LLM) write system instructions. These advanced prompts are detailed instructions that customize AI assistants to support investigations. They are used in, among others, OpenAI and Anthropic projects, and RAG platforms like Kotaemon or AnythingLLM.

You can attend this session without any prior knowledge, but you will need some basic knowledge of prompting with tools like ChatGPT, Claude or similar if you would like to do this as a hands-on session.

If you would like to follow this as a hands-on session, have a paid version of ChatGPT, Claude or similar ready before the session starts.
Saturday May 24, 2025 5:15pm - 6:30pm CEST
Z2.08

5:15pm CEST

AI cookbook 🥧: 6 recipes for the modern journalist
Saturday May 24, 2025 5:15pm - 6:30pm CEST
What if you could harness AI to automate repetitive tasks, extract meaningful insights from complex datasets, or even assist in storytelling? In this session, you’ll learn how to create practical, customizable workflows—“AI recipes”—designed to tackle real newsroom challenges.

Drawing inspiration from cutting-edge techniques in AI agent design, we’ll guide you through building tools that can annotate maps, analyze documents, and much more. Whether you’re a data journalist, editor, or simply curious about the potential of AI, this session will provide hands-on insights to integrate AI agents into your work.
Saturday May 24, 2025 5:15pm - 6:30pm CEST
Z2.09

5:15pm CEST

Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems
Saturday May 24, 2025 5:15pm - 6:30pm CEST
Scraped data can often be the backbone of an investigation, but some websites are more difficult to scrape than others. This session will cover best practices for dealing with tricky sites, including coping with captchas, IP blocks, and browser fingerprinting. This is an advanced session aimed at people who already have experience of writing code to scrape websites and want to move up to the next level: participants will leave with an understanding of how to approach hard-to-scrape websites, plus the tradeoffs and costs of these approaches.
Saturday May 24, 2025 5:15pm - 6:30pm CEST
Z2.10
 