Venue: Z2.10
Friday, May 23
 

1:15pm CEST

🤖 How LLMs can classify thousands of records in minutes
Friday May 23, 2025 1:15pm - 2:30pm CEST
You find yourself staring at a dataset with tens or hundreds of thousands of rows. Maybe you want to get up-to-date FOIA contact details for all government departments in your country, or to find out which political donors have links to the fossil fuel industry. What do you do?

Large Language Models (LLMs) can help journalists automate simple research and classification tasks that would take an unreasonably long time to do manually.

In this session, we'll outline how Global Witness has used LLMs, search engines and web scraping to help us identify fossil fuel lobbyists at COP29. These techniques can be applied to other investigations and research tasks.

The workshop will cover:

- An interactive classification demo
- Some basic tips on setting up a research/classification project
- The challenges of doing AI research at scale and how to address them
- Using more advanced tools

After attending this session, you will be able to take an existing dataset and automatically augment it with new data, opening up the potential for new stories and investigations.

If you want to follow along with the classification demo, you'll need to be able to run Jupyter Notebooks on your device or have a Google account. A basic understanding of Python would be useful, but we won't be writing any new code.
Z2.10

3:00pm CEST

Cracking the code: how to use RegEx in your investigations
Friday May 23, 2025 3:00pm - 4:15pm CEST
When you unlock the power of regular expressions (RegEx) you supercharge your spreadsheet!

Participants will learn how to extract hidden patterns from text, clean messy datasets, and automate repetitive tasks using the RegEx formulas within Google Sheets (but this session is also a good intro if you want to apply it in other code).

Through practical examples—like extracting donations, cleaning salutation-heavy lists, and extracting postcodes—you will leave with the confidence to apply RegEx in your day-to-day data work.

Attendees will receive a RegEx cheat sheet (customisable for their own use) and a practical demo spreadsheet to take their skills to the next level. No prior experience of RegEx is required, but you should be comfortable writing formulas in Google Sheets/Excel. I will share the data in a Google Sheet, so you will need a Google account.
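As a rough illustration of the three tasks named above (extracting donations, cleaning salutations, extracting postcodes), here is a minimal sketch using Python's `re` module rather than Google Sheets. The sample rows, the function names, and the patterns themselves are assumptions made up for this example, not the session's own material; the postcode pattern assumes a simplified UK-style shape and is not a full validator.

```python
import re
from typing import Optional

# Pound-sign amounts such as "£500" or "£1,250.00" (assumed format).
AMOUNT = re.compile(r"£\s?(\d+(?:,\d{3})*(?:\.\d{1,2})?)")
# Simplified UK-style postcode: outward code, optional space, inward code.
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s?(\d[A-Z]{2})\b")
# Common leading salutations, with or without a trailing full stop.
SALUTATION = re.compile(r"^(?:Mr|Mrs|Ms|Dr|Prof)\.?\s+")


def extract_amount(text: str) -> Optional[float]:
    """Return the first pound amount in the text as a float, or None."""
    match = AMOUNT.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def extract_postcode(text: str) -> Optional[str]:
    """Return the first postcode-shaped token, normalised to 'OUT IN'."""
    match = POSTCODE.search(text)
    return f"{match.group(1)} {match.group(2)}" if match else None


def strip_salutation(name: str) -> str:
    """Remove one leading salutation such as 'Dr' or 'Mrs.' from a name."""
    return SALUTATION.sub("", name.strip())


# Hypothetical rows, invented purely for illustration.
SAMPLE_ROWS = [
    "Mr. John Smith donated £1,250.00, London SW1A 1AA",
    "Dr Jane Doe donated £500, Manchester M1 1AE",
]

if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(strip_salutation(row), extract_amount(row), extract_postcode(row))
```

In Google Sheets, the comparable formula functions are `REGEXEXTRACT` and `REGEXREPLACE`; the patterns may need small adjustments there, since Sheets uses a different regular expression engine from Python.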
Z2.10

4:45pm CEST

Investigating built expansion on protected natural areas 🌳: learn the basics of PostGIS
Friday May 23, 2025 4:45pm - 6:00pm CEST
Working with geospatial datasets is often critical to investigations dealing with where things are located, where an event occurred, land ownership, and the environment.

In this workshop, participants will learn the basics of using PostGIS to query and join geospatial datasets using the Arena+ "Europe in Grey" investigation as a case study.

We’ll run through a brief overview of PostGIS, the spatial data types, indexes, and functions it adds to Postgres, and a few of its strengths and weaknesses as a tool.

Participants will connect to an existing database containing a dataset of built expansion in Europe and a few datasets used in the Arena investigation. They’ll be guided through writing queries to reproduce some of the investigation’s results, answering questions like:
- Which European country has lost the greatest proportion of its wild areas since 2018?
- Which protected areas have had the most building on them?

To take part in this workshop, participants should feel comfortable writing basic SQL queries.
Participants should bring their laptops with either DBeaver or their favorite SQL client tool installed.
Z2.10
 
Saturday, May 24
 

9:30am CEST

Teaching LLMs to build your Machine Learning Models
Saturday May 24, 2025 9:30am - 10:45am CEST
In this practical session, participants will learn how LLMs like ChatGPT can assist in writing machine learning code for journalistic investigations.

We’ll start by prompting ChatGPT to generate code for analyzing a small dataset. Then, we’ll apply the code to a larger dataset locally. After attending this session, participants will be able to train and use this machine learning model on their own.

This method was used by Frontstory.pl to analyze thousands of messages on Telegram and reveal the scale of drug trafficking activity in Poland.

To follow along, participants should be comfortable using Python and Jupyter Notebook.
Z2.10

11:15am CEST

More than just the Wayback Machine: how to investigate deleted and archived content
Saturday May 24, 2025 11:15am - 12:30pm CEST
Even among investigative journalists, web archives tend to be underrated – and undertaught. This hands-on session introduces journalists to powerful techniques for using web archives.
Participants will learn how to recover deleted or hidden content and archive key material from platforms like Instagram and X.
Using real-world examples, we’ll demonstrate how these skills can strengthen reporting across a wide range of stories, from everyday reporting to investigative longreads.
After this session, you will be able to retrieve archived content, recover deleted posts (not necessarily the same things!), and preserve online material using advanced web archiving tools and techniques. We will teach participants how to tweak the URL and use the asterisk, and we will demonstrate why the "Golden Hour" of archiving is so important in breaking news situations.
No prior experience is required—just an interest in digital sleuthing and a willingness to explore new tools.
Please bring a laptop with you, preferably with the Chrome browser installed.
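One possible reading of "tweak the URL and use the asterisk" is the Wayback Machine's URL scheme, where `*` acts as a wildcard. The sketch below only builds such URLs as strings and makes no network requests; the helper names and the `example.com` addresses are hypothetical, and the session itself may cover different techniques.

```python
WAYBACK = "https://web.archive.org/web"


def all_captures_url(url_prefix: str) -> str:
    """Build a Wayback Machine URL listing archived pages under a prefix.

    A '*' in the timestamp slot asks for every capture date, and a trailing
    '*' after the address matches every URL that starts with the prefix.
    """
    return f"{WAYBACK}/*/{url_prefix.rstrip('*')}*"


def capture_near_url(url: str, timestamp: str) -> str:
    """Build a URL for the capture closest to a timestamp.

    The timestamp uses the YYYYMMDDhhmmss form; shorter prefixes such as
    '2024' or '20240101' are also accepted by the Wayback Machine.
    """
    return f"{WAYBACK}/{timestamp}/{url}"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(all_captures_url("example.com/press/"))
    print(capture_near_url("https://example.com/", "20240101"))
```

Pasting the first printed URL into a browser would list everything archived under that path, which can surface pages that are no longer linked from the live site.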
Z2.10

1:45pm CEST

Together at last: R and Python united in the Positron IDE
Saturday May 24, 2025 1:45pm - 3:00pm CEST
For years, data journalists have been forced to choose between learning R and learning Python in order to do data analysis with a scripting language. This meant the choice of IDE (integrated development environment – the app for writing and managing scripts and files) was always a defining decision.

R users mostly turned to RStudio to maintain R, run scripts, make plots, etc. Python users have had a variety of options (Google Colab, Jupyter, Anaconda, etc.) to manage their scripts and projects.

Now there’s a program built to handle both languages in parallel (but not quite simultaneously!) – it's called Positron.

In this session, we will introduce you to the Positron program. We will show you the interface and how to get started with your usual coding language, before working through some scenarios where being able to move quickly from one language to the other is desirable. (And if you have examples of times when you’ve needed this facility, please bring them to this session.)

You will ideally have some experience of R or Python, and some appetite for using the other language, perhaps even on deadline. If you want to follow along in the session, install Positron beforehand from https://positron.posit.co/
Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Z2.10

3:30pm CEST

💡 Streamlit for building tools and collaborating with non-coders
Saturday May 24, 2025 3:30pm - 4:45pm CEST
With Streamlit, you can set up a web page in just a few lines of Python code to share your findings with your team or audience – or to collect information from them. Use it to swiftly try out an idea for publication before asking your IT department to develop it, or to let a colleague make use of a Python-scripted tool you've written. Or build yourself a chatbot to help navigate your own research, local and safe on your computer.
In this session, we’ll cover the basics of Streamlit and build a page where users can upload a PDF along with some information, send it to a Python function for processing, and display the results. More advanced users will learn how to build an LLM-powered chatbot.
Streamlit is a Python library, so you should have a basic understanding of Python. You also need to be the admin of your computer, or at least have permission to start a local web server on it. If you want to build a chatbot, you’ll need to install Ollama (ollama.com) and download a model such as Gemma3 (ollama.com/library/gemma3) before the session starts.
Z2.10

5:15pm CEST

Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems
Saturday May 24, 2025 5:15pm - 6:30pm CEST
Scraped data can often be the backbone of an investigation, but some websites are more difficult to scrape than others. This session will cover best practices for dealing with tricky sites, including coping with captchas, IP blocks, and browser fingerprinting. This is an advanced session aimed at people who already have experience of writing code to scrape websites and want to move up to the next level: participants will leave with an understanding of how to approach hard-to-scrape websites, plus the tradeoffs and costs of these approaches.
Z2.10
 