Saturday, May 24
 

9:30am CEST

From source to chart 📈 : Using free tools to automate your data flows
Saturday May 24, 2025 9:30am - 10:45am CEST
DESCRIPTION:

For monitoring purposes and to inform our reporting, it’s helpful to keep an eye on trends over time in some datasets, for example when tracking the progress of an mpox outbreak, monitoring pre-election polls or investigating migration trends.

This is where automation comes in handy: the data is automatically grabbed from the source, reshaped and channeled into a chart for visualization, so that data journalists and their colleagues can notice report-worthy trends early on.

To establish such a workflow, we have been relying on free tools: Datawrapper for the charts and GitHub Actions to run a Python script.

Participants will be guided through setting up such a workflow step-by-step.

LEARNING OBJECTIVES:

🚀 In this session you will learn how to...
- collect data from a URL
- parse the data into the needed format using Python's `pandas` library
- use Python's `datawrapper` library to create a chart
- set up the script to run automatically on GitHub Actions
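
As a rough illustration of the first two steps, here is a minimal sketch rather than the session's actual script. It uses only Python's standard library in place of `pandas` so that it runs without extra installs, and it leaves out the Datawrapper upload, which needs an API token. The column names `date`, `series` and `value` and the sample data are assumptions made up for this example.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from urllib.request import urlopen

# Hypothetical sample in "long" format: one row per date and series.
SAMPLE = "date,series,value\n2025-05-01,A,10\n2025-05-01,B,4\n2025-05-02,A,12\n"


def fetch_csv(url: str) -> str:
    """Download the raw CSV text from the source URL."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def to_chart_rows(csv_text: str) -> list[dict]:
    """Pivot long rows (date, series, value) into one row per date."""
    by_date: dict[str, dict] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_date[row["date"]][row["series"]] = float(row["value"])
    return [{"date": d, **by_date[d]} for d in sorted(by_date)]


def to_csv(rows: list[dict]) -> str:
    """Serialise the pivoted rows to CSV text, one column per series."""
    columns = ["date"] + sorted({k for r in rows for k in r if k != "date"})
    out = io.StringIO()
    # restval="" leaves a cell empty when a series has no value for a date.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # In a real run, SAMPLE would be replaced by fetch_csv(<source URL>).
    print(to_csv(to_chart_rows(SAMPLE)))
```

A scheduled job would call `fetch_csv` on the real source and pass the resulting CSV text on to the charting tool.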

🔍 PREREQUISITES & TOOLS:

- ✅ Datawrapper API token
- ✅ GitHub account (if you want to automate the chart update)
- optional: code editor (if you prefer working with your own code for automation), such as Atom or Sublime Text
- optional: Distill browser plugin (if you want to update on click)
- optional: a basic understanding of Python/coding is helpful, but not required
Z2.08

9:30am CEST

Python without the pain: write code with LLMs
Saturday May 24, 2025 9:30am - 10:45am CEST
In an age where data is an important part of impactful storytelling, journalists need tools that enable them to work with it effectively. Python is a powerful resource for analyzing and visualizing data, but it can be intimidating for those without a technical background. This workshop breaks down those barriers, showing how AI tools like ChatGPT can make coding basics approachable and accessible. By equipping journalists with these skills, the workshop aims to empower them to create richer, data-driven stories and visualisations without relying heavily on external technical support.

The session will start with an overview of Python and how AI-assisted coding works, showcasing how these tools can simplify technical challenges, followed by real-life examples. Afterward, participants will dive into a hands-on session using Jupyter Notebook to practice running and adapting Python scripts. By the end, they’ll feel more confident tackling technical problems independently.

Participants are encouraged to have Python (with Jupyter Notebook) installed on their devices, or a Google Colab environment ready. You will also need a ChatGPT account set up before attending. While familiarity with Python is helpful, it’s not required.
Z2.09

9:30am CEST

Teaching LLMs to build your Machine Learning Models
Saturday May 24, 2025 9:30am - 10:45am CEST
In this practical session, participants will learn how LLMs like ChatGPT can assist in writing machine learning code for journalistic investigations.

We’ll start by prompting ChatGPT to generate code for analyzing a small dataset. Then, we’ll apply the code to a larger dataset locally. After attending this session, participants will be able to train and use such a machine learning model on their own.

This method was used by Frontstory.pl to analyze thousands of messages on Telegram and reveal the scale of drug trafficking activity in Poland.

To follow along, participants should be comfortable using Python and Jupyter Notebook.
Z2.10

11:15am CEST

Make your own investigative application with minimal code
Saturday May 24, 2025 11:15am - 12:30pm CEST
Getting the most out of your investigative data requires more than ad-hoc scripting. Deep research requires a persistent state, data model, user tagging, collaboration, access management, and automatic updates. In short, you need a research application. In this session, you'll learn to use an open source, low-code application to turn a simple ETL into a complex app.

To follow along comfortably, you should be familiar with basic Python scripting and REST APIs.
Z2.08

11:15am CEST

Making maps with code
Saturday May 24, 2025 11:15am - 12:30pm CEST
Data journalists have traditionally thought of maps and spatial calculations as a job for special mapping software, like QGIS. But it's often more efficient to do GIS work in the script in which you perform the rest of your analysis.

In this class, you will see how easy it is to work with maps within your code.

The class will be taught in R, so some familiarity is recommended, but the skills are generic to all languages.
Z2.09

11:15am CEST

More than just the Wayback Machine: how to investigate deleted and archived content
Saturday May 24, 2025 11:15am - 12:30pm CEST
Even among investigative journalists, web archives tend to be underrated – and undertaught. This hands-on session introduces journalists to powerful techniques for using web archives.
Participants will learn how to recover deleted or hidden content and archive key material from platforms like Instagram and X.
Using real-world examples, we’ll demonstrate how these skills can strengthen reporting across a wide range of stories, from everyday reporting to investigative longreads.
After this session, you will be able to retrieve archived content, recover deleted posts (not necessarily the same things!), and preserve online material using advanced web archiving tools and techniques. We will teach participants how to tweak the URL and use the asterisk, and we will demonstrate why the "Golden Hour" of archiving is so important in breaking news situations.
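
To give a flavour of URL tweaking with the asterisk, here is a small sketch of three Wayback Machine URL patterns that are in common use. It is an illustration under stated assumptions, not the session's own material: the helper names are made up for this example, and the session may cover other archives and techniques.

```python
WAYBACK = "https://web.archive.org/web"


def all_captures(url: str) -> str:
    """Overview page listing every capture of one exact URL."""
    return f"{WAYBACK}/*/{url}"


def captures_under_prefix(url_prefix: str) -> str:
    """List archived URLs that start with the given prefix (trailing asterisk)."""
    return f"{WAYBACK}/*/{url_prefix.rstrip('/')}/*"


def closest_capture(url: str, timestamp: str) -> str:
    """Capture nearest to a timestamp written as YYYYMMDDhhmmss (may be shortened)."""
    return f"{WAYBACK}/{timestamp}/{url}"


if __name__ == "__main__":
    print(all_captures("example.com/page"))
    print(captures_under_prefix("example.com/news/"))
    print(closest_capture("example.com", "20200101"))
```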
No prior experience is required—just an interest in digital sleuthing and a willingness to explore new tools.
Please bring a laptop with you, preferably with the Chrome browser installed.
Z2.10

1:45pm CEST

Extract all relations in Game of Thrones using AI and Python
Saturday May 24, 2025 1:45pm - 3:00pm CEST
AI models are great tools for structuring unstructured data, especially if you know some basic Python. Language models can be used to find and classify all arguments in a set of documents, compile statistics on political debates or hunt for greenwashing in corporate reports. In this session we will extract all relations in the Game of Thrones series and plot them as a network map. Come along if you want to know who slept with whom! (Here is an example of what it looks like: https://lasseedfast.se/got/) After this session, you will be able to use AI in combination with Python to systematically extract pieces of information from a large volume of data.
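
The step from model output to network map can be sketched roughly as follows. This is a minimal illustration, not the session's code: it assumes the model has been prompted to reply with a JSON array of objects with the hypothetical keys `source`, `relation` and `target`, and the call to the model itself is left out.

```python
from __future__ import annotations

import json
from collections import defaultdict

REQUIRED_KEYS = {"source", "relation", "target"}  # hypothetical reply schema


def parse_relations(llm_reply: str) -> list[tuple[str, str, str]]:
    """Turn a JSON reply into (source, relation, target) triples, skipping bad items."""
    try:
        items = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        if isinstance(item, dict) and REQUIRED_KEYS <= item.keys():
            triples.append((item["source"], item["relation"], item["target"]))
    return triples


def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map: each name points to its neighbours."""
    graph: dict[str, set[str]] = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return dict(graph)


if __name__ == "__main__":
    reply = '[{"source": "Character A", "relation": "ally_of", "target": "Character B"}]'
    print(build_graph(parse_relations(reply)))
```

Validating every item before it enters the graph matters because model replies are not guaranteed to follow the requested format.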

To follow along, participants need a basic understanding of Python.
Install Ollama and download one of the models to follow along on your own laptop.
Z2.08

1:45pm CEST

Spreadsheets with superpowers: LLMs for data extraction and classification
Saturday May 24, 2025 1:45pm - 3:00pm CEST
A lot of data and investigative journalism takes place in spreadsheets. Frequently, we want to perform a task for every row in our spreadsheet. For instance, we may have cells containing:

- Quotes from a speech by a European politician that we want to classify into “Pro-EU”, “Anti-EU” or “Neutral”

- Company annual reports from which we want to extract the ultimate controlling party

- Political ads which we want to sift according to whether they mention immigration, directly or indirectly

In this session, participants will learn to write a custom Apps Script function in Google Sheets that will enable them to apply Large Language Models (LLMs) from OpenAI and Anthropic to their spreadsheet data.

By the end, attendees will be able to write a formula like `=LLM(A1, "gpt-4o", "Is this text about immigration?")`, then drag it down to apply it to hundreds of rows at once. This will enable us to apply the astonishing natural language capabilities of LLMs en masse to cells within our spreadsheet.

Attendees will acquire the following skills:

- Using Apps Script to write custom functions in Google Sheets

- Using LLMs via APIs

- Some basic LLM prompting techniques and tips

- Understanding when an LLM is likely to be reliable (when its output is based entirely on data within the spreadsheet) and when it is more likely to hallucinate (when its output draws on its own limited knowledge of the world)
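
The row-by-row idea can be sketched outside the spreadsheet too. The example below is written in Python rather than Apps Script and is only an illustration: `build_prompt`, `llm_column` and `fake_ask` are hypothetical names, and `fake_ask` is a keyword-matching stand-in for a real model call so the sketch runs without an API key. The prompt tells the model to answer only from the cell text, which is the situation described above as more reliable.

```python
from __future__ import annotations

from typing import Callable


def build_prompt(cell_text: str, question: str) -> str:
    """Ask the question about the cell text only, not about outside knowledge."""
    return (
        "Answer using only the text between the markers.\n"
        f"Question: {question}\n"
        f"<text>\n{cell_text}\n</text>"
    )


def llm_column(cells: list[str], question: str, ask: Callable[[str], str]) -> list[str]:
    """Apply the same question to every cell, calling the model once per distinct cell."""
    cache: dict[str, str] = {}
    results = []
    for cell in cells:
        if cell not in cache:
            cache[cell] = ask(build_prompt(cell, question))
        results.append(cache[cell])
    return results


def fake_ask(prompt: str) -> str:
    """Stand-in for a real LLM API call (assumption: replace with your provider)."""
    return "yes" if "border" in prompt.lower() else "no"


if __name__ == "__main__":
    rows = ["New border rules announced", "Budget vote delayed"]
    print(llm_column(rows, "Is this text about immigration?", fake_ask))
```

Caching identical cells keeps the number of paid API calls down when a column contains repeats.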
Z2.09

1:45pm CEST

Together at last: R and Python united in the Positron IDE
Saturday May 24, 2025 1:45pm - 3:00pm CEST
For years, data journalists have been forced to choose between learning R or Python in order to do data analysis with a scripted language. This meant the choice of IDE (integrated development environment – the app for writing and managing scripts and files) was always a defining decision.

R users mostly turned to RStudio to maintain R, run scripts, make plots etc. Python users have had a variety of options (Google Colab, Jupyter, Anaconda etc.) to manage their scripts and projects.

Now there’s a program built to handle both languages in parallel (but not quite simultaneously!): it's called Positron.

In this session we will introduce you to the Positron program. We will show you the interface, and how to get started with your usual coding language, before working through some scenarios where being able to move quickly from one language to the other is desirable. (And if you have examples of times when you’ve needed this facility, please bring them to this session.)

You will ideally have some experience of R or Python, and some appetite for using the other language, perhaps even on deadline. If you want to follow along in the session, install Positron beforehand from https://positron.posit.co/
Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Z2.10

3:30pm CEST

Protests, TikTok, and more 🎥: analyzing images and videos with AI
Saturday May 24, 2025 3:30pm - 4:45pm CEST
Learn how AI can sort through images and video to help you wrangle footage from protests and riots (NYT), analyze trends on TikTok (Washington Post), keep an eye on your local school board meetings (Hearst), measure the effects of congestion pricing (Bloomberg), and a hundred other tidbits for when the cameras might be rolling.

With a little Python and a dash of foundational knowledge, this session will cover downloading videos, building and evaluating transcripts, splitting scenes, categorizing images, and detecting/counting/tracking objects.

Participants will get the most out of this session if they have a working knowledge of Python. To follow along, you should have Jupyter installed on your computer or a Google account to use Google Colab. Additional materials and installation tips will be available at https://github.com/jsoma/dataharvest25-ai-images-video
Z2.08

3:30pm CEST

Start looking: finding patterns in data with your eyes 👀
Saturday May 24, 2025 3:30pm - 4:45pm CEST
You've just obtained a big dataset. Where do you begin? How do you find the story buried within the rows and columns?

In this session, you will learn how to quickly become familiar with your data by making a series of charts that will illustrate not just the contents of your data but unveil patterns that can help guide your reporting.

This class will be taught in R, so some familiarity is recommended, but the skills are common to all languages.
Z2.09

3:30pm CEST

💡 Streamlit for building tools and collaborating with non-coders
Saturday May 24, 2025 3:30pm - 4:45pm CEST
With Streamlit, you can set up a web page in just a few lines of Python code to share your findings with your team or your audience – or to collect information from them. Use it to swiftly try out an idea for publication before asking your IT department to develop it, or to let a colleague make use of a Python-scripted tool you've written. Or build yourself a chatbot to help navigate your own research, local and safe on your computer.
In this session, we’ll cover the basics of Streamlit and build a page where users can upload a PDF along with some information, send it to a Python function for processing, and display the results. More advanced users will learn how to build an LLM-powered chatbot.
Streamlit is a Python library, so you should have a basic understanding of Python. You also need to be the admin of your computer, or at least have permission to start a local web server on it. If you want to build a chatbot, you’ll need to install Ollama and download a model such as Gemma3 (ollama.com/library/gemma3) before the session starts.
Z2.10

5:15pm CEST

Advanced AI prompts for investigations
Saturday May 24, 2025 5:15pm - 6:30pm CEST
How can journalists use metaprompting to let AI build system instructions for assistants to empower investigations? Through examples of iterative metaprompting, Rune Ytreberg will show you how to master the art of efficient prompting, and use metaprompting to build AI assistants to unlock the black box that is AI.

After attending this session you will be able to use advanced prompting techniques (metaprompting) to have a large language model (LLM) write system instructions. These advanced prompts are detailed instructions that customize AI assistants supporting investigations. Among other places, they are used in OpenAI and Anthropic projects, and on RAG platforms like Kotaemon or AnythingLLM.

You need no special knowledge to attend this session, but if you would like to follow along on your computer you will need some basic knowledge of prompting tools such as ChatGPT, Claude or similar, and will need to have signed up for a paid version of your chosen AI platform before the session.
Z2.08

5:15pm CEST

AI cookbook 🥧: 6 recipes for the modern journalist
Saturday May 24, 2025 5:15pm - 6:30pm CEST
What if you could harness AI to automate repetitive tasks, extract meaningful insights from complex datasets, or even assist in storytelling? In this session, you’ll learn how to create practical, customizable workflows—“AI recipes”—designed to tackle real newsroom challenges.

Drawing inspiration from cutting-edge techniques in AI agent design, we’ll guide you through building tools that can annotate maps, analyze documents, and much more. Whether you’re a data journalist, editor, or simply curious about the potential of AI, this session will provide hands-on insights to integrate AI agents into your work.
Z2.09

5:15pm CEST

Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems
Saturday May 24, 2025 5:15pm - 6:30pm CEST
Scraped data can often be the backbone of an investigation, but some websites are more difficult to scrape than others. This session will cover best practices for dealing with tricky sites, including coping with captchas, IP blocks, and browser fingerprinting.

This is an advanced session aimed at people who already have experience of writing code to scrape websites and want to move up to the next level: participants will leave with an understanding of how to approach hard-to-scrape websites, plus the tradeoffs and costs of these approaches.
Z2.10
 