Type: Data skills
Friday, May 23

1:15pm CEST

Beyond the pixels: the power of raster data in QGIS
Friday May 23, 2025 1:15pm - 2:30pm CEST
Manipulating and analyzing raster data can be intimidating, as it often appears more complex than vector data. However, raster data, such as satellite imagery or forest loss information, is essential for environmental and geographic storytelling. Among other things, it enables journalists to assess vegetation health, visualize floods or droughts, and calculate deforested areas, even when true-colour satellite imagery is obscured by clouds.

In this hands-on session, participants will learn the key functions in QGIS needed to work with raster data. This includes loading raster layers, managing projections, setting band combinations (such as false color) for analysis, styling raster layers to enhance visibility, and performing raster calculations.
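
NDVI (Normalized Difference Vegetation Index) is a standard raster calculation for assessing vegetation health from the near-infrared and red bands of satellite imagery. As a rough illustration of what such a calculation does, the Python sketch below computes NDVI = (NIR - Red) / (NIR + Red) cell by cell. In QGIS this band math would normally be done with the Raster Calculator; here the two bands are plain nested lists with made-up values, and the function name `ndvi` is our own choice for illustration.

```python
from typing import List, Optional

Grid = List[List[float]]


def ndvi(nir: Grid, red: Grid) -> List[List[Optional[float]]]:
    """Compute NDVI = (NIR - Red) / (NIR + Red) for every cell of two bands.

    Cells where NIR + Red == 0 have no defined NDVI and are returned as None,
    much like a NoData cell in a raster layer.
    """
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("bands must have the same dimensions")
    result: List[List[Optional[float]]] = []
    for nir_row, red_row in zip(nir, red):
        row: List[Optional[float]] = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append(None if total == 0 else (n - r) / total)
        result.append(row)
    return result


# Made-up pixel values for a tiny 2 x 2 raster (not real imagery).
nir_band = [[50, 40], [30, 0]]
red_band = [[10, 40], [10, 0]]
print(ndvi(nir_band, red_band))
```

NDVI ranges from -1 to +1: values near +1 generally indicate dense, healthy vegetation, while values near zero or below typically correspond to bare ground, built-up areas or water.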

To attend this session, participants should have basic QGIS skills.

Before the session, please install QGIS on your laptops and make sure it is working properly. Download from: https://www.qgis.org/en/site/forusers/download.html

If you encounter any issues during installation, this guide may help: https://www.qgis.org/resources/installation-guide/
Z2.08

Mapping independence: why and how newsrooms should host and style their own maps
Friday May 23, 2025 1:15pm - 2:30pm CEST
Tired of Google Maps' branding and tracking? In this session, we’ll explore how newsrooms can host their own map tiles using open-source tools such as Protomaps and OpenFreeMap. We’ll look at how to reduce costs, protect reader privacy, and gain full editorial control — alongside a hands-on workshop on customizing map styles with Maputnik. No prior knowledge is required for the workshop, though some experience working with maps will be a bonus, especially for the self-hosting part. The workshop takes place entirely in a browser. After this session, attendees will be familiar with alternatives to commercial map providers and know the foundations of map styling and self-hosting, empowering them to create maps that better serve their reporting and storytelling.
Z2.09

🤖 How LLMs can classify thousands of records in minutes
Friday May 23, 2025 1:15pm - 2:30pm CEST
You find yourself staring at a dataset with tens or hundreds of thousands of rows. Maybe you want to get up-to-date FOIA contact details for all government departments in your country, or to find out which political donors have links to the fossil fuels industry. What do you do?

Large Language Models (LLMs) can help journalists automate simple research and classification tasks that would take an unreasonably long time to do manually.

In this session, we'll outline how Global Witness has used LLMs, search engines and web scraping to help us identify fossil fuel lobbyists at COP29. These techniques can be applied to other investigations and research tasks.

The workshop will cover:

- An interactive classification demo
- Some basic tips on setting up a research/classification project
- The challenges of doing AI research at scale and how to address them
- Using more advanced tools

After attending this session, you will be able to take an existing dataset and automatically augment it with new data, opening up the potential for new stories and investigations.
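
The general shape of such a classification pipeline can be sketched in a few lines of Python. This is not Global Witness's code: `call_llm` is a hypothetical placeholder for whatever model API you use, and the label set, prompt wording and sample records are assumptions for illustration. The idea it shows is constraining the model to a fixed set of labels and flagging any answer outside that set for manual review, so each row of the original dataset gains a new column.

```python
from typing import Callable, Dict, List

# Assumed label set for illustration; a real project defines its own.
LABELS = ["fossil_fuel", "not_fossil_fuel", "unclear"]

PROMPT_TEMPLATE = (
    "Classify the organisation below. Answer with exactly one of: {labels}.\n"
    "Organisation: {name}\n"
    "Description: {description}\n"
    "Answer:"
)


def classify_records(
    records: List[Dict[str, str]],
    call_llm: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return copies of the records with an added 'label' field.

    `call_llm` is a hypothetical function that sends a prompt to a model and
    returns its text reply. Replies outside LABELS are marked 'needs_review'
    so a human checks them instead of trusting a malformed answer.
    """
    augmented = []
    for record in records:
        prompt = PROMPT_TEMPLATE.format(
            labels=", ".join(LABELS),
            name=record["name"],
            description=record.get("description", ""),
        )
        reply = call_llm(prompt).strip().lower()
        label = reply if reply in LABELS else "needs_review"
        augmented.append({**record, "label": label})
    return augmented


# A stand-in "model" so the example runs offline; replace with a real API call.
def fake_llm(prompt: str) -> str:
    return "fossil_fuel" if "oil" in prompt.lower() else "Not sure"


rows = [
    {"name": "Example Oil Ltd", "description": "Oil exploration"},
    {"name": "Example Bakery", "description": "Bread and pastries"},
]
print(classify_records(rows, fake_llm))
```

Keeping the model call behind a single function makes it easy to swap providers, add retries, or cache replies when running over tens of thousands of rows.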

If you want to follow along with the classification demo, you'll need to be able to run Jupyter Notebooks on your device or have a Google account. A basic understanding of Python would be useful, but we won't be writing any new code.
Z2.10

3:00pm CEST

AI in code editors: save time when writing code
Friday May 23, 2025 3:00pm - 4:15pm CEST
In this session we will explore AI-assisted code editors and go over the obscure but necessary features that let you write web scrapers easily. We'll walk through examples of how to approach unfamiliar pages and web technologies, and show how these tools are already being used to speed up development substantially.

To get the most out of this session, you should have basic knowledge of web scraping in Python.
Z2.08

Browser puppetry: Playwright for dynamic website scraping
Friday May 23, 2025 3:00pm - 4:15pm CEST
Playwright is a next-generation browser automation tool that allows you to use Python or JavaScript to scrape almost any web page. It can assist in downloading pages of government documents, capturing tweets before they get deleted, or simply breaking past the cookie consent banner. Beyond the basics, it can also easily take screenshots, monitor and log network requests, and even fit right into your traditional BeautifulSoup scraping approach.

We'll look at:

- Installing Playwright
- Accessing elements on the page
- Interacting with web pages (clicking, navigating, filling out forms)
- Taking screenshots
- Sending pages to traditional scraping tools like BeautifulSoup
- Common patterns including pagination and CAPTCHA breaking

For those of you familiar with tackling similar problems using Selenium: Playwright is a similar tool with a better interface, better install/upgrade process, and ten times the usability. It might be time to upgrade!

Participants should have a basic knowledge of Python and HTML, but we'll also cover how to breeze past those basics with AI assistance. To fully participate, participants should have Jupyter installed. Additional software and installation tips will be available at https://github.com/jsoma/dataharvest25-playwright-scraping
Z2.09

Cracking the code: how to use RegEx in your investigations
Friday May 23, 2025 3:00pm - 4:15pm CEST
When you unlock the power of regular expressions (RegEx), you supercharge your spreadsheet!

Participants will learn how to extract hidden patterns from text, clean messy datasets, and automate repetitive tasks using RegEx formulas in Google Sheets (the session is also a good introduction if you want to apply RegEx in code).

Through practical examples—like extracting donations, cleaning salutation-heavy lists, and extracting postcodes—you will leave with the confidence to apply RegEx in your day-to-day data work.
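
The same patterns carry over to code. The sketch below uses Python's built-in `re` module to extract a four-digit postcode and a euro amount and to strip leading salutations from names. The sample strings, the function names and the patterns themselves are simplified assumptions for illustration, not the session's materials. In Google Sheets the comparable functions are `REGEXEXTRACT`, `REGEXREPLACE` and `REGEXMATCH`, which use Google's RE2 syntax, so a few Python features (such as lookbehind) are not available there.

```python
import re
from typing import Optional

# Simplified patterns, assumed for illustration; adapt them to your own data.
POSTCODE = re.compile(r"\b\d{4}\b")  # four-digit postcodes, as used in Belgium
AMOUNT = re.compile(r"€\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")  # e.g. €1,500.00 or €1500
SALUTATION = re.compile(r"^(?:mr|mrs|ms|dr|prof)\.?\s+", re.IGNORECASE)


def extract_postcode(text: str) -> Optional[str]:
    """Return the first standalone four-digit number in the text, or None."""
    match = POSTCODE.search(text)
    return match.group(0) if match else None


def extract_amount(text: str) -> Optional[float]:
    """Return the first euro amount as a float, or None if there is none."""
    match = AMOUNT.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def strip_salutation(name: str) -> str:
    """Remove a leading title such as 'Dr.' or 'Mr' from a name."""
    return SALUTATION.sub("", name.strip())


print(extract_postcode("Example Street 12, 2800 Mechelen"))  # 2800
print(extract_amount("Donation of €1,500.00 from J. Doe"))  # 1500.0
names = ["Dr. Jane Doe", "mr John Smith", "Anna Peeters"]
print([strip_salutation(n) for n in names])  # ['Jane Doe', 'John Smith', 'Anna Peeters']
```

The postcode pattern will also match any other standalone four-digit number, such as a year, so real-world patterns usually need extra context (for example, the town name that follows).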

Attendees will receive a RegEx cheat sheet (customisable for their own use) and a practical demo spreadsheet to take their skills to the next level. No prior experience of RegEx is required, but you should be comfortable writing formulas in Google Sheets/Excel. I will be sharing a Google Sheet containing the data; you will need a Google account to access it.
Z2.10

4:45pm CEST

Data Magic made simple: three ways to crunch numbers in spreadsheets
Friday May 23, 2025 4:45pm - 6:00pm CEST
We know that thousands of lines in a dataset can be intimidating, especially if you’re not a programmer. Spreadsheets can do the heavy lifting — and mastering them is easier than you expect!

In this session, we will walk you through three different ways to dive into data using nothing but spreadsheet tools. Along the way, we’ll show you how to cross-check your calculations, ensuring your findings are accurate and reliable. Whether you’re a complete beginner or have already used spreadsheets in your work, you’ll leave with practical skills to handle data confidently without ever touching a line of code. Bring your laptop and join us to discover how easy and powerful data analysis can be!
Z2.08

Finding connections: transform your document collections into a graph visualisation
Friday May 23, 2025 4:45pm - 6:00pm CEST
A high-level overview of the GraphRAG ecosystem. We aim to show:
- What GraphRAG is, and how it works
- How to prepare your documents
- How to build your own graph
- How to interface with your graph using Python

These techniques can be used to gain a visual overview of the contents of document sets, and to find information in the documents without having to rely on keywords. Put simply, they enable you to make sense of large numbers of documents without having to read them all.

In the session you'll be making a graph from arbitrary documents, visualizing it, and using it to answer questions.

It will help if you've heard of RAG before, but this is not a prerequisite. To follow along, all you need is a browser and an e-mail address.
Z2.09

Investigating built expansion on protected natural areas 🌳: learn the basics of PostGIS
Friday May 23, 2025 4:45pm - 6:00pm CEST
Working with geospatial datasets is often critical to investigations dealing with where things are located, where an event occurred, land ownership, and the environment.

In this workshop, participants will learn the basics of using PostGIS to query and join geospatial datasets using the Arena+ "Europe in Grey" investigation as a case study.

We’ll run through a brief overview of PostGIS, the spatial data types, indexes, and functions it adds to Postgres, and a few of its strengths and weaknesses as a tool.

Participants will connect to an existing database containing a dataset of built expansion in Europe and a few datasets used in the Arena investigation. They’ll be guided through writing queries to reproduce some of the investigation’s results, answering questions like:
- Which European country has lost the greatest proportion of its wild areas since 2018?
- Which protected areas have had the most building on them?

To take part in this workshop, participants should feel comfortable writing basic SQL queries.
Participants should bring their laptops with either DBeaver or their favorite SQL client tool installed.
Z2.10