You find yourself staring at a dataset with tens or hundreds of thousands of rows. Maybe you want to get up-to-date FOIA contact details for all government departments in your country, or to find out which political donors have links to the fossil fuels industry. What do you do?
Large Language Models (LLMs) can help journalists automate simple research and classification tasks that would take an unreasonably long time to do manually.
In this session, we'll outline how Global Witness has used LLMs, search engines and web scraping to help us identify fossil fuel lobbyists at COP29. These techniques can be applied to other investigations and research tasks.
The workshop will cover:
- An interactive classification demo
- Some basic tips on setting up a research/classification project
- The challenges of doing AI research at scale and how to address them
- Using more advanced tools
After attending this session, you will be able to take an existing dataset and automatically augment it with new data, opening up the potential for new stories and investigations.
If you want to follow along with the classification demo, you'll need to be able to run Jupyter Notebooks on your device or have a Google account. A basic understanding of Python would be useful, but we won't be writing any new code.
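To give a flavour of what the classification demo involves, here is a minimal sketch of how such a project can be structured in Python: a prompt template filled from one row of the dataset, and a parser that fails softly when the model returns malformed output. The function names, prompt wording, and category labels are illustrative assumptions, not the actual Global Witness pipeline.

```python
import json

# Illustrative prompt template -- the wording and JSON schema are assumptions.
PROMPT_TEMPLATE = """You are helping classify conference attendees.
Based on the affiliation below, answer with a JSON object like
{{"fossil_fuel_link": true or false, "reason": "..."}}.

Affiliation: {affiliation}"""


def build_prompt(affiliation: str) -> str:
    """Fill the template with one row of the dataset."""
    return PROMPT_TEMPLATE.format(affiliation=affiliation)


def parse_response(raw: str) -> dict:
    """Parse the model's JSON reply, failing softly on malformed output."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"fossil_fuel_link": None, "reason": "unparseable response"}


# In a real notebook you would send build_prompt(...) to an LLM API
# and pass the reply string to parse_response(...).
print(parse_response('{"fossil_fuel_link": true, "reason": "oil major"}'))
```

Keeping the prompt-building and response-parsing steps as plain functions like this makes it easy to test the pipeline on a handful of rows before paying to run it over the whole dataset.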
When you unlock the power of regular expressions (RegEx), you supercharge your spreadsheet!
Participants will learn how to extract hidden patterns from text, clean messy datasets, and automate repetitive tasks using the RegEx formulas in Google Sheets (this session is also a good introduction if you want to apply RegEx in other tools or programming languages).
Through practical examples, such as extracting donation amounts, cleaning salutation-heavy name lists, and pulling out postcodes, you will leave with the confidence to apply RegEx in your day-to-day data work.
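To show the kind of patterns the session covers, here are Python equivalents of three typical spreadsheet tasks, with the corresponding Google Sheets formulas (`REGEXEXTRACT`, `REGEXREPLACE`) in the comments. The example data and patterns are illustrative, not the workshop dataset.

```python
import re

# Extract a donation amount like "£1,500" from free text.
# Sheets equivalent: =REGEXEXTRACT(A2, "£[\d,]+")
donation = re.search(r"£[\d,]+", "Donated £1,500 to the campaign in May")
print(donation.group())  # £1,500

# Strip salutations from a messy name list.
# Sheets equivalent: =REGEXREPLACE(A2, "^(Mr|Mrs|Ms|Dr)\.?\s+", "")
clean = re.sub(r"^(Mr|Mrs|Ms|Dr)\.?\s+", "", "Dr. Jane Smith")
print(clean)  # Jane Smith

# Extract a UK-style postcode (a simplified pattern,
# not the full official postcode grammar).
postcode = re.search(r"[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}",
                     "Office: 1 Main St, SW1A 1AA")
print(postcode.group())  # SW1A 1AA
```

Note that Google Sheets uses the RE2 regex engine, so a few advanced Python features (such as lookbehind) are not available there, but everything shown above works in both.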
Attendees will receive a RegEx cheat sheet (customisable for their own use) and a practical demo spreadsheet to take their skills to the next level. No prior experience of RegEx is required, but you should be comfortable writing formulas in Google Sheets/Excel. I will share a Google Sheet containing the data, so you will need a Google account to access it.
Working with geospatial datasets is often critical to investigations that deal with location: where things are, where events occurred, who owns land, and how the environment is changing.
In this workshop, participants will learn the basics of using PostGIS to query and join geospatial datasets using the Arena+ "Europe in Grey" investigation as a case study.
We’ll run through a brief overview of PostGIS, the spatial data types, indexes, and functions it adds to Postgres, and a few of its strengths and weaknesses as a tool.
Participants will connect to an existing database containing a dataset of built expansion in Europe and a few datasets used in the Arena investigation. They’ll be guided through writing queries to reproduce some of the investigation’s results, answering questions like: Which European country has lost the greatest proportion of its wild areas since 2018? Which protected areas have had the most building on them?
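As a taste of the queries involved, here is a sketch of the kind of spatial join the workshop covers. The table and column names (`protected_areas`, `built_expansion`, `name`, `geom`) are illustrative assumptions, not the actual workshop schema; the spatial functions (`ST_Intersects`, `ST_Intersection`, `ST_Area`) are standard PostGIS.

```sql
-- Which protected areas have had the most building on them?
SELECT p.name,
       SUM(ST_Area(ST_Intersection(p.geom, b.geom))) AS built_area
FROM protected_areas AS p
JOIN built_expansion AS b
  ON ST_Intersects(p.geom, b.geom)  -- this join can use a spatial (GiST) index
GROUP BY p.name
ORDER BY built_area DESC
LIMIT 10;
```

The pattern of filtering with `ST_Intersects` (which is index-accelerated) before computing the exact overlap with `ST_Intersection` is one of the PostGIS strengths the overview touches on.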
To take part in this workshop, participants should feel comfortable writing basic SQL queries. Participants should bring their laptops with either DBeaver or their favourite SQL client tool installed.