Towards Efficient and Accessible Geoparsing of Local Media: A Benchmark Dataset and LLM-based Approach

Date & Time

01/10/2025 1:15 pm – 2:45 pm

Simona Bisiani

Surrey Institute for People-Centred AI

Simona Bisiani is a Doctoral Researcher at the Surrey Institute for People-Centred Artificial Intelligence. Her PhD focuses on measuring spatial variations in news coverage in the UK, in order to understand the robustness of local media coverage across the country, how ownership consolidation affects media diversity and relevance, and how media diversity and relevance in turn affect democratic engagement. Her primary research methods are text mining through Natural Language Processing and statistical inference. She holds an MSc in Computational Social Science.

About the Event

Simona outlined her own path, a BA in journalism followed by training in computational social science, and spoke candidly about the learning curve that comes with picking up programming as a non-technical scholar. Her message was practical and encouraging: stick with the methods, pilot small, and keep the research question ahead of the tools. Her focus here was local journalism in the United Kingdom and how to turn location mentions inside articles into structured geographic evidence that others can reuse.

As part of her focus on making her approach accessible and reproducible, Simona has shared her slides, code demo and other documentation directly on GitHub.

5 Key highlights

Seeing place in local news
Simona framed geoparsing as turning location mentions inside articles into structured geographic evidence. The aim is to analyze where coverage happens, across outlets, owners, and regions, in a sector shaped by outlet closures, newsroom centralization, and ownership concentration. This provides a way to study news deserts and proximity with content-level data rather than outlet counts alone.
Walking a runnable pipeline
She demonstrated a three-stage workflow: recognition, candidate lookup, and resolution. For recognition, Simona benchmarked spaCy on a local-news sample against human annotations and reported strong performance. For candidate lookup, she queried open geographic data from OpenStreetMap and the UK Ordnance Survey. For resolution, she mapped candidates to administrative units and tested LLMs on the resulting classification task, running models locally for privacy and cost control. Code and notebooks were demonstrated live to make these steps reproducible.
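To make the shape of the three stages concrete, here is a minimal Python sketch. It is not the code shown in the seminar. Every component is a stand-in chosen for illustration: a substring match replaces the spaCy recogniser, a small in-memory dictionary (`GAZETTEER`, with approximate coordinates and illustrative district names) replaces the OpenStreetMap and Ordnance Survey lookups, and a simple "prefer the outlet's own district" rule replaces the LLM classifier. All function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    district: str  # administrative unit the candidate falls in (illustrative)
    lat: float     # approximate latitude
    lon: float     # approximate longitude


# Toy gazetteer standing in for OpenStreetMap / Ordnance Survey queries.
GAZETTEER = {
    "Newport": [
        Candidate("Newport", "Newport (Wales)", 51.58, -3.00),
        Candidate("Newport", "Isle of Wight", 50.70, -1.29),
    ],
    "Guildford": [Candidate("Guildford", "Guildford", 51.24, -0.57)],
}


def recognise(text):
    """Stage 1 (stand-in for an NER model): known place names found in the text."""
    return [name for name in GAZETTEER if name in text]


def lookup(mention):
    """Stage 2: every gazetteer candidate that shares the mention's name."""
    return GAZETTEER.get(mention, [])


def resolve(candidates, outlet_district):
    """Stage 3 (stand-in for the LLM classifier): prefer the candidate in the
    outlet's own district, otherwise fall back to the first candidate."""
    if not candidates:
        return None
    for candidate in candidates:
        if candidate.district == outlet_district:
            return candidate
    return candidates[0]


def geoparse(text, outlet_district):
    """Run the three stages and map each mention to its resolved candidate."""
    return {m: resolve(lookup(m), outlet_district) for m in recognise(text)}


result = geoparse("Council approves new ferry terminal near Newport", "Isle of Wight")
print(result["Newport"].district)
```

The ambiguous "Newport" is the interesting case: recognition and lookup alone cannot choose between the two candidates, which is why the resolution stage needs extra context such as where the outlet is based.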
Keeping judgment in the loop
Model and prompt choices were tested on small samples before scaling. Simona varied prompts, temperatures, and the amount of metadata supplied (minimal versus richer), then compared models programmatically. The results underlined practical guardrails: setting seeds for reproducibility, timing runs, and using simple automation where it is reliable, with manual checks to refine rules and handle edge cases.
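A programmatic comparison of this kind can be sketched as a small experiment harness. The Python below is an assumption-laden illustration, not the speaker's code: the grid values, the pilot sample, and the function names are all hypothetical, and `dummy_classify` is a placeholder that picks labels at random, so its accuracies say nothing about real models. What the sketch does show is the harness itself: enumerate every configuration, reseed before each one so runs are repeatable, and time each run.

```python
import itertools
import random
import time

# Hypothetical pilot sample of (mention, gold administrative unit) pairs.
SAMPLE = [
    ("Newport", "Isle of Wight"),
    ("Ryde", "Isle of Wight"),
    ("Guildford", "Guildford"),
]
LABELS = sorted({gold for _, gold in SAMPLE})

# Hypothetical experiment grid; model and prompt names are placeholders.
GRID = {
    "model": ["model-a", "model-b"],
    "prompt": ["short", "detailed"],
    "temperature": [0.0, 0.7],
    "metadata": ["minimal", "rich"],
}


def dummy_classify(mention, config, rng):
    """Placeholder for a call to a locally run LLM. It ignores the mention
    and the configuration and picks a label at random, purely so the
    harness has something to score."""
    return rng.choice(LABELS)


def run_grid(classify, sample, grid, seed=42):
    """Score every configuration in the grid on the pilot sample."""
    keys = list(grid)
    rows = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        rng = random.Random(seed)  # fresh seeded RNG per configuration
        start = time.perf_counter()
        correct = sum(classify(m, config, rng) == gold for m, gold in sample)
        rows.append({
            **config,
            "accuracy": correct / len(sample),
            "seconds": time.perf_counter() - start,
        })
    # Best configurations first; sorted() is stable, so ties keep grid order.
    return sorted(rows, key=lambda row: row["accuracy"], reverse=True)


rows = run_grid(dummy_classify, SAMPLE, GRID)
for row in rows[:3]:
    print(row)
```

Swapping `dummy_classify` for a function that actually queries a model leaves the rest of the harness unchanged, which keeps a small pilot cheap to rerun before scaling up.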
Evaluating, comparing, and scaling
Evaluation used two complementary views: accuracy on the classification task, and spatial accuracy at 161 km (the share of predictions falling within 161 km, roughly 100 miles, of the true location), a standard metric in geoparsing. Experiments showed that model choice drove performance most, lightweight metadata often helped more than lengthy context, prompt phrasing mattered, and temperature had little effect. She also tried simple ensembling across the best configurations to probe robustness, and outlined batching and monitoring practices for long runs.
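The two evaluation views and a simple ensemble can be written down in a few lines of Python. This is a sketch under stated assumptions rather than the evaluation code from the talk: distances use the haversine formula on a spherical Earth of radius 6371 km, the ensemble is a plain majority vote (the talk does not specify the ensembling rule), and the example labels and approximate coordinates are invented for illustration.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def label_accuracy(predicted, gold):
    """Share of items whose predicted label equals the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def accuracy_at_km(predicted_coords, gold_coords, threshold_km=161.0):
    """Share of predictions within threshold_km of the gold coordinates."""
    hits = sum(
        haversine_km(*p, *g) <= threshold_km
        for p, g in zip(predicted_coords, gold_coords)
    )
    return hits / len(gold_coords)


def majority_vote(predictions_per_model):
    """Combine per-model label lists item by item. On a tie, Counter keeps
    insertion order, so the label seen first among the tied ones wins."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


# Illustrative example with approximate coordinates (lat, lon).
gold_labels = ["Isle of Wight", "Guildford", "Newcastle upon Tyne"]
pred_labels = ["Newport (Wales)", "Guildford", "Newcastle-under-Lyme"]
gold_coords = [(50.70, -1.29), (51.24, -0.57), (54.98, -1.61)]
pred_coords = [(51.58, -3.00), (51.24, -0.57), (53.01, -2.23)]

print(label_accuracy(pred_labels, gold_labels))
print(accuracy_at_km(pred_coords, gold_coords))
print(majority_vote([["a", "b"], ["a", "c"], ["b", "c"]]))
```

In this toy example the two views disagree: only one of the three labels is correct, but the wrong Newport still lies within 161 km of the right one, so two of the three predictions count as spatially accurate. That gap is the reason for reporting both measures.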
Opening policy conversations
With resolved locations, coverage can be mapped, clustered, and compared across outlets and owners. This connects methods work to debates on news deserts, ownership, and local accountability, creating evidence that can be tracked over time and linked to further qualitative work.

Watch the recording
