The Nature SPAM Filter project is now complete! Thank you for all your help. Stay in touch to learn about the results.

Research

Creating a Tool for Accurate Filtering of Online Media for Wildlife Mentions

The idea behind the project

Thanks to the internet, we can now gather information from around the world in real time. Many researchers use online news articles to analyse how people perceive wildlife. However, keyword searches can lead to mistakes: an article titled "Toronto Blue Jays Score Season High" would be placed in the wildlife category even though it clearly refers to sports. We will compare human responses with the outputs of machine learning models to develop a tool that can quickly identify content about wildlife. This will help conservation scientists use media more accurately to understand what motivates people to care about wildlife and act to protect it.
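To illustrate the keyword pitfall described above, here is a minimal Python sketch. The keyword list and the second headline are made up for illustration; they are not the project's actual search terms or data.

```python
import re

# Hypothetical keyword list; the project's real search terms are not stated.
WILDLIFE_KEYWORDS = {"jay", "jays", "koala", "tiger", "eagle"}


def keyword_match(title: str) -> bool:
    """Return True if any keyword appears as a whole word in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return not words.isdisjoint(WILDLIFE_KEYWORDS)


headlines = [
    "Toronto Blue Jays Score Season High",   # sports headline from the text
    "Injured koala rescued after bushfire",  # made-up wildlife headline
]

# Both headlines are flagged, although only the second is about wildlife.
flags = [keyword_match(h) for h in headlines]
for headline, flagged in zip(headlines, flags):
    print(flagged, headline)
```

A keyword filter cannot tell the baseball team from the bird, which is why the project compares models that read the whole title against human judgement.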


Image: David Tree shares his water with an injured koala. Source: The Guardian. Photograph: Mark Pardew/AP.

Our aim

We plan to analyse how effectively existing machine learning models (for example, a zero-shot classifier or a pre-trained large language model) can pick out genuine mentions of wild animals in newspaper article titles. Ground-truth data generated by you will allow us to test different approaches and find the one that distinguishes titles about wildlife as accurately as possible.

How You Can Help

Help us create a data set to evaluate the accuracy of various language models. By sorting through article titles and choosing those related to wildlife, you will provide us with a crucial piece of information: the ground-truth data. This means that your answers will be used as the reference point for evaluating machine learning models. By comparing human classifications with those from the models, we can draw conclusions about the effectiveness of these AI tools. The more accurate the tool, the easier it will be to see how people engage with wildlife and how those sentiments change over time.
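As a rough sketch of what comparing a model against ground truth can look like, the Python snippet below computes accuracy, precision and recall for one model. The labels are invented for illustration, and the choice of these three metrics is an assumption; the project does not state which measures it will report.

```python
def evaluate(ground_truth, predictions):
    """Compare model predictions with volunteer labels (True = about wildlife)."""
    if len(ground_truth) != len(predictions):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(ground_truth, predictions))
    tp = sum(1 for truth, pred in pairs if truth and pred)          # correct "wildlife"
    fp = sum(1 for truth, pred in pairs if not truth and pred)      # false alarm
    fn = sum(1 for truth, pred in pairs if truth and not pred)      # missed wildlife title
    tn = sum(1 for truth, pred in pairs if not truth and not pred)  # correct "not wildlife"
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Invented example labels, one entry per article title.
volunteer_labels = [True, False, True, False, True]
model_labels = [True, True, True, False, False]

scores = evaluate(volunteer_labels, model_labels)
print(scores)
```

Precision shows how many of the titles a model flags are really about wildlife, while recall shows how many genuine wildlife titles it finds.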

By volunteering your time and effort, you’ll play a vital part in bringing this vision to life. We’re excited to work with you to build a tool that can make a real, positive impact on wildlife conservation efforts worldwide. Together, we can create something that will benefit researchers, conservationists, and, most importantly, the wildlife we all care about.


Image: These are the sort of articles we are interested in. Source: Wildlife Conservation Trust.

Why This Project Matters

Understanding how people feel about wildlife can make a big difference in conservation efforts. When people care about animals, they’re more likely to take steps to protect them. But figuring out how people feel isn’t always easy. Online media is a great source of this information, and accurate ML tools can help tap into it.

What We Hope to Achieve

With your help, we aim to develop a powerful tool that will give us a clearer understanding of how people talk about wildlife through online media. Our goal is to share the insights we gain with other researchers and conservationists. We plan to publish our findings in an open-access research paper, showing how this tool can improve projects that rely on media data. By filtering out unrelated content, researchers will be able to get clearer, more insightful results about how people view wildlife and what drives them to care.

Data sources

We're using three different data sources for this project: ABC News (Australia), the Times of India, and a Mixed Sources dataset combining titles from many different news sources (AG's corpus of news articles). You're welcome to choose which one you want to work with -- just pick a workflow you like!

Example of data analysis

Table: An example of how the data you generate will be used. The table lists how confident each of our chosen models is when answering the question about a newspaper title, alongside the volunteer replies, which are treated as 100% confident because they are the ground truth. In this example, of the four models tested, the pre-trained LLM (model 3) performed best, correctly identifying that the headline is about wildlife with 95% confidence. By comparing human answers with those of the different models, we will be able to draw conclusions and decide which model is best for this task and makes the fewest mistakes.
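One possible way to turn such a table into a model ranking is sketched below in Python. All confidence values, the model names and the 0.5 decision threshold are illustrative assumptions, not the project's results or its chosen method.

```python
# Volunteer answers for three headlines (True = about wildlife). Invented data.
ground_truth = [True, False, True]

# Each model's confidence that a headline is about wildlife, in the same order.
model_confidences = {
    "model 1": [0.55, 0.60, 0.40],
    "model 2": [0.70, 0.30, 0.45],
    "model 3": [0.95, 0.10, 0.80],
    "model 4": [0.20, 0.40, 0.90],
}


def count_mistakes(confidences, labels, threshold=0.5):
    """Count headlines where the thresholded answer disagrees with volunteers."""
    return sum((c >= threshold) != label for c, label in zip(confidences, labels))


mistakes = {
    name: count_mistakes(confidences, ground_truth)
    for name, confidences in model_confidences.items()
}
best = min(mistakes, key=mistakes.get)  # model with the fewest mistakes

print(mistakes)
print("Best model:", best)
```

With real data, the same comparison would run over every volunteer-labelled title rather than three invented ones.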