FAQ

Our project takes an iterative approach to research, with different types of tasks linked to research in progress at different points of the project. Our current task is about accidents involving machinery.

In search of industrial accidents - current tasks

Our first crowdsourcing task for the project asked people to look for industrial accidents in newspaper articles. Our rule of thumb for that task was that machines are devices or equipment not powered by people or animals (e.g. not bicycles, horse-and-cart vehicles or manual equipment). This led to questions about the differences between our current understanding of machines compared to the 19th century understanding, which in part inspired our recent 'what was a machine?' tasks.

In search of 'machines' - recent tasks

We're currently working on linguistic research questions. These questions have two tasks - 'What's that machine? Describe it!' and 'What's that machine? Classify it!'. They will help us understand how people living through the Industrial Revolution talked about machines in newspaper articles.

'Describe it' will help create a 'lexicon' (a list of types of machines), and 'Classify it' will help us understand which of several meanings for 'machine' was relevant.

What is a 'machine'?

We've debated this a lot within the project team - and now you can help us define machines in the language of the time. Our tasks ask contributors to help create a list of 'machines' from contemporary accounts, and to classify different senses for the word 'machine'. We're interested in anything that was described as a machine, no matter how it was powered.

What if more than one machine is mentioned in an article?

We've limited examples of machines to one per article to save people from the tedium of doing multiple annotations about the same 'machine'. This means you might see unhighlighted 'machines' on an image.

Please only answer questions for the highlighted 'machine' phrase. This means we will know which specific instance of the word is relevant when we come to do our analysis.

We'll have multiple sets of images, and over time we should cover most types of machines, but if you see something particularly interesting you could tag it in Talk to bring it to our attention.

Should I use the text around the phrase to understand the 'machine'?

Yes please! Often vital information about a machine is given in earlier or later lines of text. These tasks rely on people's ability to find relevant context in those longer passages of text.

How much detail should I include when describing the 'machine'?

Type in the entire phrase relating to the machine. We're looking for the full compound noun - in other words, please include any additional detail that's included in the description of the machine.
For example, type in 'silent sewing machine' rather than 'sewing machine'; 'self-raking reaping machine' rather than 'reaping machine'; 'portable thrashing machine' rather than 'thrashing machine'.

Use words as close as possible to the most specific description of the 'machine' in the article. It's easier for us to process extra text than it is to recreate missing detail.

Brand or manufacturer names are usually not relevant, unless the inventor or manufacturer is part of the name. For example, if the text said 'Hoover' rather than 'Hoover vacuuming machine', 'Hoover' might stand for 'vacuum'. If the text said 'Bob's vacuuming machine', 'Bob's' doesn't stand for 'vacuum'. Sorry, Bob.

If in doubt, include more text, rather than less. If you're not sure whether a detail is relevant, you can use the 'unclear' button. Click the 'unclear' button to add '[unclear]' tags to the text box, then type the text you're not sure about between the 'unclear' tags. Using the reaping machine example above, you might end up with: '[unclear]self-raking[/unclear] reaping machine'.
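If you later work with the shared annotations yourself, answers containing 'unclear' tags are simple to process. The sketch below is illustrative only and is not the project's own code: the function name `split_unclear` is hypothetical, and it assumes tags are typed exactly as '[unclear]...[/unclear]' and are never nested.

```python
import re

# Matches one '[unclear]...[/unclear]' pair; assumes tags are never nested.
UNCLEAR = re.compile(r"\[unclear\](.*?)\[/unclear\]")


def split_unclear(answer):
    """Return (plain_text, uncertain_parts) for one typed answer."""
    # Collect the words the contributor was unsure about.
    uncertain = [part.strip() for part in UNCLEAR.findall(answer)]
    # Keep those words in the plain text, dropping only the tags themselves.
    plain = UNCLEAR.sub(lambda match: match.group(1), answer)
    # Collapse any doubled spaces left behind.
    plain = " ".join(plain.split())
    return plain, uncertain


print(split_unclear("[unclear]self-raking[/unclear] reaping machine"))
# ('self-raking reaping machine', ['self-raking'])
```

Keeping the uncertain words in the plain text follows the 'include more text, rather than less' advice: the detail is preserved, and the list of uncertain parts records where a second look might be needed.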

Why are 'machines' hard to define?

'Machine' is an especially tricky word. Its main descriptive meaning - as in, a mechanism or contrivance - has remained constant over time, but the word has also absorbed lots of figurative and metaphorical meanings. We want to find examples of different uses of the word 'machine' over the length and breadth of the newspapers available to us.

Where did the definitions of 'machine' used in the classification task come from?

Our historians and linguists selected them from a longer list of definitions for 'machine' in the Oxford English Dictionary (OED).

Why can't you automatically exclude obvious matches for different types of machines?

We're working on it! We didn't want to make assumptions about what we'd find in the data, so our first queries were simply for 'machine' or 'this machine'. As we get results from these tasks and build up our lexicon, we can review those queries to exclude known types of machines and focus on more unusual or complex language about machines.
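As a rough illustration of how a growing lexicon could narrow a query, the sketch below flags text where 'machine' appears without a known type directly before it. This is not the project's actual query: the `KNOWN_TYPES` entries and both function names are hypothetical, and looking only at the single word before 'machine' is a simplifying assumption.

```python
import re

# Hypothetical lexicon entries, as might be gathered from 'Describe it' answers.
KNOWN_TYPES = {"sewing", "reaping", "thrashing"}


def modifiers(text):
    """Return the word directly before each 'machine' or 'machines' in the text."""
    # Splitting on non-letters means 'sewing-machine' is treated like 'sewing machine'.
    words = re.findall(r"[a-z]+", text.lower())
    return [
        words[i - 1] if i > 0 else ""
        for i, word in enumerate(words)
        if word in ("machine", "machines")
    ]


def is_unusual(text, known=KNOWN_TYPES):
    """True if at least one 'machine' is not preceded by a known type."""
    return any(modifier not in known for modifier in modifiers(text))


print(is_unusual("A silent sewing machine for sale"))    # False
print(is_unusual("the workings of the political machine"))  # True
```

A filter like this would need the careful review described elsewhere in this FAQ, since excluding known types changes which articles are sampled.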

Should people mention machines not named as such?

We're looking into computational methods for detecting machines in articles that talk about them without using the word 'machine', using data gathered through tasks like this.

Can people classify 'mechanical apparatus' more precisely?

We know that some of you would like the option to apply additional classifications for common types of machines, especially those just classified as 'mechanical apparatus'. That's understandable, particularly if you're more of a historian than a linguist! If natural categories for analysis emerge from our research, we may look at sub-categories for specific types of machines in future.

For now, our linguists are interested in classifications based on their selected definitions from the Oxford English Dictionary. Their work on 'word-sense disambiguation' means they are most interested in analysing the meaning of the word rather than the specific objects being referred to in the text.

Why are there so many sewing machines?

We don't (yet) know! It turns out there are a lot of ads for sewing machines in our datasets because ... there were a lot of ads for sewing machines. According to The Advertising Age Encyclopedia of Advertising (eds John McDonough and Karen Egolf, London: Fitzroy Dearborn Publishers, 2002, page 754), sewing machines and typewriters were 'the most heavily advertised machines of the time' (the late 1860s).

In general, we're aiming to get a more intuitively realistic range of 'machines' in our queries that select articles from the overall corpus. This means carefully reviewing our sampling strategy to be sure that we understand the consequences of tweaking the query to exclude specific types of machines.

Why do you ask if the 'machine' is a metaphor, a place or an occupation?

Articles identified as containing metaphorical uses of 'machine', or references to trades, industries, occupations or places could be analysed in more detail in later stages of our project.

Why have I seen the same article more than once?

There are a few possibilities. Sometimes we use sets of images in more than one task. Sometimes articles look identical but are regularly repeated advertisements, or have been copied from one newspaper to another, and both versions were picked up during our data querying process. (You can see the title and date of the newspaper via the 'i' icon under an image.) And sometimes, unfortunately, articles were picked up more than once in our queries, leading to exact duplicates appearing across different datasets. Any tasks done on these duplicate images won't go to waste.

We're tweaking our 'article querying and image processing' workflow so that duplicates will be removed when we review the results and update processes after these tasks are complete. We may also be able to exclude advertisements from future queries, if they were labelled as such in the metadata created during digitisation.
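To show the difference between exact duplicates and reprinted articles, here is a minimal sketch. It is not the project's workflow: the record keys ('title', 'date', 'text') and the function names are hypothetical, and it assumes the transcribed text of each article is available for comparison.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case the text and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def drop_exact_duplicates(records):
    """Keep the first copy of each (title, date, text) combination."""
    seen = set()
    kept = []
    for record in records:
        key = (record["title"], record["date"], normalise(record["text"]))
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def group_repeats(records):
    """Group records that share the same text, whatever newspaper or date they come from."""
    groups = defaultdict(list)
    for record in records:
        groups[normalise(record["text"])].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Here `drop_exact_duplicates` would remove an article picked up twice by the same query, while `group_repeats` would only group repeated advertisements or copied articles, leaving it to a person to decide what to do with them.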

General questions

Why do you need our help - can't computers do this work?

Not yet! We're applying sophisticated 'natural language processing' techniques to analyse newspaper articles, but there's still no beating our human ability to classify complex texts.

The datasets created through these tasks help us iterate and improve the queries we run to find candidate articles. Your classifications also help us identify articles that are too hard to read, and those that aren't relevant to our research tasks. These 'negative result' datasets could also be used to improve article selection processes.

What if I see something interesting that doesn't fit into the tasks?

You can add a note in 'Talk' by using the 'Done & Talk' option. If you're already signed in to Zooniverse, you can also 'favourite' or 'collect' the image to share later.

We encourage you to use hashtags to collect articles that interest you. 'Using Tag Groups to Collect Images on Talk' has more information on how other projects use hashtags.

Why do you sometimes ask if images are hard to read?

Text may be unreadable if it is too small (because the overall page is too big), if the image is too light or too dark, if the background is unclear, or if the page is ripped or folded.

It can be useful for us to know which records are hard to read as it helps create a potential training dataset for machine learning processes to identify 'articles' that have been incorrectly marked up in the original digitisation. Similarly, some tasks ask whether there's more than one article in the image, which could help improve future article segmentation.

Where do these records come from?

We've collaborated with FindMyPast to access 10,000 records from the British Newspaper Archive. Subsequent images will be drawn from the newspaper collection of the British Library, via earlier digitisation projects and digitisation commissioned by the project.

What kinds of feedback do you want?

We're keen to make the tasks as enjoyable as possible. We know that Zooniverse participants have lots of expertise built on their experience with different projects. Does the task make sense? What works well - and what doesn't? Is there anything that puts you off participating?

Will the input from volunteers be in the public domain?

We think the data you help create should be as re-usable as possible. We'll share the datasets of annotations and classifications on the British Library's Research Repository. If you use them in your own research, we'd love to hear about it!