Thanks a million (actually five and a half million)!
The past eleven months have been wildly exciting! We never dreamed when we started out that we’d get such huge support for our project from the community of Zooniverse volunteers – 5.5 million classifications in all! At times it’s been a struggle processing results fast enough to maintain a supply of new snippets for classification. Once the final workflows are finished, we need to do some final processing before supplying data to the Office for National Statistics for release later in the year. We’ll do our best to publicise the release when we have further details.
Thanks again for all your support.
The 1961 UK Census is a uniquely valuable record of our society at a time of great social change in the UK. However, much of the 1961 Census data remain in microfilm format, which makes them inaccessible and unusable for modern analyses.
This project aims to breathe new life into the data by retrieving and processing content from the Office for National Statistics 1961 Census Image Library. We want to make these data publicly available as a digital dataset that people can use in their research.
Individual census records are confidential for 100 years, so we are not dealing with any census data that can identify an individual. We are dealing with data that have been aggregated to an area, for example the number of divorced males aged 30-34 living in Battersea.
We are processing the 1961 data for England and Wales. These data have been digitised for different geographical levels, such as Districts, Wards and Enumeration Districts.
The image below shows a 1961 Census printout. It contains several tables which give information about a particular area, such as the number of households without hot water, or the number of people born in Cyprus.
We produced templates for all of these tables with regions for each cell. These allow us to “cookie cut” the values (persons, households, dwellings, rooms, establishments, etc.) into a data file.
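The "cookie cutting" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the page is modelled as a plain grid of pixel values, and the names `CellRegion`, `TEMPLATE` and `cut_cells`, the variable IDs and the coordinates are all invented for the example.

```python
from typing import Dict, List, NamedTuple


class CellRegion(NamedTuple):
    """Rectangle within a page image, in pixel coordinates."""
    top: int
    left: int
    height: int
    width: int


# Hypothetical template: variable ID -> region of the page holding that cell.
# The IDs and coordinates are invented for illustration.
TEMPLATE: Dict[str, CellRegion] = {
    "SH13_0001": CellRegion(top=0, left=0, height=2, width=3),
    "SH13_0002": CellRegion(top=0, left=3, height=2, width=3),
}

Image = List[List[int]]  # rows of grey-level pixels


def cut_cells(page: Image, template: Dict[str, CellRegion]) -> Dict[str, Image]:
    """Crop one snippet per template cell out of a page image."""
    snippets = {}
    for variable_id, r in template.items():
        rows = page[r.top:r.top + r.height]
        snippets[variable_id] = [row[r.left:r.left + r.width] for row in rows]
    return snippets


# A tiny 2x6 "page": the left half is dark (0), the right half is light (9).
page = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
snippets = cut_cells(page, TEMPLATE)
```

Each snippet stays keyed by its variable ID, so whatever value is later read from it (by OCR or by a person) can be written to the right place in the data file.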
As part of the template we assigned each variable a unique ID. Here you can see the template for one table (SH13 - Age and Marital Condition by Five Year Age Groups):
Using cutting-edge OCR (Optical Character Recognition) techniques we have been able to extract around 97% of the values, but we need help with the remaining 3%: values that the OCR either hasn’t recognised, or that we know contain errors because of discrepancies. This is why many of the values people check look fairly easy to interpret. This is where the Zooniverse site comes in. It’s a simple task: we are asking people to type what they see in the box (or what is most closely associated with the box):
You can see examples and further information in the tutorial, which can be found in the Classify section.
We are also undertaking work to compare values and identify inconsistencies:
This work helps us to target additional QA (Quality Assurance) work effectively, so we are only asking people to check what really needs to be checked.
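One plausible kind of consistency check is sketched below. It assumes, purely for illustration, that some cells are totals that should equal the sum of other cells (here, persons = males + females). The cell names, the rule and the numbers are hypothetical, and the project's real checks may work differently.

```python
from typing import Dict, List, Optional, Set

# Hypothetical OCR output for one area: variable ID -> value,
# with None where the OCR could not read the cell.
ocr_values: Dict[str, Optional[int]] = {
    "males": 230,
    "females": 260,
    "persons": 480,      # 230 + 260 != 480, so one of these three is wrong
    "households": None,  # unreadable cell
    "dwellings": 150,
}

# Assumed consistency rules: each total should equal the sum of its parts.
sum_rules: Dict[str, List[str]] = {
    "persons": ["males", "females"],
}


def cells_to_check(values: Dict[str, Optional[int]],
                   rules: Dict[str, List[str]]) -> Set[str]:
    """Return the IDs of cells worth sending for manual checking."""
    # Cells the OCR failed to read always need checking.
    flagged = {vid for vid, v in values.items() if v is None}
    for total_id, part_ids in rules.items():
        involved = [total_id] + part_ids
        if any(values[vid] is None for vid in involved):
            continue  # cannot test the rule until the gaps are filled
        if values[total_id] != sum(values[vid] for vid in part_ids):
            # The rule failed, but we cannot tell which cell is wrong,
            # so flag every cell that takes part in it.
            flagged.update(involved)
    return flagged


to_check = cells_to_check(ocr_values, sum_rules)
```

In this sketch only the unreadable cell and the three cells involved in the failed sum are flagged; `dwellings` is left alone. A failed sum flags cells that may look perfectly legible, because the check cannot say which of them was misread.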
The 1961 tables that we’re processing include the following information:
In case you wondered, the 1961 data was processed using an IBM 705 computer owned by the Royal Army Pay Corps. The original machine still exists and is on display at IBM UK’s headquarters at Hursley Park, near Winchester.
We are a team based at the Pattern Recognition and Image Analysis (PRImA) research lab, University of Salford. We have received funding from the Office for National Statistics. We are all interested in data, and keen to make the 1961 Census aggregate data open to all. Take a look at the PRImA 1961 Census project page.
You’ll be helping to bring the 1961 data back to life, so that anyone can see the results easily. We will make the resulting dataset open access.