This project is now complete! Many thanks to everyone who participated.
Hospitals in England and around the world are increasingly moving from paper-based to computer systems to organise their services and to record information about the patients that pass through their doors. Recording this data electronically makes it much easier to store and retrieve, and once anonymised, these electronic health records become a huge potential asset for medical research. Studies which have traditionally relied on employing people to manually collect the data necessary to answer their specific research questions may find that the data they need has already been collected in the normal course of patient care. By reusing this data, researchers can conduct much larger studies than they would have been able to in the past, and do so more quickly and at lower cost.
Some NHS providers, such as Oxford University Hospitals, have kept extensive records going back 20 years, including dates of admissions, conditions that were investigated or treated, and laboratory test results. This enables authorised researchers to study how different diseases have changed over time. However, the transition from data being typed into a computer by a health worker, to a data set that is suitable for a research study, is not nearly as straightforward or consistent as people might assume. Firstly, there are many steps along the way which are not visible to the end-users, where errors can potentially creep in. Secondly, hospital processes and machinery can and do change over time, which can lead to changes in the way data is recorded, even if there is no change in the actual patients coming in.
For example, the graph below shows the results for a certain type of blood test. You can see how the values before 1997 form two separate groups (an upper and a lower one, due to a mixture of measurement units being used), whereas after 1997 there are rarely any very low values. You can also see how in 2009 the average value suddenly drops from around 50 to around 30 (which was due to a change in the testing method used by the laboratory).
If a researcher using this type of data hasn’t inspected it carefully and taken these sorts of changes into account, the results can be flawed or simply incorrect. In many cases, there will not be any central record of these types of changes (which are sometimes referred to as "artefacts"), so researchers have the responsibility to try to identify them themselves, which can be challenging when there is a lot of data and when there are few tools and little guidance to help.
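To make the idea of an artefact concrete, here is a minimal sketch of one simple automated check: compare the median test value from one year to the next and flag large jumps. It is not the method used in this project, which relies on people spotting patterns by eye. The function names, the 25% threshold and the data are all made up for illustration; the values only mimic the kind of drop from around 50 to around 30 described above.

```python
from statistics import median


def yearly_medians(records):
    """Group (year, value) pairs by year and return {year: median value}."""
    by_year = {}
    for year, value in records:
        by_year.setdefault(year, []).append(value)
    return {year: median(values) for year, values in sorted(by_year.items())}


def flag_level_shifts(medians, rel_threshold=0.25):
    """Return (year, relative change) for years whose median differs from
    the previous year's median by more than rel_threshold (an assumed cut-off)."""
    years = sorted(medians)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        if medians[prev] == 0:
            continue  # avoid dividing by zero; such years need a manual look
        change = (medians[curr] - medians[prev]) / abs(medians[prev])
        if abs(change) > rel_threshold:
            flagged.append((curr, round(change, 2)))
    return flagged


# Synthetic (year, test result) pairs, invented purely for illustration.
records = [
    (2007, 49), (2007, 51), (2007, 50),
    (2008, 50), (2008, 52), (2008, 48),
    (2009, 30), (2009, 31), (2009, 29),
    (2010, 30), (2010, 32), (2010, 28),
]

print(flag_level_shifts(yearly_medians(records)))  # [(2009, -0.4)]
```

A check like this can only point to where something changed. It cannot say whether the cause was a change in the laboratory method or a real change in the patients being tested, so any flagged year still needs to be investigated by a person.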
In this project we will use human pattern-spotting skills to identify sudden, unexpected changes in data values that are likely to have been caused by these types of "infrastructure" changes, in order to do two things:
Lastly, we will create a tool that is freely available to other researchers so that they can use it to improve the quality of their own studies.
Better quality research will lead to better understanding of the causes and burden of disease, leading to better public health policy decisions, benefitting us all. Furthermore, these sorts of tools could potentially be used internally within hospitals to improve the accuracy of their routine management reporting, as well as to keep an eye on the data being entered during their day-to-day operations, so that if any unexpected changes or errors are found, they can be investigated and corrected quickly. Better quality data will lead to better local decision-making, whether for internal hospital management decisions and resource planning, or for individual patient care.
This project and its members are supported by funding from the Oxford Biomedical Research Centre and by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Healthcare Associated Infections and Antimicrobial Resistance at the University of Oxford in partnership with Public Health England (PHE) (NIHR200915).