Hello Genome Detectives. Thank you again for your incredible efforts. We have posted an update on the project results tab. We are taking a break while we develop the next iteration of Genome Detectives, but we have left a data set here for practice only. PLEASE NOTE THAT THESE DATA ARE ALREADY COMPLETE. If you are ready for more of a challenge, please check out our new 'Training Academy' website. To browse other active projects that still need your classifications, check out Zooniverse projects.
In this project you will help us to decode genes found in bacteria that cause serious infections and disease in humans. The genes that you will classify are important for the growth and survival of bacteria or for the development of antibiotic resistance, or they are the focus of vaccine development. Your work will help us to identify genetic types that result in differences among bacteria, for example making them more likely to cause disease or more resistant to a drug or antibiotic. The genomic data you will investigate are part of the PubMLST database, based at the University of Oxford.
Infectious diseases remain a threat to life and to our day-to-day activities, as the coronavirus pandemic that began in 2019 has shown. Controlling infectious diseases, monitoring their spread, understanding their biology, and finding ways to prevent and treat them are priorities for the scientists and doctors who study them. In recent years, global efforts to reduce the burden of infectious diseases on humans have turned to studying the DNA of living organisms. DNA encodes the genes of all life on earth. Investigating the variation in DNA among disease-causing organisms of the same species can tell us how likely they are to cause disease, and about the characteristics and severity of that disease.
To study the changes among bacterial genes, which we call genetic variation, the PubMLST database was set up in 1998. This open-access, publicly available resource also includes analysis tools that help scientists and doctors to explore bacterial genes and genomes. The database includes more than 100 different species of bacteria, and the largest species databases hold tens of thousands of genomes. These genomic data are stored, organised and catalogued mostly by automated computer programs, but some aspects of the work require the human eye. Scientists, public health specialists, and doctors around the world rely on the PubMLST database to monitor and research diseases that impact human health, such as septicaemia, meningitis, food poisoning, and sexually transmitted infections. These are caused by bacteria including Neisseria meningitidis, Streptococcus pneumoniae, Group B streptococcus, Campylobacter jejuni, Salmonella species, Neisseria gonorrhoeae, and many others.
With thousands of genomes being sequenced all over the world each day, each of which contains thousands of genes, there is a very large amount of data crunching to do and a limit to how much the international community of scientists working in the genomics field can do on their own! That is where you and the Zooniverse can help, and it is why we are extending our "community curation" approach beyond the scientific community to include everyone who wants to help. By getting interested and enthusiastic people involved, the Zooniverse community will contribute to wider scientific discoveries that inform health promotion and policy. You will be helping scientists by characterising genes and contributing to the analysis of genomic data from global bacterial collections deposited in the PubMLST genome database.
DNA is the basic code for all living creatures. It encodes the characteristics and functions of everything from the simplest of life forms, such as bacteria and viruses, to the most complex, including humans. All the DNA in one creature is described as its "genome".
We can now study the genomes of virtually any living thing, because the development of DNA sequencing technology over the last 20 years has made it much easier to read this code. DNA contains genes, which are sections of DNA that encode the information needed to do specific things. Most genes are processed to make a specific protein. Proteins are important building blocks of all creatures and have various structures and functions, e.g. the pigment that colours our skin, melanin, is made with the help of the protein enzyme tyrosinase, and our food is digested by the protein enzymes amylases, proteases and lipases, each specialised to break down certain food types. Similarly, bacteria need proteins to maintain their structure and function as part of their growth, survival and replication processes. The genome is therefore important because it can provide information about how bacteria will look and behave. This is of particular interest for scientists studying bacterial infections, as proteins can be used in vaccines and can affect how well antibiotics work.
First, bacterial samples are collected. These may be from the environment, for example from farms or food (e.g. Campylobacter species that cause food poisoning), or from humans, either as bacteria we carry on or in ourselves naturally (our normal 'microbiota') or when we have an infectious disease (e.g. Neisseria meningitidis, which causes meningitis or sepsis).
Next, the bacteria are grown on culture plates containing special media in the laboratory. The DNA is extracted from inside the bacteria and processed, and a sequencing machine is used to read the DNA code. The output is a VERY large amount of data, in the form of a long string made up of four letters: C, G, A, and T, e.g. CCAATGACTAGTACAGATACAAACGTA. The letters of the code (called bases) are shorthand for the molecules: C=cytosine, A=adenine, G=guanine, T=thymine. Long strings of these letters provide the code to make proteins, the functional units that we all need to survive (similar to the letters of the alphabet forming words that have a specific meaning).
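For readers who like to see this in code, here is a minimal Python sketch of how the example string above could be handled: it counts the four bases, splits the string into three-letter "words" (codons), and looks each codon up in the standard genetic code to get a one-letter amino acid. The function names are hypothetical, and reading the string from its first letter is an assumption made purely for illustration; this is not the software that PubMLST itself uses.

```python
from collections import Counter

# Example string from the text above; real sequencing output is far longer.
SEQUENCE = "CCAATGACTAGTACAGATACAAACGTA"

# Standard genetic code written compactly: codons are ordered T, C, A, G
# at each of their three positions, and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def count_bases(seq):
    """Count how often each base occurs, rejecting unexpected letters."""
    unexpected = set(seq) - set(BASES)
    if unexpected:
        raise ValueError(f"unexpected letters in sequence: {sorted(unexpected)}")
    return Counter(seq)


def split_codons(seq):
    """Split a sequence into three-letter codons, dropping any leftover bases.

    Assumption for illustration: reading starts at the first letter.
    """
    usable_length = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable_length, 3)]


def translate(seq):
    """Translate a DNA sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


if __name__ == "__main__":
    print(count_bases(SEQUENCE))
    print(split_codons(SEQUENCE))
    print(translate(SEQUENCE))
```

A change to a single letter of the DNA can change the amino acid at that position, which is one way that genetic variation can lead to differences among bacteria.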
The bacteria and genes you will classify could be from any of the more than 100 bacterial species on PubMLST. Our focus is on disease-causing bacteria, to contribute to research into preventing human infectious diseases. Here are some of the diseases we are studying:
The bacteria that cause meningitis are generally carried in the throats of healthy people, but very occasionally they invade the bloodstream to cause sepsis, or travel across the blood-brain barrier to cause meningitis. These bacteria include Neisseria meningitidis, Streptococcus pneumoniae and Haemophilus influenzae. They are transmitted between people in respiratory droplets, through activities such as coughing, sneezing, kissing or even breathing in confined spaces. In newborn babies another group of bacteria is important: Group B streptococcus, which babies can pick up while passing through the birth canal. There are vaccines against many of these bacteria, and using genomics we can study the impact of existing vaccines and identify new targets for future vaccine development.
Bacteria are all around us in our environment, including in the food and water we consume. When we eat food contaminated with harmful bacteria, we can get gastroenteritis, or food poisoning. The bacteria that can cause food poisoning include Campylobacter, Salmonella, Shigella, E. coli, and Listeria. These bacteria can be found in the gut of some animals we eat, such as chickens, cows and sheep, and other foods such as vegetables or cheese can also be contaminated. By studying the genomes of these bacteria we can identify the food source and work to stop more people becoming infected, contributing to food safety. We can also study the antibiotic resistance mechanisms that these bacteria have developed, which affect how doctors can treat severely unwell patients.
The PubMLST database has been supported by the Wellcome Trust since its inception in 1998. The team of researchers working on this project are all based in the Department of Zoology, University of Oxford, UK.