Finished! Looks like this project is out of data at the moment!
(Updated Feb 18th) Thanks for your patience -- more handwriting images will be coming next week!
Julian A. Steyermark was a 20th century botanist who was raised in St. Louis, and worked for the Field Museum for over 20 years. He collected over 130,000 plants during his career (many from Venezuela, Guatemala, and the United States), and described over 2,000 taxa. You can read more about him in this obituary.
Most of his collection is stored as herbarium sheets, which are commonly used to store dried plant specimens. A lot of information on a herbarium sheet appears within the label: where and when the plant was found, by whom, in what kind of habitat, and so on.
These herbarium sheet labels are often handwritten, and transcribing these labels enables important research, both at the Field Museum and around the world.
There are many challenges when digitizing handwritten collections like these. The handwriting is often difficult to read, or it may, for example, use copious shorthand, abbreviations, or scientific terms.
Once digitized, making the information useful comes with many challenges. For example, the taxon for a given sheet may be outdated or a synonym, and may even be incorrectly identified. Some of Steyermark's specimens were collected 100 years ago, meaning that many geopolitical place names have changed or no longer exist. Additionally, any of the information may be vague or even erroneous (for example, due to human error or a poorly calibrated compass).
In spite of these challenges above, the importance and usefulness of this information for research makes them worth tackling!
Being able to access and understand these older specimen opens up many useful kinds of research. For example, comparing recent specimens to older ones can help us understand the extent to which a species has been impacted by climate change. Given the current extinction crisis, these avenues of research are both needed and very time sensitive.
In a typical computer program, the programmer must provide all the instructions and rules to the computer before it will run -- in this case, we would need to teach a computer program all of the rules to tell a "c" from an "e," and so on. However, there are many different ways of writing "c" and "e," which makes it very difficult to tell the computer program all of the rules! With machine learning (a type of artificial intelligence), a computer program is able to "learn" these rules during a training process.
In this project, our goal is to have a computer program "learn" how to read Julian Steyermark's handwriting, and then use this program to transcribe other batches of his handwritten collections and notes. When successful, machine learning is able to work much more quickly than human staff & volunteers, and can transcribe thousands of words in an hour!
No! Once the computer program is able to read handwriting, this will create a large amount of new data -- we'll need community scientists like you to help us understand these new data.
The most advanced computers may process and analyze gigabytes of data, but they still rely on humans to understand and contextualize this information into knowledge. It's not a matter of "human vs. machine," but rather humans and computers working together!