Snowflake ID

Research

Welcome!

Thanks for checking out Snowflake ID! We need your help classifying snowflakes from all over the world: Utah, Alaska, Colorado, New York, Antarctica, the Swiss Alps, and Korea. When it snows in these places, a special camera photographs the falling snowflakes from three angles. You can use those photos to tell us what type of snowflake was photographed. We'll use the classifications you give us to teach a computer how to classify snowflakes automatically. This is important work because snowflake categorization helps scientists learn more about climate change!

What are our research goals?

Our goal is to classify new snowflakes automatically and with a high degree of accuracy. Specifically, we aim to:

Identify the snowflake category with an accuracy of at least 95%
Identify the amount of riming with an accuracy of at least 90% (see Field Guide or classification tutorial for more information on riming)
Identify melting snowflakes with an accuracy of at least 90%
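
As a rough illustration of how these targets could be checked, the sketch below compares a model's predictions against reference labels. Only the three thresholds come from the goals above; the function names and the task keys ("category", "riming", "melting") are hypothetical and are not part of the project's actual software.

```python
# Accuracy targets taken from the research goals above.
# The task keys and function names are illustrative assumptions.
TARGETS = {"category": 0.95, "riming": 0.90, "melting": 0.90}


def accuracy(predicted, actual):
    """Return the fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def meets_goal(task, predicted, actual):
    """Return True when the accuracy for a task reaches its target."""
    return accuracy(predicted, actual) >= TARGETS[task]
```

For example, a model that labels 9 of 10 snowflakes correctly for riming would just meet the 90% target, while the same score would fall short of the 95% target for snowflake category.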

Advanced Research Overview

Snowflake characteristics such as size, shape, and density are important for forecasting severe weather and projecting global climate change. For example, the Intergovernmental Panel on Climate Change (IPCC) has stated that accurate representation of the size, shape, and density of snow and ice particles is one of the most important factors for accurately predicting the amount of global warming for a given increase in greenhouse gases. Likewise, the exact type of snowflake falling during a winter storm is directly related to how quickly the snow piles up and affects daily life.
The development of the automated Multi-Angle Snowflake Camera (MASC) has produced millions of high-resolution snowflake photographs taken in Utah, Alaska, Colorado, New York, Antarctica, the Swiss Alps, and Korea.
Given the millions of snowflake images automatically captured at several locations around the world, machine learning can be used to create a model that can rapidly and automatically classify thousands of unidentified images at a time.

Scientific Methods

We will use machine learning for image classification. That is, a computer model learns from examples of identified images how to classify new, unidentified images. To "teach" the computer model how to classify accurately, we must provide it with many thousands of correctly identified examples.
Machine learning has three main components:
1. Data: information fed into the model (snowflake images, for example).
2. Features: the important qualities of the images that are most useful for learning how to classify (image brightness variability, for example).
3. Algorithm: the decision logic in the model, or how the problem is solved (typically mathematical/statistical functions).
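
To make the idea of a feature concrete, here is a minimal sketch of one possible way to measure "image brightness variability": the standard deviation of the pixel intensities. Representing an image as a nested list of grayscale values, and the function name itself, are assumptions made for illustration. The source does not say how the project computes this feature.

```python
from statistics import pstdev


def brightness_variability(image):
    """Population standard deviation of all pixel intensities in a 2D image.

    `image` is assumed to be a list of rows, each a list of grayscale values.
    """
    pixels = [value for row in image for value in row]
    return pstdev(pixels)


# A uniformly gray image has no variability; a checkerboard has a lot.
flat = [[100, 100], [100, 100]]
textured = [[0, 200], [200, 0]]
print(brightness_variability(flat), brightness_variability(textured))
```

A single number like this is far easier for a model to learn from than raw pixels, which is why choosing good features matters.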
The first step is to gather the best examples of correctly identified snowflake images. We will need thousands of example snowflakes, with up to three images for each snowflake.
Using these example images, we will "teach" or "train" the different models how to classify new, unidentified images.
We will test four different types of models to find the best possible snowflake image classification model:
1. Decision tree: basically, a series of yes/no questions.
2. Random forest: many decision trees averaged together.
3. Multinomial logistic regression: uses a mathematical function that assigns a probability to each of several categories at once.
4. Deep convolutional neural network: using the architecture of the brain as inspiration, this type of model consists of many "layers" of smaller models that are all connected and learning together to come up with the best solution.
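
The first two model types can be sketched in a few lines. In the toy example below, each decision tree is a short series of yes/no questions, and the forest combines the trees by majority vote, which is one simple way of "averaging" them. The feature names (aspect_ratio, branch_count, size_mm), the thresholds, and the class labels are all hypothetical. Real trees learn their questions from the training data rather than having them written by hand.

```python
from collections import Counter


# Each "tree" is a hand-written series of yes/no questions.
# Features, thresholds, and labels are illustrative assumptions.
def tree_a(flake):
    if flake["aspect_ratio"] > 3.0:   # long and thin?
        return "column"
    if flake["branch_count"] >= 6:    # six visible arms?
        return "dendrite"
    return "aggregate"


def tree_b(flake):
    if flake["branch_count"] >= 5:
        return "dendrite"
    return "column" if flake["aspect_ratio"] > 2.5 else "aggregate"


def tree_c(flake):
    if flake["size_mm"] > 4.0:        # unusually large?
        return "aggregate"
    return "dendrite" if flake["branch_count"] >= 6 else "column"


def forest_predict(flake, trees=(tree_a, tree_b, tree_c)):
    """Combine the trees by majority vote (a simplified random forest)."""
    votes = Counter(tree(flake) for tree in trees)
    return votes.most_common(1)[0][0]
```

Because each tree asks slightly different questions, a mistake made by one tree can be outvoted by the others. That is the main appeal of a forest over a single tree.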

More Background: Automated Imaging and Classification of Snowflakes

The Multi-Angle Snowflake Camera (MASC) automatically images precipitation particles in free fall from three different angles at a resolution of 30.5 microns (about one 33rd of a millimeter!) and an exposure time of 40 microseconds (one 25,000th of a second!). Such high-resolution images of falling precipitation particles lend themselves to automated classification using machine learning techniques. A machine learning algorithm consists of a mathematical/statistical learning function that maps input variables to output variables such as precipitation classes. The function can be trained using labeled data (called "supervised" machine learning), unlabeled data (unsupervised), or some combination of the two (semi-supervised). For classification, one of a discrete set of possible output classes is chosen as most probable based on the mathematical function(s) constructed during training.
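
The last step described above, choosing the most probable class from a discrete set, can be illustrated as follows. This sketch assumes the trained function produces one raw score per class and uses the softmax function (the one behind multinomial logistic regression) to turn those scores into probabilities. The function names are hypothetical, and the source does not specify how the project's models compute their probabilities.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_probable(classes, scores):
    """Return the class with the highest probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], probs[best]
```

Calling most_probable with a list of class names and a matching list of scores returns the winning class together with its probability, which also gives a rough sense of how confident the model is.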

This video is a bit outdated but gives a nice introduction to the motivation behind the development of the MASC (initially called the Present Weather Imager (PWI)).

Check out this live feed of precipitation captured by the MASC located in Red Butte Canyon at the University of Utah in Salt Lake City, Utah.