Thank you for your efforts! We've completed our project! To browse other active projects that still need your classifications, check out zooniverse.org/projects
NEWS: The AI4Mars results have been accepted for publication! Check back here soon for a link to the publicly available dataset.
The number of volunteers and classifications for this project has greatly exceeded our expectations. We appreciate all your hard work and would like to give you a glimpse into our progress in developing AI models for terrain classification. We are training a deep learning model (called SPOC) to perform semantic image segmentation. What this means is that, once trained, the model looks at each pixel and predicts which category it falls into, along with how confident it is in that prediction.
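For the curious, here is a minimal sketch of what per-pixel prediction looks like, assuming a PyTorch-style segmentation model. The model, function, and category names below are illustrative assumptions, not SPOC's actual code:

```python
import torch
import torch.nn.functional as F

# Illustrative category list; the real label set is the one you classified with.
CATEGORIES = ["soil", "bedrock", "sand", "big rock"]

def predict_per_pixel(model, image):
    """For every pixel, predict a category and how confident the model is."""
    with torch.no_grad():
        logits = model(image.unsqueeze(0))       # (1, num_classes, H, W)
        probs = F.softmax(logits, dim=1)         # per-pixel class probabilities
        confidence, category = probs.max(dim=1)  # most likely class + its probability
    return category.squeeze(0), confidence.squeeze(0)  # each of shape (H, W)
```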
So how are we doing numbers-wise? Quite well! Below you can find results calculated by comparing AI model predictions against ground truth labels. The ground truth labels came from a fine-grained validation set provided by NASA JPL experts, including those who plan and execute rover missions on Mars!
The results are presented primarily as confusion matrices, which give detailed per-category performance for both SPOCv1 and SPOCv2. Casual observers may find it easiest to focus on the total accuracy and the diagonals (the True Positive Rate for each category) in the matrices below. You can clearly see that the labeling effort on Zooniverse has had a huge positive impact on model accuracy!
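If you are wondering how those numbers are computed, here is a small sketch assuming integer label maps for ground truth and predictions. The toy values are made up for illustration and are not actual SPOC results:

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, num_classes):
    """Rows are ground-truth categories, columns are predicted categories."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(true_labels.ravel(), pred_labels.ravel()):
        cm[t, p] += 1
    return cm

# Toy example with 4 categories (made-up labels, not SPOC data).
true_labels = np.array([0, 0, 1, 2, 3, 1])
pred_labels = np.array([0, 1, 1, 2, 3, 1])
cm = confusion_matrix(true_labels, pred_labels, num_classes=4)

total_accuracy = np.trace(cm) / cm.sum()      # correct pixels / all pixels
per_class_tpr = np.diag(cm) / cm.sum(axis=1)  # the diagonal, as a rate per category
```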
An earlier version of our AI model (SPOCv1) was trained a few years ago on about 1,000 labeled images. The categories for each label were a little different, but we were able to remap them to the categories presently used (except for Big Rock, which had no counterpart) to compare performance.
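Conceptually, the remap is just a lookup from old category names to current ones. Here is a sketch; the SPOCv1 category names below are invented for illustration and are not the actual v1 label set:

```python
# Invented SPOCv1 category names, purely for illustration.
V1_TO_CURRENT = {
    "smooth terrain": "soil",
    "rough terrain": "bedrock",
    "dunes": "sand",
    # "big rock" has no SPOCv1 counterpart, so v1 is simply not scored on it.
}

def remap(v1_label):
    """Translate a SPOCv1 label into the current category scheme (None if no match)."""
    return V1_TO_CURRENT.get(v1_label)
```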
SPOCv2 is the model we are presently training using all the data (over 100K labels!) gathered here on Zooniverse. The model still struggles with Big Rock due to its rarity, but we are happy with its performance on the other classes (while of course always looking to improve). Plans are already underway to handle Big Rock using other methods (e.g. geometric determination of rock size using stereo image depth).
Below are a few examples of the AI making predictions on unseen images using the data gathered by all of you so far. To help visualize what the AI is telling us, each category is assigned a color and overlaid onto the original image.
Note regarding black regions: we mask out regions that are part of the rover itself, as well as regions we know are far away, since these regions are not part of our training data and are not useful for the traversability estimation we are targeting.
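As a rough sketch of how such an overlay can be produced (the palette, blending factor, and function below are assumptions for illustration, not the project's actual visualization code):

```python
import numpy as np

# Illustrative palette: one RGB color per category; the real colors may differ.
PALETTE = np.array([
    [194, 120, 54],   # soil
    [120, 120, 120],  # bedrock
    [230, 200, 100],  # sand
    [200, 40, 40],    # big rock
], dtype=np.uint8)

def overlay(image, category_map, mask, alpha=0.5):
    """Blend per-pixel category colors onto the image; paint masked pixels black."""
    colored = PALETTE[category_map]  # (H, W, 3): one color per pixel
    blended = (alpha * colored + (1 - alpha) * image).astype(np.uint8)
    blended[mask] = 0  # rover parts and far-away regions become black
    return blended
```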
Notice that the example below does pretty well on distant Big Rock, even though that category makes up only about 3% of all the data!
Here's a good example of soil, sand, and bedrock in the same picture, though one might argue it is a little overzealous about bedrock.
As we receive more data from volunteers, the network gets better at understanding each picture. While we are making steady improvements, the AI does not always get it quite right. In the example below, you can see that it gets most of the classification right but is a little too conservative in its sand prediction in the middle of the image.