Research


A clumpy galaxy seen by SDSS, HSC and Euclid telescopes. You can see how the better resolving power of each subsequent telescope helps us see more and more detail about the star-forming clumps. (The bright object at the bottom right is a foreground star.)

Overview - new tools lead to new science

In this project, we are taking advantage of new tools - more powerful telescopes and machine learning - to investigate more deeply the enigmatic nature of giant star-forming clumps. As the name implies, we know these clumps are related to star formation, but how? Are the star formation processes that produce clumps in nearby galaxies the same as in the early Universe? Just measuring the numbers of clumpy galaxies, the locations of clumps within those galaxies and their properties like brightness, color or mass can begin to give us answers.

We started our adventure with the original Galaxy Zoo: Clump Scout project using images from the Sloan Digital Sky Survey (SDSS). This project led to a better understanding of how rare giant star-forming clumps are in the late stages of galaxy evolution. But most of these clumps showed up as giant blobs in the SDSS images, as can be seen in the above figure. Now, new telescopes such as the Euclid space telescope can give us more detailed images, allowing us to see whether there is more structure to giant clumps - so we need a revised count of clumps along with their locations in the host galaxies and their properties.

But there's a catch - Euclid will end up with far too many potential clumpy galaxies for even our amazing volunteers to go through. So we need a machine to help. When we started with the original Clump Scout project back in 2018, using the labels from that project to train a machine learning algorithm was not the goal. But once confronted with much bigger galaxy surveys like Euclid, we realized we could use the SDSS labels you provided to do the science we wanted to do with that data set, while also preparing for the future. So here we are! Our machine has made predictions about what it thinks are clumps, stars, or other objects in a Euclid image and we need your help to correct those machine labels. We'll use your corrections to improve the machine.

But don't worry, there's so much science to do that we'll always need your help in some way. Read on for more detail on clumpy galaxies, the Euclid telescope, the machine learning strategy and all of the science questions we hope your efforts will help us answer!

Clumpy Galaxies

In the nearby Universe, most galaxies can be categorized as spirals, ellipticals, or irregular systems. (If you are new to galaxies and their common appearances or types, NASA has a helpful website on Galaxy Types.) However, in the more distant (and younger) Universe, galaxy shapes were more diverse and galaxies with "clumpy" structures dominate the images we see from that time in the evolution of the Universe. The figure below shows eight images of these distant clumpy galaxies that were taken by the Hubble Space Telescope (HST) and classified by Galaxy Zoo volunteers during the Galaxy Zoo: Hubble (GZH) project.

Galaxies of the past went through a period of intense star formation which is thought to be largely responsible for these unusual shapes. This period of star formation has since died off and most galaxies today have settled down into the types we see in the local Universe: spiral, elliptical, lenticular and irregular.

Galaxy Zoo: Clump Scout

We launched the first Clump Scout project in 2019 with the goal of building the largest catalog at the time of nearby clumpy galaxies so we could better assess their population in comparison to clumpy galaxies observed in the earlier Universe.
The results of the first Galaxy Zoo: Clump Scout project were based on your annotations of images from the Sloan Digital Sky Survey (SDSS) Legacy Survey. These results indicated that clumpy galaxies are far rarer in the nearby (and older) Universe than in distant (and younger) reaches of the Universe. Fewer than 5% of star-forming galaxies in the nearby Universe exhibit obvious clumps, compared to 50% of star-forming galaxies earlier in the history of the Universe. This rarity of nearby clumpy galaxies is likely due to the drop in star formation as the Universe evolved to its current state. The next image shows eight SDSS images of those rare, nearby clumpy galaxies.

These nearby galaxies look remarkably similar to the galaxies seen by the Hubble Space Telescope (HST) in the distant Universe. That's because, coincidentally, the physical sizes of structures that can be resolved by HST in distant galaxies are roughly the same as those that can be resolved by SDSS in nearby galaxies. To put it another way, distant galaxies appear equally sharp (or blurry) in HST images as nearby galaxies do in SDSS images.

Galaxy Zoo: Clump Scout II

Now, Galaxy Zoo: Clump Scout is back, with higher resolution images and new questions to answer! Thanks to the much higher resolution data being collected by Euclid, we have a chance to revisit the conclusions of the original Clump Scout project and more confidently characterize the distribution of clumpy galaxies in the nearby Universe. Compared to our original SDSS results, we'll even be able to obtain a sample of clumpy galaxies a bit farther away in space (which also means further back in time in the history of the Universe). With your help, we will have a more detailed sample of clumps that we hope will enable us to better characterize the relationship between clumps and the different star formation mechanisms at play.

In this project you will be inspecting images taken by the Visible Imager Instrument (VIS) and Near-infrared Spectrometer and Photometer (NISP) instruments mounted on the Euclid space telescope. Euclid images are much sharper and more detailed than SDSS images, which makes identifying clumps in galaxies much easier and makes it possible to see more distant galaxies in more detail. One reason is that Euclid is a spacecraft-mounted telescope operating outside of the Earth’s atmosphere, whereas SDSS is ground based. Euclid's cameras are therefore not affected by weather and its images are not distorted by light passing through the Earth’s atmosphere. Additionally, the VIS camera uses 36 high-resolution CCDs (charge-coupled devices, which convert light into electrical signals) with about 600 megapixels in total, whereas SDSS’s imaging camera has only about 120 megapixels across its 30 CCDs.

What appear as bright symmetrical blobs in the SDSS images are now resolved into more detailed and complicated shapes that tell us about the internal substructure of the clumps themselves. Measuring this substructure and comparing it with the predictions of high-resolution galaxy simulations can give us clues about how the clumps formed and how they affect their host galaxies and the intergalactic space around them.

The next image (the same one as at the top of the page) shows a comparison of how SDSS (left), HSC (centre) and Euclid (right) see the same clumpy galaxy. The galaxy is barely visible in SDSS images, but HSC sees it clearly and some of the clumps have distinctly asymmetric shapes. In the Euclid image, it is clear that what seem to be single bright blobs in SDSS and HSC images are actually groups of smaller star-forming regions with complicated substructure.
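For readers who like to check the arithmetic, here is a rough sketch of where the megapixel totals quoted in this section come from. The per-CCD pixel formats used below (4096 x 4132 for VIS and 2048 x 2048 for SDSS) are assumptions on our part and are not stated in the text above; only the CCD counts and approximate totals are.

```python
# Assumed per-CCD pixel formats (not given in the text above).
VIS_CCDS, VIS_SHAPE = 36, (4096, 4132)
SDSS_CCDS, SDSS_SHAPE = 30, (2048, 2048)


def total_megapixels(n_ccds, shape):
    """Total pixel count of a camera, in millions of pixels."""
    rows, cols = shape
    return n_ccds * rows * cols / 1e6


vis_mp = total_megapixels(VIS_CCDS, VIS_SHAPE)
sdss_mp = total_megapixels(SDSS_CCDS, SDSS_SHAPE)

print(f"VIS:   {vis_mp:.0f} megapixels")
print(f"SDSS:  {sdss_mp:.0f} megapixels")
print(f"ratio: {vis_mp / sdss_mp:.1f}x")
```

Under these assumptions the totals land close to the "about 600" and "about 120" megapixels quoted above. Pixel count is only part of the story, though: image sharpness also depends on the optics and, for ground-based telescopes, on the atmosphere.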

Accelerating Clump Detection with Deep Learning

We need your help to improve the data quality of our machine's predictions. The classifications you provide in this project will help us to fine-tune our machine learning model to analyze the images from the next Euclid data releases when they arrive.
The Euclid space telescope was designed as a survey instrument. When its survey is complete, the Euclid telescope will have taken images of over 1 billion galaxies, and about 250 million of those will be resolved well enough to detect any clumps they might contain. The only realistic way to find the clumps in such a large number of galaxies is by using machine learning in tandem with human eyes, and one of the main goals of this project is to help build the machine learning models we need. We have already made some progress. Using the results from the first Galaxy Zoo: Clump Scout project, we trained a deep learning (sometimes called AI) model that can detect and label clumps within galaxies in SDSS images. Then, using a small number of galaxies that had been seen by both SDSS and Euclid, we trained another model to draw outlines (boxes) around clumps in the Euclid Q1 data - the first public release of Euclid images, from 2025. While our Euclid model does pretty well, it isn't perfect. Sometimes it identifies stars in our own galaxy (so-called foreground stars) as clumps in a distant galaxy, sometimes it misses real clumps, and sometimes the regions it identifies as clumps aren't accurate enough. We need your help to correct and refine the model's predictions so we can build the best human-machine partnership to get through the onslaught of new data coming in from Euclid over the next several years.
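To give a flavour of what "accurate enough" can mean for a box drawn around a clump, here is a minimal sketch of intersection-over-union (IoU), a common way to score how well two boxes overlap. This is an illustration only: the text above does not say which metric the project uses, and the box format and example coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes.

    Each box is (x_min, y_min, x_max, y_max) in pixels (assumed format).
    Returns 1.0 for identical boxes and 0.0 for boxes that do not overlap.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    overlap_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    overlap_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = overlap_w * overlap_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: a machine-drawn box and a volunteer-corrected box.
machine_box = (10, 10, 30, 30)
volunteer_box = (15, 15, 35, 35)
score = iou(machine_box, volunteer_box)
print(f"IoU = {score:.2f}")
```

A low score for a machine box compared with the volunteer-corrected one would flag a region the model outlined poorly, which is the kind of example that is useful when fine-tuning.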

A word or two about telling the difference between clumps, foreground stars and artifacts

As you start scouting for clumps - and in particular trying to figure out whether the machine confused a clump with a foreground star or vice versa - you might find it difficult at times to tell the difference. There's a story here: the Euclid team wanted to remove as many foreground stars from our own galaxy from the images as they could, and that is what the Euclid data processing pipeline does (if you want a bit more technical detail you can read this Euclid blog post). While this worked for the vast majority of stars, the process wasn't perfect, and this can make our collective task much harder - sometimes the way the star was positioned on the camera means that the removal process left a blob where the star was... a blob that can often look a lot like a clump (left image below). So what to do? Well, as always, just do your best. We'll combine everyone's votes on star versus clump and take the wisdom of the crowd. Another thing to ease your mind: once the marks are all collected, we can cross-match them with the Euclid star catalog - this won't catch every possible star but should hopefully catch most. If you want more detail and examples on the differences between stars and clumps, check out the Field Guide.

Much easier to spot are image artifacts left over from removing contaminants such as cosmic ray hits. These often show up as obvious brownish streaks across the image. But sometimes an artifact can look like a dark clump where a chunk has been taken out of the image (right image above).
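For the curious, the two safety nets mentioned in this section (combining everyone's votes, then cross-matching marks against a star catalog) can be sketched roughly as follows. This is a simplified illustration, not the project's actual pipeline: the function names, the 0.5 arcsecond matching radius and the example coordinates are all hypothetical.

```python
import math
from collections import Counter


def majority_label(votes):
    """Return the most common label and the fraction of votes it received."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation (haversine formula) between two sky positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600


def matches_known_star(mark, star_catalog, radius_arcsec=0.5):
    """True if a mark (ra, dec) lies within the radius of any catalogued star."""
    return any(
        separation_arcsec(mark[0], mark[1], star[0], star[1]) <= radius_arcsec
        for star in star_catalog
    )


# Hypothetical votes from five volunteers on one marked object.
votes = ["star", "clump", "star", "star", "clump"]
label, fraction = majority_label(votes)
print(label, fraction)

# Hypothetical one-entry star catalog and two marks, as (RA, Dec) in degrees.
stars = [(150.0000, 2.0000)]
mark_near = (150.0001, 2.0000)  # a fraction of an arcsecond from the star
mark_far = (150.0010, 2.0000)   # a few arcseconds away
print(matches_known_star(mark_near, stars), matches_known_star(mark_far, stars))
```

A simple majority is the most basic form of "wisdom of the crowd"; the catalog check then acts as a backstop for stars the crowd could not distinguish by eye.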

Unanswered questions

As we build better and better catalogs of clumps and their host galaxies, we can begin to tackle some of the big questions raised by these enigmatic objects. We still don't understand the details of how clumps form. Is it due to mergers between galaxies, or due to instabilities within the galaxies themselves that cause large clouds of gas to collapse and rapidly form stars? Is it a mixture of both mechanisms? Simulations of galaxy formation and evolution suggest that different clump formation pathways lead to clumps with different substructural properties. We also don't fully understand what happens to the clumps over time. Some models predict that these large clumps eventually migrate to the center of the galaxy to form the galactic bulge. Other models suggest that these clumps are short lived and dissipate before they can migrate to the center. Determining which scenario is correct could have dramatic implications for our understanding of the formation and evolution of disk galaxies.

In this project you will help us identify clumps in many more galaxies than the original Galaxy Zoo: Clump Scout. You will be detecting fainter and more distant clumpy galaxies that existed earlier in the Universe's history. Finally, you will be helping to prepare for the fantastic images that will be delivered from Euclid over the coming decade. These data will help to answer the questions we have now, but we'd be amazed if they didn't introduce many new ones.

As always, we couldn't do it without you!


This work is funded in part by NASA-CSSFP grant No. 80NSSC24K1277 (learn more about NASA Citizen Science Projects here: science.nasa.gov/citizenscience) and "ELSA: Euclid Legacy Science Advanced analysis tools" (Grant Agreement no. 101135203), along with NSF Award IIS 2006894. This project makes use of Q1 data from the European Space Agency's Euclid mission; learn more here: cosmos.esa.int/web/euclid/euclid-q1-data-release