Accelerating Research and Collaboration

A group of five galaxies that appear close to each other in the sky: two in the middle, one toward the top, one to the upper left, and one toward the bottom. Four of the five appear to be touching. One is somewhat separated. In the image, the galaxies are large relative to the hundreds of much smaller (more distant) galaxies in the background. All five galaxies have bright white cores. Each has a slightly different size, shape, structure, and coloring. Scattered across the image, in front of the galaxies, are a number of foreground stars with diffraction spikes: bright white points, each with eight bright lines radiating out from the center.

Our staff helped researchers tackle machine learning, released Webb’s first data, and celebrated 25 years of our astronomical archive.

About This Article

A circular logo shows a ship’s mast at right in light blue, over a darker teal background with a few white points representing stars. Large white text, 25 years, appears at left. Along the edge of the logo, from bottom left to top right, is the text Barbara A. Mikulski Archive for Space Telescopes. At the bottom, the text reads 1997, MAST, 2022.
Twenty-five years ago, NASA made STScI the archive center for data from space-based missions from the visible to the infrared. The Barbara A. Mikulski Archive for Space Telescopes (MAST) has grown to encompass dozens of missions and datasets, from cutting-edge operating observatories to archival missions. Today, MAST holds thousands of terabytes of some of the world’s most precious astronomical data. MAST serves them to researchers and students around the world, allowing everyone to participate in science and learn about their universe.

Consider staring down a long list of detailed astronomical data. There are thousands of lines, each with a slew of specs. Astronomers have long written code to find patterns in the data, but the work can be time intensive, especially if they’re seeking to answer a new question. What if machines could do more of the “thinking”? Hello Universe, a new project created and released by astronomers who support the institute’s Barbara A. Mikulski Archive for Space Telescopes (MAST), invites newcomers and intermediate users alike to use, and later cast off, their “training wheels” by following tutorials with step-by-step examples of how to apply machine learning techniques to curated astronomical data sets.

Researchers can use the analysis techniques to classify stellar flares, determine distances to galaxies, and identify simulated galaxy mergers. By taking the data for a “drive,” they can better visualize how machines identify patterns within complex data sets and make sense of how machine learning models are built and applied. Hello Universe builds on data analysis methods that researchers already know and lets them examine each step, ultimately deepening their understanding of machine learning and the opportunities it provides. And, as they become more comfortable with the process, researchers can drop in their own data and questions, follow the method, find new patterns, and produce new science products that may benefit others.
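To make the idea of a machine “identifying patterns” concrete, here is a minimal sketch of one of the simplest classification techniques, k-nearest neighbors, written in plain Python. It is not taken from the Hello Universe notebooks, which may use different methods (for example, neural networks on images). The two features (asymmetry and concentration), the training values, and the function name `knn_predict` are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs, where features is a tuple of floats.
    query: tuple of floats with the same length as each features tuple.
    """
    # Rank training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # The most common label among the k nearest neighbors wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, made-up training set: (asymmetry, concentration) -> label.
TRAIN = [
    ((0.05, 3.8), "isolated"),
    ((0.08, 3.5), "isolated"),
    ((0.10, 4.0), "isolated"),
    ((0.45, 2.1), "merger"),
    ((0.52, 2.4), "merger"),
    ((0.60, 1.9), "merger"),
]

print(knn_predict(TRAIN, (0.07, 3.7)))  # isolated
print(knn_predict(TRAIN, (0.50, 2.2)))  # merger
```

Swapping in a different `TRAIN` list is the toy equivalent of dropping in your own data. In practice, features are usually rescaled to comparable ranges first, since unscaled features with large values dominate the distance.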

Marking a Major Anniversary

Twenty twenty-two was the 25th anniversary of MAST. Our archive hosts data from more than 20 astronomical missions, active and retired, space- and ground-based. Our newest set of data is from the retired Spectroscopy of Plasma Evolution from Astrophysical Radiation (SPEAR) mission, also known as the Far-Ultraviolet Imaging Spectrograph (FIMS). Its ultraviolet data complement those already in MAST. Our staff members had to actively seek out its data, which were lingering on old servers. This mission surveyed large swaths of the sky in far ultraviolet light, detailing the hot, glowing gas in our Milky Way galaxy—and now its data are accessible worldwide.

Tending to our astronomical history also means caring for existing mission data. For example, this year MAST improved the Hubble Space Telescope’s source catalog with more precise astrometry (positions and motions) and photometry (brightnesses) of stars from the Gaia mission and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) catalog.

New data are equally important and, in this case, cause for worldwide celebration! The James Webb Space Telescope’s first images and data, along with its extensive commissioning files, were made available in MAST in mid-July, and the astronomical community came running to download them. Approximately 260 terabytes were served in three weeks to people in more than 100 countries, roughly five times the amount of any other mission’s data that MAST has served to date.

Blue text, Hello Universe, takes up the majority of the image. A blue space telescope illustration that looks like Hubble appears next to Hello. The background is black with small dots that resemble stars.
Hello Universe helps researchers learn how to add machine learning techniques to their analyses to examine large astronomical data sets. The accompanying notebooks for each of the four data sets lay out clear, visual instructions to guide researchers. Plus, the project’s questions and data sets can be swapped as users become more comfortable with machine learning and want to ask their own questions or inject their own data. Get going.

The process of adding Webb’s data to the archive and releasing it to the world was incredibly smooth. MAST’s servers responded well to the onslaught of community requests. Webb’s data sets are larger than those of previous missions, and the new flagship mission’s data are in great demand. Staff in MAST were also responsible for ensuring Webb’s data are successfully processed and, in some cases, combined to provide composite data products. They also developed and shared examples of custom scripts to search for and retrieve data. Researchers were supported with a range of webinars, videos, and upgraded data analysis tools as they began their analyses.

Looking Further into the Future

As the volume of astronomical data grows, the efficiency of the systems researchers use for processing can decrease. Astronomers and engineers in MAST have long thought about how to navigate the challenges of increasingly large data sets, and recently designed and began testing Greenplum, a scalable distributed database system with 40 data nodes. Each data node has 256 gigabytes of RAM and 20 CPU cores; together, the nodes can process several hundred billion rows of data in only eight to 10 minutes. For context, a large query used to be limited to 50,000 rows. If researchers wanted to go larger, they had to manually select chunks of data to run on several servers over several days. Greenplum eliminates that labor and time. The software is designed to support research with the upcoming Nancy Grace Roman Space Telescope, whose data may translate to 3 trillion rows. Greenplum’s premiere is planned for the American Astronomical Society’s winter meeting in January 2023.
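A distributed database avoids manual chunking by spreading a table’s rows across its data nodes, letting every node work on its own slice at the same time, and then combining the partial results. The toy, single-process sketch below illustrates that scatter, local-work, gather pattern and totals the cluster resources quoted above (40 nodes × 256 GB = 10,240 GB of RAM; 40 × 20 = 800 cores). It is not Greenplum code: the CRC-32 hashing scheme, the field names `source_id` and `mag`, the sample rows, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Cluster figures quoted in the article.
NUM_NODES = 40
RAM_PER_NODE_GB = 256
CORES_PER_NODE = 20


def cluster_totals(nodes=NUM_NODES, ram_gb=RAM_PER_NODE_GB, cores=CORES_PER_NODE):
    """Return (total RAM in GB, total CPU cores) across all data nodes."""
    return nodes * ram_gb, nodes * cores


def node_for(key, nodes=NUM_NODES):
    """Assign a row to a node by hashing its distribution key.

    Hypothetical scheme: CRC-32 modulo the node count (deterministic across runs).
    """
    return zlib.crc32(str(key).encode("utf-8")) % nodes


def distribute(rows, key_field, nodes=NUM_NODES):
    """Scatter: place each row in the bucket of the node that owns its key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[node_for(row[key_field], nodes)].append(row)
    return buckets


def partial_sum(rows, value_field):
    """Local work on one node's slice: a (count, sum) partial aggregate."""
    return len(rows), sum(row[value_field] for row in rows)


def distributed_mean(rows, key_field, value_field, nodes=NUM_NODES):
    """Compute a mean via scatter, parallel local work, then gather."""
    buckets = distribute(rows, key_field, nodes)
    # Threads stand in for independent nodes working at the same time.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda b: partial_sum(b, value_field), buckets.values()))
    # Gather: combine counts and sums, not per-node means, so the result is exact.
    count = sum(n for n, _ in partials)
    if count == 0:
        raise ValueError("no rows to aggregate")
    total = sum(s for _, s in partials)
    return total / count


# Hypothetical catalog rows: magnitudes 20.0 to 20.4, 200 rows each.
rows = [{"source_id": i, "mag": 20.0 + (i % 5) * 0.1} for i in range(1000)]
print(cluster_totals())                                      # (10240, 800)
print(round(distributed_mean(rows, "source_id", "mag"), 6))  # 20.2
```

On real hardware each node scans only its own rows, so adding nodes shortens a full-table query instead of forcing researchers to split the work by hand.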

Whether the data are new or old, massive or lightweight, many possibilities await researchers. Through regular data refinement, training, and collaboration, staff at MAST help speed up research that leads to astronomical discovery.

Sixteen equally sized boxes have very bright, light green centers, which become darker green, and in some cases blue, around their edges. The contents of the boxes vary, each looking like a splatter painting, but all represent simulated interacting galaxies.
You can now use a machine to classify which galaxies are merging and which aren’t! The image above shows a grid of simulated galaxy mergers from Hello Universe laid out on a logarithmic color scale. These simulated galaxies are part of a sample data set that allows users to master machine learning for astronomical discovery. Take machine learning for a spin.
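A logarithmic color scale compresses the bright centers so that faint outer structure, such as tidal features in mergers, remains visible. The exact stretch used for the figure is not stated; the sketch below assumes the common form y = log(a·x + 1) / log(a + 1), applied after normalizing pixel values to [0, 1], with a hypothetical default `a = 1000`. The function name `log_stretch` is illustrative.

```python
import math


def log_stretch(values, a=1000.0):
    """Map pixel values to [0, 1] on a logarithmic scale.

    Values are first normalized linearly to [0, 1], then compressed with
    log10(a * x + 1) / log10(a + 1); a larger `a` boosts faint pixels more.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat image carries no contrast to stretch.
        return [0.0] * len(values)
    norm = math.log10(a + 1)
    return [math.log10(a * (v - lo) / (hi - lo) + 1) / norm for v in values]


# A pixel at 1% of the peak lands at roughly a third of the color range.
print([round(y, 3) for y in log_stretch([0, 1, 10, 100, 1000])])
```

With a linear scale the same 1% pixel would sit at 0.01 of the color range, nearly indistinguishable from the background.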
