
Our staff helped researchers tackle machine learning, released Webb’s first data, and celebrated 25 years of our astronomical archive.
About This Article
Consider staring down a long list of detailed astronomical data. There are thousands of lines, each with a slew of specs. Astronomers have long written code to find patterns in the data, but the work can be time intensive, especially if they're seeking to answer a new question. What if machines could do more of the "thinking"? Hello Universe, a new project created and released by astronomers who support the institute's Barbara A. Mikulski Archive for Space Telescopes (MAST), invites newcomers and intermediate users alike to use, and later cast off, their "training wheels" by following tutorials with step-by-step examples of how to apply machine learning techniques to curated astronomical data sets.
Researchers can use these analysis techniques to classify stellar flares, determine distances to galaxies, and identify simulated galaxy mergers. By taking the data for a "drive," they can better visualize how machines identify patterns within complex data sets and make sense of how machine learning models are built and applied. Hello Universe builds on data analysis methods that researchers already know and lets them examine each step, deepening their understanding of machine learning and the opportunities it provides. As they become more comfortable with the process, researchers can drop in their own data and questions, follow the method, find new patterns, and produce new science products that may benefit others.
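As an illustration of the kind of pattern finding the tutorials walk through, here is a minimal Python sketch of one common technique, k-nearest-neighbors classification, written with only the standard library. It is not taken from Hello Universe: the two features (imagined here as a flare's peak amplitude and decay time) and all training values are invented, and a real analysis would draw features from curated data sets and usually rescale them first.

```python
import math
from collections import Counter

# Hypothetical training set of (feature vector, label) pairs. The features
# stand in for a peak amplitude and a decay time measured from a light curve;
# all values are invented for illustration.
TRAINING = [
    ((5.0, 0.20), "flare"),
    ((4.5, 0.30), "flare"),
    ((6.1, 0.25), "flare"),
    ((0.4, 2.00), "quiet"),
    ((0.6, 1.80), "quiet"),
    ((0.3, 2.50), "quiet"),
]


def knn_classify(point, training=TRAINING, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    # Sort examples by Euclidean distance to the new point and keep the k closest.
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((5.5, 0.22)))  # flare
print(knn_classify((0.5, 2.10)))  # quiet
```

The appeal of this method for a first "drive" is that every step is inspectable: a reader can list the three neighbors behind each vote and see exactly why a point was labeled the way it was.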
Marking a Major Anniversary
The year 2022 marked the 25th anniversary of MAST. Our archive hosts data from more than 20 astronomical missions, active and retired, space- and ground-based. Our newest data set comes from the retired Spectroscopy of Plasma Evolution from Astrophysical Radiation (SPEAR) mission, also known as the Far-Ultraviolet Imaging Spectrograph (FIMS). Its ultraviolet data complement those already in MAST. Our staff members had to actively seek out its data, which were lingering on old servers. The mission surveyed large swaths of the sky in far-ultraviolet light, detailing the hot, glowing gas in our Milky Way galaxy, and now its data are accessible worldwide.
Tending to our astronomical history also means caring for existing mission data. For example, this year MAST improved the Hubble Space Telescope’s source catalog with more precise astrometry (positions and motions) and photometry (brightnesses) of stars from the Gaia mission and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) catalog.
New data are equally important and, in this case, cause for worldwide celebration! The James Webb Space Telescope's first images and data, along with its extensive commissioning files, were made available in MAST in mid-July, and the astronomical community came running to download them. In three weeks, MAST served approximately 260 terabytes to people in more than 100 countries, roughly five times the amount of any other mission's data it had served to date.
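For a sense of scale, the average rate implied by these figures can be worked out, assuming decimal terabytes ($10^{12}$ bytes) and a 21-day window. Actual traffic certainly varied from day to day, so this is only a rough mean:

```latex
\frac{260\ \text{TB}}{21\ \text{days}} \approx 12.4\ \text{TB/day},
\qquad
\frac{260 \times 10^{12}\ \text{bytes}}{21 \times 86\,400\ \text{s}}
\approx 1.43 \times 10^{8}\ \text{bytes/s}
\approx 143\ \text{MB/s}
\approx 1.15\ \text{Gbit/s}
```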
The process of adding Webb's data to the archive and releasing it to the world was incredibly smooth. MAST's servers handled the onslaught of community requests well. Webb's data sets are larger than those of previous missions, and the new flagship mission's data are in great demand. MAST staff were also responsible for ensuring that Webb's data were successfully processed and, in some cases, combined into composite data products. They also developed and shared example scripts that search for and retrieve data. As researchers began their analyses, MAST supported them with webinars, videos, and upgraded data analysis tools.
Looking Further into the Future
As the volume of astronomical data grows, the systems researchers use to process it can become less efficient. Astronomers and engineers in MAST have long considered how to handle increasingly large data sets, and they recently designed and began testing a system built on Greenplum, a scalable distributed database, with 40 data nodes. Each node has 256 gigabytes of RAM and 20 CPU cores, and together the nodes can process several hundred billion rows of data in only eight to 10 minutes. For context, a large query used to be limited to 50,000 rows. Researchers who wanted to go larger had to manually select chunks of data and run them on several servers over several days. The new system eliminates that labor and time. It is designed to support research with the upcoming Nancy Grace Roman Space Telescope, whose data may amount to 3 trillion rows. The system's premiere is planned for the American Astronomical Society's winter meeting in January 2023.
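To make the idea of a distributed database more concrete, here is a toy Python model, not Greenplum's actual code. Massively parallel databases of this kind commonly hash-distribute rows across data nodes by a distribution key; each node then computes a partial result on its own rows, and the partial results are merged. The node count, RAM, core, and 50,000-row figures come from the text above. The helper names, the CRC-32 hashing scheme, and the 300-billion-row figure (standing in for "several hundred billion") are assumptions for illustration.

```python
"""Toy model of a distributed database answering a count query.

Illustrative only: the hashing scheme and helper names are hypothetical,
and the nodes are processed one after another here instead of in parallel.
"""
import zlib
from collections import Counter

NUM_NODES = 40          # data nodes in the MAST test system (from the text)
RAM_GB_PER_NODE = 256   # RAM per node (from the text)
CORES_PER_NODE = 20     # CPU cores per node (from the text)
OLD_ROW_LIMIT = 50_000  # former per-query row limit (from the text)


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Assign a row to a node by hashing its distribution key (hypothetical scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Hash-partition (key, value) rows into one list per node."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[node_for(key, num_nodes)].append((key, value))
    return nodes


def parallel_count(rows, num_nodes=NUM_NODES):
    """Count rows per key: each node counts its own rows, then results are merged."""
    total = Counter()
    for node_rows in distribute(rows, num_nodes):
        # Local work on one node; a real cluster runs all nodes concurrently.
        total.update(Counter(key for key, _ in node_rows))
    return dict(total)


def chunks_needed(total_rows: int, row_limit: int = OLD_ROW_LIMIT) -> int:
    """Number of separate queries needed under a per-query row limit (ceiling division)."""
    return (total_rows + row_limit - 1) // row_limit


rows = [("star", 12.1), ("galaxy", 18.4), ("star", 15.0)]
print(sorted(parallel_count(rows).items()))                 # [('galaxy', 1), ('star', 2)]
print(NUM_NODES * RAM_GB_PER_NODE, NUM_NODES * CORES_PER_NODE)  # 10240 800
print(chunks_needed(300_000_000_000))                       # 6000000
```

Because every row with the same key lands on the same node, each node's count for that key is already complete, so merging is cheap. Across 40 nodes, the cluster offers about 10 terabytes of RAM and 800 cores, and the last line shows why the old approach was laborious: at 50,000 rows per query, 300 billion rows would take six million separate queries.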
Whether the data are new or old, massive or lightweight, many possibilities await researchers. Through regular data refinement, training, and collaboration, staff at MAST help speed up research that leads to astronomical discovery.