Preparing for Roman as a Big Data Survey Mission

STScI Newsletter
2024 / Volume 41 / Issue 02

The Nancy Grace Roman Space Telescope is currently on schedule for a late 2026 launch. The first Call for Proposals will be issued approximately one year before launch. This means that approximately one year from now the community will have their first opportunity to propose for funding for analysis of the Roman Space Telescope’s data and to obtain Roman observations. There are many existing and upcoming opportunities to learn about and engage with Roman in preparation for the first call.

Roman’s Wide Field Instrument (WFI) will provide Hubble-like resolution and sensitivity over a field of view ~200 times that of Hubble in the near infrared, with both imaging and slitless spectroscopy capabilities. The field of view is only part of the equation, however: due to Roman’s fast slew and settle times and its orbit around L2, it will be able to survey the sky more than 1000 times faster than achievable with Hubble. Community-defined surveys, including Roman’s three Core Community Surveys and a Galactic Plane General Astrophysics Survey, will utilize a large majority of Roman’s observing time during the first five years. Approximately 25% of the observing time in the first five years will be devoted to General Astrophysics Surveys, for which the community can propose. All of Roman’s data, including community- and PI-defined WFI surveys, will be made publicly available with no proprietary period.

Colorful cartoon showing the size disparity, with Hubble at 2.7 gigabytes per day, JWST at 58 gigabytes per day, and Roman at 1,375 gigabytes per day
A comparison of the average data volume produced each day from Roman (green), Hubble (red), and Webb (blue). Roman will deliver unprecedentedly large high-resolution data sets, with WFI observations generating approximately 30 PB of data in the telescope’s first five years of operation.
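To put the figure's numbers in proportion, the short sketch below computes the ratios between the quoted daily averages and accumulates Roman's daily average over five years. The function names are illustrative, and decimal units (1 PB = 10^6 GB) and a 365.25-day year are assumptions, since the article does not specify either.

```python
# Average daily data volumes quoted in the figure, in gigabytes per day.
DAILY_GB = {"Hubble": 2.7, "Webb": 58.0, "Roman": 1375.0}


def ratio_to(mission: str, reference: str) -> float:
    """Return how many times more data `mission` produces per day than `reference`."""
    return DAILY_GB[mission] / DAILY_GB[reference]


def accumulated_pb(mission: str, years: float) -> float:
    """Accumulate the quoted daily average over `years`, in petabytes.

    Assumes decimal units (1 PB = 1e6 GB) and 365.25 days per year.
    """
    days = years * 365.25
    return DAILY_GB[mission] * days / 1e6


if __name__ == "__main__":
    print(f"Roman / Hubble: {ratio_to('Roman', 'Hubble'):.0f}x")
    print(f"Roman / Webb:   {ratio_to('Roman', 'Webb'):.1f}x")
    print(f"Roman daily average over 5 years: {accumulated_pb('Roman', 5):.1f} PB")
```

Under these assumptions Roman's daily average is roughly 500 times Hubble's and about 24 times Webb's. The daily average alone accumulates to about 2.5 PB over five years, so the approximately 30 PB quoted for WFI observations evidently counts more than that average; the article does not give the breakdown.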

Unique Considerations for Roman Proposals

For most users, interacting with data from the Roman Space Telescope will differ significantly from engaging with data from NASA’s flagship missions like Hubble, Webb, Spitzer, and Chandra, whose observing schedules and resulting data archives are primarily composed of PI-led observing programs. Due to Roman’s community-defined surveys and fully public data sets, the proportion of approved programs focused on analyzing Roman data, compared to those securing new Roman observations, will be much higher than Hubble, Webb, Spitzer, and Chandra users are accustomed to.  

Most importantly, as a space-based mission designed to execute large surveys, Roman will produce unprecedentedly large high-resolution data sets. WFI observations are estimated to generate approximately 30 PB of data during Roman’s first five years. Even science teams interested in analyzing only a few Roman WFI fields will encounter a data volume larger than that of a Hubble Treasury Program.

Thus, in addition to becoming familiar with IPAC’s Roman proposal submission system, Roman’s proposal policies, and traditional proposal tools such as Roman’s exposure time calculator (ETC), the Roman implementation of the Astronomer’s Proposal Tool (APT), and Roman image simulators, all potential users of Roman data will want to familiarize themselves with the resources and data products that will be available to them. These include the cloud-based science platform being developed to enable analysis of Roman’s large data volumes by large, distributed science teams; simulated data sets; the data processing pipelines and products that will be delivered by Roman’s Science Centers; and the pipelines and high-level data products that will be delivered by Roman’s community Project Infrastructure Teams (described below). Familiarity with Roman data, data access, and analysis tools will not only lay the groundwork for properly scoped proposals to analyze Roman data but also prepare the science community to maximize the science from Roman data.

Roman’s Cloud-Based Science Platform

STScI is developing a science ecosystem to provide a cloud-based computing space for the science community to explore, access, and analyze Roman’s petabyte-scale data. Roman’s science platform will come preconfigured with software needed to perform scientific analyses, simulate Roman data, and modify and re-run WFI imaging data processing. It will also enable scientists to bring their own software to Roman’s large data sets in an environment with dedicated compute resources. It will provide a unified user experience that enables collaboration, including the capability of sharing compute resources and scientific projects (e.g., data files and suites of Jupyter notebooks with analysis code) within user-specified teams.  

Roman’s science platform is being designed to lower the barrier to access Roman’s large data sets. Prior to launch, the platform will host simulation tools and simulated Roman data sets, along with associated tutorials in the form of educational Jupyter notebooks. The science platform will also provide workflows that lead users through various science use cases, such as WFI data simulation, WFI data analysis, and WFI observation planning. After launch, the science platform will additionally provide access to the Roman data archive, including both lower-level and higher-level data products and catalogs, as well as expanded and updated notebook tutorials and workflows.

The platform is being developed and tested with community input and feedback. A workshop focused on using the science platform will be held at the January 2025 AAS meeting in National Harbor, MD (see below) and broad community access to the platform is scheduled for June 2025.

Roman’s Distributed Data Processing Network

Roman is being developed and operated under a distributed model, with multiple project partners. These include NASA Goddard, the Science Operations Center (SOC) at STScI, and the Science Support Center (SSC) at IPAC, as well as industry and academic partners. STScI will host the Roman data archive and science platform, providing access to reduced data products developed and contributed by STScI, IPAC, and funded science teams.

STScI is responsible for lower-level processing of all WFI data, and general purpose higher-level processing of WFI imaging data. The lower-level processing includes transforming the packetized data received from the telescope into a data cube containing the uncalibrated ramp exposure data for each detector, and from those, producing calibrated rate images of the data for each detector. The higher-level processing of imaging data includes the creation of oversampled stacks of contiguous areas of the sky (in multiple epochs for time-domain surveys), as well as information extracted from the pixel-level data such as source catalogs. Roman WFI data products will be stored as Advanced Scientific Data Format (ASDF) files. 
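The step from a ramp data cube to a rate image can be illustrated with a minimal sketch. It assumes the cube holds successive non-destructive reads of each pixel and estimates each pixel's rate as the unweighted least-squares slope of counts against read time. This is a simplified stand-in and not the algorithm used by the Roman pipeline; the function names are hypothetical, and effects a production pipeline must handle, such as detector nonlinearity, saturation, and cosmic-ray hits, are not modelled.

```python
from typing import List, Sequence

Image = List[List[float]]


def fit_rate(times: Sequence[float], counts: Sequence[float]) -> float:
    """Unweighted least-squares slope of counts versus time for one pixel."""
    n = len(times)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two reads, with one time per read")
    t_mean = sum(times) / n
    c_mean = sum(counts) / n
    num = sum((t - t_mean) * (c - c_mean) for t, c in zip(times, counts))
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        raise ValueError("read times must not all be equal")
    return num / den


def ramp_cube_to_rate_image(times: Sequence[float], cube: Sequence[Image]) -> Image:
    """Collapse a ramp cube, indexed as cube[read][row][col], into a 2D rate image."""
    if len(cube) != len(times):
        raise ValueError("cube must contain one image per read time")
    n_rows, n_cols = len(cube[0]), len(cube[0][0])
    return [
        [fit_rate(times, [read[y][x] for read in cube]) for x in range(n_cols)]
        for y in range(n_rows)
    ]


if __name__ == "__main__":
    # Synthetic 2x2 detector: each pixel accumulates counts linearly from a fixed offset.
    times = [0.0, 3.0, 6.0, 9.0]
    true_rates = [[1.0, 2.0], [0.5, 0.0]]
    cube = [[[100.0 + r * t for r in row] for row in true_rates] for t in times]
    for row in ramp_cube_to_rate_image(times, cube):
        print([round(v, 6) for v in row])
```

For noiseless linear ramps like the synthetic ones above, the fitted slopes recover the input rates, and the constant offset drops out of the fit.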

The Roman WFI imaging data reduction pipeline is based on the JWST imaging pipelines, modified and extended for Roman data. Like the JWST pipeline, it is modular and open source, allowing users to modify individual processing steps for specific science cases. The high-level catalog products are being developed with community input, including from the WFI working groups (described below). Given the scale of Roman data, it is anticipated that many users will work primarily with the catalog products rather than directly with the images.
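What modularity buys a user can be sketched generically: if a pipeline is an ordered set of named steps, one step can be swapped for a custom version while the rest run unchanged. The example below illustrates that design idea only. The step names, the constant corrections, and the `run_pipeline` helper are all hypothetical and do not represent the Roman or JWST pipeline interfaces.

```python
from typing import Callable, Dict, List

Image = List[List[float]]
Step = Callable[[Image], Image]


def subtract_dark(image: Image) -> Image:
    """Hypothetical step: subtract a constant dark level of 1.0 from every pixel."""
    return [[v - 1.0 for v in row] for row in image]


def flat_field(image: Image) -> Image:
    """Hypothetical step: divide every pixel by a uniform flat value of 2.0."""
    return [[v / 2.0 for v in row] for row in image]


def default_steps() -> Dict[str, Step]:
    """Return the default steps; dicts preserve insertion order, so this is the run order."""
    return {"subtract_dark": subtract_dark, "flat_field": flat_field}


def run_pipeline(image: Image, steps: Dict[str, Step]) -> Image:
    """Apply each step in order, passing the output of one step to the next."""
    for step in steps.values():
        image = step(image)
    return image


if __name__ == "__main__":
    raw = [[5.0, 9.0]]
    print(run_pipeline(raw, default_steps()))

    # Replace a single step for a specific science case; the other steps are untouched.
    steps = default_steps()
    steps["flat_field"] = lambda image: [[v / 4.0 for v in row] for row in image]
    print(run_pipeline(raw, steps))
```

Reassigning an existing key keeps its position in the dictionary, so the custom step runs at the same point in the sequence as the one it replaces.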

IPAC is responsible for higher-level processing of all WFI spectroscopic data, including 2D and 1D calibrated spectra and spectroscopic catalogs with measurements such as redshifts and emission line strengths. IPAC is also responsible for processing the Galactic Bulge Time Domain Core Community Survey data to enable automatic detection and characterization of microlensing events and other transient phenomena in that survey’s high-cadence observations. Products will include oversampled, high-resolution, deep stacks of images as well as light curves, a variability catalog, and a microlensing event catalog.

Project Infrastructure Teams, selected and funded via a NASA ROSES solicitation and comprising several hundred science community members in total, are developing specialized data pipelines tailored to specific science cases. Their charge includes producing the scientific infrastructure and specialized data products and catalogs needed for the community at large to realize Roman’s potential in cosmology, time domain, and exoplanet science. Their pipelines and data products will also be made publicly available to the science community.

The Science Centers and the Project Infrastructure Teams are collaborating to maximize synergies, limit duplication of effort, and ensure Roman’s pipelines, data products, and high-level catalogs meet the needs and expectations of the science community.

Get Ready for Roman

Given that the first Call for Proposals is only one year away, now is the time to learn about Roman and to engage with the Roman community. Science community members can anticipate multiple avenues over the coming year for learning about Roman’s data processing and products and the science platform. Highlights most relevant to preparing to use Roman data include:

Attend Workshops and Trainings

There will be multiple opportunities over the coming year to learn more about Roman’s scientific capabilities and prepare for the first Call for Proposals, with activities organized by STScI, IPAC, the Roman Project Science Office at NASA/GSFC, and funded science teams. The first set of opportunities will occur at the 245th American Astronomical Society meeting in January 2025, where STScI-hosted activities will notably include two in-person workshops: “Preparing for the Nancy Grace Roman Space Telescope: The New Cloud Science Platform” and “Python Data Analysis with the James Webb and Roman Space Telescopes.” The latter will include training in working with ASDF files. Future large meetings, including the June 2025 AAS meeting, will provide additional opportunities to prepare for Roman, as will science meetings with content relevant to Roman. These include the upcoming “Transients from Space Workshop,” being held March 11–13, 2025 at STScI, and the next Roman science conference, “Cosmic Cartography with Roman: Advances in Galaxy Structures, Distributions, Dark Matter, and Dark Energy,” being held July 14–18, 2025 at STScI. Virtual demonstrations and trainings will also be provided over the coming year.

Tune In

For the most regular updates on Roman, with approximately weekly updates targeted to scientists, head to the Roman Science Forum. STScI and IPAC both publish newsletters providing Roman updates and news with a focus on their areas of responsibility. To subscribe to STScI’s Roman newsletter, check “Roman Updates” in your myST account preferences (the preferred method), or send a blank email to roman_soc_news-subscribe-request@maillist.stsci.edu. See information on subscribing to the Science Support Center Roman mailing list. The Roman Project Science Office at Goddard Space Flight Center uses its Roman mailing list to announce major project updates, as well as a regular virtual Roman Community Forum meeting that provides updates on Roman mission status and plans; to subscribe, send an email to roman-news-join@lists.nasa.gov. Major announcements are made using all of the above communication methods.

Explore Roman Documentation and Simulations

Roman simulation tools and simulated data sets are available for download and can be used to explore the scientific capabilities of the WFI. Initial documentation on Roman tools, data processing, and products is already available at the Roman User Documentation System (RDox) website. RDox will continue to be updated and expanded as the development of Roman’s pipelines, simulation tools, and science platform continues.

Actively Contribute

Development is ongoing for Roman’s calibration plans, data processing pipelines, simulations and tools, and science platform. There are a number of WFI working groups (including but not limited to the calibration, simulations, software, and time domain working groups) dedicated to discussing related scientific issues and advising the Roman project and partners, including STScI and IPAC. More information on the WFI working groups can be found on the Roman Science Forum, along with a working group sign-up form. Trainings and workshops will also offer opportunities to give feedback on Roman software and tools. There is currently an opportunity to apply for funding for preparatory investigations via NASA ROSES, with notices of intent requested by January 17, 2025 and proposals due on March 6, 2025.

Request Help and Ask Questions

Both Science Centers have active Roman help desks. Questions related to the WFI design, the Roman archive, the WFI imaging mode and associated simulation tools, the science platform, or Roman’s ETC and APT are answered by STScI’s Roman Help Desk. Questions related to Roman’s Coronagraph instrument, the WFI spectroscopy mode, the Roman proposal process and grants, or Roman exoplanet science are answered by IPAC’s Roman Help Desk. If a request is made to one center that is better answered by the other, it will be transferred between the science centers.
