
The Roman Science Operations Center Data Management System

STScI Newsletter
2022 / Volume 39 / Issue 01

About this Article

H. Ferguson (ferguson[at]stsci.edu), M. Sosey (sosey[at]stsci.edu), G. Snyder (gsnyder[at]stsci.edu), and E. Kolatch (ekolatch[at]stsci.edu)

Roman's Data

The Nancy Grace Roman Space Telescope's Science Operations Center (SOC) at the Space Telescope Science Institute (STScI) has four main responsibilities: performing many of the data-processing steps for the Wide Field Instrument (WFI); providing a platform on which astronomers can explore the data and carry out basic analysis; maintaining the mission data archive (for the WFI and coronagraph instruments); and serving all the mission data through the Mikulski Archive for Space Telescopes (MAST). The Data Management System (DMS) leverages many aspects of the James Webb Space Telescope (JWST) DMS and its components, with Roman-specific modifications where the mission requires them. From the Roman user's perspective, the following are some of the highlights:

  • Roman science data will be publicly available from MAST with no proprietary period.
  • Data will be processed as they arrive, with calibrated individual-exposure data products available within a few days of the observations.
  • There will be periodic releases of uniformly processed and co-added survey data. Each release is nominally planned for six months after the relevant data are obtained. More detailed plans will be published once the survey strategies are better defined.
  • The SOC will produce source catalogs that will be useful for a wide array of science. These will have astrometry consistent with Gaia, will include multi-band PSF-matched photometry and photometric redshifts, and will have systematic uncertainties and completeness quantified by source-injection simulations.
  • The Romancal pipeline software for the WFI is open source and hosted on GitHub.
  • The Roman Science Platform (RSP) will provide cloud-computing resources, with the pipeline and common software tools pre-installed for analysis and custom processing of the Roman data. This enables high-bandwidth access to the processed data and data products (Levels 2, 3, and 4; see Table 1).

Roman WFI Data Flow

Responsibility for processing the Roman WFI data is shared between the SOC and the Science Support Center (SSC) at the Infrared Processing and Analysis Center (IPAC) in Pasadena, California, as illustrated in Figure 1. The SOC is responsible for processing all WFI imaging and spectroscopy data to remove instrumental signatures. Additionally, the SOC will create deeper mosaics of dithered datasets along with source catalogs for all imaging exposures. The SSC is responsible for higher-level processing of all spectroscopic data, as well as survey-specific processing of Galactic bulge time-domain data. The SOC Archive holdings are split between on-premises storage at the Institute and cloud storage provided by Amazon Web Services (AWS). The interfaces to the data will make it unnecessary for users to know the storage locations. Users will be able to view and perform customized processing of WFI data with low network latency using the Roman Science Platform hosted on AWS.

This graphic shows the complex activity of data processing for the Roman Wide Field Instrument (WFI).  Downlinked science data is transmitted to the STScI Science Operations Center Data Management System, represented by a large, blue circle. Low-level processed data is sent to the IPAC Science Support Center, represented by a smaller orange circle.  Here it is turned into science-ready data and returned to the STScI Science Operations Center Data Management System.  This data management system is responsible for WFI exposure-level processing, WFI high-level processing, data analysis, and the archive.  Data and software flow to and from the STScI Science Operations Center Data Management System and the science community in the areas of data analysis and the archive.
Figure 1: The distributed Roman WFI data processing.

Levels of Data

The early steps of the science calibration pipeline remove instrumental signatures and produce calibrated images for both imaging and spectroscopic exposures. The next step of processing combines scientifically relevant exposures of the same field to produce mosaics on a common grid (the Level 3 products). The pipeline will also construct source catalogs from objects detected in those Level 3 products. The defined levels of data products are outlined in the table below. All Roman science data products will use the Advanced Scientific Data Format (ASDF), already in use for JWST.

Table 1: Description of the data levels used for Roman WFI data
| Data Level | Description | Comment |
|---|---|---|
| 0 | Science telemetry data | Raw data in original telemetry formats. |
| 1 | Reformatted individual exposures | Pixel information formatted into the shape of the detector, accompanied by relevant metadata. |
| 2 | Calibrated individual exposures | Corrected for instrument artifacts, with slope fitting, outlier rejection, astrometric registration, etc. |
| 3 | Resampled to a regular grid and combined | Geometric distortions are removed. Relevant images taken in the same filter are co-added. |
| 4 | Derived data | Source catalogs and associated ancillary information. |
| 5 | Community-contributed products | Available through MAST. |
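
For readers who script against these levels, Table 1 can be captured in a few lines of Python. This is only an illustrative sketch: the names `DataLevel`, `PLATFORM_LEVELS`, and `on_science_platform` are hypothetical and are not part of Romancal or any MAST interface. The set of platform levels follows the statement above that the Roman Science Platform gives high-bandwidth access to Levels 2, 3, and 4.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Roman WFI data levels from Table 1 (member names are illustrative)."""

    TELEMETRY = 0    # science telemetry data in original formats
    REFORMATTED = 1  # reformatted individual exposures
    CALIBRATED = 2   # calibrated individual exposures
    RESAMPLED = 3    # resampled to a regular grid and combined
    DERIVED = 4      # source catalogs and ancillary information
    COMMUNITY = 5    # community-contributed products


# Levels the article describes as available at high bandwidth on the platform.
PLATFORM_LEVELS = frozenset(
    {DataLevel.CALIBRATED, DataLevel.RESAMPLED, DataLevel.DERIVED}
)


def on_science_platform(level: DataLevel) -> bool:
    """Return True if the level is one the article lists for the platform."""
    return level in PLATFORM_LEVELS
```

Because `IntEnum` members compare as integers, a numeric level read from metadata can be converted with `DataLevel(3)` and ordered against other levels directly.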

 

Catalogs and Tools

The Roman survey data will be much more homogeneous than data from Hubble and Webb. It is therefore practical and scientifically efficient for the SOC to produce a set of source catalogs. These are aimed at providing high-quality positions and photometry to serve a broad range of science objectives. Catalogs of the "static sky" will include PSF-matched photometry and photometric redshifts, as well as shape information on the detected sources. Time-domain studies will be facilitated by a difference-imaging pipeline and by a forced-photometry catalog, in which photometry is performed at specified locations. There will be opportunities for community input on the algorithms and measurements for these catalogs as the details are further refined.
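
To make the idea of forced photometry concrete, the sketch below measures flux at positions supplied by the caller instead of at positions found by source detection. It uses a plain circular-aperture sum on a nested-list image, which is an assumption chosen for brevity. The algorithms for the SOC catalogs have not been finalized, and the function names, the constant `background` argument, and the pixel-center inclusion rule here are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # image[y][x], row-major pixel values


def forced_aperture_flux(
    image: Image, x0: float, y0: float, radius: float, background: float = 0.0
) -> float:
    """Sum background-subtracted pixels whose centers lie within `radius` of (x0, y0)."""
    total = 0.0
    # Clip the bounding box of the aperture to the image edges.
    y_lo = max(0, math.floor(y0 - radius))
    y_hi = min(len(image) - 1, math.ceil(y0 + radius))
    for y in range(y_lo, y_hi + 1):
        row = image[y]
        x_lo = max(0, math.floor(x0 - radius))
        x_hi = min(len(row) - 1, math.ceil(x0 + radius))
        for x in range(x_lo, x_hi + 1):
            if (x - x0) ** 2 + (y - y0) ** 2 <= radius ** 2:
                total += row[x] - background
    return total


def forced_photometry(
    image: Image,
    positions: Sequence[Tuple[float, float]],
    radius: float,
    background: float = 0.0,
) -> List[float]:
    """Measure flux at each specified (x, y) position, detected or not."""
    return [forced_aperture_flux(image, x, y, radius, background) for x, y in positions]
```

The key property is that the positions are inputs. A location where a source has faded below the detection threshold still gets a measurement, which is what makes the approach useful for time-domain work.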

The SOC is also responsible for constructing ancillary science tools. These include the following:

  • A tool for producing simulated Level 1 data. This will include realistic noise and some of the most important instrument signatures.
  • A tool for producing simulated Level 2 and Level 3 data. This will include realistic PSFs, geometry and noise, but without the other instrument signatures. Among the uses will be source injection for quantifying uncertainties and completeness of the source catalogs.
  • A curated, time-dependent library of high-quality empirical point-spread functions (PSFs).
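
The source-injection use mentioned in the list above comes down to a simple ratio: inject simulated sources of known brightness, run detection, and record the fraction recovered in each brightness bin. The following is a minimal sketch of that bookkeeping only. The function name, the `(magnitude, recovered)` input format, and the one-magnitude default bin width are assumptions for illustration, not the SOC's design.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def completeness_by_magnitude(
    injected: Iterable[Tuple[float, bool]], bin_width: float = 1.0
) -> Dict[float, float]:
    """Return the recovered fraction per magnitude bin, keyed by lower bin edge.

    `injected` yields (magnitude, recovered) pairs, one per injected source.
    """
    n_injected: Dict[float, int] = defaultdict(int)
    n_recovered: Dict[float, int] = defaultdict(int)
    for magnitude, recovered in injected:
        edge = math.floor(magnitude / bin_width) * bin_width
        n_injected[edge] += 1
        if recovered:
            n_recovered[edge] += 1
    # Only bins that received at least one injected source appear in the result.
    return {edge: n_recovered[edge] / n_injected[edge] for edge in sorted(n_injected)}
```

In practice the recovered flag would come from matching the output catalog against the injected positions, and the same injected sources could be used to compare measured and true fluxes when estimating systematic uncertainties.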

The Roman Science Platform

The enormous data volume necessitates different ways of thinking about data analysis and availability. A typical astronomer's desktop computer will not have the capacity to store and analyze an entire Roman core community survey. The data volumes involved also exceed the capacity of most departmental servers for storage and computing. In this computing regime, it makes more sense to move the computing to the data rather than the data to the computing.

To accomplish this, the Roman Science Platform will be deployed and maintained in the commercial cloud at AWS, as hosted by the Institute. The data will be available at high bandwidth from AWS S3 storage. The amount of computation is easily scalable, so it can grow and shrink in response to community needs and available funding.

The RSP will use JupyterLab and JupyterHub to provide a familiar collaborative computing environment for the science community. These tools are already being implemented and explored with Webb science data products and the Time Series Integrated Knowledge Engine (TIKE) that is employed by MAST. The full suite of Roman science calibration pipeline and analysis tools will be installed and maintained on the platform. Users will also be able to install their own RSP-compatible software and run it from a Unix command line, provided through the terminal that is available with JupyterLab, or from Jupyter notebooks.

Several different tiers of platform access are anticipated. Modest resources for data exploration will be available to anyone with a MAST account. Augmented CPU and storage allocations will be available for those with approved NASA Roman-related grants, and there will also be a process for those without NASA grants to request augmented resource allocations.

How to Get Involved

There is still time to influence critical decisions while we are developing the Roman DMS. The details of the higher-level science data processing (e.g., the source catalogs) and the ancillary software tools have not been finalized. For the past several years there has been a regular meeting of the Roman Science Investigation Teams and SOC personnel to discuss software, algorithms, and data processing. Scientists interested in helping to shape the Roman Data Management System as part of a joint Working Group between the Roman Science Centers and members of the broad astronomical community are encouraged to contact ferguson@stsci.edu.
