About this Article
L. Quick (quick[at]stsci.edu), P. Greenfield (perry[at]stsci.edu), and G. Snyder (gsnyder[at]stsci.edu)
Supporting science through quality engineering practices has long been a hallmark of STScI. This is evident through years of successful science, development, and support of Hubble, Webb, Roman, MAST, and other missions. As leaders in the astronomical community, we have a responsibility not only to keep pace with current practice, but also to push the boundaries of technology by developing new products and processes.
The Data Management Division (DMD) is an engineering and operations division at STScI, composed of talented engineers and scientists. DMD is responsible for the processing, storage, analysis, and distribution services in support of science data for the many missions whose data are stored in the Mikulski Archive for Space Telescopes (MAST). Our staff are improving the way science is performed by finding more efficient and creative ways to do it. Two notable areas of technical progress are science file generation, with the Advanced Scientific Data Format (ASDF), and Jupyter cloud development, with the Timeseries Integrated Knowledge Engine (TIKE). This article describes these two areas of product development.
The Advanced Scientific Data Format
We have developed a new data format, the Advanced Scientific Data Format (ASDF), that is being used for JWST data and will be used as the primary data format for the Nancy Grace Roman Space Telescope. JWST pipelines will work with either FITS or ASDF versions of the data. ASDF was developed to address the significant limitations that the FITS format was placing on HST pipelines. In particular, it has made it possible to address World Coordinate System (WCS) issues for raw data far more comprehensively than is possible with FITS. It retains the best aspects of FITS in that its metadata is transparent and readable, and it is designed to be suitable as an archival format. It also incorporates explicit hierarchical structure, removes limits on keyword name lengths and values, and enables storing data in either binary or text format. Content for specific data file standards, such as JWST products, can be prescribed by schemas that are used to validate the files. While developed for astronomical use, the basic format is generic enough for most science and engineering data.
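The idea of a hierarchical metadata tree checked against a schema can be sketched in a few lines of Python. This is an illustration only: ASDF itself stores its tree as YAML and uses schemas based on JSON Schema, whereas the `validate` function, the `PRODUCT_SCHEMA` layout, and the example keyword names below are hypothetical stand-ins written for this sketch and are not part of the ASDF library.

```python
# Illustrative only: a tiny, hypothetical schema checker for nested metadata.
# It is NOT the asdf library API; it only mimics the concept described above.

def validate(node, schema, path="root"):
    """Return a list of human-readable problems; an empty list means valid."""
    expected = schema.get("type")
    if expected is not None and not isinstance(node, expected):
        return [f"{path}: expected {expected.__name__}, got {type(node).__name__}"]

    problems = []
    if isinstance(node, dict):
        # Every required key must be present at this level of the tree.
        for key in schema.get("required", []):
            if key not in node:
                problems.append(f"{path}: missing required key '{key}'")
        # Recurse into any child that has its own sub-schema.
        for key, subschema in schema.get("properties", {}).items():
            if key in node:
                problems.extend(validate(node[key], subschema, f"{path}.{key}"))
    return problems


# Hypothetical schema for a made-up data product.
PRODUCT_SCHEMA = {
    "type": dict,
    "required": ["meta", "data"],
    "properties": {
        "meta": {
            "type": dict,
            "required": ["telescope", "instrument"],
            "properties": {
                "telescope": {"type": str},
                "instrument": {
                    "type": dict,
                    "required": ["name"],
                    "properties": {"name": {"type": str}},
                },
            },
        },
        "data": {"type": list},
    },
}

# Nested structure and a long keyword name, neither of which fits a flat
# FITS-style header.
good_tree = {
    "meta": {
        "telescope": "JWST",
        "instrument": {
            "name": "NIRCAM",
            "a_keyword_name_longer_than_eight_characters": 1.5,
        },
    },
    "data": [[0.0, 1.0], [2.0, 3.0]],
}
bad_tree = {"meta": {"telescope": 42}, "data": [[0.0]]}

good_problems = validate(good_tree, PRODUCT_SCHEMA)
bad_problems = validate(bad_tree, PRODUCT_SCHEMA)
```

Here `good_problems` is empty, while `bad_problems` reports both the missing `instrument` entry and the wrongly typed `telescope` value, each with the path to the offending node.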
The format allows representations of analytical and empirical functions to be stored, as well as complex expressions of such functions for multiple dimensions. This feature is the basis of the Generalized World Coordinate System (GWCS) used for JWST and Roman data files. In fact, all JWST FITS files store their most accurate WCS information as ASDF extensions (a good approximation to the GWCS is stored as FITS WCS information, for modes that FITS is able to handle). This capability allows most Astropy models to be saved without writing any special input/output code. The format provides indicators to libraries that parts of the content should be turned into special software objects (e.g., a GWCS object), as well as a means for different parts of the file to avoid copies by referring to a single instance of a data set. ASDF files may be streamed for reading and writing, and have features that make their use on cloud systems more efficient and practical. A mechanism is provided for developing local standards and library extensions beyond those provided in the base library, along with versioning for the standard, library, and extensions.
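To make the notion of storing an expression of functions as data more concrete, the sketch below describes a chain of simple transforms as plain data, round-trips it through text, and rebuilds a callable from it. The `build` function, the `kind` vocabulary, and the numerical coefficients are all hypothetical and chosen only for illustration; in practice this role is played by Astropy models serialized through ASDF, and JSON is used here merely as a convenient text encoding.

```python
import json

# Hypothetical mini-vocabulary of one-dimensional transforms. Real files use
# Astropy models serialized through ASDF, not this toy encoding.

def build(node):
    """Turn a plain-data description into a callable of one variable."""
    kind = node["kind"]
    if kind == "shift":
        offset = node["offset"]
        return lambda x: x + offset
    if kind == "scale":
        factor = node["factor"]
        return lambda x: x * factor
    if kind == "polynomial":
        coeffs = node["coefficients"]  # lowest order first
        return lambda x: sum(c * x**i for i, c in enumerate(coeffs))
    if kind == "compose":
        # Apply the child transforms one after another, in order.
        steps = [build(child) for child in node["steps"]]

        def chain(x):
            for step in steps:
                x = step(x)
            return x

        return chain
    raise ValueError(f"unknown transform kind: {kind!r}")


# An arbitrary pixel-to-world expression: shift, then scale, then polynomial.
pixel_to_world = {
    "kind": "compose",
    "steps": [
        {"kind": "shift", "offset": -1024.0},
        {"kind": "scale", "factor": 0.001},
        {"kind": "polynomial", "coefficients": [2.0, 0.5, 0.0]},
    ],
}

# Round-trip through text to show that the whole expression is just data.
stored_text = json.dumps(pixel_to_world)
transform = build(json.loads(stored_text))

at_reference_pixel = transform(1024.0)
at_offset_pixel = transform(2024.0)
```

Because the expression is stored rather than hard-coded, a reader needs no transform-specific input/output code: any description built from the known vocabulary can be reconstructed and evaluated.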
While the format is language-independent, the current implementation is in Python. ASDF is also being used by the Daniel K. Inouye Solar Telescope (DKIST), by the Vera C. Rubin Observatory (previously known as LSST) for WCS interchange, and by some groups outside of astronomy.
The Timeseries Integrated Knowledge Engine
In order to make new discoveries exploiting huge swaths of large surveys, astronomers must either move big datasets across the internet to their computers, or deploy remote computing that is located near the data. To enhance both modes of science analysis, the Institute hosts data from several NASA missions in the Amazon Web Services (AWS) Open Data Program, where files are free to access with no AWS account required.
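Access without an AWS account can be sketched with nothing but the Python standard library, on the assumption that objects in a public bucket can be fetched with an unsigned HTTPS request using Amazon S3's virtual-hosted URL form. The bucket name and object key below are placeholders rather than real MAST paths, and the helper names `s3_uri_to_https` and `read_first_bytes` are hypothetical.

```python
from urllib.parse import quote, urlparse
from urllib.request import Request, urlopen


def s3_uri_to_https(uri):
    """Map s3://bucket/key to the virtual-hosted HTTPS URL for that object."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    key = quote(parsed.path.lstrip("/"))  # percent-encode, keep the slashes
    return f"https://{parsed.netloc}.s3.amazonaws.com/{key}"


def read_first_bytes(uri, count=2880):
    """Fetch only the first `count` bytes of a public object, unsigned.

    The default of 2880 bytes is the size of one FITS block. This function
    needs network access and is not called in this sketch.
    """
    request = Request(
        s3_uri_to_https(uri),
        headers={"Range": f"bytes=0-{count - 1}"},
    )
    with urlopen(request) as response:
        return response.read()


# Placeholder URI: substitute a real bucket and key from the MAST documentation.
EXAMPLE_URI = "s3://example-bucket/path/to/file.fits"
example_url = s3_uri_to_https(EXAMPLE_URI)
```

The range request illustrates why proximity matters: a remote user pays internet latency for every such read, whereas a server in the same AWS region as the data can issue many small reads cheaply.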
With TIKE, we now offer a user-friendly, browser-based computing environment for exploiting these cloud-hosted datasets, for both beginners and experts. TIKE provides each user with a JupyterLab server running in the same region as the MAST data in AWS, which results in high throughput and removes the need to transfer files across the internet to their computers before engaging with them. To teach the community about this opportunity, TIKE includes introductory documentation, notebooks, and tutorials on how to access MAST data in AWS. With an initial focus on analysis of timeseries data from the Kepler, K2, and Transiting Exoplanet Survey Satellite (TESS) missions, TIKE provides a Python environment pre-configured with 20+ community software packages.
To get started, users can navigate to TIKE and log in with a MyST account. TIKE uses a prototype architecture for increasing the accessibility of data archived by MAST at STScI, offering a permanent, secure, general computing platform for the MAST user community. TIKE is a predecessor and pathfinder for a powerful science computing platform that will support users in their analysis of data from the upcoming Roman Space Telescope.