Making Mission Data Available in the Cloud

STScI Newsletter

2019 / Volume 36 / Issue 03

A. Smith (arfon[at]stsci.edu), I. Momcheva, and J. Peek

Over the past 18 months, the Data Management Division (the home for the engineering and science teams at MAST), together with the Data Science Mission Office, has been working to make mission data accessible to the astronomical community in the cloud.

As part of the AWS Public Dataset Program, data from Hubble (ACS, COS, STIS, WFC3, & WFPC2), TESS (calibrated and uncalibrated full frame images, two-minute cadence target pixel and light curve files), and Kepler (light curves, target pixel files and full frame images) are now available next to the vast cloud computing resources of Amazon Web Services.

Foundational steps in enabling data science with astronomical data

Many of the most promising data science techniques, such as deep learning, require simultaneous access to large quantities of data, large amounts of compute (both CPU and GPU), and programmatic interfaces (APIs) for interacting with the data. Our work over the past 18 months has addressed all three of these points:

Make the data 'highly available' and place it next to substantial computational resources: By staging public data for Hubble, TESS, and Kepler in the cloud, we're making it possible for anyone in the community to access hundreds of terabytes of mission data in a high-performance computational environment. Previously, astronomers wishing to analyze large volumes of mission data had to download the data and place it somewhere with sufficient storage and compute; now any astronomer can rent a supercomputer by the hour (https://medium.com/descarteslabs-team/thunder-from-the-cloud-40-000-cores-running-in-concert-on-aws-bf1610679978) for their science.

Make sure there's a good programmatic interface for accessing the data: Bulk access to data is only useful if it's possible to script your analyses. The team at MAST has developed an open source module, part of the Astroquery project, that is 'cloud aware' and can serve data from the cloud on demand (https://astroquery.readthedocs.io/en/latest/mast/mast.html).

Make it possible for people to secure resources to work with the data: While cloud computing makes it possible to access a variety of computational resources in a flexible, pay-as-you-go model, these services cost real money. That's why, as of Cycle 26, there has been a new category of HST archival proposal, "Legacy Archival Cloud Computation Studies," designed to support astronomers wishing to make use of the Hubble data in the cloud.
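
To make the second point above more concrete, here is a minimal sketch of how the 'cloud aware' Astroquery interface might be used to find where science files live in the cloud. It is an illustration rather than the MAST team's own recipe. The helper name science_cloud_uris is hypothetical, and the object passed in is assumed to behave like astroquery.mast.Observations, whose enable_cloud_dataset, query_criteria, get_product_list, filter_products, and get_cloud_uris methods are the only calls used.

```python
from typing import Any, List


def science_cloud_uris(obs_client: Any, **criteria: Any) -> List[str]:
    """Return cloud URIs for science products matching the search criteria.

    `science_cloud_uris` is a hypothetical helper written for this sketch.
    `obs_client` is assumed to behave like astroquery.mast.Observations.
    """
    # Ask the client to resolve data products to their AWS-hosted copies.
    obs_client.enable_cloud_dataset(provider="AWS")

    # Search the archive for observations matching the given criteria.
    observations = obs_client.query_criteria(**criteria)

    # List the data products for those observations and keep science files.
    products = obs_client.get_product_list(observations)
    science = obs_client.filter_products(products, productType="SCIENCE")

    # Look up the cloud location of each product. Entries may be None when
    # a product has no cloud copy, so those are dropped here.
    uris = obs_client.get_cloud_uris(science)
    return [uri for uri in uris if uri is not None]
```

With Astroquery installed and a network connection, the helper could be called as science_cloud_uris(Observations, obs_collection="HST", instrument_name="WFC3/IR", target_name="M51") after running "from astroquery.mast import Observations". The search criteria shown are only an example. Passing the client in as an argument keeps the sketch free of network access until it is actually called.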

Additional resources

More detailed technical blog post showing how to work with Hubble data: https://mast-labs.stsci.io/2018/06/hst-public-data-on-aws
Using AWS Lambda for massively parallel analysis of data in the cloud: https://mast-labs.stsci.io/2018/06/exploring-aws-lambda-with-hst-public-data
Example Jupyter notebooks showing how to use the astroquery.mast module: https://github.com/spacetelescope/notebooks
Amazon Web Services Public Dataset Program: https://aws.amazon.com/opendata/public-datasets/

Wrapping up

Whether you’re looking to process large volumes of mission data or to train a deep learning algorithm to analyze Hubble images or hunt for exoplanets in TESS and Kepler data, we think that making these datasets available in the cloud is a first step in facilitating new, more sophisticated analyses of archival data.

If you'd like to provide us any feedback on this new initiative, please email archive@stsci.edu; we'd love to hear from you!
