Enabling Fast Bayesian Exoplanet Atmospheric Retrievals Using Amazon Web Services

M. Bourque (bourque[at]stsci.edu) and I. Momcheva (imomcheva[at]stsci.edu)

In the 25 years since the discovery of the first exoplanet, astronomers have detected over 4,000 exoplanet candidates and confirmed over 2,000 of them. Statistical studies based on these discoveries suggest that planets in our galaxy are common and many are likely to orbit their host stars in their habitable zones—the temperate region around a star where the heat received from the host star is just right to allow for the presence of liquid water. Current astronomical facilities (including Hubble) allow us to probe the atmospheres of exoplanets. Future facilities, such as JWST, will give us such capabilities for many more planets in even more detail. By observing the intensity of light transmitted through or reflected off of the atmosphere of the planet, we can infer the gas composition and the physical conditions on these distant worlds.

Complex models are required to translate the observations into physical properties. Early observations were successfully interpreted using self-consistent models which include the full chemistry and dynamics of the atmosphere, but such models require very detailed inputs such as chemical reaction chains, heat transport, and cloud formation, which we do not yet understand well. Atmospheric retrieval reverses the problem by trying to "retrieve" the atmospheric properties of an exoplanet from the observed spectrum. These properties can then provide insights into the atmospheric physics, chemical processes, and formation histories. In combination with Bayesian statistical inference, atmospheric retrieval has become a powerful tool for parameter estimation and model comparison. Bayesian atmospheric retrievals will be critical to the robust determination of exoplanetary atmospheric properties in the JWST era and beyond.
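
To make the idea of a Bayesian retrieval concrete, the sketch below recovers a single parameter, a wavelength-independent transit depth, from a synthetic spectrum. It uses a random-walk Metropolis sampler written with the Python standard library. This is a toy illustration only: the flat-spectrum model, the uniform prior, the noise level, and all names (`make_synthetic_spectrum`, `log_posterior`, `metropolis`) are assumptions made for this example, not part of PLATON or ExoCTK. Real retrievals fit many coupled parameters against a full atmospheric forward model.

```python
import math
import random
import statistics

# Hypothetical toy problem: a flat transmission spectrum whose only unknown
# is the transit depth (in percent). All values below are assumptions.
TRUE_DEPTH = 2.0   # "true" depth used to generate the synthetic data
SIGMA = 0.1        # Gaussian uncertainty on each spectral bin
N_BINS = 20        # number of wavelength bins


def make_synthetic_spectrum(seed=0):
    """Generate noisy depths for a flat spectrum (synthetic, not real data)."""
    rng = random.Random(seed)
    return [TRUE_DEPTH + rng.gauss(0.0, SIGMA) for _ in range(N_BINS)]


def log_posterior(depth, observed):
    """Gaussian log-likelihood plus a uniform prior on 0 < depth < 5 percent."""
    if not 0.0 < depth < 5.0:
        return -math.inf
    return -0.5 * sum(((obs - depth) / SIGMA) ** 2 for obs in observed)


def metropolis(observed, n_steps=20000, step=0.05, seed=1):
    """Draw posterior samples of the depth with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    current = 2.5  # deliberately offset starting guess
    current_lp = log_posterior(current, observed)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, observed)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain[n_steps // 4:]  # discard the first quarter as burn-in


observed = make_synthetic_spectrum()
samples = metropolis(observed)
print(f"retrieved depth = {statistics.mean(samples):.3f} "
      f"+/- {statistics.stdev(samples):.3f} percent")
```

The spread of the samples is the retrieved uncertainty. Here it should be close to SIGMA divided by the square root of N_BINS, which is the analytic answer for this simple model.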

However, Bayesian atmospheric retrieval is computationally intensive and requires advanced programming skills, which makes the technique inaccessible to some researchers. As an example, a typical retrieval of a JWST time-series observation takes two days on an eight-core laptop. Science servers with additional cores can speed up the process, but purchasing, setting up, maintaining, and upgrading such machines can be prohibitively expensive. This makes Bayesian atmospheric retrieval a perfect use case for commercial cloud-computing services.

Cloud-computing services provide on-demand computational resources without the up-front investment and maintenance needed for physical servers or clusters. These resources are secure, reliable, scalable, and cost-efficient. Cloud-computing resources are an allowed grant expense for HST and JWST, and many researchers are using these platforms to speed up their research.

As part of the Exoplanet Characterization Toolkit (ExoCTK) developed at STScI, we are addressing this challenge by providing a module that performs atmospheric retrievals on GPU-enabled Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances. AWS provides a means to outsource this computational effort to machines in the cloud at low cost. With AWS, users can create a virtual machine (VM), perform atmospheric retrievals, and have the machine automatically shut down upon completion. Depending on the type of VM, typical costs range from ~$0.02/hour (e.g., a small, single-CPU Linux machine) to ~$3.00/hour (e.g., a heftier, multiple-CPU, GPU-enabled Linux machine). Tests have shown that fitting a JWST time series with GPU-enabled machines can take as little as four hours. Even at the high end of AWS pricing, cloud computing can be significantly faster and cheaper than buying and setting up local resources.
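
As a rough illustration of the cost arithmetic, the sketch below multiplies an hourly rate by a runtime. It uses the approximate figures quoted above (~$3.00/hour for a GPU-enabled instance and a four-hour retrieval). It assumes simple linear billing and ignores storage, data transfer, and any minimum charges. The function name `retrieval_cost` is hypothetical and not part of ExoCTK.

```python
def retrieval_cost(hourly_rate_usd, runtime_hours):
    """Return the approximate on-demand cost in USD, assuming linear billing."""
    if hourly_rate_usd < 0 or runtime_hours < 0:
        raise ValueError("rate and runtime must be non-negative")
    return hourly_rate_usd * runtime_hours


# Approximate values taken from the text; real prices vary by instance and region.
GPU_RATE_USD_PER_HOUR = 3.00
GPU_RUNTIME_HOURS = 4.0

gpu_cost = retrieval_cost(GPU_RATE_USD_PER_HOUR, GPU_RUNTIME_HOURS)
print(f"Four-hour GPU retrieval: about ${gpu_cost:.2f}")
```

Under these assumptions, a four-hour retrieval on the most expensive instance type mentioned costs about $12.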

ExoCTK is a Python-based, open-source, modular software library that provides a cohesive collection of tools for observation planning and the characterization of exoplanets. The software is free, functional, well-documented, tested, and supported by STScI. A companion web application provides tools for observation planning, forward modeling, data reduction, limb darkening, light curve fitting, and retrievals. The atmospheric_retrievals subpackage of ExoCTK contains methods and tools for performing retrievals with PLATON (PLanetary Atmospheric Transmission for Observer Noobs; Zhang, M., et al. 2018). We have added a new method to the atmospheric_retrievals subpackage that allows users to authenticate to Amazon Web Services and use its cloud resources. A fully worked example using WASP-19b is provided in the GitHub repository as a Jupyter Notebook that guides users through setup and installation. The outputs of the atmospheric retrieval are a corner plot illustrating the results of the fit, a log file describing the execution of the software, and a results file containing the fit parameters (Fig. 1).

Figure 1: Outputs from the atmospheric retrievals module of ExoCTK. Left: A corner plot describes the quality of the best-fit result. Top right: A log file captures the information about the execution of the code, including software, environment, start-and-stop time, retrieval information, results, and total computational time. Bottom right: The data file containing the best-fit results of the retrieval.

When using AWS, we find that the improvement in computation time is substantial and comes at minimal cost. We conducted trial runs of the atmospheric retrieval software with transmission spectra from four exoplanets, using two different AWS EC2 instances (one CPU-only and one GPU-enabled) and the two retrieval algorithms supported by PLATON: Multinested Sampling ("multinest") and Markov Chain Monte Carlo ("emcee"). Table 1 provides the specifications of these instances, and the average results from five trial runs are given in Figure 2. These results show that the GPU-enabled instance yields retrieval times 15% to 60% faster than its CPU counterpart across each algorithm and exoplanet. Figure 3 displays the cost of operating various AWS EC2 instances as a function of time; we note that the longest retrieval times in our study (CPU: multinest, hd209458b, ~37 minutes; and GPU: emcee, hat-p-12b, ~28 minutes) correspond to total costs of ~$0.08 and ~$3.15, respectively.
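
For clarity, the sketch below shows one way to compute a "percent faster" figure, assuming it means the reduction in wall-clock time relative to the CPU run. The timings are hypothetical placeholders chosen to reproduce the two ends of the quoted range. They are not measurements from this study, and the function name `percent_faster` is an assumption.

```python
def percent_faster(cpu_minutes, gpu_minutes):
    """Reduction in runtime relative to the CPU run, expressed in percent."""
    if cpu_minutes <= 0:
        raise ValueError("cpu_minutes must be positive")
    return 100.0 * (cpu_minutes - gpu_minutes) / cpu_minutes


# Hypothetical (CPU, GPU) runtimes in minutes, for illustration only.
hypothetical_runs = {
    "example_planet_a": (40.0, 34.0),
    "example_planet_b": (40.0, 16.0),
}

speedups = {name: percent_faster(cpu, gpu)
            for name, (cpu, gpu) in hypothetical_runs.items()}
for name, value in speedups.items():
    print(f"{name}: GPU run is {value:.0f}% faster")
```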

While PLATON is satisfactory for this proof-of-concept exploration, there are other popular atmospheric retrieval codes within the exoplanet community that we aim to support in the future. Perhaps the most notable of these is the Caltech Inverse Modeling and Retrieval Algorithms (CHIMERA). Compared to PLATON, which approximates atmospheres with an isothermal model, CHIMERA uses a more complete and more involved atmospheric model and is therefore more computationally expensive. We expect retrievals such as those in this study to take several times longer with CHIMERA, making cloud-computing services even more essential.

Table 1: The specifications of two AWS EC2 instances used to conduct trial runs of atmospheric retrieval algorithms.
Figure 2: The average retrieval times for five trial runs of PLATON's Multinested Sampling (multinest) and Markov Chain Monte Carlo (emcee) algorithms, conducted on each of the AWS EC2 instances described in Table 1. Trial runs were conducted for four separate exoplanets.
Figure 3: The cost of various AWS EC2 instances as a function of time.


Moving forward, in addition to supporting CHIMERA, we aim to add Python methods for storing, visualizing, and interacting with results, and to enable users to perform such calculations directly through the ExoCTK web application. In the meantime, we hope you will try out the ExoCTK library for your research. We are open to contributions and happy to respond to questions.

Acknowledgement: This project was supported by the Data Science Innovation Initiative (DSII), a program at STScI to encourage innovation and develop expertise in data science.