State Parametrization in Machine Learning for Atmospheric Convection

1. Introduction

Global climate models (GCMs) face a significant challenge in explicitly resolving convective-scale motions while conducting long-term climate simulations (IPCC 2014). To accommodate grid spacings of 25 km or larger, crucial physical processes occurring at smaller spatial scales, such as moist atmospheric convection, must be approximated. This approximation, known as subgrid-scale parameterization, is a major source of uncertainty in projections of future climate change magnitude and spatial distribution (Schneider et al. 2017).

Machine learning (ML) has emerged as a viable alternative to traditional parameterization, driven by advancements in computing power and data availability. From an ML perspective, parameterization can be viewed as a regression problem. It involves mapping a set of inputs—atmospheric profiles of humidity and temperature—to outputs—profiles of subgrid heating and moistening. Pioneering work by Krasnopolsky et al. (2005) and Chevallier et al. (1998) demonstrated the potential of training emulators for atmospheric radiation parameterizations. O’Gorman and Dwyer (2018) successfully trained a random forest (RF) to emulate the convection scheme of an atmospheric GCM, reproducing its equilibrium climate. More recently, neural networks (NNs) have been employed to predict total heating and moistening using datasets from the Superparameterized Community Atmosphere Model (SPCAM) (Rasp et al. 2018; Gentine et al. 2018) and a near-global cloud-resolving model (GCRM) (Brenowitz and Bretherton 2018, 2019). While most ML parameterizations are deterministic, potentially oversimplifying the representation of physical processes (Palmer 2001), stochastic extensions have been proposed (Krasnopolsky et al. 2013).

Random forests exhibit robustness to coupling as their output spaces are inherently bounded (O’Gorman and Dwyer 2018). In contrast, neural networks often demonstrate numerical instability when coupled with atmospheric fluid mechanics. Brenowitz and Bretherton (2019) addressed instability in coarse-grained GCRM data by removing upper-atmospheric humidity and temperature, which are influenced by lower-level convective processes, from the input space. Rasp et al. (2018) also encountered instability, resolving it through deeper architectures and extensive hyperparameter tuning. However, instability resurfaced when increasing the number of embedded cloud-resolving columns within SPCAM, even without significant retuning.

These sensitivities suggest a link between numerical instability and noncausal correlations in input data or suboptimal network architecture and hyperparameter choices. Brenowitz and Bretherton (2018) posit that NNs might detect correlations between upper-atmospheric humidity and precipitation and use them causally (humidity affecting precipitation), when the true causality is reversed. While the SPCAM instabilities’ origins are not fully understood and appear less sensitive to causal ambiguity, hyperparameter tuning sensitivities are suggestive. Regardless of the source, the potential for unbounded heating and moistening predictions from NNs outside the training data envelope poses a significant numerical stability challenge. This motivates the question: Can we predict the stability of NN convection parameterizations before GCM coupling?

Predicting NN behavior is intertwined with the challenge of interpreting NN emulators of physical processes. Various interpretability techniques exist for NNs, such as permutation importance and layer-wise relevance propagation (McGovern et al. 2019; Toms et al. 2019; Montavon et al. 2018; Samek et al. 2017; Molnar et al. 2018). However, adapting these techniques is crucial for interpreting NN convection parameterizations. This leads to a second question: How can we tailor ML interpretability techniques like partial-dependence plots and saliency maps for NN convection parameterization interpretation?

In atmospheric sciences, analyzing convective dynamics often involves examining the linearized response of parameterized or explicitly resolved convection to equilibrium perturbations (Beucler et al. 2018; Kuang 2018, 2010; Herman and Kuang 2013). These linearized response functions (LRFs) are typically derived by perturbing inputs and observing outputs (Beucler 2019) or by perturbing forcing and inverting operators (Kuang 2010). With finite-dimensional input/output bases, LRFs can be represented as matrices and directly computed from data through linear regression or automatic differentiation of nonlinear regression models (Brenowitz and Bretherton 2019).

While visualizing LRFs as matrices provides sensitivity insights, it doesn’t fully predict coupling consequences with atmospheric fluid mechanics (GCM dynamical core). Kuang (2010) addresses this by coupling CRM-derived LRFs with linearized gravity wave dynamics, developing a vertically truncated ordinary differential equation model (Kuang 2018) and discovering convectively coupled wave modes distinct from LRF eigenmodes. This 2D linearized framework is used for tropical plane wave instability studies (Hayashi 1971; Majda and Shefter 2001; Khouider and Majda 2006; Kuang 2008), typically analyzing vertically truncated equations for two to three vertical modes. Coupling the full LRF generalizes this to a fully resolved vertical structure basis. Wave spectra derived from machine-learned parameterization LRFs can indicate coupled simulation stability at reduced computational cost.

LRFs offer a comprehensive sensitivity view but can be difficult to interpret due to high-dimensional input and output spaces. However, moist atmospheric convection, the dominant parameterized process, is known to be sensitive to midtropospheric moisture and lower-tropospheric stability (LTS). Convection intensity increases exponentially with midtropospheric moisture (Bretherton et al. 2004; Rushley et al. 2018), possibly due to buoyancy control by environmental moisture (Ahmed and Neelin 2018). Sufficiently large LTS inhibits convection, favoring stratocumulus cloud layers (Wood and Bretherton 2006). Lower stability is expected to increase convection depth, even for shallow, turbulent clouds unresolved in SPCAM or GCRM training datasets.

This study investigates NN parameterizations using interpretability techniques to: 1) build confidence in the realistic behavior of ML parameterizations of moist convection and 2) establish a diagnostic framework for predicting NN-induced numerical instability. Two sets of NN parameterizations, trained using GCRM (Brenowitz and Bretherton 2019) and SPCAM (Rasp et al. 2018), are analyzed, comparing their sensitivities and the SPCAM and GCRM models they emulate.

The paper is structured as follows: Section 2 introduces GCRM and SPCAM training datasets and summarizes the similar ML parameterizations. Section 3 presents the LTS-moisture sensitivity framework (Section 3a) and wave-coupling methodology (Section 3c) for assessing NN parameterization stability, along with stabilization techniques (Section 4). Section 5 presents results, comparing LTS and moisture control on NN parameterizations (Section 5a) and applying the wave-coupling framework to predict coupled instabilities (Sections 5b and 5c). Section 6 concludes the paper.

2. Machine-Learning Parameterizations

a. Global Cloud-Resolving Model

Brenowitz and Bretherton (2018, 2019) trained their NNs using a near-global aquaplanet simulation from the System for Atmospheric Modeling (SAM), version 6.10 (Khairoutdinov and Randall 2003). The simulation was conducted in a channel configuration (46°S to 46°N) with a 4 km horizontal grid spacing and 34 vertical levels over a zonally symmetric ocean surface, with sea surface temperature ranging from 300.15 K at the equator to 278.15 K at the poleward boundaries. The GCRM training data comprised 80 days of instantaneous three-dimensional fields sampled every 3 hours.

The NN scheme parameterizes apparent heating and moistening over 160 km grid boxes, a 40-fold downsampling from the original 4 km data. This resolution retains significant precipitation autocorrelation at a 3-hour lag, capturing relevant moist dynamics. Apparent heating Q1 and moistening Q2 are defined using SAM’s prognostic variables: total nonprecipitating water mixing ratio qT (kg kg−1) and liquid–ice static energy sL (J kg−1). On the coarse grid, these variables are advected by the large-scale flow and forced by Q1 (W kg−1) and Q2 (kg kg−1 s−1), described by:

∂s̄L/∂t = (∂s̄L/∂t)GCM + Q1, (1)

∂q̄T/∂t = (∂q̄T/∂t)GCM + Q2, (2)

where the overbar denotes a coarse-grid average.

Unlike SPCAM, the GCRM apparent sources are budget residuals. Storage terms (left-hand sides) are estimated by finite differences of the 3-hourly data. The coarse-resolution GCM tendencies (∂s̄L/∂t)GCM and (∂q̄T/∂t)GCM are estimated by initializing the coarse-resolution SAM (cSAM) at 160 km grid spacing, running it forward for ten 120-second time steps without parameterized physics, and computing time derivatives using finite differences. cSAM runs at 160 km, the same resolution as the coarse-graining length scale. Further details on this workflow are in Brenowitz and Bretherton (2019).
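The residual calculation above can be sketched in a few lines. This is a minimal NumPy illustration of the budget-residual idea, not the authors’ workflow; the function name, array shapes, and time steps are hypothetical.

```python
import numpy as np

def apparent_source(var_obs, var_forecast, dt_obs, dt_forecast):
    """Budget-residual estimate of an apparent source such as Q1 or Q2.

    var_obs:      coarse-grained fields at successive observation times,
                  shape (n_times, n_levels).
    var_forecast: the coarse model's short dynamics-only forecast started
                  from each observed state, valid dt_forecast later,
                  shape (n_times - 1, n_levels).
    The storage term (left-hand side of Eqs. 1-2) minus the dynamics-only
    tendency leaves the apparent source as a residual.
    """
    storage = (var_obs[1:] - var_obs[:-1]) / dt_obs
    gcm_tendency = (var_forecast - var_obs[:-1]) / dt_forecast
    return storage - gcm_tendency
```

If the coarse dynamics alone explain the evolution, the residual vanishes; convection and other unresolved physics appear as a nonzero residual.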

b. Superparameterized Climate Model

Complementing the SAM global cloud-resolving model-trained NN parameterization, NNs trained on SPCAM, version 3.0 (Khairoutdinov and Randall 2001; Khairoutdinov et al. 2005), are analyzed. SPCAM embeds eight SAM columns (4 km × 20 s spatiotemporal resolution) within each Community Atmosphere Model (CAM) grid column (2° × 30 min spatiotemporal resolution), replacing CAM’s deep convection and boundary layer parameterizations to enhance convection, turbulence, and microphysics representation. SPCAM offers a compromise between computationally expensive global SAM and overly coarse CAM, which struggles with convection representation (Oueslati and Bellon 2015). The NN parameterization (Rasp et al. 2018; Gentine et al. 2018) emulates the embedded SAM models’ vertical redistribution of temperature (approximately Q1) and water vapor (approximately Q2) in response to coarse-grained conditions from the host model’s dynamical predictions (temperature profile, water vapor profile, meridional velocity profile, surface pressure, insolation, surface sensible heat flux, surface latent heat flux; prior to convective adjustment). The NN also incorporates cloud-radiative feedback by predicting the total diabatic tendency (convective plus radiative).

Key differences from the SAM NN parameterization (Section 2a) include training on higher-frequency data (30 min vs. 3 hourly) in a different form that unambiguously separates gridscale drivers from subgrid-scale responses, which simplifies the definition of the NN parameterization by avoiding the challenges of coarse-graining. SPCAM’s lower computational cost permits longer training datasets (2 years vs. 80 days for SAM), and its spherical dynamics enable fully global (including extratropical) effects. The NN was trained on the first year of data (~140 million samples) and validated on the second.
SPCAM-based NNs’ main disadvantage is the artificial scale separation inherent in superparameterization, inhibiting some dynamics modes, and embedded CRMs’ idealizations (2D dynamics, limited array extent) compromise emergent dynamics aspects (Pritchard et al. 2014). Section 2 of Rasp (2019) provides a detailed comparison of various ML convection parameterizations.

c. Neural Network Parameterization

Both SPCAM and GCRM parameterizations share a similar form. A neural network predicts Q1 and Q2 as functions of the thermodynamic state within the same atmospheric column. The functional form is:

Q = f(x, y; φ), (3)

where Q = [Q1(z1), …, Q1(zn), Q2(z1), …, Q2(zn)]T represents heating and moistening for a column; x is the concatenated vector of thermodynamic profiles—qT and sL for GCRM, humidity and temperature for SPCAM; y are auxiliary variables such as sea surface temperature and top-of-atmosphere insolation for GCRM, or surface enthalpy fluxes for SPCAM. The ML scheme does not predict sources for these auxiliary variables.

Rasp et al. (2018) and Brenowitz and Bretherton (2019) use simple multilayer feed-forward NNs for f. While hyperparameters and structures differ slightly (layers, activation functions), the key aspect is the NNs’ almost-everywhere differentiability, advantageous over tree-based models. NNs achieve this by composing affine transformations with nonlinear activations. A layer transforms hidden values xn into the next layer’s activations:

xn+1 = σ(An xn + bn), (4)

where An is a weight matrix, bn is a bias vector, and σ is a nonlinear activation function applied element-wise. Inputs feed into the first layer (x0 = x), and outputs are read from the final layer (xm = Q). The NN parameters φ = {A1, …, Am, b1, …, bm} are tuned by minimizing a cost function J(φ), typically a weighted mean-squared error, using stochastic gradient descent. Libraries like TensorFlow (Abadi et al. 2015) and PyTorch (Paszke et al. 2019) enable automatic derivative computation for training. This capability is used to explore the NN parameterization’s linearized sensitivity to inputs across base states.
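Equation (4) can be made concrete with a short sketch. The layer sizes, random weights, and final linear readout below are illustrative assumptions, not the architectures of Rasp et al. (2018) or Brenowitz and Bretherton (2019):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, params, activation=relu):
    """Evaluate Eq. (4) layer by layer: x_{n+1} = sigma(A_n x_n + b_n).

    `params` is a list of (A_n, b_n) pairs; the final layer is kept
    linear so the outputs (heating/moistening profiles) are unbounded.
    """
    for A, b in params[:-1]:
        x = activation(A @ x + b)
    A, b = params[-1]
    return A @ x + b  # linear readout layer

# Toy example with hypothetical sizes: 10 inputs -> 8 hidden -> 4 outputs.
rng = np.random.default_rng(0)
params = [(rng.standard_normal((8, 10)), np.zeros(8)),
          (rng.standard_normal((4, 8)), np.zeros(4))]
q = mlp_forward(rng.standard_normal(10), params)
```

Because each layer is a composition of smooth (or piecewise-smooth) maps, the whole network can be differentiated with respect to its inputs almost everywhere, which is what the later LRF analysis exploits.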

For GCRM, an NN initially including all vertical levels of humidity and temperature inputs (an unstable configuration that crashes after 6 days; Brenowitz and Bretherton 2019) is analyzed. The network has three 256-node hidden layers with ReLU activation and is trained for 20 epochs using Adam to minimize the mass-weighted mean-squared error of the Q1 and Q2 budget residuals. The training loss includes a regularization term encouraging convergence to a stable equilibrium. PyTorch is used, input data are scaled by their mean and variance, and hyperparameters are identical to Brenowitz and Bretherton (2019). The western half of the data is used for training and the eastern half for testing to guard against overfitting. After 20 epochs, the mass-weighted root-mean-squared errors (RMSE) of Q1 are 3.46 and 3.47 K day−1 for training and testing, respectively; for Q2, 1.51 and 1.48 g kg−1 day−1. These near-identical errors suggest no overfitting, owing to the large sample size (millions) relative to the number of network parameters (hundreds of thousands). Analyses use the full dataset.

For SPCAM, two NNs (“NN-stable” and “NN-unstable”) with identical architectures (9 fully connected 256-node layers, 20 epochs, Adam) and training are analyzed using TensorFlow. NN-stable, trained with 8 SAM columns per grid cell and manual hyperparameter tuning (Rasp et al. 2018 SI), led to successful multiyear climate simulations when coupled to CAM. NN-unstable, trained with 32 SAM columns and less tuning, produced moist midlatitude instabilities, crashing prognostic simulations within 2–15 days (movie S1). Crash time varied with initial conditions, but no NN-unstable simulations lasted over 2 weeks. SPCAM interpretability analyses use validation period data only.

The following sections demonstrate how physically motivated diagnostics anticipate, explain, and begin to resolve these instabilities.

3. Interpreting ML Parameterizations

a. Variation of Inputs

Several key parameters govern moist atmospheric convection strength and height. A robust parameterization, including machine learning-based ones, should capture convection’s dependence on these parameters. One such parameter is the Lower Tropospheric Stability (LTS):

LTS = θ(700 hPa) − SST, (5)

where θ is potential temperature and SST is sea surface temperature. Low LTS indicates conditional instability in the lower troposphere, favoring deep convection.

Another controlling parameter is midtropospheric moisture, defined as:

Q = ∫_{550 hPa}^{850 hPa} qT (dp/g). (6)

Cumulus updrafts entrain surrounding air as they rise. Dry surrounding air (low Q) induces evaporative cooling when mixed into updrafts, hindering deep precipitating convection. Moist columns tend to precipitate exponentially more than dry ones (Bretherton et al. 2004; Neelin et al. 2009; Ahmed and Neelin 2018; Rushley et al. 2018).
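A minimal sketch of the mass-weighted integral in Eq. (6), assuming pressure in Pa on an increasing grid; the function name and grid are illustrative, not from the paper’s code:

```python
import numpy as np

def midtrop_moisture(qT, p, g=9.81):
    """Eq. (6): Q = integral of qT dp/g between 550 and 850 hPa.

    qT: water mixing ratio profile (kg/kg); p: pressures (Pa), increasing.
    Returns column water in kg m^-2 (numerically equal to mm).
    """
    mask = (p >= 55000.0) & (p <= 85000.0)
    q, pm = qT[mask], p[mask]
    # trapezoidal rule for the pressure integral
    return np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(pm)) / g
```

For example, a uniform mixing ratio of 0.01 kg/kg over this layer gives roughly 30 mm of midtropospheric water.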

To examine NN dependence on these parameters, GCRM and SPCAM training datasets are partitioned into LTS and Q bins. For GCRM, tropical and subtropical points (23°S–23°N) are used; humidity bins are 2 mm wide, and LTS bins are 0.5 K wide. For SPCAM, 20 bins are evenly spaced between 0–40 mm for moisture and 7–23 K for LTS, spanning observed tropical ranges (Figs. 1a,b). NN inputs x are averaged over these bins, denoted as E[x¯|Q,LTS]. Parameterization sensitivity to Q and LTS is given by:

f(Q, LTS) = f(E[x̄ | Q, LTS]; φ), (7)

where φ are the NN parameters. Due to f’s nonlinearity, this differs from averaging NN outputs over bins; it tests the nonlinear sensitivity to systematic input changes. Bin averages of the “true” apparent heating E[Q1|Q, LTS] and moistening E[Q2|Q, LTS] are also plotted to indicate the NN’s predictive success across bins.
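The bin-averaging step E[x̄|Q, LTS] can be sketched as follows; the helper name and empty-bin handling are illustrative assumptions, not the paper’s code. The sensitivity in Eq. (7) is then obtained by evaluating the trained network once per bin-mean input.

```python
import numpy as np

def bin_mean_inputs(x, Q, LTS, q_edges, lts_edges):
    """Average full input profiles x over (Q, LTS) bins: E[x | Q, LTS].

    x: (n_samples, n_features); Q, LTS: (n_samples,).
    Returns an (n_q_bins, n_lts_bins, n_features) array of bin-mean
    inputs, with NaN marking empty bins.
    """
    iq = np.digitize(Q, q_edges) - 1
    il = np.digitize(LTS, lts_edges) - 1
    nq, nl = len(q_edges) - 1, len(lts_edges) - 1
    out = np.full((nq, nl, x.shape[1]), np.nan)
    for i in range(nq):
        for j in range(nl):
            mask = (iq == i) & (il == j)
            if mask.any():
                out[i, j] = x[mask].mean(axis=0)
    return out
```

Feeding each bin-mean profile through the network (rather than averaging the network’s outputs) is what makes Eq. (7) a test of the parameterization’s nonlinear response.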

b. Linear Response Functions

The previous method shows the nonlinear dependence of the ML parameterization on a few inputs but is hard to extend to the full input space. Instead, Linear Response Functions (LRFs), or saliency maps, are used. LRFs are computed by differentiating NN outputs with respect to their inputs. Using nonlinear, continuous (almost-everywhere differentiable) models like NNs allows computation of the local linear response of convection about various base states.

LRFs have been used in developing ML parameterizations. Brenowitz and Bretherton (2019) used LRFs to analyze causes of unstable dynamics in their NN parameterizations when coupled to a GCM. Most analyses linearize the GCRM-trained NN around the tropical mean profile, where coupled GCM-NN instabilities develop. This state isn’t radiative-convective equilibrium (RCE), so positive LRF modes likely represent decay to RCE. However, this equilibration is assumed slower than coupled GCM-NN blowups, allowing LRFs to reveal instability mechanisms. Developing ML parameterizations with stable radiative-convective equilibrium is a future research task.
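As a concrete sketch, an LRF about a base state can be formed by perturbing each input and observing the outputs, as described above. The authors use automatic differentiation (PyTorch/TensorFlow); the centered-difference stand-in below yields the same Jacobian for smooth models and is offered only as an assumption-laden illustration:

```python
import numpy as np

def linear_response_function(f, x0, eps=1e-4):
    """Finite-difference LRF: M[i, j] = dQ_i / dx_j about base state x0.

    f maps an input vector (humidity/temperature profiles) to an output
    vector (heating/moistening profiles). Centered differences approximate
    the Jacobian that automatic differentiation would return exactly.
    """
    y0 = f(x0)
    n = x0.size
    M = np.empty((y0.size, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        M[:, j] = (f(x0 + dx) - f(x0 - dx)) / (2 * eps)
    return M
```

Each column of M is the response of the full output profile to a perturbation at one input level, which is exactly what is visualized when LRFs are plotted as matrices.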

c. Coupling to Two-Dimensional Linear Dynamics

While LRFs offer insights into parameterization effects on a single atmospheric column in radiative convective equilibrium, they alone cannot predict feedbacks when coupled to the environment. This coupling is considered critical in catastrophic instability because NNs accurate in single-column models (Brenowitz and Bretherton 2018) can still fail when coupled to GCM wind fields (Brenowitz and Bretherton 2019). While nonlinearity may cause coupled simulations to catastrophically fail outside the training set, the initial movement towards the training manifold edge is hypothesized to be linear, arising from interactions between parameterization and small-scale gravity waves – the fastest modes in large-scale atmospheric dynamics. This interaction is well-studied (Hayashi 1971; Majda and Shefter 2001) and can cause gridscale storms with traditional moisture convergence closure-based parameterizations (Emanuel 1994).

Linearized dynamics of ML parameterization coupled to gravity waves are derived, assuming two-dimensional (vertical and horizontal) flow, neglecting rotation, and assuming anelastic and hydrostatic dynamics with zero mean winds. Linearized anelastic equations for humidity q, static energy s, and vertical velocity w perturbations are:

qt + q̄z w = Q2′, (8)

st + s̄z w = Q1′, (9)

wt = −(A−1 B)xx − dw. (10)

The last equation is derived by taking the divergence of the horizontal momentum equation, eliminating the pressure gradient using hydrostatic balance and anelastic nondivergence (see Appendix, Section a for details).

Here, A is a continuous elliptic vertical differential operator, Aw = (∂/∂z)[(1/ρ0)(∂/∂z)(ρ0w)]. With rigid-lid boundary conditions, w(0) = w(H) = 0, A is invertible, yielding A−1. In practice, the operators are discretized using finite differences so that matrix algebra can be applied.
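As a sketch of that discretization: for a constant reference density ρ0, A reduces to a second derivative, which centered differences with rigid lids turn into a tridiagonal matrix. The uniform grid and constant-ρ0 simplification are assumptions for illustration, not the paper’s Appendix discretization:

```python
import numpy as np

def elliptic_operator_const_rho(n, dz):
    """Tridiagonal centered-difference d^2/dz^2 on n interior levels.

    Rigid lids (w = 0 at the surface and the top) are folded into the
    boundary rows, which makes the matrix negative definite and hence
    invertible, so A^{-1} exists.
    """
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -2.0
        if i > 0:
            A[i, i - 1] = 1.0
        if i < n - 1:
            A[i, i + 1] = 1.0
    return A / dz**2
```

The inverse of this matrix plays the role of A−1 in the wave operator below; with height-varying ρ0 the stencil coefficients change but the tridiagonal structure and invertibility are preserved.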

Focusing on free-tropospheric dynamics, the virtual effect of water vapor is neglected, and buoyancy is approximated by B = g s/s̄. The momentum damping rate is fixed at d = 1/(5 days). Equations are discretized using centered differences (Appendix, Section b), with rigid-lid (w = 0) boundary conditions at the surface and atmosphere top, similar to Kuang (2018).

Perturbation heating Q1′ and moistening Q2′ are linear transformations of the perturbed state, potentially vertically nonlocal:

Q1′(z) = (1/H) ∫0H [∂Q1(z)/∂q(z′) q(z′) + ∂Q1(z)/∂s(z′) s(z′)] dz′, (11)

and similarly for Q2′. Discretizing onto a fixed height grid and concatenating the s and q fields into vectors s = [s(z1), …, s(zn)]T and q, Q1, and Q2, yields:

(Q1)   (Mss  Msq) (s)
(Q2) = (Mqs  Mqq) (q). (12)

The matrix subblocks (e.g., Mqq) encode the linearized response function for heights z1, …, zn. Assuming a horizontal plane-wave solution with wavenumber k, (8)–(10) are encoded in matrix form:

     (q)     (q)
∂/∂t (s) = T (s), (13)
     (w)     (w)

where the linear response operator T for total water, dry static energy, and vertical velocity is:

    (Mqq    Mqs    −diag(q̄z))
T = (Msq    Mss    −diag(s̄z)). (14)
    (0      −g k² A−1 diag(s̄)−1    −dI)

Here, I is the identity matrix, and diag creates a diagonal matrix from a vector.

The spectrum of (13) is computed numerically for each wavenumber k to obtain a dispersion relationship. Appendix Section b derives the discretization of the elliptic operator A. The real component of an eigenvalue λ is the growth rate; the phase speed is cp = −ℑλ/k. Eigenvectors of T describe the vertical structure of each wave mode in s, q, and w. For an eigenvector v, the wave structure over a complete oscillation is shown by ℜ{v exp(iϕ)} for 0 ≤ ϕ < 2π. Phase speeds and growth rates are shown for every eigenmode across wavenumbers, visualizing the vertical structures of interesting modes such as unstable propagating or standing waves.
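The assembly of T in Eq. (14) and its spectrum can be sketched as follows; the inputs (LRF blocks, mean-state gradients, discretized A−1) are assumed precomputed, and the function and argument names are illustrative:

```python
import numpy as np

def wave_spectrum(Mqq, Mqs, Msq, Mss, qbar_z, sbar_z, sbar, Ainv, k,
                  d=1.0 / (5 * 86400), g=9.81):
    """Build T of Eq. (14) for wavenumber k (1/m) and return the
    growth rate (1/s), phase speed (m/s), and eigenvectors of each mode."""
    n = Mqq.shape[0]
    Z = np.zeros((n, n))
    I = np.eye(n)
    T = np.block([
        [Mqq, Mqs, -np.diag(qbar_z)],
        [Msq, Mss, -np.diag(sbar_z)],
        [Z, -g * k**2 * Ainv @ np.diag(1.0 / sbar), -d * I],
    ])
    lam, vecs = np.linalg.eig(T)
    growth = lam.real             # positive => unstable mode
    phase_speed = -lam.imag / k   # c_p = -Im(lambda)/k
    return growth, phase_speed, vecs
```

Sweeping k and plotting growth rate against phase speed for every eigenmode reproduces the dispersion diagrams discussed in the results; with the LRF blocks zeroed out, only the damped vertical-velocity modes (rate −d) remain.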

4. Regularization

NNs are often numerically unstable when coupled to atmospheric fluid dynamics, a central challenge addressed in recent research. Causality issues in coarse-grained training data may contribute. In GCRM, strong correlation exists between upper-tropospheric total water (input) and precipitation (output), expected because deep convection lofts moisture high, and total water includes cloud water and ice. However, using this as a closure violates physical causality, as moist convection is triggered by lower-atmospheric thermodynamic properties.

Brenowitz and Bretherton (2019) found stable schemes by removing temperature above level 15 (375 hPa) and humidity above level 19 (226 hPa) from NN input features, reducing spurious causality potential. These “ablated inputs” are ad hoc but effective, discovered using LRF analysis (Section 3b), demonstrating ML interpretability techniques’ aid in ML parameterization development.

SPCAM-trained NN stability remains challenging. Removing upper-atmospheric inputs (Brenowitz and Bretherton 2019) did not stabilize SPCAM NNs (movie S2). SPCAM instabilities may be linked to NN fit imperfections, exacerbated by limited hyperparameter tuning. However, “input regularization,” developed using LRF analysis, stabilizes SP-trained NNs.

Similar to upper-atmospheric ablation for GCRM NNs, input regularization was discovered through LRF analysis. Direct LRF calculation of SPCAM-trained NNs via automatic differentiation yields noisy, hard-to-interpret LRFs (Fig. S2 top in online supplement). Coupled to 2D dynamics, these LRFs produce unphysical stability diagrams, with unstable modes exceeding 300 m s−1 phase speeds even for NN-stable (Fig. S2 bottom).

LRF noisiness indicates NN nonlinear response to small input changes. While potentially desirable for NN convective parameterization (Palmer 2001), it hinders clean NN parameterization interpretation via automatic differentiation about individual basic states. SAM-trained NNs have smoother responses, possibly due to network architecture, training strategy, or training data differences. Understanding this discrepancy is a future challenge.

The NN-stable SPCAM network’s stable interactive coupling to GCM dynamics suggests insensitivity to larger perturbations. This motivates feeding an ensemble X of randomly perturbed inputs to the SPCAM NN parameterization and averaging the resulting output ensemble Y to better understand NN behavior.

“Input regularization” is applied to the NN-unstable instability problem in five steps:

  1. Track the instability’s (longitude, latitude) coordinates (movie S1) back in time to identify the base state xunstable leading to the instability. Choose the earliest time step with the maximum perturbation of the convective moistening field (Q2).
  2. Construct an input ensemble {x(i)}i=1,…,n of n members by perturbing xunstable with multiplicative noise drawn from a normal distribution N with mean 0 and standard deviation σreg:

     xj(i) = (1 + zj(i)) xjunstable,  zj(i) ~ N(0, σreg), (15)

     where σreg is the “regularization amplitude” (%): a larger σreg broadens the input ensemble.
  3. Feed each ensemble member x(i) into the NN parameterization, producing an output ensemble {y(i)}i=1,…,n.
  4. Calculate the LRF about the input ensemble by automatically differentiating each input–output pair and ensemble averaging the LRFs:

     LRFreg = (1/n) Σi=1…n ∂y(i)/∂x(i). (16)

  5. Couple the “regularized” LRF to the 2D wave coupler to calculate the stability diagram for NN behavior near the regularized inputs {x(i)}i=1,…,n.
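Steps 2–4 above can be sketched as follows. The authors differentiate the network automatically; here a finite-difference Jacobian stands in, and the ensemble size, seed, and function names are illustrative assumptions:

```python
import numpy as np

def regularized_lrf(f, x_unstable, sigma_reg, n_ens=64, eps=1e-4, seed=0):
    """Ensemble-averaged LRF per Eqs. (15)-(16).

    Builds n_ens inputs via multiplicative Gaussian noise about the
    unstable base state, computes one Jacobian per member, and averages.
    """
    rng = np.random.default_rng(seed)

    def jac(x0):
        # Centered-difference Jacobian of f about x0.
        y0 = f(x0)
        M = np.empty((y0.size, x0.size))
        for j in range(x0.size):
            dx = np.zeros(x0.size)
            dx[j] = eps
            M[:, j] = (f(x0 + dx) - f(x0 - dx)) / (2 * eps)
        return M

    lrfs = []
    for _ in range(n_ens):
        z = rng.normal(0.0, sigma_reg, size=x_unstable.size)   # Eq. (15)
        lrfs.append(jac((1.0 + z) * x_unstable))
    return np.mean(lrfs, axis=0)                               # Eq. (16)
```

The averaging smooths out the noisy single-state Jacobians of the SPCAM-trained networks, which is what makes the resulting LRF interpretable when coupled to the 2D wave model.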

Adding significant input vector spread may push NN outside its training set, causing biases. This approach is not advocated as a general SPCAM-based NN stabilization strategy. However, regularization is used to investigate interpretability framework robustness by comparing offline predictions to online prognostic failure modes. Gaussian noise addition to inputs is common for NN training regularization.

5. Results

a. The Onset of Deep Convection in ML Parameterizations

Figure 1 shows 2D binning of combined tropical and subtropical data in midtropospheric humidity Q and LTS space for latitudes equatorward of 22.5°. GCRM simulation distribution is bimodal, with high-moisture, low-stability samples (tropics) and another peak at lower moistures (~7 mm) from subtropics. SPCAM simulation is moister, with modal Q around 30 mm vs. GCRM’s <20 mm, due to SPCAM’s 2–3 K warmer peak SST.

Net precipitation—surface precipitation (P) minus evaporation (E)—is used as a deep convection proxy, predicted by both NNs as the column integral of apparent drying −Q2 and distinguishing between regimes of low (P < E) and substantial (P > E) precipitation. GCRM results are for an unablated NN. Both training datasets and ML parameterizations predict that net precipitation increases dramatically with midtropospheric humidity, consistent with numerous studies (Bretherton et al. 2004; Neelin et al. 2009; Rushley et al. 2018). Net precipitation has a weaker LTS dependence, but stability is important for intermediate moistures (~15–20 mm). Differences between the bin-averaged net precipitation and the NN predictions made from bin-averaged inputs are small (~10 mm day−1). The ML parameterizations depend smoothly and realistically on Q and LTS.

Vertical structure variations of inputs and predicted heating/moistening with Q and LTS are examined. Figure 2 shows vertical profiles for varying LTS, binned over tropical GCRM columns with 20 ≤ Q < 22 mm. Figure 3 shows LTS dependence for moister SPCAM simulation with 33.7 ≤ Q < 35.7 mm. Humidity reaches higher in the atmosphere for low stabilities, with lower-tropospheric humidity offsetting these gains to maintain constant Q. GCRM predicted heating and moistening shift from shallow to deep profiles as LTS decreases, but moistening magnitude remains relatively unchanged (Fig. 1c), suggesting LTS controls convection height more than strength. However, it’s unclear if these changes are direct responses to lower-tropospheric temperature structure or humidity profile changes (Figs. 2a and 3a). SPCAM NNs fail to predict clear convection deepening with decreased stability, possibly due to chosen Q bin, fundamental moist convection differences between SPCAM and GCRM, or warmer SPCAM SSTs. Each NN faithfully represents its training dataset’s convective sensitivity.

Fig. 3. As in Fig. 2, but for deepening of convection for SPCAM, varying the LTS with midtropospheric humidity between 33.7 and 35.7 mm.

Predicted heating and moistening vary similarly to bin-averaged Q1 and Q2 profiles, but the latter are more LTS-sensitive than NNs. NNs predict with single input profiles, while bin-averaged Q1 and Q2 are statistical averages of many profiles. Figures demonstrate NN prediction sensitivity to systematic input variable changes, whereas bin-averaged heating and moistening show statistical, potentially noncausal, links between heating, moistening, moisture, and LTS in data.

Figures 4 and 5 show midtropospheric humidity control over vertical structure for GCRM and SPCAM, respectively. LTS is fixed between 9–10 K for GCRM and 11–12 K for SPCAM, both unstable ranges. The amount of water changes dramatically with Q in both simulations, as Q is highly correlated with total precipitable water. Both NNs predict cooling and moistening near the boundary layer top (z ≈ 2000 m) for drier profiles, attributable to shallow clouds. Above a Q threshold, the sign flips, and the ML parameterization predicts increasingly deep heating and moistening. For the three moistest profiles, heating and moistening strengthen dramatically with little change in vertical structure. Predicted tendencies are similar to bin averages.

Fig. 4. As in Fig. 2, but for strengthening of convection for SAM, varying midtropospheric moisture Q for LTS between 9.0 and 9.5 K.

Fig. 5. As in Fig. 2, but for strengthening of convection for SPCAM, varying midtropospheric moisture Q for LTS between 11 and 12 K.

b. Stabilizing via Ablation of Inputs

Brenowitz and Bretherton (2019) achieved a stable scheme by training their NN without upper-atmospheric temperature or humidity input. However, they didn’t examine whether ablating temperature, humidity, or both prevented instability. “Ablate” is used analogously to neuroscience’s brain tissue removal effect on behavior. This section uses the wave-coupling framework (Section 3c) to explore independent effects of temperature and humidity ablation. Wave coupling is wavenumber-specific, computationally cheaper than nonlinear simulation, but indicative of NN coupled simulation performance.

The LRF of an NN trained with all atmospheric inputs (Brenowitz and Bretherton 2019, cf. Fig. 1) is computed first. Brenowitz and Bretherton (2019) chose a lower cutoff level for humidity because upper-tropospheric humidity is vanishingly small, while temperature remains of similar magnitude at all levels. Upper-atmospheric humidity and/or temperature inputs are sequentially removed by inserting 0 in the corresponding LRF entries. The configurations studied are:

  • All atmospheric inputs (unablated).
  • No humidity above level 15 (375 hPa).
  • No temperature above level 19 (226 hPa).
  • No humidity above level 15 nor temperature above level 19 (fully ablated).

Figure 6 shows dispersion relationships after each ablation (zoomed in Fig. 7). With all inputs, numerous propagating modes with 10–25 m s−1 phase speeds and positive growth rates exist, becoming more unstable at shorter wavelengths. Ablating upper-atmospheric humidity still leaves unstable modes, including a fast 100 m s−1 propagating instability. Ablating only upper-atmospheric temperature yields similar results to full input. Ablating both upper-atmosphere temperature and humidity removes many unstable propagating modes, leaving stationary or very slow phase speed unstable modes. Ablating both humidity and temperature is necessary for stable nonlinear simulations using ML trained on coarse-grained 3-hourly GCRM data. Table 1 summarizes these results.

Fig. 6. (caption truncated) Waves inside of this region are likely responsible for instability.

Fig. 7. As in Fig. 6, but only showing modes with phase speeds less than 10 m s−1.

Table 1. Summary of stability issues related to removing upper-atmospheric inputs. The fourth column shows the maximum phase speed over all modes with growth rates larger than 0.05 day−1.

Phase-space structure of two modes is shown in Figs. 8 and 9. Figure 8 shows a standing mode in the fully ablated case. At 628 km wavelength, this mode has an e-folding time of 1/0.32 ≈ 3 days, zero phase speed, and primarily couples vertical velocity (Fig. 8a) to total water mixing ratio (Fig. 8c), with upward velocity (negative ω) corresponding to anomalous moistening and moisture (Fig. 8c). Heating and cooling (Fig. 8d) nearly compensate for vertical motions.

Fig. 9. As in Fig. 8, but for the structure of a spurious unstable propagating mode.

Free-tropospheric heating and temperature are relatively uninvolved, but boundary layer anomalies are large. The standing mode therefore appears to be mostly a moisture mode, similar to the modes that cause convective self-aggregation in large-domain CRM simulations of radiative-convective equilibrium (Bretherton et al. 2005) and that are thought important for large-scale organized convection such as the Madden–Julian oscillation (Sobel and Maloney 2013; Adames and Kim 2016). Kuang (2018) found that a similar mode became unstable only when radiative effects were included in the LRF. In contrast, an NN trained to predict Q1 − Qrad, where Qrad is the coarse-grained radiative tendency, also predicted an unstable standing mode (not shown). However, it is unclear whether convection and radiation can be reliably separated in this way, given the residuals in the GCRM budgets and the noisiness of the training method.

The LRF-wave analysis can also identify spurious wave-like modes that contribute to numerical instability. Figure 9 shows a mode of the unablated model with a planetary-scale 6283-km wavelength, a 44 m s−1 phase speed, and a fast 1.29 day−1 growth rate; this mode is stable at shorter wavelengths. Moisture, humidity, moistening, and drying tendencies are in phase with each other and in quadrature with ω. The vertical motion tilts upward away from the direction of propagation. Both humidity and temperature anomalies contribute significantly, but their vertical structures are strange: upper-atmospheric temperature anomalies are strongly coupled to free-tropospheric moisture anomalies with a complex vertical structure. These structures are not reminiscent of known modes of convective organization, and this mode could explain the instability that occurs when the scheme is coupled to a GCM.

c. Stabilizing Gravity Wave Modes via Regularization of Inputs

We now explore whether the wave-coupling framework, successfully applied to GCRM data above, can predict prognostic-mode failures in SPCAM simulations. ML climate models trained on different datasets exhibit different forms of instability: the SPCAM-trained NN-unstable tends to become unstable outside the tropics, and more gradually than the GCRM-trained model, so different ML-based models may become unstable for different reasons. Indeed, input ablation, while stabilizing the GCRM-trained NN, does not stabilize the SPCAM-trained NN-unstable, which still crashes after ~3 days coupled to CAM even after ablating the top half of the input vector (movie S2). We therefore introduce "input regularization" as an alternative empirical technique to improve the prognostic performance of the SPCAM-trained NN. The computationally affordable wave coupler (Section 3c) predicts the amount of regularization needed to stabilize coupled GCM-NN simulations.

We first revisit the physical credibility of NN-unstable relative to NN-stable, using the same diagnostics that revealed causal ambiguity in the GCRM-trained NN. NN-unstable can be expected to struggle prognostically because of its positive heating and moistening biases (Fig. S1) at high midtropospheric moisture (~20 mm) and low LTS (~12 K), conditions favorable to deep convection. Positive convective moistening biases in moist regions can lead to instability by spuriously moistening growing water vapor perturbations.

Increased input regularization (Section 4) effectively controls the stability properties (Fig. 10). The top row shows regularized LRFs about the base state xunstable for regularization amplitudes of 1%, 10%, and 20%. A preliminary stability analysis, including eigenvalue analysis and weak-temperature-gradient coupling (Beucler et al. 2018), indicates that the damping rates of NN-unstable are closer to 0 than those of NN-stable, so NN-unstable may not damp instabilities quickly enough. The wave-coupler stability diagram (Fig. 10f) provides a more extensive description of the instability, using a tightly regularized (1%) input ensemble about xunstable. While NN-stable has only a few slow-growing modes about the unstable base state (Fig. 10e), NN-unstable exhibits fast-growing (~1 day−1) modes (Fig. 10f) propagating at phase speeds of 5–20 m s−1. The wave-coupling framework thus anticipates the instability that results from coupling NN-unstable to CAM.

Fig. 10. The light blue background indicates where the phase speed |cp| > 5 m s−1 and the growth rate is positive.

The wave coupler predicts that more input regularization should stabilize NN-unstable: the largest propagating growth rates decrease from ~1 day−1 at 1% regularization to ~10−3 day−1 at 25% regularization (Fig. 10h). CAM simulations test this prediction by coupling the host model's grid columns to the mean prediction of a 128-member NN ensemble with Gaussian-perturbed inputs, instead of a single NN prediction. The input spread is varied from 1% to 25% across five experiments, each repeated in a four-member miniensemble starting from SPCAM-generated initial conditions spaced five days apart. Figure 11 shows the time to failure for increasing regularization. With ≤15% regularization, no simulation lasts longer than 21 days, consistent with the unstable modes in the 2D wave-coupler diagnostic, and extratropical failure modes occur. With 20% spread, longer-term prognostic tests become possible. The most dramatic delay of instability occurs between 15% and 20% spread, consistent with the offline 2D wave-coupler tests, which indicate that the unstable modes shut down near 16% regularization. While a linear model that neglects rotation may not accurately predict nonlinear midlatitude instabilities in a full-physics GCM, the offline diagnostic tools developed in this study apply to a wide range of instability situations.
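The coupling strategy above can be sketched as follows: replace a single NN evaluation with the mean over an ensemble of Gaussian-perturbed inputs. In this minimal sketch (NumPy; the choice of a per-level scale that the spread multiplies is an assumption, and a linear map stands in for the NN), the ensemble mean of a linear model recovers the unperturbed prediction, which the test exploits:

```python
import numpy as np

def input_regularized_predict(predict, x, scale, spread=0.20, n_members=128, seed=0):
    """Mean prediction over an ensemble of Gaussian-perturbed inputs.

    predict : maps an input vector to an output vector
    x       : (d,) input column profile
    scale   : (d,) per-level amplitude that the spread is a fraction of
              (an assumption here; the paper varies spreads of 1%-25%)
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_members, x.size)) * (spread * scale)
    members = np.array([predict(x + eps) for eps in noise])
    return members.mean(axis=0)

# For a linear "network" the ensemble mean converges to the unperturbed prediction.
W = np.array([[1.0, 2.0], [0.5, -1.0]])
f = lambda v: W @ v
x = np.array([1.0, 3.0])
y = input_regularized_predict(f, x, scale=np.ones(2), spread=0.2)
```

For a nonlinear network the ensemble mean differs from the single-input prediction, which is precisely the smoothing effect the regularization exploits.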

6. Conclusions

Machine learning parameterizations can address structural biases in traditional parameterizations by increasing the number of tunable parameters, but they present interpretability and stability challenges when coupled to atmospheric fluid dynamics. This study developed an interpretability framework for ML convection parameterizations and compared offline skill with online (coupled, prognostic) performance.

Systematic input variable variation in 2D thermodynamic space demonstrated that two independent ML parameterizations behave as expected. Increased midtropospheric moisture increases predicted heating and moistening (Bretherton et al. 2004), and increased lower-tropospheric stability controls convection depth. These sensitivities are consistent with training data and are somewhat consistent between SPCAM and GCRM neural networks, showing machine-learning parameterization robustness. Both predict net precipitation increase for moist and unstable columns, though heating and moistening vertical structures differ. Future work could build on these methods, using estimated inversion strength (Wood and Bretherton 2006) and lower-tropospheric buoyancy (Ahmed and Neelin 2018) instead of stability and moisture for more orthogonal convection controls.

An offline technique was developed to predict ML parameterization online stability in GCRM and SPCAM limits. Coupling gravity wave dynamics to ML parameterization linearized response functions allows computation of growth rates, phase speeds, and gravity wave mode physical structure. Propagating unstable modes are associated with numerical instability in online coupled runs in both SPCAM and SAM, likely a root cause. Stabilized schemes (input ablation for SAM, input regularization for SPCAM) show stable propagating modes and more stable prognostic tests. These stabilization schemes are crude and may bias climate modeling. Frameworks neglecting rotation or background shear indicate gravity wave-parameterized heating interactions play a role in numerical instability, potentially causing drift towards training dataset boundaries and subsequent extrapolation-induced blowup.

This study also compares NNs trained on coarse-grained GCRM data with those trained on SPCAM data. SPCAM LRFs are noisier than SAM LRFs, and the factors beyond the training data (hyperparameters, network architecture, optimization) that contribute to this require further study. Smoothing of neural-network Jacobians may be needed, for instance using ensemble-averaged Jacobians (Krasnopolsky 2007), the Jacobian of the mean profile (Chevallier and Mahfouf 2001), or weight smoothing (Aires et al. 1999). "Input regularization" is introduced here as an alternative to ensemble averaging that does not require training multiple networks. Regularized SPCAM LRFs are smoother, which aids physical interpretation, comparison to GCRM LRFs, and analysis with the wave-coupling framework. Input regularization is not a complete solution to the instability of SPCAM-trained models, but it does help prognostic stability. Tuning hyperparameters for optimally skillful fits may also yield models that are more stable online.
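The ensemble-averaged-Jacobian idea can be sketched with finite differences: compute the Jacobian of a nonlinear predictor at many Gaussian-perturbed base states and average. This is only a sketch of the averaging step (a toy function stands in for a neural network, and finite differences stand in for automatic differentiation; it is not the cited implementations):

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Finite-difference Jacobian of f at x."""
    y0 = f(x)
    J = np.empty((y0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - y0) / eps
    return J

def ensemble_jacobian(f, x, spread, n_members=64, seed=0):
    """Jacobian averaged over Gaussian-perturbed base states (cf. Krasnopolsky 2007)."""
    rng = np.random.default_rng(seed)
    perturbed = x + spread * rng.standard_normal((n_members, x.size))
    return np.mean([jacobian_fd(f, xp) for xp in perturbed], axis=0)

# Toy nonlinear predictor: d/dv0 = sech^2(v0) ~ 1 near 0, d/dv1 = 0.2*v1 ~ 0.2 near 1.
f = lambda v: np.array([np.tanh(v[0]) + 0.1 * v[1] ** 2])
J_bar = ensemble_jacobian(f, np.array([0.0, 1.0]), spread=0.1)
```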

The wave-coupling analysis has physical implications, but more work is needed to compare NN-derived LRFs with LRFs obtained by other approaches (Kuang 2010). The analysis here was performed not about radiative-convective equilibrium (RCE) profiles but near regions of instability. The robustness of the spectra is hard to quantify, especially for near-zero phase speeds and growth rates; this is acceptable for discovering spurious unstable waves, but inferring true physical modes would require a more rigorous statistical framework.

This manuscript presented techniques for examining machine-learning parameterizations, leading to new regularization techniques and allowing domain experts to assess the physical plausibility of ML parameterizations. The ML parameterizations studied here behave according to physical intuition, which is encouraging both for accelerating existing parameterizations and for developing accurate data-driven ones. Interpretability techniques may also aid in discovering more elegant solutions to the coupled stability problem and in exploring neural network hyperparameters (e.g., depth).

Acknowledgments

When starting this work, N.B. was supported as a postdoctoral fellow by the Washington Research Foundation, and by a Data Science Environments project award from the Gordon and Betty Moore Foundation (Award 2013-10-29) and the Alfred P. Sloan Foundation (Award 3835) to the University of Washington eScience Institute. C.B. was initially supported by U.S. Department of Energy Grant DE-SC0016433. NB and CB acknowledge support from Vulcan, Inc., for completing this work. TB and MP acknowledge support from NSF Grants OAC-1835863, OAC-1835769, and AGS-1734164, as well as the Extreme Science and Engineering Discovery Environment supported by NSF Grant ACI-1548562 (charge numbers TG-ATM190002 and TG-ATM170029) for computational resources.

Data availability statement

The software and bin-averaged data are available on GitHub (https://github.com/nbren12/uwnet) and archived on zenodo.org (Brenowitz and Kwa 2020). The training dataset for the GCRM simulation have been archived on zenodo.org (Brenowitz 2019). The raw SPCAM outputs amount to several TB and are available from the authors upon request.

APPENDIX

Derivation of 2D Anelastic Wave Dynamics

a. Continuous equations

The linearized hydrostatic anelastic equations in the horizontal direction x and height z are:

$$q_t + \bar{q}_z w = Q_2', \tag{A1}$$

$$s_t + \bar{s}_z w = Q_1', \tag{A2}$$

$$u_t + \phi_x = -du. \tag{A3}$$

The prognostic variables are humidity $q$, dry static energy $s = T + (g/c_p)z$, horizontal velocity $u$, and vertical velocity $w$, all perturbations from a large-scale state (overbar). The anelastic geopotential term is $\phi = p'/\rho_0$, where $\rho_0(z)$ is the reference density profile.

These are completed by hydrostatic balance and mass conservation. Hydrostatic balance is:

$$\phi_z = B, \tag{A4}$$

where $B = gT/\bar{T}$ is the buoyancy. Mass conservation is:

$$u_x + \frac{1}{\rho_0}\,\partial_z(\rho_0 w) = 0. \tag{A5}$$

The diagnostic relations and the zonal momentum equation can be combined into a prognostic equation for $w$. Define the operators $\mathcal{L} = \partial_z$ and $\mathcal{H} = (1/\rho_0)\,\partial_z\,\rho_0$. Taking the $x$ derivative of the momentum equation and applying the divergence-free condition gives:

$$\mathcal{H}w_t + d\,\mathcal{H}w - \phi_{xx} = 0. \tag{A6}$$

Applying $\mathcal{L}$ and using hydrostatic balance (A4):

$$\mathcal{L}\mathcal{H}(\partial_t + d)w = B_{xx}. \tag{A7}$$

Letting $A = \mathcal{L}\mathcal{H}$ and inverting yields:

$$w_t = \partial_{xx}A^{-1}B - dw. \tag{A8}$$

$A$ is an elliptic vertical operator and thus requires boundary conditions: a rigid lid and an impenetrable surface [$w(0) = w(H_T) = 0$], where $H_T$ is the depth of the atmosphere.
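For a single horizontal wavenumber $k$, Eqs. (A1), (A2), and (A8) reduce to an eigenproblem for the vertical structure, whose complex eigenvalues give growth rates and phase speeds. A minimal sketch (assumptions: constant reference density, a uniform grid, buoyancy $B = gs/\bar{T}$, and no LRF closure and no damping, so the expected outcome is neutral gravity waves; this is illustrative, not the authors' code):

```python
import numpy as np

# Uniform grid, constant reference density: A reduces to the standard
# second difference with rigid-lid (Dirichlet-like) boundary conditions.
nz, dz = 20, 500.0
A = (np.diag(-2.0 * np.ones(nz)) + np.diag(np.ones(nz - 1), 1)
     + np.diag(np.ones(nz - 1), -1)) / dz**2

g, Tbar, sz = 9.81, 300.0, 5e-3        # gravity, mean temperature, d(sbar)/dz (K/m)
k = 2 * np.pi / 1.0e6                  # 1000-km wavelength

# d/dt [s; w] = M [s; w]:
#   s_t = -sz * w                                  (A2, no heating)
#   w_t = d_xx A^{-1} B = -k^2 A^{-1} (g/Tbar) s   (A8, no damping)
Ainv = np.linalg.inv(A)
Z = np.zeros((nz, nz))
M = np.block([[Z, -sz * np.eye(nz)],
              [-k**2 * (g / Tbar) * Ainv, Z]])

eigvals = np.linalg.eigvals(M)
growth_rates = eigvals.real            # s^-1; all ~0 for neutral gravity waves
phase_speeds = -eigvals.imag / k       # m/s
```

With stable stratification the eigenvalues are purely imaginary (neutral, propagating waves); adding an LRF closure for the heating and moistening terms shifts some eigenvalues into the right half-plane, which is the instability diagnosed in Section 5.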

b. Vertical discretization

Solving (A8) numerically requires discretizing the elliptic operator $A$. Assume that $w$, $s$, and $q$ are vertically collocated. In the domain interior, $A$ is discretized as a tridiagonal matrix:

$$(Aw)_k = a_k w_{k-1} + b_k w_k + c_k w_{k+1}, \tag{A9}$$

where

$$a_k = \frac{\rho_{k-1}}{(z_k - z_{k-1})(z_{k+1/2} - z_{k-1/2})\,\rho_{k-1/2}}, \tag{A10}$$

$$b_k = -\frac{\rho_k}{z_{k+1/2} - z_{k-1/2}}\left[\frac{1}{(z_{k+1} - z_k)\,\rho_{k+1/2}} + \frac{1}{(z_k - z_{k-1})\,\rho_{k-1/2}}\right], \tag{A11}$$

$$c_k = \frac{\rho_{k+1}}{(z_{k+1} - z_k)(z_{k+1/2} - z_{k-1/2})\,\rho_{k+1/2}}. \tag{A12}$$

The index $k$ ranges from 1 to $N$ (the number of vertical grid cells), and $z$ is height.

Rigid-lid boundary conditions are imposed with ghost cells: $w_0 = -w_1$ and $w_{N+1} = -w_N$, so that the vertical velocity, which is stored at cell centers, vanishes at the boundaries. These boundary conditions are implemented by modifying the matrix representation of $A$:

$$(Aw)_1 = -a_1 w_1 + b_1 w_1 + c_1 w_2, \tag{A13}$$

$$(Aw)_N = a_N w_{N-1} + b_N w_N - c_N w_N \tag{A14}$$

at the lower and upper boundaries, respectively.
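As a sanity check on (A9)–(A14), the matrix can be assembled directly; with constant density and a uniform grid, the interior rows reduce to the standard second difference, and (A13)–(A14) modify only the diagonal of the boundary rows. A minimal sketch (variable names and the ghost-cell density handling are assumptions):

```python
import numpy as np

def build_A(rho, dz):
    """Tridiagonal matrix for A = L H on a uniform grid per (A9)-(A14).

    rho : (N+2,) density at cell centers, including one ghost cell at
          each end (indices 0 and N+1); interface densities are averaged.
    dz  : uniform grid spacing, so z_{k+1/2} - z_{k-1/2} = dz
    """
    N = rho.size - 2
    rho_half = 0.5 * (rho[:-1] + rho[1:])      # rho at interfaces k+1/2, k = 0..N
    A = np.zeros((N, N))
    for i in range(N):
        k = i + 1                               # 1-based level index as in (A9)
        a = rho[k - 1] / (dz * dz * rho_half[k - 1])                               # (A10)
        b = -rho[k] / dz * (1.0 / (dz * rho_half[k]) + 1.0 / (dz * rho_half[k - 1]))  # (A11)
        c = rho[k + 1] / (dz * dz * rho_half[k])                                   # (A12)
        if i > 0:
            A[i, i - 1] = a
        A[i, i] = b
        if i == 0:
            A[i, i] -= a                        # (A13): ghost cell w_0 = -w_1
        if i < N - 1:
            A[i, i + 1] = c
        if i == N - 1:
            A[i, i] -= c                        # (A14): ghost cell w_{N+1} = -w_N
    return A

# Constant density, dz = 2: interior rows are [1, -2, 1]/dz^2 = [0.25, -0.5, 0.25].
A = build_A(np.ones(12), dz=2.0)
```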

REFERENCES

Abadi, M., and Coauthors, 2015: TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

Adames, Á. F., and D. Kim, 2016: MJO initiation by stochastic convective triggering. J. Climate, 29, 6777–6794, https://doi.org/10.1175/JCLI-D-15-0831.1.

Ahmed, R., and J. D. Neelin, 2018: Controls of boundary layer equivalent potential temperature and convective available potential energy by surface fluxes and entrainment. J. Climate, 31, 7789–7809, https://doi.org/10.1175/JCLI-D-17-0879.1.

Aires, F., P. Brunel, and C. Prigent, 1999: Smoothing regularization of neural networks for brightness temperature retrieval. IEEE Trans. Geosci. Remote Sens., 37, 2621–2627, https://doi.org/10.1109/36.803471.

Beucler, T., 2019: Linearized response functions of convection and clouds: Towards process-oriented parameterizations. Ph.D. thesis, Yale University, 204 pp. [Available online at https://search.proquest.com/docview/2338438566.]

——, G. Bellon, and J. B. Edson, 2018: A linearized parameterization of shallow cumulus clouds for mesoscale atmospheric models. J. Adv. Model. Earth Syst., 10, 2818–2841, https://doi.org/10.1029/2018MS001428.

Brenowitz, N. D., 2019: GCRM training data. Zenodo. https://doi.org/10.5281/zenodo.3463785.

——, and C. S. Bretherton, 2018: Prognostic validation of a neural-network unified physics parameterization. Geophys. Res. Lett., 45, 6289–6298, https://doi.org/10.1029/2018GL078539.

——, and ——, 2019: Spurious causality in neural network climate schemes. J. Adv. Model. Earth Syst., 11, 4197–4213, https://doi.org/10.1029/2019MS001823.

Bretherton, C. S., M. E. Peters, and L. E. Back, 2004: Relationships between boundary layer depth, lower tropospheric stability, column water vapor, and precipitation over the tropical oceans. J. Climate, 17, 2408–2425, https://doi.org/10.1175/1520-0442(2004)017<2408:RBBDLT>2.0.CO;2.

——, S. Uttal, C. Fairall, E. Schulz, S. Yuter, X. Zhou, and J. Comstock, 2005: Cloud droplet number concentrations in marine stratocumulus clouds: Aircraft measurements of spatial variability and droplet activation physics. J. Geophys. Res., 110, D08208, https://doi.org/10.1029/2004JD005022.

Chevallier, F., and J.-F. Mahfouf, 2001: Evaluation of Jacobians of radiative transfer models for infrared land surface emissivity and temperature retrieval. J. Geophys. Res., 106, 17 085–17 098, https://doi.org/10.1029/2000JD900849.

——, J.-J. Morcrette, and F. Cheruy, 1998: Neural network parameterization for the longwave radiative scheme of the ECMWF model. J. Geophys. Res., 103, 9143–9153, https://doi.org/10.1029/97JD03672.

Emanuel, K. A., 1994: Atmospheric Convection. Oxford University Press, 580 pp.

Gentine, P., M. Pritchard, S. Rasp, G. Reinaudi, and R. Y. Yuval, 2018: Could machine learning break the convection parameterization deadlock? Geophys. Res. Lett., 45, 5742–5751, https://doi.org/10.1029/2018GL078202.

Hayashi, Y., 1971: Instability of large-scale equatorial waves. J. Meteor. Soc. Japan, 49, 38–43, https://doi.org/10.2151/jmsjgeosoc.49.38.

Herman, J. R., and Z. Kuang, 2013: A linearized convective cloud parameterization for climate modeling based on cloud-resolving model simulations. J. Adv. Model. Earth Syst., 5, 321–339, https://doi.org/10.1002/jame.20028.

IPCC, 2014: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, 1535 pp.

Khairoutdinov, M. F., and D. A. Randall, 2001: Cloud resolving modeling of ARM summer-1997 IOP: Model formulation, results, validations. J. Geophys. Res., 106, 14 753–14 778, https://doi.org/10.1029/2001JD900033.

——, and ——, 2003: The cloud-resolving model intercomparison project (CRMip) tier-1 simulations. J. Geophys. Res., 108, 8655, https://doi.org/10.1029/2001JD001102.

——, D. A. Randall, and C. DeMott, 2005: Simulations of the midlatitude atmospheric general circulation with a cloud-resolving model embedded in a single-column model. J. Atmos. Sci., 62, 2135–2149, https://doi.org/10.1175/JAS3430.1.

Khouider, B., and A. J. Majda, 2006: Multicloud statistical models for tropical convection. Proc. Natl. Acad. Sci. USA, 103, 5343–5348, https://doi.org/10.1073/pnas.0601037103.

Krasnopolsky, V. M., 2007: Neural network emulations of complex GCM physics and parameterizations. Neural Networks, 20, 488–500, https://doi.org/10.1016/j.neunet.2006.12.006.

——, D. C. Lee, and L. C. Breaker, 2005: Neural network retrieval of geophysical parameters from satellite data. Neural Networks, 18, 187–198, https://doi.org/10.1016/j.neunet.2004.09.008.

——, A. Ilina, and N. N. Nesterov, 2013: Stochastic neural network ensemble-based weather nowcasting. Neural Networks, 45, 14–27, https://doi.org/10.1016/j.neunet.2013.04.007.

Kuang, Z., 2008: A minimal model of convectively coupled waves. J. Atmos. Sci., 65, 2599–2614, https://doi.org/10.1175/2007JAS2559.1.

——, 2010: Linear response functions of cumulus convection from cloud-resolving model simulations. J. Atmos. Sci., 67, 3737–3754, https://doi.org/10.1175/2010JAS3483.1.

——, 2018: Convective quasi-equilibrium with linearized convection. J. Adv. Model. Earth Syst., 10, 1289–1305, https://doi.org/10.1029/2018MS001288.

Majda, A. J., and M. J. Shefter, 2001: Models for radiative-convective instability and mesoscale triggering. J. Atmos. Sci., 58, 58–78, https://doi.org/10.1175/1520-0469(2001)058<0058:MFRCIA>2.0.CO;2.

McGovern, A., and Coauthors, 2019: Making sense of weather and climate predictions with interpretable machine learning. Bull. Amer. Meteor. Soc., 100, 2273–2291, https://doi.org/10.1175/BAMS-D-18-0195.1.

Molnar, C., P. Casalicchio, and B. Bischl, 2018: Interpretable machine learning – A brief history, state-of-the-art and challenges. Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020), Lecture Notes in Computer Science, Springer, Cham, 417–431, https://doi.org/10.1007/978-3-030-62415-3_27.

Montavon, G., W. Samek, and K.-R. Müller, 2018: Methods for interpreting and understanding deep neural networks. Digit. Signal Process., 73, 1–15, https://doi.org/10.1016/j.dsp.2017.10.011.

Neelin, J. D., C. Chou, and H. Su, 2009: Patterns of tropical rainfall and SST in observations and models. J. Geophys. Res., 114, D00A13, https://doi.org/10.1029/2008JD011005.

O’Gorman, P. A., and J. G. Dwyer, 2018: Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change, and extreme events. J. Adv. Model. Earth Syst., 10, 2548–2563, https://doi.org/10.1029/2018MS001351.

Oueslati, B., and G. Bellon, 2015: Sensitivity of simulated tropical cyclones to cumulus parameterization schemes in WRF model. Atmos. Res., 161–162, 92–110, https://doi.org/10.1016/j.atmosres.2015.03.019.

Palmer, T. N., 2001: A nonlinear dynamical perspective on model error: A proposal for non-probabilistic ensemble forecasting. Quart. J. Roy. Meteor. Soc., 127, 279–304, https://doi.org/10.1002/qj.49712757102.

Paszke, A., S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, J. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, 2019: PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst., 8024–8035.

Pritchard, M. S., and Coauthors, 2014: The super-parameterized CAM: Evaluation and intercomparison with observations and conventional parameterizations. J. Climate, 27, 8717–8738, https://doi.org/10.1175/JCLI-D-13-00614.1.

Rasp, S., 2019: ClimaX: A library to benchmark machine learning climate models. J. Open Source Software, 4, 1893, https://doi.org/10.21105/joss.01893.

——, M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1808204115.

Rushley, S., B. J. Hoskins, and R. Plant, 2018: On the observed relationship between mid-tropospheric humidity and precipitation over tropical oceans. Quart. J. Roy. Meteor. Soc., 144, 2584–2595, https://doi.org/10.1002/qj.3374.

Samek, W., T. Wiegand, and K.-R. Müller, 2017: Explainable artificial intelligence: Opening the black box. Inf. Fusion, 46, 70–85, https://doi.org/10.1016/j.inffus.2017.10.007.

Schneider, T., P. A. O’Gorman, and X. Levine, 2017: An emergent constraint on climate sensitivity deduced from cloud observations. Nature Climate Change, 7, 935–942, https://doi.org/10.1038/nclimate3414.

Sobel, A. H., and E. D. Maloney, 2013: Moisture modes in the MJO and other tropical disturbances. J. Atmos. Sci., 70, 2213–2232, https://doi.org/10.1175/JAS-D-12-0287.1.

Toms, B. A., V. Lakshmanan, K. L. Elmore, and A. McGovern, 2019: Applications of explainable artificial intelligence in weather forecasting. Bull. Amer. Meteor. Soc., 100, 2293–2300, https://doi.org/10.1175/BAMS-D-18-0197.1.

Wood, R., and C. S. Bretherton, 2006: Diurnal cycle of marine stratocumulus sheets. J. Atmos. Sci., 63, 3549–3562, https://doi.org/10.1175/JAS3804.1.

Brenowitz, N., and A. Kwa, 2020: UWNET v0.1.0. Zenodo. https://doi.org/10.5281/zenodo.3908489.
