CDAnet: A Physics-Informed Deep Neural Network for Downscaling Fluid Flows

10.1029/2022MS003051

Generating high-resolution flow fields is of paramount importance for various applications in engineering and climate sciences. This is typically achieved by solving the governing dynamical equations on high-resolution meshes, suitably nudged toward available coarse-scale data. To alleviate the computational cost of such a downscaling process, we develop a physics-informed deep neural network (PI-DNN) that mimics the mapping of coarse-scale information to its fine-scale counterpart performed by continuous data assimilation (CDA). Specifically, the PI-DNN is trained within the theoretical framework described by Foias et al. (2014, https://doi.org/10.1070/rm2014v069n02abeh004891) to generate a surrogate of the theorized determining form map from the coarse-resolution data to the fine-resolution solution. We demonstrate the PI-DNN methodology through application to 2D Rayleigh-Bénard convection, and assess its performance by contrasting its predictions against those obtained by dynamical downscaling using CDA. The analysis suggests that the surrogate is constrained by conditions, in terms of the spatio-temporal resolution of the input, similar to the ones required by the theoretical determining form map. The numerical results also suggest that the surrogate's downscaled fields are of comparable accuracy to those obtained by dynamically downscaling using CDA. Consistent with the analysis of Farhat, Jolly, and Titi (2015, https://doi.org/10.48550/arxiv.1410.176), temperature observations are not needed for the PI-DNN to predict the fine-scale velocity, pressure and temperature fields.

Downscaling is the means of using coarse information of the variables of a dynamical system to predict their higher-resolution counterparts (D. Chen et al., 2006). Downscaling methodologies are split into two categories: statistical and dynamical downscaling. Statistical downscaling extracts a statistical relation between the coarse information and its high-resolution counterpart (R. L. Wilby et al., 1998). These techniques were used, for instance, for downscaling regional climate (Laflamme et al., 2016) and ocean wave fields (Camus et al., 2014). On the other hand, dynamical downscaling generates high-resolution fields by constraining a dynamical model with the coarse-scale information (R. Wilby & Wigley, 1997). Nudging techniques are popular dynamical downscaling methods, in which the solution of a dynamical system is nudged toward the information, point-by-point in grid nudging and at low frequencies in spectral nudging (Altaf et al., 2017). Nudging techniques are commonly utilized for predicting turbulent unsteady flows (Zauner et al., 2022) and downscaling atmospheric fields (Liu et al., 2012), to name a few applications.
Continuous Data Assimilation (CDA) was introduced as an alternative dynamical downscaling algorithm to construct an increasingly accurate description of the high-resolution solution of a dynamical system (Azouani et al., 2014). Specifically, CDA introduces a source term that continuously nudges the coarse scales of the governing equations toward those of the coarse-scale information. The source term constrains the large features by comparing the coarse data of the system state variables to the corresponding large scales from the solution of the governing equations. CDA theorizes the existence of a map, called the determining form map, from the coarse-scale data of the system state to its fine-scale counterpart (Biswas et al., 2019; Foias et al., 2012, 2014, 2017). CDA has been theoretically analyzed for a variety of dynamical systems including 2D Rayleigh-Bénard (RB) convection (Farhat, Lunasin, & Titi, 2015; Farhat et al., 2017), the surface quasi-geostrophic equations (Jolly et al., 2019), and a 3D planetary geostrophic model (Farhat et al., 2016), to name a few. CDA has also been examined numerically in applications to downscale the 2D Navier-Stokes equations (Gesho et al., 2016), 2D RB convection (Altaf et al., 2017; Farhat et al., 2018), and recently Global Circulation Models (Desamsetti et al., 2019, 2022). A surrogate model of the determining form map could serve as a viable means to alleviate the computational burden of dynamically downscaling coarse-scale data of the system state by solving the equations on high-resolution meshes. Deep neural networks are capable of learning complex relations between input and output fields (Goodfellow et al., 2016), and are also generally cheaper to evaluate, once trained (Gottwald & Reich, 2021).
Training such networks has been made more efficient with the development of tools such as PyTorch (Paszke et al., 2019a, 2019b) and TensorFlow (Abadi et al., 2016; TensorFlow Developers, 2022), which harness the computational power of graphics processing units (GPUs) (Vuduc et al., 2010). This enabled deploying deep learning techniques in a variety of applications including autonomously navigating drones (Lee et al., 2021), drug discovery (K. Huang et al., 2021) and classifying crops from satellite images (Kussul et al., 2017). The predictions of a neural network can be further constrained by the underlying physical equations in the loss function (Bihlo & Popovych, 2022; Raissi et al., 2019). Specifically, the total loss function used to construct these so-called physics-informed networks can be designed to include the norm of the residual of the governing equations, initial and/or boundary conditions. An overview of various applications of physics-informed neural networks can be found in Cai et al. (2021) and Karniadakis et al. (2021). Recently, physics-informed neural networks were used in computational fluid dynamics applications (Jiang et al., 2020a; Wang et al., 2020), subgrid scale parametrization of chemically reactive flows (Bode et al., 2021), as well as weather and climate applications (Kashinath et al., 2021; Mooers et al., 2021).
Inspired by Azouani et al. (2014), Foias et al. (2014), and Jiang et al. (2020a), we propose a physics-informed deep neural network (PI-DNN), called CDAnet, that learns the determining form map theoretically described and established by Foias et al. (2012, 2014, 2017) for downscaling dynamical fluid flows. The PI-DNN outputs are constrained with the physics by following the approach introduced by Raissi et al. (2019). In addition, CDAnet is trained under the conditions set by Azouani et al. (2014) and Foias et al. (2014, 2017) for the existence of the determining form map, where downscaling is limited by the spatio-temporal resolution of the coarse-scale data. Although some related work has been carried out in the literature, this is the first work that learns the determining form map as theorized and established by CDA. Specifically, purely data-driven approaches based on super-resolution were previously applied to enhance the resolution of coarse information without constraining the solution by the physics (e.g., Vaughan et al. (2021)). Similar work was introduced for data compression, where the training and validation of the network were conducted using the same data set, comprising a single or multiple solution trajectories (Jiang et al., 2020a). The proposed neural network was implemented to learn the CDA determining form map, from coarse-scale data of the RB convection model at a fixed Rayleigh (Ra) number to its high-resolution counterpart. We emphasize that our objective differs from that in Bode et al. (2021), who aimed at learning a subgrid-scale parametrization, and from that in Jiang et al. (2020a), who essentially focused on data compression. Specifically, our goal is to determine a neural network approximation of the determining form map, and to assess the performance of this approximation in downscaling a history of coarse-scale observations that were not considered in either training or validation. CDAnet was trained to generalize across different solution trajectories of the RB model by evaluating the trained model on solution trajectories different from those used in training. In addition, being a surrogate model of this determining form map, the neural network is constrained by theoretical bounds on the extent to which downscaling can be performed, namely, the spatial and temporal resolutions of the observed data.
The performance of the proposed PI-DNN is assessed for RB flows at several Rayleigh numbers, and for different spatio-temporal resolutions of the coarse-scale information. The proposed network consists of a U-net to extract spatio-temporal features from the input (Ronneberger et al., 2015a) and a multi-layer perceptron (MLP) to predict the system states at continuous spatio-temporal coordinates. The network was trained by minimizing a compound loss function composed of a weighted sum of a regression loss and a partial differential equation (PDE) loss, which constrains the predicted solutions using the actual governing equations (Cai et al., 2021;Raissi et al., 2019).
PI-DNN models were trained to downscale the RB convection for a variety of spatio-temporal resolutions of the input. We rely on datasets corresponding to different Ra numbers to examine the impact of chaotic dynamics on downscaling and its limitations in terms of the spatial and temporal resolutions of the input. A well-trained model was also used to downscale RB flows at Ra numbers different from the one it was trained on, to assess the robustness of the proposed model. The outputs of the proposed PI-DNN were compared to the solution dynamically downscaled by CDA for out-of-distribution (OOD) samples, that is, samples not seen during training, by evaluating a long-term history of coarse-scale data. The framework is also tested for downscaling the RB convection without using temperature information, as theoretically justified by Farhat, Jolly, and Titi (2015). Numerical experiments suggest that the trained model serves as a surrogate of the determining form map, with a well-trained model able to mimic numerical downscaling using CDA with an additional error within 4%.
The study is organized as follows. Section 2 describes the RB convection, CDA downscaling, the numerical discretization of the governing equations, and the datasets used for training and validation. Section 3 presents the architecture of the PI-DNN along with the training procedure. The numerical experiments are outlined and accompanied by a thorough analysis in Section 4. Finally, Section 5 wraps up by drawing conclusions on the main results and presenting possible future work.

2. Rayleigh-Bénard Convection
Consider a Newtonian fluid with kinematic viscosity ν, thermal expansion coefficient α and thermal diffusivity κ, enclosed in a 2D periodic channel of horizontal period L̃_x and vertical height L̃_y. Dimensional quantities are denoted using tildes. The flow is driven by buoyancy effects induced by a temperature gradient between a hot bottom plate and a cold top plate, where the gravitational acceleration, g, points downwards (Pandey et al., 2018). Applying the Boussinesq approximation, and the suitable scaling to the governing equations, the RB equations can be expressed as:

\nabla \cdot \mathbf{u} = 0, \tag{1}

\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \sqrt{\frac{\mathrm{Pr}}{\mathrm{Ra}}}\, \nabla^2 \mathbf{u} + T\, \mathbf{e}_y, \tag{2}

\frac{\partial T}{\partial t} + (\mathbf{u} \cdot \nabla) T = \frac{1}{\sqrt{\mathrm{Ra}\, \mathrm{Pr}}}\, \nabla^2 T, \tag{3}

where Ra = (gα ΔΘ L̃_y^3)(νκ)^{-1} is the Rayleigh number, ΔΘ is the absolute temperature difference between the plates, Pr = νκ^{-1} is the Prandtl number, t is time, and e_y is the unit vector in the vertical direction. The governing equations are solved for the temperature T, pressure p, horizontal velocity u and vertical velocity v, with u = (u, v).

In the present work, the RB governing equations are solved in the 2D domain Ω = [0, L_x] × [0, 1], subject to horizontal periodic boundary conditions with period L_x, and no-slip and fixed-temperature conditions at the bottom and top boundaries, that is,

u = v = 0, T = 1 at y = 0, and u = v = 0, T = 0 at y = 1.

The system is initialized using random velocity and temperature fields drawn from independent, spatially-uncorrelated, uniform distributions, specifically according to

u(x, 0), v(x, 0), T(x, 0) ∼ U(a, b), independently at each grid point.

Here, U(a, b) refers to a uniform distribution on the interval (a, b). In order to generate a diverse data set, the system of equations was solved several times, each starting from a different realization of the initial conditions, leading to different trajectories of the system.

2.1. Continuous Data Assimilation
CDA is a dynamical downscaling algorithm that was introduced by Azouani et al. (2014). It consists of introducing a spatial feedback nudging term in the governing equations that depends on the difference between the large spatial scales of the high-resolution solution and the corresponding spatial scales of the coarse data of the system state. CDA was shown theoretically (Azouani et al., 2014; Farhat, Jolly, & Titi, 2015) and numerically (Altaf et al., 2017; Desamsetti et al., 2019; Gesho et al., 2016) to converge exponentially in time to the unknown reference solution, of which only sparse measurements are available. The solution converges when the nudging term is adequately scaled and sufficient observations of the system states, available in this study in the form of coarse-scale information, are provided. In this section, a brief outline of the CDA methodology for the RB system is presented.
Consider the spatial interpolation operator I_h of an appropriate interpolant function φ operating on a uniform observation grid of size h:

I_h(f)(\mathbf{x}) = \sum_{k=1}^{N_h} f(\mathbf{x}_k)\, \chi_{Q_k}(\mathbf{x}),

where x = (x, y), the Q_k are disjoint subsets such that diam(Q_k) ≤ h for k = 1, …, N_h, N_h is the number of observation points, ∪_{k=1}^{N_h} Q_k = Ω, x_k ∈ Q_k, and χ_S is the indicator function of the set S. Let ũ = (ũ, ṽ) denote the velocity vector of the CDA (downscaled) solution, Ψ denote the downscaled temperature, and ϱ the pressure. For the RB system, the governing equations for the CDA-downscaled solution are expressed as:

\nabla \cdot \tilde{\mathbf{u}} = 0,

\frac{\partial \tilde{\mathbf{u}}}{\partial t} + (\tilde{\mathbf{u}} \cdot \nabla)\tilde{\mathbf{u}} = -\nabla \varrho + \sqrt{\frac{\mathrm{Pr}}{\mathrm{Ra}}}\, \nabla^2 \tilde{\mathbf{u}} + \Psi\, \mathbf{e}_y + \mu_u \left( I_h(\mathbf{u}^o) - I_h(\tilde{\mathbf{u}}) \right),

\frac{\partial \Psi}{\partial t} + (\tilde{\mathbf{u}} \cdot \nabla) \Psi = \frac{1}{\sqrt{\mathrm{Ra}\, \mathrm{Pr}}}\, \nabla^2 \Psi + \mu_T \left( I_h(T^o) - I_h(\Psi) \right),

where μ_T and μ_u are strictly positive constants called the nudging parameters, and the superscript "o" represents an observed quantity. In this study, the nudging factors were selected to be equal, that is, μ_T = μ_u = μ.
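For intuition, the action of I_h with a piecewise-constant (nearest-neighbor) interpolant, and the resulting nudging term, can be sketched as follows (a minimal NumPy sketch, assuming a uniform observation grid aligned with the model grid; function names are illustrative, not from the paper):

```python
import numpy as np

def interp_obs(field, stride):
    """Piecewise-constant interpolant I_h: each coarse observation value
    f(x_k) is held constant over its cell Q_k of the observation grid."""
    coarse = field[::stride, ::stride]      # sample the observation points x_k
    return np.repeat(np.repeat(coarse, stride, axis=0), stride, axis=1)

def nudging_term(model_field, obs_field, mu, stride):
    """CDA feedback source term mu * (I_h(u^o) - I_h(u~))."""
    return mu * (interp_obs(obs_field, stride) - interp_obs(model_field, stride))
```

In this piecewise-constant form, the nudging term compares only the coarse scales of the model solution against the coarse observations, leaving the fine scales free to evolve under the dynamics.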
The CDA algorithm, which was first introduced for the Navier-Stokes equations as a paradigm, guarantees that the downscaled solution converges to the true solution, regardless of the selected initial conditions, when sufficiently many observations of the system states are provided and the nudging factors are appropriately chosen (Azouani et al., 2014). It was also proven that there exists a unique map, called the determining form map, that maps the coarse observations to the high-resolution solution. Extending this work, Farhat, Jolly, and Titi (2015) theoretically demonstrated that when CDA is applied to the RB system, the downscaled solution converges to the reference solution even when temperature observations are omitted. The goal of this work is to construct an approximate surrogate of the determining form map by suitably training a PI-DNN.

CDA was implemented to numerically downscale the RB system of equations (Altaf et al., 2017). Numerically, the downscaled solution was shown to converge to the reference solution exponentially in time, provided that observations of the system state are sufficiently resolved in both space and time and that the nudging factor is adequately chosen. In addition, Altaf et al. (2017) reported that the CDA algorithm is not sensitive to the choice of the interpolation function: different interpolants resulted in similar convergence rates and asymptotic errors. To this end, a limited number of interpolation methods were tested, and nearest-neighbor interpolation was selected due to its efficient computational properties.

2.2. Numerical Solution
The RB and corresponding CDA equations are solved using a finite difference methodology on a uniformly-spaced staggered Cartesian grid. The grid has n_x and n_y points in the x and y directions, respectively, yielding cells of width Δx = L_x/n_x and height Δy = 1/n_y. The governing equations are spatially discretized using a second-order central differencing scheme. The time integration is conducted using the third-order Adams-Bashforth scheme with a fixed time step, Δt. A pressure projection scheme (Chorin, 1968) is used to satisfy the continuity equation, and a fast Fourier transform algorithm is employed to solve the corresponding elliptic equation for the pressure.
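The spatial and temporal discretizations described above can be illustrated with a minimal sketch (not the authors' implementation): a second-order central-difference Laplacian, periodic in x with the interior stencil in y, and a third-order Adams-Bashforth update:

```python
import numpy as np

def laplacian(f, dx, dy):
    """Second-order central-difference Laplacian on a 2D field f(y, x),
    periodic in x; the wall rows are left to the boundary conditions."""
    fxx = (np.roll(f, -1, axis=1) - 2 * f + np.roll(f, 1, axis=1)) / dx**2
    fyy = np.zeros_like(f)
    fyy[1:-1] = (f[2:] - 2 * f[1:-1] + f[:-2]) / dy**2
    return fxx + fyy

def ab3_step(u, rhs_hist, dt):
    """Third-order Adams-Bashforth update; rhs_hist = (f^n, f^{n-1}, f^{n-2})."""
    fn, fnm1, fnm2 = rhs_hist
    return u + dt * (23.0 * fn - 16.0 * fnm1 + 5.0 * fnm2) / 12.0
```

In practice the Adams-Bashforth scheme is bootstrapped with lower-order steps until three right-hand-side evaluations are available.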
The RB equations are solved over a 2D periodic channel with a 3:1 aspect ratio, that is, L_x = 3 and L_y = 1. The fundamental domain Ω is discretized using a computational grid with n_x = 768 and n_y = 256. Time marching was conducted using a time step Δt = 5 × 10^−4, and the simulations are performed up to a final time t_f = 60. We consider a fixed Prandtl number of 0.7, and three different Rayleigh numbers, Ra = 10^5, 10^6, and 10^7, allowing us to observe transitional and chaotic flow regimes (Lohse & Xia, 2010; Plumley et al., 2016). For all the selected parameters, the maximum grid Reynolds number is approximately O(10), and the peak Courant-Friedrichs-Lewy (CFL) number remains below 0.15. The numerical simulations are thus considered to be sufficiently resolved in space and time, and hence suitable for the purpose of training and validation.
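As a quick consistency check on the quoted numbers, the convective CFL number implied by the grid and time step above can be computed directly (the peak velocity used below is an assumed illustrative value, not a result from the paper):

```python
# Grid spacing for the 3:1 channel discretized with 768 x 256 cells:
dx, dy = 3.0 / 768, 1.0 / 256            # both equal 1/256
dt = 5e-4                                # fixed time step used in the simulations

def cfl_number(umax, vmax):
    """Convective CFL number for the fixed time step and grid above."""
    return dt * (umax / dx + vmax / dy)

# With representative peak velocities of order one (assumed, illustrative),
# the CFL stays well below the quoted 0.15 bound:
print(cfl_number(0.5, 0.5))
```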

2.3. Training and Validation Data Sets
The governing Equations 1-3 were solved 25 times for each of the aforementioned Ra numbers, each run initialized by a unique random field. Obtaining multiple solution trajectories enriches the datasets with a variety of unique features, which helps the deep neural network achieve better generalizability (Jan et al., 2019; Paschali et al., 2018). From each solution trajectory described in the previous section, the solution corresponding to t ∈ [25, 45] at time increments of δt was considered for training and validation. Specifically, δt was selected to be 0.1 for the flow with Ra = 10^5, and 0.05 for flows at Ra numbers of 10^6 and 10^7. This selection first removes the initial transient corresponding to the noisy field, and second leaves instances outside the training window with which to examine the performance of the trained neural network at downscaling OOD samples. The training data set corresponding to each Ra number consists of 20 model runs, while the validation set is constructed using the remaining five runs.
A supervised training approach is adopted for training the PI-DNN, which requires input-output pairs. The training pairs consist of clips defined by a sequence of snapshots of the solution fields, where the input clip is composed of a sequence of coarse spatio-temporal snapshots of the solution fields, while the output clip corresponds to the high-resolution counterpart. To generate these training pairs, randomly sampled clips are first extracted from the available datasets by selecting crops of size [n_t, n_{x,c}, n_{y,c}], where n_t is the clip length, and n_{x,c} and n_{y,c} are the number of points in the crops along the horizontal and vertical dimensions, respectively. The sampled clips are then coarsened in both space and time to generate the input clips to the network. Denote by σ_s the spatial downscaling factor, where the coarse input clip is constructed by choosing every σ_s-th grid point in space. Similarly, denote by σ_t the temporal downscaling factor, where the coarse input clip is constructed by selecting one frame every σ_t snapshots in time. Therefore, the sampled clips consist of a low-resolution input that is σ_s times smaller in space and σ_t times smaller in time than the high-resolution truth, which is used to supervise the PI-DNN during training. The coarse input clip is interpolated spatially using the I_h function before propagating it through the network, which is the first link to the CDA theory.
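The clip sampling and coarsening procedure can be sketched as follows (a minimal NumPy sketch; the array layout, function name, and the sigma_s/sigma_t argument names are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_pair(run, n_t, n_xc, n_yc, sigma_s, sigma_t):
    """Crop a random space-time clip from a solution run of shape
    (T, C, H, W) (time, variables, space), then coarsen it by strided
    sampling to form the (coarse input, high-resolution target) pair."""
    T, C, H, W = run.shape
    t0 = rng.integers(0, T - n_t * sigma_t + 1)
    x0 = rng.integers(0, H - n_xc + 1)
    y0 = rng.integers(0, W - n_yc + 1)
    hires = run[t0:t0 + n_t * sigma_t, :, x0:x0 + n_xc, y0:y0 + n_yc]
    coarse = hires[::sigma_t, :, ::sigma_s, ::sigma_s]
    return coarse, hires
```

With n_t frames in the input, the high-resolution target spans n_t × sigma_t snapshots, so the coarse clip is sigma_s times smaller in each spatial dimension and sigma_t times smaller in time, as described above.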
Remarks. As outlined above, our approach to training the neural network consists in drawing training data from different trajectories than those used to provide the validation data. This differs from the framework of Jiang et al. (2020a), who focused on data compression and relied on a single data set, comprising a single or several solution trajectories, for training and validation.

Finally, we emphasize that the low-resolution data considered as input to the neural network is synthesized so as to mimic the spatially-sparse (local) measurements typically encountered in environmental flow applications. This differs from the setting of some LES applications, where coarse (spatially-filtered) LES data is used as input (e.g., Bode et al. (2021)). As noted earlier, the use of local measurements simultaneously places us in the framework of observation assimilation and downscaling, and within the scope of a well-established theory (Azouani et al., 2014; Cao et al., 2022; Foias et al., 2012, 2014).

3. CDAnet
The PI-DNN, CDAnet, is composed of two stages: a feature extractor that precedes an MLP. The low-resolution input is first fed to a 3D U-net that extracts spatio-temporal features, yielding a feature vector at each coarse spatio-temporal grid point. The feature vector is then combined with the spatio-temporal coordinates by concatenation, and input to the MLP to predict the high-resolution output and to compute the partial derivatives needed to approximate the residual of the PDE. The feature extractor is first described in Section 3.1, where the adopted architecture is explained. The details of the architecture of the MLP are then presented in Section 3.2. Figure 1 outlines the training pipeline, accompanied by a descriptive illustration of the PI-DNN detailing the architecture of each of the two stages involved. The total loss, consisting of a linear combination of the regression and PDE loss terms, is discussed in Section 3.3.

3.1. Feature Extraction
The input to the PI-DNN is a low-resolution clip consisting of sparse snapshots of the evolution of the solution fields. First, the clips are propagated through a modified U-net, a convolutional neural network used to extract the spatial and temporal features of the input. The U-net was originally employed for semantic segmentation of medical images (Ronneberger et al., 2015a, 2015b). The utility of this convolutional network was later extended to computer vision tasks such as classification (Alom et al., 2019), human pose estimation (Newell et al., 2016) and image denoising (Komatsu & Gonsalves, 2020). For the application at hand, the input is 3-dimensional in (x, y, t), with pressure, temperature and the two velocity components treated as different channels. This requires modifying the original U-net, which takes 2-dimensional, RGB or gray-scale, images as input, by replacing the 2D convolutions with their 3D counterparts, thus extracting features in both space and time.
In addition to the 3D convolution operations, we use Inception-ResNet blocks (Szegedy et al., 2017), first to improve the gradient flow for enhanced training properties (He et al., 2016), and second to obtain a larger receptive field (Szegedy et al., 2015). We tested Residual (He et al., 2016) and Dense connections (G. Huang et al., 2017); however, the Inception-ResNet blocks were found to provide the best compromise between memory requirement and model accuracy.
In similar fashion to the original U-net, the proposed feature extractor consists of an encoder-decoder setup, where the encoder is composed of several stages of convolutional Inception-ResNet blocks, each succeeded by a max-pooling layer with a stride equal to 2. The Inception-ResNet blocks were designed to comprise three convolution branches with kernel sizes of [1 × 1 × 1], [5 × 5 × 3], and [9 × 9 × 5], respectively. The outputs of the Inception-ResNet branches are first concatenated and then linked to the input of the convolution block via a Res-connection, as proposed by Szegedy et al. (2017). A batch normalization (BatchNorm) layer follows each convolution layer, which in turn is succeeded by a rectified linear unit (ReLU) activation function.
Zero-padding was adopted so that the input and output of each convolution layer maintain the same size spatially and temporally. The decoder is constructed to mirror the encoder with nearest neighbor upsampling replacing the max-pooling operations. Finally, the U-net skip connections link the features from each level of the encoder to its counterpart from the decoder. These skip connections were shown to improve training by promoting a stronger gradient flow and maintaining information from previous layers (He et al., 2016;Ronneberger et al., 2015a).
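A minimal PyTorch sketch of such an Inception-ResNet block, with three 3D convolution branches, zero-padding that preserves the spatio-temporal size, and a residual connection, might look as follows (the channel widths and the 1×1×1 merge convolution are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class InceptionResBlock3D(nn.Module):
    """Sketch of a 3D Inception-ResNet block: parallel conv branches with
    kernel sizes 1x1x1, 5x5x3, and 9x9x5, each followed by BatchNorm and
    ReLU; the branch outputs are concatenated, projected, and added back
    to the block input via a Res-connection."""
    def __init__(self, ch):
        super().__init__()
        def branch(k):
            pad = tuple(s // 2 for s in k)   # zero-padding keeps sizes unchanged
            return nn.Sequential(nn.Conv3d(ch, ch, k, padding=pad),
                                 nn.BatchNorm3d(ch), nn.ReLU())
        self.b1 = branch((1, 1, 1))
        self.b2 = branch((5, 5, 3))
        self.b3 = branch((9, 9, 5))
        self.merge = nn.Conv3d(3 * ch, ch, 1)  # project the concat back to ch

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
        return x + self.merge(y)               # residual connection
```

In the encoder, several such blocks would be interleaved with stride-2 max-pooling layers, with the decoder mirroring them using nearest-neighbor upsampling.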

3.2. Multi-Layer Perceptron Network
The next stage of the proposed PI-DNN consists of an MLP, based on the architecture originally proposed by Z. Zhang (2019a, 2019b) for representation learning and shape generation. In this study, however, the features extracted by the 3D U-net are input along with the spatio-temporal coordinates to the network. Moreover, this input vector is concatenated with each of the hidden layers of the network, maintaining the information throughout the network. The resulting MLP outputs the temperature, pressure and two velocity components corresponding to the input coordinates, thus yielding a "continuous" prediction of the high-resolution fields. Specifically, the MLP predicts the system state for the input spatio-temporal coordinates within the domain. The MLP predictions are performed at the spatio-temporal coordinates of the reference solution grid used to supervise the training, since the predicted and reference solutions should lie on the same grid to suitably compute the error metrics.

Figure 1. (a) The low-resolution clip is first fed to a 3D U-net that outputs a latent grid of spatio-temporal features. The features are concatenated and passed to an MLP that predicts the high-resolution system states. The network is supervised by the high-resolution ground-truth clip, which is compared with the predicted clip via a regression loss (L_reg). A partial differential equation (PDE) loss (L_pde) is used to enforce the physics described by the governing equations. (b) Illustration of the architecture of CDAnet, showing the 3D U-net, the Inception-ResNet blocks and the MLP.
The MLP facilitates constraining the predicted solution to a physically-consistent one through the inclusion of a soft constraint in the loss function. Considering the spatio-temporal coordinates as input to the MLP enables computing the partial derivatives of the four output variables with respect to the spatial and temporal coordinates. This methodology was first developed by Raissi et al. (2019), who proposed using automatic differentiation (backpropagation) to compute the partial derivatives. The physics of the system can then be used to constrain the network's prediction by including the norm of the residual of the PDEs as an additional term in the total loss function. Several activation functions were examined; the infinitely differentiable Softplus nonlinearity was chosen for the MLP because of its reasonable memory requirement and high prediction accuracy.
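A minimal PyTorch sketch of an MLP with this re-concatenation pattern, using Softplus so that the outputs remain differentiable for PDE-residual evaluation, could look as follows (the layer sizes and input layout are assumptions):

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Sketch of the prediction MLP: the feature/coordinate input vector is
    re-concatenated with every hidden layer, and the Softplus activation
    keeps the network smooth so partial derivatives can be formed by
    automatic differentiation."""
    def __init__(self, in_dim, hidden=64, out_dim=4, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(in_dim, hidden)] +
            [nn.Linear(hidden + in_dim, hidden) for _ in range(depth - 1)])
        self.out = nn.Linear(hidden + in_dim, out_dim)   # T, p, u, v
        self.act = nn.Softplus()

    def forward(self, z):
        h = self.act(self.layers[0](z))
        for layer in self.layers[1:]:
            h = self.act(layer(torch.cat([h, z], dim=-1)))
        return self.out(torch.cat([h, z], dim=-1))

# Example: predict and differentiate the first output with respect to the
# inputs (assumed layout: [features | x, y, t]).
mlp = CoordMLP(in_dim=8)
z = torch.rand(10, 8, requires_grad=True)
y = mlp(z)                                   # columns: T, p, u, v
dT_dz = torch.autograd.grad(y[:, 0].sum(), z, create_graph=True)[0]
```

The derivatives with respect to the coordinate entries of z are exactly the quantities needed to assemble the PDE residual.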

3.3. Loss Functions
To meet the objectives of downscaling while enforcing the dynamics, the PI-DNN is trained by minimizing a total loss function ℒ consisting of a linear combination of the regression loss ℒ_reg and the PDE loss ℒ_pde. ℒ_reg measures how far the prediction is from the truth, while ℒ_pde measures how well the prediction satisfies the physical constraints. The total loss function is expressed as:

\mathcal{L} = \mathcal{L}_{reg} + \lambda\, \mathcal{L}_{pde} = \| \hat{y} - y_{true} \|_p + \lambda\, \| \Gamma \hat{y} - f \|_p,

where y_true = {T, p, u, v} is the true solution of the system, ŷ is the network's predicted solution, ‖·‖_p denotes the p-norm, λ is a positive constant scaling the PDE loss, and Γŷ − f is the residual of the system of governing equations (Equations 1-3), with Γ the discrete differential operator and f the discretized forcing term.
Computing ℒ_pde at all spatio-temporal grid points of the high-resolution grid is computationally prohibitive, because a computational graph needs to be constructed for each point. To alleviate this burden, graphs are constructed only for a limited number of points at which ℒ_pde is evaluated. Specifically, a total of 3,000 points are selected from the predicted clip, separated by a prescribed minimum spatial distance. We note that the value of λ, which reflects a trade-off between exact reconstruction of the training data and dynamical consistency, was observed to have a significant impact on network training and performance. In particular, the impact of λ becomes more pronounced at larger Ra and for coarser-resolution inputs. Specifically, a large value of λ leads the model to overfit to noise, whereas a small value leads to poor overall reconstruction accuracy in comparison to a model trained with the tuned value of λ. It is noteworthy that this experience differs from that of Jiang et al. (2020a), where the trade-off parameter did not noticeably impact the accuracy of the trained models. This may be due to the fact that in the present study the validation and evaluation rely on a large number of OOD samples, whereas Jiang et al. (2020a) rely on a single data set of finite-horizon trajectories.
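The compound loss with subsampled PDE points can be sketched as follows (a minimal PyTorch sketch; the PDE residual tensor is assumed to be precomputed via automatic differentiation, and the names are illustrative):

```python
import torch

def total_loss(y_pred, y_true, pde_residual, lam, n_pde=3000):
    """Compound loss L = ||y_hat - y_true||_1 + lambda * ||Gamma y_hat - f||_1,
    with the PDE residual evaluated only at a random subset of points to
    bound the cost of building computational graphs."""
    reg = (y_pred - y_true).abs().mean()
    idx = torch.randperm(pde_residual.numel())[:n_pde]
    pde = pde_residual.flatten()[idx].abs().mean()
    return reg + lam * pde
```

A uniformly random subset is used here for simplicity; enforcing a minimum spatial separation between the sampled points, as described above, would require an additional rejection or stratified-sampling step.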

3.4. Evaluation Metrics
The trained models are assessed based on their performance on evaluation metrics quantifying the disagreement between the true and predicted output fields. Specifically, these metrics include the ℓ_1 error, ℓ_2 error, and the normalized ℓ_2 error given by the relative root-mean-squared error (RRMSE). In this manuscript, only results based on the RRMSE are presented since it was used to select the best model. The RRMSE is expressed as:

\mathrm{RRMSE}(g) = \frac{\| \hat{g} - g \|_2}{\| g \|_2},

where g is a generic true field, and ĝ is its predicted counterpart. The error metric is computed on the high-resolution grid, for each solution variable independently of the other variables. These metrics are averaged over Ω, the time span of the output and the evaluation batch.
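A minimal sketch of the RRMSE computation for a single field might read:

```python
import numpy as np

def rrmse(g_true, g_pred):
    """Relative root-mean-squared error of a predicted field against the
    truth, computed per solution variable on the high-resolution grid."""
    return np.sqrt(np.mean((g_pred - g_true) ** 2) / np.mean(g_true ** 2))
```

A 1% relative perturbation of the true field thus yields an RRMSE of 0.01, independently of the field's amplitude.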

3.5. Training Setup
CDAnet is trained for 100 epochs, each consisting of 3,000 randomly-sampled clips, totaling 300,000 instances for training. Models are trained for the three datasets described in Section 2.3 and for different combinations of σ_s and σ_t. The input clips are made up of eight snapshots, with a spatial dimension σ_s times smaller than that of the predicted clips. The predicted clips have a spatial dimension of [256, 256] and a temporal dimension of 8 × σ_t for memory-efficient training. After training, the model is evaluated on the entire domain, that is, on output arrays of spatial size [768, 256]. Results in the following section are presented for evaluating the model on an entire history of coarse-scale data to yield forecasts of the high-resolution counterpart both in space and time.
A stochastic gradient descent (SGD) algorithm was selected for minimizing ℒ. The SGD algorithm was tuned to have a momentum of 0.9 and a weight decay of 10^−4. The ℓ_1-norm was used for both terms of ℒ; however, the best model was selected based on the average RRMSE across all solution variables. The reconstruction loss was computed over the entire high-resolution grid, in contrast to the random sampling approach presented in MeshfreeFlowNet (Jiang et al., 2020a, 2020b), which was designed to test recovery of a particular solution trajectory at arbitrary space-time coordinates. The PDE loss term was computed for 3,000 randomly selected spatio-temporal grid points along the high-resolution prediction, where the number of points is limited by the available GPU memory. For each combination of (σ_s, σ_t) and each data set, a parameter search over the learning rate (LR) and λ was conducted. Specifically, λ was varied logarithmically within {0.001, 0.01, 0.1} and LR within {0.01, 0.05, 0.1, 0.15, 0.2, 0.25} when constructing the surrogate of the determining form map. The values for λ were selected in this range to ensure that the two loss terms have a similar magnitude. A model was trained for each (λ, LR) combination, and the RRMSEs corresponding to each of these trained models were analyzed. The value of λ could then be fine-tuned to obtain a trained model with optimal RRMSE values. The best model, for which the results are shown, corresponds to that with the lowest averaged RRMSE, where the averaging is performed over all the solution variables and 3,000 randomly selected clips drawn from the validation data set. During training, the LR was scaled by a factor of 0.1 if no new best model was achieved within 10 consecutive training epochs, to enable taking finer optimization steps.
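The optimizer and learning-rate schedule described above can be sketched in PyTorch as follows (ReduceLROnPlateau is one way to realize the scale-by-0.1-after-10-stagnant-epochs rule; the dummy parameter and validation metric are placeholders):

```python
import torch

# Placeholder parameter standing in for the CDAnet weights.
params = [torch.nn.Parameter(torch.zeros(3))]

# SGD with the momentum and weight decay quoted above; the initial LR is one
# of the values from the grid search.
opt = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=1e-4)

# Scale LR by 0.1 when the monitored validation metric has not improved for
# 10 consecutive epochs.
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1, patience=10)

for epoch in range(2):          # illustrative loop; real training runs 100 epochs
    val_rrmse = 1.0             # placeholder validation RRMSE
    sched.step(val_rrmse)
```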

Remark.
The results presented in the following section rely on an evaluation strategy with the following features. For each Rayleigh number considered, an evaluation data set consisting of a single solution trajectory is used, and the entire history of coarse-scale information spanning the domain is lifted using the PI-DNN to obtain the high-resolution fields. Note that the evaluation trajectory is independent of the data sets used for training and validation. This enables us to test the predictive capabilities of CDAnet, and provides a fair setting for contrasting its performance with that of CDA.
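Lifting an entire coarse history clip by clip can be sketched as below; the non-overlapping window layout is an assumption, and `model` stands for any trained clip-to-clip surrogate.

```python
import numpy as np

def evaluate_trajectory(model, coarse_traj, clip_len=8):
    """Lift a full coarse-scale history to high resolution by feeding
    consecutive, non-overlapping 8-snapshot clips to the trained model
    and concatenating the predicted fine-time clips.

    The model is assumed to map clip_len coarse frames to
    clip_len * (temporal downscaling factor) fine frames."""
    preds = []
    for t0 in range(0, coarse_traj.shape[0] - clip_len + 1, clip_len):
        preds.append(model(coarse_traj[t0 : t0 + clip_len]))
    return np.concatenate(preds, axis=0)

# Dummy surrogate: naive temporal upsampling by a factor of 4, used only
# to exercise the bookkeeping (16 coarse frames -> 64 "fine" frames).
model = lambda clip: np.repeat(clip, 4, axis=0)
coarse_traj = np.random.rand(16, 8, 8)
out = evaluate_trajectory(model, coarse_traj)
```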

Results
CDAnet is systematically analyzed through a series of numerical experiments that examine its performance in downscaling the RB flow at various Ra numbers and for a number of combinations of the spatial and temporal downscaling factors. First, the RRMSEs of the best trained models are presented for different Ra numbers and downscaling-factor combinations. The predictions of the trained PI-DNN are then compared to fields dynamically downscaled using CDA. One of the trained models is then evaluated on datasets corresponding to flows with Ra numbers different from the one it was trained on. Finally, the framework is tested for downscaling coarse-scale velocity and pressure observations only, which requires slightly modifying the network.

Learning the Determining Form
The primary objective of this study is to construct a surrogate of the determining form map theorized in Cao et al. (2022) and Foias et al. (2012, 2014, 2017), which maps coarse-scale observations of the system state to their higher-resolution counterpart. To this end, the PI-DNN proposed in Section 3 is trained using the datasets described in Section 2.3. Several networks are trained for different combinations of the spatial and temporal downscaling factors.

10.1029/2022MS003051
10 of 22 (,  ) . In particular, a grid search for the optimal ( ) pair is conducted for each data set and for each (,  ) . This enables a thorough analysis of the proposed methodology across the downscaling resolutions and different Ra numbers.  Figure 2 illustrates the evolution of the RRMSE of CDAnet for RB flow with Ra = 10 5 . Predictions of the high-resolution solution variables are performed using (,  ) = (4, 4) and for observations of the system state at δt = 0.1 starting from the initial random field and till t = 60. The figure shows that CDAnet is not able to predict the fine-scale solution during the initial transient of the flow field, which is not surprising because the simulations are initialized from a random state that does not lie on the attractor. However, once a fully-developed flow regime is reached, CDAnet is able to accurately predict the fine-scale solution. This illustrates the fact that, similar to CDA, CDAnet can be initialized from an arbitrary state and low prediction errors are achieved when the flow is developed. Note that the RRMSE of all the variables oscillates about an error level that corresponds to that listed in Table 1. In addition, the red dots resemble the RRMSE at the last frame of the predicted clip, which appear to be at the minimum RRMSE. This indicates that CDAnet is a surrogate of the determining form map because it was able to downscale coarse-scale observation that were not accessible during training. Table 2 presents the error metrics along with the training and validation losses corresponding to the best trained networks for different combinations of (,  ) for the Ra = 10 6 data set. The trained models achieve RRMSE values below 3% for all the considered (,  ) pairs. Similarly to Ra = 10 5 case, the RRMSE increases linearly with  for a fixed value of  . For a fixed  , however, the RRMSE increases with  with no particular relation being observed. 
We note that beyond a spatial downscaling factor of 8, the CDA-downscaled fields diverge from the reference solution because the spatial resolution is not within the conditions set by the theory. This was numerically validated by solving the CDA-based RB equations with coarse-scale observations at a spatial downscaling factor of 8, for which the RRMSE of the CDA solution fails to drop below 50%. Similarly, the lowest RRMSEs achieved by CDAnet for the same downscaling parameters were slightly above 15% for all solution variables. Finally, Table 3 lists the optimal (λ, LR) pair corresponding to each downscaling-factor combination for downscaling the flow at Ra = 10^7, together with the RRMSEs of the solution variables and the training and validation losses.

Table 1
Performance of CDAnet for Ra = 10^5

Figure 2. Evolution of the RRMSE of the predicted (a) temperature, (b) u-velocity, and (c) v-velocity fields for (spatial, temporal) downscaling factors (4, 4). The RRMSE of the predicted fields decreases as time progresses, reaching a mean error level that corresponds to the average RRMSE obtained during training. The insets provide enlarged views of the RRMSE at large times. The blue curve represents the RRMSE at every predicted high-resolution frame, whereas the red circles represent the RRMSE of the last predicted frame in the evaluation clip.

Similar to the Ra = 10^6 case, downscaling the coarse-scale observations using CDAnet and CDA is only feasible for the spatial resolution presented, that is, a spatial downscaling factor of 2. For larger spatial factors, the spatial resolution is too coarse for the downscaled solution to converge to the reference solution. For this spatial resolution, the RRMSE of the temperature field roughly doubles as the temporal downscaling factor is doubled from 2 to 4, and approximately triples when it doubles from 4 to 8. This indicates that for temporal factors of 8 and above, the temporal resolution is too coarse, leading to a rapid increase in RRMSEs. Lower prediction RRMSEs might be achieved if additional data were employed during training; however, this is limited by the available memory (Soekhoe et al., 2016). A lower RRMSE may also be obtained by using a deeper and/or wider network with more features; however, a larger network is prone to overfitting (Srivastava et al., 2014) and is limited by the available computational resources (Thompson et al., 2020).

Comparison With CDA
The proposed network is trained as a means to build a surrogate model of the determining form map, inspired by Azouani et al. (2014) and theoretically outlined by Foias et al. (2014), Biswas et al. (2019), and Cao et al. (2022). Essentially, this surrogate model is expected to obey theoretical error bounds similar to those of the determining form map. First, the network was shown to converge with high accuracy only when dynamical downscaling via CDA does so. Specifically, only under the conditions that the theory imposes on the spatial and temporal resolutions of the observations is the network able to downscale observations of the RB flow at similar Ra numbers. There is, however, a notable difference between CDA and the surrogate model: CDA constructs an increasingly accurate high-resolution solution of the system state over time, whereas the proposed network is a surrogate of the determining form map, that is, a direct map from the coarse observations to the fine solution, with no time dependence to improve the estimate.
The performance of CDAnet is compared against that of dynamical downscaling using CDA in a number of ways. First, as indicated in Figure 2, the lowest RRMSE is generally achieved at the last frame of the predicted clip, similar to the way CDA constructs an increasingly accurate representation of the reference solution. Moreover, as alluded to in Section 4.1, the constraints on the spatio-temporal resolution of the observations of the system state, for a fixed Ra number, under which the downscaled solution accurately represents the reference solution, are similar for both CDA and the surrogate CDAnet. Specifically, for observational data that is too coarse (i.e., beyond the conditions of the theory), the CDA-downscaled solution diverges from the reference solution. Similarly, under the same conditions, RRMSEs above 20% are obtained with a trained CDAnet. Figure 3 illustrates the evolution of the RRMSE of the temperature solution in time for different downscaling-factor combinations; selected snapshots of the temperature fields downscaled using CDA and CDAnet are contrasted in Figure 4. RRMSE curves obtained by dynamically downscaling coarse observations using CDA are compared to those obtained by evaluating the trained deep neural network. In particular, we notice that the RRMSE curves of both CDA and CDAnet are oscillatory in time; however, the RRMSEs corresponding to CDA decrease with time, while those corresponding to CDAnet oscillate with an almost uniform amplitude. In addition, the CDAnet RRMSE curves are almost always higher than the CDA curves, an expected behavior for a surrogate of the determining form map.

Note (Table 2). The PI-DNN is evaluated on a data set of RB convection at Ra = 10^6 and δt = 0.05. The RRMSE of the solution variables are reported along with the training and validation losses for different combinations of the spatial and temporal downscaling factors.
The CDA RRMSE curves decrease exponentially in time, asymptotically approaching zero error, as opposed to the CDAnet RRMSEs, which remain near the values reported in Table 2.
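The qualitative contrast above, with CDA errors decaying exponentially toward zero, can be reproduced on a toy problem. The sketch below applies CDA-style nudging to the Lorenz-63 system, which stands in here for the RB equations; only the first component of the state is "observed" and nudged, and the parameter values (nudging coefficient, step size) are illustrative.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def run_cda(mu=50.0, dt=0.001, steps=20000):
    """Forward-Euler twin experiment: the estimate is nudged toward the
    observed first component of the reference ('truth') trajectory."""
    truth = np.array([1.0, 1.0, 1.0])
    estimate = np.array([-5.0, 7.0, 20.0])  # deliberately wrong start
    errors = []
    for _ in range(steps):
        # Nudging term acts only on the observed component.
        nudge = np.array([mu * (truth[0] - estimate[0]), 0.0, 0.0])
        truth_new = truth + dt * lorenz_rhs(truth)
        estimate = estimate + dt * (lorenz_rhs(estimate) + nudge)
        truth = truth_new
        errors.append(np.linalg.norm(truth - estimate))
    return np.array(errors)

errors = run_cda()
```

After a transient, the estimate synchronizes with the truth and the error decays by orders of magnitude, mirroring the exponential convergence of the CDA curves; a static surrogate such as CDAnet has no analogous mechanism, which is consistent with its RRMSE plateauing instead.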

Generalizability to Flows at Different Ra
In this section, we investigate whether a network trained at a specific Ra number can perform well at downscaling observations of the RB flow at a different Ra number. Such generalizability would be particularly beneficial as it would avoid the computational overhead of independent training for different Ra values. Another consideration is that, in realistic settings, the "effective" Ra and Pr numbers might not be precisely known. We investigate these questions by analyzing the performance of CDAnet trained on the Ra = 10^6 data set with (spatial, temporal) downscaling factors (4, 4), and evaluating the trained model on datasets with the following Ra numbers: {10^5, 7 × 10^5, 8 × 10^5, 9 × 10^5, 1.1 × 10^6, 1.2 × 10^6, 1.5 × 10^6, 2 × 10^6, 3 × 10^6, 5 × 10^6, 1 × 10^7}. Although this is a wide range of Ra numbers, we are particularly interested in the generalizability of a network trained at a given Ra number to neighboring Ra values. Note that the present approach differs from the experiment conducted by Jiang et al. (2020a), in which a single data set, comprising 10 solution trajectories with different Ra numbers, was used for training and validation. Specifically, in the present case, the trained model is evaluated on an unseen flow regime, whereas no out-of-distribution (OOD) samples were considered in the analysis of Jiang et al. (2020a). Figure 5 shows the evolution of the temperature RRMSE in time for the aforementioned Ra numbers. The figure portrays RRMSE curves that drop rapidly after the initial transient to an error plateau. The height of the RRMSE plateau is comparable to that obtained during training for all considered Ra numbers. The plateau is slightly lower when the Ra number of the flow is smaller than the one used for training, but noticeably higher when CDAnet is evaluated on flows at higher Ra numbers. This is not surprising, as one would anticipate higher energy in the small scales as the Ra number increases.
For all Ra numbers up to 2 × 10^6, the RRMSE curves are within 1% of the RRMSE obtained when downscaling a flow with the same Ra number as the one trained on. As Ra increases beyond 2 × 10^6, the RRMSE of the downscaled fields increases, as do the oscillations of the RRMSE curves. Nevertheless, in all cases the RRMSE plateau remains below 4%, suggesting that networks trained at a fixed Ra value continue to perform well for neighboring Ra values, with only a slight increase in RRMSE.

Downscaling Without Temperature Observations
CDA has been proven capable of downscaling RB convection even when temperature observations are not available (Farhat, Jolly, & Titi, 2015; Farhat, Lunasin, & Titi, 2015). To ensure convergence in this setting, the theory requires stricter conditions on the spatial resolution of the observations (i.e., finer observations) and a smaller nudging factor. This leads us to consider a more complex network with a larger learning capacity to learn the more sophisticated determining form map, namely, from coarse partial-state data to high-resolution full-state predictions. Specifically, we double the number of features extracted by the U-net and those input to the MLP to construct a surrogate model that achieves adequate error levels. This also amounts to doubling the number of neurons in the U-net and adding a hidden layer to the MLP. The input to the U-net thus consists of a low-resolution 3D clip with three channels corresponding to the pressure and both velocity components. Table 4 presents the RRMSE of the solution variables, along with the training and validation losses, for the best trained model and the respective optimal (λ, LR) combinations. The table shows that for all downscaling-factor pairs, both λ and LR are smaller than in the case when all solution variables are observed at the coarse scale. In addition, the results suggest that the RRMSE of the temperature field is approximately 2% larger than when the full state is observed. Figure 6 illustrates snapshots of the input, true, and predicted solution fields for the case of (spatial, temporal) downscaling factors (16, 8). The snapshots indicate that the predicted solutions are qualitatively similar to the true solutions at all time steps. Figure 7 presents the evolution of the RRMSE of the temperature and velocity components in time. All three curves decrease as time evolves until reaching a plateau about which the error oscillates.
The plots also indicate that although the training data set contained clips of the fields for t ∈ [20, 40], the trained PI-DNN was capable of generalizing to OOD clips during evaluation.

Downscaling Using Temperature Only: A Counter Example
In the literature, researchers have attempted to use temperature information only to recover the full system state (Agasthya et al., 2022; Charney et al., 1969). This approach, however, is not supported by theory (Farhat, Jolly, & Titi, 2015), and computational evidence (Altaf et al., 2017) provides a concrete example showing divergence of the downscaled fields from the true system state.
In this section, we examine the behavior of CDA and CDAnet in this context, namely when only temperature data are available. Specifically, in the case of CDA, the source term is present in Equation (9) only, whereas for CDAnet, the input to the PI-DNN consists of coarse temperature data only. In both cases, we are of course interested in evaluating the full system state on a fine-resolution mesh.

Figure 5. Generalizability of CDAnet. Evolution of the RRMSE for different Rayleigh numbers, as indicated. The PI-DNN is trained using RB data at Ra = 10^6 with (spatial, temporal) downscaling factors (4, 4), and is evaluated on datasets with Rayleigh numbers different from the one used for training.
To this end, we performed experiments at Ra = 10^5, with (spatial, temporal) downscaling factors (4, 4). To analyze the results, we contrast downscaled fields obtained with the PI-DNN trained with (a) all state variables used as input, (b) velocity and pressure used as input, and (c) only temperature used as input. For CDA, we also consider three analogous scenarios, assimilating (a) both velocity and temperature observations, (b) velocity observations only, and (c) temperature observations alone. Figure 8 shows the evolution of the RRMSE for all six cases outlined above. As expected (Altaf et al., 2017; Farhat, Jolly, & Titi, 2015), the results indicate that CDA recovers the true system state exponentially in time when coarse velocity and temperature observations are downscaled, or when only velocity observations are available. On the other hand, when only temperature observations are available, the downscaled CDA solution diverges from the true system state at late stages. Similar behaviors are observed for CDAnet. Specifically, in scenarios (a) and (b), the RRMSE of CDA tends to 0, whereas the CDAnet RRMSE oscillates around a small-amplitude plateau. In scenario (c), both predictions diverge at later stages, with comparable error values.
These experiments underscore the importance of performing downscaling within a well-defined framework that guarantees convergence of the downscaling algorithm or, alternatively, guarantees the existence of a learnable mapping.

Discussion and Conclusions
We proposed a PI-DNN, named CDAnet, that represents a surrogate model for downscaling fluid flows based on the CDA methodology, which theorizes a map, called the determining form map, from coarse observations of the system state to the fine solution. CDAnet was trained to predict the high-resolution solution, in both space and time, of a fluid flow given coarse observations of the system state. The proposed framework harnesses elements from super-resolution networks along with physics-informed techniques to learn the determining form map from data and the governing equations. We assessed the performance of the proposed PI-DNN using a number of numerical experiments that compare the predictions of CDAnet to solutions dynamically downscaled using CDA. In particular, the network was first used to learn the determining form map for flows at different Rayleigh (Ra) numbers and for different spatial and temporal downscaling factors. The outputs of CDAnet were then compared to those obtained by numerically integrating the CDA equations. The model was tested for generalizability in downscaling flows at Ra numbers different from the one it was originally trained on. Finally, the study was concluded by testing the proposed methodology at downscaling RB convection without using temperature observations, as justified theoretically by Farhat, Jolly, and Titi (2015).

Table 4
Performance of CDAnet for Ra = 10^5 Without Temperature Observations
Note. The PI-DNN is evaluated on a data set of RB convection at Ra = 10^5 and δt = 0.1 without temperature information. The RRMSE of the solution variables are reported along with the training and validation losses for different combinations of the spatial and temporal downscaling factors.
The RRMSE was adopted as the error metric for evaluating the performance of the PI-DNN during training, based on which the best model was selected. For Ra = 10^5, the model was able to downscale coarse observations up to a spatial downscaling factor of 16 with RRMSEs below 5%. Similarly, the RRMSEs of the solution variables were below 3% for Ra = 10^6 and below 4% for Ra = 10^7. Numerical experiments demonstrated that the trained CDAnet is able to downscale observations of the flow at time instances not included in the training data set. The evolution curves of the RRMSE all show that, beyond the initial transient, the errors oscillate about a mean error level. The errors of the downscaled fields were larger for CDAnet than for numerical downscaling via CDA. CDAnet was shown to generalize to flows at Ra numbers different from the one trained on. Specifically, numerical experiments show that a network trained to downscale flows at a specific Ra number provides robust predictions for flows at neighboring Ra numbers, with only a marginal increase in RRMSE. Moreover, the framework was slightly modified to accommodate downscaling without coarse temperature data. In particular, the network was made deeper and wider to learn this more complex mapping. The trained models achieved RRMSE values below 5% for all solution variables and all downscaling-factor combinations considered. On the other hand, when only temperature data is assimilated or used as input, both CDA and CDAnet failed to recover the fine-scale truth. This highlights the need to perform assimilation or learning in a setting that guarantees convergence of the downscaling algorithm or the existence of mappings that can be approximated by neural networks.
The current PI-DNN was trained to generate a deterministic mapping of perfect low-resolution information onto high-resolution fields. In environmental flow applications, low-resolution fields are nowadays generally provided as ensembles of possible scenarios, raising the question of how to handle such uncertain inputs. In such settings, one could pursue training networks with random parameters (Gagne et al., 2020; Ravuri et al., 2021) or, alternatively, explore using deterministic mappings to lift solution statistics. We plan to explore both avenues in future work.
In summary, the proposed framework offers a PI-DNN architecture and training pipeline for learning the determining form map of a dissipative dynamical system. Although currently tested on a 2D time-varying flow, the framework can be extended to more complex settings, such as 3D time-dependent flows or environmental flow predictions based on general circulation models. Such extensions will be the focus of follow-on work.