### Recent Submissions

• #### Using the IR as a Research Data Registry

(2018-05)
As data and software become increasingly common research outputs, universities have an opportunity to expand their existing efforts to record affiliated publications so that they also capture information about research data releases. At KAUST we have taken several steps to put our repository on a path towards becoming a reliable registry for information about the existence and location of research data released by affiliated researchers. These included developing a process to retrospectively retrieve and register information about datasets with machine-readable relationships to publications already in the repository, and updates to our active publications tracking procedures so that data availability statements are retrieved at the time of harvesting and checked for references to research data. The presentation will conclude by discussing how these efforts help put the repository in a position to provide expanded services in support of improved research data management, including access to and preservation of research data not explicitly linked to a formal publication.
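The check of data availability statements for references to research data could be sketched roughly as follows. This is a minimal illustration, not the procedure used at KAUST: the function name `find_data_references`, the DOI regular expression, and the list of repository names are all assumptions introduced here.

```python
import re

# Assumed pattern: a common heuristic for DOI-like strings, not an official grammar.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)

# Hypothetical list of repository names that hint at a data release.
REPOSITORY_HINTS = ("zenodo", "figshare", "dryad", "github")


def find_data_references(statement):
    """Return DOI-like strings and repository names found in a statement."""
    # Strip trailing punctuation that usually belongs to the sentence, not the DOI.
    dois = [match.rstrip(".,;)") for match in DOI_PATTERN.findall(statement)]
    lowered = statement.lower()
    repositories = [name for name in REPOSITORY_HINTS if name in lowered]
    return {"dois": dois, "repositories": repositories}
```

A statement that yields no DOI and no repository hint would simply be left for manual review in this sketch.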
• #### 1057 Vemurafenib acts as an aryl hydrocarbon receptor antagonist

(Elsevier BV, 2018-04-19)
• #### 665 Nail lesions in 30 old inbred mouse strains

(Elsevier BV, 2018-04-19)
• #### Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

(2018-04-10)
Use H-matrices to approximate large covariance matrices in spatial statistics
• #### PredMP: A Web Resource for Computationally Predicted Membrane Proteins via Deep Learning

(Elsevier BV, 2018-02-06)
Experimental determination of membrane protein (MP) structures is challenging, as they are often too large for nuclear magnetic resonance (NMR) experiments and difficult to crystallize. Currently there are only about 510 non-redundant MPs with solved structures in the Protein Data Bank (PDB). To elucidate MP structures computationally, we developed a novel web resource, denoted PredMP (http://52.87.130.56:3001/#/proteinindex), that delivers one-dimensional (1D) annotation of the membrane topology and secondary structure, two-dimensional (2D) prediction of the contact/distance map, and three-dimensional (3D) modeling of the MP structure in the lipid bilayer, for each MP target from a given model organism. The precision of the computationally constructed MP structures is boosted by state-of-the-art deep learning methods as well as cutting-edge modeling strategies. In particular, (i) we annotate 1D properties via DeepCNF (Deep Convolutional Neural Fields), which models not only the complex sequence-structure relationship but also the interdependency between adjacent property labels; (ii) we predict the 2D contact/distance map through Deep Transfer Learning, which learns the patterns as well as the complex relationship between contacts/distances and protein features from non-membrane proteins; and (iii) we model the 3D structure by feeding its predicted contacts and secondary structure to the Crystallography & NMR System (CNS) suite combined with a membrane burial potential that is residue-specific and depth-dependent. PredMP currently contains more than 2,200 multi-pass transmembrane proteins (length<700 residues) from human. These transmembrane proteins are classified according to the IUPHAR/BPS Guide, which provides a hierarchical organization of receptors, channels, transporters, enzymes and other drug targets according to their molecular relationships and physiological functions.
Among these MPs, we estimated that our approach could predict correct folds for 1,345-1,871 targets, including a few hundred new folds, which should facilitate the discovery of drugs targeting MPs.

(2018-01-24)

(2018-01-24)

(2018-01-24)
• #### Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

(2017-11-01)
The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M\times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating unknown parameters, such as the covariance length, variance and smoothness parameter of a Matérn covariance function, by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is the expensive linear algebra caused by large and dense covariance matrices. Therefore, covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2n \log^2 n/p)$ and storage $\mathcal{O}(kn \log n)$, where the rank $k$ is a small integer (typically $k<25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate the approach on a synthetic example, where the true values of the parameters are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.
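To make the role of the Cholesky factorization concrete, here is a small dense sketch of the joint Gaussian log-likelihood, $-\frac{n}{2}\log(2\pi) - \frac{1}{2}\log\det C - \frac{1}{2}z^{T}C^{-1}z$, evaluated through $C = LL^{T}$. It is not HLIBCov and uses no $\mathcal{H}$-matrix approximation: it is a plain $\mathcal{O}(n^3)$ stand-in, it assumes 1D locations, and it fixes the Matérn smoothness at $1/2$ (the exponential covariance). All function names are hypothetical.

```python
import math


def exp_cov(locs, length, variance):
    """Exponential covariance (Matern with smoothness 1/2) on 1D locations."""
    return [[variance * math.exp(-abs(a - b) / length) for b in locs] for a in locs]


def cholesky(c):
    """Dense lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low, z):
    """Solve low * y = z for a lower-triangular matrix low."""
    y = []
    for i in range(len(z)):
        y.append((z[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def log_likelihood(locs, z, length, variance):
    """Joint Gaussian log-likelihood of zero-mean observations z."""
    low = cholesky(exp_cov(locs, length, variance))
    y = forward_solve(low, z)  # y = L^{-1} z, so z^T C^{-1} z = y^T y
    n = len(z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in y)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad
```

Parameter estimation would then maximize `log_likelihood` over `length` and `variance`; the dense factorization is the step that an $\mathcal{H}$-matrix approximation is meant to make affordable for large $n$.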
• #### Risk assessment of salt contamination of groundwater under uncertain aquifer properties

(2017-10-01)
One of the central topics in hydrogeology and environmental science is the investigation of salinity-driven groundwater flow in heterogeneous porous media. Our goals are to model and to predict pollution of water resources. We simulate a density-driven groundwater flow with uncertain porosity and permeability. This strongly non-linear model describes the unstable transport of salt water, which forms finger-shaped patterns. The computation requires a very fine unstructured mesh and, therefore, substantial computational resources. We run the highly parallel multigrid solver, based on ug4, on the Shaheen II supercomputer. An MPI-based parallelization is done in the geometrical as well as in the stochastic space. Every scenario is computed on 32 cores and requires a mesh with ~8M grid points and 1500 or more time steps. 200 scenarios are computed concurrently, so the total number of cores in the parallel computation is 200 × 32 = 6400. The main goals of this work are to estimate the propagation of uncertainties through the model and to investigate the sensitivity of the solution to the uncertain input parameters. Additionally, we demonstrate how the multigrid ug4-based solver can be applied as a black box in the uncertainty quantification framework.
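The core count quoted above follows from a two-level layout: one group of cores per scenario. The sketch below shows one way such a layout could be indexed. The contiguous block assignment and the function names are assumptions; the abstract does not state how ranks are actually assigned.

```python
SCENARIOS = 200          # scenarios computed concurrently (from the abstract)
CORES_PER_SCENARIO = 32  # cores used by each scenario (from the abstract)


def total_cores(scenarios=SCENARIOS, cores_per_scenario=CORES_PER_SCENARIO):
    """Total number of cores when every scenario gets its own group of cores."""
    return scenarios * cores_per_scenario


def locate(rank, cores_per_scenario=CORES_PER_SCENARIO):
    """Map a global rank to (scenario index, rank within that scenario).

    Assumes contiguous blocks: ranks 0..31 serve scenario 0, 32..63 scenario 1, ...
    """
    return divmod(rank, cores_per_scenario)
```

In an MPI program, the scenario index returned by `locate` could serve as the color when splitting the global communicator into per-scenario communicators, though that is only one possible design.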
• #### Spatio-temporal Characterization of Ligand-Receptor Interactions in Blood Stem-Cell Rolling

(2017-08-16)
One of the most important issues in research on hematopoietic stem/progenitor cells (HSPCs) is to understand the mechanism by which these cells home to the bone marrow after being transplanted into patients and establish the production of various blood cell types. The HSPCs first come into contact with the endothelial cells. This contact is known as adhesion and occurs through a multi-step paradigm ending with transmigration to the bone marrow niche. The initial step of homing, the tethering and rolling of HSPCs, is mediated by P- and E-selectins expressed on the endothelial cell surface through their interactions with the ligands expressed by HSPCs. Here we developed a novel experimental method to unravel the molecular mechanisms of the selectin-ligand interactions in vitro at the single-molecule level by combining microfluidics and single-molecule fluorescence imaging. Our method enables direct, real-time visualization of the nanoscale spatiotemporal dynamics of the E-selectin-ligand (PSGL-1) interactions at the molecular level while shear stress acts on the cells.
• #### Discrete Exterior Calculus Discretization of Incompressible Navier-Stokes Equations

(2017-05-23)
A conservative discretization of the incompressible Navier-Stokes equations over surface simplicial meshes is developed using discrete exterior calculus (DEC). Numerical experiments for flows over surfaces reveal second-order accuracy for the developed scheme when using structured-triangular meshes, and first-order accuracy otherwise. The mimetic character of many of the DEC operators provides exact conservation of both mass and vorticity, in addition to superior kinetic energy conservation. Using the barycentric Hodge star allows the discretization to admit arbitrary simplicial meshes. The discretization scheme is presented along with various numerical test cases demonstrating its main characteristics.
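As a small illustration of what "mimetic" means here, the sketch below builds the discrete exterior derivatives on a single oriented triangle and checks the identity $d_1 d_0 = 0$, a structural identity that DEC operators satisfy exactly rather than approximately. The mesh, the edge ordering, and the function names are assumptions chosen for illustration; this is not the discretization scheme of the presentation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# One triangle with vertices 0, 1, 2 and oriented edges
# e0 = (0 -> 1), e1 = (1 -> 2), e2 = (0 -> 2).
# D0 maps vertex values to edge differences (rows: edges, columns: vertices).
D0 = [
    [-1, 1, 0],
    [0, -1, 1],
    [-1, 0, 1],
]

# The face is oriented 0 -> 1 -> 2, so its boundary is e0 + e1 - e2.
# D1 maps edge values to face circulations (rows: faces, columns: edges).
D1 = [[1, 1, -1]]


def gradient_circulation(vertex_values):
    """Circulation around the face of the discrete gradient of a vertex field."""
    edge_values = matmul(D0, [[v] for v in vertex_values])
    return matmul(D1, edge_values)[0][0]
```

Because `matmul(D1, D0)` is the zero matrix, `gradient_circulation` returns zero for any vertex values, which is the discrete counterpart of the statement that the curl of a gradient vanishes.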
• #### CFD Modeling of a Multiphase Gravity Separator Vessel

(2017-05-23)
The poster highlights a CFD study that combines an Eulerian multi-fluid multiphase model with a Population Balance Model (PBM) to study the flow inside a typical multiphase gravity separator vessel (GSV) found in the oil and gas industry. The simulations were performed using the ANSYS Fluent CFD package running on the KAUST supercomputer, Shaheen. A scalability study is also presented, covering the effect of I/O bottlenecks and the use of the Hierarchical Data Format (HDF5) for collective and independent parallel reading of the case file. This work is an outcome of a research collaboration on an Aramco project on Shaheen.