For more information visit: https://cemse.kaust.edu.sa/

Recent Submissions

  • To Encourage or to Restrict: the Label Dependency in Multi-Label Learning

    Yang, Zhuo (2022-06) [Dissertation]
    Advisor: Zhang, Xiangliang
    Committee members: Wang, Di; Moshkov, Mikhail; Feng, Zhuo
    Multi-label learning addresses the problem that one instance can be associated with multiple labels simultaneously. Understanding and exploiting the Label Dependency (LD) is well accepted as the key to building high-performance multi-label classifiers, i.e., classifiers with abilities including, but not limited to, generalizing well on clean data and being robust under evasion attack. From the perspective of generalization on clean data, previous works have proved the advantage of exploiting LD in multi-label classification. To further verify the positive role of LD in multi-label classification and address previous limitations, we propose an approach named Prototypical Networks for Multi-Label Learning (PNML). Specifically, PNML addresses multi-label classification by estimating the positive and negative class distributions of each label in a shared nonlinear embedding space. PNML achieves State-Of-The-Art (SOTA) classification performance on clean data. From the perspective of robustness under evasion attack, we are the first to define the attackability of a multi-label classifier as the expected maximum number of flipped decision outputs obtained by injecting budgeted perturbations into the feature distribution of the data. Denote the attackability of a multi-label classifier as C∗; the empirical evaluation of C∗ is an NP-hard problem. We thus develop a method named Greedy Attack Space Exploration (GASE) to estimate C∗ efficiently. More interestingly, we derive an information-theoretic upper bound for the adversarial risk faced by multi-label classifiers. The bound unveils the key factors determining the attackability of multi-label classifiers and points out the negative role of LD in multi-label classifiers’ adversarial robustness, i.e., LD helps the transfer of attack across labels, which makes multi-label classifiers more attackable. Going one step further, inspired by the derived bound, we propose a Soft Attackability Estimator (SAE) and further develop Adversarial Robust Multi-label learning with regularized SAE (ARM-SAE) to improve the adversarial robustness of multi-label classifiers. This work gives a more comprehensive understanding of LD in multi-label learning. The exploitation of LD should be encouraged because of its positive role in models’ generalization on clean data, but restricted because of its negative role in models’ adversarial robustness.
  • Charging Techniques for UAV-Assisted Data Collection: Is Laser Power Beaming the Answer?

    Lahmeri, Mohamed-Amine; Kishk, Mustafa A.; Alouini, Mohamed-Slim (IEEE Communications Magazine, Institute of Electrical and Electronics Engineers (IEEE), 2022-05-17) [Article]
    As COVID-19 has increased the need for connectivity around the world, researchers are targeting new technologies that could improve coverage and connect the unconnected in order to make progress toward the United Nations Sustainable Development Goals. In this context, drones are seen as one of the key features of 6G wireless networks that could extend the coverage of previous wireless network generations. That said, limited onboard energy seems to be the main drawback that hinders the use of drones for wireless coverage. Therefore, different wireless and wired charging techniques, such as laser beaming, charging stations, and tether stations, are proposed. In this article, we analyze and compare these different charging techniques by performing extensive simulations for the scenario of drone-assisted data collection from ground-based Internet of Things devices. We analyze the strengths and weaknesses of each charging technique, and finally show that laser-powered drones strongly compete with, and outperform in some scenarios, other charging techniques.
  • Reconfigurable Intelligent Surface Enabled Interference Nulling and Signal Power Maximization in mmWave bands

    Ye, Jia; Kammoun, Abla; Alouini, Mohamed-Slim (IEEE Transactions on Wireless Communications, Institute of Electrical and Electronics Engineers (IEEE), 2022-05-17) [Article]
    Reconfigurable intelligent surface (RIS) has emerged as a promising means to enhance wireless transmission. The effective reflected paths provided by RIS are able to alleviate the susceptibility to blockage effects, especially in high-frequency band communications, where signals experience severe path loss and high directivity. This paper is concerned with an RIS-assisted system over the millimeter wave (mmWave) channel characterized by sparse propagation paths. A base station (BS) tries to connect with the desired user through an RIS, while the undesired user unavoidably also receives the signal transmitted from the BS, which is treated as the interference signal. All terminals are assumed to be equipped with a single antenna for the sake of simplicity. The paper aims to propose an appropriate design of the phase shifts of each element at the RIS so as to maximize the received power of the signal transmitted from the BS at the desired user, while nulling the received interference signal power at the undesired user. The proposed reflecting design relies on the decomposition of the reflecting beamforming vectors and all channel path vectors into Kronecker products of uni-modulus vector factors. By exploiting characteristics of Kronecker mixed products, different factors of the reflecting beamforming vector are designed for either nulling the interference signal at the undesired user, or coherently combining data paths at the desired user. Furthermore, a channel estimation strategy is proposed to enable the proposed reflecting beamforming design. The magnitude, azimuth, and elevation arrival and departure angles of desired and undesired paths are estimated by an efficient two-dimensional (2-D) line spectrum optimization technique based on the atomic norm minimization (ANM) framework. The performance of the reflecting designs and channel estimation scheme is analyzed and demonstrated by simulation results.
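The role of the Kronecker mixed-product property in the abstract above can be illustrated with a toy example. This is a minimal sketch, not the paper's design: it assumes the effective channel seen through the surface reduces to a Hermitian inner product between a reflecting vector and a path vector, and the 2 x 4 factorisation, the phase values, and the helper names `kron`, `inner`, and `steer` are all hypothetical.

```python
import cmath

def kron(a, b):
    """Kronecker product of two vectors given as Python lists."""
    return [x * y for x in a for y in b]

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def steer(n, phase):
    """Uni-modulus vector [1, e^{j*phase}, ..., e^{j*(n-1)*phase}]."""
    return [cmath.exp(1j * k * phase) for k in range(n)]

# Hypothetical reflecting vector built from two uni-modulus factors (2 x 4 = 8 elements).
w1, w2 = steer(2, 0.3), steer(4, 1.1)
w = kron(w1, w2)

# Assumed desired path: both factors match the reflecting vector, so all terms add coherently.
h_des = kron(steer(2, 0.3), steer(4, 1.1))

# Assumed interfering path: its second factor is orthogonal to w2
# (phase offset of 2*pi/4 over 4 elements), so the whole product is nulled.
h_int = kron(steer(2, 2.0), steer(4, 1.1 + 2 * cmath.pi / 4))

gain_des = inner(w, h_des)  # equals inner(w1, .) * inner(w2, .) by the mixed-product property
gain_int = inner(w, h_int)  # one zero factor is enough to null the product

print(abs(gain_des), abs(gain_int))
```

The point of the sketch is that the inner product of two Kronecker products factorises, so one factor can be spent on nulling while the remaining factor is still free to align with the desired path.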
  • Ergodic Capacity Analysis of UAV-based FSO Links over Foggy Channels

    Jung, Kug-Jin; Nam, Sung Sik; Alouini, Mohamed-Slim; Ko, Young-Chai (IEEE Wireless Communications Letters, Institute of Electrical and Electronics Engineers (IEEE), 2022-05-17) [Article]
    In this paper, we investigate the ergodic capacity of unmanned aerial vehicle (UAV)-based free space optics (FSO) links over a random foggy channel. More specifically, we derive the composite probability density function (PDF) and a close approximation for the moments of the composite PDF using the statistical model of a UAV-based 3D pointing error and a random foggy channel. With these, we obtain an upper bound and an asymptotic approximation of the ergodic capacity for the two possible detection techniques, intensity modulation/direct detection (IM/DD) and heterodyne detection, in the high and low signal-to-noise ratio (SNR) regimes. Computer-based Monte-Carlo simulations confirm all the presented analytic results.
  • Microstructural analysis of N-polar InGaN directly grown on a ScAlMgO4(0001) substrate

    Velazquez-Rizo, Martin; Najmi, Mohammed A.; Iida, Daisuke; Kirilenko, Pavel; Ohkawa, Kazuhiro (Applied Physics Express, IOP Publishing, 2022-05-17) [Article]
    We report the characterization of an N-polar InGaN layer deposited by metalorganic vapor-phase epitaxy on a ScAlMgO4(0001) (SAM) substrate without a low-temperature buffer layer. The InGaN layer was tensile-strained, and its stoichiometry corresponded to In0.13Ga0.87N. We also present the microstructural observation of the InGaN/SAM interface via integrated differential phase contrast-scanning transmission electron microscopy. The results show that the interface between N-polar InGaN and SAM occurs between the O atoms of the O–Sc SAM surface and the (Ga,In) atoms of InGaN.
  • Spatio-Temporal Cross-Covariance Functions under the Lagrangian Framework with Multiple Advections

    Salvaña, Mary Lai O.; Lenzi, Amanda; Genton, Marc G. (Journal of the American Statistical Association, Informa UK Limited, 2022-05-17) [Article]
    When analyzing the spatio-temporal dependence in most environmental and earth sciences variables such as pollutant concentrations at different levels of the atmosphere, a special property is observed: the covariances and cross-covariances are stronger in certain directions. This property is attributed to the presence of natural forces, such as wind, which cause the transport and dispersion of these variables. These spatio-temporal dynamics prompted the use of the Lagrangian reference frame alongside any Gaussian spatio-temporal geostatistical model. Under this modeling framework, a whole new class emerged, known as the class of spatio-temporal covariance functions under the Lagrangian framework, with several developments already established in the univariate setting, in both stationary and nonstationary formulations, but less so in the multivariate case. Despite the many advances in this modeling approach, efforts have yet to be directed to probing the case for the use of multiple advections, especially when several variables are involved. Accounting for multiple advections would make the Lagrangian framework a more viable approach in modeling realistic multivariate transport scenarios. In this work, we establish a class of Lagrangian spatio-temporal cross-covariance functions with multiple advections, study its properties, and demonstrate its use on a bivariate pollutant dataset of particulate matter in Saudi Arabia.
  • Role of C-Reactive Protein in Diabetic Inflammation

    Stanimirovic, Julijana; Radovanovic, Jelena; Banjac, Katarina; Obradovic, Milan; Essack, Magbubah; Zafirovic, Sonja; Gluvic, Zoran; Gojobori, Takashi; Isenovic, Esma (Mediators of Inflammation, Hindawi Limited, 2022-05-17) [Article]
    Even though type 2 diabetes mellitus (T2DM) represents a worldwide chronic health issue that affects about 462 million people, specific underlying determinants of insulin resistance (IR) and impaired insulin secretion are still unknown. There is growing evidence that chronic subclinical inflammation is a triggering factor in the origin of T2DM. Increased C-reactive protein (CRP) levels have been linked to excess body weight since adipocytes produce tumor necrosis factor α (TNF-α) and interleukin 6 (IL-6), which are pivotal factors for CRP stimulation. Furthermore, it is known that hepatocytes produce relatively low rates of CRP in physiological conditions compared to T2DM patients, in which elevated levels of inflammatory markers are reported, including CRP. CRP also participates in endothelial dysfunction, the production of vasodilators, and vascular remodeling, and increased CRP level is closely associated with vascular system pathology and metabolic syndrome. In addition, insulin-based therapies may alter CRP levels in T2DM. Therefore, determining and clarifying the underlying CRP mechanism of T2DM is imperative for novel preventive and diagnostic procedures. Overall, CRP is one of the possible targets for T2DM progression and understanding the connection between insulin and inflammation may be helpful in clinical treatment and prevention approaches.
  • Acoustic Beam Splitting and Cloaking Based on a Compressibility-Near-Zero Medium

    Xu, Changqing; Huang, Sibo; Guo, Zhiwei; Jiang, Haitao; Li, Yong; Wu, Ying; Chen, Hong (Physical Review Applied, American Physical Society (APS), 2022-05-16) [Article]
    We report an artificial acoustic compressibility-near-zero medium made of a phononic crystal composed of epoxy blocks arranged in a square lattice. Its anisotropic effective density leads to a linear cross in its isofrequency contour in the vicinity of the Brillouin zone center, as its effective compressibility approaches zero. When a Gaussian beam is normally incident on the phononic crystal, a splitting effect is achieved at the frequency of the crossing point. Based on such a beam-splitting effect, an acoustic cloaking of an irregular-shaped object embedded in the phononic crystal is demonstrated both theoretically and experimentally. Such an anisotropic zero-index material offers a potential method to control acoustic waves.
  • Enhancing QoS Through Fluid Antenna Systems over Correlated Nakagami-m Fading Channels

    Tlebaldiyeva, Leila; Nauryzbayev, Galymzhan; Arzykulov, Sultangali; Eltawil, Ahmed; Tsiftsis, Theodoros (IEEE, 2022-05-16) [Conference Paper]
    Fluid antenna systems (FAS) enable mechanically flexible antennas that offer adaptability and flexibility for modern communication devices. In this work, we present a conceptual model for a single-antenna N-port (SANP) FAS over spatially correlated Nakagami-m fading channels and compare it with the traditional diversity schemes in terms of outage probability. The proposed FAS model switches to the best antenna port and resembles the operation of selection combining (SC) diversity. FAS improves the quality of service (QoS) of the network through antenna port selection. The advantage of FAS is the ability to fit hundreds of antenna ports into a half-wavelength antenna size at the cost of spatial channel correlation. Simulation results demonstrate the superior outage probability performance of FAS at several tens of antenna ports compared to traditional diversity schemes such as maximum ratio combining, equal gain combining, and SC. Moreover, novel probability density and cumulative distribution functions for the land mobile correlated Nakagami-m random variates are evaluated in this paper.
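The best-port selection described above can be sketched with a small Monte Carlo estimate of outage probability. This is an idealised illustration only: it assumes the ports fade independently, whereas the paper's model is built around spatially correlated ports, so the numbers here overstate the benefit of adding ports. The function name `sc_outage`, the unit-mean normalisation, and the threshold are hypothetical choices.

```python
import random

def sc_outage(n_ports, m, threshold, trials=20000, seed=0):
    """Estimate P(best port power gain < threshold) by Monte Carlo.

    Assumption: ports are independent. The power gain of a Nakagami-m
    channel is Gamma distributed with shape m and scale 1/m (unit mean).
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        # Selection: keep only the strongest port in this realisation.
        best = max(rng.gammavariate(m, 1.0 / m) for _ in range(n_ports))
        if best < threshold:
            outages += 1
    return outages / trials

single_port = sc_outage(n_ports=1, m=1.0, threshold=0.5)
four_ports = sc_outage(n_ports=4, m=1.0, threshold=0.5)
print(single_port, four_ports)
```

With m = 1 the model reduces to Rayleigh fading, for which the single-port outage has the closed form 1 - exp(-threshold); this gives a quick sanity check on the simulation.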
  • Joint Beamforming and Clustering for Energy Efficient Multi-Cloud Radio Access Networks

    Reifert, Robert-Jeron; Ahmad, Alaa Alameer; Dahrouj, Hayssam; Chaaban, Anas; Sezgin, Aydin; Al-Naffouri, Tareq Y.; Alouini, Mohamed-Slim (IEEE, 2022-05-16) [Conference Paper]
    The tremendous growth of data traffic in mobile communication networks (MCNs) and the associated exponential increase in mobile devices’ numbers necessitate the use of multi-cloud radio access networks (MC-RANs) as a viable solution to cope with the requirements of next-generation MCNs (6G). In MC-RANs, each central processor (CP) manages the signal processing of its own set of base stations (BSs), and so the system performance becomes a function of the joint intra-cloud and inter-cloud interference mitigation techniques. To this end, this paper considers the problem of maximizing the network-wide energy efficiency (EE) subject to user-to-cloud association, fronthaul capacity, maximum transmit power, and achievable rate constraints, so as to determine the joint beamforming vector of each user and the user-to-cloud association strategy. The paper tackles the non-convex and mixed discrete-continuous nature of the problem formulation using fractional programming (FP) and inner-convex approximation (ICA) techniques, as well as $\ell_0$-norm relaxation heuristics, and shows how the proposed approach can be implemented in a distributed fashion via a reasonable amount of information exchange across the CPs. The paper's simulations highlight the appreciable algorithmic efficiency of the proposed approach over state-of-the-art schemes.
  • Existence and weak-strong uniqueness for Maxwell-Stefan-Cahn-Hilliard systems

    Huo, Xiaokai; Jüngel, Ansgar; Tzavaras, Athanasios (arXiv, 2022-05-13) [Preprint]
    A Maxwell-Stefan system for fluid mixtures with driving forces depending on Cahn-Hilliard-type chemical potentials is analyzed. The corresponding parabolic cross-diffusion equations contain fourth-order derivatives and are considered in a bounded domain with no-flux boundary conditions. The main difficulty of the analysis is the degeneracy of the diffusion matrix, which is overcome by proving the positive definiteness of the matrix on a subspace and using the Bott-Duffin matrix inverse. The global existence of weak solutions and a weak-strong uniqueness property are shown by a careful combination of (relative) energy and entropy estimates, yielding $H^2(\Omega)$ bounds for the densities, which cannot be obtained from the energy or entropy inequalities alone.
  • Structural engineering approach for designing foil-based flexible capacitive pressure sensors

    Mishra, Rishabh B.; Al-Modaf, Fhad; Babatain, Wedyan; Hussain, Aftab M.; Elatab, Nazek (IEEE Sensors Journal, IEEE, 2022-05-11) [Article]
    Structural engineering plays an essential role in designing, improving, and optimizing an electromechanical system, instinctively affecting its performance. In this study, design optimization, finite element analysis, and experimental evaluation of capacitive pressure sensors were conducted. The air pressure sensing application was demonstrated to characterize different sensors, which include a combination of multiple rectangular cantilevers and diaphragms (square and circular-shaped). After the design improvement, we found that, among the investigated designs combining square and circular diaphragms with cantilevers, the square and circular diaphragms each with two trapezoidal cantilevers exhibited the highest sensitivity in air pressure monitoring. These designs were then selected for further analysis for acoustic pressure monitoring. The sensors were fabricated using the do-it-yourself technique with household materials such as post-it paper, posted tape, and foil. Our approach offers an alternative to the conventional cleanroom fabrication technique and uses easily available materials to fabricate affordable sensors. Therefore, this is the first step toward the development of democratized and sustainable electronic devices that are affordable and available to everyone on the internet.
  • Drone Charging Stations Deployment in Rural Areas for Better Wireless Coverage: Challenges and Solutions

    Qin, Yujie; Kishk, Mustafa Abdelsalam; Alouini, Mohamed-Slim (IEEE Internet of Things Magazine, Institute of Electrical and Electronics Engineers (IEEE), 2022-05-11) [Article]
    While fifth-generation (5G) cellular is meant to deliver gigabit peak data speeds, low latency, and connection to billions of devices, and 6G is already on the way, half of the world population living in rural areas are still facing challenges connecting to the internet. Compared to urban areas, users in rural areas are greatly impacted by low income, high cost of backhaul connectivity, limited resources, extreme weather, and natural geographical limitations. Therefore, how to connect the rural areas and the difficulties of providing connectivity draw great attention. This article first provides a brief discussion about existing technologies and strategies for enhancing the network coverage in rural areas, and their advantages, limitations, and cost. Next, we mainly focus on the UAV-assisted network in resource-limited regions. Considering the limitation of the onboard battery of UAVs and the electricity supply scarcity in some rural regions, we investigate the possibility and performance enhancement of the deployment of renewable energy (RE) charging stations. We outline three practical scenarios, and use simulation results to demonstrate that RE charging stations can be a possible solution to address the limited onboard battery of UAVs in rural areas, especially when they can harvest and store enough energy. Finally, future works and challenges are discussed.
  • A fully-screen printed, multi-layer process for bendable mm-wave antennas

    Akhter, Zubair; Li, Weiwei; Yu, Yiyang; Shamim, Atif (IEEE, 2022-05-11) [Conference Paper]
    In the era of the Internet of Things (IoT) and wearable electronics, printing techniques such as screen printing are becoming popular because of their lower costs and mass manufacturing abilities. However, most of the previous work has been done on printing metallic patterns and not on printing substrates. In this paper, we introduce a custom screen-printable dielectric ink (polymer mixed with ceramics), which provides lower loss even at millimeter-wave (mm-wave) bands. With the help of the dielectric ink and a custom silver nanowire (AgNW) based metallic ink, a multilayer, fully screen-printed fabrication process has been developed. To demonstrate the efficacy of the proposed inks and the multilayer printing process, a stacked patch antenna with four parasitic patches in the superstrate is designed, fabricated, and tested for the mm-wave band (5G band). Despite the new fabrication process, the measured results show decent antenna performance (in both flat and bent positions), where the input impedance is matched from 26.5 to 30 GHz and a maximum gain of 7.8 dBi has been attained.
  • Co-Design of Dual-Purpose Heatsink Antenna for Multi-Source Ambient Energy Harvesting

    Bakytbekov, Azamat; Shamim, Atif (IEEE, 2022-05-11) [Conference Paper]
    IoT infrastructure involves billions of devices that must be self-sustainable. Using ambient energy sources to power IoT devices is a promising solution. Ambient RF and thermal energy (diurnal temperature fluctuations) harvesters have great potential since both are available continuously. Smart integration is required for these two harvesters to create synergy and collect more energy. Here, a dual-purpose triple-band heatsink antenna for multi-source ambient energy harvesting is presented. The heatsink antenna serves as a receiving antenna for the RF energy harvester and as a heatsink for the thermal energy harvester (TEH). Co-optimization of the heatsink antenna is performed in Ansys HFSS and Ansys Fluent simultaneously. The heatsink antenna operates at the GSM900, GSM1800, and 3G bands with measured gains of 3.8 dB, 4 dB, and 5.3 dB, respectively. The antenna gain is doubled (~3 dB) and the TEH performance is tripled (200%) when the heatsink fins are integrated, emphasizing the benefit of the co-design and smart integration via the heatsink antenna.
  • The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization

    Kovalev, Dmitry; Gasnikov, Alexander (arXiv, 2022-05-11) [Preprint]
    In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem. Zhang et al. (2021) and Ibrahim et al. (2020) established the lower bound $\Omega\left(\sqrt{\kappa_x \kappa_y} \log \frac{1}{\epsilon}\right)$ on the number of gradient evaluations required to find an $\epsilon$-accurate solution, where $\kappa_x$ and $\kappa_y$ are condition numbers for the strong convexity and strong concavity assumptions. However, the existing state-of-the-art methods do not match this lower bound: algorithms of Lin et al. (2020) and Wang and Li (2020) have gradient evaluation complexity $\mathcal{O}\left(\sqrt{\kappa_x \kappa_y} \log^3 \frac{1}{\epsilon}\right)$ and $\mathcal{O}\left(\sqrt{\kappa_x \kappa_y} \log^3 (\kappa_x \kappa_y) \log \frac{1}{\epsilon}\right)$, respectively. We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x \kappa_y} \log \frac{1}{\epsilon}\right)$ gradient evaluation complexity. We design our algorithm in three steps: (i) we reformulate the original problem as a minimization problem via the pointwise conjugate function; (ii) we apply a specific variant of the proximal point algorithm to the reformulated problem; (iii) we compute the proximal operator inexactly using the optimal algorithm for operator norm reduction in monotone inclusions.
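To give a feel for the size of the gap between the rates quoted in this abstract, the short sketch below evaluates the three expressions for one arbitrary setting. It is only an order-of-magnitude illustration: the constants hidden by the O and Ω notation are ignored, and the values of the condition numbers and the target accuracy are assumptions chosen for the example.

```python
import math

def lower_bound(kx, ky, eps):
    """sqrt(kx*ky) * log(1/eps): the lower bound, matched by the new algorithm."""
    return math.sqrt(kx * ky) * math.log(1 / eps)

def lin_et_al(kx, ky, eps):
    """sqrt(kx*ky) * log^3(1/eps): rate attributed to Lin et al. (2020)."""
    return math.sqrt(kx * ky) * math.log(1 / eps) ** 3

def wang_li(kx, ky, eps):
    """sqrt(kx*ky) * log^3(kx*ky) * log(1/eps): rate attributed to Wang and Li (2020)."""
    return math.sqrt(kx * ky) * math.log(kx * ky) ** 3 * math.log(1 / eps)

kx, ky, eps = 100.0, 100.0, 1e-6  # assumed example values
base = lower_bound(kx, ky, eps)
print("Lin et al. / lower bound:", lin_et_al(kx, ky, eps) / base)
print("Wang and Li / lower bound:", wang_li(kx, ky, eps) / base)
```

The first ratio equals log^2(1/eps) and the second equals log^3(kx*ky), so the extra logarithmic factors grow with the target accuracy and with the conditioning, respectively.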
  • Efficient Video Grounding with Which-Where Reading Comprehension

    Gao, Jialin; Sun, Xin; Ghanem, Bernard; Zhou, Xi; Ge, Shiming (IEEE Transactions on Circuits and Systems for Video Technology, Institute of Electrical and Electronics Engineers (IEEE), 2022-05-10) [Article]
    Video grounding aims at localizing the temporal moment related to the given language description, which is very helpful to many cross-modal content understanding applications like visual question answering and sentence-video search. Existing approaches usually directly regress the temporal boundaries of an event described by a query sentence in the video sequence. This direct regression manner often encounters a large decision space due to diverse target events and variable video durations, leading to inaccurate localization as well as inefficient grounding. This paper presents an efficient framework termed “from which to where” to facilitate video grounding. The core idea is imitating the reading comprehension process to gradually narrow the decision space, in which we decompose the direct regression into two steps. The “which” step first roughly selects a candidate area by evaluating which video segment in the predefined set is closest to the ground truth. To this end, we formulate this step as a multi-choice reading comprehension problem and propose a criterion to select the best-matched segment. In this way, the excessive decision space is effectively reduced. The “where” step aims to precisely regress the temporal boundary of the selected video segment from the shrunk decision space. We thus introduce a triple-span representation for each candidate video segment to use the regional context for better boundary regression. The “which” and “where” steps can be combined into a unified framework and learned end-to-end, leading to an efficient video grounding system. Extensive experiments on Charades-STA, ActivityNet-Captions, and TACoS benchmarks clearly demonstrate the effectiveness of our framework.
  • On the Performance Optimization of Two-way Hybrid VLC/RF based IoT System over Cellular Spectrum

    Ghosh, Sutanu; Alouini, Mohamed-Slim (arXiv, 2022-05-10) [Preprint]
    This paper investigates the system outage performance of a useful architecture of two-way hybrid visible light communication/radio frequency (VLC/RF) communication using the overlay mode of a cooperative cognitive radio network (CCRN). The demand for high-data-rate applications can be met using the VLC link, while communication over a wide coverage area with high reliability can be achieved through the RF link. In the proposed architecture, cooperative communication between two licensed user (LU) nodes is accomplished via an aggregation agent (AA). The AA can act as a relay node and, in return, can access the LU spectrum for two-way communications with an Internet-of-Things (IoT) device. First, closed-form expressions of the outage probability of both LU and IoT communication are established. On the basis of these expressions, optimization problems are formulated to achieve the minimum outage probability of both the LU and IoT networks. Finally, simulation results show the impact of both VLC and RF system parameters on the outage probability and throughput of these systems.
  • Federated Random Reshuffling with Compression and Variance Reduction

    Malinovsky, Grigory; Richtarik, Peter (arXiv, 2022-05-10) [Preprint]
    Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded and often set as default in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al., 2021), with superior performance when compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms to improve FedRR further: compressed FedRR and two variance-reduced extensions, one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate dependence on the compression parameter, and applying additional controlled linear perturbations for Random Reshuffling, introduced by Malinovsky et al. (2021), helps to eliminate variance at the optimum. We provide the first analysis of compressed local methods under standard assumptions without bounded gradient assumptions and for heterogeneous data, overcoming the limitations of the compression operator. We corroborate our theoretical results with experiments on synthetic and real data sets.
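The sampling-without-replacement idea behind Random Reshuffling can be shown on a toy problem. The sketch below is plain single-machine RR on a one-parameter least-squares fit; it is not FedRR and includes none of the compression or variance-reduction mechanisms from the abstract. The function name, step size, and data are hypothetical choices for illustration.

```python
import random

def random_reshuffling(data, lr=0.1, epochs=50, seed=0):
    """Fit y ~ w * x by SGD, sampling without replacement within each epoch."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)                # draw a fresh permutation every epoch
        for i in order:                   # each sample is used exactly once per epoch
            x, y = data[i]
            grad = 2.0 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2
            w -= lr * grad
    return w

# Noise-free toy data generated with slope 3.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w = random_reshuffling(data)
print(w)
```

Standard SGD would instead draw an index with replacement at every step (for example with `rng.randrange(len(data))`), so some samples may be visited several times in an epoch and others not at all.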
  • H2Opus: a distributed-memory multi-GPU software package for non-local operators

    Zampini, Stefano; Boukaram, Wagih Halim; Turkiyyah, George; Knio, Omar; Keyes, David E. (Advances in Computational Mathematics, Springer Science and Business Media LLC, 2022-05-10) [Article]
    Hierarchical H2-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their O(N) complexity in both memory and operator application makes them particularly suited for large-scale problems. As a result, there is a need for software that provides support for distributed operations on these matrices to allow large-scale problems to be represented. In this paper, we present high-performance, distributed-memory GPU-accelerated algorithms and implementations for matrix-vector multiplication and matrix recompression of hierarchical matrices in the H2 format. The algorithms are a new module of H2Opus, a performance-oriented package that supports a broad variety of H2 matrix operations on CPUs and GPUs. Performance in the distributed GPU setting is achieved by marshaling the tree data of the hierarchical matrix representation to allow batched kernels to be executed on the individual GPUs. MPI is used for inter-process communication. We optimize the communication data volume and hide much of the communication cost with local compute phases of the algorithms. Results show near-ideal scalability up to 1024 NVIDIA V100 GPUs on Summit, with performance exceeding 2.3 Tflop/s/GPU for the matrix-vector multiplication, and 670 Gflop/s/GPU for matrix compression, which involves batched QR and SVD operations. We illustrate the flexibility and efficiency of the library by solving a 2D variable diffusivity integral fractional diffusion problem with an algebraic multigrid-preconditioned Krylov solver and demonstrate scalability up to 16M degrees of freedom problems on 64 GPUs.
