The KAUST Repository is an initiative of the University Library to expand the impact of conference papers, technical reports, peer-reviewed articles, preprints, theses, images, data sets, and other research-related works of King Abdullah University of Science and Technology (KAUST).

Files in the repository are accessible through popular web search engines and are given persistent web addresses so links will not become broken over time.

KAUST researchers: To add your research to the repository, click on Deposit your Research, log in with your KAUST user name and password, and deposit the item in the appropriate collection.

If you have any questions, please contact

  • High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers

    Kamalakkannan, Kamalavasan; Mudalige, Gihan R.; Reguly, Istvan Z.; Fahmy, Suhaib A. (IEEE, 2021-05-17) [Conference Paper]
    This paper presents a workflow for synthesizing near-optimal FPGA implementations of structured-mesh-based stencil applications for explicit solvers. It leverages key characteristics of the application class and its computation-communication pattern and the architectural capabilities of the FPGA to accelerate solvers for high-performance computing applications. Key new features of the workflow are (1) the unification of standard state-of-the-art techniques with a number of high-gain optimizations such as batching and spatial blocking/tiling, motivated by increasing throughput for real-world workloads, and (2) the development and use of a predictive analytical model to explore the design space and obtain resource and performance estimates. Three representative applications are implemented using the design workflow on a Xilinx Alveo U280 FPGA, demonstrating near-optimal performance and over 85% predictive model accuracy. These are compared with equivalent highly-optimized implementations of the same applications on modern HPC-grade GPUs (Nvidia V100), analyzing time to solution, bandwidth, and energy consumption. Performance results indicate comparable runtimes with the V100 GPU, with over 2× energy savings for the largest non-trivial application on the FPGA. Our investigation shows the challenges of achieving high performance on current generation FPGAs compared to traditional architectures. We discuss determinants for a given stencil code to be amenable to FPGA implementation, providing insights into the feasibility and profitability of a design and its resulting performance.
  • Runtime Abstraction for Autonomous Adaptive Systems on Reconfigurable Hardware

    Bucknall, Alex R.; Fahmy, Suhaib A. (IEEE, 2021-02-01) [Conference Paper]
    Autonomous systems increasingly rely on on-board computation to avoid the latency overheads of offloading to more powerful remote computing. This requires the integration of hardware accelerators to handle the complex computations demanded by data-intensive sensors. FPGAs offer hardware acceleration with ample flexibility and interfacing capabilities when paired with general purpose processors, with the ability to reconfigure at runtime using partial reconfiguration. Managing dynamic hardware is complex and has been left to designers to address in an ad-hoc manner without first-class integration in autonomous software frameworks. This paper presents an abstracted runtime for managing adaptation of FPGA accelerators, including partial reconfiguration and parametric changes, that presents as a typical interface used in autonomous software systems. We present a demonstration using the Robot Operating System (ROS), showing negligible latency overhead as a result of the abstraction.
  • Protein-protein interactions decoys datasets for machine learning algorithm development

    Barradas Bautista, Didier; Almajed, Ali; Cavallo, Luigi; Kalnis, Panos; Oliva, Romina (KAUST Research Repository, 2021-01-20) [Dataset]
    This is the most complete and diverse protein docking decoys set derived from the Benchmark5, Scorers_set. We used three different rigid-body docking programs to generate the decoys for the Benchmark5. We analyzed all docking decoys with more than 150 different scoring functions from different sources (CCharPPI, FreeSASA, CIPS, CONSRANK). We provide a balanced and an unbalanced version of the data. The balanced data is intended for training and testing machine learning algorithms. The unbalanced data is provided to simulate the real-world scenario. We also provide a
  • Protein-protein benchmark5 decoys balanced dataset

    Barradas Bautista, Didier; Oliva, Romina; Cavallo, Luigi (KAUST Research Repository, 2021-01-19) [Dataset]
    This is the most complete and diverse protein docking decoys set derived from the Benchmark5. We used three different rigid-body docking programs to generate the decoys. We analyzed these docking decoys with more than 150 different scoring functions from different sources (CCharPPI, FreeSASA, CIPS, CONSRANK). This version of the dataset is balanced with the raw values from the scoring functions. This data is intended for the training of machine learning algorithms.
  • Symmetry-dependent field-free switching of perpendicular magnetization

    Liu, Liang; Zhou, Chenghang; Shu, Xinyu; Li, Changjian; Zhao, Tieyang; Lin, Weinan; Deng, Jinyu; Xie, Qidong; Chen, Shaohai; Zhou, Jing; Guo, Rui; Wang, Han; Yu, Jihang; Shi, Shu; Yang, Ping; Pennycook, S. J.; Manchon, Aurelien; Chen, Jingsheng (Nature Nanotechnology, Springer Science and Business Media LLC, 2021-01-18) [Article]
    Modern magnetic-memory technology requires all-electric control of perpendicular magnetization with low energy consumption. While spin–orbit torque (SOT) in heavy metal/ferromagnet (HM/FM) heterostructures [1–5] holds promise for applications in magnetic random access memory, until today, it has been limited to the in-plane direction. Such in-plane torque can switch perpendicular magnetization only deterministically with the help of additional symmetry breaking, for example, through the application of an external magnetic field [2,4], an interlayer/exchange coupling [6–9] or an asymmetric design [10–14]. Instead, an out-of-plane SOT [15] could directly switch perpendicular magnetization. Here we observe an out-of-plane SOT in an HM/FM bilayer of L1₁-ordered CuPt/CoPt and demonstrate field-free switching of the perpendicular magnetization of the CoPt layer. The low-symmetry point group (3m1) at the CuPt/CoPt interface gives rise to this spin torque, hereinafter referred to as 3m torque, which strongly depends on the relative orientation of the current flow and the crystal symmetry. We observe a three-fold angular dependence in both the field-free switching and the current-induced out-of-plane effective field. Because of the intrinsic nature of the 3m torque, the field-free switching in CuPt/CoPt shows good endurance in cycling experiments. Experiments involving a wide variety of SOT bilayers with low-symmetry point groups [16,17] at the interface may reveal further unconventional spin torques in the future.
