Recent Submissions

  • Optimization Specifications for CUDA Code Restructuring Tool

    Khan, Ayaz (2017-03-13)
    In this work, we developed a restructuring software tool (RT-CUDA), following the proposed optimization specifications, to bridge the gap between high-level languages and the machine-dependent CUDA environment. RT-CUDA takes a C program and converts it into an optimized CUDA kernel, using user directives in a configuration file to guide the compiler. RT-CUDA also allows transparent invocation of highly optimized external math libraries such as cuSPARSE and cuBLAS, enabling efficient design of linear algebra solvers. We expect RT-CUDA to be useful to many KSA industries that run science and engineering simulations on massively parallel hardware such as NVIDIA GPUs.
  • d3f: Parallel Simulation of Large-scale Groundwater Flow with ug4

    Wittum, Gabriel; Logashenko, Dmitry; Hoffer, Michael; Lampe, Michael; Nägel, Arne; Reiter, Sebastian; Vogel, Andreas (2017-03-13)
  • SPARTex: A Vertex-Centric Framework for RDF Data Analytics

    Abdelaziz, Ibrahim; Al-Harbi, Razen; Salihoglu, Semih; Kalnis, Panos; Mamoulis, Nikos (2017-03-13)
  • ScaleMine: Scalable Parallel Frequent Subgraph Mining in a Single Large Graph

    Abdelhamid, Ehab; Abdelaziz, Ibrahim; Kalnis, Panos; Khayyat, Zuhair; Jamour, Fuad Tarek (2017-03-13)
  • Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

    Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E. (2017-03-13)
  • Earthquake Ground Motion Analysis and extreme computing on multi-Petaflops machine

    De Martin, Florent; Dupros, Fabrice; Thierry, Philippe; Paciucci, Gabriele; Sochala, Pierre; Boulahya, Faïza; Benaichouche, Abed; Chaljub, Emmanuel; Hadri, Bilel; Ltaief, Hatem; Keyes, David E. (2017-03-13)
  • Batched Triangular DLA for Very Small Matrices on GPUs

    Charara, Ali; Keyes, David E.; Ltaief, Hatem (2017-03-13)
    In several scientific applications, such as tensor contractions in deep learning or data compression in hierarchical low-rank matrix approximation, the bulk of the computation typically consists of thousands of independent dense linear algebra operations on very small matrices (usually of dimension less than 100). Batched dense linear algebra kernels are becoming ubiquitous for such computations. Within a single API call, these kernels can launch a large number of similar matrix computations simultaneously, removing the expensive overhead of many separate API calls while increasing utilization of the underlying hardware.
  • High-resolution seismic wave propagation using local time stepping

    Peter, Daniel; Rietmann, Max; Galvez, Percy; Ampuero, Jean Paul (2017-03-13)
    High-resolution seismic wave simulations often require local mesh refinement to accurately capture features such as steep topography or complex fault geometry. With explicit time schemes, numerical stability conditions then force a much smaller global time step for ground-motion simulations, because the step is limited by the smallest elements. To alleviate this problem, local time stepping (LTS) algorithms let an explicit scheme adapt the time step to the element size, allowing near-optimal time steps everywhere in the mesh. This can lead to significantly faster simulation runtimes.
  • Implicit Unstructured Aerodynamics on Emerging Multi- and Many-Core HPC Architectures

    Al Farhan, Mohammed A.; Kaushik, Dinesh K.; Keyes, David E. (2017-03-13)
    PETSc-FUN3D, an unstructured tetrahedral mesh Euler code previously characterized for distributed memory Single Program, Multiple Data (SPMD) execution on thousands of nodes, is hybridized with shared memory Single Instruction, Multiple Data (SIMD) parallelism for hundreds of threads per node. We explore thread-level performance optimizations on state-of-the-art multi- and many-core Intel processors, including the second generation of Xeon Phi, Knights Landing (KNL). We study performance on KNL under different memory and cluster mode configurations, with code optimizations that minimize indirect addressing and enhance cache locality. The optimizations employed are expected to be of value to other unstructured applications as many-core architectures evolve.
  • Toward a fault-tolerant operational ensemble data assimilation forecasting system for the Red Sea

    Toye, Habib; Kortas, Samuel; Zhan, Peng; Hoteit, Ibrahim (2017-03-13)
  • Exploration Of Deep Learning Algorithms Using OpenACC Parallel Programming Model

    Hamam, Alwaleed A.; Khan, Ayaz H. (2017-03-13)
    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. This project focuses on one such algorithm, the Restricted Boltzmann Machine (RBM), and aims to reduce its execution time through an efficient parallel implementation using OpenACC, with the best possible optimizations to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to parallel computing in which directives give the compiler hints for accelerating code. The traditional RBM is a stochastic neural network that essentially performs a binary version of factor analysis, and it is a useful building block for larger modern deep learning models such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBMs already exist for other models, such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to RBMs.
  • Performance Results using ANSYS HPC

    Karim, Abbass; Ramon, Jose (2017-03-13)
  • Abnormal Behavior Detection in Aerial Video Surveillance

    Walha, Ahlem; Wali, Ali; Alimi, Adel (2017-03-13)
  • HPL and STREAM Benchmarks on SANAM Supercomputer

    Bin Sulaiman, Riman A. (2017-03-13)
    The SANAM supercomputer was jointly built by KACST and FIAS in 2012 and ranked second on that year's Green500 list, with a power efficiency of 2.3 GFLOPS/W (Rohr et al., 2014). It is a heterogeneous, accelerator-based HPC system with 300 compute nodes. Each node includes two Intel Xeon E5-2650 CPUs, two AMD FirePro S10000 dual GPUs, and 128 GiB of main memory. In this work, the seven benchmarks of the HPC Challenge (HPCC) suite were installed and configured to reassess the performance of SANAM after it was reassembled in the Kingdom of Saudi Arabia, as part of an unpublished master's thesis. We present detailed results of the HPL and STREAM benchmarks.
  • Scalable Relevance re-ranking using nature-inspired meta-heuristic optimization algorithms

    Ksibi, Amel; Hadj Taieb, Mohamed Amin; Ben Ammar, Anis; Ben Amar, Chokri (2017-03-13)
  • Simulation of Cycle-to-Cycle Variation in Dual-Fuel Engines

    Jaasim, Mohammed; Pasunurthi, Shyamsundar; Jupudi, Ravichandra S.; Gubba, Sreenivasa Rao; Primus, Roy; Klingbeil, Adam; Wijeyakulasuriya, Sameera; Im, Hong G. (2017-03-13)
    Standard practice in internal combustion (IC) engine experiments is to measure quantities averaged over a large number of cycles. Depending on the operating conditions, the cycle-to-cycle variation (CCV) of quantities such as the indicated mean effective pressure (IMEP) is observed at different levels. Accurate prediction of CCV in IC engines is an important but challenging task. Computational fluid dynamics (CFD) simulations using high performance computing (HPC) can be used effectively to visualize the 3D spatial distributions of in-cylinder quantities that drive CCV. In the present study, a large dual-fuel engine is considered, with natural gas injected into the manifold and diesel pilot fuel directly injected to trigger ignition. Multiple engine cycles are simulated in 3D in series, as in the experiments, to investigate the potential of HPC-based high-fidelity simulations to accurately capture the cycle-to-cycle variation in dual-fuel engines. Open cycle simulations are conducted to predict the combined effect of fuel-air mixture stratification, temperature, and turbulence on the CCV of pressure. The predicted coefficient of variation (COV) of pressure is compared with results from closed cycle simulations and with the experiments.
  • Investigation on syngas/air turbulent nonpremixed flames sensitivity on pressure

    Ciottoli, Pietro Paolo; Lee, Bok Jik; Lapenna, Pasquale Eduardo; Galassi, Riccardo Malpica; Martelli, Emanuele; Valorani, Mauro; Im, Hong G. (2017-03-13)
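The batching idea in the Batched Triangular DLA abstract (many independent small problems handled by one API call) can be sketched as follows. This is a minimal, sequential Python illustration, assuming lower-triangular systems solved by forward substitution; the function name `batched_trsv_lower` is hypothetical and is neither the authors' kernel nor any library's API. On a GPU, a batched kernel would typically map independent problems to separate thread blocks rather than loop over them.

```python
def batched_trsv_lower(L_batch, b_batch):
    """Solve L_i x_i = b_i for every lower-triangular L_i in a single call.

    Hypothetical sketch: each problem may have its own (small) size n_i.
    """
    if len(L_batch) != len(b_batch):
        raise ValueError("batch sizes differ")
    solutions = []
    for L, b in zip(L_batch, b_batch):  # independent problems, no data shared
        n = len(b)
        x = [0.0] * n
        for i in range(n):
            # Forward substitution: subtract contributions of already-solved unknowns.
            s = b[i] - sum(L[i][j] * x[j] for j in range(i))
            x[i] = s / L[i][i]
        solutions.append(x)
    return solutions


L_batch = [
    [[2.0, 0.0], [1.0, 1.0]],
    [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [3.0, 4.0, 1.0]],
]
b_batch = [[4.0, 3.0], [1.0, 4.0, 14.0]]
print(batched_trsv_lower(L_batch, b_batch))  # [[2.0, 1.0], [1.0, 2.0, 3.0]]
```

Keeping the whole batch behind one entry point is what lets a real implementation amortize launch overhead across thousands of tiny problems.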
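The local time stepping idea in the High-resolution seismic wave propagation abstract can be illustrated with a small calculation. This sketch assumes a CFL-type stability bound dt_e = C * h_e / c for each element and power-of-two step levels, a common LTS design that the abstract does not itself specify; `lts_levels`, `lts_speedup`, and the Courant number value are hypothetical. The speedup estimate counts only element updates and ignores LTS bookkeeping overhead.

```python
import math


def lts_levels(element_sizes, wave_speed, courant=0.5):
    """Assign each element a power-of-two LTS level (hypothetical sketch).

    Assumes a CFL-type bound dt_e = courant * h_e / wave_speed. The global step
    is the smallest dt_e; an element at level k advances with dt_min * 2**k,
    chosen so that dt_min * 2**k <= dt_e.
    """
    dt = [courant * h / wave_speed for h in element_sizes]
    dt_min = min(dt)
    # Small epsilon guards against log2 landing just below an exact integer.
    levels = [math.floor(math.log2(d / dt_min) + 1e-12) for d in dt]
    return dt_min, levels


def lts_speedup(levels):
    """Ratio of element updates without LTS to updates with LTS."""
    # Without LTS every element takes the global step; at level k it takes 2**-k as many.
    return len(levels) / sum(2.0 ** -k for k in levels)


dt_min, levels = lts_levels([1.0, 2.0, 4.0, 10.0], wave_speed=2.0, courant=0.5)
print(dt_min, levels, round(lts_speedup(levels), 2))  # 0.25 [0, 1, 2, 3] 2.13
```

Here one small element forces a global step of 0.25, yet most elements can safely take larger steps, which is where the runtime savings come from.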
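The Contrastive Divergence training mentioned in the OpenACC deep learning abstract can be sketched for a binary RBM as below. This is a plain-Python illustration of one CD-1 step (a single Gibbs sampling step, k = 1, which is an assumption since the abstract does not state k), not the project's OpenACC implementation; `cd1_step` and its parameter layout are hypothetical. The nested loops over visible and hidden units are the data-parallel work that a directive-based port would target.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cd1_step(v0, W, b, c, lr, rng):
    """One CD-1 update for a binary RBM; W, b, c are updated in place.

    v0: binary visible vector (length n_v); W: n_v x n_h weights;
    b: visible biases (n_v); c: hidden biases (n_h). Returns the reconstruction v1.
    """
    n_v, n_h = len(b), len(c)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = [sigmoid(c[j] + sum(v0[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: one Gibbs step reconstructs the visible layer.
    pv1 = [sigmoid(b[i] + sum(W[i][j] * h0[j] for j in range(n_h))) for i in range(n_v)]
    v1 = [1.0 if rng.random() < p else 0.0 for p in pv1]
    ph1 = [sigmoid(c[j] + sum(v1[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    # Update: data statistics minus reconstruction statistics.
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(n_h):
        c[j] += lr * (ph0[j] - ph1[j])
    return v1


rng = random.Random(0)
W = [[rng.gauss(0.0, 0.01) for _ in range(2)] for _ in range(3)]
b, c = [0.0] * 3, [0.0] * 2
for _ in range(100):
    cd1_step([1, 0, 1], W, b, c, lr=0.1, rng=rng)
```

Training repeatedly on the pattern [1, 0, 1] can only push the visible bias of the "off" unit down and those of the "on" units up, since each bias update is lr * (v0 - v1).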
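The coefficient of variation (COV) used in the dual-fuel engine abstract is the standard deviation of a per-cycle quantity divided by its mean, usually reported in percent. The sketch below assumes the sample standard deviation and uses made-up pressure values for illustration only; they are not results from the study, and `cov_percent` is a hypothetical helper.

```python
import statistics


def cov_percent(per_cycle_values):
    """Coefficient of variation (%) of a per-cycle quantity such as peak pressure or IMEP."""
    if len(per_cycle_values) < 2:
        raise ValueError("need at least two cycles")
    mean = statistics.mean(per_cycle_values)
    # Sample standard deviation (n - 1 in the denominator).
    return 100.0 * statistics.stdev(per_cycle_values) / mean


# Illustrative peak pressures (bar) for four cycles; not data from the study.
print(round(cov_percent([100.0, 105.0, 95.0, 100.0]), 2))  # 4.08
```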
