SANAM supercomputer was jointly built by KACST and FIAS in 2012 ranking second that year in the Green500 list with a power efficiency of 2.3 GFLOPS/W (Rohr et al., 2014). It is a heterogeneous accelerator-based HPC system that has 300 compute nodes. Each node includes two Intel Xeon E5?2650 CPUs, two AMD FirePro S10000 dual GPUs and 128 GiB of main memory. In this work, the seven benchmarks of HPCC were installed and configured to reassess the performance of SANAM, as part of an unpublished master thesis, after it was reassembled in the Kingdom of Saudi Arabia. We present here detailed results of HPL and STREAM benchmarks.
In this work we have developed a restructuring software tool (RT-CUDA) following the proposed optimization specifications to bridge the gap between high-level languages and the machine dependent CUDA environment. RT-CUDA takes a C program and convert it into an optimized CUDA kernel with user directives in a configuration file for guiding the compiler. RTCUDA also allows transparent invocation of the most optimized external math libraries like cuSparse and cuBLAS enabling efficient design of linear algebra solvers. We expect RT-CUDA to be needed by many KSA industries dealing with science and engineering simulation on massively parallel computers like NVIDIA GPUs.
Hamam, Alwaleed A.; Khan, Ayaz H.(2017-03-13)[Poster]
Deep learning is based on a set of algorithms that attempt to model high level abstractions in data. Specifically, RBM is a deep learning algorithm that used in the project to increase it's time performance using some efficient parallel implementation by OpenACC tool with best possible optimizations on RBM to harness the massively parallel power of NVIDIA GPUs. GPUs development in the last few years has contributed to growing the concept of deep learning. OpenACC is a directive based ap-proach for computing where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Ma-chine is a stochastic neural network that essentially perform a binary version of factor analysis. RBM is a useful neural net-work basis for larger modern deep learning model, such as Deep Belief Network. RBM parameters are estimated using an efficient training method that called Contrastive Divergence. Parallel implementation of RBM is available using different models such as OpenMP, and CUDA. But this project has been the first attempt to apply OpenACC model on RBM.
We investigate the problem of secure broadcasting over fast fading channels with imperfect main channel state information (CSI) at the transmitter. In particular, we analyze the effect of the noisy estimation of the main CSI on the throughput of a broadcast channel where the transmission is intended for multiple legitimate receivers in the presence of an eavesdropper. Besides, we consider the realistic case where the transmitter is only aware of the statistics of the eavesdropper's CSI and not of its channel's realizations. First, we discuss the common message transmission case where the source broadcasts the same information to all the receivers, and we provide an upper and a lower bounds on the ergodic secrecy capacity. For this case, we show that the secrecy rate is limited by the legitimate receiver having, on average, the worst main channel link and we prove that a non-zero secrecy rate can still be achieved even when the CSI at the transmitter is noisy. Then, we look at the independent messages case where the transmitter broadcasts multiple messages to the receivers, and each intended user is interested in an independent message. For this case, we present an expression for the achievable secrecy sum-rate and an upper bound on the secrecy sum-capacity and we show that, in the limit of large number of legitimate receivers K, our achievable secrecy sum-rate follows the scaling law log((1-a ) log(K)), where is the estimation error variance of the main CSI. The special cases of high SNR, perfect and no-main CSI are also analyzed. Analytical derivations and numerical results are presented to illustrate the obtained expressions for the case of independent and identically distributed Rayleigh fading channels.
Peter, Daniel; Rietmann, Max; Galvez, Percy; Ampuero, Jean Paul(2017-03-13)[Poster]
High-resolution seismic wave simulations often require local refinements in numerical meshes to accurately capture e.g. steep topography or complex fault geometry. Together with explicit time schemes, this dramatically reduces the global time step size for ground-motion simulations due to numerical stability conditions. To alleviate this problem, local time stepping (LTS) algorithms allow an explicit time stepping scheme to adapt the time step to the element size, allowing nearoptimal time steps everywhere in the mesh. This can potentially lead to significantly faster simulation runtimes.
The export option will allow you to export the current search results of the entered query to a file. Different
formats are available for download. To export the items, click on the button corresponding with the preferred download format.
By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.
For anonymous users the allowed maximum amount is 50 search results.
To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export.
The amount of items that can be exported at once is similarly restricted as the full export.
After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.