Abdelfattah, Ahmad; Keyes, David E.; Ltaief, Hatem (2014-05-04) [Poster]
KBLAS (KAUST BLAS) is a small library that provides highly optimized BLAS routines on systems accelerated with GPUs. KBLAS is written entirely in CUDA C and targets NVIDIA GPUs with compute capability 2.0 (Fermi) or higher. The current focus is on level-2 BLAS routines, namely the general matrix-vector multiplication (GEMV) kernel and the symmetric/Hermitian matrix-vector multiplication (SYMV/HEMV) kernel. KBLAS provides these two kernels in all four precisions (s, d, c, and z), with support for multi-GPU systems. Through advanced optimization techniques that target latency hiding and push memory bandwidth to the limit, KBLAS outperforms state-of-the-art kernels by 20-90%. Competitors include CUBLAS-5.5, MAGMABLAS-1.4.0, and CULA R17. The SYMV/HEMV kernel from KBLAS has been adopted by NVIDIA and should appear in CUBLAS-6.0. KBLAS has been used in large-scale simulations of multi-object adaptive optics.
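To make the SYMV operation concrete, the following is a minimal reference sketch in plain Python of what the kernel computes, y := alpha*A*x + beta*y for a symmetric A. It is not the KBLAS CUDA implementation and says nothing about its optimizations. The function name `symv_upper` and the list-of-lists matrix layout are assumptions made for illustration. The sketch reads only the upper triangle, so each stored off-diagonal entry is loaded once and used for two updates.

```python
def symv_upper(alpha, a, x, beta, y):
    """Reference y := alpha*A*x + beta*y for a symmetric matrix A.

    Only the upper triangle of `a` (entries a[i][j] with j >= i) is read;
    the lower triangle may hold arbitrary values. Hypothetical helper,
    not part of any BLAS library.
    """
    n = len(x)
    out = [beta * yi for yi in y]
    for i in range(n):
        # Diagonal entry contributes to row i only.
        out[i] += alpha * a[i][i] * x[i]
        for j in range(i + 1, n):
            aij = a[i][j]
            # One load of a[i][j] serves both row i and row j (symmetry).
            out[i] += alpha * aij * x[j]
            out[j] += alpha * aij * x[i]
    return out
```

For example, calling `symv_upper(1, a, x, 2, y)` with the lower triangle of `a` filled with placeholder values gives the same result as a full symmetric matrix-vector product, because those entries are never read.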
The European Extremely Large Telescope (E-ELT) is a high-priority project in ground-based astronomy that aims to construct the largest telescope ever built. MOSAIC is an instrument proposed for the E-ELT that uses the Multi-Object Adaptive Optics (MOAO) technique, which compensates for the effects of atmospheric turbulence on image quality and operates on patches across a large field of view (FoV).
Kriging algorithms based on the FFT, on the separability of certain covariance functions, and on low-rank representations of covariance functions have been investigated. The current study combines these ideas and thereby combines their individual speedup factors. The reduced computational complexity is O(d L log L), where L := max_i n_i, i = 1
Many real-world networks, including social, transportation, and biological networks, have inherent community structures. For large-scale networks with millions or billions of nodes, faster community detection algorithms are in demand. We present two approaches to tackle this issue:
- A k-core based framework that can significantly accelerate existing community detection algorithms;
- A parallel inference algorithm via stochastic block models that can distribute the workload.
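The k-core idea behind the first approach can be illustrated with a minimal sketch. The abstract does not describe the framework itself, so the code below shows only the standard k-core peeling step: repeatedly remove nodes whose degree is below k until none remain. How the authors use the resulting core is not stated here. The function name `k_core` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def k_core(adj, k):
    """Return the set of nodes in the k-core of an undirected graph.

    adj: dict mapping each node to the set of its neighbours
         (hypothetical input format chosen for this sketch).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    # Start with every node whose degree is already below k.
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        # Removing v lowers the degree of each surviving neighbour.
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed
```

On a triangle with one extra pendant node, the 2-core is the triangle alone. A community detection algorithm could then be run on this smaller subgraph, which is one plausible way such a framework saves work.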
Hierarchical matrix approximations are a promising tool for approximating low-rank matrices, given the compactness of their representation and the economy of the operations between them. Integral and differential operators have been the major applications of this technology, but it can be applied to other areas where low-rank properties exist. Such is the case of the Block Cyclic Reduction algorithm, which is used as a direct solver for the constant-coefficient Poisson equation. We explore the variable-coefficient case, also using Block Cyclic Reduction, with the addition of hierarchical matrices to represent matrix blocks, thereby improving the otherwise O(N^2) algorithm into an efficient O(N) algorithm.
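As a rough illustration of the reduction pattern, the sketch below applies cyclic reduction to a scalar tridiagonal system, the one-dimensional analogue of the block algorithm. It is not the authors' solver: the blocks here are plain numbers, no hierarchical matrices are involved, and the function name `cyclic_reduction` and its argument layout are assumptions. Each level eliminates the even-indexed unknowns, recursively solves the half-size system for the odd-indexed ones, and then recovers the eliminated unknowns by back-substitution.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    a, b, c are the sub-, main and super-diagonals as lists of equal
    length; a[0] and c[-1] must be 0. Assumes no pivot b[i] becomes zero
    (true, for example, for diagonally dominant systems).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Build the reduced system for the odd-indexed unknowns.
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))
    reduced = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = reduced[k]
    # Back-substitute to recover the eliminated even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

In the block version the scalars `a[i]`, `b[i]`, `c[i]` become matrices and the divisions become block solves, which is where a compressed representation of the blocks would matter.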