Scalable Hierarchical Algorithms for eXtreme Computing (SHAXC-2) Workshop 2014
Recent Submissions

Hierarchical matrix techniques for the solution of elliptic equations (2014-05-04)
Hierarchical matrix approximations are a promising tool for approximating low-rank matrices, given the compactness of their representation and the economy of the operations between them. Integral and differential operators have been the major applications of this technology, but it can be applied in other areas where low-rank properties exist. Such is the case of the Block Cyclic Reduction algorithm, which is used as a direct solver for the constant-coefficient Poisson equation. We explore the variable-coefficient case, also using Block Cyclic Reduction, with the addition of hierarchical matrices to represent matrix blocks, thereby improving the otherwise O(N^2) algorithm into an efficient O(N) algorithm.
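The "economy of the operations" mentioned in this abstract can be illustrated with a single low-rank block. The sketch below is not the authors' solver; it only assumes that a block is stored in factored form U V^T (U is m-by-k, V is n-by-k) and shows a matrix-vector product that never forms the dense m-by-n block. The function names `lowrank_matvec`, `expand`, and `dense_matvec` are hypothetical and chosen for this illustration.

```python
def lowrank_matvec(U, V, x):
    """Compute (U V^T) x without forming the dense m-by-n block.

    U is m-by-k and V is n-by-k, both given as lists of rows.
    Cost is O((m + n) k) instead of O(m n) for the dense product.
    """
    k = len(U[0])
    # Step 1: t = V^T x, a vector of length k.
    t = [sum(V[j][r] * x[j] for j in range(len(V))) for r in range(k)]
    # Step 2: y = U t, a vector of length m.
    return [sum(row[r] * t[r] for r in range(k)) for row in U]


def expand(U, V):
    """Form the dense block U V^T (only used here to check the result)."""
    return [[sum(u[r] * v[r] for r in range(len(u))) for v in V] for u in U]


def dense_matvec(A, x):
    """Reference dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]
```

For a block with k much smaller than m and n, both storage and work drop from m*n to (m + n)*k. A hierarchical matrix applies this idea recursively to a block partition of the full matrix.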

Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System (2014-05-04)
The European Extremely Large Telescope (E-ELT) is a high-priority project in ground-based astronomy that aims at constructing the largest telescope ever built. MOSAIC is an instrument proposed for the E-ELT that uses the Multi-Object Adaptive Optics (MOAO) technique for astronomical telescopes, which compensates for the effects of atmospheric turbulence on image quality and operates on patches across a large field of view (FoV).

Kriging accelerated by orders of magnitude: combining low-rank with FFT techniques (2014-05-04)
Kriging algorithms based on the FFT, on the separability of certain covariance functions, and on low-rank representations of covariance functions have been investigated. The current study combines these ideas and thereby combines their individual speedup factors. The reduced computational complexity is O(dL log L), where L := max_i n_i, i = 1

Fast Multipole-Based Preconditioner for Sparse Iterative Solvers (2014-05-04)
Among optimal hierarchical algorithms for the computational solution of elliptic problems, the Fast Multipole Method (FMM) stands out for its adaptability to emerging architectures, having high arithmetic intensity, tunable accuracy, and relaxed global synchronization requirements. We demonstrate that, beyond its traditional use as a solver in problems for which explicit free-space kernel representations are available, the FMM has applicability as a preconditioner in finite-domain elliptic boundary value problems, by equipping it with boundary integral capability for finite boundaries and by wrapping it in a Krylov method for extensibility to more general operators. Compared with multilevel methods, it is capable of comparable algebraic convergence rates down to the truncation error of the discretized PDE, and it has superior multicore and distributed-memory scalability properties on commodity-architecture supercomputers.
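The idea of wrapping a fast constant-coefficient solver in a Krylov method can be sketched on a 1D toy problem. This is not the authors' implementation and contains no FMM: as a loudly labeled stand-in, an exact tridiagonal solve of the constant-coefficient Laplacian plays the preconditioner's role, and preconditioned conjugate gradients handles a variable-coefficient operator. All function names are hypothetical, and the grid spacing factor is dropped for brevity.

```python
def apply_variable_laplacian(a, u):
    """y = A u for -(a u')' on a uniform 1D grid, zero Dirichlet ends.

    a holds len(u) + 1 positive coefficients at the cell interfaces.
    """
    n = len(u)
    y = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        y.append(a[i] * (u[i] - left) + a[i + 1] * (u[i] - right))
    return y


def solve_constant_laplacian(r):
    """Solve tridiag(-1, 2, -1) z = r with the Thomas algorithm.

    Stand-in (assumption) for a fast constant-coefficient solver.
    """
    n = len(r)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -0.5
    d[0] = r[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def pcg(apply_A, apply_Minv, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_Minv(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_Minv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Calling `pcg(lambda u: apply_variable_laplacian(a, u), solve_constant_laplacian, b)` solves the variable-coefficient system, while passing `lambda r: list(r)` as the second argument gives plain conjugate gradients for comparison. When the coefficient stays within a bounded positive range, the preconditioned operator is well conditioned independently of the grid size, which is the property that makes a constant-coefficient solver attractive in this role.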

Community Detection for Large Graphs (2014-05-04)
Many real-world networks have inherent community structures, including social networks, transportation networks, and biological networks. For large-scale networks with millions or billions of nodes in real-world applications, there is demand for accelerating current community detection algorithms. We present two approaches to tackle this issue: (1) a K-core based framework that can accelerate existing community detection algorithms significantly; and (2) a parallel inference algorithm via stochastic block models that can distribute the workload.
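For readers unfamiliar with the K-core, it is the largest subgraph in which every node has at least K neighbors inside the subgraph. The abstract does not say how the framework uses it, so the following is only a minimal sketch of the standard peeling procedure that extracts a K-core; the function name `k_core` and the edge-list input format are assumptions made for this illustration.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the set of nodes in the k-core of an undirected graph.

    edges is an iterable of (u, v) pairs; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    # Peel nodes whose remaining degree is below k until none are left.
    stack = [n for n, d in degree.items() if d < k]
    removed = set()
    while stack:
        n = stack.pop()
        if n in removed:
            continue
        removed.add(n)
        for m in adj[n]:
            if m not in removed:
                degree[m] -= 1
                if degree[m] < k:
                    stack.append(m)
    return set(adj) - removed
```

One plausible use, stated here as an assumption rather than as the authors' method, is to run an expensive community detection algorithm only on the much smaller core and then attach the peeled nodes to the communities found there.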

Predictive Performance Tuning of OpenACC Accelerated Applications (2014-05-04)
Graphics Processing Units (GPUs) are gradually becoming mainstream in supercomputing as their capability to significantly accelerate a large spectrum of scientific applications has been clearly identified and proven. Moreover, with the introduction of high-level programming models such as OpenACC [1] and OpenMP 4.0 [2], these devices are becoming more accessible and practical to use for a larger scientific community. However, performance optimization of OpenACC-accelerated applications usually requires in-depth knowledge of the hardware and software specifications. We suggest a prediction-based performance tuning mechanism [3] that quickly tunes OpenACC parameters for a given application so that it adapts dynamically to the execution environment on a given system. This approach is applied to a finite difference kernel to tune the OpenACC gang and vector clauses that map the compute kernels onto the underlying accelerator architecture. Our experiments show a significant performance improvement over the default compiler parameters, and tuning that is an order of magnitude faster than brute-force search.

Fast Fourier Transform Pricing Method for Exponential Lévy Processes (2014-05-04)
We describe a set of partial integro-differential equations (PIDEs) whose solutions represent the prices of European options when the underlying asset is driven by an exponential Lévy process. Exploiting the Lévy-Khintchine formula, we give a Fourier-based method for solving this class of PIDEs. We present a novel L1 error bound for solving a range of PIDEs in asset pricing and use this bound to set parameters for numerical methods.

Preconditioned Inexact Newton for Nonlinear Sparse Electromagnetic Imaging (2014-05-04)
Newton-type algorithms have been extensively studied in nonlinear microwave imaging due to their quadratic convergence rate and ability to recover images with high contrast values. In the past, Newton methods have been implemented in conjunction with smoothness-promoting optimization/regularization schemes. However, such regularization schemes are known to perform poorly when applied in imaging domains with sparse content or sharp variations. In this work, an inexact Newton algorithm is formulated and implemented in conjunction with a linear sparse optimization scheme. A novel preconditioning technique is proposed to increase the convergence rate of the optimization problem. Numerical results demonstrate that the proposed framework produces sharper and more accurate images when applied in sparse/sparsified domains.