Scalable Hierarchical Algorithms for eXtreme Computing (SHAXC-2) Workshop 2014
Recent Submissions
-
Preconditioned Inexact Newton for Nonlinear Sparse Electromagnetic Imaging (2014-05-04) [Poster] Newton-type algorithms have been extensively studied in nonlinear microwave imaging due to their quadratic convergence rate and ability to recover images with high contrast values. In the past, Newton methods have been implemented in conjunction with smoothness-promoting optimization/regularization schemes. However, regularization schemes of this type are known to perform poorly when applied in imaging domains with sparse content or sharp variations. In this work, an inexact Newton algorithm is formulated and implemented in conjunction with a linear sparse optimization scheme. A novel preconditioning technique is proposed to increase the convergence rate of the optimization problem. Numerical results demonstrate that the proposed framework produces sharper and more accurate images when applied in sparse/sparsified domains.
-
Highly Accurate Discontinuous Galerkin Method for the Maxwell Equations (2014-05-04) [Poster]
-
A Novel Time Domain Method for Simulating Dissipative Electromagnetic Field Interactions (2014-05-04) [Poster]
-
Kriging accelerated by orders of magnitude: combining low-rank with FFT techniques (2014-05-04) [Poster] Kriging algorithms based on FFT, on the separability of certain covariance functions, and on low-rank representations of covariance functions have been investigated. The current study combines these ideas and thereby combines their individual speedup factors. The reduced computational complexity is $O(dL \log L)$, where $L := \max_i n_i$, $i = 1$
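The FFT ingredient named in this abstract can be illustrated in one dimension. On a regular grid, a stationary covariance function gives a symmetric Toeplitz covariance matrix, which can be embedded in a circulant matrix and applied to a vector with FFTs instead of a dense product. The following is a minimal sketch under those assumptions, not the authors' implementation; the function name `toeplitz_matvec_fft` and the exponential covariance in the usage lines are hypothetical choices made for illustration.

```python
import numpy as np

def toeplitz_matvec_fft(first_col, x):
    """Multiply a symmetric Toeplitz matrix (given by its first column) by x via FFT."""
    first_col = np.asarray(first_col, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(first_col)
    # Circulant embedding: first column [c0, c1, ..., c_{n-1}, c_{n-2}, ..., c1].
    circ = np.concatenate([first_col, first_col[-2:0:-1]])
    # Zero-pad x to the size of the circulant matrix.
    x_padded = np.concatenate([x, np.zeros(len(circ) - n)])
    # A circulant product is a circular convolution, i.e. a pointwise product
    # in Fourier space.
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x_padded)).real
    # The first n entries are the Toeplitz product.
    return y[:n]

# Usage (illustrative): exponential covariance on a regular 1D grid.
grid = np.linspace(0.0, 1.0, 64)
first_col = np.exp(-np.abs(grid - grid[0]) / 0.2)
values = np.sin(2 * np.pi * grid)
result = toeplitz_matvec_fft(first_col, values)
```

The sketch covers only the FFT part; separability and low-rank representations, which the abstract combines with it, are not shown.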
-
Implicit Unstructured Computational Aerodynamics on Many-Integrated Core Architecture (2014-05-04) [Poster] This research aims to understand the performance of PETSc-FUN3D, a fully nonlinear implicit unstructured grid incompressible or compressible Euler code with origins at NASA and the U.S. DOE, on many-integrated core architecture, and how a hybrid programming paradigm (MPI+OpenMP) can exploit Intel Xeon Phi hardware with upwards of 60 cores per node and 4 threads per core. For the current contribution, we focus on strong scaling with many-integrated core hardware. In most implicit PDE-based codes, the linear algebraic kernel is limited by the bottleneck of memory bandwidth, whereas the flux kernel arising in control volume discretization of the conservation law residuals and the preconditioner for the Jacobian exploit the Phi hardware well.
-
Fast Multipole-Based Preconditioner for Sparse Iterative Solvers (2014-05-04) [Poster] Among optimal hierarchical algorithms for the computational solution of elliptic problems, the Fast Multipole Method (FMM) stands out for its adaptability to emerging architectures, having high arithmetic intensity, tunable accuracy, and relaxed global synchronization requirements. We demonstrate that, beyond its traditional use as a solver in problems for which explicit free-space kernel representations are available, the FMM has applicability as a preconditioner in finite domain elliptic boundary value problems, by equipping it with boundary integral capability for finite boundaries and by wrapping it in a Krylov method for extensibility to more general operators. Compared with multilevel methods, it is capable of comparable algebraic convergence rates down to the truncation error of the discretized PDE, and it has superior multicore and distributed memory scalability properties on commodity architecture supercomputers.
-
Asynchronous Execution of the Fast Multipole Method Using Charm++ (2014-05-04) [Poster]
-
Fast Fourier Transform Pricing Method for Exponential Lévy Processes (2014-05-04) [Poster] We describe a set of partial-integro-differential equations (PIDE) whose solutions represent the prices of European options when the underlying asset is driven by an exponential Lévy process. Exploiting the Lévy-Khintchine formula, we give a Fourier-based method for solving this class of PIDEs. We present a novel L1 error bound for solving a range of PIDEs in asset pricing and use this bound to set parameters for numerical methods.
-
Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System (2014-05-04) [Poster] The European Extremely Large Telescope (E-ELT) is a high-priority project in ground-based astronomy that aims at constructing the largest telescope ever built. MOSAIC is an instrument proposed for the E-ELT that uses the Multi-Object Adaptive Optics (MOAO) technique for astronomical telescopes, which compensates for the effects of atmospheric turbulence on image quality and operates on patches across a large field of view (FoV).
-
Enabling High Performance Large Scale Dense Problems through KBLAS (2014-05-04) [Poster] KBLAS (KAUST BLAS) is a small library that provides highly optimized BLAS routines on systems accelerated with GPUs. KBLAS is written entirely in CUDA C and targets NVIDIA GPUs with compute capability 2.0 (Fermi) or higher. The current focus is on level-2 BLAS routines, namely the general matrix-vector multiplication (GEMV) kernel and the symmetric/Hermitian matrix-vector multiplication (SYMV/HEMV) kernel. KBLAS provides these two kernels in all four precisions (s, d, c, and z), with support for multi-GPU systems. Through advanced optimization techniques that target latency hiding and push memory bandwidth to the limit, KBLAS outperforms state-of-the-art kernels by 20-90%. Competitors include CUBLAS-5.5, MAGMABLAS-1.4.0, and CULA R17. The SYMV/HEMV kernel from KBLAS has been adopted by NVIDIA and should appear in CUBLAS-6.0. KBLAS has been used in large-scale simulations of multi-object adaptive optics.
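For readers unfamiliar with the level-2 kernel named above, the BLAS SYMV operation computes y := alpha*A*x + beta*y for a symmetric matrix A while referencing only one triangle of A. The following plain NumPy reference is a sketch of that definition only; it is not KBLAS code and does not reflect its CUDA implementation or API, and the function name `symv_upper` is a hypothetical choice.

```python
import numpy as np

def symv_upper(alpha, a, x, beta, y):
    """Reference y := alpha*A*x + beta*y, reading only the upper triangle of A."""
    n = a.shape[0]
    # Work on a scaled copy so the caller's y is left untouched.
    out = beta * np.asarray(y, dtype=float)
    for i in range(n):
        for j in range(i, n):
            # Each stored entry A[i, j] contributes to row i ...
            out[i] += alpha * a[i, j] * x[j]
            if j != i:
                # ... and, by symmetry (A[j, i] == A[i, j]), to row j as well.
                out[j] += alpha * a[i, j] * x[i]
    return out
```

Each stored matrix entry is read once and used in two multiply-adds, so the operation performs little arithmetic per byte loaded, which is consistent with the abstract's emphasis on memory bandwidth.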
-
Community Detection for Large Graphs (2014-05-04) [Poster] Many real-world networks have inherent community structures, including social networks, transportation networks, biological networks, etc. For large-scale networks with millions or billions of nodes in real-world applications, accelerating current community detection algorithms is in demand, and we present two approaches to tackle this issue: (1) a K-core based framework that can accelerate existing community detection algorithms significantly; (2) a parallel inference algorithm via stochastic block models that can distribute the workload.
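As background for the first approach, the k-core of a graph is the largest subgraph in which every node has at least k neighbours inside the subgraph, and it can be found by repeatedly removing nodes of degree less than k. The sketch below shows only this peeling step; how the poster's framework uses the core to accelerate community detection is not described in the abstract, and the function name `k_core` and the adjacency-dictionary input are assumptions made for illustration.

```python
from collections import deque

def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbours.
    """
    degree = {node: len(neigh) for node, neigh in adjacency.items()}
    removed = set()
    # Start with every node whose degree is already below k.
    queue = deque(node for node, d in degree.items() if d < k)
    while queue:
        node = queue.popleft()
        if node in removed:
            continue
        removed.add(node)
        # Removing a node lowers the degree of its remaining neighbours,
        # which may push them below k as well.
        for neigh in adjacency[node]:
            if neigh not in removed:
                degree[neigh] -= 1
                if degree[neigh] < k:
                    queue.append(neigh)
    return set(adjacency) - removed

# Usage (illustrative): a triangle 0-1-2 with a pendant node 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
core = k_core(graph, 2)
```

Because the peeling discards low-degree nodes, any algorithm run afterwards on the core works on a smaller graph.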
-
Nyström-discretized Magnetic Field Integral Equation for 2D Electromagnetic Scattering (2014-05-04) [Poster]