AbdulJabbar, Mustafa Abdulmajeed; Al Farhan, Mohammed; Al-Harthi, Noha A.; Chen, Rui; Yokota, Rio; Bagci, Hakan; Keyes, David E.(SIAM Journal on Scientific Computing, Society for Industrial & Applied Mathematics (SIAM), 2019-06-20)[Article]
We present a boundary integral equation solver for wave scattering suited for manycore processors, which are expected to be the building blocks of energy-austere exascale systems, and on which algorithmic and architecture-oriented optimizations are essential for achieving worthy performance. We use the GMRES iterative method and the fast multipole method (FMM) to implement the matrix-vector product (MatVec) kernel. We implement highly optimized kernels for both shared- and distributed-memory architectures and provide various performance models for tuning the task-based tree traversal implementation of FMM. We develop optimal architecture-specific and algorithm-aware partitioning, load balancing, and communication reducing mechanisms to scale up to 6,144 compute nodes of a Cray XC40 with 196,608 hardware cores. Task-based traversal achieves roughly 77\% and 60\% of the peak single precision floating point performance of a 56-core Skylake Scalable processor and a 72-core Knights Landing processor, respectively. These represent approximately sevenfold speedups compared to the baseline scalar code. We report weak scalability in accordance with the best possible O(log P) communication complexity and the theoretical scaling complexity of FMM. These result in up to 85\% efficiency in strong scaling while computing in excess of 2 billion degrees of freedom (DoFs) on the full-scale Cray XC40. To demonstrate the applicability of the proposed implementation of FMM, we use it in the solution of the acoustic scattering problem involving soft targets. This calls for the discretization of an integral equation with an oscillatory kernel and (weak) 1/R singularity. Applications of FMM to the problems involving hard targets, which call for implementation of more complicated singularity extraction/cancellation schemes to account for 1/R2 singularity, or to Burton-Miller surface integral formulation, which results in a well-conditioned matrix system, are not considered in this work, but the efficiency and scalability results demonstrated here are expected to be observed for these cases as well. The numerical results are converged to within 1.0e-4 relative 2-norm residual accuracy of the analytical solution for the sphere, and a multiple flower-shaped scatterer displays comparable algebraic convergence in a more general setting.
Export search results
The export option will allow you to export the current search results of the entered query to a file. Different
formats are available for download. To export the items, click on the button corresponding with the preferred download format.
By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.
For anonymous users the allowed maximum amount is 50 search results.
To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export.
The amount of items that can be exported at once is similarly restricted as the full export.
After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.