Using Hierarchical Parallelism to Accelerate the Solution of Many Small Partial Differential Equations

Abstract

This paper presents efforts to improve the hierarchical parallelism of a two-scale simulation code. Two methods to improve GPU parallel performance were developed and compared. The first used the NVIDIA Multi-Process Service (MPS), and the second moved the entire sub-problem loop into a single kernel using Kokkos hierarchical parallelism and a PackedView data structure. Both approaches improved parallel performance, with the second method providing the greater improvement.
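
To make the second approach concrete, here is a minimal sketch in plain C++, not the paper's implementation. The abstract does not define PackedView, so the sketch assumes it resembles a ragged array: many small sub-problem arrays of different sizes stored back to back in one buffer, indexed through an offsets array. The names `PackedRows` and `fill_all` are hypothetical. The two nested loops stand in for the two levels of parallelism; in Kokkos the outer loop would typically map to a `Kokkos::TeamPolicy` (one team per sub-problem) and the inner loop to a `Kokkos::TeamThreadRange`.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a packed view: rows of different lengths stored
// contiguously, with offsets[r] marking where row r starts in the buffer.
struct PackedRows {
    std::vector<std::size_t> offsets;  // length is num_rows() + 1
    std::vector<double> data;          // all rows, back to back

    explicit PackedRows(const std::vector<std::size_t>& sizes)
        : offsets(sizes.size() + 1, 0) {
        // Running sum of the row sizes gives each row's starting offset.
        std::partial_sum(sizes.begin(), sizes.end(), offsets.begin() + 1);
        data.assign(offsets.back(), 0.0);
    }

    std::size_t num_rows() const { return offsets.size() - 1; }
    std::size_t row_size(std::size_t r) const { return offsets[r + 1] - offsets[r]; }
    double* row(std::size_t r) { return data.data() + offsets[r]; }
};

// A single "launch" that covers every sub-problem, instead of one launch
// per sub-problem. Assumed mapping: outer loop = teams, inner loop = threads.
void fill_all(PackedRows& u) {
    for (std::size_t r = 0; r < u.num_rows(); ++r) {         // one team per sub-problem
        double* ur = u.row(r);
        for (std::size_t i = 0; i < u.row_size(r); ++i) {    // threads within the team
            ur[i] = 10.0 * static_cast<double>(r) + static_cast<double>(i);
        }
    }
}
```

Packing the sub-problems into one buffer means a single kernel can reach all of them through the offsets, which is what allows the whole sub-problem loop to move onto the device in this sketch.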

Publication
arXiv
Jacob Merson
Assistant Professor of Mechanical Engineering

loves to scale multiphysics simulations onto leadership-class supercomputers