Speaker
Description
This paper addresses key computational challenges in the Gray-Scott reaction-diffusion model, such as memory hotspots and prolonged execution times, which underscore the limitations of relying solely on CPU resources. To overcome these issues, a hybrid CUDA-aware MPI approach is proposed that uses the Kokkos library to distribute computation across both CPU and GPU. The implementation integrates the ADIOS2 library for efficient I/O and relies on Kokkos to manage the computational workload, shifting the primary execution focus from the host to the device.
Optimization efforts concentrated on CUDA-specific execution led to an average speedup exceeding 100×, along with a marked improvement in GPU thread utilization, so that the allocated resources are exploited as fully as possible throughout the run. Furthermore, because Kokkos was used from the ground up, for both the initial code development and the subsequent optimization, the solution is highly portable. These performance gains are therefore expected to carry over to diverse hardware platforms, from consumer-grade systems to the high-end supercomputers that form the backbone of HPC, irrespective of the underlying CPU or GPU architecture.
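
For readers unfamiliar with the model, the following is a minimal serial sketch of one explicit time step of the standard Gray-Scott equations, du/dt = Du ∇²u − u v² + F (1 − u) and dv/dt = Dv ∇²v + u v² − (F + k) v. It is not the authors' implementation. The 2D periodic grid, the unit grid spacing, the names `GrayScottParams`, `idx`, and `gray_scott_step`, and the parameter values are all assumptions made for illustration. In a Kokkos port, the doubly nested loop would typically be expressed as a parallel dispatch over the grid (for example `Kokkos::parallel_for` over a `Kokkos::MDRangePolicy`), which is what allows the same kernel to run on either host or device; the abstract does not say how the authors structured theirs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle; names and default values are illustrative only.
struct GrayScottParams {
    double Du = 0.2;   // diffusion coefficient of U
    double Dv = 0.1;   // diffusion coefficient of V
    double F  = 0.04;  // feed rate
    double k  = 0.06;  // kill rate
    double dt = 1.0;   // time step
};

// Row-major index into an n x n grid with periodic wrap-around.
inline std::size_t idx(int i, int j, int n) {
    return static_cast<std::size_t>(((i + n) % n) * n + ((j + n) % n));
}

// One explicit Euler step with a 5-point Laplacian (unit grid spacing assumed).
// Reads u and v, writes the updated fields into u_next and v_next.
void gray_scott_step(const std::vector<double>& u, const std::vector<double>& v,
                     std::vector<double>& u_next, std::vector<double>& v_next,
                     int n, const GrayScottParams& p) {
    const std::size_t cells = static_cast<std::size_t>(n) * static_cast<std::size_t>(n);
    assert(u.size() == cells && v.size() == cells);
    u_next.resize(cells);
    v_next.resize(cells);

    // Each cell is independent of the others within a step, which is what
    // makes this loop nest a natural candidate for GPU execution.
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const std::size_t c = idx(i, j, n);
            const double lap_u = u[idx(i - 1, j, n)] + u[idx(i + 1, j, n)]
                               + u[idx(i, j - 1, n)] + u[idx(i, j + 1, n)] - 4.0 * u[c];
            const double lap_v = v[idx(i - 1, j, n)] + v[idx(i + 1, j, n)]
                               + v[idx(i, j - 1, n)] + v[idx(i, j + 1, n)] - 4.0 * v[c];
            const double uvv = u[c] * v[c] * v[c];  // reaction term u * v^2
            u_next[c] = u[c] + p.dt * (p.Du * lap_u - uvv + p.F * (1.0 - u[c]));
            v_next[c] = v[c] + p.dt * (p.Dv * lap_v + uvv - (p.F + p.k) * v[c]);
        }
    }
}
```

A caller would allocate two pairs of fields and swap them after every step. The uniform state u = 1, v = 0 is a fixed point of this update, which makes a convenient sanity check.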