Article

 

Comparison between Pure MPI and Hybrid MPI-OpenMP Parallelism for Discrete Element Method (DEM) of Ellipsoidal and Poly-ellipsoidal Particles

https://scholar.colorado.edu/concern/articles/f1881n23v
Abstract
  • Parallel computing of 3D Discrete Element Method (DEM) simulations can be achieved in different modes, two of which are pure MPI and hybrid MPI-OpenMP. The hybrid MPI-OpenMP mode allows flexibly combined mapping schemes on contemporary multiprocessing supercomputers. This paper profiles the computational components and floating-point operation features of complex-shaped 3D DEM, develops a space decomposition-based MPI parallelism and various thread-based OpenMP parallelisms, and carries out performance comparison and analysis from intranode to internode scales across four orders of magnitude of problem size (namely, number of particles). The influences of the memory/cache hierarchy, process/thread pinning, variation of the hybrid MPI-OpenMP mapping scheme, and particle shape (ellipsoid versus poly-ellipsoid) are carefully examined. The findings are as follows. OpenMP is able to achieve high efficiency in interparticle contact detection, but the unparallelizable code prevents it from achieving the same high efficiency for overall performance. Pure MPI achieves not only lower computational granularity (thus higher spatial locality of particles) but also lower communication granularity (thus faster MPI transmission) than hybrid MPI-OpenMP using the same computational resources. The cache miss rate is sensitive to the shrinkage of memory consumption per processor, and among the three cache levels of modern microprocessors the last-level cache contributes most significantly to the strong superlinear speedup. In hybrid MPI-OpenMP mode, as the number of MPI processes increases (and the number of threads per MPI process decreases accordingly), the total execution time decreases until the maximum performance is obtained in pure MPI mode. Process/thread pinning on NUMA architectures improves performance significantly when there are multiple threads per process, whereas the improvement becomes less pronounced as the number of threads per process decreases. Both the communication time and the computation time increase substantially from ellipsoids to poly-ellipsoids. Overall, pure MPI outperforms hybrid MPI-OpenMP in 3D DEM modeling of ellipsoidal and poly-ellipsoidal particles.
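
The mapping-scheme variation and the effect of unparallelizable code described in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names, the 24-core node, and the 0.9 parallel fraction are hypothetical values chosen only for illustration. The first function enumerates the hybrid mappings available for a fixed core count (MPI processes times threads per process equals total cores); the second applies Amdahl's law, which bounds thread-level speedup when part of the code stays serial. Amdahl's law does not model cache effects, so it cannot reproduce the superlinear speedup the abstract reports.

```python
def hybrid_mappings(total_cores):
    """Return every (mpi_processes, threads_per_process) pair whose product is total_cores.

    The list runs from pure OpenMP (1 process) to pure MPI (1 thread per process).
    """
    return [
        (processes, total_cores // processes)
        for processes in range(1, total_cores + 1)
        if total_cores % processes == 0
    ]


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup bound when only parallel_fraction of the runtime is parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def parallel_efficiency(speedup, workers):
    """Efficiency is speedup per worker; a value above 1.0 means superlinear speedup."""
    return speedup / workers


if __name__ == "__main__":
    cores = 24               # hypothetical node size, not taken from the paper
    parallel_fraction = 0.9  # hypothetical share of runtime that threads can parallelize

    for processes, threads in hybrid_mappings(cores):
        bound = amdahl_speedup(parallel_fraction, threads)
        eff = parallel_efficiency(bound, threads)
        print(f"{processes:2d} MPI x {threads:2d} threads: "
              f"thread-level speedup bound {bound:5.2f}, efficiency {eff:4.2f}")
```

Under these assumed numbers, thread-level efficiency falls as more threads share one process, which is consistent in spirit with the abstract's observation that unparallelizable code limits overall OpenMP performance.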

Journal Volume
  • 6
Last Modified
  • 2021-07-19
ISSN
  • 2196-4386