C9
HPC 2


Improving the Performance of Engineering Codes
26/10/2021 17:35 conference time (CEST, Berlin)
Room: C
F. Panichi, F. Hosseini (Numerical Algorithms Group Ltd, GBR)
Parallel computing is an essential tool for engineering simulation codes, whether they run on desktops with a few computing cores, use accelerator hardware such as GPUs, or require High Performance Computing (HPC) capabilities. Improving the efficiency of codes running on these facilities speeds up time to solution, allows larger, more challenging problems to be solved, or reduces compute costs. However, understanding the performance bottlenecks of parallel codes and making improvements often ends up being a daunting trial-and-error process. Our experience shows that there is often a lack of quantitative understanding of the actual behaviour of HPC applications. The Performance Optimisation and Productivity (POP) Centre of Excellence, funded by the EU under the Horizon 2020 Research and Innovation Programme, fills this gap by promoting a set of hierarchical metrics which provide a standard, objective way to characterise different aspects of the performance of parallel codes. These metrics are quick to compute. They identify issues such as memory bottlenecks, communication inefficiencies and load imbalances, and they enable a better understanding of program efficiency and the identification of target kernels for code refactoring. We can work on these computational kernels and advise how to roll out improvements to your whole application. In this talk, we will describe how to apply the POP performance assessment methodology using open-source tools. We will also review examples of performance assessments for engineering codes and the improvements which were then made. POP has the tools and expertise to analyse all aspects of performance, from single-processor efficiency to the scalability of large parallel codes. We work with programs written in most languages and parallel paradigms, including MPI, OpenMP, CUDA, OpenCL and OpenACC. Funded by the EU, POP services are available to EU and UK organizations, whether academic or commercial, free of charge.
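For a concrete sense of what the hierarchical metrics measure, the sketch below (an illustration, not part of the POP tool set) computes load balance and communication efficiency, whose product approximates parallel efficiency, from per-rank timings in a small MPI program. The deliberately imbalanced loop and the hand-written timing are placeholder assumptions; in a real POP assessment these figures are derived from traces collected with open-source tools rather than from manual instrumentation.

/*
 * Illustrative sketch: derive two POP-style metrics from per-rank
 * "useful computation" times. The compute kernel is a placeholder.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double t0 = MPI_Wtime();

    /* Placeholder "useful computation": deliberately imbalanced across ranks. */
    double x = 0.0;
    for (long i = 0; i < 1000000L * (rank + 1); ++i)
        x += 1.0 / (double)(i + 1);
    (void)x;                                /* value itself is irrelevant here */

    double useful = MPI_Wtime() - t0;       /* per-rank compute time */

    MPI_Barrier(MPI_COMM_WORLD);            /* faster ranks wait: exposes imbalance */
    double total = MPI_Wtime() - t0;        /* per-rank elapsed time */

    double avg_useful, max_useful, max_total;
    MPI_Reduce(&useful, &avg_useful, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&useful, &max_useful, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&total,  &max_total,  1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        avg_useful /= size;
        double load_balance = avg_useful / max_useful;   /* average / maximum useful time */
        double comm_eff     = max_useful / max_total;    /* useful time / total runtime    */
        printf("Load balance:             %.2f\n", load_balance);
        printf("Communication efficiency: %.2f\n", comm_eff);
        printf("Parallel efficiency:      %.2f\n", load_balance * comm_eff);
    }

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run on a few ranks, the imbalanced loop pushes the load-balance figure well below 1, which is exactly the kind of signal used to select target kernels for refactoring.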
High Performance Computing, Parallel Code, Software Performance
The Effect of HDR InfiniBand and In-Network Computing on CAE Simulations
26/10/2021 17:55 conference time (CEST, Berlin)
Room: C
O. Maor (HPC-AI Advisory Council, USA)
High-performance computing (HPC) technologies are widely used in the engineering, automotive design and manufacturing industries. One such application is computer-aided engineering (CAE), from component-level design to full analyses such as crash simulation, structural integrity, thermal management, climate control, modeling, acoustics, and much more. HPC helps drive faster time-to-market, delivering significant cost reductions over laboratory testing and tremendous flexibility. HPC's strength and efficiency depend on the ability to achieve sustained top performance by driving CPU performance toward its limits. The motivation for high-performance computing has long been its tremendous cost savings and product improvements; the cost of a high-performance compute cluster can be just a fraction of the price of a single crash test, for example, and the same cluster can serve as the platform for every test simulation going forward. Recent trends in cluster environments, such as multi-core CPUs, GPUs, and advanced high-speed, low-latency interconnects with offloading capabilities, are changing the dynamics of cluster-based simulations. Software applications are being reshaped for higher degrees of parallelism and multithreading, and hardware is being reconfigured to remove newly emerging bottlenecks and maintain high scalability and efficiency. CAE applications are widely used and provide better flexibility, scalability, and efficiency for such simulations, allowing for larger problem sizes and speeding up time to results. HPC applications rely on the Message Passing Interface (MPI), the de facto messaging library for high-performance clusters, for node-to-node inter-process communication (IPC). MPI relies on a fast, unified server and storage interconnect to provide low latency and a high messaging rate. Performance demands on the cluster interconnect increase exponentially with scale, due in part to all-to-all communication patterns, and become even more dramatic as simulations grow in complexity to properly capture physical model behaviours. In this paper we focus on the value of in-network computing on HDR InfiniBand networks for CAE applications across a few CPU architectures.
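To make the interconnect pressure concrete, the sketch below (illustrative, not taken from the paper) times an MPI all-to-all exchange: each rank sends a block to every other rank, so the aggregate number of messages grows quadratically with the rank count, which is why the latency, message rate and offload capability of the fabric dominate at scale. The buffer size is an arbitrary assumption.

/*
 * Illustrative sketch of the all-to-all exchange pattern mentioned above.
 * Every rank sends one block to every other rank, so total message count
 * grows with the square of the number of ranks.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int block = 1024;                                  /* doubles sent to each peer */
    double *sendbuf = malloc((size_t)size * block * sizeof *sendbuf);
    double *recvbuf = malloc((size_t)size * block * sizeof *recvbuf);
    for (int i = 0; i < size * block; ++i)
        sendbuf[i] = (double)rank;

    double t0 = MPI_Wtime();
    MPI_Alltoall(sendbuf, block, MPI_DOUBLE,
                 recvbuf, block, MPI_DOUBLE, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("All-to-all on %d ranks took %.3f ms\n", size, (t1 - t0) * 1e3);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Running the same binary at increasing rank counts makes the growth in exchange time directly visible; in-network computing aims to move part of the collective work from the hosts into the switches.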
InfiniBand, HDR, networking, in-network
GPU Developments of an Open Source CFD Software
26/10/2021 18:15 conference time (CEST, Berlin)
Room: C
S. Posey (NVIDIA, USA); M. Martineau (NVIDIA Ltd., GBR)
Current trends in high performance computing include the use of graphics processing units (GPUs) as massively parallel co-processors to CPUs that can accelerate numerical operations common to computational fluid dynamics (CFD) solvers. GPU-parallel CFD achieves speedups over CPUs by adding fine-grained, second-level parallelism beneath the existing CPU-based distributed-memory, first-level scalable parallelism. For most CFD implementations, the GPU focus is on implicit sparse iterative solvers, whereby linear-algebra matrix operations that would otherwise be processed on the CPU are offloaded to the GPU for numerical acceleration, resulting in an overall simulation speedup. During 2019, the OpenCFD OpenFOAM HPC Technical Committee introduced the PETSc4FOAM library, which permits plugging in external solvers that conform to PETSc formats. A collaboration among members of the HPC TC developed an external solver for GPU offload of OpenFOAM pressure solves based on the open-source AmgX solver library, first introduced in 2012. In the initial implementation, the OpenFOAM system matrix is copied from the CPU to the GPU, and the AmgX library applies an AMG preconditioner to a preconditioned conjugate gradient (PCG) linear solve to accelerate the pressure solve. Results are copied back to the CPU, which completes the OpenFOAM simulation as the end user normally observes. This work will preview details of the AmgX development and its integration with OpenFOAM for multi-GPU and multi-node computations. Initial experiments with the standard benchmarks of (i) the 3D lid-driven cavity and (ii) the motorbike case demonstrate that AmgX can achieve as much as a 9x speedup of the pressure solve on the GPU versus an OpenFOAM GAMG-PCG solve on a dual-CPU server node. With this OpenCFD community-supported solution, frequent software updates made to community-based libraries such as PETSc, AmgX, and PETSc4FOAM will ensure that OpenFOAM users naturally benefit from the latest system software, compilers, system and processor hardware architectures, and future OpenFOAM releases. Future work will also be described, including investigations of more complex geometries and turbulence treatments such as LES. In addition, the TC plans parallel-scaling optimizations for strong scaling across compute nodes with multiple GPUs, and will explore applying GPU acceleration to more of the OpenFOAM code, such as matrix assembly procedures. The overall objective of the TC collaboration, and of its current and future contributions, is a community-supported, GPU-enabled OpenFOAM for CFD practice at industry scale.
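As orientation for the workflow described above, the sketch below exercises the AmgX C API directly on a tiny placeholder system: the matrix is uploaded from the CPU to the GPU, an AMG-preconditioned PCG solve runs on the GPU, and the result is downloaded back to the CPU. It is illustrative only; within OpenFOAM this hand-off is performed by the PETSc4FOAM/AmgX wrapper layer rather than by user code, and the inline configuration string is an assumption (production runs normally load the solver configuration from a JSON file).

/*
 * Illustrative upload / solve / download pattern with the AmgX C API.
 * Matrix values and the configuration string are placeholders only.
 */
#include <amgx_c.h>

int main(void)
{
    /* 2x2 SPD test system in CSR format, standing in for the pressure matrix. */
    int    row_ptr[] = {0, 2, 4};
    int    col_idx[] = {0, 1, 0, 1};
    double values[]  = {4.0, 1.0, 1.0, 3.0};
    double rhs[]     = {1.0, 2.0};
    double sol[]     = {0.0, 0.0};

    AMGX_initialize();

    /* AMG-preconditioned PCG, mirroring the pressure-solve setup described above.
       Illustrative string; real runs usually read a JSON configuration file. */
    AMGX_config_handle cfg;
    AMGX_config_create(&cfg,
        "config_version=2, solver=PCG, preconditioner=AMG, max_iters=100, tolerance=1e-8");

    AMGX_resources_handle rsrc;
    AMGX_resources_create_simple(&rsrc, cfg);

    AMGX_matrix_handle A;
    AMGX_vector_handle x, b;
    AMGX_solver_handle solver;
    AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
    AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);

    /* Copy the system from host (CPU) to device (GPU). */
    AMGX_matrix_upload_all(A, 2, 4, 1, 1, row_ptr, col_idx, values, NULL);
    AMGX_vector_upload(b, 2, 1, rhs);
    AMGX_vector_upload(x, 2, 1, sol);

    /* GPU-accelerated setup and solve. */
    AMGX_solver_setup(solver, A);
    AMGX_solver_solve(solver, b, x);

    /* Copy the result back to the CPU, as in the OpenFOAM integration. */
    AMGX_vector_download(x, sol);

    AMGX_solver_destroy(solver);
    AMGX_vector_destroy(b);
    AMGX_vector_destroy(x);
    AMGX_matrix_destroy(A);
    AMGX_resources_destroy(rsrc);
    AMGX_config_destroy(cfg);
    AMGX_finalize();
    return 0;
}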
OpenFOAM, CFD, GPU, AmgX, PETSc