Slides of the presentation Tensor Processing Primitives on Arm Processors at ISC22.

Two major parallel computing and HPC events took place last week: the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS) and ISC High Performance 2022 (ISC22).

Our lab presented the research paper Next-Generation Local Time Stepping for the ADER-DG Finite Element Method at IPDPS (slides). We also presented the current status of our work on bringing tensor processing primitives to Arm processors at the 4th Annual Arm HPC Users Group Workshop. The results discussed include the performance of JITted small matrix multiplication kernels on a wide range of processors: Fujitsu’s A64FX (ASIMD and SVE), Ampere’s Altra (ASIMD), Amazon’s Graviton2 (ASIMD) and Graviton3 (ASIMD and SVE), and Apple’s M1 (ASIMD and AMX). As the slides of the presentation show, our added support for the best-suited extensions of the Arm architecture is crucial for unlocking the full potential of the respective processors.