Changelog
=========
This class is dynamic, i.e., its contents are continuously updated.

2023 Updates
------------
This year we put a special emphasis on the Neoverse V1 microarchitecture.
V1 implements Armv8.4-A and is used in the Graviton3.
It is among the first server processors that support the Scalable Vector Extension (SVE).
Whereas A64FX has 512-bit SVE vector units, V1 has 256-bit SVE vector units.
Further, V1 implements the SVE Bfloat16 instructions, which can be used to speed up small matrix-matrix multiplications (a key workload of the class) with BF16 inputs and FP32 accumulation.

Students of the 2023 class will also have access to the Ookami machine at Stony Brook University.
Ookami has 176 A64FX compute nodes, which allow for testing at scale.

2022 Updates
------------
This year's biggest change is a strong focus on the Scalable Vector Extension (SVE).
SVE is one of the most mature vector extensions to date.
For example, A64FX, Graviton3, Snapdragon 8 Gen 1 and Exynos 2200 are important processors that already rely on SVE for performance.
More SVE-based designs will be released soon.

While SVE was already a topic in the 2021 class, we had to use an instruction emulator for that part.
This means that the students only ran SVE instructions in software: they could test their programs' functional behavior but could not assess hardware performance.
This year, we'll use the Future Technologies Partition of NHR@KIT to develop SVE kernels for A64FX.