1. Introduction

1.1. Hardware

The tensor compiler developed in this book relies on a few primitive operations that are translated by a code generator into highly optimized kernels. In fact, the first few chapters cover the code generators themselves for AArch64, the 64-bit execution state of the Arm architecture. This means that we can run our code on many of the latest smartphone, notebook, desktop and server chips.

In general, the code generation concepts discussed here also apply to other computer architectures, such as x86 processors. However, these are outside the scope of this book, so it is recommended to follow along on an AArch64 system. In addition, most of the descriptions are provided only for the Linux operating system, which is recommended for development.
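Before following along, it can help to confirm that the machine at hand is in fact a 64-bit Arm system running Linux. The following is a minimal sketch in Python, not part of the book's tooling; the helper names `is_aarch64` and `describe_host` are hypothetical and chosen only for illustration.

```python
import platform

# Strings that platform.machine() commonly reports for 64-bit Arm:
# Linux typically reports "aarch64", macOS reports "arm64".
AARCH64_NAMES = {"aarch64", "arm64"}


def is_aarch64(machine: str) -> bool:
    """Return True if the machine string denotes a 64-bit Arm system."""
    return machine.lower() in AARCH64_NAMES


def describe_host(system: str, machine: str) -> str:
    """Summarize how a host compares to the recommended AArch64 + Linux setup."""
    if not is_aarch64(machine):
        return f"{machine}: not an AArch64 system"
    if system != "Linux":
        return f"{machine} on {system}: AArch64, but not Linux"
    return f"{machine} on Linux: matches the recommended setup"


if __name__ == "__main__":
    # Query the current host through the standard library.
    print(describe_host(platform.system(), platform.machine()))
```

Passing the system and machine strings as arguments keeps the check easy to try with values from other machines, for example `describe_host("Darwin", "arm64")` for a Mac.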

Table 1.1.1 Recommended and recent AArch64 development platforms.

| Vendor           | Device         | SoC/CPU          | #Cores | Microarchitecture |
|------------------|----------------|------------------|--------|-------------------|
| Amazon           | c8g            | AWS Graviton4    | 96     | Arm Neoverse-V2   |
| Ampere           |                | AmpereOne        | 96-192 | Custom            |
| NVIDIA           |                | Grace CPU        | 72     | Arm Neoverse-V2   |
| Raspberry Pi Ltd | Raspberry Pi 5 | Broadcom BCM2712 | 4      | Arm Cortex-A76    |
| Apple            | Mac Mini       | M4               | 4+6    | Custom            |
| Apple            | MacBook Air    | M4               | 4+6    | Custom            |

Table 1.1.1 lists some hardware platforms that are suitable for development. All listed platforms support Neon for the vector processing described in Section 4. However, of these, only the Apple M4 supports the Scalable Matrix Extension (SME) described in Section 5. Note also that running natively under macOS requires some changes to the assembly code structure and just-in-time code generation described in this book. Therefore, the easiest way to get started under macOS is to use virtualization for development, for example by using Podman containers.
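To see which of these extensions a given Linux system reports, one option is to inspect the `Features` line of `/proc/cpuinfo`. The sketch below is an illustration under stated assumptions, not the book's method: it assumes the AArch64 Linux kernel's feature names, where Neon (Advanced SIMD) appears as `asimd` and SME as `sme`, and the helper names `cpu_features` and `supports` are hypothetical.

```python
from pathlib import Path


def cpu_features(cpuinfo_text: str) -> set:
    """Collect the feature flags from every 'Features' line of cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # AArch64 Linux lists the flags as "Features : fp asimd ..." per core.
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def supports(cpuinfo_text: str) -> dict:
    """Report whether the Neon and SME flags are present (assumed flag names)."""
    flags = cpu_features(cpuinfo_text)
    return {"neon": "asimd" in flags, "sme": "sme" in flags}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(supports(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux")
```

The script reports only what the running kernel exposes. Inside a container or virtual machine, the visible flags may differ from those of the host hardware, so a missing `sme` flag there does not by itself prove that the chip lacks SME.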