1. Introduction

1.1. Hardware

The tensor compiler developed in this book relies on a few primitive operations that a code generator translates into highly optimized kernels. The first few chapters cover these code generators, which target AArch64, the 64-bit execution state of the Arm architecture. As a result, our code runs on many of the latest smartphone, notebook, desktop, and server chips.

In general, the code generation concepts discussed here also apply to other types of computer architectures, such as x86 processors. However, these are not discussed within the scope of this book, so it is recommended to follow along with an AArch64 system. In addition, most of the descriptions are provided only for the Linux operating system, which is recommended for development.

Table 1.1.1 Recommended and recent AArch64 development platforms.
Vendor	Device	SoC/CPU	#Cores	Microarchitecture
Amazon	c8g	AWS Graviton4	96	Arm Neoverse-V2
Ampere	–	AmpereOne	96-192	Custom
NVIDIA	–	Grace CPU	72	Arm Neoverse-V2
Raspberry Pi Ltd	Raspberry Pi 5	Broadcom BCM2712	4	Arm Cortex-A76
Apple	Mac Mini	M4	4+6	Custom
Apple	MacBook Air	M4	4+6	Custom

Table 1.1.1 lists some hardware platforms that are suitable for development. All listed platforms support Neon for the vector processing described in Section 4. However, only the Apple M4 supports the Scalable Matrix Extension (SME) described in Section 5. Also note that running natively under macOS requires some changes to the assembly code structure and to the just-in-time code generation described in this book. Therefore, the easiest way to get started under macOS is to use virtualization for development, for example by using Podman containers.
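Before following along, it can help to check what a given Linux system reports about itself. The following is a minimal sketch, not part of the book's tooling: it assumes a Linux environment that exposes `/proc/cpuinfo`, and it relies on the kernel's AArch64 feature names `asimd` (Advanced SIMD, i.e. Neon) and `sme`. The helper names `parse_cpu_features` and `describe_platform` are hypothetical and chosen for illustration only.

```python
import platform
from pathlib import Path


def parse_cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect the flags from all 'Features' lines of a /proc/cpuinfo dump.

    The union over all cores is an approximation: on heterogeneous systems
    individual cores may report different feature sets.
    """
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def describe_platform(machine: str, features: set[str]) -> dict[str, bool]:
    """Summarize whether the system looks suitable for the later chapters."""
    return {
        # Linux reports 'aarch64'; macOS reports 'arm64' for the same architecture.
        "aarch64": machine in ("aarch64", "arm64"),
        # The Linux kernel names Advanced SIMD (Neon) 'asimd'.
        "neon": "asimd" in features,
        # Exact token match, so related flags such as 'smef64f64' do not count.
        "sme": "sme" in features,
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # Assumption: on systems without /proc/cpuinfo (e.g. native macOS),
    # no feature flags can be read this way and both checks report False.
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(describe_platform(platform.machine(), parse_cpu_features(text)))
```

Whether a flag such as `sme` is visible inside a virtual machine or container depends on the host and the virtualization setup, so a negative result there does not necessarily mean that the hardware lacks the extension.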
