1. Introduction
1.1. Hardware
The tensor compiler developed in this book relies on a few primitive operations that are translated by a code generator into highly optimized kernels. In fact, the first few chapters cover the code generators themselves for AArch64, the 64-bit execution state of the Arm architecture. This means that we can run our code on many of the latest smartphone, notebook, desktop and server chips.
In general, the code generation concepts discussed here also apply to other computer architectures, such as x86 processors. However, these are outside the scope of this book, so it is recommended to follow along on an AArch64 system. In addition, most of the descriptions are provided only for the Linux operating system, which is recommended for development.
| Vendor | Device | SoC/CPU | #Cores | Microarchitecture |
|---|---|---|---|---|
| Amazon | c8g | AWS Graviton4 | 96 | Arm Neoverse-V2 |
| Ampere | – | AmpereOne | 96-192 | Custom |
| NVIDIA | – | Grace CPU | 72 | Arm Neoverse-V2 |
| Raspberry Pi Ltd | Raspberry Pi 5 | Broadcom BCM2712 | 4 | Arm Cortex-A76 |
| Apple | Mac Mini | M4 | 4+6 | Custom |
| Apple | MacBook Air | M4 | 4+6 | Custom |
Table 1.1.1 lists some hardware platforms that are suitable for development. All listed platforms support Neon for the vector processing described in Section 4. However, only the Apple M4 supports the Scalable Matrix Extension (SME) described in Section 5. Note also that running natively under macOS requires some changes to the assembly code structure and the just-in-time code generation described in this book. Therefore, the easiest way to get started under macOS is to use virtualization for development, for example with Podman containers.
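To see what a given Linux system offers, the feature flags reported by the kernel can be inspected. The following is a minimal sketch, not part of the book's tooling: the helper names `parse_cpu_features` and `summarize_host` are hypothetical, and the sketch assumes that the kernel lists Neon as `asimd` and SME as `sme` on the `Features` lines of `/proc/cpuinfo`, as Linux does on AArch64. Inside a virtual machine or container, the reported flags depend on what the hypervisor exposes.

```python
import platform
from pathlib import Path


def parse_cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect the flags from all 'Features' lines of /proc/cpuinfo-style text."""
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def summarize_host(machine: str, features: set[str]) -> dict[str, bool]:
    """Report whether the host looks like AArch64 and which extensions it lists.

    Assumption: 'asimd' marks Neon (Advanced SIMD) and 'sme' marks the
    Scalable Matrix Extension, following the Linux AArch64 flag names.
    """
    return {
        # Linux reports 'aarch64'; macOS reports 'arm64' for the same architecture.
        "aarch64": machine in ("aarch64", "arm64"),
        "neon": "asimd" in features,
        "sme": "sme" in features,
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # /proc/cpuinfo only exists on Linux; fall back to an empty feature set elsewhere.
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(summarize_host(platform.machine(), parse_cpu_features(text)))
```

Running the script on one of the Linux systems in the table should report `neon` as available. An `sme` entry of `False` only means that the kernel did not list the flag, which can also happen when the kernel or hypervisor does not expose the extension.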