11. Back to the Compiler

In this section we move back to high-level C/C++ code and study the compilers’ automatic vectorization capabilities. As discussed in the lectures, almost all compilers support automatic vectorization. Auto-vectorization is an optimization step in which the compiler analyzes our code, mostly its inner loops, and inserts vector instructions where possible. Our running example is the triad, for which a simple implementation is given by:

Listing 11.1 File triad.cpp which implements the triad function in C/C++.
#include "triad.h"

void triad_simple( uint64_t         i_n_values,
                   float    const * i_a,
                   float    const * i_b,
                   float          * o_c ) {
  for( uint64_t l_va = 0; l_va < i_n_values; l_va++ ) {
    o_c[l_va] = i_a[l_va] + 2.0f * i_b[l_va];
  }
}
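
To make the semantics of the triad concrete, the following minimal sketch calls the function on a few values and checks the result. The loop is the one from Listing 11.1; the helpers run_triad_example and check_triad_example are hypothetical and not part of the provided code frame.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same loop as in Listing 11.1, repeated so that this sketch compiles without triad.h.
void triad_simple( uint64_t         i_n_values,
                   float    const * i_a,
                   float    const * i_b,
                   float          * o_c ) {
  for( uint64_t l_va = 0; l_va < i_n_values; l_va++ ) {
    o_c[l_va] = i_a[l_va] + 2.0f * i_b[l_va];
  }
}

// Hypothetical helper: fills a[i] = i and b[i] = 0.5, runs the triad and returns c.
std::vector< float > run_triad_example( uint64_t i_n_values ) {
  std::vector< float > l_a( i_n_values );
  std::vector< float > l_b( i_n_values );
  std::vector< float > l_c( i_n_values, 0.0f );

  for( uint64_t l_va = 0; l_va < i_n_values; l_va++ ) {
    l_a[l_va] = static_cast< float >( l_va );
    l_b[l_va] = 0.5f;
  }

  triad_simple( i_n_values, l_a.data(), l_b.data(), l_c.data() );
  return l_c;
}

// Hypothetical check: with the inputs above we expect c[i] = i + 2 * 0.5 = i + 1.
void check_triad_example() {
  std::vector< float > l_c = run_triad_example( 8 );
  for( uint64_t l_va = 0; l_va < 8; l_va++ ) {
    assert( l_c[l_va] == static_cast< float >( l_va + 1 ) );
  }
}
```

The input values are chosen such that all results are exactly representable in FP32, which is why the sketch can compare with == instead of a tolerance.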

To ease development, a code frame is provided in the archive triad.tar.xz. It contains a simple implementation of the triad and supports two command-line arguments:

  • The first argument gives the number of entries in each of the three arrays.

  • The second argument specifies the number of repeated executions.

The code frame benchmarks the triad example and reports the following performance metrics:

  • The time it took to execute the benchmark.

  • The sustained floating-point performance in terms of FP32 GFLOPS.

  • The sustained memory bandwidth in terms of the requested memory transfers. Note that, depending on your testbed, this might differ from the actual memory transfers in hardware.
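
The metrics above can be derived from the array size, the number of repetitions and the measured time. The following is a minimal sketch of one plausible accounting, not necessarily the one used in the code frame. It assumes two floating-point operations per value (one multiplication and one addition) and twelve requested bytes per value (two FP32 loads and one FP32 store). The names TriadMetrics and triad_metrics are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the derived metrics.
struct TriadMetrics {
  double gflops;            // sustained FP32 GFLOPS
  double gbytes_per_second; // requested memory transfers in GB/s (10^9 bytes per second)
};

// Derives the metrics from the problem size and the measured wall-clock time.
// Assumptions: 2 flops per value, 3 * sizeof(float) requested bytes per value.
TriadMetrics triad_metrics( uint64_t i_n_values,
                            uint64_t i_n_repetitions,
                            double   i_seconds ) {
  assert( i_seconds > 0.0 );

  // total number of updated entries of the output array over all repetitions
  double l_n_updates = static_cast< double >( i_n_values )
                     * static_cast< double >( i_n_repetitions );

  TriadMetrics l_metrics;
  l_metrics.gflops            = 2.0 * l_n_updates / i_seconds * 1.0E-9;
  l_metrics.gbytes_per_second = 3.0 * sizeof( float ) * l_n_updates / i_seconds * 1.0E-9;
  return l_metrics;
}
```

For example, 1024 values repeated 1,000,000 times in one second would correspond to about 2.05 GFLOPS and about 12.29 GB/s of requested transfers. The hardware may move more data than requested, for instance if a cache line of the output array is loaded before it is written.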

Tasks

  1. Read the Coding Considerations of the Arm Compiler Scalable Vector Extension User Guide Version. Name three hints which you consider most helpful for your future work and explain why. Explain at least one pragma which guides auto-vectorization.

  2. Build the code with the GCC and LLVM toolchains. Generate vectorization reports with both toolchains. Convince at least one compiler to generate SVE code and disassemble it.

    Hint

    Documentation on vectorization reports is available in the respective GCC and LLVM documents.

  3. Verify the optimization levels and the flags for enabling and disabling vectorization that were discussed in the lectures.

  4. Illustrate the impact of auto-vectorization by using 1024 values for each of the three arrays. As always, repeat the experiment often enough such that the runtime exceeds one second.

  5. Identify the requirements a loop has to meet to be vectorizable. Now, deliberately break the auto-vectorization of the compiler by rewriting the triad function. Do this in at least two ways: by making the loop uncountable and by calling an external function inside of the loop. Confirm your experiments through the respective vectorization reports and by disassembling the generated code.
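
As a starting point for the last task, the following sketch shows two possible rewrites of the triad that are likely to hinder auto-vectorization. The function names are hypothetical. In a real experiment the called function would be defined in a separate source file; here it is defined in the same file and marked noinline (a GCC/Clang attribute) only so that the sketch links on its own. Whether a given compiler still vectorizes either variant depends on its version and flags, so the vectorization reports remain the reference.

```cpp
#include <cstdint>

// Variant 1: uncountable loop.
// The loop stops at the first negative entry of i_b, so the trip count is not
// known when the loop is entered. Returns the number of processed values.
uint64_t triad_uncountable( uint64_t         i_n_values,
                            float    const * i_a,
                            float    const * i_b,
                            float          * o_c ) {
  uint64_t l_va = 0;
  for( ; l_va < i_n_values; l_va++ ) {
    if( i_b[l_va] < 0.0f ) break; // data-dependent early exit
    o_c[l_va] = i_a[l_va] + 2.0f * i_b[l_va];
  }
  return l_va;
}

// Stand-in for a function whose body the compiler cannot see or inline.
__attribute__((noinline)) float scale_external( float i_value ) {
  return 2.0f * i_value;
}

// Variant 2: opaque function call in the loop body.
void triad_external_call( uint64_t         i_n_values,
                          float    const * i_a,
                          float    const * i_b,
                          float          * o_c ) {
  for( uint64_t l_va = 0; l_va < i_n_values; l_va++ ) {
    o_c[l_va] = i_a[l_va] + scale_external( i_b[l_va] );
  }
}
```

Reports on missed vectorization can, for example, be requested with -fopt-info-vec-missed in GCC and with -Rpass-missed=loop-vectorize together with -Rpass-analysis=loop-vectorize in Clang.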