.. _ch:alu:

Arithmetic Logic Unit
======================

This lab implements an Arithmetic Logic Unit (ALU).
ALUs form the computational heart of most processors.
As shown in :numref:`tab:alu_funcs`, our ALU will support a series of functions.
We can choose the ALU's desired functionality through a two-bit control signal.

.. _tab:alu_funcs:

.. table:: Functions supported by our ALU depending on a two-bit control signal.

   +----------------+----------+
   | Control Signal | Function |
   +================+==========+
   | 2'b00          | add      |
   +----------------+----------+
   | 2'b01          | sub      |
   +----------------+----------+
   | 2'b10          | and      |
   +----------------+----------+
   | 2'b11          | or       |
   +----------------+----------+

In :numref:`ch:alu_basic` we'll get started by designing the pieces of the ALU which are required for the functions in :numref:`tab:alu_funcs`.
Once all basic features are in place and well-tested, we'll move ahead and implement a basic ALU.
The tasks in :numref:`ch:alu_basic_fpga` will bring our design to the DE10-Lite board and allow us to control the ALU through the board's switches.
We'll then extend our initial ALU by adding condition flags in :numref:`ch:alu_nzcv_design`.
As discussed further in :numref:`ch:alu_condition_flags`, the condition flags allow us to interpret the results of our operations.
Lastly, :numref:`ch:alu_nzcv_fpga` deploys the extended ALU to the DE10-Lite board.

.. _ch:alu_basic:

Designing a Basic ALU
---------------------

.. figure:: /chapters/data_alu/alu_basic.svg
   :name: fig:alu_basic
   :width: 99%

   Illustration of our ALU's basic part for 64-bit inputs ``i_a`` and ``i_b``, and the 2-bit control signal ``i_alu_ctrl``.

This section designs the initial part of our ALU which is shown in :numref:`fig:alu_basic`.
The two N-bit values ``i_a`` and ``i_b``, and the control signal ``i_alu_ctrl`` represent the inputs of our basic ALU.
Outputs are given through the N-bit value ``o_result`` and the carry out ``o_carry_out``.

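
The function table in :numref:`tab:alu_funcs` can also be read as a purely behavioral description.
The following is a minimal sketch of such a reading and not the structural design developed in this chapter.
The module name ``alu_ref`` and the default width ``N = 8`` are assumptions made for illustration, and the carry out is deliberately left out.

.. code-block:: systemverilog

   // Behavioral reference sketch of the function table.
   // Hypothetical helper: the name alu_ref and the default N are assumptions.
   module alu_ref #(parameter N = 8) (
     input  logic [N-1:0] i_a, i_b,
     input  logic [1:0]   i_alu_ctrl,
     output logic [N-1:0] o_result
   );
     always_comb begin
       case (i_alu_ctrl)
         2'b00:   o_result = i_a + i_b;  // add (result truncated to N bits)
         2'b01:   o_result = i_a - i_b;  // sub (result truncated to N bits)
         2'b10:   o_result = i_a & i_b;  // bitwise and
         2'b11:   o_result = i_a | i_b;  // bitwise or
         default: o_result = 'x;         // unknown control signal
       endcase
     end
   endmodule

A model like this could serve as a reference inside a testbench, where the result of the structural ALU is compared against it for the same inputs.
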

Building Blocks
^^^^^^^^^^^^^^^

We identify three submodules, which we'll design with the following declarations before tackling the entire ALU:

#. An N-bit adder.

   .. code-block:: systemverilog

      module adder #(parameter N) (
        input  logic [N-1:0] i_a, i_b,
        input  logic         i_carry_in,
        output logic [N-1:0] o_s,
        output logic         o_carry_out
      );

#. A 2:1 multiplexer which is used as an input to the N-bit adder.

   .. code-block:: systemverilog

      module mux_2 #(parameter N) (
        input  logic [N-1:0] i_in0, i_in1,
        input  logic         i_s,
        output logic [N-1:0] o_out
      );

#. A 4:1 multiplexer which is used to select the final result.

   .. code-block:: systemverilog

      module mux_4 #(parameter N) (
        input  logic [N-1:0] i_in0, i_in1, i_in2, i_in3,
        input  logic [1:0]   i_s,
        output logic [N-1:0] o_out
      );

.. admonition:: Tasks

   #. Implement the module ``adder`` in the file ``adder.sv``.
      Test your implementation in the testbench ``adder_tb`` in the file ``adder_tb.sv``.
      Check at least three test cases!

   #. Implement the module ``mux_2`` in the file ``mux_2.sv``.
      Test your implementation in the testbench ``mux_2_tb`` in the file ``mux_2_tb.sv``.
      Check at least three test cases!

   #. Implement the module ``mux_4`` in the file ``mux_4.sv``.
      Use the previously implemented module ``mux_2`` in your implementation of ``mux_4``!
      Test your implementation in the testbench ``mux_4_tb`` in the file ``mux_4_tb.sv``.
      Check at least three test cases!

Putting the Parts Together
^^^^^^^^^^^^^^^^^^^^^^^^^^

We have all building blocks at hand to implement the basic version of our ALU.
We'll use the following SystemVerilog declaration (see also :numref:`fig:alu_basic`):

.. code-block:: systemverilog
   :name: lst:alu_basic
   :caption: Declaration of our basic ALU.

   module alu #(parameter N) (
     input  logic [N-1:0] i_a, i_b,
     input  logic [1:0]   i_alu_ctrl,
     output logic [N-1:0] o_result,
     output logic         o_carry_out
   );

Before going into the details of the design, let's derive some example inputs and outputs.
These also serve as the minimum set of test values for our ALU's testbench.

.. _tab:basic_alu_exs:

.. table:: Example inputs and outputs for our basic ALU.

   +--------------+--------------+------------+--------------+-------------+
   | i_a          | i_b          | i_alu_ctrl | o_result     | o_carry_out |
   +==============+==============+============+==============+=============+
   | 8'b0000_0000 | 8'b0000_0000 | 2'b00      | 8'b0000_0000 | 1'b0        |
   +--------------+--------------+------------+--------------+-------------+
   | 8'b1011_1101 | 8'b1010_0101 | 2'b00      | 8'b0110_0010 | 1'b1        |
   +--------------+--------------+------------+--------------+-------------+
   | 8'b1011_1101 | 8'b1010_0101 | 2'b01      |              |             |
   +--------------+--------------+------------+--------------+-------------+
   | 8'b1011_1101 | 8'b1010_0101 | 2'b10      |              |             |
   +--------------+--------------+------------+--------------+-------------+
   | 8'b1011_1101 | 8'b1010_0101 | 2'b11      |              |             |
   +--------------+--------------+------------+--------------+-------------+

.. admonition:: Tasks

   #. Fill in the missing parts of :numref:`tab:basic_alu_exs`.

   #. Implement the module ``alu`` in the file ``alu.sv``.
      Test your implementation in the testbench ``alu_tb`` in the file ``alu_tb.sv``.
      Add respective tests for all examples in :numref:`tab:basic_alu_exs` to ``alu_tb``.

   #. Generate a waveform plot illustrating the application of your ALU w.r.t. :numref:`tab:basic_alu_exs`'s inputs.
      Limit your plot to these inputs and change the input every 10ps, i.e., visualize 50ps total.
      The plot shall show all inputs, i.e., ``i_a``, ``i_b`` and ``i_alu_ctrl``, and all outputs, i.e., ``o_result``, ``o_carry_out``.

.. _ch:alu_basic_fpga:

Basic ALU in Practice
---------------------

.. figure:: /chapters/data_alu/pic_alu_basic_0.jpg
   :name: fig:pic_alu_basic_0
   :width: 100%

   Picture of the deployed arithmetic logic unit.
   Shown is the configuration for inputs ``i_a[3:0]=4'b0011``, ``i_b[3:0]=4'b0010`` and ``i_alu_ctrl[1:0]=2'b00``.

Our design is finished and the simulations look promising: Let's put the ALU into production!
An example configuration of a deployed design is shown in :numref:`fig:pic_alu_basic_0`.
For this, we write a top-level module ``alu_de10_lite`` for the DE10-Lite board.
The module instantiates a 4-bit version of our module ``alu`` and maps the board's switches to the inputs ``i_a[3:0]``, ``i_b[3:0]`` and ``i_alu_ctrl[1:0]``.
Specifically, we wire ``SW[3:0]`` to ``i_a[3:0]``, ``SW[7:4]`` to ``i_b[3:0]``, and ``SW[9:8]`` to ``i_alu_ctrl[1:0]``.

As done for the tiny calculator in :numref:`ch:tiny_calculator`, we show the input ``i_a[3:0]`` on display ``HEX0`` and ``i_b[3:0]`` on display ``HEX1``.
The result ``o_result[3:0]`` goes to display ``HEX2``.
However, we use a different strategy for the control signal ``i_alu_ctrl[1:0]`` and the carry out ``o_carry_out``.
We show the first bit of the control signal, i.e., ``i_alu_ctrl[0:0]``, by activating ``LEDR0`` if the signal is 1, and the second one, i.e., ``i_alu_ctrl[1:1]``, through ``LEDR1``.
Further, we illustrate a non-zero carry out by wiring ``o_carry_out`` to ``LEDR7``.

.. _tab:basic_alu_fpga:

.. table:: Example inputs and outputs for our 4-bit ALU when deployed on a DE10-Lite board.

   +--------------+--------------+------------+--------------+-------------+
   | i_a          | i_b          | i_alu_ctrl | o_result     | o_carry_out |
   +==============+==============+============+==============+=============+
   | 4'b0011      | 4'b0010      | 2'b00      | 4'b0101      | 1'b0        |
   +--------------+--------------+------------+--------------+-------------+
   | 4'b1011      | 4'b1010      | 2'b01      | 4'b0001      | 1'b1        |
   +--------------+--------------+------------+--------------+-------------+
   | 4'b1001      | 4'b1110      | 2'b10      | 4'b1000      | 1'b1        |
   +--------------+--------------+------------+--------------+-------------+
   | 4'b1001      | 4'b1110      | 2'b11      | 4'b1111      | 1'b0        |
   +--------------+--------------+------------+--------------+-------------+

.. literalinclude:: data_alu/alu_de10_lite.sv
   :linenos:
   :language: systemverilog
   :caption: Template for the module ``alu_de10_lite``.
   :name: lst:alu_de10_lite

.. admonition:: Tasks

   #. Implement the top-level module ``alu_de10_lite``.
      Use the template in :numref:`lst:alu_de10_lite`.

   #. Compile your finished ALU in Quartus Prime and program the FPGA of a DE10-Lite board.

   #. Make sure that the board shows the correct results for the inputs in :numref:`tab:basic_alu_fpga`.
      Provide a picture of the board for each of the inputs.

.. _ch:alu_nzcv_design:

NZCV-Extended ALU
-----------------

Now, let's extend our ALU by a set of condition flags which provide information about our ALU's results.
Specifically, we'll introduce the NZCV flags whose meaning is given in :numref:`tab:nzcv_flags`.
These flags are commonly used in practice, and a large variety of instructions often relies on them.
For example, the Arm ISA uses `NZCV flags <https://developer.arm.com/documentation/ddi0601/2021-09/AArch64-Registers/NZCV--Condition-Flags?lang=en>`_ extensively for `conditional branching <https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/condition-codes-1-condition-flags-and-codes>`_.

.. _tab:nzcv_flags:

.. table:: Meaning of the NZCV flags.

   +----------+------------------------------------+
   | Flag     | Meaning                            |
   +==========+====================================+
   | Negative | The output of the ALU is negative. |
   +----------+------------------------------------+
   | Zero     | The output of the ALU is zero.     |
   +----------+------------------------------------+
   | Carry    | The adder produced a carry out.    |
   +----------+------------------------------------+
   | oVerflow | The adder overflowed.              |
   +----------+------------------------------------+

.. code-block:: systemverilog
   :name: lst:alu_nzcv
   :caption: Declaration of the extended ALU.

   module alu_nzcv #(parameter N) (
     input  logic [N-1:0] i_a, i_b,
     input  logic [1:0]   i_alu_ctrl,
     output logic [N-1:0] o_result,
     output logic [3:0]   o_nzcv
   );

We implement the NZCV flags by extending our original module shown in :numref:`fig:alu_basic`.
The SystemVerilog declaration of the extended design is given in :numref:`lst:alu_nzcv`.
Compared to the declaration of our initial ALU given in :numref:`lst:alu_basic`, the module ``alu_nzcv`` does not output the carry out ``o_carry_out`` directly anymore.
Instead, we output four bits for the NZCV flags in ``o_nzcv``.
We extend the ALU shown in :numref:`fig:alu_basic` by introducing the flags one-by-one:

.. figure:: /chapters/data_alu/alu_nzcv_0.svg
   :name: fig:alu_nzcv_0
   :width: 99%

   Illustration of our partially extended ALU after adding support for the Zero flag.

.. figure:: /chapters/data_alu/alu_nzcv_1.svg
   :name: fig:alu_nzcv_1
   :width: 99%

   Illustration of our partially extended ALU after adding support for the Zero and Negative flags.

.. figure:: /chapters/data_alu/alu_nzcv_2.svg
   :name: fig:alu_nzcv_2
   :width: 99%

   Illustration of our partially extended ALU after adding support for the Zero, Negative and Carry flags.

.. figure:: /chapters/data_alu/alu_nzcv_3.svg
   :name: fig:alu_nzcv_3
   :width: 99%

   Illustration of our extended ALU with full support for the NZCV flags.

Once again, we identify building blocks before tackling the entire module.
Of course, the biggest building block, required already in :numref:`fig:alu_nzcv_0`, is given through :numref:`ch:alu_basic`'s basic ALU which we already designed in the module ``alu``.
We only introduce one additional standalone module before tackling the NZCV-extended ALU.
This module is given by the three-input XOR which we require for the extension with the oVerflow flag in :numref:`fig:alu_nzcv_3`.
Our XOR should have the following declaration:

.. code-block:: systemverilog

   module xor_3 (
     input  logic i_in0, i_in1, i_in2,
     output logic o_res
   );

Great! We are ready to implement our full-fledged ALU with support for the NZCV flags.
As before, we have to implement extensive tests in a testbench -- the ALU is the ❤️ of our processor after all.
This time, as shown in :numref:`tab:ext_alu_exs`, we harness our parametrized implementation of the ALU by using 32-bit test inputs ``i_a`` and ``i_b``.

.. _tab:ext_alu_exs:

.. table:: Example inputs and outputs for our extended ALU.

   +---------------+---------------+------------+---------------+---------+
   | i_a           | i_b           | i_alu_ctrl | o_result      | o_nzcv  |
   +===============+===============+============+===============+=========+
   | 32'h0000_0000 | 32'h0000_0000 | 2'b00      | 32'h0000_0000 | 4'b0100 |
   +---------------+---------------+------------+---------------+---------+
   | 32'h0000_0000 | 32'hffff_ffff | 2'b00      | 32'hffff_ffff | 4'b1000 |
   +---------------+---------------+------------+---------------+---------+
   | 32'h0000_0001 | 32'hffff_ffff | 2'b00      |               | 4'b0110 |
   +---------------+---------------+------------+---------------+---------+
   | 32'h0000_ffff | 32'h0000_0001 | 2'b00      |               | 4'b0000 |
   +---------------+---------------+------------+---------------+---------+
   | 32'h0000_0000 | 32'h0000_0000 | 2'b01      |               | 4'b0110 |
   +---------------+---------------+------------+---------------+---------+
   | 32'h0001_0000 | 32'h0000_0001 | 2'b01      |               | 4'b0010 |
   +---------------+---------------+------------+---------------+---------+
   | 32'hffff_ffff | 32'hffff_ffff | 2'b10      |               | 4'b1000 |
   +---------------+---------------+------------+---------------+---------+
   | 32'hffff_ffff | 32'h7743_3477 | 2'b10      |               | 4'b0000 |
   +---------------+---------------+------------+---------------+---------+
   | 32'h0000_0000 | 32'hffff_ffff | 2'b10      |               | 4'b0100 |
   +---------------+---------------+------------+---------------+---------+
   | 32'h0000_0000 | 32'hffff_ffff | 2'b11      |               | 4'b1000 |
   +---------------+---------------+------------+---------------+---------+

.. admonition:: Tasks

   #. Implement the module ``xor_3`` in the file ``xor_3.sv``.
      Test your implementation in the testbench ``xor_3_tb`` in the file ``xor_3_tb.sv``.

   #. Fill in the missing parts of :numref:`tab:ext_alu_exs`.

   #. Implement the module ``alu_nzcv`` in the file ``alu_nzcv.sv``.
      Test your implementation in the testbench ``alu_nzcv_tb`` in the file ``alu_nzcv_tb.sv``.
      Add tests for the examples in :numref:`tab:ext_alu_exs` to ``alu_nzcv_tb``.

   #. Generate a waveform plot illustrating the application of your extended ALU to :numref:`tab:ext_alu_exs`'s inputs.
      Apply each input for 10ps and thus limit the plot to a total time of 100ps.
      Visualize all inputs (``i_a``, ``i_b`` and ``i_alu_ctrl``) and all outputs (``o_result``, ``o_nzcv``).

.. _ch:alu_condition_flags:

Condition Flags
---------------

In this part, we think about a few examples and the respective meaning of the obtained flags.
We start with :numref:`tab:ext_alu_int_reps`, which recapitulates that the meaning of raw bits depends on their interpretation.
Specifically, we consider the inputs to our ALU to be either signed or unsigned integers.

Next, we have a closer look at the meaning of the condition flags in a series of examples.
The examples are given in :numref:`tab:ext_alu_fpga`.
In contrast to :numref:`tab:ext_alu_exs`, we derive the flags on paper first and explain their meaning w.r.t. the given inputs and expected outputs.

.. _tab:ext_alu_int_reps:

.. table:: Different integer interpretations of four bits. The second column assumes an unsigned interpretation. The third column assumes a two's complement representation.

   +----------------+--------------------+------------------------------+
   | Raw (bits)     | Unsigned           | Signed (two's complement)    |
   +================+====================+==============================+
   | :math:`0000_2` | :math:`0_{10}`     | :math:`0_{10}`               |
   +----------------+--------------------+------------------------------+
   | :math:`0001_2` | :math:`1_{10}`     | :math:`1_{10}`               |
   +----------------+--------------------+------------------------------+
   | :math:`0010_2` | :math:`2_{10}`     | :math:`2_{10}`               |
   +----------------+--------------------+------------------------------+
   | :math:`0011_2` | :math:`3_{10}`     | :math:`3_{10}`               |
   +----------------+--------------------+------------------------------+
   | :math:`0100_2` |                    |                              |
   +----------------+--------------------+------------------------------+
   | :math:`0101_2` |                    |                              |
   +----------------+--------------------+------------------------------+
   | :math:`0110_2` |                    |                              |
   +----------------+--------------------+------------------------------+
   | :math:`1000_2` | :math:`8_{10}`     | :math:`-(2^3 - 0) = -8_{10}` |
   +----------------+--------------------+------------------------------+
   | :math:`1001_2` | :math:`9_{10}`     | :math:`-(2^3 - 1) = -7_{10}` |
   +----------------+--------------------+------------------------------+
   | :math:`1010_2` |                    |                              |
   +----------------+--------------------+------------------------------+
   | :math:`1101_2` |                    |                              |
   +----------------+--------------------+------------------------------+

.. _tab:ext_alu_fpga:

.. table:: Example inputs and outputs for our extended 4-bit ALU.

   +--------------+--------------+------------+--------------+---------+
   | i_a          | i_b          | i_alu_ctrl | o_result     | o_nzcv  |
   +==============+==============+============+==============+=========+
   | 4'b0100      | 4'b0100      | 2'b00      |              |         |
   +--------------+--------------+------------+--------------+---------+
   | 4'b1101      | 4'b0011      | 2'b00      |              |         |
   +--------------+--------------+------------+--------------+---------+
   | 4'b0100      | 4'b1010      | 2'b01      |              |         |
   +--------------+--------------+------------+--------------+---------+
   | 4'b0110      | 4'b1001      | 2'b10      |              |         |
   +--------------+--------------+------------+--------------+---------+
   | 4'b0110      | 4'b0101      | 2'b11      |              |         |
   +--------------+--------------+------------+--------------+---------+

.. admonition:: Tasks

   #. Complete :numref:`tab:ext_alu_int_reps`.
      Provide all numbers in base 10.

   #. Complete :numref:`tab:ext_alu_fpga`.
      Explain *briefly* what the respective condition flags indicate when assuming the following two cases:

      * ``i_a`` and ``i_b`` represent unsigned integers
      * ``i_a`` and ``i_b`` represent signed integers

   #. `ANDS (immediate) <https://developer.arm.com/documentation/ddi0602/2022-03/Base-Instructions/ANDS--immediate---Bitwise-AND--immediate---setting-flags->`_ is a flag-setting base instruction of the A64 Instruction Set Architecture.
      In assembly code one would use the syntax ``ANDS <Xd>, <Xn>, #<imm>`` when working with the 64-bit view of the general purpose registers.

      Locate and name a flag-setting A64 base instruction which performs an addition and one which performs a subtraction.
      Look up the assembly syntax for the 32-bit and 64-bit variants.

   .. hint::

      A `short description <https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers/NZCV--Condition-Flags>`__ of the NZCV condition flags is given in the documentation of AArch64's system registers.

.. _ch:alu_nzcv_fpga:

NZCV-Extended ALU in Practice
-----------------------------

.. figure:: /chapters/data_alu/pic_alu_nzcv_3.jpg
   :name: fig:pic_alu_nzcv_3
   :width: 100%

   Picture of the deployed arithmetic logic unit.
   Shown is the configuration for inputs ``i_a[3:0]=4'b0100``, ``i_b[3:0]=4'b1010`` and ``i_alu_ctrl[1:0]=2'b01``.

Let's also use our NZCV-extended ALU ``alu_nzcv`` to program the FPGA of a DE10-Lite board.
An example configuration of the programmed board is shown in :numref:`fig:pic_alu_nzcv_3`.
Once again, we require a top-level module.
This time we use the name ``alu_nzcv_de10_lite`` and the template given in :numref:`lst:alu_nzcv_de10_lite`.

Pretty much everything works analogously to the corresponding top-level module of :numref:`ch:alu_basic_fpga`.
From a high-level perspective, the main difference between the modules ``alu`` and ``alu_nzcv`` lies in the outputs:
``alu`` sets the carry out ``o_carry_out`` while ``alu_nzcv`` sets the condition flags ``o_nzcv[3:0]``.
Thus, in the module ``alu_nzcv_de10_lite`` we wire the NZCV flags to the LEDs ``LEDR9`` (Negative), ``LEDR8`` (Zero), ``LEDR7`` (Carry) and ``LEDR6`` (oVerflow).

.. literalinclude:: data_alu/alu_nzcv_de10_lite.sv
   :linenos:
   :language: systemverilog
   :caption: Template for the module ``alu_nzcv_de10_lite``.
   :name: lst:alu_nzcv_de10_lite

.. admonition:: Tasks

   #. Implement the top-level module ``alu_nzcv_de10_lite``.
      Use the template in :numref:`lst:alu_nzcv_de10_lite`.

   #. Compile your finished ALU in Quartus Prime and program the FPGA of a DE10-Lite board.

   #. Make sure that the board shows the correct results for the inputs in :numref:`tab:ext_alu_fpga`.
      Provide a picture of the board for each of the inputs.
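
The wiring of the condition flags described in this section can be sketched as follows.
This is only an illustration under several assumptions and not the template of :numref:`lst:alu_nzcv_de10_lite`:
the module name ``alu_nzcv_de10_lite_sketch``, the port names ``SW`` and ``LEDR``, and the internal signal names are hypothetical.
The sketch further assumes that the switches and the control-signal LEDs are mapped as in :numref:`ch:alu_basic_fpga`, and that ``o_nzcv[3]`` holds the Negative flag and ``o_nzcv[0]`` the oVerflow flag, as suggested by the bit patterns in :numref:`tab:ext_alu_exs`.
The seven-segment displays are omitted.

.. code-block:: systemverilog

   // Wiring sketch only. Port and signal names are assumptions, and the
   // seven-segment displays HEX0 to HEX2 are not driven here.
   module alu_nzcv_de10_lite_sketch (
     input  logic [9:0] SW,
     output logic [9:0] LEDR
   );
     logic [3:0] l_result;  // would be decoded and shown on HEX2
     logic [3:0] l_nzcv;

     // 4-bit instance of the extended ALU (declared in alu_nzcv.sv)
     alu_nzcv #(.N(4)) m_alu (
       .i_a        ( SW[3:0]  ),
       .i_b        ( SW[7:4]  ),
       .i_alu_ctrl ( SW[9:8]  ),
       .o_result   ( l_result ),
       .o_nzcv     ( l_nzcv   )
     );

     // LEDR9 = Negative, LEDR8 = Zero, LEDR7 = Carry, LEDR6 = oVerflow
     assign LEDR[9:6] = l_nzcv;
     // unused LEDs stay off
     assign LEDR[5:2] = 4'b0000;
     // control signal on LEDR1 and LEDR0, assumed analogous to the basic design
     assign LEDR[1:0] = SW[9:8];
   endmodule

Assigning the four flag bits as one slice keeps the order of ``o_nzcv`` visible on the board: read from left to right, the LEDs ``LEDR9`` to ``LEDR6`` spell out N, Z, C and V.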