.. _ch:single_cycle:

Single Cycle Processor
======================
In theory we have all parts together and could start discussing the design of our own processors using SystemVerilog.
Most importantly, we developed an NZCV-extended ALU in :numref:`ch:alu` which could build the centerpiece of our processor's datapath.
Our FPGA skills would then prove extremely helpful to conduct first tests and take our processor designs for a spin.
If only there were some more time... 😅

This chapter is split into two parts.
First, in :numref:`ch:single_cycle_machine_code` we'll study machine code and the underlying instruction encoding used in the Arm architecture.
An Arm processor is able to parse this machine code and behaves accordingly.
Second, in :numref:`ch:single_cycle_isa_sim` we'll have a look at a very simple simulated Arm processor.
This allows us to get an overview of how instructions are fetched, decoded and executed by a CPU.

.. _ch:single_cycle_machine_code:

Machine Code
------------
In :numref:`ch:aarch64` we wrote simple functions in assembly language.
Until this point the assembler took care of translating our assembly code to machine code.
The machine code is how our executable programs are actually stored in memory.
In this exercise we'll manually translate a program from assembly language to machine code.
This is nothing one would typically do in practice, but it is crucial for understanding the encoding of A64 instructions.

AArch64 has a fixed instruction size of 32 bits.
Thus, a function with 10 instructions fits in 10 :math:`\times` 32 bits = 320 bits = 40 bytes.
Logically an AArch64 processor reads bits 0-31, decodes the instruction and executes a corresponding operation.
Next, the processor reads and decodes bits 32-63 and executes the respective operation.
After that bits 64-95 are processed and so on.
Branches deviate from this sequential order.
Here, potentially based on the result of a previous instruction, the processor might not simply continue with the next 32 bits but jump to another position in the program.
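
The address arithmetic behind this sequential order can be sketched in a few lines of C.
The helper names ``code_size_bytes`` and ``instruction_offset`` are hypothetical and only serve as an illustration:

.. code-block:: c

   #include <stdint.h>

   /* Every A64 instruction occupies 4 bytes (32 bits). */
   #define A64_INSTRUCTION_BYTES 4u

   /* Size in bytes of a code sequence with the given number of instructions. */
   uint64_t code_size_bytes(uint64_t num_instructions) {
     return num_instructions * A64_INSTRUCTION_BYTES;
   }

   /* Byte offset of the instruction with the given index (counting from 0),
    * relative to the first instruction of the sequence. */
   uint64_t instruction_offset(uint64_t index) {
     return index * A64_INSTRUCTION_BYTES;
   }

For example, ``code_size_bytes(10)`` evaluates to 40, and ``instruction_offset(2)`` evaluates to 8, i.e., the third instruction covers bits 64-95.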

.. admonition:: Optional Note

   Certain recent extensions of the Arm architecture violate the idea that every operation is encoded in 32 bits.
   Examples of such more "complex" operations are those that use the prefix instructions of the Scalable Vector Extension (SVE).
   `MOVPRFX (unpredicated) <https://developer.arm.com/documentation/ddi0602/2021-09/SVE-Instructions/MOVPRFX--unpredicated---Move-prefix--unpredicated-->`__, for example, allows us to perform a four-operand Fused-Multiply-Add (FMA4) operation.
   Effectively one would use two instructions, i.e., 2 :math:`\times` 32 bits = 64 bits, to realize FMA4.
   Some details on this are available from Fujitsu's 2018 Hot Chips (`HC30 <https://hc30.hotchips.org>`__) `presentation <https://old.hotchips.org/hc30/2conf/2.13_Fujitsu_HC30.Fujitsu.Yoshida.rev1.2.pdf>`__ on A64FX.

:numref:`listing:machine_code_driver`, :numref:`listing:machine_code_c` and :numref:`listing:machine_code_asm` follow our usual structure: a C++ driver, a C reference implementation and the corresponding implementation in assembly language.
This time, however, the assembly code is already provided in :numref:`listing:machine_code_asm`'s function ``machine_code_asm_0``.
Our goal is to manually translate every single human-readable instruction to machine code.
This allows us to re-implement the function in ``machine_code_asm_1`` by using only machine code.

We'll do this by looking up the instructions of ``machine_code_asm_0`` in the `ISA <https://developer.arm.com/documentation/ddi0602/2021-12/>`__ and writing down the respective machine code in :numref:`tab:machine_code_instructions`.
Good news, you are in luck!
:numref:`tab:machine_code_encodings` already contains most of the relevant links to the ISA and a short form of the respective instruction encodings.
Only SUBS (immediate) is missing.


.. literalinclude:: data_single_cycle/driver_machine_code.cpp
    :language: cpp
    :caption: C++ driver for the machine code example.
    :name: listing:machine_code_driver

.. literalinclude:: data_single_cycle/machine_code_c.c
    :language: c
    :caption: C kernel which provides the high-level reference of our machine code example.
    :name: listing:machine_code_c

.. literalinclude:: data_single_cycle/machine_code_asm.s
    :language: asm
    :caption: Assembly kernel ``machine_code_asm_0`` which serves as the reference implementation.
    :name: listing:machine_code_asm

.. _tab:machine_code_encodings:

.. table:: Instruction encodings for the function ``machine_code_asm_0``.
   :widths: 1 1

   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
   | Instruction                                                                                                                                                               | Instruction Encoding                        |
   +===========================================================================================================================================================================+=============================================+
   | `EOR (shifted register) <https://developer.arm.com/documentation/ddi0602/2021-09/Base-Instructions/EOR--shifted-register---Bitwise-Exclusive-OR--shifted-register-->`_    | ``f100 1010 ss0m mmmm iiii iinn nnnd dddd`` |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
   | `ADDS (immediate) <https://developer.arm.com/documentation/ddi0602/2021-09/Base-Instructions/ADDS--immediate---Add--immediate---setting-flags->`_                         | ``f011 0001 0hii iiii iiii iinn nnnd dddd`` |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
   | SUBS (immediate)                                                                                                                                                          | ``???? ???? ???? ???? ???? ???? ???? ????`` |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
   | `B.cond <https://developer.arm.com/documentation/ddi0602/2021-09/Base-Instructions/B-cond--Branch-conditionally->`_                                                       | ``0101 0100 iiii iiii iiii iiii iii0 cccc`` |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
   | `ADDS (shifted register) <https://developer.arm.com/documentation/ddi0602/2021-09/Base-Instructions/ADDS--shifted-register---Add--shifted-register---setting-flags->`_    | ``f010 1011 ss0m mmmm iiii iinn nnnd dddd`` |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
   | `RET <https://developer.arm.com/documentation/ddi0602/2021-09/Base-Instructions/RET--Return-from-subroutine->`_                                                           | ``1101 0110 0101 1111 0000 00nn nnn0 0000`` |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
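
To illustrate how the short forms in the table above are read, the following C sketch packs the fields of EOR (shifted register) into a 32-bit word.
The function name ``encode_eor_shifted_x`` is hypothetical; the sketch assumes the 64-bit variant, i.e., ``f`` is set to 1:

.. code-block:: c

   #include <stdint.h>

   /* Packs the fields of the 64-bit EOR (shifted register) encoding
    *   f100 1010 ss0m mmmm iiii iinn nnnd dddd
    * with f = 1. shift (ss) and imm6 (iiiiii) describe the optional
    * shift applied to the second source register. */
   uint32_t encode_eor_shifted_x(uint32_t rd, uint32_t rn, uint32_t rm,
                                 uint32_t shift, uint32_t imm6) {
     uint32_t word = 0xca000000u;     /* fixed bits 1100 1010 (f = 1) */
     word |= (shift & 0x3u)  << 22;   /* ss     in bits 23-22 */
     word |= (rm    & 0x1fu) << 16;   /* mmmmm  in bits 20-16 */
     word |= (imm6  & 0x3fu) << 10;   /* iiiiii in bits 15-10 */
     word |= (rn    & 0x1fu) << 5;    /* nnnnn  in bits 9-5 */
     word |= (rd    & 0x1fu);         /* ddddd  in bits 4-0 */
     return word;
   }

With this sketch, ``encode_eor_shifted_x(1, 1, 1, 0, 0)`` corresponds to ``eor x1, x1, x1`` and evaluates to ``0xca010021``.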

.. _tab:machine_code_instructions:

.. table:: Instructions written in assembly language and machine code for the function ``machine_code_asm_0``.
   :widths: 1 2 1

   +---------------------+---------------------------------------------+--------------------+
   | Assembly Language   | Machine Code (binary)                       | Machine Code (hex) |
   +=====================+=============================================+====================+
   | ``eor x0, x0, x0``  | ``1100 1010 0000 0000 0000 0000 0000 0000`` | ``ca000000``       |
   +---------------------+---------------------------------------------+--------------------+
   | ``eor x1, x1, x1``  | ``1100 1010 0000 0001 0000 0000 0010 0001`` | ``ca010021``       |
   +---------------------+---------------------------------------------+--------------------+
   | ``eor x2, x2, x2``  |                                             |                    |
   +---------------------+---------------------------------------------+--------------------+
   | ``adds x0, x0, #5`` |                                             |                    |
   +---------------------+---------------------------------------------+--------------------+
   | ``adds x1, x1, #3`` |                                             |                    |
   +---------------------+---------------------------------------------+--------------------+
   | ``adds x2, x2, #7`` |                                             |                    |
   +---------------------+---------------------------------------------+--------------------+
   | ``subs x0, x0, #1`` |                                             |                    |
   +---------------------+---------------------------------------------+--------------------+
   | ``b.ne my_loop``    |                                             |                    |
   +---------------------+---------------------------------------------+--------------------+
   | ``adds x0, x1, x2`` |                                             |                    |
   +---------------------+---------------------------------------------+--------------------+
   | ``ret``             | ``1101 0110 0101 1111 0000 0011 1100 0000`` | ``d65f03c0``       |
   +---------------------+---------------------------------------------+--------------------+

.. admonition:: Tasks

  #. Complete :numref:`tab:machine_code_encodings` by looking up the instruction encoding of SUBS (immediate).
     Use a short format similar to the already provided one for ADDS (immediate).

  #. Complete :numref:`tab:machine_code_instructions` by providing the binary and hexadecimal machine code of all instructions.

     .. hint::

        * The immediate imm19 of B.cond is in two's complement representation.
          This means that ``111 1111 1111 1111 1111`` represents the decimal value -1.

        * A description of the condition codes is available from the `Arm Architecture Reference Manual Armv8, for A-profile architecture <https://developer.arm.com/documentation/ddi0487/gb>`_ in Chapter C1.2.4 / Table C1-1.


  #. Implement the function ``machine_code_asm_1`` in :numref:`listing:machine_code_asm` by only using your hex version of the machine code.

     .. hint::

        The directive ``.inst`` allows you to provide a numeric opcode in your assembly code.
        For example, for the first instruction ``eor x0, x0, x0`` one may alternatively write ``.inst 0xca000000``.
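
The two's complement interpretation of imm19 mentioned in the hint of the second task can be sketched in C as follows.
The function name ``sign_extend_imm19`` is hypothetical and not part of any toolchain:

.. code-block:: c

   #include <stdint.h>

   /* Interprets the lower 19 bits of raw as a two's complement number. */
   int32_t sign_extend_imm19(uint32_t raw) {
     uint32_t value = raw & 0x7ffffu;    /* keep bits 18-0 */
     if (value & 0x40000u) {             /* bit 18 is the sign bit */
       return (int32_t)value - 0x80000;  /* subtract 2^19 for negative values */
     }
     return (int32_t)value;
   }

For the bit pattern ``111 1111 1111 1111 1111``, i.e., ``0x7ffff``, the function returns -1.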

.. hint::

   You may assemble a single instruction by using the tool llvm-mc.
   For example, you could use the following command to assemble the instruction ``adds x0, x0, #5``:

   .. code-block::

      echo "adds x0, x0, #5" | llvm-mc -triple=aarch64 --show-encoding

   Conversely, you may also use llvm-mc to disassemble a single instruction.
   For example, to disassemble ``0xb1001400`` you could use:

   .. code-block::

      echo "0x00 0x14 0x00 0xb1" | llvm-mc -triple=aarch64 -disassemble --show-encoding
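
Note that the disassembly command lists the bytes of ``0xb1001400`` in memory order, i.e., least significant byte first.
The following C sketch, which uses hypothetical helper names, rebuilds the word from these bytes and extracts the fields of ADDS (immediate) as given in :numref:`tab:machine_code_encodings`:

.. code-block:: c

   #include <stdint.h>

   /* Assembles a 32-bit instruction word from four bytes given in memory
    * order (little-endian: least significant byte first). */
   uint32_t word_from_le_bytes(const uint8_t bytes[4]) {
     return (uint32_t)bytes[0]
          | ((uint32_t)bytes[1] << 8)
          | ((uint32_t)bytes[2] << 16)
          | ((uint32_t)bytes[3] << 24);
   }

   /* Fields of ADDS (immediate): f011 0001 0hii iiii iiii iinn nnnd dddd */
   uint32_t adds_imm_sf(uint32_t word)    { return (word >> 31) & 0x1u; }
   uint32_t adds_imm_sh(uint32_t word)    { return (word >> 22) & 0x1u; }
   uint32_t adds_imm_imm12(uint32_t word) { return (word >> 10) & 0xfffu; }
   uint32_t adds_imm_rn(uint32_t word)    { return (word >> 5) & 0x1fu; }
   uint32_t adds_imm_rd(uint32_t word)    { return word & 0x1fu; }

For the bytes ``0x00 0x14 0x00 0xb1`` the sketch yields the word ``0xb1001400`` with ``f`` = 1, ``h`` = 0, an immediate of 5 and register index 0 for both registers, which matches ``adds x0, x0, #5``.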

.. _ch:single_cycle_isa_sim:

ISA Simulator
-------------
This subsection studies how our programs are interpreted and executed by a (very) simple processor.
For this we'll use the `Graphical Micro-Architecture Simulator <https://github.com/arm-university/Graphical-Micro-Architecture-Simulator>`__.
The simulator supports a subset of the Armv8-A ISA called the `LEGv8 instructions <https://booksite.elsevier.com/9780128017333/content/Green%20Card.pdf>`__.
All instructions used in :numref:`ch:single_cycle_isa_sim` are part of LEGv8.

The real value of the simulator lies in the visualization of the implemented processors' register states, control and data paths.
In this subsection we'll use the single-cycle execution mode of the simulator.
This means that within one cycle the processor fetches an instruction from instruction memory, decodes it and executes it entirely.

.. admonition:: Optional Note

   Modern processors work on many instructions in parallel and often require more than a single cycle to execute a single instruction.
   One heavily used mechanism for Instruction-Level Parallelism (ILP) is called pipelining.
   You may switch to a pipelined variant of the simulator by selecting it as "Execution Mode".

.. figure:: /chapters/data_single_cycle/legv8_simulator_small.jpg
   :name: fig:legv8_simulator
   :width: 100%

   Arm's Graphical Micro-Architecture Simulator (v1.0.0) in action.
   Shown are the simulated datapath and control for the fourth instruction ``adds x0, x0, #5`` of the code (excluding ``ret``) in :numref:`listing:machine_code_asm`.
   Active parts of the datapath are highlighted in red.
   Only the left side of the NZCV condition flags is highlighted in red to indicate that these are written.

:numref:`fig:legv8_simulator` shows the active parts of the datapath when executing the instruction ``adds x0, x0, #5`` in red.
Further, the respective control bits are given.
We see that the current value of the Program Counter (PC) is used as an input to a red adder which uses the value 4 as second input.
The output of the adder is then passed on to a multiplexer which selects the output of this adder since the multiplexer's control bit is 0.
The multiplexer's output is the program counter's value in the next cycle.
In summary, the program counter is simply incremented by 4 when executing ``adds x0, x0, #5`` as we would expect based on the discussion in :numref:`ch:single_cycle_machine_code`.
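
The behavior of the adder and the multiplexer described above can be summarized in a small C sketch.
The function ``next_pc`` and its parameters are hypothetical; in particular, the second multiplexer input is simply assumed to carry a branch target and is not derived from the figure:

.. code-block:: c

   #include <stdint.h>

   /* Two-input multiplexer in front of the program counter.
    * select == 0: the output of the adder (PC + 4) is chosen.
    * select != 0: assumption of this sketch, a branch target is chosen. */
   uint64_t next_pc(uint64_t pc, uint64_t branch_target, unsigned select) {
     uint64_t sequential = pc + 4;  /* adder with the constant 4 as second input */
     return select ? branch_target : sequential;
   }

With a control bit of 0, as for ``adds x0, x0, #5``, a program counter of 12 becomes 16 in the next cycle: ``next_pc(12, 0, 0)`` evaluates to 16.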

For the remainder of this subsection we'll study how the branches in the execution of the code in :numref:`listing:machine_code_asm` manipulate the program counter.

.. admonition:: Tasks

   #. Use the Graphical Micro-Architecture Simulator to assemble the code in :numref:`listing:machine_code_asm`.
      Exclude the ``ret`` instruction.
      For the instructions with an immediate operand, use ``addis`` instead of ``adds`` and ``subis`` instead of ``subs`` when entering the assembly code.
      Run the code until you reach the instruction ``adds x0, x1, x2``.
      Provide screenshots with ascending names, i.e., ``bne_0.png``, ``bne_1.png`` and so on, which show the visualized processor after every executed conditional branch.

   #. Briefly describe the changes of the program counter after every execution of a branch instruction.
      Briefly explain the reasons for the observed changes.