.. _ch:autograd:

Autograd
========

All major machine learning frameworks include automatic differentiation engines.
These engines are, for example, used heavily to realize backpropagation when training neural networks.
Respective details are available from the frameworks' documentation:
`PyTorch `__, `TensorFlow `__, `JAX `__.

In this lab we'll have a closer look at reverse-mode automatic differentiation by implementing our own (tiny) engine.
The goal is to get a high-level idea of how the respective pieces are connected and what information from the forward pass is required in the backward pass.
This understanding lays the groundwork for understanding PyTorch's automatic differentiation package torch.autograd and for defining our own custom extensions under torch.autograd's umbrella in :numref:`ch:custom_extensions`.

.. _ch:autograd_examples:

Examples
--------

This part of the lab decomposes functions of increasing complexity into simpler ones.
These simple functions are then used to formulate the forward pass and the backward pass.
For now, we perform the decompositions manually and recursively apply the chain rule in the backward pass.
In :numref:`ch:autograd_engine` we will then change our strategy and define more complex functions out of elementary building blocks which we can handle automatically.

Our first function :math:`f` is a very simple one:

.. math::

   f(x,y,z) = x ( y + z )

We identify two expressions which allow us to compute the forward pass: :math:`a=y+z` and :math:`b=xa`.
Therefore, a piece of Python code realizing the forward pass of function :math:`f` could read as follows:

.. code:: python

   def forward( i_x, i_y, i_z ):
     l_a = i_y + i_z
     l_b = i_x * l_a
     return l_b

The backward pass is a bit more challenging:
We are interested in computing the partial derivatives for the three inputs, i.e., :math:`\frac{\partial f}{\partial x}`, :math:`\frac{\partial f}{\partial y}` and :math:`\frac{\partial f}{\partial z}`.
Function :math:`f` is simple enough that we can see the solution without thinking too much about it:

.. math::

   \frac{\partial f}{\partial x} = y+z, \;
   \frac{\partial f}{\partial y} = x, \;
   \frac{\partial f}{\partial z} = x

Formulating the derivatives explicitly in this way becomes increasingly cumbersome for more complex functions.
In software we therefore follow a different idea and formulate the partial derivatives by means of the chain rule.
This only requires us to code the partial derivatives of the identified building blocks, which we then combine in a structured procedure for the composed function (enabling automation down the road).
In the example, we start with :math:`b` and observe the following:

.. math::

   \begin{aligned}
     \frac{\partial b}{\partial x} = a, \;
     \frac{\partial b}{\partial a} = x, \\
     \frac{\partial a}{\partial y} = 1, \;
     \frac{\partial a}{\partial z} = 1.
   \end{aligned}

Application of the chain rule for :math:`\frac{\partial b}{\partial y}` and :math:`\frac{\partial b}{\partial z}` then gives us:

.. math::

   \begin{aligned}
     \frac{\partial f}{\partial x} &= \frac{\partial b}{\partial x} = a = y + z, \\
     \frac{\partial f}{\partial y} &= \frac{\partial b}{\partial y} = \frac{\partial b}{\partial a} \frac{\partial a}{\partial y} = x \cdot 1 = x, \\
     \frac{\partial f}{\partial z} &= \frac{\partial b}{\partial z} = \frac{\partial b}{\partial a} \frac{\partial a}{\partial z} = x \cdot 1 = x.
   \end{aligned}
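Before we code the backward pass, we can sanity-check the hand-derived partial derivatives against a central finite-difference approximation of ``forward``.
The following snippet is only a sketch: the helper ``finite_difference``, its signature and the step size are our own choices for illustration, not part of the lab's code frame.

.. code:: python

   def finite_difference( i_func, i_args, i_idx, i_h = 1e-6 ):
     """Central finite-difference approximation of the partial derivative
        of i_func w.r.t. its i_idx-th argument at the point i_args."""
     l_args_plus  = list( i_args )
     l_args_minus = list( i_args )
     l_args_plus[i_idx]  += i_h
     l_args_minus[i_idx] -= i_h
     return ( i_func( *l_args_plus ) - i_func( *l_args_minus ) ) / ( 2 * i_h )

   # f(x,y,z) = x(y+z) at the point (2,3,4);
   # analytic results: df/dx = y+z = 7, df/dy = df/dz = x = 2
   print( finite_difference( forward, (2.0, 3.0, 4.0), 0 ) )  # ~7.0
   print( finite_difference( forward, (2.0, 3.0, 4.0), 1 ) )  # ~2.0
   print( finite_difference( forward, (2.0, 3.0, 4.0), 2 ) )  # ~2.0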
Once again, we can formulate the backward pass in a single piece of Python code:

.. code:: python

   def backward( i_x, i_y, i_z ):
     l_a = i_y + i_z

     l_dbda = i_x
     l_dbdx = l_a

     l_dady = 1
     l_dadz = 1

     l_dbdy = l_dbda * l_dady
     l_dbdz = l_dbda * l_dadz

     return l_dbdx, l_dbdy, l_dbdz

.. admonition:: Tasks

   #. Implement the two given methods ``forward`` and ``backward`` for :math:`f(x,y,z) = x ( y + z )`.
      Test your implementation in appropriate unit tests using the library `unittest `__.

   #. Implement the forward and backward pass for the following function:

      .. math::

         g(w_0,w_1,w_2,x_0,x_1) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}

      Follow the described procedure, i.e., harness the chain rule in the backward pass!
      Test your implementation in appropriate unit tests.

   #. Implement the forward and backward pass for the following function:

      .. math::

         h(x,y) = \frac{ \sin(xy) + \cos(x+y) }{ e^{x-y} }

      Follow the described procedure, i.e., harness the chain rule in the backward pass!
      Test your implementation in appropriate unit tests.

.. _ch:autograd_engine:

Engine
------

In :numref:`ch:autograd_examples` we formulated the forward pass for "complex" functions by splitting them into simpler expressions.
The chain rule then allowed us to formulate the respective backward pass.
This part of the lab defines a set of building blocks which can then be used to assemble complex composite functions.

Roughly following the `approach `__ taken in torch.autograd, we first define modules for our building blocks.
A single module has a ``forward`` and a ``backward`` method.
For example, to realize scalar additions we could define the module ``Add.py`` as follows:

.. literalinclude:: data_autograd/Add.py
   :language: python
   :linenos:

Similarly, for scalar multiplications we could define ``Mul.py``:

.. literalinclude:: data_autograd/Mul.py
   :language: python
   :linenos:

As done in PyTorch, we use a context object to pass data from the forward method to the backward method.
For example, in ``Mul.py`` (lines 9 and 10), the two input values ``i_a`` and ``i_b`` are temporarily stored as part of the context object; they are then read (line 20) and used in the backward method.

We continue the development of our tiny autograd engine by embedding the function classes in a ``Node`` class.
``Node`` keeps track of the elementary functions used in the forward pass and allows our users to conveniently trigger the backward pass for the composite function.
The first version of the class could read as follows:

.. literalinclude:: data_autograd/node.py
   :language: python
   :linenos:

We see that ``Node`` defines the two functions ``__add__`` and ``__mul__``.
This `emulates `__ a numeric object and allows our users to simply add two nodes using the binary ``+`` operator and multiply two nodes using the binary ``*`` operator.
However, in the background, much more than simply combining two numeric values is done:
First, in lines 48 and 62, the two methods store the required functions which have to be executed in the backward pass.
Second, in lines 49 and 63, the child nodes which require the gradient of the newly created ``l_node`` object in the backward pass are stored.
Last, in lines 50-52 and 64-66, the forward method is executed and the result is stored as part of the ``Node`` object.
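From a user's perspective, assembling a composite function thus boils down to ordinary arithmetic on ``Node`` objects.
The following sketch illustrates the bookkeeping for :math:`f(x,y,z) = x(y+z)`; the import path matches ``eml.autograd.node.Node`` from the code frame, while the assumption that the constructor simply takes the node's value is ours:

.. code:: python

   from eml.autograd.node import Node

   # leaf nodes holding the inputs x=2, y=3, z=4
   # (we assume the constructor takes the node's value)
   l_x = Node( 2.0 )
   l_y = Node( 3.0 )
   l_z = Node( 4.0 )

   # forward pass of f(x,y,z) = x(y+z): besides computing the values,
   # every operation records the function to be called in the backward
   # pass and the children of the newly created node
   l_a = l_y + l_z   # __add__
   l_b = l_x * l_a   # __mul__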
The stored information, i.e., the children and the respective function for the backward pass, is then used in the method ``backward`` defined in lines 26-33.
If ``backward`` is called for a ``Node`` object, the input gradient is added to the node's member variable ``m_grad`` in line 28.
Next, the (previously stored) gradient function ``m_grad_fn`` is executed in lines 29-30.
The output of this function is then passed on recursively to the children, for which the backward method is called in lines 32-33.

In summary, calling ``backward`` on a node object traverses backward through the computation graph which was assembled in the forward pass.
For each node, the respective derivatives are computed and passed on to the node's children.
The procedure completes once all nodes that the result depends on have been visited.
Leaf nodes, which were used to initiate the forward pass, do not have any children.
For these, the dummy function ``function.Nop.backward`` is called, which is set in the constructor (line 11).

Note that we are adding to a node's member variable ``m_grad`` in line 28.
This means that we could, e.g., initiate the backward pass multiple times and accumulate the gradients internally.
To reset the gradients, one has to call the function ``zero_grad`` defined in lines 36-39.

.. admonition:: Tasks

   #. Make yourself familiar with the :download:`code frame `.
      Add unit tests for the module ``eml/autograd/functions/Mul.py``.
      Implement a unit test in the file ``eml/autograd/functions/test_node.py`` which realizes the function :math:`f(x,y,z)` of :numref:`ch:autograd_examples`.
      A possible shape for such a test is sketched after this list.

   #. Extend the code by adding the modules ``Reciprocal.py``, i.e., :math:`\frac{1}{x}` for a scalar :math:`x`, and ``Exp.py``, i.e., :math:`e^x`, to ``eml/autograd/functions``.
      Define the respective methods ``__truediv__`` and ``exp`` in ``eml.autograd.node.Node``.
      Write appropriate unit tests!
      Test your extended code by realizing the function :math:`g(w_0,w_1,w_2,x_0,x_1)` of :numref:`ch:autograd_examples`.

   #. Proceed similarly for :math:`\sin` and :math:`\cos`.
      Test your implementations and realize the function :math:`h(x,y)` of :numref:`ch:autograd_examples`.
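For the first task, a unit test which realizes :math:`f(x,y,z) = x(y+z)` with the engine could be shaped roughly as follows.
This is only a sketch: it assumes that the ``Node`` constructor takes the node's value and that ``backward`` is seeded with the gradient of the output node (here :math:`1`); the expected gradients at the point :math:`(2,3,4)` follow from :numref:`ch:autograd_examples`.

.. code:: python

   import unittest
   from eml.autograd.node import Node

   class TestNode( unittest.TestCase ):
     def test_f( self ):
       # leaf nodes for the inputs x=2, y=3, z=4
       # (we assume the constructor takes the node's value)
       l_x = Node( 2.0 )
       l_y = Node( 3.0 )
       l_z = Node( 4.0 )

       # forward pass: f(x,y,z) = x(y+z)
       l_f = l_x * ( l_y + l_z )

       # backward pass, seeded with df/df = 1
       l_f.backward( 1.0 )

       # df/dx = y+z = 7, df/dy = df/dz = x = 2
       self.assertAlmostEqual( l_x.m_grad, 7.0 )
       self.assertAlmostEqual( l_y.m_grad, 2.0 )
       self.assertAlmostEqual( l_z.m_grad, 2.0 )

   if __name__ == '__main__':
     unittest.main()

Keep in mind that gradients accumulate in ``m_grad``; when running several backward passes on the same nodes, reset the gradients with ``zero_grad`` in between.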