Tensors
=======

Tensors are *the* basic data structure in PyTorch.
This lab takes a look at their implementation.
First, we'll use the Python interface, which most PyTorch users are familiar with.
Next, we'll study PyTorch's ATen, short for "A Tensor Library", directly:

   ATen is fundamentally a tensor library, on top of which almost all other Python and C++ interfaces in PyTorch are built.
   It provides a core Tensor class, on which many hundreds of operations are defined.
   Most of these operations have both CPU and GPU implementations, to which the Tensor class will dynamically dispatch based on its type.

   -- https://pytorch.org/cppdocs/#aten

Python
------

Tensors in PyTorch will accompany us whenever we use the framework.
A deeper understanding of their implementation is especially required when touching advanced topics.
For example, tensors are the go-to way for passing data between the Python frontend and PyTorch's C++ API or when sharing data through one of PyTorch's distributed memory backends.

.. note::
   Conceptually, tensors in PyTorch are very similar to `ndarrays in NumPy `_ or `tensors in TensorFlow `_.
   Differences are largely under the hood, e.g., PyTorch's tensors are backed by the library `ATen `_.

PyTorch's documentation is a good way to get started and to find information outside of our class.
There is a `short tutorial `_ on tensors which covers the very basics.
The documentation of `torch.Tensor `__ is more revealing but less convenient to read.
One excellent presentation on the inner workings of PyTorch, including tensors, is shared by Edward Z. Yang in his `blog `_.
Be aware that the blog post is from 2019 and some details might have changed since then.

Creation
^^^^^^^^

We may create tensors by calling tensor-generating functions, e.g., `torch.rand `__ or `torch.ones `__.
Another option is to go through convenience helpers which allow us to bridge to `NumPy `_ or standard Python lists.

Let's create some tensors and print them.
We'll use a rank-3 tensor with shape (4, 2, 3) as our running example.
For example, the tensor :math:`T = [T_0, T_1, T_2, T_3]^T` might contain the following data:

.. math::
   :label: eq:tensor_init

   T_0 = \begin{bmatrix} \hphantom{1}0 & \hphantom{1}1 & \hphantom{1}2 \\ \hphantom{1}3 & \hphantom{1}4 & \hphantom{1}5 \end{bmatrix},
   T_1 = \begin{bmatrix} \hphantom{1}6 & \hphantom{1}7 & \hphantom{1}8 \\ \hphantom{1}9 & 10 & 11 \end{bmatrix},
   T_2 = \begin{bmatrix} 12 & 13 & 14 \\ 15 & 16 & 17 \end{bmatrix},
   T_3 = \begin{bmatrix} 18 & 19 & 20 \\ 21 & 22 & 23 \end{bmatrix}.

.. admonition:: Tasks

   #. Try different tensor-generating functions and illustrate their behavior.
      Include `torch.zeros `_, `torch.ones `_, `torch.rand `_ and `torch.ones_like `_ in your tests.

   #. Use a "list of lists of lists" data structure in Python to allocate memory for tensor :math:`T` with shape (4, 2, 3) and initialize it to the values in Eq. :eq:`eq:tensor_init`.
      Use `torch.tensor `_ to convert your Python-native data structure to a PyTorch tensor and print it.

   #. Once again start with your Python-native representation of :math:`T`.
      This time use `numpy.array `_ to convert it to a NumPy array first.
      Then create a PyTorch tensor from the NumPy array and print both.
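As a starting point for the last two tasks, the following sketch shows one possible way to build :math:`T` from Python-native data, either directly or via NumPy.
The variable names are only illustrative and not prescribed by the tasks.

.. code-block:: python

   import torch
   import numpy as np

   # rank-3 tensor T with shape (4, 2, 3) as a list of lists of lists
   l_list = [ [ [  0,  1,  2 ],
               [  3,  4,  5 ] ],
             [ [  6,  7,  8 ],
               [  9, 10, 11 ] ],
             [ [ 12, 13, 14 ],
               [ 15, 16, 17 ] ],
             [ [ 18, 19, 20 ],
               [ 21, 22, 23 ] ] ]

   # tensor-generating functions
   l_tensor_rand = torch.rand( 4, 2, 3 )
   l_tensor_ones = torch.ones( 4, 2, 3 )

   # Python-native data -> PyTorch tensor
   l_tensor_from_list = torch.tensor( l_list )

   # Python-native data -> NumPy array -> PyTorch tensor
   l_array = np.array( l_list )
   l_tensor_from_numpy = torch.from_numpy( l_array )

   print( l_tensor_from_list )
   print( l_array )
   print( l_tensor_from_numpy )

Note that ``torch.from_numpy`` shares its memory with the NumPy array, whereas ``torch.tensor`` copies the data.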
Operations
^^^^^^^^^^

We successfully created some tensors and printed them.
Luckily this wasn't too hard.
Now let's do something with our tensors.
This part studies basic operations on tensors.
Of course, later on, we'll define and apply some heavy operations as well.
At the end of the day, the application of a neural net is nothing but a series of chained, more basic tensor operations.

For now we'll use two simple rank-2 tensors :math:`P` and :math:`Q` in our examples:

.. math::

   P = \begin{bmatrix} \hphantom{1}0 & \hphantom{1}1 & \hphantom{1}2 \\ \hphantom{1}3 & \hphantom{1}4 & \hphantom{1}5 \end{bmatrix},
   Q = \begin{bmatrix} \hphantom{1}6 & \hphantom{1}7 & \hphantom{1}8 \\ \hphantom{1}9 & 10 & 11 \end{bmatrix}.

.. admonition:: Tasks

   #. Generate the rank-2 tensors :math:`P` and :math:`Q` in PyTorch.
      Illustrate the behavior of element-wise operations on :math:`P` and :math:`Q`.
      Try at least `torch.add `__ and `torch.mul `__.
      Show that you may also perform element-wise addition or multiplication through the `overloaded `_ binary operators ``+`` and ``*``.

   #. Compute the matrix-matrix product of :math:`P` and :math:`Q^T` by using `torch.matmul `__.
      Show that you may achieve the same through the overloaded ``@`` operator.

   #. Illustrate the behavior of reduction operations, e.g., `torch.sum `__ or `torch.max `__.

   #. Given two tensors ``l_tensor_0`` and ``l_tensor_1``, explain and illustrate the difference between the following two code snippets:

      .. code-block:: python
         :linenos:

         l_tmp = l_tensor_0
         l_tmp[:] = 0

      .. code-block:: python
         :linenos:

         l_tmp = l_tensor_1.clone().detach()
         l_tmp[:] = 0

Storage
^^^^^^^

Internally, a PyTorch tensor consists of the raw data stored in memory and metadata describing the data.
For example, assume that you have the following matrix :math:`A \in \mathbb{R}^{2\times3}`:

.. math::

   A = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}.

Let's say that you further decided to store your data using 32-bit floating point numbers in the memory attached to your CPU:

* How would you store the data internally?
* Is a row-major format, e.g., :math:`[0, 1, 2, 3, 4, 5]`, better than a column-major format, e.g., :math:`[0, 3, 1, 4, 2, 5]`?
* What about :math:`[0, 1, 2, *, *, 3, 4, 5]` or :math:`[0, 3, *, *, *, 1, 4, *, *, *, 2, 5]`? Is it possible to have "holes"?

Simply put, all of these options are possible, and there might be good reasons to choose one internal format over another.
PyTorch uses such internal formats but hides the underlying details when one uses the frontend.
In this part we'll have a look at some of those details.
A detailed understanding becomes essential when we pass our tensors to C/C++ and operate on the raw data.

.. admonition:: Tasks

   #. Create a PyTorch tensor from the rank-3 tensor :math:`T` given in Eq. :eq:`eq:tensor_init`.
      Print the tensor's `size `__ and `stride `__.
      Print the tensor's `attributes `__, i.e., its dtype, layout and device.

   #. Create a new tensor ``l_tensor_float`` from :math:`T` but use ``torch.float32`` as its ``dtype``.

   #. Fix the second dimension of ``l_tensor_float``, i.e., assign ``l_tensor_fixed`` to:

      .. code-block:: python

         l_tensor_fixed = l_tensor_float[:,0,:]

      Which metadata of the tensor (size, stride, dtype, layout, device) changed? Which stayed the same?

   #. Create an even more complex `view `__ of ``l_tensor_float``:

      .. code-block:: python

         l_tensor_complex_view = l_tensor_float[::2,1,:]

      Explain the changes in size and stride.

   #. Apply the `contiguous `__ function to ``l_tensor_complex_view``.
      Explain the changes in the stride.

   #. Illustrate the internal storage of a tensor by printing corresponding internal data directly.
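As one possible starting point for the tasks above, here is a minimal sketch that inspects a tensor's metadata and reads raw values from memory via ``data_ptr`` and ``ctypes`` (see also the hint below).
The printed strides assume PyTorch's default contiguous, row-major layout on the CPU; the variable names mirror the tasks but are otherwise arbitrary.

.. code-block:: python

   import torch
   import ctypes

   # rank-3 tensor T of the running example: values 0..23 in shape (4, 2, 3)
   l_tensor = torch.arange( 24 ).reshape( 4, 2, 3 )
   print( l_tensor.size() )    # torch.Size([4, 2, 3])
   print( l_tensor.stride() )  # (6, 3, 1)
   print( l_tensor.dtype, l_tensor.layout, l_tensor.device )

   # FP32 copy and a view which fixes the second dimension
   l_tensor_float = l_tensor.to( torch.float32 )
   l_tensor_fixed = l_tensor_float[:,0,:]
   print( l_tensor_fixed.size(), l_tensor_fixed.stride() )  # torch.Size([4, 3]) (6, 1)

   # strided view and its contiguous copy
   l_tensor_complex_view = l_tensor_float[::2,1,:]
   print( l_tensor_complex_view.stride() )               # (12, 1)
   print( l_tensor_complex_view.contiguous().stride() )  # (3, 1)

   # read the first two FP32 values (4 bytes each) directly from memory
   l_data_ptr = l_tensor_float.data_ptr()
   for l_id in range( 2 ):
       print( ctypes.c_float.from_address( l_data_ptr + 4 * l_id ).value )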
.. hint::
   The function `data_ptr `_ returns the memory address of the internal data.
   `ctypes `_ allows you to directly load data from memory.
   For example, the following code loads four bytes from address ``l_data_ptr``, interprets the result as a 32-bit floating point value and writes the data to ``l_data_raw``:

   .. code-block:: python

      l_data_raw = ctypes.c_float.from_address( l_data_ptr )

ATen
----

Let's leave PyTorch's Python frontend for a moment and have a look at ATen.
This time a :download:`code frame ` is provided to kickstart your developments.
Not much is done on the programming side yet.
However, the value of the code frame lies in the included Makefile.
The Makefile automatically discovers the PyTorch headers and libraries which are required to build ATen-based C++ code.
For this discovery to work, you have to issue ``make`` from a conda environment with a PyTorch installation.

Storage
^^^^^^^

.. literalinclude:: data_tensors/aten.cpp
   :linenos:
   :language: cpp
   :caption: File ``src/aten.cpp`` of the provided code frame.
   :name: lst:tensors_aten

The code frame contains the single C++ file ``src/aten.cpp`` given in :numref:`lst:tensors_aten`.
Inside of ``main``, the one-dimensional array ``l_data`` is allocated on the stack and assigned some values.
In this task, we'll initially use ``l_data`` as the memory for our tensors' data.

.. admonition:: Tasks

   #. Listen to the `Torch vs ATen APIs `__ episode of the PyTorch Developer Podcast.

   #. Create the rank-3 tensor :math:`T = [T_0, T_1, T_2, T_3]^T` (see Eq. :eq:`eq:tensor_init`) using FP32 elements.
      Use the array ``l_data`` as the storage and ``at::from_blob()`` to create the tensor.
      Use ``l_tensor`` as the name of your tensor.

   #. Print the tensor itself and respective metadata (data_ptr, dtype, sizes, strides, storage_offset, device, layout, is_contiguous).

   #. Demonstrate that you may manipulate the tensor's data by either using the raw C pointer ``l_data`` or by going through ATen.

   #. Create a new view ``l_view`` which assumes the index 1 for the second dimension of ``l_tensor``, i.e., in Python you could write ``l_tensor[:,1,:]``.
      Illustrate that the two tensors use the same memory for their data.

   #. Create a new tensor ``l_cont`` by calling ``l_view.contiguous()``.
      How is the result ``l_cont`` different from ``l_view``?
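The sketch below indicates how the tensor of the tasks above could be created with ``at::from_blob`` and how its metadata can be printed.
To stay self-contained it re-declares and initializes ``l_data``; in the provided code frame the array already exists inside ``main`` and its values may differ.

.. code-block:: cpp

   #include <ATen/ATen.h>
   #include <iostream>

   int main() {
     // raw data backing the tensor (assumed values; see the code frame)
     float l_data[24];
     for( int64_t l_id = 0; l_id < 24; l_id++ ) {
       l_data[l_id] = l_id;
     }

     // wrap the existing memory in a (4, 2, 3) FP32 tensor without copying
     at::Tensor l_tensor = at::from_blob( l_data,
                                          {4, 2, 3},
                                          at::ScalarType::Float );

     std::cout << l_tensor                                              << std::endl;
     std::cout << "data_ptr:       " << l_tensor.data_ptr()             << std::endl;
     std::cout << "dtype:          " << l_tensor.dtype()                << std::endl;
     std::cout << "sizes:          " << l_tensor.sizes()                << std::endl;
     std::cout << "strides:        " << l_tensor.strides()              << std::endl;
     std::cout << "storage_offset: " << l_tensor.storage_offset()       << std::endl;
     std::cout << "device:         " << l_tensor.device()               << std::endl;
     std::cout << "layout:         " << l_tensor.layout()               << std::endl;
     std::cout << "is_contiguous:  " << l_tensor.is_contiguous()        << std::endl;

     // changing the raw array is visible through the tensor: memory is shared
     l_data[0] = 100.0f;
     std::cout << l_tensor[0][0][0] << std::endl;

     // view which fixes index 1 of the second dimension; shares the same memory
     at::Tensor l_view = l_tensor.select( 1, 1 );
     std::cout << l_view.data_ptr() << std::endl;

     // contiguous copy: new memory and default strides
     at::Tensor l_cont = l_view.contiguous();
     std::cout << l_cont.data_ptr() << " " << l_cont.strides() << std::endl;

     return 0;
   }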
.. hint::
   PyTorch's documentation might miss some of ATen's operators and classes.
   Therefore, our homepage `hosts <../_static/doxygen_html>`__ documentation generated from ATen's source code.
   The following list provides links to the ATen functions which are required to solve the tasks of this section:

   * `at::from_blob() <../_static/doxygen_html/namespaceat.html#aeabf8fc52709f5f3507fbaf8d69d721d>`__
   * `at::TensorBase::dtype <../_static/doxygen_html/classat_1_1_tensor_base.html#add4f01eb93ef2d0e39e5b37296eb7119>`__
   * `at::TensorBase::sizes <../_static/doxygen_html/classat_1_1_tensor_base.html#af02cd1581f1fda84e6db2d5f2764383d>`__
   * `at::TensorBase::strides <../_static/doxygen_html/classat_1_1_tensor_base.html#a024f81bb2d593442c39e1e91afdeea31>`__
   * `at::TensorBase::storage_offset <../_static/doxygen_html/classat_1_1_tensor_base.html#a23f17eb73f8426cd1255a895d01e8415>`__
   * `at::TensorBase::device <../_static/doxygen_html/classat_1_1_tensor_base.html#ac7f167a30733c85ac699c23d99774b3b>`__
   * `at::TensorBase::layout <../_static/doxygen_html/classat_1_1_tensor_base.html#aaf9ce7d6957ca729c54f1fb6d375b557>`__
   * `at::TensorBase::data_ptr <../_static/doxygen_html/classat_1_1_tensor_base.html#ad7ebeb23f28692336a13b8ca9e70ecd7>`__
   * `at::Tensor::select <../_static/doxygen_html/classat_1_1_tensor.html#a0c02246802113cfb084fee4e13cf7dd3>`__
   * `at::TensorBase::is_contiguous <../_static/doxygen_html/classat_1_1_tensor_base.html#abdeb4da00fcff898da2e128df6cfd87e>`__
   * `at::Tensor::contiguous <../_static/doxygen_html/classat_1_1_tensor.html#a02ba246001dcbee043822d98c788d87d>`__
   * `at::rand <../_static/doxygen_html/namespaceat.html#a09880f023805007213bb572145ebbf83>`__
   * `at::matmul <../_static/doxygen_html/namespaceat.html#ad9839a3922fa9ec87e838b703822df72>`__
   * `at::bmm <../_static/doxygen_html/namespaceat.html#a854b1b19549a17f87a69b5f6b1134e22>`__

   The documentation was created from the PyTorch sources using tag v2.0.0.
   `Doxygen `__ is the tool which created the documentation based on a modified `Doxyfile `__.

Operations
^^^^^^^^^^

Now, let's have a look at the ATen-native operations ``matmul`` and ``bmm``.
Given two matrices :math:`A` and :math:`B`, the ``matmul`` function computes the matrix-matrix product :math:`C = AB`.
The name ``bmm`` is short for "batched matrix-matrix multiplication".
Assume a series of input matrices :math:`A_i` and :math:`B_i` with :math:`i \in \{1, \ldots, N\}`.
All matrices :math:`A_i` are assumed to have the same shape and all matrices :math:`B_i` are assumed to have the same shape.
Then the batched matrix-matrix product is given as :math:`C_i = A_i B_i` with :math:`i \in \{1, \ldots, N\}`.

.. admonition:: Tasks

   #. Use ``at::rand`` to create a randomly initialized :math:`16 \times 4` matrix :math:`A` and a :math:`4 \times 16` matrix :math:`B`.

   #. Use ``at::matmul`` to multiply the two matrices :math:`A` and :math:`B`.

   #. Use ``at::rand`` to create a randomly initialized :math:`16 \times 4 \times 2` tensor :math:`T_0` and a :math:`16 \times 2 \times 4` tensor :math:`T_1`.

   #. Use ``at::bmm`` to multiply the respective sixteen :math:`4 \times 2` matrices of :math:`T_0` with the sixteen :math:`2 \times 4` matrices of :math:`T_1`.
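A minimal sketch of the ``matmul`` and ``bmm`` tasks could look as follows; the tensor names are illustrative only.

.. code-block:: cpp

   #include <ATen/ATen.h>
   #include <iostream>

   int main() {
     // matrix-matrix product: C = AB with A in R^{16x4} and B in R^{4x16}
     at::Tensor l_mat_a = at::rand( {16, 4} );
     at::Tensor l_mat_b = at::rand( {4, 16} );
     at::Tensor l_mat_c = at::matmul( l_mat_a, l_mat_b );
     std::cout << l_mat_c.sizes() << std::endl;  // [16, 16]

     // batched matrix-matrix product: sixteen (4x2) x (2x4) products
     at::Tensor l_tensor_0 = at::rand( {16, 4, 2} );
     at::Tensor l_tensor_1 = at::rand( {16, 2, 4} );
     at::Tensor l_result = at::bmm( l_tensor_0, l_tensor_1 );
     std::cout << l_result.sizes() << std::endl;  // [16, 4, 4]

     return 0;
   }

``at::bmm`` treats the leading dimension of size 16 as the batch dimension, so the result holds sixteen :math:`4 \times 4` matrices.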