2. Tensors

Tensors are the basic data structure in PyTorch. This lab takes a look at their implementation. First, we’ll use the Python interface, which most PyTorch users are familiar with. Next, we’ll study PyTorch’s ATen, short for “A Tensor Library”, directly:

ATen is fundamentally a tensor library, on top of which almost all other Python and C++ interfaces in PyTorch are built. It provides a core Tensor class, on which many hundreds of operations are defined. Most of these operations have both CPU and GPU implementations, to which the Tensor class will dynamically dispatch based on its type.

https://pytorch.org/cppdocs/#aten

2.1. Python

Tensors in PyTorch will accompany us whenever we use the framework. A deeper understanding of their implementation is especially required when touching advanced topics. For example, tensors are the go-to way for passing data between the Python frontend and PyTorch’s C++ API or when sharing data through one of PyTorch’s distributed memory backends.

Note

Conceptually, tensors in PyTorch are very similar to ndarrays in NumPy or tensors in TensorFlow. The differences are largely under the hood; most notably, PyTorch’s tensors are backed by the ATen library.

PyTorch’s documentation is a good way to get started and to find information beyond our class. There is a short tutorial on tensors which covers the very basics. The documentation of torch.Tensor is more revealing but less convenient to read. An excellent presentation of the inner workings of PyTorch, including tensors, is given by Edward Z. Yang in his blog. Be aware that the blog post is from 2019 and some details might have changed since then.

Creation

We may create tensors by calling generating functions, e.g., torch.rand or torch.ones. Another option is to go through convenience helpers which allow us to bridge to NumPy or standard Python lists. Let’s create some tensors and print them. As our running example we’ll use the rank-3 tensor \(T = [T_0, T_1, T_2, T_3]^T\) with shape (4, 2, 3), which contains the following data:

(2.1.1)\[
T_0 = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}, \quad
T_1 = \begin{bmatrix} 6 & 7 & 8 \\ 9 & 10 & 11 \end{bmatrix}, \quad
T_2 = \begin{bmatrix} 12 & 13 & 14 \\ 15 & 16 & 17 \end{bmatrix}, \quad
T_3 = \begin{bmatrix} 18 & 19 & 20 \\ 21 & 22 & 23 \end{bmatrix}.
\]
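
To give a feeling for the calls involved, here is a minimal sketch (not a solution to the tasks below); torch.arange followed by reshape is merely a convenient shortcut to the values of Eq. (2.1.1), whereas the tasks ask for nested Python lists and NumPy:

import torch

# tensor-generating functions: the shape is passed explicitly
l_zeros = torch.zeros(4, 2, 3)   # all entries 0
l_ones  = torch.ones(4, 2, 3)    # all entries 1
l_rand  = torch.rand(4, 2, 3)    # uniformly random entries in [0, 1)

# shortcut to the data of Eq. (2.1.1): enumerate 0..23, then reshape
l_tensor = torch.arange(24, dtype=torch.float32).reshape(4, 2, 3)

# *_like functions take shape and dtype from an existing tensor
l_ones_like = torch.ones_like(l_tensor)

print(l_tensor)
print(l_tensor.shape)   # torch.Size([4, 2, 3])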

Tasks

  1. Try different tensor-generating functions and illustrate their behavior. Include torch.zeros, torch.ones, torch.rand and torch.ones_like in your tests.

  2. Use a “list of lists of lists” data structure in Python to allocate memory for tensor \(T\) with shape (4, 2, 3) and initialize it to the values in Eq. (2.1.1). Use torch.tensor to convert your Python-native data structure to a PyTorch tensor and print it.

  3. Once again start with your Python-native representation of \(T\). This time use numpy.array to convert it to a NumPy array first. Then create a PyTorch tensor from the NumPy array and print both.

Operations

We successfully created some tensors and printed them. Luckily this wasn’t too hard. Now let’s do something with our tensors. This part studies basic operations on tensors. Of course, later on, we’ll define and apply some heavy operations as well. At the end of the day, the application of a neural net is nothing more than a series of chained, more basic tensor operations. For now we’ll use two simple rank-2 tensors \(P\) and \(Q\) in our examples:

\[
P = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}, \quad
Q = \begin{bmatrix} 6 & 7 & 8 \\ 9 & 10 & 11 \end{bmatrix}.
\]
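
For orientation, a minimal sketch of element-wise and matrix operations on \(P\) and \(Q\); the names l_p and l_q are placeholders only:

import torch

l_p = torch.tensor([[0., 1., 2.],
                    [3., 4., 5.]])
l_q = torch.tensor([[ 6.,  7.,  8.],
                    [ 9., 10., 11.]])

# element-wise operations
print(torch.add(l_p, l_q))        # equivalent to l_p + l_q
print(torch.mul(l_p, l_q))        # equivalent to l_p * l_q

# matrix-matrix product of P and Q^T: (2x3) x (3x2) -> (2x2)
print(torch.matmul(l_p, l_q.T))   # equivalent to l_p @ l_q.T

# reductions
print(torch.sum(l_p))             # sum over all elements
print(torch.max(l_p))             # maximum over all elements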

Tasks

  1. Generate the rank-2 tensors \(P\) and \(Q\) in PyTorch. Illustrate the behavior of element-wise operations on \(P\) and \(Q\). Try at least torch.add and torch.mul. Show that you may also perform element-wise addition or multiplication through the overloaded binary operators + and *.

  2. Compute the matrix-matrix product of \(P\) and \(Q^T\) by using torch.matmul. Show that you may achieve the same through the overloaded @ operator.

  3. Illustrate the behavior of reduction operations, e.g., torch.sum or torch.max.

  4. Given two tensors l_tensor_0 and l_tensor_1, explain and illustrate the difference between the following two code snippets (a short sketch illustrating the difference follows after the snippets):

    l_tmp = l_tensor_0
    l_tmp[:] = 0

    l_tmp = l_tensor_1.clone().detach()
    l_tmp[:] = 0
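
The following sketch, which assumes two small all-ones tensors, illustrates the difference: the first snippet merely binds a second name to the same data, whereas clone().detach() copies the data into new memory:

import torch

l_tensor_0 = torch.ones(2, 3)
l_tensor_1 = torch.ones(2, 3)

# snippet 1: l_tmp is just another name for the same data
l_tmp = l_tensor_0
l_tmp[:] = 0
print(l_tensor_0)   # all zeros: the original tensor was modified

# snippet 2: clone().detach() allocates new memory
l_tmp = l_tensor_1.clone().detach()
l_tmp[:] = 0
print(l_tensor_1)   # still all ones: the original tensor is untouched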
    

Storage

Internally a PyTorch tensor consists of the raw data stored in memory and metadata describing the data. For example, assume that you have the following matrix \(A \in \mathbb{R}^{2\times3}\):

\[\begin{split}A = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}.\end{split}\]

Let’s say that you further decided to store your data using 32-bit floating point numbers and in the memory attached to your CPU:

  • How would you store the data internally?

  • Is a row-major format, e.g., \([0, 1, 2, 3, 4, 5]\) better than a column-major format, e.g., \([0, 3, 1, 4, 2, 5]\)?

  • What about \([0, 1, 2, *, *, 3, 4, 5]\) or \([0, 3, *, *, *, 1, 4, *, *, *, 2, 5]\)? Is it possible to have “holes”?

Simply put, all of these options are possible, and there might be good reasons to choose one internal format over another. PyTorch supports such internal formats but hides the underlying details behind its frontend. In this part we’ll have a look at some of those details. A detailed understanding becomes essential once we pass our tensors to C/C++ and operate on the raw data.
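
As a first peek at these details, a minimal sketch that prints the metadata PyTorch keeps for the matrix \(A\); with the default contiguous, row-major layout the strides are (3, 1), i.e., advancing one row skips three elements in memory and advancing one column skips one:

import torch

l_a = torch.tensor([[0., 1., 2.],
                    [3., 4., 5.]])

print(l_a.size())     # torch.Size([2, 3])
print(l_a.stride())   # (3, 1): contiguous, row-major
print(l_a.dtype)      # torch.float32
print(l_a.layout)     # torch.strided
print(l_a.device)     # cpu

# a transposed view reuses the same storage; only the metadata changes
l_at = l_a.t()
print(l_at.stride())          # (1, 3)
print(l_at.is_contiguous())   # False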

Tasks

  1. Create a PyTorch tensor from the rank-3 tensor \(T\) given in Eq. (2.1.1). Print the tensor’s size and stride. Print the tensor’s attributes, i.e., its dtype, layout and device.

  2. Create a new tensor l_tensor_float from \(T\) but use torch.float32 as its dtype.

  3. Fix the second dimension of l_tensor_float, i.e., assign l_tensor_fixed to:

    l_tensor_fixed = l_tensor_float[:,0,:]
    

    Which metadata of the tensor (size, stride, dtype, layout, device) changed? Which stayed the same?

  4. Create an even more complex view of l_tensor_float:

    l_tensor_complex_view = l_tensor_float[::2,1,:]
    

    Explain the changes in size and stride.

  5. Apply the contiguous function to l_tensor_complex_view. Explain the changes in the stride.

  6. Illustrate the internal storage of a tensor by directly printing the corresponding internal data.

    Hint

    The function data_ptr returns the memory address of a tensor’s underlying data. ctypes allows you to load data directly from memory. For example, the following code loads four bytes from the address l_data_ptr, interprets them as a 32-bit floating point value and stores the result in l_data_raw (a sketch that walks the entire storage this way follows below):

    l_data_raw = (ctypes.c_float).from_address( l_data_ptr )
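
Below is a small sketch that walks the entire storage of a contiguous float32 tensor through its raw memory address; the four-byte step per element is an assumption that only holds for contiguous float32 data:

import ctypes
import torch

l_tensor = torch.arange(24, dtype=torch.float32).reshape(4, 2, 3)
l_data_ptr = l_tensor.data_ptr()

# read every element directly from memory:
# four bytes per float32 element, contiguous layout assumed
for l_el in range(l_tensor.numel()):
    l_data_raw = ctypes.c_float.from_address(l_data_ptr + 4 * l_el)
    print(l_data_raw.value)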
    

2.2. ATen

Let’s leave PyTorch’s Python frontend for a moment and have a look at ATen. This time a code frame is provided to kickstart your developments. Not much is implemented on the programming side; the value of the code frame lies in the included Makefile, which automatically discovers the PyTorch headers and libraries required to build ATen-based C++ code. For this discovery to work, you have to issue make from a conda environment with a PyTorch installation.

Storage

Listing 2.2.1 File src/aten.cpp of the provided code frame.
#include <cstdlib>
#include <ATen/ATen.h>
#include <iostream>

int main() {
  std::cout << "running the ATen examples" << std::endl;

  float l_data[4*2*3] = {  0.0f,  1.0f,  2.0f,
                           3.0f,  4.0f,  5.0f,

                           6.0f,  7.0f,  8.0f,
                           9.0f, 10.0f, 11.0f,

                          12.0f, 13.0f, 14.0f,
                          15.0f, 16.0f, 17.0f,

                          18.0f, 19.0f, 20.0f,
                          21.0f, 22.0f, 23.0f };

  std::cout << "l_data (ptr): " << l_data << std::endl;

  // TODO: Add ATen code

  std::cout << "finished running ATen examples" << std::endl;

  return EXIT_SUCCESS;
}

The code frame contains the single C++ file src/aten.cpp given in Listing 2.2.1. Inside of main, the one-dimensional array l_data is allocated on the stack and assigned some values. In this task, we’ll initially use l_data as the memory for our tensors’ data. A sketch of the ATen calls involved follows after the task list below.

Tasks

  1. Listen to the Torch vs ATen APIs episode of the PyTorch Developer Podcast.

  2. Create the rank-3 tensor \(T = [T_0, T_1, T_2, T_3]^T\) (see Eq. (2.1.1)) using FP32 elements. Use the array l_data as the tensor’s storage and at::from_blob() to create the tensor. Use l_tensor as the name of your tensor.

  3. Print the tensor itself and respective metadata (data_ptr, dtype, sizes, strides, storage_offset, device, layout, is_contiguous).

  4. Demonstrate that you may manipulate the tensor’s data by either using the raw C pointer l_data or by going through ATen.

  5. Create a new view l_view which fixes the second dimension of l_tensor to index 1, i.e., in Python you could write l_tensor[:,1,:]. Illustrate that the two tensors use the same memory for their data.

  6. Create a new tensor l_cont by calling l_view.contiguous(). How is the result l_cont different from l_view?
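
The following self-contained sketch shows one possible set of ATen calls for these tasks; it is not the reference solution, and the printed labels are illustrative only:

#include <cstdlib>
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // raw storage, matching l_data of Listing 2.2.1
  float l_data[4*2*3] = {  0.0f,  1.0f,  2.0f,  3.0f,  4.0f,  5.0f,
                           6.0f,  7.0f,  8.0f,  9.0f, 10.0f, 11.0f,
                          12.0f, 13.0f, 14.0f, 15.0f, 16.0f, 17.0f,
                          18.0f, 19.0f, 20.0f, 21.0f, 22.0f, 23.0f };

  // wrap the existing memory in a tensor; the data is not copied
  at::Tensor l_tensor = at::from_blob( l_data,
                                       {4, 2, 3},
                                       at::ScalarType::Float );

  // tensor and metadata
  std::cout << l_tensor                                        << std::endl;
  std::cout << "data_ptr:       " << l_tensor.data_ptr()       << std::endl;
  std::cout << "dtype:          " << l_tensor.dtype()          << std::endl;
  std::cout << "sizes:          " << l_tensor.sizes()          << std::endl;
  std::cout << "strides:        " << l_tensor.strides()        << std::endl;
  std::cout << "storage_offset: " << l_tensor.storage_offset() << std::endl;
  std::cout << "device:         " << l_tensor.device()         << std::endl;
  std::cout << "layout:         " << l_tensor.layout()         << std::endl;
  std::cout << "is_contiguous:  " << l_tensor.is_contiguous()  << std::endl;

  // manipulate the data through the raw pointer ...
  l_data[0] = 100.0f;
  // ... or through ATen; both touch the same memory
  l_tensor[0][0][1].fill_( 101.0f );
  std::cout << l_tensor[0][0][0] << " " << l_data[1] << std::endl;

  // view which fixes the second dimension to index 1; shares the memory
  at::Tensor l_view = l_tensor.select( 1, 1 );
  std::cout << "view data_ptr:  " << l_view.data_ptr() << std::endl;

  // contiguous() copies the view's elements into new, dense memory
  at::Tensor l_cont = l_view.contiguous();
  std::cout << "cont data_ptr:  " << l_cont.data_ptr() << std::endl;

  return EXIT_SUCCESS;
}

Note that at::from_blob does not take ownership of the provided memory, so l_data has to outlive l_tensor and all views derived from it.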

Hint

PyTorch’s documentation might not cover all of ATen’s operators and classes. Therefore, our homepage hosts documentation generated from ATen’s source code, including the ATen functions required to solve the tasks of this section.

The documentation was generated with Doxygen from the PyTorch sources at tag v2.0.0, using a modified Doxyfile.

Operations

Now, let’s have a look at the ATen-native operations matmul and bmm. Given two matrices \(A\) and \(B\), the matmul function computes the matrix-matrix product \(C = AB\). The name bmm is short for “batched matrix-matrix multiplication”: assume a series of input matrices \(A_i\) and \(B_i\) with \(i \in 1 \ldots N\), where all matrices \(A_i\) have the same shape and all matrices \(B_i\) have the same shape. The batched matrix-matrix product is then given by \(C_i = A_i B_i\) for \(i \in 1 \ldots N\). A sketch of both functions follows after the task list below.

Tasks

  1. Use at::rand to create a randomly initialized \(16 \times 4\) matrix \(A\) and \(4 \times 16\) matrix \(B\).

  2. Use at::matmul to multiply the two matrices \(A\) and \(B\).

  3. Use at::rand to create a randomly initialized \(16 \times 4 \times 2\) tensor \(T_0\) and a \(16 \times 2 \times 4\) tensor \(T_1\).

  4. Use at::bmm to multiply the respective sixteen \(4 \times 2\) matrices of \(T_0\) with the sixteen \(2 \times 4\) matrices of \(T_1\).
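
A minimal, self-contained sketch of both operations; the shapes follow the tasks above, everything else (variable names, printed output) is illustrative:

#include <cstdlib>
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // matrix-matrix product: (16 x 4) * (4 x 16) -> (16 x 16)
  at::Tensor l_a = at::rand( {16, 4} );
  at::Tensor l_b = at::rand( {4, 16} );
  at::Tensor l_c = at::matmul( l_a, l_b );
  std::cout << "matmul result sizes: " << l_c.sizes() << std::endl;

  // batched matrix-matrix product:
  // sixteen (4 x 2) * (2 x 4) products -> (16 x 4 x 4)
  at::Tensor l_t0 = at::rand( {16, 4, 2} );
  at::Tensor l_t1 = at::rand( {16, 2, 4} );
  at::Tensor l_t2 = at::bmm( l_t0, l_t1 );
  std::cout << "bmm result sizes:    " << l_t2.sizes() << std::endl;

  return EXIT_SUCCESS;
}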