2. Tensors
Tensors are the basic data structure in PyTorch. This lab takes a look at their implementation. First, we’ll use the Python interface, which most PyTorch users are familiar with. Next, we’ll study PyTorch’s ATen, short for “A Tensor Library”, directly:
ATen is fundamentally a tensor library, on top of which almost all other Python and C++ interfaces in PyTorch are built. It provides a core Tensor class, on which many hundreds of operations are defined. Most of these operations have both CPU and GPU implementations, to which the Tensor class will dynamically dispatch based on its type.
2.1. Python
Tensors in PyTorch will accompany us whenever we use the framework. A deeper understanding of their implementation is especially required when touching advanced topics. For example, tensors are the go-to way to pass data between the Python frontend and PyTorch’s C++ API, or to share data through one of PyTorch’s distributed-memory backends.
Note
Conceptually, tensors in PyTorch are very similar to ndarrays in NumPy or tensors in TensorFlow. The differences are largely under the hood: PyTorch’s tensors are backed by the ATen library.
PyTorch’s documentation is a good way to get started and to find information outside of our class. There is a short tutorial on tensors which covers the very basics. The documentation of torch.Tensor is more revealing but less convenient to read. An excellent presentation of the inner workings of PyTorch, including tensors, is given by Edward Z. Yang in his blog. Be aware that the blog post is from 2019 and some details might have changed since then.
Creation
We may create tensors by calling generating functions, e.g., torch.rand or torch.ones. Another option is to go through convenience helpers which allow us to bridge to NumPy or standard Python lists. Let’s create some tensors and print them. We’ll use a rank-3 tensor with shape (4, 2, 3) as our running example. For example, the tensor \(T = [T_0, T_1, T_2, T_3]^T\) might contain the following data:

\[
T_0 = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}, \quad
T_1 = \begin{bmatrix} 6 & 7 & 8 \\ 9 & 10 & 11 \end{bmatrix}, \quad
T_2 = \begin{bmatrix} 12 & 13 & 14 \\ 15 & 16 & 17 \end{bmatrix}, \quad
T_3 = \begin{bmatrix} 18 & 19 & 20 \\ 21 & 22 & 23 \end{bmatrix}
\tag{2.1.1}
\]

These are the same values that later back the raw array l_data in Listing 2.2.1.
Tasks
Try different tensor-generating functions and illustrate their behavior. Include torch.zeros, torch.ones, torch.rand and torch.ones_like in your tests.
Use a “list of lists of lists” data structure in Python to allocate memory for tensor \(T\) with shape (4, 2, 3) and initialize it to the values in Eq. (2.1.1). Use torch.tensor to convert your Python-native data structure to a PyTorch tensor and print it.
Once again start with your Python-native representation of \(T\). This time use numpy.array to convert it to a NumPy array first. Then create a PyTorch tensor from the NumPy array and print both.
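A minimal sketch of these three creation paths, reusing the values of Eq. (2.1.1), could look as follows; the variable names are only suggestions:

import torch
import numpy as np

# 1) tensor-generating functions
l_zeros = torch.zeros( 4, 2, 3 )      # all entries 0
l_ones  = torch.ones(  4, 2, 3 )      # all entries 1
l_rand  = torch.rand(  4, 2, 3 )      # uniform random values in [0, 1)
l_like  = torch.ones_like( l_rand )   # ones with the shape and dtype of l_rand

# 2) "list of lists of lists" -> PyTorch tensor
l_list = [ [ [  0,  1,  2 ], [  3,  4,  5 ] ],
           [ [  6,  7,  8 ], [  9, 10, 11 ] ],
           [ [ 12, 13, 14 ], [ 15, 16, 17 ] ],
           [ [ 18, 19, 20 ], [ 21, 22, 23 ] ] ]
l_tensor = torch.tensor( l_list )
print( l_tensor )

# 3) Python list -> NumPy array -> PyTorch tensor
l_array = np.array( l_list )
l_tensor_np = torch.from_numpy( l_array )
print( l_array )
print( l_tensor_np )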
Operations
We successfully created some tensors and printed them. Luckily this wasn’t too hard. Now let’s do something with our tensors. This part studies basic operations on tensors. Of course, later on, we’ll define and apply some heavy operations as well. At the end of the day, the application of a neural net is nothing more than a series of chained, more basic tensor operations. For now we’ll use two simple rank-2 tensors \(P\) and \(Q\) in our examples:
Tasks
Generate the rank-2 tensors \(P\) and \(Q\) in PyTorch. Illustrate the behavior of element-wise operations on \(P\) and \(Q\). Try at least torch.add and torch.mul. Show that you may also perform element-wise addition or multiplication through the overloaded binary operators + and *.
Compute the matrix-matrix product of \(P\) and \(Q^T\) by using torch.matmul. Show that you may achieve the same through the overloaded @ operator.
Illustrate the behavior of reduction operations, e.g., torch.sum or torch.max.
Given two tensors l_tensor_0 and l_tensor_1, explain and illustrate the difference between the following two code snippets (a sketch of all of these operations follows the task list):

l_tmp = l_tensor_0
l_tmp[:] = 0

l_tmp = l_tensor_1.clone().detach()
l_tmp[:] = 0
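A minimal sketch of the operations above is given below; the two \(2 \times 3\) matrices only serve as arbitrary stand-ins for \(P\) and \(Q\):

import torch

l_p = torch.tensor( [ [ 1.,  2.,  3. ],
                      [ 4.,  5.,  6. ] ] )
l_q = torch.tensor( [ [ 7.,  8.,  9. ],
                      [ 10., 11., 12. ] ] )

# element-wise operations
print( torch.add( l_p, l_q ) )   # same result as l_p + l_q
print( torch.mul( l_p, l_q ) )   # same result as l_p * l_q
print( l_p + l_q )
print( l_p * l_q )

# matrix-matrix product P Q^T
print( torch.matmul( l_p, l_q.T ) )
print( l_p @ l_q.T )

# reductions
print( torch.sum( l_p ) )        # sum over all entries
print( torch.max( l_p ) )        # largest entry

# aliasing vs. copying
l_tensor_0 = torch.rand( 2, 3 )
l_tensor_1 = torch.rand( 2, 3 )

l_tmp = l_tensor_0               # l_tmp is just another name for the same data ...
l_tmp[:] = 0                     # ... so this also zeroes l_tensor_0

l_tmp = l_tensor_1.clone().detach()  # independent copy of the data
l_tmp[:] = 0                         # ... l_tensor_1 is left untouched

print( l_tensor_0 )              # all zeros
print( l_tensor_1 )              # unchanged random values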
Storage
Internally a PyTorch tensor consists of the raw data stored in memory and metadata describing the data. For example, assume that you have the following matrix \(A \in \mathbb{R}^{2\times3}\):

\[
A = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}
\]

Let’s say that you further decided to store your data using 32-bit floating point numbers in the memory attached to your CPU:
How would you store the data internally?
Is a row-major format, e.g., \([0, 1, 2, 3, 4, 5]\) better than a column-major format, e.g., \([0, 3, 1, 4, 2, 5]\)?
What about \([0, 1, 2, *, *, 3, 4, 5]\) or \([0, 3, *, *, *, 1, 4, *, *, *, 2, 5]\)? Is it possible to have “holes”?
Simply put, all of these options are possible and there might be good reasons to choose one internal format over another. PyTorch uses such internal formats but hides the underlying details from the frontend user. In this part we’ll have a look at some of those details. A detailed understanding becomes essential once we pass our tensors to C/C++ and operate on the raw data.
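As a first glimpse at how PyTorch answers these questions, the matrix \(A\) can be created and its metadata inspected directly; by default PyTorch stores the data contiguously in row-major order:

import torch

l_a = torch.tensor( [ [ 0., 1., 2. ],
                      [ 3., 4., 5. ] ] )

print( l_a.size() )      # torch.Size([2, 3])
print( l_a.stride() )    # (3, 1): jump 3 values for the next row, 1 for the next column
print( l_a.dtype )       # torch.float32
print( l_a.layout )      # torch.strided
print( l_a.device )      # cpu
print( l_a.data_ptr() )  # address of the first element in memory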
Tasks
Create a PyTorch tensor from the rank-3 tensor \(T\) given in Eq. (2.1.1). Print the tensor’s size and stride. Print the tensor’s attributes, i.e., its dtype, layout and device.
Create a new tensor l_tensor_float from \(T\) but use torch.float32 as its dtype.
Fix the second dimension of l_tensor_float, i.e., assign l_tensor_fixed to:

l_tensor_fixed = l_tensor_float[:,0,:]

Which metadata of the tensor (size, stride, dtype, layout, device) changed? Which stayed the same?
Create an even more complex view of l_tensor_float:

l_tensor_complex_view = l_tensor_float[::2,1,:]

Explain the changes in size and stride.
Apply the contiguous function to l_tensor_complex_view. Explain the changes in the stride.
Illustrate the internal storage of a tensor by printing corresponding internal data directly; see the hint and the sketches below.
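A minimal sketch of the slicing-related tasks, using one possible way to create l_tensor_float from \(T\), looks like this:

import torch

l_tensor = torch.tensor( [ [ [  0,  1,  2 ], [  3,  4,  5 ] ],
                           [ [  6,  7,  8 ], [  9, 10, 11 ] ],
                           [ [ 12, 13, 14 ], [ 15, 16, 17 ] ],
                           [ [ 18, 19, 20 ], [ 21, 22, 23 ] ] ] )
l_tensor_float = l_tensor.to( torch.float32 )

print( l_tensor_float.size() )    # torch.Size([4, 2, 3])
print( l_tensor_float.stride() )  # (6, 3, 1)

# fixing the second dimension changes size and stride, but not dtype, layout or device
l_tensor_fixed = l_tensor_float[:, 0, :]
print( l_tensor_fixed.size() )    # torch.Size([4, 3])
print( l_tensor_fixed.stride() )  # (6, 1)

# a strided view: every second T_i, second row only
l_tensor_complex_view = l_tensor_float[::2, 1, :]
print( l_tensor_complex_view.size() )    # torch.Size([2, 3])
print( l_tensor_complex_view.stride() )  # (12, 1)

# contiguous() copies the viewed data into a new, densely packed storage
l_cont = l_tensor_complex_view.contiguous()
print( l_cont.stride() )  # (3, 1)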
Hint
The function data_ptr returns the memory address of the internal data. ctypes allows you to directly load data from memory. For example, the following code loads four bytes from address l_data_ptr, interprets the result as a 32-bit floating point value and writes the data to l_data_raw:

l_data_raw = ctypes.c_float.from_address( l_data_ptr )
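Building on the hint, the following sketch reads the first few raw FP32 values of l_tensor_float (created as above) directly from memory:

import ctypes

l_data_ptr = l_tensor_float.data_ptr()

# read the first six values; consecutive FP32 values are four bytes apart
for l_id in range( 6 ):
    l_data_raw = ctypes.c_float.from_address( l_data_ptr + 4 * l_id )
    print( l_data_raw.value )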
2.2. ATen
Let’s leave PyTorch’s Python frontend for a moment and have a look at ATen. This time a code frame is provided to kickstart your developments. Not much is done on the programming side. However, the value of the code frame lies in the included Makefile. The Makefile automatically discovers the PyTorch headers and libraries which are required to build ATen-based C++ code. For this discovery to work, you have to issue make from a conda environment with a PyTorch installation.
Storage
#include <cstdlib>
#include <ATen/ATen.h>
#include <iostream>

int main() {
  std::cout << "running the ATen examples" << std::endl;

  float l_data[4*2*3] = {  0.0f,  1.0f,  2.0f,
                           3.0f,  4.0f,  5.0f,

                           6.0f,  7.0f,  8.0f,
                           9.0f, 10.0f, 11.0f,

                          12.0f, 13.0f, 14.0f,
                          15.0f, 16.0f, 17.0f,

                          18.0f, 19.0f, 20.0f,
                          21.0f, 22.0f, 23.0f };

  std::cout << "l_data (ptr): " << l_data << std::endl;

  // TODO: Add ATen code

  std::cout << "finished running ATen examples" << std::endl;

  return EXIT_SUCCESS;
}
The code frame contains the single C++ file src/aten.cpp given in Listing 2.2.1. Inside of main, the one-dimensional array l_data is allocated on the stack and assigned some values. In this task, we’ll initially use l_data as the memory for our tensors’ data.
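As a rough sketch of what this looks like with ATen (the individual steps are the subject of the tasks below), the following lines could replace the TODO in Listing 2.2.1; at::Tensor::slice is merely one way to obtain the requested view:

  // wrap the existing array l_data in an ATen tensor; no data is copied
  at::Tensor l_tensor = at::from_blob( l_data,
                                       {4, 2, 3},
                                       at::kFloat );

  // the tensor and its metadata
  std::cout << l_tensor                  << std::endl;
  std::cout << l_tensor.data_ptr()       << std::endl;  // matches the address of l_data
  std::cout << l_tensor.dtype()          << std::endl;
  std::cout << l_tensor.sizes()          << std::endl;
  std::cout << l_tensor.strides()        << std::endl;
  std::cout << l_tensor.storage_offset() << std::endl;
  std::cout << l_tensor.device()         << std::endl;
  std::cout << l_tensor.layout()         << std::endl;
  std::cout << l_tensor.is_contiguous()  << std::endl;

  // changes through the raw pointer are visible through ATen ...
  l_data[0] = 42.0f;
  std::cout << l_tensor[0] << std::endl;        // the first 2x3 matrix now starts with 42

  // ... and changes through ATen are visible through the raw pointer
  at::Tensor l_view = l_tensor.slice( 1, 1 );   // corresponds to l_tensor[:,1:] in Python
  l_view.zero_();                               // zeroes the second row of every T_i in place
  std::cout << l_data[3] << std::endl;          // 0: l_view shares the memory of l_tensor

  // contiguous() copies the view's data into new, densely packed memory
  at::Tensor l_cont = l_view.contiguous();
  std::cout << ( l_cont.data_ptr() == l_view.data_ptr() ) << std::endl;  // 0: different memory
  std::cout << l_cont.strides() << std::endl;                            // dense strides of the copy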
Tasks
Listen to the Torch vs ATen APIs episode of the PyTorch Developer Podcast.
Create the rank-3 tensor \(T = [T_0, T_1, T_2, T_3]^T\) (see Eq. (2.1.1)) using FP32 elements. Use the array l_data for the storage and at::from_blob() to create the tensor. Use l_tensor as the name of your tensor.
Print the tensor itself and respective metadata (data_ptr, dtype, sizes, strides, storage_offset, device, layout, is_contiguous).
Demonstrate that you may manipulate the tensor’s data by either using the raw C pointer l_data or by going through ATen.
Create a new view l_view which assumes the index 1 for the second dimension of l_tensor, i.e., in Python you could write l_tensor[:,1:]. Illustrate that the two tensors use the same memory for their data.
Create a new tensor l_cont by calling l_view.contiguous(). How is the result l_cont different from l_view?
Hint
PyTorch’s documentation might miss some of ATen’s operators and classes. Therefore, our homepage hosts documentation generated from ATen’s source code. The following list provides links to respective ATen functions which are required to solve the tasks of this section:
The documentation was created from the PyTorch sources at tag v2.0.0 using Doxygen and a modified Doxyfile.
Operations
Now, let’s have a look at the ATen-native operations matmul and bmm. Given two matrices \(A\) and \(B\), the matmul function computes the matrix-matrix product \(C = AB\). The name bmm is short for “batched matrix-matrix multiplication”. Assume a series of input matrices \(A_i\) and \(B_i\) with \(i \in 1 \ldots N\), where all matrices \(A_i\) have the same shape and all matrices \(B_i\) have the same shape. Then the batched matrix-matrix product is given as \(C_i = A_i B_i\) with \(i \in 1 \ldots N\).
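A minimal sketch of both operations, using the shapes from the tasks below and again placed inside main of the code frame, might look like this:

  // A is 16x4, B is 4x16, both randomly initialized
  at::Tensor l_a = at::rand( {16, 4} );
  at::Tensor l_b = at::rand( { 4, 16} );

  // matrix-matrix product C = AB
  at::Tensor l_c = at::matmul( l_a, l_b );
  std::cout << l_c.sizes() << std::endl;   // [16, 16]

  // a batch of sixteen 4x2 matrices and sixteen 2x4 matrices
  at::Tensor l_t0 = at::rand( {16, 4, 2} );
  at::Tensor l_t1 = at::rand( {16, 2, 4} );

  // batched matrix-matrix product: C_i = A_i B_i for i = 1..16
  at::Tensor l_t2 = at::bmm( l_t0, l_t1 );
  std::cout << l_t2.sizes() << std::endl;  // [16, 4, 4]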
Tasks
Use at::rand to create a randomly initialized \(16 \times 4\) matrix \(A\) and a \(4 \times 16\) matrix \(B\).
Use at::matmul to multiply the two matrices \(A\) and \(B\).
Use at::rand to create a randomly initialized \(16 \times 4 \times 2\) tensor \(T_0\) and a \(16 \times 2 \times 4\) tensor \(T_1\).
Use at::bmm to multiply the respective sixteen \(4 \times 2\) matrices of \(T_0\) with the sixteen \(2 \times 4\) matrices of \(T_1\).