1. Remote Machine Learning
State-of-the-art machine learning is compute-hungry. Often, our local notebooks or desktops are insufficient to power recent models. In this class we’ll learn how to use a dedicated compute cluster for our machine learning workloads. This is a crucial skill to have in our toolbox and is best learned by starting small. We’ll harness the Draco cluster at Friedrich Schiller University Jena to learn about remote machine learning.
1.1. Raspberry Pis
In class we will use a set of Raspberry Pis which act as our desktops. To access them you need a KSZ account, which you might still have to apply for. For further information, please visit KSZ’s homepage. Your KSZ account allows you to log into the Raspberry Pis in the lab room as follows:
You should see an Ubuntu login screen. Don’t enter your credentials here. First, press Ctrl + Alt + F2.
A shell interface opens. Enter your username and press enter, then type your KSZ password and press enter again.
A few lines of text will pop up. You can ignore them. Press enter one more time.
Next, type the command
startx
and press enter. Now you are all set up. Have fun! 😀
After finishing your work, you need to log out of the device:
In the bottom right corner of the screen, press the red power button.
A pop-up will open. Press Logout.
Now you are back in the shell. Just type
exit
and press enter, and you’re done!
Attention
Please don’t shut the Pis down!
Tasks
Log into one of the lab room’s Raspberry Pis.
Open a terminal and run the following two commands:
hostname
lscpu
Log out from the Pi.
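As a rough cross-check, a few of the same basic facts can also be read from Python’s standard library. This is only a minimal sketch and no replacement for lscpu, which reports much more (CPU model, caches, NUMA layout); the function name describe_machine is hypothetical.

```python
import os
import platform
import socket


def describe_machine() -> dict:
    """Collect a small subset of what `hostname` and `lscpu` report."""
    return {
        # Normally the same name that the `hostname` command prints.
        "hostname": socket.gethostname(),
        # CPU architecture as reported by the kernel, e.g. "aarch64" on a 64-bit OS.
        "architecture": platform.machine(),
        # Number of logical CPUs visible to the OS (None if it cannot be determined).
        "logical_cpus": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in describe_machine().items():
        print(f"{key}: {value}")
```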
1.2. Draco
Requesting Access
As with most compute clusters, we have to go through a few steps before we are able to log into the machine. We’ll get started on the process early so that our accounts are ready once we need them.
Tasks
Add your URZ username to the list shared in the labs.
Wait until you are told that your account is ready, then test it as outlined in First Steps.
First Steps
Perfect, everything is set up. On paper, we are good to go: let’s take the machine for a spin!
Tasks
Use your URZ username to log into Draco:
ssh -X <username>@login1.draco.uni-jena.de
Allocate a compute node for three hours:
salloc -N 1 -p short -t 03:00:00
Get the id of your job and the name of your node through the squeue command. Use
squeue -u <username>
to only list your own jobs.
Connect to the compute node in a second terminal. For example, if you got node node009, do the following:
ssh node009
Print basic information on the node located in /proc/cpuinfo and /proc/meminfo. Try the tool lscpu.
Try the module system. Running module help should get you started.
Leave the compute node and (if needed) release your job by providing its id to the scancel command.
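When you later script against the cluster, it can help to read the job id and node name programmatically instead of from the squeue table. The sketch below assumes Slurm’s squeue options -h (no header) and -o with the format fields %i (job id) and %N (node list); the helper names parse_squeue and my_jobs are hypothetical. It uses typing.List so that it also runs with the tools/python/3.8 module.

```python
import getpass
import shutil
import subprocess
from typing import List, Tuple


def parse_squeue(output: str) -> List[Tuple[str, str]]:
    """Turn `squeue -h -o "%i %N"` output into (job_id, node_list) pairs."""
    jobs: List[Tuple[str, str]] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        job_id = fields[0]
        # A pending job has no node yet, so the %N column is empty.
        node_list = fields[1] if len(fields) > 1 else ""
        jobs.append((job_id, node_list))
    return jobs


def my_jobs(user: str) -> List[Tuple[str, str]]:
    """Ask Slurm for the user's jobs; requires `squeue` on the PATH."""
    result = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i %N"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    if shutil.which("squeue") is None:
        print("squeue not found; run this on Draco.")
    else:
        for job_id, nodes in my_jobs(getpass.getuser()):
            print(f"job {job_id}: {nodes or '(no node yet)'}")
```

The printed job id is what you pass to scancel when releasing the allocation.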
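/proc/cpuinfo and /proc/meminfo are plain text files made of "key : value" lines, so a short Python sketch can condense them into a summary. The helper names are hypothetical, and the sketch assumes the usual Linux layout: /proc/cpuinfo has one blank-line-separated block per logical CPU (each with a processor key), and MemTotal in /proc/meminfo is given with a "kB" suffix (in practice KiB).

```python
import os
from typing import Dict, List


def parse_proc_file(text: str) -> List[Dict[str, str]]:
    """Split "key : value" text into blocks separated by blank lines."""
    blocks: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current block (one block per CPU in cpuinfo).
            if current:
                blocks.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


def summarize(cpuinfo: str, meminfo: str) -> Dict[str, object]:
    """Count logical CPUs and convert MemTotal to GiB."""
    # Only blocks with a "processor" key describe a logical CPU; some
    # architectures append an extra block (e.g. "Hardware") at the end.
    cpus = [block for block in parse_proc_file(cpuinfo) if "processor" in block]
    mem = parse_proc_file(meminfo)[0]
    mem_total_kib = int(mem["MemTotal"].split()[0])  # value looks like "<n> kB"
    return {
        "logical_cpus": len(cpus),
        "model_name": cpus[0].get("model name", "unknown") if cpus else "unknown",
        "mem_total_gib": mem_total_kib / 1024**2,
    }


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo") and os.path.exists("/proc/meminfo"):
        with open("/proc/cpuinfo") as f_cpu, open("/proc/meminfo") as f_mem:
            print(summarize(f_cpu.read(), f_mem.read()))
    else:
        print("No /proc filesystem found; run this on a Linux node.")
```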
Installing PyTorch through Pip
In the coming weeks we’ll use PyTorch, which is one of the most popular machine learning frameworks. This section gets us started by installing PyTorch in our user space through pip, using one of Draco’s compute nodes.
Tasks
Allocate a compute node in the short queue and connect to the node.
Load the tools/python/3.8 module:
module load tools/python/3.8
Install the virtualenv tool which we’ll use for the creation of virtual environments:
pip install virtualenv
Make the locally installed tools (such as virtualenv in ~/.local/bin) available in your session:
export PATH=${HOME}/.local/bin:${PATH}
Create a new virtual environment with the name venv_pytorch, activate it and install PyTorch:
virtualenv venv_pytorch
source venv_pytorch/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Start a Python command prompt, load PyTorch, check that you have PyTorch 2.0 and exit the prompt:
python
import torch
print(torch.__version__)
exit()
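To double-check that venv_pytorch is really active before installing packages, you can ask the interpreter itself. A minimal sketch, assuming an environment created by virtualenv 20 or newer (which, like the standard venv module, makes sys.prefix differ from sys.base_prefix); the sys.real_prefix check covers older virtualenv releases. in_virtualenv is a hypothetical helper name.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (< 20) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and virtualenv >= 20 point sys.prefix at the environment, while
    # sys.base_prefix still points at the base installation (here: the
    # interpreter provided by the tools/python/3.8 module).
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    # The activate script also exports VIRTUAL_ENV with the environment path.
    print("VIRTUAL_ENV =", os.environ.get("VIRTUAL_ENV", "(not set)"))
```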
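Instead of typing into the interactive prompt, the same version check can be saved as a small script and run with python inside the activated environment. This is a sketch: major_minor is a hypothetical helper, and the tiny matrix product is only there to confirm that PyTorch can actually run a computation on the CPU.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from version strings such as "2.0.1+cpu"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch not found: is venv_pytorch activated?")
    else:
        print("PyTorch", torch.__version__, "-> (major, minor) =", major_minor(torch.__version__))
        # A tiny CPU computation to confirm the installation actually works.
        x = torch.ones(2, 3)
        print("sum of x @ x.T:", (x @ x.T).sum().item())
```

Save it, for example, as check_torch.py (a hypothetical file name) and run python check_torch.py on the compute node after activating venv_pytorch.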