.. _ch:remote_ml:

Remote Machine Learning
=======================

State-of-the-art machine learning is compute hungry.
Oftentimes, our local laptops or desktops are insufficient to power recent models.
In this class we'll learn how to use a dedicated compute cluster for our machine learning workloads.
This is a crucial skill to have in our toolbox, and it is best learned by starting small.
We'll harness the `Draco `__ cluster at Friedrich Schiller University Jena to learn about remote machine learning.

Raspberry Pis
-------------

In class we will use a set of Raspberry Pis, which act as our desktops.
To access them, you need a KSZ account, which you might still have to apply for.
For further information, please visit KSZ's `homepage `__.

Your KSZ account allows you to log into the Raspberry Pis in the lab room:

#. You should see an Ubuntu login screen.
   **Don't enter your credentials here.**
   First, press **Ctrl + Alt + F2**.
#. A shell interface opens.
   Type your username and press enter, then type your KSZ password and press enter again.
#. A few lines of text will pop up. You can ignore them.
   **Press enter one more time.**
#. Next, type the command ``startx`` and press enter.
   Now you are all set up. Have fun! 😀

After finishing your work, you need to log out of the device:

#. In the bottom right corner of the screen, press the **red power button**.
#. A pop-up will open. Press **Logout**.
#. Now you are back in the shell. Just type ``exit``, press enter, and you're done!

.. attention::

   Please don't shut the Pis down!

.. admonition:: Tasks

   #. Log into one of the lab room's Raspberry Pis.
   #. Open a terminal and run the two commands ``hostname`` and ``lscpu``.
   #. Log out from the Pi.

Draco
-----

Requesting Access
^^^^^^^^^^^^^^^^^

As with all other compute clusters, we have to go through a few steps before we can log into the machine.
We'll start the process early so that our accounts are ready once we need them.

.. admonition:: Tasks

   #. Add your URZ username to the list shared in the labs.
   #. Wait for the info that you are good to go, then test your account as outlined in :ref:`ch:remote_ml_first_steps`.

.. _ch:remote_ml_first_steps:

First Steps
^^^^^^^^^^^

Perfect, everything is set up.
On paper we are good to go: let's take the machine for a spin!

.. admonition:: Tasks

   #. Use your URZ username to log into Draco:

      .. code-block:: bash

         ssh -X @login1.draco.uni-jena.de

   #. Allocate a compute node for three hours:

      .. code-block:: bash

         salloc -N 1 -p short -t 03:00:00

      Get the id of your job and the name of your node through the ``squeue`` command.
      Use ``squeue -u `` to only get your user's jobs.

   #. Connect to the compute node in a second terminal.
      For example, if you got node ``node009``, do the following:

      .. code-block:: bash

         ssh node009

   #. Print the basic information on the node stored in ``/proc/cpuinfo`` and ``/proc/meminfo``.
      Also try the ``lscpu`` tool.
   #. Try the module system. ``module help`` should get you started.
   #. Leave the compute node and, if needed, release your job by passing its id to the ``scancel`` command.

Installing PyTorch through Pip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the coming weeks we'll use `PyTorch `_, which is one of the most popular machine learning frameworks.
This section gets us started by installing PyTorch locally through `pip `__ on one of Draco's compute nodes.

.. admonition:: Tasks

   #. Allocate a compute node in the short queue and connect to it.
   #. Load the ``tools/python/3.8`` module:

      .. code-block:: bash

         module load tools/python/3.8

   #. Install the `virtualenv `__ tool, which we'll use to create virtual environments:

      .. code-block:: bash

         pip install virtualenv

   #. Make the locally installed tools available in your session:

      .. code-block:: bash

         # pip places user-installed tools such as virtualenv in ~/.local/bin
         export PATH=${HOME}/.local/bin:${PATH}

   #. Create a new virtual environment with the name ``venv_pytorch``, activate it, and install the CPU build of PyTorch:

      .. code-block:: bash

         virtualenv venv_pytorch
         source venv_pytorch/bin/activate
         pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

   #. Start a Python command prompt, load PyTorch, check that you have PyTorch 2.0 or newer, and exit the prompt:

      .. code-block:: python

         import torch
         print(torch.__version__)
         exit()
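
With Python available on a compute node, you can also summarize the node information from ``/proc/cpuinfo`` and ``/proc/meminfo`` programmatically. This was inspected by hand in :ref:`ch:remote_ml_first_steps`. The script below is a minimal sketch and not part of the course tasks. It assumes a Linux node on which ``/proc/cpuinfo`` lists one ``processor`` entry per logical CPU and reports a ``model name`` field. This is typical on x86 machines, but it may differ on other architectures. It also assumes that ``/proc/meminfo`` reports ``MemTotal`` in kB. The helpers ``parse_cpuinfo`` and ``parse_meminfo`` are hypothetical names introduced only for illustration.

.. code-block:: python

   # Minimal sketch: summarize CPU and memory info of a Linux node from /proc.
   # Assumption: "processor" appears once per logical CPU and "model name"
   # exists in /proc/cpuinfo (typical on x86, may differ on other CPUs).
   from pathlib import Path
   from typing import Dict


   def parse_meminfo(text: str) -> Dict[str, int]:
       """Map each /proc/meminfo key to its integer value (usually in kB)."""
       info = {}
       for line in text.splitlines():
           key, sep, rest = line.partition(":")
           fields = rest.split()
           if sep and fields:
               info[key.strip()] = int(fields[0])
       return info


   def parse_cpuinfo(text: str) -> Dict[str, object]:
       """Count logical CPUs and keep the first reported model name."""
       logical_cpus = 0
       model_name = None
       for line in text.splitlines():
           key, sep, value = line.partition(":")
           if not sep:
               continue  # blank separator line between processors
           key = key.strip()
           if key == "processor":
               logical_cpus += 1
           elif key == "model name" and model_name is None:
               model_name = value.strip()
       return {"logical_cpus": logical_cpus, "model_name": model_name}


   def main() -> None:
       cpuinfo = Path("/proc/cpuinfo")
       meminfo = Path("/proc/meminfo")
       if not (cpuinfo.exists() and meminfo.exists()):
           print("No /proc/cpuinfo or /proc/meminfo found (not a Linux machine?)")
           return
       cpu = parse_cpuinfo(cpuinfo.read_text())
       mem = parse_meminfo(meminfo.read_text())
       print("CPU model:   ", cpu["model_name"])
       print("Logical CPUs:", cpu["logical_cpus"])
       # MemTotal is reported in kB (KiB); dividing by 1024**2 gives GiB.
       print("MemTotal:     {:.1f} GiB".format(mem.get("MemTotal", 0) / 1024**2))


   if __name__ == "__main__":
       main()

To use it, save the script under a name of your choice, for example the hypothetical ``node_info.py``. Then run ``python node_info.py`` on the compute node and compare its output with that of ``lscpu``. The parsing is kept in separate functions that take plain strings, so you can check it on sample text without access to a cluster node.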