.. _ch:slurm:

Slurm
=====

Slurm, formerly known as the "Simple Linux Utility for Resource Management," is an open-source cluster management and job scheduling system. Its primary purpose is to allocate and schedule computing resources efficiently in high-performance computing (HPC) clusters, where it is widely used to optimize resource utilization and handle the complex scheduling requirements of large-scale systems. It provides a flexible and extensible framework for cluster management and job scheduling, which makes it an essential tool for researchers, engineers, and system administrators in such environments.

Slurm Commands
--------------

**Check Slurm Version**

To verify that the Slurm command-line tools are installed, check the version with:

.. code-block:: bash

    sacct --version

**Viewing Cluster Information**

The ``sinfo`` command provides an overview of the cluster's current state, including partitions, node availability, and node status.

.. code-block:: bash

    sinfo

**Node States**

The ``--states=`` option of the ``sinfo`` command filters nodes by their state. Several states can be given as a comma-separated list.

- ``all``: Displays nodes in all states (default if ``--states`` is not specified).
- ``idle``: Shows nodes that are currently available for running jobs.
- ``alloc``: Displays nodes that are currently allocated to jobs.
- ``drain``: Lists nodes that are draining or drained, meaning they accept no new jobs, typically for maintenance or because of a problem.
- ``fail``: Shows nodes in a failed state.
- ``completing``: Lists nodes that are finishing the execution of a job.
- ``mix``: Displays partially allocated nodes, where some CPUs are assigned to jobs and others are idle.
- ``down``: Lists nodes that are marked as down, which may be due to hardware or network issues.
- ``unknown``: Shows nodes in an unknown state, typically because Slurm cannot determine their status.

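
The state filter can be wrapped in a small shell helper, shown here as a minimal sketch. The function name ``list_nodes_by_state`` is hypothetical and not part of Slurm. The sketch assumes that ``sinfo`` is on the ``PATH`` of the login node and uses its ``--Node`` and ``--long`` options to print one detailed line per node.

.. code-block:: bash

    # Hypothetical helper (not part of Slurm): list nodes in the given states.
    # Argument: a comma-separated list of states; defaults to "idle".
    list_nodes_by_state() {
        local states="${1:-idle}"

        # Stop early with a clear message if the Slurm tools are missing.
        if ! command -v sinfo >/dev/null 2>&1; then
            echo "sinfo not found; is Slurm installed on this host?" >&2
            return 127
        fi

        # --Node prints one line per node, --long adds extra detail.
        sinfo --states="$states" --Node --long
    }

For instance, ``list_nodes_by_state idle,mix`` would list nodes that are fully or partially free, while calling it without an argument lists idle nodes only.
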

**Allocating Resources**

The ``salloc`` command requests a resource allocation and is typically used to start an interactive job session on a compute node.

.. code-block:: bash

    salloc [OPTIONS]

**Options**

- ``-n, --ntasks=``: Specifies the number of tasks (processes) you want to run. This option is particularly useful for parallel applications, such as MPI programs.
- ``-c, --cpus-per-task=``: Defines the number of CPU cores (or hardware threads, depending on the cluster configuration) allocated to each task. This option is useful for multi-threaded applications.
- ``-p, --partition=``: Specifies the cluster partition (queue) in which to allocate resources. Different partitions may have varying resource configurations.
- ``--time=