Using the queue system on skd-cyclone
General information
Purpose
The machine skd-cyclone.klientdrift.uib.no is used as a general-purpose Linux computer with several connected storage systems. Researchers, PostDocs, PhD students and Master students use the machine for calculations. On several occasions, this has led to shortages in calculation resources, concerning both memory and CPU. The machine has started to "swap" memory to disk, slowing down all calculations on the computer to an extent where it could not be used any more.
The queuing system on skd-cyclone is in place to schedule jobs according to the available resources on the machine. This leads to a more balanced load on the machine and avoids swapping situations.
Advantages
The main advantage of the queue system is that it makes calculations on skd-cyclone more stable and reliable, and thus overall faster.
Further advantages are:
- Interactive running of programs is still possible
- Only lightweight wrapper scripts are required to run jobs on the queue
- Email on fail/completion of job
- Queues with different priority
- The queue at GFI introduces MSc students early to the requirements of an HPC environment
Disadvantages
- No use of screen or similar commands to interactively connect to running jobs
- Waiting time when the queue is full
- Need to write a "wrapper script" (see below)
Rules and procedures
- The queue system is used on an opt-in basis. This means there is no obligation to use the queue system; small jobs, for example, can run directly on the machine. Users who repeatedly run large jobs, however, such as NWP models or diagnostic code, will be asked to submit their jobs to the queue system.
Short tutorial
1. create a basic wrapper script for your job
#!/bin/bash
# SLURM wrapper script
# start this job with sbatch NAME_OF_THIS_FILE directly on the machine
#SBATCH --job-name=my_slurm_job
#SBATCH --workdir=/scratch/$USER
#SBATCH --partition=default
#SBATCH --output=results.%j.txt
#SBATCH --error=errors.%j.out
#SBATCH --mail-type=END
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=100

<apply any required path settings>
<change to your working directory>
<command to start your job>

# done
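The three placeholder lines at the end stand for your own environment settings and the actual program call. A hypothetical example of how they might be filled in (directory and program names are made up for illustration) is:

export PATH=$HOME/bin:$PATH
cd /scratch/$USER/my_project
./my_model namelist.input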
2. submit the script
sbatch ./my_queue_job.sh
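sbatch replies with a line such as "Submitted batch job 34", where 34 is the ID assigned to the job. While the job is waiting or running, its status can typically be checked with the standard SLURM tool squeue, for example:

squeue -u $USER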
3. receive an email when the job is finished
From: slurm@klientdrift.uib.no
To: <username>@uib.no
Subject: SLURM Job_id=34 Name=WaterSip3.0 Ended, Run time 00:01:00
Converting a script to a queue script
- add the #SBATCH elements described in the next section to the header of your script (a minimal sketch is shown below)
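As a minimal sketch (script contents, program name and paths are hypothetical), an existing script such as

#!/bin/bash
cd /scratch/$USER/my_project
./my_model namelist.input

becomes a queue script by adding the #SBATCH elements directly after the shebang line:

#!/bin/bash
#SBATCH --job-name=my_model
#SBATCH --workdir=/scratch/$USER
#SBATCH --partition=default
#SBATCH --output=results.%j.txt
#SBATCH --error=errors.%j.out
#SBATCH --mail-type=END
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=100
cd /scratch/$USER/my_project
./my_model namelist.input

The body of the script is unchanged; only the header is added.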
Description of each element
The SLURM queue is controlled by a set of parameters at the beginning of a shell script:
#SBATCH --<parameter>=<argument>
These parameters control, for example, the output directory, resource allocation and email notifications.
The most common script parameters are briefly explained here:
- workdir: the working directory of the job; output files given with relative paths will be created in this directory.
Example: --workdir=/scratch/$USER
- partition: which queue (partition) the job should be processed on.
Example: --partition=default
- job-name: a short, descriptive name for the job, shown in the queue listing and in notification emails.
Example: --job-name=my_slurm_job
- output: file that the standard output of the job is written to; %j is replaced by the job ID.
Example: --output=results.%j.txt
- error: file that the standard error of the job is written to; %j is replaced by the job ID.
Example: --error=errors.%j.out
- mail-type: when to send a notification email, for example END (job finished) or FAIL (job failed).
Example: --mail-type=END
- time: maximum run time of the job (here 12 hours); the job is stopped when this limit is reached.
Example: --time=12:00:00
- mem-per-cpu: memory requested per CPU, in megabytes.
Example: --mem-per-cpu=100
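The same parameters can also be passed to sbatch on the command line when submitting, which typically overrides the corresponding values in the script header. For example, to request a longer run time and more memory for a single submission without editing the script:

sbatch --time=24:00:00 --mem-per-cpu=500 ./my_queue_job.sh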
Best practices, Q&A
Further information
- Contacts
- Links