The reason you decided to use the cluster system is probably its computing resources. However, compute nodes are not directly accessible to users.
Please note: If you have a job running, you can log on directly to the nodes taking part in that job.
With around 250 people using the cluster system for their research every year, there has to be an instance organizing and allocating resources among users. This instance is called the batch system. Currently, this task is handled by a software package called PBS/Torque (portable batch system). During the course of the year 2020, the whole cluster system will be migrated to a new scheduling software called SLURM. That software is described in the next chapter. This (current) chapter describes the software used up to now (PBS/Torque), and we recommend starting here for generic usage.
Most work with the cluster system is done as jobs. Jobs are the framework through which the computing resources of the cluster system can be used, and they are started by the qsub command. Generally, qsub has the following form:

qsub <options> <name of jobscript>
The manual page for qsub can be accessed like this:

man qsub

You can quit reading the manual page by pressing the 'q' key.
After logging in to the cluster system you are located on a login node. These machines are not meant to be used for large computations, i.e. simulation runs. In order to keep these nodes accessible, their load has to be minimised; therefore processes are killed automatically after 30 minutes of elapsed cpu-time. Please use interactive jobs for tasks like pre- or post-processing and even some larger compilations, in order to avoid the frustrating experience of a sudden shutdown of your application.
The simplest way of using the cluster system's compute power is by starting an interactive job. This can be done by issuing the qsub command with the -I option on any login node.
zzzzsaal@login02:~$ qsub -I
ACHTUNG / WARNING:
'mem' parameter not present; the default value will be used (1800mb)
'walltime' parameter not present; the default value will be used (24:00:00)
'nodes' parameter not present; the default value will be used (nodes=1:ppn=1)
qsub: waiting for job 1001152.batch.css.lan to start
qsub: job 1001152.batch.css.lan ready
zzzzsaal@lena-n080:~$
In this example a user named zzzzsaal issues the qsub command from the node login02. The batch system warns about missing parameters and starts a job with the ID 1001152.batch.css.lan; using the short JobID 1001152 is more common. Afterwards user zzzzsaal is located on the machine lena-n080, which can be seen from the command prompt, which now shows @lena-n080. This is node number 80 of the Lena cluster. From now on this node's computing power can be utilised.
This simplest form of an interactive job uses default values for all resource specifications. In practice, resource specifications should always be adapted to fit one's needs. This can be done by supplying the qsub command with options; a listing of possible options is given below. The following example illustrates how user zzzzsaal requests specific resources for an interactive job, starting from the login node login02. The user requests one cpu-core on one machine and 2 GB of memory for one hour. Additionally, the -X option is used, which switches on X window forwarding so that applications with graphical user interfaces can be used.
zzzzsaal@login02:~$ qsub -I -X -l nodes=1:ppn=1 -l walltime=01:00:00 -l mem=2GB
qsub: waiting for job 1001154.batch.css.lan to start
qsub: job 1001154.batch.css.lan ready
zzzzsaal@lena-n079:~$
After the job with JobID 1001154 has started, the machine named lena-n079 is ready to be used. An extended example of how to utilise interactive jobs is given in section 6.10.
Interactive jobs should be used in preparation for batch jobs. Within an interactive job you can enter all the commands that will later make up a batch script, thereby testing their functionality. Only if everything works should the commands be put into a batch script, line by line; this line-by-line transcript of an interactive session can then be used as the batch script. In case you were given a batch script by other people, take some time to enter all of its commands in an interactive job. This way you familiarise yourself with what the individual commands do.
In order to request the same resources as in the interactive job above within a batch job, the following can be written to a file.
#!/bin/bash -login
# resource specification
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -l mem=2GB
# commands to execute
date
Generally, jobscripts can be divided into two parts: resource specification and commands to be executed. Lines beginning with # are comments, with two exceptions. The first line specifies the shell which is used to interpret the script, in this case bash. Furthermore, lines beginning with #PBS are recognised by PBS (portable batch system) as resource specifications; in this case the resources requested are one cpu-core on one machine and 2 GB of memory for one hour. The section with commands to execute contains only a single command: the date command, which returns the current date and time.
This file is saved as batch-job-example.sh and can afterwards be submitted to the batch system by issuing the qsub command:
zzzzsaal@login02:~$ qsub batch-job-example.sh
1001187.batch.css.lan
After submitting the jobscript a JobID is returned by the batch system, in this case 1001187. After the job has finished two files can be found in the directory which the jobscript was submitted from.
zzzzsaal@login02:~$ ls -lh
total 12K
-rw-r--r-- 1 zzzzsaal zzzz 137 19. Apr 12:54 batch-job-example.sh
-rw------- 1 zzzzsaal zzzz   0 19. Apr 12:59 batch-job-example.sh.e1001187
-rw------- 1 zzzzsaal zzzz  30 19. Apr 12:59 batch-job-example.sh.o1001187
The first file has the extension .e1001187 and holds all error messages which occurred during job execution; in this case it is empty. The second file has the extension .o1001187 and contains all messages which would otherwise have been displayed on the terminal and were redirected here. This can be verified by displaying the file's contents.
zzzzsaal@login02:~$ cat batch-job-example.sh.o1001187
Tue Apr 19 12:59:18 CEST 2016
The file contains the output of the date command.
Please note: Jobscripts written under Windows need converting, see below.
Creating a jobscript under Windows and copying it onto the cluster system may produce the following error message when submitting that jobscript with qsub:
zzzzsaal@login02:~$ qsub WindowsDatei.txt
qsub: script is written in DOS/Windows text format
Check this file with the file command:
zzzzsaal@login02:~$ file WindowsDatei.txt
WindowsDatei.txt: ASCII text, with CRLF line terminators
Convert the file to Unix format.
zzzzsaal@login02:~$ dos2unix WindowsDatei.txt
dos2unix: converting file WindowsDatei.txt to UNIX format ...
Check the file again with the file command to see if the conversion was successful.
zzzzsaal@login02:~$ file WindowsDatei.txt
WindowsDatei.txt: ASCII text
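Should dos2unix happen to be unavailable, the same conversion can be done with standard Unix tools. The following sketch strips the carriage returns with tr; the file name WindowsDatei.txt is taken from the example above, and the sample content is made up for illustration:

```shell
# create a sample file with DOS/Windows (CRLF) line endings
printf 'line one\r\nline two\r\n' > WindowsDatei.txt
# strip the carriage returns to obtain Unix (LF) line endings
tr -d '\r' < WindowsDatei.txt > WindowsDatei.tmp && mv WindowsDatei.tmp WindowsDatei.txt
# the file now contains plain LF line endings and can be submitted with qsub
```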
Following is a list of selected PBS options, which allow job control. These options are valid for interactive as well as batch jobs.
|-N <jobname>|declares a name for the job|
|-j oe|join standard output and error streams|
|-l nodes=n:ppn=p|request n nodes and p cpu cores per node|
|-l walltime=hh:mm:ss|requested wall clock time in format hh:mm:ss|
|-l mem=<value>|requests RAM according to value, possible suffix of kb, mb or gb|
|-M <mail addresses>|list of users to whom mail is sent about the job|
|-m [a][b][e]|send mail on (one or multiple selections): a - job abort, b - job beginning, e - job end|
|-V|all environment variables are exported to the job|
|-q <queue>|destination queue of the job, see below|
|-l partition=<name>|partition to be used, e.g. to make the job run on a specific cpu architecture. See below|
|-I|job is to be run interactively|
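As a sketch, several of these options combined in a jobscript header might look as follows; the job name, mail address and resource values are made-up placeholders to be adapted to your own needs:

```shell
#!/bin/bash -login
# made-up example values; adapt name, mail address and resources as needed
#PBS -N my-example-job
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l walltime=02:00:00
#PBS -l mem=4gb
#PBS -m abe
#PBS -M user@example.com
# commands to execute
date
```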
More options can be found on the man page, which can be opened with the command man qsub.
“When a batch job is started, a number of variables are introduced into the job’s environment that can be used by the batch script in making decisions, creating output files, and so forth. These variables are listed in the following table”1) :
|PBS_O_WORKDIR|Job's submission directory|
|PBS_NODEFILE|File containing a line-delimited list of nodes allocated to the job|
|PBS_JOBNAME|User-specified jobname|
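A sketch of how these variables might be used inside a jobscript; the fallback values after :- are not part of a real PBS job and are only there so the snippet also runs outside the batch system:

```shell
#!/bin/bash -login
# change to the directory the job was submitted from
# (falls back to the current directory outside of PBS)
cd "${PBS_O_WORKDIR:-$PWD}"
# count the cpu cores allocated to this job by counting
# the lines of the node file (empty fallback outside of PBS)
NODEFILE="${PBS_NODEFILE:-/dev/null}"
NCORES=$(wc -l < "$NODEFILE")
echo "job ${PBS_JOBNAME:-interactive} runs with ${NCORES} core(s)"
```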
|qsub|Submit a PBS job|
|qstat|Show status of PBS jobs|
|qdel <jobid>|Delete the job with the given jobid|
All of the above commands have detailed manual pages, which can be viewed with the following command:

man <command name>

In order to exit the manual page, press the 'q' key.
On the login nodes, the pbsnodes command can be used to obtain information about resources. For example, the amount of RAM of one of the “Terabyte machines” in the helena queue can be queried.
At first the output may seem a little confusing. Among others, it shows a parameter stating the total memory of the node in kb.
This output can be converted into gb: 1024 kb equal 1 mb, 1024 mb equal 1 gb, and so on. This way you know that the maximum amount of RAM you can request on one machine in the queue helena is 1009 gb.
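The conversion can be sketched with shell arithmetic; the kb value below is a made-up example chosen so that it corresponds to 1009 gb:

```shell
# made-up example value in kb, as a node status report might show it
totmem_kb=1058013184
# integer division converts kb -> mb -> gb
totmem_gb=$(( totmem_kb / 1024 / 1024 ))
echo "${totmem_gb} gb"   # -> 1009 gb
```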
If you get a seemingly strange error code, first try to subtract 128 from it: error 137 thus becomes error 9, which corresponds to SIGKILL. This in turn may be the result of PBS or Maui intentionally killing your job, e.g. because it tried to use more memory than you requested, or because it overstepped its wall time.
You can find out more about Unix signals and their meanings here: https://en.wikipedia.org/wiki/Signal_(IPC)#POSIX_signals
A good page about PBS errors is here: https://www.nas.nasa.gov/hecc/support/kb/PBS-exit-codes_185.html
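The arithmetic above can be checked directly in a bash shell; kill -l translates a signal number into its name:

```shell
# a job that was killed may report exit code 137
exit_code=137
# subtract 128 to obtain the signal number
signal=$(( exit_code - 128 ))
echo "$signal"        # -> 9
kill -l "$signal"     # prints the signal's name (KILL, i.e. SIGKILL)
```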
Currently the cluster system consists of the following compute resources; see table Computing Hardware.
Please note: The length of support contracts for individual clusters varies. Should you need an identical hardware platform over the next years, please choose Lena.
There are multiple queues available:
|all|This is the default queue and does not have to be requested explicitly. PBS will route a job to matching nodes.|
|helena|A queue for jobs with large RAM requirements up to 1 Terabyte.|
|gpu|Use this queue in order to utilise GPU resources.|
|test|A queue for testing. There is one node with 12 processors and 48 GB of RAM available. Maximum walltime is 6 hours.|
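A queue is selected with the -q option, either on the qsub command line or inside the jobscript. As a sketch, a jobscript header requesting the helena queue might look as follows; the memory value is a made-up placeholder:

```shell
#!/bin/bash -login
# request the helena queue for a job with large RAM requirements
#PBS -q helena
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -l mem=500gb
# commands to execute
date
```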
In addition to the queues, several partitions are defined in the cluster by which e.g. a finer control of requests within the queue “all” is possible. A partition usually consists of nodes having identical properties. By selecting a specific partition, you can e.g. choose which computer architecture a job should run on – see table Computing Hardware.
Under PBS/Maui, only one machine equipped with GPUs is available. In order to reach it, you need to use the queue “gpu”. This node can be used by only one person (and therefore only one job) at a time to avoid interference, which can lead to significant waiting times. New machines with GPU resources are therefore integrated into the cluster under control of the SLURM scheduler, which avoids some of these restrictions; see the corresponding section on how to use those nodes. SLURM will become the standard scheduling system for the cluster in the future.
(FCH) machines are divided into one partition per institute. The partition name mostly corresponds to the institute's abbreviation.
Some maximum values exist which cannot be exceeded: the maximum walltime per job is limited, as is the maximum number of simultaneously running jobs. Furthermore, the number of cpus is limited. All these limits apply per user name.
# start an interactive job, what happens?
qsub -I
# exit this interactive job
exit
# specify all resource parameters, so no defaults get used
qsub -I -X -l nodes=1:ppn=1 -l walltime=01:00:00 -l mem=2GB
# load module for octave
module load octave/3.8.1
# start octave
octave
# inside octave the following commands create a plot
octave:1> x = 0:10;
octave:2> y = x.^2;
octave:3> h = figure(1);
octave:4> plot(x,y);
octave:5> print('-f1','-dpng','myplot')
octave:6> exit
# display newly created image
display myplot.png
Interactive jobs are useful for debugging; always test your commands interactively first.
Create a file named MyBatchPlot.m
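A possible content for MyBatchPlot.m, based on the plotting commands from the interactive octave session above (written here via a shell here-document; you can equally well use a text editor):

```shell
# write the plotting commands from the interactive session into MyBatchPlot.m
cat > MyBatchPlot.m <<'EOF'
x = 0:10;
y = x.^2;
h = figure(1);
plot(x,y);
print('-f1','-dpng','myplot')
EOF
```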
Create a file named MyFirstBatchJob.sh
#!/bin/bash -login
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -l mem=2GB
# load octave module
module load octave/3.8.1
# start octave
octave MyBatchPlot.m
Submit the job script with qsub MyFirstBatchJob.sh
Check files MyFirstBatchJob.sh.o* and MyFirstBatchJob.sh.e*