Working with the cluster system


The reason you decided to use the cluster system is probably its computing resources. However, the compute nodes are not directly accessible to users.

Please note: If you have a job running, you can log on directly to the nodes taking part in that job.

With around 250 people using the cluster system for their research every year, there has to be an instance that organises and allocates resources among users. This instance is called the batch system. Currently it runs as a software package called PBS/Torque (Portable Batch System). During the course of 2020, the whole cluster system will be migrated to a new scheduling software called SLURM, which is described in the next chapter. This chapter describes the software used up to now (PBS/Torque), and we recommend starting here for general usage.

We currently (as of March 2020) use TORQUE as the resource manager and MAUI as the scheduler, which allocate resources to users upon request.

Most work with the cluster system is done as jobs. Jobs are the framework through which the computing resources of the cluster system are used, and they are started with the qsub command. Generally qsub has the following form.

qsub <options> <name of jobscript>

The manual page for qsub can be accessed like this.

man qsub

You can quit reading the manual page by pressing the ’q’ key.

Login nodes

After logging in to the cluster system you are located on a login node. These machines are not meant to be used for large computations, i.e. simulation runs. In order to keep these nodes accessible their load has to be minimised; therefore processes are killed automatically after 30 minutes of elapsed CPU time. Please use interactive jobs for tasks like pre- or post-processing and even some larger compilations in order to avoid the frustrating experience of your application being shut down without warning.

Interactive jobs

The simplest way of using the cluster system's compute power is to start an interactive job. This can be done by issuing the qsub command with the -I option on any login node.

zzzzsaal@login02:~$ qsub -I
ACHTUNG / WARNING:
'mem' parameter not present; the default value will be used (1800mb)
'walltime' parameter not present; the default value will be used (24:00:00)
'nodes' parameter not present; the default value will be used (nodes=1:ppn=1)
qsub: waiting for job 1001152.batch.css.lan to start
qsub: job 1001152.batch.css.lan ready

zzzzsaal@lena-n080:~$

In this example a user by the name of zzzzsaal issues the qsub command from the node login02. The batch system then warns about missing parameters and starts a job with the ID 1001152.batch.css.lan; using the short JobID 1001152 is more common. Afterwards user zzzzsaal is located on the machine lena-n080, which can be seen from the command prompt, which now shows @lena-n080. This is node number 80 of the Lena cluster. From now on this node's computing power can be utilised.

This simplest form of an interactive job uses default values for all resource specifications. In practice resource specifications should always be adapted to fit one's needs. This is done by supplying the qsub command with options; a list of common options can be found in the section PBS options below. The following example illustrates how user zzzzsaal requests specific resources for an interactive job starting from login node login02: one CPU core on one machine and 2 GB of memory for one hour. Additionally the -X option is used, which switches on X window forwarding, so applications with graphical user interfaces can be used.

zzzzsaal@login02:~$ qsub -I -X -l nodes=1:ppn=1 -l walltime=01:00:00 -l mem=2GB
qsub: waiting for job 1001154.batch.css.lan to start
qsub: job 1001154.batch.css.lan ready

zzzzsaal@lena-n079:~$

After the job with JobID 1001154 has started, the machine named lena-n079 is ready to be used. An extended example of how to utilise interactive jobs is given in section 6.10.

Batch jobs

Interactive jobs should be used to prepare batch jobs: within an interactive job you can enter and test all the commands that will later make up the batch script. Only when everything works should the commands be put into a batch script, line by line. This line-by-line transcript of an interactive session can then be used as the batch script. In case you were given a batch script by other people, take some time to enter all its commands in an interactive job; this way you familiarise yourself with what the individual commands do.

In order to request the same resources within a batch job as in the interactive job from the previous section, the following can be written to a file.

#!/bin/bash -login

# resource specification
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -l mem=2GB

# commands to execute
date

Generally jobscripts can be divided into two parts: resource specifications and commands to be executed. Lines beginning with # are comments, with two exceptions. The first line here specifies the shell which is used to interpret the script, in this case bash. Lines beginning with #PBS are recognised by PBS, the portable batch system, as resource specifications. In this case the resources requested are one CPU core on one machine and 2 GB of memory for an hour. The section with commands to be executed contains only a single command: date, which returns the current date and time.

This file is saved as batch-job-example.sh and can afterwards be submitted to the batch system by issuing the qsub command.

zzzzsaal@login02:~$ qsub batch-job-example.sh
1001187.batch.css.lan

After submitting the jobscript a JobID is returned by the batch system, in this case 1001187. After the job has finished, two files can be found in the directory from which the jobscript was submitted.

zzzzsaal@login02:~$ ls -lh
total 12K
-rw-r--r-- 1 zzzzsaal zzzz 137 19. Apr 12:54 batch-job-example.sh
-rw------- 1 zzzzsaal zzzz   0 19. Apr 12:59 batch-job-example.sh.e1001187
-rw------- 1 zzzzsaal zzzz  30 19. Apr 12:59 batch-job-example.sh.o1001187

The first of these files has the extension .e1001187 and holds all error messages which occurred during job execution; in this case it is empty. The second file has the extension .o1001187 and contains all messages which would otherwise have been displayed on the terminal and were redirected here. This can be verified by displaying the file's contents.

zzzzsaal@login02:~$ cat batch-job-example.sh.o1001187
Tue Apr 19 12:59:18 CEST 2016

The file contains the output of the date command.

Please note: Jobscripts written under Windows need converting, see the next section.

Converting jobscripts written under Windows

Creating a jobscript under Windows and copying it onto the cluster system may produce the following error message when that jobscript is submitted with qsub.

zzzzsaal@login02:~$ qsub WindowsDatei.txt
qsub:  script is written in DOS/Windows text format

Check this file with the file command.

zzzzsaal@login02:~$ file WindowsDatei.txt
WindowsDatei.txt: ASCII text, with CRLF line terminators

Convert the file to Unix format.

zzzzsaal@login02:~$ dos2unix WindowsDatei.txt
dos2unix: converting file WindowsDatei.txt to UNIX format ...

Check the file again with the file command to see if conversion was successful.

zzzzsaal@login02:~$ file WindowsDatei.txt
WindowsDatei.txt: ASCII text

PBS options

Following is a list of selected PBS options, which allow job control. These options are valid for interactive as well as batch jobs.

-N name declares a name for the job
-j oe join standard output and error streams
-l nodes=n:ppn=p request n nodes and p cpu cores per node
-l walltime=time requested wall clock time in format hh:mm:ss
-l mem=value requests RAM according to value, possible suffix of kb, mb or gb
-M email address list of users to whom mail is sent about the job
-m abe send mail on (one or multiple selections): a - job abort, b - job beginning, e - job end
-V all environment variables are exported to the job
-q queue destination queue of the job, see the section Queues & partitions below
-W x=PARTITION:name partition to be used, e.g. to make the job run on a specific CPU architecture, see the section Queues & partitions below
-I job is to be run interactively

More options can be found on the man page, which can be opened with the command man qsub.
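As an illustration, several of these options can be combined in a jobscript header. The following is only a sketch; the job name and mail address are placeholders.

#!/bin/bash -login
# name the job and merge standard output and error into one file
#PBS -N my-analysis
#PBS -j oe
# resource specification
#PBS -l nodes=1:ppn=4
#PBS -l walltime=02:00:00
#PBS -l mem=4gb
# send mail on job abort, beginning and end (placeholder address)
#PBS -M user@example.com
#PBS -m abe

# commands to execute
date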

PBS environment variables

“When a batch job is started, a number of variables are introduced into the job’s environment that can be used by the batch script in making decisions, creating output files, and so forth. These variables are listed in the following table”1) :

PBS_O_WORKDIR Job’s submission directory
PBS_NODEFILE File containing a line-delimited list of the nodes allocated to the job
PBS_QUEUE Job queue
PBS_JOBNAME User specified jobname
PBS_JOBID Unique JobID
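As a sketch of typical usage (the output filename is just an example), a jobscript could use these variables as follows:

#!/bin/bash -login
#PBS -l nodes=2:ppn=4
#PBS -l walltime=00:10:00
#PBS -l mem=2GB

# change into the directory the job was submitted from
cd $PBS_O_WORKDIR

# record which job this is and which nodes were allocated to it
echo "Job $PBS_JOBID ($PBS_JOBNAME) running in queue $PBS_QUEUE"
cat $PBS_NODEFILE > allocated-nodes.txt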

PBS commands

qsub script Submit PBS job
showq Show status of PBS jobs
qdel jobid Delete Job jobid
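A typical sequence of these commands, using the JobID from the batch job example above, could look like this:

qsub batch-job-example.sh     # returns e.g. 1001187.batch.css.lan
showq                         # check whether the job is queued or running
qdel 1001187                  # remove the job if it is no longer needed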

All of the above commands have detailed manual pages, which can be viewed with the following command:

man <command>

In order to exit the manual page, press q.

pbsnodes

On the login nodes the pbsnodes command can be used to obtain information about resources. For example, the amount of RAM of one of the “Terabyte machines” in the helena queue can be queried.

pbsnodes smp-n031

At first the output may seem a little confusing. Among other things, it shows the following parameter.

physmem=1058644176kb

This output can be converted to gb: 1024 kb equal 1 mb, 1024 mb equal 1 gb, and so on. This way you know that the maximum amount of RAM you can request on one machine in the helena queue is 1009 gb.
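The conversion can be done directly in the shell, for example:

zzzzsaal@login02:~$ echo $((1058644176 / 1024 / 1024))
1009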

PBS errors

If you get a seemingly strange error code, first try to subtract 128 from it: error 137 becomes error 9, which corresponds to SIGKILL. This in turn may be the result of PBS or Maui intentionally killing your job, e.g. because it tried to use more memory than you requested or because it exceeded its walltime.
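For example, for exit status 137:

echo $((137 - 128))    # prints 9
kill -l 9              # prints KILL, i.e. the job received SIGKILL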

You can find out more about Unix signals and their meanings here: https://en.wikipedia.org/wiki/Signal_(IPC)#POSIX_signals

A good page about PBS errors is here: https://www.nas.nasa.gov/hecc/support/kb/PBS-exit-codes_185.html

Queues & partitions

Currently the cluster system consists of the compute resources listed in the table Computing Hardware.

Please note: The length of support contracts for individual clusters varies. Should you need an identical hardware platform over the next years, please choose Lena.

There are multiple queues available:

all This is the default queue and does not have to be requested explicitly. PBS will route a job to matching nodes.
helena A queue for jobs with large RAM requirements up to 1 Terabyte.
gpu Use this queue in order to utilise GPU resources.
test A queue for testing. There is one node with 12 processors and 48 GB of RAM available. Maximum walltime is 6 hours.
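A specific queue is requested with the -q option described above. For example, a short interactive job in the test queue could be started as follows; the resource values are only illustrative and stay within the limits of that queue.

qsub -I -q test -l nodes=1:ppn=4 -l walltime=00:30:00 -l mem=4gb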

In addition to the queues, several partitions are defined in the cluster, which allow e.g. finer control of requests within the queue “all”. A partition usually consists of nodes with identical properties. By selecting a specific partition you can e.g. choose which computer architecture a job should run on; see the table Computing Hardware.
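For example, to make a job run on the Lena nodes, the -W option from the list above could be used as sketched below. The partition name lena is only an assumption here; check the table Computing Hardware for the actual partition names.

qsub -I -l nodes=1:ppn=1 -l walltime=01:00:00 -l mem=2GB -W x=PARTITION:lena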

GPU

Under PBS/Maui, only one machine equipped with GPUs is available. In order to reach it, you need to use the queue “gpu”. This node can be used by only one person (and therefore only one job) at a time to avoid interference, which can lead to significant waiting times. New machines with GPU resources are therefore integrated into the cluster under control of the SLURM scheduler, which avoids some of these restrictions; see the SLURM chapter on how to use those nodes. SLURM will become the standard scheduling system for the cluster in the future.

Forschungscluster-Housing

Forschungscluster-Housing (FCH) machines are divided into one partition per institute. The partition name mostly corresponds to the institute's abbreviation.

Maximum resource requests

Some maximum values exist which cannot be exceeded: the maximum walltime per job, the maximum number of simultaneously running jobs, and the maximum number of CPUs are all limited. All these limits apply per user name; an example request that respects them is sketched after the following list.

  • Walltime Maximum walltime is 200 hours per job
  • Jobs The maximum number of running jobs per user is 64
  • CPUs The overall maximum number of CPUs (ppn) all running jobs can use is 768 per user
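As an illustration, the following resource specification stays within all three limits (the node count and ppn value are example numbers only, not a recommendation):

#PBS -l nodes=8:ppn=16         # 128 CPU cores in total, below the 768-CPU limit
#PBS -l walltime=200:00:00     # the maximum allowed walltime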

Exercise: interactive job

# start an interactive job, what happens?
qsub -I

# exit this interactive job
exit

# specify all resource parameters, so no defaults get used
qsub -I -X -l nodes=1:ppn=1 -l walltime=01:00:00 -l mem=2GB

# load module for octave
module load octave/3.8.1

# start octave
octave
# inside octave the following commands create a plot
octave:1> x = 0:10;
octave:2> y = x.^2;
octave:3> h = figure(1);
octave:4> plot(x,y);
octave:5> print('-f1','-dpng','myplot')
octave:6> exit

# display newly created image
display myplot.png

Interactive jobs are useful for debugging - always try your commands interactively first.

Exercise: batch job

Create a file named MyBatchPlot.m

MyBatchPlot.m
x = 0:10;
y = x.^2;
h = figure(1);
plot(x,y);
print('-f1','-dpng','MyBatchPlot');

Create a file named MyFirstBatchJob.sh

MyFirstBatchJob.sh
#!/bin/bash -login
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -l mem=2GB
 
# load octave module
module load octave/3.8.1
 
# start octave
octave MyBatchPlot.m

Submit the job script

qsub MyFirstBatchJob.sh

Check files MyFirstBatchJob.sh.o* and MyFirstBatchJob.sh.e*
