The number of software packages that are installed together with the operating system on the cluster nodes is deliberately kept small. Additional packages and applications are provided by a module system, which enables you to easily customise your working environment on the cluster. This module system is called Lmod. Furthermore, we can provide different versions of the same software, which you can use on demand. When a module is loaded, software-specific settings are applied, e.g. environment variables such as PATH, LD_LIBRARY_PATH and MANPATH are adjusted.
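As a quick, hedged illustration of this behaviour (the module name GCC is taken from the toolchain list below; the exact paths differ per system):
echo $PATH            # environment before loading
module load GCC       # load a toolchain module
echo $PATH            # now additionally contains the compiler's bin directory
module show GCC       # lists the environment changes this module applies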
Alternatively, you can manage software packages on the cluster yourself: by building software from source, by means of EasyBuild, or by using Singularity containers. Python packages can also be installed using the Conda manager. The first three possibilities, in addition to the module system, are described in the current section, whereas Conda usage on the cluster is explained in a separate section.
We have adopted a systematic software naming and versioning convention in conjunction with the software installation system EasyBuild.
Software installation on the cluster uses a hierarchical software module naming scheme. This means that the command module avail does not display all installed software modules right away; only the modules that are immediately available for loading are shown. More modules become available after their prerequisite modules are loaded. Specifically, loading a compiler module or an MPI implementation module makes available all the software built with those applications. This way, we hope, the prerequisites of a given software package become apparent.
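The following sketch illustrates the effect of the hierarchy; the module versions are taken from the gnuplot example later in this section and may differ on the current system:
module avail                               # compilers, toolchains and binary-only applications
module load GCC/4.9.3-2.25 OpenMPI/1.10.2  # load a compiler and an MPI implementation
module avail                               # now also lists software built with this compiler/MPI stack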
At the top level of the module hierarchy, there are modules for compilers, toolchains and software applications that come as a binary and thus do not depend on compilers. Toolchain modules organize compilers, MPI implementations and numerical libraries. Currently the following toolchain modules are available:
GCC: GCC compiler with updated binutils
iccifort: Intel compilers, GCC
gompi: GCC, OpenMPI
iimpi: iccifort, Intel MPI
iompi: iccifort, OpenMPI
foss: gompi, OpenBLAS, FFTW, ScaLAPACK
intel: iimpi, Intel MKL
iomkl: iompi, Intel MKL
Note that Intel compilers newer than 2020.x are provided by the toolchain module intel-compilers. It is strongly recommended to use this module, as the iccifort compiler modules will be removed after 2023.
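As a hedged example, loading a single toolchain module brings its whole compiler/MPI/library stack into your environment; the version string below is only an assumption, so check module avail foss for the versions actually installed:
module avail foss       # list installed versions of the foss toolchain
module load foss/2021a  # example version; loads GCC, OpenMPI, OpenBLAS, FFTW and ScaLAPACK
module list             # show everything that was pulled in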
This section explains how to use software modules.
Show the entire list of available modules
module spider
The same in a more compact list
module -t spider
Search for specific modules that have “string” in their name
module spider string
Detailed information about a particular version of a module (including instructions on how to load the module)
module spider name/version
List modules immediately available to load
module avail
Some software modules are hidden from the avail and spider commands. These are mostly modules for system library packages that other user applications depend on. To list hidden modules, provide the --show-hidden option to the avail and spider commands:
module --show-hidden avail
module --show-hidden spider
A hidden module has a dot (.) in front of its version number (e.g. zlib/.1.2.8).
List currently loaded modules
module list
Load a specific version of a module
module load name/version
If only a name is given, the command loads the default version, which is marked with a (D) in the module avail listing (usually the latest version). Loading a module may automatically load other modules it depends on.
It is not possible to load two versions of the same module at the same time.
To switch between different modules
module swap old new
To unload the specified module from the current environment
module unload name
To clean your environment of all loaded modules
module purge
Show what environment variables the module will set
module show name/version
Save the current list of modules to “name” collection for later use
module save name
Restore modules from collection “name”
module restore name
List of saved collections
module savelist
To get the complete list of options provided by Lmod through the module command, type the following
module help
As an example, we show how to load the gnuplot module.
List loaded modules
module list
No modules loaded
Find available gnuplot versions
module -t spider gnuplot
gnuplot/4.6.0
gnuplot/5.0.3
Determine how to load the selected gnuplot/5.0.3 module
module spider gnuplot/5.0.3

--------------------------------------------------------------------------------
  gnuplot: gnuplot/5.0.3
--------------------------------------------------------------------------------
    Description:
      Portable interactive, function plotting utility - Homepage: http://gnuplot.sourceforge.net/

    This module can only be loaded through the following modules:

      GCC/4.9.3-2.25  OpenMPI/1.10.2

    Help:
      Portable interactive, function plotting utility - Homepage: http://gnuplot.sourceforge.net/
Load required modules
module load GCC/4.9.3-2.25 OpenMPI/1.10.2
Module for GCCcore, version .4.9.3 loaded
Module for binutils, version .2.25 loaded
Module for GCC, version 4.9.3-2.25 loaded
Module for numactl, version .2.0.11 loaded
Module for hwloc, version .1.11.2 loaded
Module for OpenMPI, version 1.10.2 loaded
And finally, load the selected gnuplot module
module load gnuplot/5.0.3
Module for OpenBLAS, version 0.2.15-LAPACK-3.6.0 loaded
Module for FFTW, version 3.3.4 loaded
Module for ScaLAPACK, version 2.0.2-OpenBLAS-0.2.15-LAPACK-3.6.0 loaded
Module for bzip2, version .1.0.6 loaded
Module for zlib, version .1.2.8 loaded
.............
.............
In order to simplify the procedure of loading the gnuplot module, the current list of loaded modules can be saved in a “mygnuplot” collection (the name string “mygnuplot” is, of course, arbitrary) and then loaded again when needed as follows:
Save loaded modules to “mygnuplot”
module save mygnuplot
Saved current collection of modules to: mygnuplot
If “mygnuplot” is not specified, the name “default” will be used.
Remove all loaded modules (or open a new shell)
module purge
Module for gnuplot, version 5.0.3 unloaded
Module for Qt, version 4.8.7 unloaded
Module for libXt, version .1.1.5 unloaded
............
............
List currently loaded modules. This list is now empty.
module list
No modules loaded
List saved collections
module savelist
Named collection list:
  1) mygnuplot
Load the gnuplot module again
module restore mygnuplot
Restoring modules to user's mygnuplot
Module for GCCcore, version .4.9.3 loaded
Module for binutils, version .2.25 loaded
Module for GCC, version 4.9.3-2.25 loaded
Module for numactl, version .2.0.11 loaded
.............
.............
In this section, you will find user guides for some of the software packages installed on the cluster. The guides provided cannot, of course, replace the documentation that comes with the application; please read that as well.
A wide variety of application software is available on the cluster system. These applications are located on a central storage system that is made accessible to the module system Lmod via an NFS export. Issue the command module spider on the cluster system, or visit the corresponding page, for a comprehensive list of available software. If you need a different version of an already installed application, or one that is currently not installed, please get in touch. The main prerequisite for using software on the cluster system is its availability for Linux. Furthermore, if the application requires a license, additional questions need to be clarified.
Some select Windows applications can also be executed on the cluster system with the help of Wine or Singularity containers. For information on Singularity, see the corresponding section below.
Note: We recommend using EasyBuild (see next section) if you want to make your software's build process reproducible and accessible through a module environment that EasyBuild automatically creates. EasyBuild comes with pre-configured recipes for installing thousands of scientific applications.
Sub-clusters of the cluster system have different CPU architectures. The command lcpuarchs (available on the login nodes) lists all available CPU types.
login03:~$ lcpuarchs -v
CPU arch names   Cluster partitions
--------------   ------------------
haswell          LUIS[haku,lena,smp] FCH[ai,gih,isd,iqo,iwes,pci,fi,imuk]
nehalem          LUIS[smp,helena] FCH[]
sandybridge      LUIS[dumbo] FCH[iazd,isu,itp]
skylake          LUIS[gpu,gpu.test,amo,taurus,vis] FCH[tnt,isd,stahl,enos,pcikoe,pcikoe,isu,phd,phdgpu,muk,fi,itp]

CPU of this machine: haswell

For more verbose output type: lcpuarchs -vv
If a software executable built with architecture-specific compiler options runs on a machine with an unsuitable CPU, an “Illegal instruction” error may be triggered. For example, if you compile your program on a skylake node (e.g. the amo sub-cluster) using the gcc compiler option -march=native, the program may not run on sandybridge nodes (e.g. the dumbo sub-cluster).
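A minimal sketch of how this situation arises; the program name is hypothetical and the exact failure depends on which instructions the compiler emits:
# on a skylake node (e.g. the amo sub-cluster)
gcc -O2 -march=native -o my_prog my_prog.c   # may emit instructions unknown to older CPUs
# later, on a sandybridge node (e.g. the dumbo sub-cluster)
./my_prog                                    # can abort with "Illegal instruction"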
This section explains how to build software on the cluster system so that you avoid the aforementioned issue and can submit jobs to all compute nodes without specifying the CPU type.
In the example below we want to compile the sample software my-soft in version 3.1.
In your HOME directory (or in $SOFTWARE, if all project members should have access to the software), create build/install directories for each available CPU architecture listed by the command lcpuarchs -s, as well as a directory source for storing the software installation sources:
login03:~$ mkdir -p $HOME/sw/source
login03:~$ eval "mkdir -p $HOME/sw/{$(lcpuarchs -ss)}/my-soft/3.1/{build,install}"
Copy the software installation archive to the source directory
login03:~$ mv my-soft-3.1.tgz $HOME/sw/source
Build my-soft for each available CPU architecture by submitting an interactive job to each compute node type, requesting the proper CPU type. For example, to compile my-soft for skylake nodes, first submit an interactive job requesting the skylake feature:
login03:~$ salloc --nodes=1 --constraint=skylake --cpus-per-task=4 --time=6:00:00 --mem=16G
Then unpack and build the software. Note the environment variable $ARCH below, which stores the CPU type of the reserved compute node.
taurus-n034:~$ tar -zxvf $HOME/sw/source/my-soft-3.1.tgz -C $HOME/sw/$ARCH/my-soft/3.1/build
taurus-n034:~$ cd $HOME/sw/$ARCH/my-soft/3.1/build
taurus-n034:~$ ./configure --prefix=$HOME/sw/$ARCH/my-soft/3.1/install && make && make install
Finally, use the environment variable $ARCH in your job scripts to access the correct installation path of the my-soft executable for the current compute node. Note that you may need to set or update the LD_LIBRARY_PATH environment variable to point to the location of your software's shared libraries.
#!/bin/bash -l
#SBATCH --job-name=my-soft
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem=60G
#SBATCH --time=12:00:00
#SBATCH --constraint=[skylake|haswell]
#SBATCH --output my-soft-job_%j.out
#SBATCH --error my-soft-job_%j.err
#SBATCH --mail-user=myemail@....uni-hannover.de
#SBATCH --mail-type=BEGIN,END,FAIL

# Change to work dir
cd $SLURM_SUBMIT_DIR

# Load modules
module load my_necessary_modules

# run my_soft
export LD_LIBRARY_PATH=$HOME/sw/$ARCH/my-soft/3.1/install/lib:$LD_LIBRARY_PATH
srun $HOME/sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe --input file.input
You can certainly combine the software build and execution steps into a single batch job script. However, it is recommended that you first perform the build steps interactively to ensure the software compiles without errors before adding them to a job script. Such a job script might look like this, for example:
#!/bin/bash -l
#SBATCH --job-name=my-soft
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --mem=120G
#SBATCH --time=12:00:00
#SBATCH --constraint=skylake
#SBATCH --output my-soft-job_%j.out
#SBATCH --error my-soft-job_%j.err
#SBATCH --mail-user=myemail@....uni-hannover.de
#SBATCH --mail-type=BEGIN,END,FAIL

# Change to work dir
cd $SLURM_SUBMIT_DIR

# Load modules
module load my_necessary_modules

# install software if the executable does not exist
[ -e "$HOME/sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe" ] || {
  mkdir -p $HOME/sw/$ARCH/my-soft/3.1/{build,install}
  tar -zxvf $HOME/sw/source/my-soft-3.1.tgz -C $HOME/sw/$ARCH/my-soft/3.1/build
  cd $HOME/sw/$ARCH/my-soft/3.1/build
  ./configure --prefix=$HOME/sw/$ARCH/my-soft/3.1/install
  make
  make install
}

# run my_soft
export LD_LIBRARY_PATH=$HOME/sw/$ARCH/my-soft/3.1/install/lib:$LD_LIBRARY_PATH
srun $HOME/sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe --input file.input
Note: If you want to manually build the software from source code, please refer to the section above.
EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.
The EasyBuild framework is available on the cluster through the module EasyBuild-custom. This module defines the location of the EasyBuild configuration files, recipes and installation directories. You can load the module using the command:
module load EasyBuild-custom
EasyBuild software and modules will be installed by default under the following directories:
$HOME/my.soft/software/$ARCH
$HOME/my.soft/modules/$ARCH
Here, the variable ARCH, which stores the CPU type of the machine on which the above module load command was executed, will currently be either haswell, sandybridge or skylake. The command lcpuarchs, executed on the cluster login nodes, lists all currently available values of ARCH. You can override the default software and module installation directory, as well as the location of your EasyBuild configuration files (MY_EASYBUILD_REPOSITORY), by exporting the following environment variables before loading the EasyBuild module:
export EASYBUILD_INSTALLPATH=/your/preferred/installation/dir
export MY_EASYBUILD_REPOSITORY=/your/easybuild/repository/dir
module load EasyBuild-custom
If other project members should also have access to the software, the recommended location is a subdirectory in $SOFTWARE.
After you load the EasyBuild environment as explained in the section above, the command eb will be available to build your code using EasyBuild. If you want to build the code using a given configuration <filename>.eb and resolve dependencies, use the flag -r as in the example below:
eb <filename>.eb -r
The build command only needs the configuration file name with the extension .eb, not the full path, provided that the configuration file is in your search path: the command eb --show-config prints the variable robot-paths, which holds the search path. More options are available; have a look at the short help message by typing eb -h. For instance, using the search flag -S, you can check whether an EasyBuild configuration file already exists for a given program name:
eb -S <program_name>
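A minimal sketch of a typical workflow; the program name and easyconfig file name below are only examples, so use a file actually reported by the search:
eb -S zlib            # search the robot paths for matching easyconfig files
eb --show-config      # inspect robot-paths and the installation prefix in use
eb zlib-1.2.11.eb -r  # example file name; builds the package and resolves dependencies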
You will be able to load the modules created by EasyBuild in the directory defined by the EASYBUILD_INSTALLPATH variable using the following commands:
module use $EASYBUILD_INSTALLPATH/modules/$ARCH/all
module load <modulename>/version
The command module use prepends the selected directory to your MODULEPATH environment variable, so the command module avail will also show the modules of your own software.
If you want the software module to be automatically available when opening a new shell on the cluster, modify your ~/.bashrc file as follows:
echo 'export EASYBUILD_INSTALLPATH=/your/preferred/installation/dir' >> ~/.bashrc
echo 'module use $EASYBUILD_INSTALLPATH/modules/$ARCH/all' >> ~/.bashrc
Note that to preserve the dollar sign in the second line above, the string must be enclosed in single quotes.
Please note: This instruction has been written for Singularity 3.8.x.
Please note: If you would like to fully manage your Singularity container images directly on the cluster, including build and/or modify actions, please contact us and ask for the “singularity fakeroot” permission to be added to your account (you will need it).
Singularity enables users to execute containers on a High-Performance Computing (HPC) cluster as if they were native programs or scripts on the host computer. For example, if the cluster system is running CentOS Linux but your application runs on Ubuntu, you can create an Ubuntu container image, install your application into that image, copy the image to an approved location on the cluster and run your application with Singularity in its native Ubuntu environment.
The main advantage of Singularity is that containers are executed as an unprivileged user on the cluster system and, besides the local storage TMPDIR, they can access the network storage systems such as HOME, BIGWORK and PROJECT, as well as the GPUs the host machine is equipped with.
Additionally, Singularity properly integrates with the Message Passing Interface (MPI), and utilizes communication fabrics such as InfiniBand and Intel Omni-Path.
If you want to create a container and set up an environment for your jobs, we recommend that you start by reading the Singularity documentation. The basic steps to get started are described below.
If you already have a pre-built container ready for use, you can simply upload the container image to the cluster and execute it; see the section below about running container images.
Below we describe how to build a new container, or modify an existing one, directly on the cluster. A container image can be created from scratch using a recipe file, or fetched from a remote container repository. This sub-section illustrates the recipe file method; the next one takes a glance at remote container repositories.
Using a Singularity recipe file is the recommended way to create containers if you want to build reproducible container images. This example recipe file builds a RockyLinux 8 container:
BootStrap: yum
OSVersion: 8
MirrorURL: https://ftp.uni-hannover.de/rocky/%{OSVERSION}/BaseOS/$basearch/os
Include: yum wget

%setup
    echo "This section runs on the host outside the container during bootstrap"

%post
    echo "This section runs inside the container during bootstrap"
    # install packages in the container
    yum -y groupinstall "Development Tools"
    yum -y install vim python38 epel-release
    yum -y install python38-pip
    # install tensorflow
    pip3 install --upgrade tensorflow
    # enable access to BIGWORK and PROJECT storage on the cluster system
    mkdir -p /bigwork /project

%runscript
    echo "This is what happens when you run the container"
    echo "Arguments received: $*"
    exec /bin/python "$@"

%test
    echo "This test will be run at the very end of the bootstrapping process"
    /bin/python3 --version
This recipe file uses the yum bootstrap module to install the core operating system, RockyLinux 8, within the container. For other bootstrap modules (e.g. docker) and details on Singularity recipe files, refer to the online documentation.
The next step is to build a container image on one of the cluster login servers.
Note: your account must be authorized to use the --fakeroot option; please contact us at cluster-help@luis.uni-hannover.de.
Note: currently, the --fakeroot option is enabled only on the cluster login nodes.
username@login01$ singularity build --fakeroot rocky8.sif rocky8.def
This creates an image file named rocky8.sif. By default, Singularity containers are built as read-only SIF (Singularity Image Format) image files. Having a container in the form of a single file makes it easier to transfer it to other locations, both within the cluster and outside of it. Additionally, a SIF file can be signed and verified.
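If you want to make use of signing, a brief sketch follows; it assumes you have already created a PGP key pair with singularity key newpair:
singularity sign rocky8.sif      # attach a cryptographic signature to the image
singularity verify rocky8.sif    # check the signature later or on another machine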
Note that a container SIF file can be built on any cluster storage you have write access to. However, it is recommended to build containers either in your $BIGWORK or in a directory under /tmp (or use the variable $MY_SINGULARITY) on the login nodes, because only containers located under these two directories are allowed to be executed using the shell, run or exec commands; see the section below.
The latest version of the singularity command can be used directly on any cluster node without prior activation. Older versions are installed as loadable modules; execute module spider singularity to list available versions.
Another easy way to obtain and use a Singularity container is to retrieve pre-built images directly from external repositories. Popular repositories are Docker Hub and the Singularity Library. You can browse them to see whether a container meets your needs. The search sub-command also lets you search for images in the Singularity Library; for Docker images, use the search box at Docker Hub instead.
In the following example we will pull the latest Python container from Docker Hub and save it in a file named python_latest.sif:
username@login01$ singularity pull docker://python:latest
The build sub-command can also be used to download images; here you can additionally specify your preferred container file name:
username@login01$ singularity build my-ubuntu20.04.sif library://library/default/ubuntu:20.04
First, check whether you really need to modify the container image. For example, if you are using Python in an image and simply need to add new packages via pip, you can do that without modifying the image by running pip in the container with the --user option.
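A hedged sketch of that approach; the package name is only an example, and the packages end up in ~/.local inside your HOME, which is mounted in the container:
singularity exec rocky8.sif pip3 install --user requests                                 # image stays unchanged
singularity exec rocky8.sif python3 -c 'import requests; print(requests.__version__)'   # verify the install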
To modify an existing SIF container file, you need to first convert it to a writable sandbox format.
Please note: Since the --fakeroot option of the shell and build sub-commands does not work with a container sandbox when the container is located on shared storage such as BIGWORK, PROJECT or HOME, the container sandbox must be stored locally on the login nodes. We recommend using the /tmp directory (or the variable $MY_SINGULARITY), which has sufficient capacity.
username@login01$ cd $MY_SINGULARITY
username@login01$ singularity build --sandbox rocky8-sandbox rocky8.sif
The build command above creates a sandbox directory called rocky8-sandbox, which you can then shell into in writable mode and modify the container as desired:
username@login01$ singularity shell --writable --fakeroot --net rocky8-sandbox
Singularity> yum install -qy python3-matplotlib
After making all desired changes, you exit the container and convert the sandbox back to the SIF file using:
Singularity> exit
username@login01$ singularity build -F --fakeroot rocky8.sif rocky8-sandbox
Note: you can try to remove the sandbox directory rocky8-sandbox afterwards, but there might be a few files you cannot delete due to the namespace mappings that take place. The daily /tmp cleaner job will eventually clean them up.
Please note: In order to run a Singularity container, the container SIF file or sandbox directory must be located either in your $BIGWORK or in the /tmp directory.
There are four ways to run a container under Singularity.
If you simply call the container image as an executable, or use the Singularity run sub-command, it will carry out the instructions in the %runscript section of the container recipe file:
How to call the container SIF file:
username@login01:~$ ./rocky8.sif --version
This is what happens when you run the container
Arguments received: --version
Python 3.8.6
Use the run sub-command:
username@login01:~$ singularity run rocky8.sif --version
This is what happens when you run the container
Arguments received: --version
Python 3.8.6
The Singularity exec sub-command lets you execute an arbitrary command within your container instead of just the %runscript. For example, to get the content of the file /etc/os-release inside the container:
username@login01:~$ singularity exec rocky8.sif cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.4 (Green Obsidian)"
....
The Singularity shell sub-command invokes an interactive shell within a container. Note the Singularity> prompt within the shell in the example below:
username@login01:$ singularity shell rocky8.sif
Singularity>
Note that all three sub-commands shell, exec and run let you execute a container directly from a remote repository without first downloading it to the cluster. For example, to run a one-line “Hello World” Ruby program:
username@login01:$ singularity exec library://sylabs/examples/ruby ruby -e 'puts "Hello World!"'
Hello World!
Please note: You can access (in read & write mode) your HOME, BIGWORK and PROJECT (the latter only on login nodes) storage from inside your container. In addition, the /tmp directory (or TMPDIR on compute nodes) of the host machine is automatically mounted in the container. Additional mounts can be specified using the --bind option of the exec, run and shell sub-commands; see singularity run --help.
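A hedged example of the --bind option; the host path is hypothetical and should be replaced by a directory you actually want to expose inside the container:
singularity exec --bind $BIGWORK/mydata:/data rocky8.sif ls -l /data   # mounts $BIGWORK/mydata at /data in the container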
In order to containerize your parallel MPI application and run it properly on the cluster system, you have to provide an MPI library stack inside your container. In addition, the userspace driver for Mellanox InfiniBand HCAs should be installed in the container in order to utilize the cluster InfiniBand fabric as the MPI transport layer.
This example Singularity recipe file ubuntu-openmpi.def retrieves an Ubuntu container from Docker Hub and installs the required MPI and InfiniBand packages:
BootStrap: docker
From: ubuntu:xenial

%post
    # install openmpi & infiniband
    apt-get update
    apt-get -y install openmpi-bin openmpi-common libibverbs1 libmlx4-1
    # enable access to BIGWORK storage on the cluster
    mkdir -p /bigwork /project
    # enable access to /scratch dir. required by mpi jobs
    mkdir -p /scratch
For a more recent Ubuntu release (ubuntu:latest), the InfiniBand userspace packages are provided under a different name:

BootStrap: docker
From: ubuntu:latest

%post
    # install openmpi & infiniband
    apt-get update
    apt-get -y install openmpi-bin openmpi-common ibverbs-providers
    # enable access to BIGWORK storage on the cluster
    mkdir -p /bigwork /project
    # enable access to /scratch dir. required by mpi jobs
    mkdir -p /scratch
Once you have built the image file ubuntu-openmpi.sif as explained in the previous sections, your MPI application can be run as follows (assuming you have already reserved a number of cluster compute nodes):
module load GCC/8.3.0 OpenMPI/3.1.4
mpirun singularity exec ubuntu-openmpi.sif /path/to/your/parallel-mpi-app
The above lines can be entered at the command line of an interactive session, or can also be inserted into a batch job script.
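For reference, a minimal batch-script sketch built around the same two lines; the resource values (nodes, tasks, memory, walltime) are placeholders and must be adapted to your application:
#!/bin/bash -l
#SBATCH --job-name=mpi-container
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --mem=16G
#SBATCH --time=1:00:00

cd $SLURM_SUBMIT_DIR

# host-side MPI used to launch the ranks that run inside the container
module load GCC/8.3.0 OpenMPI/3.1.4
mpirun singularity exec ubuntu-openmpi.sif /path/to/your/parallel-mpi-app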
Apache Hadoop is a framework that allows for the distributed processing of large data sets. The Hadoop-cluster module helps to launch a Hadoop or Spark cluster within the cluster system and to manage it via the cluster batch job scheduler.
To run your Hadoop applications on the cluster system, perform the following steps. First, allocate a number of cluster work machines, either interactively or in a batch job script. Then start the Hadoop cluster on the allocated nodes. Next, once the Hadoop cluster is up and running, you can submit your Hadoop applications to it. When your applications have finished, the Hadoop cluster has to be shut down (job termination automatically stops a running Hadoop cluster).
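The same sequence of steps, sketched as an interactive session; the resource values are placeholders, and the commands are those described below:
salloc --nodes=2 --ntasks-per-node=40 --partition=dumbo --time=1:00:00  # 1. allocate work machines
module load Hadoop-cluster/1.0                                          # 2. load the management module (on the compute node)
hadoop-cluster start                                                    # 3. start the Hadoop cluster
# ... run your Hadoop applications here ...
hadoop-cluster stop                                                     # 4. shut the Hadoop cluster down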
The following example runs the simple word-count MapReduce Java application on a Hadoop cluster. The script requests 2 nodes, allocating a total of $2\times40$ CPUs to the Hadoop cluster for 30 minutes. The Hadoop cluster is started by the command hadoop-cluster start with non-persistent HDFS storage on the local disks of the reserved nodes; use hadoop-cluster start -p instead if you want persistent HDFS storage located in your $BIGWORK/hadoop-cluster/hdfs directory. After completion of the word-count application, the command hadoop-cluster stop shuts the cluster down.
We recommend running your Hadoop jobs on the dumbo cluster partition. The dumbo nodes provide large local disk storage (21 TB on each node).
#!/bin/bash -l
#SBATCH --job-name=Hadoop.cluster
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --mem=1000G
#SBATCH --time=0:30:00
#SBATCH --partition=dumbo

# Change to work dir
cd $SLURM_SUBMIT_DIR

# Load modules
module load my_necessary_modules

# Load the cluster management module
module load Hadoop-cluster/1.0

# Start Hadoop cluster
# Cluster storage is located on local disks of reserved nodes.
# The storage is not persistent (removed after Hadoop termination)
hadoop-cluster start

# Report filesystem info & stats
hdfs dfsadmin -report

# Start the word count app
hadoop fs -mkdir -p /data/wordcount/input
hadoop fs -put -f $HADOOP_HOME/README.txt $HADOOP_HOME/NOTICE.txt /data/wordcount/input
hadoop fs -rm -R -f /data/wordcount/output
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount \
    /data/wordcount/input /data/wordcount/output
hadoop fs -ls -R -h /data/wordcount/output
rm -rf output
hadoop fs -get /data/wordcount/output

# Stop hadoop cluster
hadoop-cluster stop
The command hadoop-cluster status shows the status and configuration of the running Hadoop cluster. Refer to the example Hadoop job script at $HC_HOME/examples/test-hadoop-job.sh for other possible HDFS storage options. Note that to access the variable $HC_HOME you must have the Hadoop-cluster module loaded. Note also that you cannot load the Hadoop-cluster module on the cluster login nodes.
Apache Spark is a large-scale data processing engine that performs in-memory computing. Spark offers many advantages over the MapReduce framework when it comes to real-time processing of large data sets: it claims to process data up to 100x faster than Hadoop MapReduce in memory, and 10x faster on disk. Spark offers bindings in Java, Scala, Python and R for building parallel applications.
Because of its high memory and I/O bandwidth requirements, we recommend running your Spark jobs on the dumbo cluster partition.
The batch script below, which asks for 4 nodes and 40 CPUs per node, executes the example Java application SparkPi, which estimates the constant $\pi$. The command hadoop-cluster start --spark starts a Spark cluster on Hadoop's resource manager YARN, which in turn runs on the allocated cluster nodes. Spark jobs are submitted to the running Spark cluster with the spark-submit command.
Fine tuning of Spark’s configuration can be done by setting parameters in the variable $SPARK_OPTS.
#!/bin/bash -l
#SBATCH --job-name=Spark.cluster
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=40
#SBATCH --mem=1000G
#SBATCH --time=2:00:00
#SBATCH --partition=dumbo

# Change to work dir
cd $SLURM_SUBMIT_DIR

# Load modules
module load my_necessary_modules

# Load the cluster management module
module load Hadoop-cluster/1.0

# Start hadoop cluster with spark support
hadoop-cluster start --spark

# Submit Spark job
#
# spark.executor.instances - total number of executors
# spark.executor.cores     - number of cores per executor
# spark.executor.memory    - amount of memory per executor
SPARK_OPTS="
 --conf spark.driver.memory=4g
 --conf spark.driver.cores=1
 --conf spark.executor.instances=17
 --conf spark.executor.cores=5
 --conf spark.executor.memory=14g
"

spark-submit ${SPARK_OPTS} --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.1.jar 100

# Stop spark cluster
hadoop-cluster stop
The command hadoop-cluster status shows the status and configuration of the running Spark cluster.
Alternatively, you can run Spark in an interactive mode as follows:
Submit an interactive batch job requesting 4 nodes and 40 CPUs per node on the dumbo cluster partition
login03~$ salloc --nodes=4 --ntasks-per-node=40 --mem=2000G --partition=dumbo
Once your interactive shell is ready, load the Hadoop-cluster module, then start the Spark cluster and run the Python Spark shell application
dumbo-n011~$ module load Hadoop-cluster/1.0
dumbo-n011~$ hadoop-cluster start --spark
dumbo-n011~$ pyspark --master yarn --deploy-mode client
The command hadoop-cluster stop shuts the Spark cluster down.
On start-up, and also with the command hadoop-cluster status, Hadoop shows where to access the web management pages of your virtual cluster. It will look like this:
==========================================================================
= Web interfaces to Hadoop cluster are available at:
=
=   HDFS (NameNode)          http://dumbo-n0XX.css.lan:50070
=   YARN (Resource Manager)  http://dumbo-n0XX.css.lan:8088
=
= NOTE: your web browser must have proxy settings to access the servers
=       Please consult the cluster handbook, section "Hadoop/Spark"
==========================================================================
If you put one of these addresses into your browser without preparation, you will most likely get an error, since “css.lan” is a purely local domain that does not exist outside the LUIS cluster.
In order to access pages in this scope, you need to set up both a browser proxy that recognizes the special addresses pointing to “css.lan” and an SSH tunnel the proxy can refer to.
This is how you do it on a Linux machine running Firefox:
ssh -o ConnectTimeout=20 -C2qTnNf -D 8080 <username>@login.cluster.uni-hannover.de
To avoid starting a second tunnel if one is already running, you can guard the command as follows:
ps -C ssh -o args | grep ConnectTimeout >& /dev/null || ssh -o ConnectTimeout=20 -C2qTnNf -D 8080 <username>@login.cluster.uni-hannover.de