Modules & Application Software


The number of software packages installed with the operating system of the cluster nodes is deliberately kept small. Additional packages and applications are provided by a module system, which enables you to easily customise your working environment on the cluster. This module system is called Lmod 1). Furthermore, we can provide different versions of software which you can use on demand. When a module is loaded, software-specific settings are applied, e.g. the environment variables PATH, LD_LIBRARY_PATH and MANPATH are adjusted.
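
For illustration, the following sketch shows the effect of loading a compiler module on your environment; the module name is taken from the examples further below, and the exact paths depend on the installation:

 echo $PATH                     # search paths before loading the module
 module load GCC/4.9.3-2.25     # adjusts PATH, LD_LIBRARY_PATH and MANPATH
 echo $PATH                     # now additionally contains the compiler's bin directory
 module unload GCC/4.9.3-2.25   # reverts the changes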

We have adopted a systematic software naming and versioning convention in conjunction with the software installation system EasyBuild 2) .

Software installation on the cluster uses a hierarchical module naming scheme. This means that the command module avail does not display all installed software modules right away; only the modules that are immediately available to load are shown. Loading some modules makes further modules available: in particular, loading a compiler module or an MPI implementation module makes all the software built with it available. This way, we hope the prerequisites of a given piece of software become apparent.
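
As a sketch of this hierarchy (module names as used in the exercise below; the actual lists depend on the installed software):

 module avail                   # only compilers, toolchains and binary applications are listed
 module load GCC/4.9.3-2.25     # load a compiler ...
 module avail                   # ... software built with this compiler is now listed as well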

At the top level of the module hierarchy there are modules for compilers, toolchains and software applications that come as binaries and thus do not depend on compilers. Toolchain modules bundle compilers, MPI implementations and numerical libraries. Currently the following toolchain modules are available:

  • Compiler-only toolchains
    • GCC: GCC + updated binutils
    • iccifort: Intel compilers + GCC
  • Compiler + MPI toolchains
    • gompi: GCC + OpenMPI
    • iimpi: Intel compilers + Intel MPI
    • iompi: Intel compilers + OpenMPI
  • Compiler + MPI + numerical libraries toolchains
    • foss: gompi + OpenBLAS + FFTW + ScaLAPACK
    • intel: iimpi + Intel MKL
    • iomkl: iompi + Intel MKL
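
Loading a toolchain module makes all of its components available at once; for example, with the foss toolchain version used later in this guide:

 module load foss/2016a      # provides GCC, OpenMPI, OpenBLAS, FFTW and ScaLAPACK in one step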

Working with modules

This section explains how to use software modules.

Display the entire list of available modules

 module spider

The same but more compact output

 module -t spider

Search for specific modules that have “string” in their name

 module spider string

Detailed information about a particular version of a module (including instructions on how to load it)

 module spider name/version

List modules immediately available to load

 module avail

Some software modules are hidden from the avail and spider commands. These are mostly modules for system library packages that other, directly used applications depend on. To list hidden modules, provide the --show-hidden option to the avail and spider commands:

 module --show-hidden avail
 module --show-hidden spider

A hidden module has a dot (.) in front of its version (e.g. zlib/.1.2.8).

List currently loaded modules

 module list

Load a specific version of a module

 module load name/version

If only the name is specified, the command will load the default version, marked with a (D) in the module avail listing (usually the latest version). Loading a module may automatically load other modules it depends on.
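
For example (the specific version is taken from the exercise below; which version carries the (D) mark depends on the installation):

 module load GCC                # loads the default version, marked with (D)
 module load GCC/4.9.3-2.25     # loads this specific version instead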

It is not possible to load two versions of the same module at the same time.

To switch between different modules

 module swap old new
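
For example, to replace the loaded GCC version with another one (the second version number is purely illustrative):

 module swap GCC/4.9.3-2.25 GCC/5.4.0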

To unload the specified module from the current environment

 module unload name

To clean your environment of all loaded modules

 module purge

Show what environment variables the module will set

 module show name/version

Save the current list of modules to “name” collection for later use

 module save name

Restore modules from collection “name”

 module restore name

List of saved collections

 module savelist

To get the complete list of options provided by Lmod via the module command, type the following

 module help

Exercise: Working with modules

As an example of working with the Lmod modules, here we show how to load the gnuplot module.

List loaded modules

 module list

No modules loaded

Find available gnuplot versions

 module -t spider gnuplot

gnuplot/4.6.0
gnuplot/5.0.3

Determine how to load the selected gnuplot/5.0.3 module

 module spider gnuplot/5.0.3

--------------------------------------------------------------------------------
  gnuplot: gnuplot/5.0.3
--------------------------------------------------------------------------------
    Description:
      Portable interactive, function plotting utility - Homepage: http://gnuplot.sourceforge.net/

    This module can only be loaded through the following modules:

      GCC/4.9.3-2.25  OpenMPI/1.10.2

    Help:
      Portable interactive, function plotting utility - Homepage: http://gnuplot.sourceforge.net/

Load required modules

 module load GCC/4.9.3-2.25  OpenMPI/1.10.2

Module for GCCcore, version .4.9.3 loaded
Module for binutils, version .2.25 loaded
Module for GCC, version 4.9.3-2.25 loaded
Module for numactl, version .2.0.11 loaded
Module for hwloc, version .1.11.2 loaded
Module for OpenMPI, version 1.10.2 loaded

And finally load the selected gnuplot module

 module load gnuplot/5.0.3

Module for OpenBLAS, version 0.2.15-LAPACK-3.6.0 loaded
Module for FFTW, version 3.3.4 loaded
Module for ScaLAPACK, version 2.0.2-OpenBLAS-0.2.15-LAPACK-3.6.0 loaded
Module for bzip2, version .1.0.6 loaded
Module for zlib, version .1.2.8 loaded
.............
.............

To simplify loading the gnuplot module, the current list of loaded modules can be saved to a collection named “mygnuplot” (the name is of course arbitrary) and restored when needed, as follows

Save loaded modules to “mygnuplot”

 module save mygnuplot

Saved current collection of modules to: mygnuplot

If “mygnuplot” is not specified, the name “default” will be used.

Remove all loaded modules (or open a new shell)

 module purge

Module for gnuplot, version 5.0.3 unloaded
Module for Qt, version 4.8.7 unloaded
Module for libXt, version .1.1.5 unloaded
............
............

List the currently loaded modules. The list is now empty.

 module list

No modules loaded

List saved collections

 module savelist

Named collection list:
  1) mygnuplot

Load gnuplot module again

 module restore mygnuplot

Restoring modules to user's mygnuplot
Module for GCCcore, version .4.9.3 loaded
Module for binutils, version .2.25 loaded
Module for GCC, version 4.9.3-2.25 loaded
Module for numactl, version .2.0.11 loaded
.............
.............

List of available software


In this section you will find user guides for some of the software packages installed on the cluster. These guides cannot replace the documentation that comes with the application you are using, so please read that as well.

A wide variety of application software is available on the cluster system. These applications are located on a storage system and are made available via an NFS export through the module system Lmod. Issue the command module spider on the cluster system or visit the page for a comprehensive list of available software. If you need a different version of an already installed application, or one that is currently not installed, please get in touch. The main prerequisite for use on the cluster system is that the application is available for Linux. Furthermore, if the application requires a license, additional questions need to be clarified.

Some selected Windows applications can also be executed on the cluster system with the help of Wine or Singularity containers. For information on Singularity, see the section below or contact us.

A current list of available software

Usage instructions

Build software from source code

Sub-clusters of the cluster system have different CPU architectures. The command lcpuarchs executed on the login nodes lists all available CPU types.

login03:~$ lcpuarchs -v
CPU arch names       Cluster partitions
--------------       ------------------
haswell              fi,haku,iqo,isd,iwes,lena,pci,smp
nehalem              nwf,smp,tane,taurus,tfd
sandybridge          bmwz,iazd,isu,itp

CPU of this machine: haswell

For more verbose output type: lcpuarchs -vv

Therefore, if an executable built with target-specific compiler options is run on a machine with an unsuitable CPU, an “Illegal instruction” error may be triggered. For example, if you compile your program on a haswell node (e.g. the lena sub-cluster) with the gcc compiler option -march=native, the program may not run on nehalem nodes (e.g. the tane sub-cluster).
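
A hedged sketch of the difference (program and file names are hypothetical; only the compiler flags matter here):

 # built with -march=native on a haswell node: may fail with "Illegal instruction" on nehalem nodes
 gcc -O2 -march=native  -o my-prog          my-prog.c
 
 # one explicit build per target architecture avoids the problem (see the procedure below)
 gcc -O2 -march=haswell -o my-prog.haswell  my-prog.c
 gcc -O2 -march=nehalem -o my-prog.nehalem  my-prog.c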

This section explains how to build software on the cluster system in a way that avoids the above issue and allows you to submit jobs to all compute nodes without specifying the CPU type.

In the example below we want to compile the sample software my-soft of version 3.1.

In your HOME (or BIGWORK) directory create build/install directories for each available CPU architecture listed by the command lcpuarchs -s as well as the directory source for storing software installation sources

 login03:~$ mkdir -p $HOME/sw/source
 login03:~$ eval "mkdir -p $HOME/sw/{$(lcpuarchs -ss)}/my-soft/3.1/{build,install}"

Copy software installation archive to the source directory

 login03:~$ mv my-soft-3.1.tar.gz $HOME/sw/source

Build my-soft for each available CPU architecture by submitting an interactive job to a compute node with the corresponding CPU type. For example, to compile my-soft for skylake nodes, first submit an interactive job requesting a skylake compute node

 login03:~$ qsub -I -l nodes=1:skylake:ppn=4,walltime=6:00:00,mem=16gb

Then unpack and build the software. Note that the environment variable $ARCH below stores the CPU type of the reserved compute node.

 taurus-n034:~$ tar -zxvf $HOME/sw/source/my-soft-3.1.tar.gz -C $HOME/sw/$ARCH/my-soft/3.1/build
 taurus-n034:~$ cd $HOME/sw/$ARCH/my-soft/3.1/build
 taurus-n034:~$ ./configure --prefix=$HOME/sw/$ARCH/my-soft/3.1/install && make && make install

Finally, use the environment variable $ARCH in your job scripts to access the right installation path of my-soft executable for the current compute node. Note that you may need to set/update the LD_LIBRARY_PATH environment variable to point to the location of your software's shared libraries.

my-soft-job.sh
#!/bin/bash -l
#PBS -N my-soft
#PBS -M myemail@uni-hannover.de
#PBS -j oe
#PBS -l nodes=4:ppn=8,walltime=12:0:0,mem=96gb
 
# change to work dir
cd $PBS_O_WORKDIR
 
# run my_soft
export LD_LIBRARY_PATH=$HOME/sw/$ARCH/my-soft/3.1/install/lib:$LD_LIBRARY_PATH
$HOME/sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe --input file.input

You can certainly consider combining the software build and execution steps into a single batch job script. However, it is recommended that you first perform the build steps interactively before adding them to the job script, to ensure the software compiles without errors. For example, such a job script might look like this:

my-soft-job.sh
#!/bin/bash -l
#PBS -N my-soft
#PBS -M myemail@uni-hannover.de
#PBS -j oe
#PBS -l nodes=4:ppn=8,walltime=12:0:0,mem=96gb
 
# install software if the executable does not exist
[ -e "$HOME/sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe" ] || {
  mkdir -p $HOME/sw/$ARCH/my-soft/3.1/{build,install}
  tar -zxvf $HOME/sw/source/my-soft-3.1.tar.gz -C $HOME/sw/$ARCH/my-soft/3.1/build
  cd $HOME/sw/$ARCH/my-soft/3.1/build
  ./configure --prefix=$HOME/sw/$ARCH/my-soft/3.1/install
  make
  make install
}
 
# change to work dir
cd $PBS_O_WORKDIR
 
# run my_soft
export LD_LIBRARY_PATH=$HOME/sw/$ARCH/my-soft/3.1/install/lib:$LD_LIBRARY_PATH
$HOME/sw/$ARCH/my-soft/3.1/install/bin/my-soft.exe --input file.input

EasyBuild

EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.

EasyBuild framework

The EasyBuild framework is available on the cluster through the module EasyBuild-custom. This module defines the location of the EasyBuild configuration files, recipes and installation directories. You can load the EasyBuild module using the command:

 module load EasyBuild-custom

EasyBuild software and modules will be installed by default under the following directories:

 $BIGWORK/my.soft/software/$ARCH
 $BIGWORK/my.soft/modules/$ARCH

Here the variable ARCH will be either haswell, broadwell or sandybridge. The command lcpuarchs executed on the cluster login nodes lists all currently available values of ARCH. You can override the default software and module installation directories, as well as the location of your EasyBuild configuration files (MY_EASYBUILD_REPOSITORY), by exporting the environment variables listed below before loading the EasyBuild module file:

 export EASYBUILD_INSTALLPATH=/your/preferred/installation/dir
 export MY_EASYBUILD_REPOSITORY=/your/easybuild/repository/dir
 module load EasyBuild-custom

How to build your software

After you have loaded the EasyBuild environment as explained in the section above, the command eb is available to build your code using EasyBuild. If you want to build the code described by a given configuration file <filename>.eb and resolve its dependencies, use the flag -r as in the example below:

 eb <filename>.eb -r

The build command only needs the configuration file name with the extension .eb, not the full path, provided that the configuration file is in your search path: the command eb --show-config prints the variable robot-paths that holds the search path. More options are available; please have a look at the short help message by typing eb -h. For instance, you can check whether an EasyBuild configuration file already exists for a given program name using the search flag -S:

 eb -S <program_name>

You will be able to load the modules created by EasyBuild in the directory defined by
the EASYBUILD_INSTALLPATH variable using the following commands:

 module use $EASYBUILD_INSTALLPATH/modules/$ARCH/all
 module load <modulename>/version

The command module use will prepend the selected directory to your MODULEPATH environment variable, so the command module avail will also show the modules of your software. Note that by default the variable EASYBUILD_INSTALLPATH points to a directory within your $BIGWORK. However, by default $BIGWORK is not readable by other users. Therefore, if you want to make your software available to another cluster user with username user_name, you have to make your software installation path readable for that user as follows

 setfacl -m u:user_name:x $BIGWORK
 setfacl -R -m u:user_name:rx $BIGWORK/my.soft

Further Reading

Singularity containers

Please note: These instructions were written for Singularity 2.4.2 and 2.6.1.

“Singularity enables users to have full control of their environment. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data.”3)

Singularity overview

Singularity enables users to execute containers on a High-Performance Computing (HPC) cluster as if they were native programs or scripts on the host computer. For example, if the cluster system runs Scientific Linux but your application requires Ubuntu, you can create an Ubuntu container image, install your application into that image, copy the image to the cluster system and run your application with Singularity in its native Ubuntu environment.

One of the main benefits of Singularity is that containers are executed as a non-privileged user on the cluster system and can have access to network file systems like HOME, BIGWORK and PROJECT.

Additionally, Singularity properly integrates with the Message Passing Interface (MPI), and utilizes communication fabrics such as InfiniBand and Intel Omni-Path Architecture.

Singularity containers on the cluster

If you want to create a new container and set up a new environment for your jobs, we recommend that you start by reading the Singularity documentation. The basic steps to get started are detailed below.

Install Singularity on your local computer

Before you can create your image, you need to install Singularity on a system where you have administrative rights. Superuser rights are also necessary to bootstrap and modify your own container image. To install Singularity on your personal machine, do the following:

 VERSION=2.4.2
 wget https://github.com/singularityware/singularity/releases/download/$VERSION/singularity-$VERSION.tar.gz
 tar xvf singularity-$VERSION.tar.gz
 cd singularity-$VERSION/
 ./configure --prefix=/usr/local
 make
 sudo make install

To verify Singularity is installed, run

 singularity

For detailed instructions on Singularity installation, refer to the online documentation.

Create Singularity container using recipe file

For a reproducible container, the recommended practice is to create containers using a Singularity recipe file. Note that you cannot bootstrap or modify containers on the cluster system; you can only run them. The recipe below builds a CentOS 7 container.

Create the following recipe file called centos7.def.

centos7.def
BootStrap: yum
OSVersion: 7
MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/
Include: yum wget
 
%setup
  echo "This section runs on the host outside the container during bootstrap"
 
%post
  echo "This section runs inside the container during bootstrap"
 
  # install packages in the container
  yum -y groupinstall "Development Tools"
  yum -y install vim python epel-release
  yum -y install python-pip
 
  # install tensorflow
  pip install --upgrade pip
  pip install --upgrade tensorflow
 
  # enable access to BIGWORK and PROJECT storage on the cluster system
  mkdir -p /bigwork /project
 
%runscript
  echo "This is what happens when you run the container"
 
  echo "Arguments received: $*"
  exec /usr/bin/python "$@"
 
%test
  echo "This test will be run at the very end of the bootstrapping process"
 
  python --version

This recipe file uses the yum bootstrap module to bootstrap the core operating system, CentOS 7, within the container. For other bootstrap modules (e.g. docker) and details on Singularity recipe files, refer to the online documentation.

Next, build a container. This has to be executed on a machine where you have root privileges.

 sudo singularity build centos7.img centos7.def

Shell into the newly built container.

 singularity shell centos7.img

By default, Singularity containers are built as read-only squashfs image files. If you need to modify your container, e.g. install additional software, you first need to convert the squashfs container into a writable sandbox.

 sudo singularity build --sandbox centos7-sandbox centos7.img

This creates a sandbox directory called centos7-sandbox which you can then shell into and make changes.

 sudo singularity shell --writable centos7-sandbox

However, to keep the container reproducible, it is strongly recommended to perform all required changes via the recipe file.

To see other singularity command-line options, issue the following command.

 singularity help

Create Singularity container using Docker or Singularity Hub

Another easy way to obtain and use a container is to pull it directly from the Docker Hub or Singularity Hub image repositories. For further details, refer to the Singularity online documentation.
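
For example, to pull a stock Ubuntu image from Docker Hub (to be executed on your local machine; the image name is illustrative):

 singularity pull docker://ubuntu:xenial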

Upload the container image to your BIGWORK directory at the cluster system

Linux users can transfer the container with the following command. Feel free to use other methods of your choice.

 scp centos7.img username@transfer.cluster.uni-hannover.de:/bigwork/username

Running a container image

Please note: In order for you to be able to run your container, it must be located in your BIGWORK directory.

Log in to the cluster system

 ssh username@login.cluster.uni-hannover.de

Load the Singularity module and run your container image.

 username@login01:~$ cd $BIGWORK
 username@login01:~$ module load GCC/4.9.3-2.25 Singularity/2.4.2
 username@login01:~$ singularity run centos7.img --version

The singularity run sub-command carries out all instructions in the %runscript section of the container recipe file. Use the singularity sub-command exec to run any command from inside the container. For example, to get the content of the file /etc/os-release inside the container, issue the following command.

 username@login01:~$ singularity exec centos7.img cat /etc/os-release

Please note: You can access (read & write) your HOME, BIGWORK and PROJECT (the latter only on login nodes) storage from inside your container. In addition, the /scratch (only on work nodes) and /tmp directories of the host machine are automatically mounted in the container.
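
If you need further host directories inside the container, they can be bind-mounted explicitly with the -B option; a sketch with an illustrative path (the target directory must exist inside the image):

 username@login01:~$ singularity exec -B $BIGWORK/mydata:/mnt centos7.img ls /mnt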

Singularity & parallel MPI applications

In order to containerize your parallel MPI application and run it properly on the cluster system, you have to provide an MPI library stack inside your container. In addition, the userspace driver for Mellanox InfiniBand HCAs should be installed in the container in order to use the cluster's InfiniBand fabric as the MPI transport layer.

The example Singularity recipe file ubuntu-openmpi.def below retrieves an Ubuntu container from Docker Hub and installs the required MPI and InfiniBand packages.

ubuntu-openmpi.def
BootStrap: docker
From: ubuntu:xenial
 
%post
# install openmpi & infiniband
apt-get update
apt-get -y install openmpi-bin openmpi-common libibverbs1 libmlx4-1
 
# enable access to BIGWORK and PROJECT storage on the cluster system
mkdir -p /bigwork /project
 
# enable access to /scratch dir. required by mpi jobs
mkdir -p /scratch

Once you have built the image file ubuntu-openmpi.img and transferred it to the cluster system, as explained in the previous section, your MPI application can be run as follows (assuming you have already reserved a number of cluster work nodes).

module load foss/2016a
module load GCC/4.9.3-2.25 Singularity/2.4.2
mpirun singularity exec ubuntu-openmpi.img /path/to/your/parallel-mpi-app

The lines above can be entered on the command line of an interactive session as well as inserted into a batch job script.
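
A minimal batch script sketch combining these lines (the script name, resource requests and application path are illustrative):

mpi-singularity-job.sh
#!/bin/bash -l
#PBS -N mpi-container
#PBS -j oe
#PBS -l nodes=4:ppn=8,walltime=2:00:00,mem=32gb
 
# change to work dir
cd $PBS_O_WORKDIR
 
# load MPI and Singularity modules
module load foss/2016a
module load GCC/4.9.3-2.25 Singularity/2.4.2
 
# run the containerized MPI application
mpirun singularity exec ubuntu-openmpi.img /path/to/your/parallel-mpi-app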

Further Reading

Hadoop/Spark

Apache Hadoop is a framework for the distributed processing of large data sets. The Hadoop-cluster module helps to launch a Hadoop or Spark cluster within the cluster system and to manage it via the cluster batch job scheduler.

Hadoop - setup and running

To run your Hadoop applications on the cluster system, perform the following steps. First, allocate a number of cluster work machines, either interactively or in a batch job script. Then start a Hadoop cluster on the allocated nodes. Once the Hadoop cluster is up and running, you can submit your Hadoop applications to it. When your applications have finished, the Hadoop cluster has to be shut down (job termination automatically stops the running Hadoop cluster).

The following example runs the simple word-count MapReduce Java application on the Hadoop cluster. The script requests 6 nodes, allocating a total of $6\times40$ CPUs to the Hadoop cluster for 30 minutes. A Hadoop cluster with persistent HDFS storage located in your $BIGWORK/hadoop-cluster/hdfs directory can be started with the command hadoop-cluster start -p; the script below uses hadoop-cluster start, i.e. non-persistent storage on the local disks of the reserved nodes. After completion of the word-count Hadoop application, the command hadoop-cluster stop shuts the cluster down.

We recommend running your Hadoop jobs on the dumbo cluster partition. The dumbo cluster nodes provide large (21 TB per node) local disk storage.

test-hadoop-job.sh
#!/bin/bash -l
#PBS -N Hadoop.cluster
#PBS -l nodes=6:ppn=40,mem=3000gb,walltime=0:30:00
#PBS -W x=PARTITION:dumbo
 
# Change to work dir
cd $PBS_O_WORKDIR
 
# Load the cluster management module
module load Hadoop-cluster/1.0
 
# Start Hadoop cluster
#  Cluster storage is located on local disks of reserved nodes.
#  The storage is not persistent (removed after Hadoop termination)
hadoop-cluster start
 
# Report filesystem info&stats
hdfs dfsadmin -report
 
# Start the word count app
hadoop fs -mkdir -p /data/wordcount/input
hadoop fs -put -f $HADOOP_HOME/README.txt $HADOOP_HOME/NOTICE.txt /data/wordcount/input
hadoop fs -rm -R -f /data/wordcount/output
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /data/wordcount/input /data/wordcount/output
 
hadoop fs -ls -R -h /data/wordcount/output
rm -rf output
hadoop fs -get /data/wordcount/output
 
# Stop hadoop cluster
hadoop-cluster stop

The command hadoop-cluster status shows the status and configuration of the running Hadoop cluster. Refer to the example Hadoop job script at $HC_HOME/examples/test-hadoop-job.sh for other possible HDFS storage options. Note that to access the variable $HC_HOME you need to have the Hadoop-cluster module loaded. Note also that you cannot load the Hadoop-cluster module on the cluster login nodes.

Spark - setup and running

Apache Spark is a large-scale data processing engine that performs in-memory computing. Spark has many advantages over the MapReduce framework when it comes to real-time processing of large data sets. It claims to process data up to 100x faster than Hadoop MapReduce in memory, and up to 10x faster on disk. Spark offers bindings in Java, Scala, Python and R for building parallel applications.

Because of its high memory and I/O bandwidth requirements, we recommend running your Spark jobs on the dumbo cluster partition.

The batch script below, which requests 4 nodes with 40 CPUs per node, executes the example Java application SparkPi, which estimates the constant $\pi$. The command hadoop-cluster start --spark starts a Spark cluster on Hadoop’s resource manager YARN, which in turn runs on the allocated cluster nodes. Spark jobs are submitted to the running Spark cluster with the spark-submit command.

Fine tuning of Spark’s configuration can be done by setting parameters in the variable $SPARK_OPTS.

test-spark-job.sh
#!/bin/bash -l
#PBS -N Spark.cluster
#PBS -l nodes=4:ppn=40,mem=2000gb,walltime=2:00:00
#PBS -W x=PARTITION:dumbo
 
# Load modules
module load Hadoop-cluster/1.0
 
# Start hadoop cluser with spark support
hadoop-cluster start --spark
 
# Submit Spark job
#
#  spark.executor.instances  - total number of executors
#  spark.executor.cores      - number of cores per executor
#  spark.executor.memory     - amount of memory per executor
SPARK_OPTS="
--conf spark.driver.memory=4g
--conf spark.driver.cores=1
--conf spark.executor.instances=17
--conf spark.executor.cores=5
--conf spark.executor.memory=14g
"
 
spark-submit ${SPARK_OPTS} --class org.apache.spark.examples.SparkPi \
             $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.1.jar 100
 
# Stop spark cluster
hadoop-cluster stop

The command hadoop-cluster status shows the status and configuration of the running Spark cluster.

Alternatively, you can run Spark in interactive mode as follows:

Submit an interactive batch job requesting 4 nodes and 40 CPUs per node on the dumbo cluster partition

login03~$ qsub -I -W x=PARTITION:dumbo -l nodes=4:ppn=40,mem=2000gb

Once your interactive shell is ready, load the Hadoop-cluster module, next start the Spark cluster and then run Python Spark Shell application

dumbo-n011~$ module load Hadoop-cluster/1.0
dumbo-n011~$ hadoop-cluster start --spark
dumbo-n011~$ pyspark --master yarn --deploy-mode client

The command hadoop-cluster stop shuts the Spark cluster down.

How to access the web management pages provided by a Hadoop cluster

On start-up and also with the command hadoop-cluster status, Hadoop shows where to access the web management pages of your virtual cluster. It will look like this:

==========================================================================
=   Web interfaces to Hadoop cluster are available at:
=
=    HDFS (NameNode)           http://dumbo-n0XX.css.lan:50070
=
=    YARN (Resource Manager)   http://dumbo-n0XX.css.lan:8088
=
=    NOTE: your web browser must have proxy settings to access the servers
=          Please consult the cluster handbook, section "Hadoop/Spark"
=
==========================================================================

When you put this into your browser without preparation, you will most likely get an error, since “css.lan” is a purely local “domain”, which does not exist in the world outside the LUIS cluster.

In order to access pages in this scope, you will need to set up both a browser proxy that recognizes the special addresses pointing to “css.lan” and an ssh tunnel for the proxy to refer to.

This is how you do it on a Linux machine running Firefox:

  1. Start Firefox and point it to the address https://localhost:8080. You should get an error message saying “Unable to connect”, since your local computer most probably is not set up to run a local web server at port 8080. Continue with step 2.
  2. Go to the command line and create an ssh tunnel to a login node of the cluster. Replace <username> with your own username:
ssh -o ConnectTimeout=20 -C2qTnNf -D 8080 <username>@login.cluster.uni-hannover.de
  3. Go to the page you opened in step 1 and refresh it (click Reload or use Ctrl+R or F5, typically). The error message should change into something saying “Secure Connection Failed - could not verify authenticity of the received data”. This actually shows that the proxy is running. Continue with step 4.
  4. Open a new tab in Firefox and enter the URL about:addons. In the search field “Find more extensions” type “FoxyProxy Standard” and install the add-on.
  5. At the top right of your Firefox, you should see a new icon for FoxyProxy. Click on it and choose “Options”. Then go to “Add”.
    • Under “Proxy Type”, choose SOCKS5
    • Set “Send DNS through SOCKS5 proxy” to on
    • For the Title, we suggest “css.lan”
    • IP address, DNS name and server name point to “localhost”
    • In the “port” field, enter 8080 (this is the port you used above for your ssh proxy tunnel)
    • Click “Save” and choose the Patterns button on the new filter. Add a new whitelist pattern like this: set “Name” and “Pattern” respectively to css.lan and *.css.lan*. Keep the rest of the options at their defaults and click Save.
  6. Once again, click on the FoxyProxy icon and make sure that “Use Enabled Proxies By Patterns and Priority” is enabled.
  7. Congratulations! You should now be ready to view the URLs Hadoop reports directly from your own workstation.
  8. If you want to facilitate/automate starting the ssh tunnel, you could use the following command line somewhere in your .xsession or .profile. Remember to replace <username> with your actual username:
ps -C ssh -o args | grep ConnectTimeout >& /dev/null || ssh -o ConnectTimeout=20 -C2qTnNf -D 8080 <username>@login.cluster.uni-hannover.de

Further Reading
