Conda is a package management system that was initially created for Python but now also supports several other languages, such as R and Lua. Conda can be used to quickly find, install and update packages and their dependencies from community-maintained remote repositories (also called channels). Software packages and scientific libraries are installed in “environments”, which let you maintain different, often mutually incompatible, sets of software side by side. Environments are also managed by Conda.
Please note: compared to the software modules that we provide on the cluster, many more and newer libraries are available via Conda that you can manage yourself, but they may not be as well optimized for the processor architectures of the cluster. In addition, by default Conda installs packages in your home directory ($HOME), which has quotas on both the total size and the number of files. You should therefore set the Conda installation directory to a subdirectory of your group's $SOFTWARE directory. Otherwise you will probably run into your $HOME quota quite quickly, with inconvenient consequences: for example, you may no longer be able to use graphical logins until you remove the extraneous files, and while cleaning up you may accidentally delete files that should remain.
Please note: it is not recommended to mix cluster software modules and Conda-managed software in the same work environment.
In this section we explain how to use Conda on the cluster.
On the cluster, Conda is available through the Miniconda installation, a minimal version of Anaconda that includes only the conda package manager, Python and a small number of other packages.
In order to create and use Conda environments on the cluster, you first need to load the Miniconda3 module:
module load Miniconda3
Miniconda3 is based on Python 3. If you want to manage Python 2 based conda environments, load the Miniconda2 module instead.
Please note: since loading the Miniconda module also initializes conda for shell interaction, you do not need to run the conda init <shell> command, which modifies your shell init file (e.g. $HOME/.bashrc). This should be avoided, as it may cause issues with your interactive or batch jobs. If you ran the command in the past, your shell init file may contain conda entries. Please edit the file and remove the lines between these two markers:
# >>> conda initialize >>>
# <<< conda initialize <<<
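If you prefer not to edit the file by hand, the cleanup can be scripted. The following is a minimal sketch, not an official tool: the function name remove_conda_init_block is hypothetical, and it assumes the conda block is delimited by exactly the two marker lines shown above. It deletes both markers and everything between them, and sed keeps the unmodified file as a .bak backup.

```shell
# Hypothetical helper (not part of conda): delete the block that
# "conda init" wrote into a shell init file, markers included.
# sed keeps the unmodified file as <file>.bak.
remove_conda_init_block() {
    local rc_file="${1:-$HOME/.bashrc}"
    if [ ! -f "$rc_file" ]; then
        echo "no such file: $rc_file" >&2
        return 1
    fi
    # Delete every line from the opening marker through the closing marker.
    sed -i.bak '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$rc_file"
}

# Usage (interactive shell):
#   remove_conda_init_block "$HOME/.bashrc"
```

Check the result afterwards and delete $HOME/.bashrc.bak once you are satisfied. If the closing marker is missing, sed deletes everything from the opening marker to the end of the file, which is another reason to keep the backup until you have checked.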
If you use Conda-installed packages, the Miniconda3 module should be the only module you load in your work environment or in your job.
The following creates a conda environment named myconda and installs the packages python and numpy into it:
conda create -n myconda python numpy
Note that with the Miniconda3 module loaded, Python 3.x is installed unless you explicitly specify a Python version in the command above (e.g. python=3.9). By default, the Miniconda2 module installs Python 2.x.
By default, your conda environments are located in your home directory under $HOME/.conda/envs. We recommend changing this location. Either use the --prefix flag to specify the directory you want (probably $SOFTWARE/<nameofmycondadir>), or assign the path to the corresponding environment variable.
Another way is to edit your $HOME/.condarc file and define your conda environment locations using the envs_dirs key.
For more details about environments see the Conda documentation.
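For the environment-variable route mentioned above, a small helper can set both the environment and package cache locations for the current shell session. This is a minimal sketch under stated assumptions: the function name set_conda_dirs and the subdirectory name conda are hypothetical, and it assumes conda reads CONDA_ENVS_DIRS for its envs_dirs setting in the same way it reads CONDA_PKGS_DIRS for pkgs_dirs (conda's CONDA_<KEY> naming for configuration keys).

```shell
# Hypothetical helper: keep conda environments and the package cache
# below one base directory for the current shell session.
# Assumption: conda maps CONDA_ENVS_DIRS to its envs_dirs setting in the
# same way it maps CONDA_PKGS_DIRS to pkgs_dirs.
set_conda_dirs() {
    if [ $# -ne 1 ]; then
        echo "usage: set_conda_dirs <base-directory>" >&2
        return 2
    fi
    export CONDA_ENVS_DIRS="$1/envs"
    export CONDA_PKGS_DIRS="$1/pkgs"
    # Create both directories so conda can write to them right away.
    mkdir -p "$CONDA_ENVS_DIRS" "$CONDA_PKGS_DIRS"
}

# Usage, e.g. before creating an environment:
#   module load Miniconda3
#   set_conda_dirs "$SOFTWARE/conda"
#   conda create -n myconda python numpy
```

If you use --prefix instead, no variables are needed, but you then activate the environment by its path, e.g. conda activate "$SOFTWARE/conda/envs/myconda".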
Packages that conda can install are provided through channels. Popular channels are Conda Forge and Bioconda, which are set by default on the cluster. If you want to install packages from a channel that is not among the default channels, you can specify it with the --channel <your-channel> (or short -c <your-channel>) flag during package installation, or add it permanently to your $HOME/.condarc file under the channels key.
See the Conda documentation for more information about managing channels.
The default conda settings are defined in the cluster-wide configuration file
auto_activate_base: false
envs_dirs:
  - $HOME/.conda/envs
pkgs_dirs:
  - $HOME/.conda/pkgs
channels:
  - conda-forge
  - bioconda
  - defaults
The file also defines the default location of conda's package cache directory (key pkgs_dirs). Setting the $CONDA_PKGS_DIRS environment variable overrides the pkgs_dirs setting.
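To make the relocation to $SOFTWARE permanent, you can define the same envs_dirs and pkgs_dirs keys in your personal $HOME/.condarc. The sketch below only prints suitable entries so you can review them before saving; the function name print_condarc_dirs and the base directory are hypothetical, and it assumes your personal file is read with higher precedence than the cluster-wide one, so your directories should be searched first.

```shell
# Hypothetical helper: print ~/.condarc entries that put environments and
# the package cache below a base directory, using the same envs_dirs and
# pkgs_dirs keys as the cluster-wide file.
print_condarc_dirs() {
    if [ $# -ne 1 ]; then
        echo "usage: print_condarc_dirs <base-directory>" >&2
        return 2
    fi
    printf 'envs_dirs:\n  - %s/envs\npkgs_dirs:\n  - %s/pkgs\n' "$1" "$1"
}

# Usage: inspect the output first, e.g.
#   print_condarc_dirs "$SOFTWARE/conda"
```

If you do not have a $HOME/.condarc yet, you could save the output with print_condarc_dirs "$SOFTWARE/conda" > "$HOME/.condarc". If you already have one, merge the two keys into it by hand rather than overwriting it.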
If you want to remove packages that are not used in any environment, run:
conda clean --all
Run conda info to see your current conda settings, including the default channel URLs and the location of your conda environments.
The following command lists all your environments:
conda info --envs
The active environment is marked with an asterisk (*).
To search for a package in configured channels, use the command:
conda search <package-name>
To install additional packages into an existing environment, e.g. myconda, first activate it:
conda activate myconda
Once inside the conda environment, which you can recognize by the environment name in your command prompt, e.g. (myconda)$, you can install packages.
For example, the following installs pandas, matplotlib and a specific version of scipy:
(myconda)$ conda install scipy==1.6.3 pandas matplotlib
Note that package installation has to be done on a cluster login node, as the compute nodes do not have access to the public network and thus to the conda channels.
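Because installations only work on login nodes, a small wrapper can stop you from accidentally running conda install inside a job. This is a sketch, not a cluster-provided command: the name safe_conda_install is hypothetical, and it assumes that Slurm (the scheduler used by the job script in this guide) sets SLURM_JOB_ID in both batch and interactive jobs.

```shell
# Hypothetical wrapper: refuse to run "conda install" inside a Slurm job,
# because compute nodes cannot reach the conda channels.
# Assumption: Slurm sets SLURM_JOB_ID in batch and interactive jobs.
safe_conda_install() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "error: inside Slurm job $SLURM_JOB_ID; install packages on a login node" >&2
        return 1
    fi
    # Outside a job: pass all arguments through to conda unchanged.
    conda install "$@"
}

# Usage (on a login node, with the environment activated):
#   safe_conda_install scipy==1.6.3 pandas matplotlib
```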
To list all packages installed in the environment myenv, run:
conda list -n myenv
Without the -n option, the packages of the currently active environment are listed.
Conda's classic dependency solver is known to be slow and can even fail to resolve some environments (conda 23.10 and later already use libmamba by default).
If this happens to you, try the faster libmamba solver via the --solver flag:
conda create -n myconda --solver=libmamba package1 package2 ...
The --solver flag is also available for the conda install, conda remove and conda update commands.
To enable the libmamba solver permanently, add solver: libmamba to your $HOME/.condarc file or run:
conda config --set solver libmamba
To revert to the default solver, run:
conda config --remove-key solver
More information on the libmamba solver is available in the conda-libmamba-solver documentation.
To use applications from your conda environments interactively or in a job script, you first need to load the Miniconda module and then activate the environment containing the applications:
module load Miniconda3
conda activate myenv
Note that we do not recommend putting the above lines in your shell init file, e.g. ~/.bashrc, as this may cause issues with your interactive or batch jobs.
Here is a sample job script to run an application from the conda environment:
#!/bin/bash -l
#SBATCH --job-name=my-conda-application
#SBATCH --nodes=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=60G
#SBATCH --time=00:30:00
#SBATCH --mail-user=firstname.lastname@example.org
#SBATCH --mail-type=END

# Activate your conda environment
module load Miniconda3
conda activate <your_conda_env_name>

# Run app
<run your application>
As mentioned above, do not mix conda-managed software with the cluster software modules: load only the Miniconda module in your interactive shell or job script.
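The activation steps and the advice not to mix modules can be combined in a small helper for job scripts. This is a minimal sketch: the name use_conda_env is hypothetical, and running module purge first (to make sure Miniconda3 is the only loaded module) is an assumption about what is permitted on this cluster. Define it in the job script itself or in a file you source explicitly, not in ~/.bashrc.

```shell
# Hypothetical helper for job scripts: start from a clean module
# environment, load only Miniconda3, then activate one conda environment.
# Assumption: "module purge" is permitted on this cluster.
use_conda_env() {
    if [ $# -ne 1 ]; then
        echo "usage: use_conda_env <env-name-or-path>" >&2
        return 2
    fi
    module purge || return 1            # unload all other cluster modules
    module load Miniconda3 || return 1  # also initializes conda for the shell
    conda activate "$1"
}

# Usage in a job script, after the #SBATCH lines:
#   use_conda_env myenv
#   <run your application>
```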