===== Conda ===== 
----

<WRAP important>
**<fc #ff0000>Important Notice</fc>:**
We ask you to avoid using the ''Miniconda3'' modules, as the ''defaults'' channel appears to be no longer freely available under Anaconda’s updated licensing terms. Please switch to the ''Miniforge3'' module, which provides access to free and open-source channels like ''conda-forge''. Please update your environments accordingly to ensure unrestricted access to packages.
Additionally, please check your ''~/.condarc'' file and remove the ''defaults'' channel if it is still in use. 
You can also do this by running: ''conda config %%--%%env %%--%%remove channels defaults''
</WRAP>

Conda is a package management system which was initially created for Python, but currently also supports several other languages such as R or lua. 
Conda can be used to quickly find, install and update packages and their dependencies using community-maintained remote 
repositories (also called channels). Software packages and  scientific libraries are installed in "environments" to provide the ability to maintain different, often incompatible, sets of software. Environments are also managed by Conda. 

<typo fc:#FF0000>Please note:</typo> Compared to the software modules that we provide on the cluster, there are much more and newer libraries available via Conda that you can manage yourself, but they may not be as well optimized for the processor architectures on the cluster. In addition, by default, Conda installs packages in a user's home directory [[(:guide:storage_systems#home_-_configuration_files_and_setups_don_t_take_work_home_though|$HOME)]], which has a quota for the size and number of files. You should take care to re-set the Conda installation directory to a subdirectory of your group's [[:guide:storage_systems#software_-_install_software_for_your_group_here|$SOFTWARE]] directory. Otherwise, you'll probably quite quickly run into your quota in $HOME, with inconvenient consequences (like not being able to use graphical logins any more before you remove the extraneous files, possibly erroneously also deleting files that should remain).   

<typo fc:#FF0000>Please note:</typo> It is not recommended to mix cluster software modules and Conda-managed software in the same work environment. 


<typo fc:#FF0000>Please note:</typo> Pre-configured conda environments are available on the cluster, providing a range of packages commonly used across scientific fields. Please refer to the [[guide:soft:miniforge3#pre-configured_conda_environments|Pre-Configured Conda Environments]] section for details on accessing and using these resources.

In this section we explain how to use Conda in the cluster.

==== Conda usage on the cluster ====
On the cluster, Conda is available through the [[https://github.com/conda-forge/miniforge|Miniforge]] installation,
which provides default access to the free ''conda-forge'' channel. This installation includes only the Conda package manager, 
Python, and a minimal set of additional packages. The Miniforge installer also includes the //mamba// library, 
which offers fast package dependency resolution and speeds up package downloads by 
using parallel processing through multi-threading. For more details, see below.


==== Creating a Conda environment ===
In order to create and use Conda environments on the cluster, you first need to load the Miniforge3 module:
<code>
module load Miniforge3
</code>

<typo fc:#FF0000>Please note:</typo> Since loading the Miniforge module will also initialize conda for shell interaction, you do not need to additionally run the ''conda init <shell>'' command modifying your shell init file, e.g. ''$HOME/.bashrc''. In fact, you should avoid doing so, as it may cause issues with your interactive or batch jobs. If you ran the command in the past, you may have conda entries in your shell init file. Please edit the file and remove the lines between ''%%# >>> conda initialize >>>%%'' and ''%%# <<< conda initialize <<<%%''.

If you use Conda-installed packages, the Miniforge3 module should be the only module you load in your work environment or in your job.

The following will create a conda environment named ''myenv'' and install the packages ''python'' and ''numpy'' into it:
<code>
conda create -n myenv python numpy
</code>
Note that if you have loaded the ''Miniforge3/23.11.0-0'' module, Python 3.10.x will be installed, unless you explicitly specify the python version above.
For example, to create a Python 3.8 environment, use the following command:
<code>
conda create -n myenvpy38 python=3.8
</code>

Your conda environments by default are located in your home directory under the ''$HOME/.conda/envs'' path. We recommend to change this location. Either use the ''%%--%%prefix'' flag specifying the path to the directory you want (probably ''$SOFTWARE/<nameofmycondadir>''), or assign the path to the environment variable ''$CONDA_ENVS_PATH''.
Another way is to edit your ''$HOME/.condarc'' file and define your conda environment locations using the key ''envs_dirs:''.

Packages that can be installed using conda are provided in channels. Popular channels are [[https://conda-forge.org/|Conda Forge]] and [[https://bioconda.github.io|Bioconda]], which are set by default on the cluster. If you want to install packages from a channel which is not among the default channels, you can specity it using the ''%%--%%channel <your-channel>'' (or shortly ''-n <your-channel>'') flag during package installation. Alternatively, you can add it permanently to your ''$HOME/.condarc'' file under the ''channels:'' key. This can also be done by running the command: ''conda config --add channels <your-channel>''.

The default conda settings are defined in the cluster-wide configuration file ''$EBROOTMINIFORGE3/.condarc''
<code>
auto_activate_base: false

envs_dirs:
  - $HOME/.conda/envs

pkgs_dirs:
  - $HOME/.conda/pkgs

channels:
  - conda-forge
  - bioconda
</code>

The file also defines the default location of conda's package cache directory (key ''pkgs_dirs:''). 
The ''$CONDA_PKGS_DIRS'' environment variable overwrites the ''pkgs_dirs:'' setting.
The command ''conda config --show-sources'' displays all identified configuration sources.

If you want to remove packages that are not used in any environment, run:
<code>
conda clean --all
</code>

Execute ''conda info'' to see your current conda settings, including the default channel URLs and the location of your conda environments.

The command lists all your environments:
<code>
conda info --envs
</code>
The active conda environment is marked with an asterisk (*).

To search for a package in configured channels, use the command:
<code>
conda search <package-name>
</code>

If you want to install additional packages in an existing environment, e.g. in ''myenv'', it must first be activated:
<code>
conda activate myenv
</code>

Once a conda environment is activated, which may be recognized by the presence of the environment name ''(myenv)$'' on the command prompt, you can install additional software using either Conda or Pip. For example, the following installs ''pandas'', ''matplotlib'' and a specific version of ''scipy'':
<code>
(myenv)$ conda install scipy=1.6.3 pandas matplotlib
</code>

**Note** that it is strongly advised not to use conda for package installations after having installed packages via pip, as this may cause inconsistencies in the environment. For best practices, install packages using conda first, and reserve pip for any final installations.

**Note** that package installation may be done on both the cluster login servers as well as on the compute nodes, at least when using the default conda channels.
 
Should you need to see all packages installed in the environment ''myenv'' then run:
<code>
conda list -n myenv
</code>
Without the ''-n'' option, packages in the current active environment are listed.

=== A faster solver for Conda ===
The default package dependency solver of Conda is known to be slow or even fail to resolve some environments. 
If this is your case, you may try an alternative, faster solver for Conda using the ''%%--%%solver'' flag as follows:
<code>
conda create -n myconda --solver=libmamba package1 package2 ...
</code>

The option ''%%--%%solver'' is also available for ''conda install|remove|update'' commands.
If you want to enable the ''libmamba'' solver permanently,
either add ''solver:libmamba'' to your ''$HOME/.condarc'' file or run the command:
<code>
conda config --set solver libmamba
</code>

To revert back to the default ''classic'' solver:
<code>
conda config --remove-key solver
</code>
 
==== Using your Conda environments ====
To use applications from your conda environments interactively or in a job script, you first need to load the ''Miniforge3'' module and 
then activate the environment containing the applications:
<code>
module load Miniforge3
conda activate myenv
</code>
**Note** that we do **not** recommend putting the above lines in your shell init file, e.g. ''~/.bashrc''. This may cause issues with your interactive or batch jobs.

Here is a sample job script to run an application from the conda environment:
<file bash conda-app-job.sh>
#!/bin/bash -l
#SBATCH --job-name=my-conda-application
#SBATCH --nodes=1 
#SBATCH --cpus-per-task=20
#SBATCH --mem=60G
#SBATCH --time=00:30:00
#SBATCH --mail-user=user@yourinstitute.uni-hannover.de
#SBATCH --mail-type=END

# Activate your conda environment
module load Miniforge3
conda activate <your_conda_env_name>

# Run app
<run your application>
</file>
As already mentioned, if you use conda managed software, you should not mix it with the cluster software modules, thus loading only the ''Miniforge3'' module in your interactive shell or in a job script.

==== Pre-Configured Conda Environments ====
To streamline workflows and minimize redundant installations, several pre-configured conda environments are available to all users on the cluster. These shared environments include a broad selection of packages commonly used across various scientific disciplines, such as data analysis, machine learning, bioinformatics, and more. 

Using these environments helps conserve resources and ensure consistent setups, allowing you to quickly get started without needing to install the same packages individually.

Below is a list of environments and some primary packages they contain:

^ Path to Conda Environment ^ Key Packages ^
|''/sw/system/home/ood/conda/envs/miniforge3-2024.08_python-3.12'' | numpy, scipy, pandas, scikit-learn, scikit-bio, biopython, sqlalchemy, pymol, matplotlib, seaborn, pytorch, tensorflow, keras |
|''/sw/system/home/ood/conda/envs/miniforge3-2023.08_python-3.10'' | numpy, scipy, pandas, scikit-learn, matplotlib, seaborn, pytorch, tensorflow, keras, qiime2 |

For a complete list of packages within each environment, use the command: ''conda list -p <path to conda environment>''. If you believe that a shared environment is missing packages that would be beneficial for a reasonably broad user community in the cluster, feel free to reach out to the cluster support team. 

For each of the environments listed above, you can also access the corresponding GPU version by simply appending ''_gpu'' to the environment path.

To activate a shared environment, use the following command:
<code>
module load Miniforge3
conda activate <path to conda environment>
</code>

To switch to the GPU-optimized version, use:
<code>
module load Miniforge3
conda activate <path to conda environment>_gpu
</code>

In addition to the command line, these environments are also available through the JupyterLab service provided by the cluster web portal. This allows users to run Jupyter notebooks with the desired conda environments directly from the web interface. For more details on how to access and use the JupyterLab service, please refer to
[[:guide:soft:jupyterlab|the cluster web portal documentation]].