Python: Difference between revisions

From HPCwiki
Phase 1 § 5 P1.5.5: merge Setting up Python virtualenv + Virtual environment Python 3.4+ + Using conda... into Python. Filled empty Mamba heading with Miniforge, aligned variable to $myNobackup, fixed /bin/bin typo, dropped stale venv content. (via update-page on MediaWiki MCP Server)
 
(5 intermediate revisions by one other user not shown)
Line 1: Line 1:
Python is a high-level, interpreted programming language that has gained widespread popularity for its readability, versatility, and user-friendly syntax. Created by Guido van Rossum and first released in 1991, Python was designed to emphasize code clarity and reduce the complexity often associated with other languages. Its straightforward, English-like syntax makes it a natural choice for beginners, while its power and flexibility continue to attract experienced developers in numerous industries.
Python is a high-level, interpreted programming language popular in scientific computing for its readability and its enormous ecosystem of third-party libraries such as NumPy, Pandas, scikit-learn, PyTorch, TensorFlow, and many more. This page describes how to use Python on Anunna: the provided modules, how to manage your own packages with virtual environments or Miniforge, and how to expose an environment as a Jupyter kernel.
 
One of Python’s greatest strengths lies in its extensive standard library, which provides built-in modules and functions for tasks ranging from file manipulation to internet protocols. Additionally, a thriving open-source community has developed countless third-party libraries and frameworks, making Python suitable for everything from data analysis and machine learning to web development and automation. Popular libraries like NumPy, Pandas, and TensorFlow enable developers to handle massive datasets, train artificial intelligence models, and build sophisticated applications with relative ease.
 
Anunna offers environment modules for a single version of python for each bucket. On top of the standard environment modules, bundle modules are available for more specific user cases.
 
Users can also install their own Python distributions, like [https://mamba.readthedocs.io/en/latest/ Mamba]. While, the use of Anaconda is discouraged.


== Modules ==
== Modules ==


Each module year bucket comes with a version of Python installed
Anunna provides one Python version per [[Environment Modules | module bucket]], plus bundle modules that carry a curated set of common extensions. Load a bucket, then the Python module:


* '''2023''': Python/3.11.3
* '''2023''': <code>Python/3.11.3</code>
* '''2024''': Python/3.12.3
* '''2024''': <code>Python/3.12.3</code>


Additionally, modules of bundles of python extensions are also installed
<syntaxhighlight lang="bash">
module load 2024
module load Python/3.12.3
</syntaxhighlight>


* Python-bundle-PyPI
The bundle module <code>Python-bundle-PyPI</code> adds many frequently-used packages on top of the base interpreter. Use <code>module key <package></code> to find which bundle contains a package you need (see [[Environment Modules#Searching by keyword | searching modules by keyword]]).


== Mamba ==
For packages not in a module, the two recommended routes are a '''virtual environment''' built on a Python module (below), or '''Miniforge''' for a self-contained conda/mamba setup. The use of Anaconda is discouraged on Anunna — its default channels carry licensing restrictions and the full distribution is heavy; Miniforge is the lighter, unrestricted alternative.


== Creating Custom Virtual Environments from Modules ==
== Virtual environments ==


The HPC team is aware that users may need to build custom python environments for their jobs. These environments can be based off the python environment modules provided by Anunna.
A virtual environment is a self-contained directory holding a specific Python and its packages, so one project's dependencies cannot clash with another's. Python's built-in <code>venv</code> module is the simplest way to make one on top of a Python module.


Python virtual environments are self-contained directories that house a specific Python installation and its associated packages, ensuring that one project’s dependencies don’t clash with another. They isolate your software requirements, letting the user manage different versions of libraries or modules in separate, discrete environments. This prevents version conflicts and keeps your system’s base Python environment clean. Tools like ‘venv’ and ‘virtualenv’ simplify creating and activating these spaces, making it straightforward to switch between multiple projects.
First load the Python version you want:


<syntaxhighlight lang="bash">
module load 2024
module load Python/3.12.3
</syntaxhighlight>


* Load the module of the desired Python version
Then create the environment in a location of your choosing. The example uses <code>$myNobackup/PythonEnv</code> — <code>$myNobackup</code> is your Lustre nobackup location, set in your <code>~/.bash_aliases</code> (see [[Installing Personal Software#Aliases and local variables | Aliases and local variables]]). Keeping environments on Lustre rather than your home directory avoids filling your home quota, and the nobackup tier is the right choice because an environment can always be recreated from scratch and so does not need backing up.
* Create a virtual environment folder
* Activate virtual environment
* Install desired packages with Pip


<syntaxhighlight lang="bash">
python -m venv $myNobackup/PythonEnv/my_env
</syntaxhighlight>


Firstly, load the module of the desired python version. The example that hereby follows makes use of Python-3.12.3 available at the 2024 bucket.
Activate it whenever you want to use it:


<pre>
<syntaxhighlight lang="bash">
module load 2024
source $myNobackup/PythonEnv/my_env/bin/activate
module load Python/3.12.3
</syntaxhighlight>
</pre>


Once the Python module is loaded, the environment can be created. The environment will be stored in a folder at the location of your choosing. It is not necessary to create the folder beforehand, though in this example it is assumed that the location <code>$MYBKP/PythonEnv</code> already exists. Note that <code>$MYBKP</code> refers to the lustre backup location specified at your <code>~/.bash_aliases</code> file (see our entry on [[Aliases_and_local_variables| Aliases and local variables]])
Once active, the environment name appears as a prefix in your prompt:


<pre>
<syntaxhighlight lang="text">
python -m venv $MYBKP/PythonEnv/my_env
(my_env) user001@login200:~$
</pre>
</syntaxhighlight>


Once created, the virtual environment needs to be activated in order to be used or modified.
Install packages with <code>pip</code> while the environment is active; they go into the environment, not your home directory:


<pre>
<syntaxhighlight lang="bash">
source $MYBKP/PythonEnv/my_env/bin/activate
pip install -U numpy pandas matplotlib
</pre>
</syntaxhighlight>


After the environment has been activated, you should see the name of the environment as a suffix of your prompt.  
Leave the environment with <code>deactivate</code>.


<pre>
== Miniforge (conda / mamba) ==
(my_env) user001@login200:$
</pre>


While the virtual environment is active, you can use pip to install modules directly into your environment
[https://github.com/conda-forge/miniforge Miniforge] is a minimal installer that gives you the <code>conda</code> and <code>mamba</code> package managers preconfigured to use the community [https://conda-forge.org/ conda-forge] channel. <code>mamba</code> is a fast drop-in replacement for <code>conda</code>. This is the recommended way to use conda-style environments on Anunna; it avoids Anaconda's licensing restrictions.


<pre>
Download and run the installer, pointing it at a location with room (your Lustre nobackup space, not your home directory):
pip install -U numpy pandas matplotlib datetime
</pre>


The installed modules are stored at $MYBKP/PythonEnv/my_env/lib/python3.12/site-packages/.
<syntaxhighlight lang="bash">
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p $myNobackup/miniforge3
</syntaxhighlight>


'''Do not run <code>conda init</code>''' on Anunna — it writes startup code into your <code>~/.bashrc</code> that runs on every login and can interfere with the module system. Instead, activate Miniforge only when you need it:


<syntaxhighlight lang="bash">
source $myNobackup/miniforge3/bin/activate
</syntaxhighlight>


Note
Then create and use environments with <code>mamba</code>:


== Creating Jupyter kernels from virtual environments ==
<syntaxhighlight lang="bash">
mamba create -n myenv python=3.12 numpy pandas
mamba activate myenv
</syntaxhighlight>


It is often the case that users need to have a custom environment in jupyter. This can be facilitated with virtual environments. Assuming we use the virtual environment from the previous section, we just need to
== Jupyter kernels ==


* load the required modules
To use one of your environments inside [[Jupyter]], register it as a kernel.
* activate the environment
* install the '''ipython''' package
* generate the kernel
* Write a wrapper script
* Make the wrapper script executable
* Edit the kernel file


First, load the required modules.
=== From a virtual environment ===


A <code>venv</code> kernel needs a wrapper script, because the kernel is launched without your normal shell environment and so cannot load modules by itself. First, with the environment active, install <code>ipykernel</code> and generate the kernel:


<pre>
<syntaxhighlight lang="bash">
module load 2024
module load 2024
module load Python/3.12.3
module load Python/3.12.3
</pre>
source $myNobackup/PythonEnv/my_env/bin/activate
 
 
Then activate the environment
 
<pre>
source $myBKP/PythonEnv/my_env/bin/activate
</pre>
 
Install the ipykernel package
 
<pre>
pip install ipykernel
pip install ipykernel
</pre>
python -m ipykernel install --user --name=my_env_kernel
</syntaxhighlight>


Run the command below to generate a kernel. Enter the desired name for the kernel with the flag <code>--name</code>
The kernel is written to <code>~/.local/share/jupyter/kernels/</code>, which Jupyter watches. On its own it will not work, because it cannot find the modules — so write a wrapper script that loads them. Save this as, for example, <code>$HOME/wrap.sh</code>:


<pre>
<syntaxhighlight lang="bash">
python -m ipykernel install --user --name=my_env_kernel_name
#!/bin/bash -l
</pre>


The kernel will be written to your home folder, more precisely <code>~/.local/share/jupyter/kernels/</code>.
module reset
This location will be monitored by jupyter, which should display your custom kernel as one of the kernel options.
 
As it is, the kernel will not work on jupyter because it will not be able to find the necessary modules to run it. A workaround is, then, to write a wrapper script to load the modules inside the kernel.
 
 
<pre>
#!/bin/bash
source /etc/bash.bashrc
#modules to load
module load 2024
module load 2024
module load Python/3.12.3
module load Python/3.12.3
# wrapper line
# exec <pathToVirtualEnvPythonExecutable> -m ipykernel_launcher "$@"
exec /lustre/scratch/<AFFILIATION>/<GROUP>/user001/PythonEnv/my_env/bin/bin/python -m ipykernel_launcher "$@"
</pre>


The code above initializes lmod by sourcing <code>/etc/bash.bashrc</code>, then it loaded the modules required by the environment. Finally, it executes ipykernel_launcher from the virtual environment created. In this example, the code above is saved in <code>/home/WUR/user001/wrap.sh</code> .  Make sure the wrapper script is executable.
exec $myNobackup/PythonEnv/my_env/bin/python -m ipykernel_launcher "$@"
</syntaxhighlight>


The last thing to do now is to modify the kernel.json file of the jupyter kernel created above. It should be located at <code>~/.local/share/jupyter/kernels/my_env_kernel_name/kernel.json</code>
The <code>#!/bin/bash -l</code> line starts a login shell, which loads Lmod and sources your <code>~/.bash_aliases</code> (so <code>$myNobackup</code> is defined). Make the wrapper executable with <code>chmod +x $HOME/wrap.sh</code>.


<pre>
Finally point the kernel at the wrapper by editing <code>~/.local/share/jupyter/kernels/my_env_kernel/kernel.json</code>:
 
<syntaxhighlight lang="json">
{
{
  "argv": [
  "argv": [
Line 137: Line 117:
   "{connection_file}"
   "{connection_file}"
  ],
  ],
  "display_name": "Python modTest",
  "display_name": "Python my_env",
  "language": "python",
  "language": "python",
  "metadata": {
  "metadata": {
Line 143: Line 123:
  }
  }
}
}
</pre>
</syntaxhighlight>
 
The only difference from a plain kernel file is that <code>argv</code> points at the wrapper script instead of the Python executable directly.
 
=== From a conda / mamba environment ===
 
A conda or mamba environment is simpler, because the environment is self-contained. With Miniforge active and your environment created, install <code>ipykernel</code> into it and register the kernel:
 
<syntaxhighlight lang="bash">
mamba create -y -n kernel_test python=3 ipykernel
mamba activate kernel_test
python -m ipykernel install --user --name kernel_test
</syntaxhighlight>
 
To remove the kernel and environment again:
 
<syntaxhighlight lang="bash">
jupyter kernelspec uninstall kernel_test
mamba deactivate
mamba remove -y -n kernel_test --all
</syntaxhighlight>
 
== See also ==
 
* [[Environment Modules]]
* [[Installing Personal Software]]
* [[Jupyter]]
* [[R]]
* [[Apptainer]]
 
== External links ==


This kernel differs from a vanilla kernel file by specifying the location of the wrapper script as the first string passed to '''argv'''. Now that the modules are loaded, the kernel should be able to run on jupyter.
* [https://docs.python.org/3/library/venv.html Python venv documentation]
* [https://github.com/conda-forge/miniforge Miniforge]
* [https://conda-forge.org/ conda-forge]

Latest revision as of 14:01, 16 June 2026

Python is a high-level, interpreted programming language popular in scientific computing for its readability and its enormous ecosystem of third-party libraries — NumPy, Pandas, scikit-learn, PyTorch, TensorFlow, and many more. This page describes how to use Python on Anunna: the provided modules, how to manage your own packages with virtual environments or Miniforge, and how to expose an environment as a Jupyter kernel.

Modules

Anunna provides one Python version per module bucket, plus bundle modules that carry a curated set of common extensions. Load a bucket, then the Python module:

  • 2023: Python/3.11.3
  • 2024: Python/3.12.3

module load 2024
module load Python/3.12.3

The bundle module Python-bundle-PyPI adds many frequently-used packages on top of the base interpreter. Use module key <package> to find which bundle contains a package you need (see searching modules by keyword).

For packages not in a module, the two recommended routes are a virtual environment built on a Python module (below), or Miniforge for a self-contained conda/mamba setup. The use of Anaconda is discouraged on Anunna — its default channels carry licensing restrictions and the full distribution is heavy; Miniforge is the lighter, unrestricted alternative.

Virtual environments

A virtual environment is a self-contained directory holding a specific Python and its packages, so one project's dependencies cannot clash with another's. Python's built-in venv module is the simplest way to make one on top of a Python module.

First load the Python version you want:

module load 2024
module load Python/3.12.3

Then create the environment in a location of your choosing. The example uses $myNobackup/PythonEnv; $myNobackup is your Lustre nobackup location, set in your ~/.bash_aliases (see Aliases and local variables). Keeping environments on Lustre rather than your home directory avoids filling your home quota, and the nobackup tier is the right choice because an environment can always be recreated from scratch and so does not need backing up.

python -m venv $myNobackup/PythonEnv/my_env

Activate it whenever you want to use it:

source $myNobackup/PythonEnv/my_env/bin/activate

Once active, the environment name appears as a prefix in your prompt:

(my_env) user001@login200:~$
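
The prompt prefix is only a visual cue. Inside Python the same thing can be checked directly: in a venv, sys.prefix points at the environment directory while sys.base_prefix still points at the Python it was created from. The following is a minimal sketch; the function name in_virtualenv is illustrative and not part of any Anunna tooling.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix is the installation the environment was built on.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter :", sys.executable)
    print("environment :", sys.prefix)
    print("inside venv :", in_virtualenv())
```

Run it with python from the shell where the environment was activated. If the interpreter path does not lie under your environment directory, the activation did not take effect.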

Install packages with pip while the environment is active; they go into the environment, not your home directory:

pip install -U numpy pandas matplotlib
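
To see which versions pip actually installed into the active environment, one option is the standard-library importlib.metadata module. This is a sketch only: the helper name installed_versions is hypothetical, and the package list simply mirrors the pip command above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the running interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["numpy", "pandas", "matplotlib"]).items():
        print(f"{name:12s} {version or 'not installed'}")
```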

Leave the environment with deactivate.

Miniforge (conda / mamba)

Miniforge is a minimal installer that gives you the conda and mamba package managers preconfigured to use the community conda-forge channel. mamba is a fast drop-in replacement for conda. This is the recommended way to use conda-style environments on Anunna; it avoids Anaconda's licensing restrictions.

Download and run the installer, pointing it at a location with room (your Lustre nobackup space, not your home directory):

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p $myNobackup/miniforge3

Do not run conda init on Anunna — it writes startup code into your ~/.bashrc that runs on every login and can interfere with the module system. Instead, activate Miniforge only when you need it:

source $myNobackup/miniforge3/bin/activate

Then create and use environments with mamba:

mamba create -n myenv python=3.12 numpy pandas
mamba activate myenv
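
A quick way to confirm from Python which conda/mamba environment is active is to read the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that activation exports. The sketch below assumes nothing beyond those two variables; the function name active_conda_env is illustrative.

```python
import os


def active_conda_env(environ=os.environ):
    """Return (name, prefix) of the activated conda/mamba environment, or None."""
    # conda and mamba export these two variables when an environment is activated.
    name = environ.get("CONDA_DEFAULT_ENV")
    prefix = environ.get("CONDA_PREFIX")
    if not name or not prefix:
        return None
    return name, prefix


if __name__ == "__main__":
    env = active_conda_env()
    print("no conda environment active" if env is None else f"{env[0]} at {env[1]}")
```

Passing the environment mapping as an argument keeps the function easy to test without activating anything.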

Jupyter kernels

To use one of your environments inside Jupyter, register it as a kernel.

From a virtual environment

A venv kernel needs a wrapper script, because the kernel is launched without your normal shell environment and so cannot load modules by itself. First, with the environment active, install ipykernel and generate the kernel:

module load 2024
module load Python/3.12.3
source $myNobackup/PythonEnv/my_env/bin/activate
pip install ipykernel
python -m ipykernel install --user --name=my_env_kernel

The kernel is written to ~/.local/share/jupyter/kernels/, which Jupyter watches. On its own it will not work, because it cannot find the modules — so write a wrapper script that loads them. Save this as, for example, $HOME/wrap.sh:

#!/bin/bash -l

module reset
module load 2024
module load Python/3.12.3

exec $myNobackup/PythonEnv/my_env/bin/python -m ipykernel_launcher "$@"

The #!/bin/bash -l line starts a login shell, which loads Lmod and sources your ~/.bash_aliases (so $myNobackup is defined). Make the wrapper executable with chmod +x $HOME/wrap.sh.

Finally point the kernel at the wrapper by editing ~/.local/share/jupyter/kernels/my_env_kernel/kernel.json:

{
 "argv": [
  "/home/WUR/user001/wrap.sh",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python my_env",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

The only difference from a plain kernel file is that argv points at the wrapper script instead of the Python executable directly.
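
If you prefer not to edit the file by hand, the same change can be scripted. The following is a minimal sketch under two assumptions: the generated kernel.json contains the usual "-f", "{connection_file}" arguments in argv, and the helper name point_kernel_at_wrapper is hypothetical. It replaces everything before "-f" with the wrapper path and leaves the other keys untouched.

```python
import json
from pathlib import Path


def point_kernel_at_wrapper(kernel_json, wrapper):
    """Rewrite argv in kernel.json so the kernel starts through `wrapper`."""
    kernel_json = Path(kernel_json)
    spec = json.loads(kernel_json.read_text())
    # Keep only the trailing launcher arguments ("-f", "{connection_file}");
    # the wrapper itself runs "python -m ipykernel_launcher".
    argv = spec["argv"]
    tail = argv[argv.index("-f"):]
    spec["argv"] = [str(wrapper), *tail]
    kernel_json.write_text(json.dumps(spec, indent=1) + "\n")
    return spec["argv"]


# Example call (paths follow this page; adjust them to your own setup):
# point_kernel_at_wrapper(
#     Path.home() / ".local/share/jupyter/kernels/my_env_kernel/kernel.json",
#     Path.home() / "wrap.sh",
# )
```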

From a conda / mamba environment

A conda or mamba environment is simpler, because the environment is self-contained. With Miniforge active and your environment created, install ipykernel into it and register the kernel:

mamba create -y -n kernel_test python=3 ipykernel
mamba activate kernel_test
python -m ipykernel install --user --name kernel_test

To remove the kernel and environment again:

jupyter kernelspec uninstall kernel_test
mamba deactivate
mamba remove -y -n kernel_test --all
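
To check which user-level kernels are registered, for instance after removing one, the kernel directory can be inspected directly. This sketch assumes the ~/.local/share/jupyter/kernels/ location mentioned on this page; the helper name list_user_kernels is illustrative.

```python
import json
from pathlib import Path


def list_user_kernels(kernels_dir):
    """Return {kernel name: first argv entry} for each kernel.json found."""
    kernels = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        # argv[0] is what Jupyter executes: a Python binary or a wrapper script.
        kernels[spec_file.parent.name] = spec["argv"][0]
    return kernels


if __name__ == "__main__":
    user_dir = Path.home() / ".local/share/jupyter/kernels"
    for name, launcher in list_user_kernels(user_dir).items():
        print(f"{name:20s} {launcher}")
```

A kernel whose launcher is a wrapper script was set up through the virtual environment route; one that points at a Python binary inside a conda environment came from the Miniforge route.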

See also