Setting up Python virtualenv
With so many Python packages available, often conflicting with each other or required in different versions depending on the application, installing and managing packages and versions is not always easy. In addition, many packages are used only occasionally, so it is questionable whether the system administrator of a centralized server or a High Performance Computing (HPC) infrastructure can be expected to resolve every issue raised by its users. Even on a local system with full administrative rights, managing versions, dependencies, and package collisions is often very difficult. The solution is to use a virtual environment, in which a specific set of packages can be installed. You can create as many virtual environments as necessary and use them side by side.
NOTE: As of Python 3.3, virtual environment support is built in (the venv module). See this page for an alternative set-up of your virtual environment if you are using Python 3.4 or higher.
Creating a new virtual environment
It is assumed that the appropriate virtualenv executable for the Python version of choice is installed. A new virtual environment, in this case called newenv, is created like so:
<source lang='bash'>
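# Load the Python version of your choice, e.g. python/2.7.12
module load python/my-favourite-version
# With the virtualenv package (Python 2 or 3):
virtualenv newenv
# OR, for Python 3.4 and later, with the built-in venv module
# (the pyvenv wrapper was deprecated in Python 3.6 and removed in 3.8):
python3 -m venv newenv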
</source>
When the new environment is created, one will see a message similar to this:
New python executable in newenv/bin/python3
Also creating executable in newenv/bin/python
Installing Setuptools.........................................................................done.
Installing Pip................................................................................done.
Activating a virtual environment
Once the environment has been created, activate it in each new shell session with the following command:
<source lang='bash'>
source newenv/bin/activate
</source>
This assumes that the folder containing the virtual environment files (in this case called newenv) is in the present working directory.
While the virtual environment is active, its name is shown in parentheses in front of the usual user@host prompt string:
(newenv)user@host:~$
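The prompt prefix is only a visual cue. In scripts it can help to check for an active environment explicitly. The sketch below relies on the VIRTUAL_ENV variable that the activate script exports; the helper name require_venv is hypothetical and not part of virtualenv.

```bash
# Sketch: report the active virtual environment, or fail if there is none.
# The activate script exports VIRTUAL_ENV; require_venv is a hypothetical helper.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment is active" >&2
        return 1
    fi
    echo "active environment: $VIRTUAL_ENV"
}
```

A job script could call require_venv || exit 1 before running any Python code, so that it stops early when the environment was not activated.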
Installing modules on the virtual environment
Installing modules works the same as usual. The difference is that modules are installed in /path/to/virtenv/lib, which may live somewhere in your home directory. While the virtual environment is active, the default pip belongs to the Python version of that environment, because the executables in /path/to/virtenv/bin come first in $PATH.
<source lang='bash'>
pip install numpy
</source>
Similarly, installing packages from source works exactly the same as usual.
<source lang='bash'>
python setup.py install
</source>
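To confirm that a command such as pip or python really comes from the active environment and not from the system, you can check where the shell finds it. This is a minimal sketch, assuming the environment was activated in the current shell so that VIRTUAL_ENV is set; the helper name in_venv_bin is hypothetical.

```bash
# Sketch: print the path of a command if it resolves to the active
# environment's bin directory, and fail otherwise.
# in_venv_bin is a hypothetical helper name.
in_venv_bin() {
    local tool_path
    # Without an active environment there is nothing to compare against.
    [ -n "${VIRTUAL_ENV:-}" ] || return 1
    # command -v reports the first match in $PATH.
    tool_path="$(command -v "$1")" || return 1
    case "$tool_path" in
        "$VIRTUAL_ENV"/bin/*) echo "$tool_path" ;;
        *) return 1 ;;
    esac
}
```

For example, in_venv_bin pip prints the path of pip inside the environment, and returns a non-zero status if pip would be taken from elsewhere.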
Deactivating a virtual environment
To leave a virtual environment, use the command deactivate, a shell function that was defined when the activate script was sourced.
<source lang='bash'>
deactivate
</source>
Virtualenv kernels in Jupyter
Want your own virtualenv kernel in a notebook? This can be done by writing your own kernel specification.
(An alternative to this manual approach, using conda, is described here.)
- Make sure the ipykernel module is installed in your venv. Activate the venv and install ipykernel with pip:
source ~/path/to/my/virtualenv/bin/activate && pip install ipykernel
- Create the following directory path in your home directory if it doesn't already exist:
mkdir -p ~/.local/share/jupyter/kernels/
- Think of a nice descriptive name that doesn't clash with one of the already present kernels. I'll use 'testing'. Create this folder:
mkdir ~/.local/share/jupyter/kernels/testing/
- Add this file to this folder:
vi ~/.local/share/jupyter/kernels/testing/kernel.json
{
  "language": "python",
  "argv": [
    "/home/myhome/path/to/my/virtualenv/bin/python",
    "-m", "ipykernel",
    "-f", "{connection_file}"
  ],
  "display_name": "testing"
}
- Reload the JupyterHub page. The kernel testing should now appear in your kernels list.
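The manual steps above can also be scripted. The following is a minimal sketch, not an official Jupyter tool: the function name write_kernel_spec and its arguments are hypothetical, and it assumes that the kernel name and the venv path contain no characters that need escaping in JSON.

```bash
# Sketch: write a kernel.json for a virtual environment.
# write_kernel_spec is a hypothetical helper, not part of Jupyter.
write_kernel_spec() {
    local name="$1"   # kernel name, e.g. testing
    local venv="$2"   # absolute path to the virtual environment
    # Optional third argument: kernels directory (defaults to the per-user one).
    local root="${3:-$HOME/.local/share/jupyter/kernels}"

    mkdir -p "$root/$name"
    # {connection_file} is left literal; Jupyter fills it in at start-up.
    cat > "$root/$name/kernel.json" <<EOF
{
  "language": "python",
  "argv": [
    "$venv/bin/python",
    "-m", "ipykernel",
    "-f", "{connection_file}"
  ],
  "display_name": "$name"
}
EOF
}
```

Calling write_kernel_spec testing /home/myhome/path/to/my/virtualenv would produce the same file as the manual steps. The ipykernel module still has to be installed in that environment.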
Kernel specifications can also do more complex things, such as constructing your own Spark environment. This relies on having the module findspark installed:
vi ~/.local/share/jupyter/kernels/mysparkkernel/kernel.json
{
  "language": "python",
  "env": {
    "SPARK_HOME": "/cm/shared/apps/spark/my-spark-version"
  },
  "argv": [
    "/home/myhome/my/spark/venv/bin/python",
    "-m", "ipykernel",
    "-c", "import findspark; findspark.init()",
    "-f", "{connection_file}"
  ],
  "display_name": "My Spark kernel"
}
(You'll want to make sure your Spark cluster has the same environment: start it after activating this venv inside your sbatch script.)
Make IPython work under virtualenv
IPython may not work initially under a virtual environment. It may produce an error message like the one below:
File "/usr/bin/ipython", line 11
    print "Could not start qtconsole. Please install ipython-qtconsole"
                                                                      ^
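This typically happens when the environment's bin directory contains ipython3 but no ipython, so the system-wide /usr/bin/ipython is picked up instead. It can be resolved by adding a soft link with the name ipython to the bin directory in the virtual environment folder.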
<source lang='bash'>
ln -s /path/to/virtenv/bin/ipython3 /path/to/virtenv/bin/ipython
</source>