Setting up Python virtualenv

From HPCwiki

Python packages often conflict with one another or require different versions depending on the application, so installing and controlling packages and versions is not always easy. In addition, many packages are used only occasionally, so the system administrator of a centralized server or a High Performance Computing (HPC) infrastructure cannot reasonably be expected to resolve every issue raised by its users. Even on a local system with full administrative rights, managing versions, dependencies, and package collisions is often very difficult. The solution is to use a virtual environment, in which a specific set of packages can be installed. As many virtual environments as necessary can be created and used side by side.

NOTE: as of Python 3.3, virtual environment support is built in (the venv module). See this page for an alternative set-up of your virtual environment if using Python 3.4 or higher.

Creating a new virtual environment

It is assumed that the appropriate virtualenv executable for the Python version of choice is installed. A new virtual environment, in this case called newenv, is created like so:

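module load python/my-favourite-version   # e.g. 2.7.12
virtualenv newenv
# OR, for Python 3.3 and later, use the built-in venv module
# (the pyvenv wrapper script was deprecated in Python 3.6 and removed in 3.8):
python3 -m venv newenv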
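
For Python 3.3 and later, the same step can also be driven from Python itself through the standard-library venv module. The following is a minimal sketch rather than part of the instructions above: the helper name create_env is made up for illustration, and with_pip=False is chosen only so that the example runs without ensurepip or network access (pass with_pip=True to get pip inside the environment).

```python
import os
import tempfile
import venv


def create_env(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return the path to its interpreter."""
    # venv.create is the standard-library equivalent of "python3 -m venv env_dir"
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and in bin/ elsewhere
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


if __name__ == "__main__":
    # A throwaway location is used here; replace it with e.g. "newenv"
    target = os.path.join(tempfile.mkdtemp(), "newenv")
    print("interpreter:", create_env(target))
```

The pyvenv.cfg file written into the environment folder records which base Python installation the environment was created from.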

When the new environment is created, one will see a message similar to this:

  New python executable in newenv/bin/python3
  Also creating executable in newenv/bin/python
  Installing Setuptools.........................................................................done.
  Installing Pip................................................................................done.

Activating a virtual environment

Once the environment has been created, it must be activated in every new shell session before use, with the following command:

source newenv/bin/activate

This assumes that the folder containing the virtual environment files (in this case called newenv) is in the present working directory. While the virtual environment is active, its name is shown in parentheses in front of the user-host prompt string:

  (newenv)user@host:~$
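
The prompt prefix is only a visual cue. A script can also check for itself whether it is being run by an interpreter that belongs to a virtual environment. The sketch below is one common way to do this, not something the set-up above requires; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv (and recent virtualenv releases) keep the underlying installation in
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtualenv())
```

Run with the python of an activated environment, sys.prefix should point at the environment folder (newenv in the example above).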

Installing modules on the virtual environment

Installing modules works the same as usual. The difference is that modules end up in /path/to/virtenv/lib, which may live somewhere in your home directory. While the virtual environment is active, the default pip belongs to the Python version of that environment, because the executables in /path/to/virtenv/bin come first in $PATH.

pip install numpy

Similarly, installing packages from source works exactly the same as usual.

python setup.py install
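
When it is unclear which pip or which lib directory is in effect, the interpreter can report this itself. The following is a small diagnostic sketch using only the standard library; the function name install_locations is made up for illustration.

```python
import shutil
import sys
import sysconfig


def install_locations():
    """Report the running interpreter, the first pip on PATH, and the install directory."""
    return {
        "python": sys.executable,
        "pip_on_path": shutil.which("pip"),  # None when no pip is found on PATH
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in install_locations().items():
        print(key, "=", value)
```

Inside an activated environment, all three paths would be expected to lie under the environment folder.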

Deactivating a virtual environment

To leave a virtual environment, use the command deactivate, which was defined by the source command when the virtual environment was activated.

deactivate

Virtualenv kernels in Jupyter

Want your own virtualenv kernel in a notebook? This can be done by making your own kernel specification:

(An alternative to this manual approach, using conda, is described here.)

  • Make sure you have the ipykernel module in your venv. Activate the venv and install ipykernel with pip:
source ~/path/to/my/virtualenv/bin/activate && pip install ipykernel
  • Create the following directory path in your homedir if it doesn't already exist:
mkdir -p ~/.local/share/jupyter/kernels/
  • Think of a nice descriptive name that doesn't clash with one of the already present kernels. I'll use 'testing'. Create this folder:
mkdir ~/.local/share/jupyter/kernels/testing/
  • Create the file kernel.json in this folder with the following content:
vi ~/.local/share/jupyter/kernels/testing/kernel.json
{
 "language": "python",
 "argv": [
  "/home/myhome/path/to/my/virtualenv/bin/python",
  "-m",
  "ipykernel",
  "-f",
  "{connection_file}"
 ],
 "display_name": "testing"
}
  • Reload the JupyterHub page. The kernel testing should now appear in your kernels list.
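
The kernel specification can also be generated instead of typed by hand, which avoids JSON syntax slips. The sketch below writes the same structure as the file shown in the steps; the function name write_kernel_spec and its parameters are hypothetical, and the demo writes to a temporary directory rather than to ~/.local/share/jupyter/kernels.

```python
import json
import os
import tempfile


def write_kernel_spec(kernels_dir, name, python_path, env=None):
    """Write <kernels_dir>/<name>/kernel.json and return the path of the file."""
    spec = {
        "language": "python",
        "argv": [python_path, "-m", "ipykernel", "-f", "{connection_file}"],
        "display_name": name,
    }
    if env:
        spec["env"] = dict(env)  # optional environment variables for the kernel
    kernel_dir = os.path.join(kernels_dir, name)
    os.makedirs(kernel_dir, exist_ok=True)
    spec_path = os.path.join(kernel_dir, "kernel.json")
    with open(spec_path, "w") as fh:
        json.dump(spec, fh, indent=1)
    return spec_path


if __name__ == "__main__":
    # For real use, kernels_dir would be ~/.local/share/jupyter/kernels
    demo_dir = tempfile.mkdtemp()
    print(write_kernel_spec(demo_dir, "testing",
                            "/home/myhome/path/to/my/virtualenv/bin/python"))
```

ipykernel also ships its own installer, python -m ipykernel install --user --name testing, which writes a comparable specification for the interpreter it is run with.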

You can do more complex things with this, such as constructing your own Spark environment. This relies on having the findspark module installed in the venv:

vi ~/.local/share/jupyter/kernels/mysparkkernel/kernel.json
{
 "language": "python",
 "env": {
   "SPARK_HOME":
     "/shared/apps/spark/my-spark-version"
 },
 "argv": [
  "/home/myhome/my/spark/venv/bin/python",
  "-m",
  "ipykernel",
  "-c", "import findspark; findspark.init()",
  "-f",
  "{connection_file}"
 ],
 "display_name": "My Spark kernel"
}

(You'll want to make sure your Spark cluster has the same environment: start it after activating this venv inside your sbatch script.)
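
If a custom kernel does not show up or fails to start, the usual suspects are a wrong interpreter path, a missing {connection_file} placeholder, or a SPARK_HOME that does not exist. The following is a minimal checking sketch under those assumptions; check_kernel_spec is a hypothetical helper, not part of Jupyter.

```python
import json
import os
import sys
import tempfile


def check_kernel_spec(spec_path):
    """Return a list of human-readable problems found in a kernel.json file."""
    with open(spec_path) as fh:
        spec = json.load(fh)
    problems = []
    argv = spec.get("argv", [])
    # The first argv entry must be an existing, executable interpreter
    if not argv or not (os.path.isfile(argv[0]) and os.access(argv[0], os.X_OK)):
        problems.append("interpreter not found or not executable: %r" % (argv[:1],))
    if "{connection_file}" not in argv:
        problems.append("argv lacks the {connection_file} placeholder")
    spark_home = spec.get("env", {}).get("SPARK_HOME")
    if spark_home is not None and not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is not a directory: %s" % spark_home)
    return problems


if __name__ == "__main__":
    # Demo with a throwaway spec that points at the running interpreter
    demo = os.path.join(tempfile.mkdtemp(), "kernel.json")
    with open(demo, "w") as fh:
        json.dump({"argv": [sys.executable, "-m", "ipykernel",
                            "-f", "{connection_file}"]}, fh)
    print(check_kernel_spec(demo) or "no problems found")
```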

Make IPython work under virtualenv

IPython may not work initially under a virtual environment. It may produce an error message like the one below:

    File "/usr/bin/ipython", line 11
    print "Could not start qtconsole. Please install ipython-qtconsole"
                                                                      ^

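The traceback points at the system-wide /usr/bin/ipython, a script written in Python 2 syntax, rather than at an ipython inside the virtual environment. This can be resolved by adding a soft link with the name ipython to the bin directory in the virtual environment folder, assuming ipython3 is present there: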

ln -s /path/to/virtenv/bin/ipython3 /path/to/virtenv/bin/ipython
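
The same link can be created from Python in a way that is safe to repeat. This is a sketch only: link_ipython is a made-up helper, and unlike the ln command above it uses a relative link target, so the link keeps working if the environment folder is referred to through a different path.

```python
import os
import tempfile


def link_ipython(bin_dir):
    """Create bin_dir/ipython as a symlink to ipython3 in the same directory, if missing."""
    link = os.path.join(bin_dir, "ipython")
    if os.path.lexists(link):
        return False  # leave an existing file or link untouched
    os.symlink("ipython3", link)  # relative target, resolved inside bin_dir
    return True


if __name__ == "__main__":
    # Demo on a throwaway directory; for real use pass /path/to/virtenv/bin
    demo_bin = tempfile.mkdtemp()
    open(os.path.join(demo_bin, "ipython3"), "w").close()
    print("link created:", link_ipython(demo_bin))
```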
