JupyterHub with GPU

Latest revision as of 12:34, 25 October 2023

This page describes how to create a JupyterHub instance with GPU support enabled.

Setup

  • Connect to the login node of Anunna: https://wiki.anunna.wur.nl/index.php/Log_in_to_Anunna
  • Install miniconda: https://wiki.anunna.wur.nl/index.php/Running_Snakemake_pipelines#Installation

Link lustre path to home directory

When working from JupyterHub, the default working directory is your home folder. However, it is recommended to keep your data and code on the Lustre file system. To make access easier, we can create a symbolic link to Lustre from our home directory:

ln -s /lustre/[path to your lustre folder] [reference name, for example lustre_folders]

To remove a link:

rm [reference name, for example lustre_folders]
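As a concrete sketch (the Lustre path and link name below are placeholders; substitute your own):

```shell
# Hypothetical example: link a Lustre project folder into the home directory
ln -s /lustre/scratch/myuser/project project_data

# Inspect where the link points
readlink project_data

# Removing the link leaves the target data untouched
rm project_data
```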

Create a conda environment that can be used as a Jupyter kernel

conda create -y -n kernel_test python=3.10 ipykernel 
conda activate kernel_test
python -m ipykernel install --user --name kernel_test

NOTE: You can specify the Python version for your conda environment with python=3.10. Please take care that the Python version is compatible with your required packages.
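Once the new kernel is selected in a notebook, you can confirm that it is running the environment's interpreter. This is a generic Python check, not specific to Anunna:

```python
import sys

# The interpreter path should point into the conda environment,
# e.g. .../envs/kernel_test/bin/python when the kernel_test kernel is active
print(sys.executable)
print(sys.version)
```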

Install required packages

Installation information can be found on the PyTorch and TensorFlow websites.

As an example, the following installs PyTorch with CUDA 11.8 support:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Start jupyter notebook with GPU

Go to https://notebook.anunna.wur.nl and select:

  • Select a location for your server: on the cluster (default option)
  • Partition to use: gpu
  • Memory (in MB): desired memory
  • Number of CPUs: desired CPU count
  • Maximum execution time (hours:minutes:seconds): how long the notebook remains available
  • Extra options: --gres=gpu:1 (default when selecting the GPU partition; use --gres=gpu:x to request x GPUs)
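Inside a running notebook you can inspect the job environment to see what was actually granted. A minimal sketch, assuming the standard SLURM/CUDA environment variables are set in the job (they will print None outside a SLURM allocation):

```python
import os

# CUDA_VISIBLE_DEVICES lists the GPU indices granted by --gres=gpu:x
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
# Standard SLURM variables describing the allocation
print("SLURM_JOB_PARTITION:", os.environ.get("SLURM_JOB_PARTITION"))
print("SLURM_CPUS_ON_NODE:", os.environ.get("SLURM_CPUS_ON_NODE"))
```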

Using multiple GPUs

  • When starting JupyterHub, request multiple GPUs in the extra options field: --gres=gpu:x, where x is the number of requested GPUs.
  • Multiple GPUs should then be available to the JupyterHub notebook. Verify this with the GPU tests in the following section.
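The quickest check is to enumerate the devices PyTorch can see. A sketch, guarded so it also runs in an environment without PyTorch installed:

```python
import importlib.util

if importlib.util.find_spec("torch") is not None:
    import torch
    n = torch.cuda.device_count()
    print("GPUs visible to torch:", n)  # should match the x in --gres=gpu:x
    for i in range(n):
        print(i, torch.cuda.get_device_name(i))
else:
    print("torch is not installed in this environment")
```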

Test GPU availability

Pytorch

import torch


def get_version():
    # Report the installed PyTorch version and the CUDA version it was built with
    print('>>>> torch.__version__')
    print(torch.__version__, '\n')

    print('>>>> torch.version.cuda')
    print(torch.version.cuda, '\n')


def check_all_cuda_devices():
    device_count = torch.cuda.device_count()
    for i in range(device_count):
        print('>>>> torch.cuda.device({})'.format(i))
        result = torch.cuda.device(i)
        print(result, '\n')

        print('>>>> torch.cuda.get_device_name({})'.format(i))
        result = torch.cuda.get_device_name(i)
        print(result, '\n')


def check_cuda():
    print('>>>> torch.cuda.is_available()')
    result = torch.cuda.is_available()
    print(result, '\n')

    print('>>>> torch.cuda.device_count()')
    result = torch.cuda.device_count()
    print(result, '\n')

    print('>>>> torch.cuda.current_device()')
    result = torch.cuda.current_device()
    print(result, '\n')

    print('>>>> torch.cuda.device(0)')
    result = torch.cuda.device(0)
    print(result, '\n')

    print('>>>> torch.cuda.get_device_name(0)')
    result = torch.cuda.get_device_name(0)
    print(result, '\n')

    check_all_cuda_devices()


def check_cuda_ops():
    print('>>>> torch.zeros(2, 3)')
    zeros = torch.zeros(2, 3)
    print(zeros, '\n')

    print('>>>> torch.zeros(2, 3).cuda()')
    cuda_zero = torch.zeros(2, 3).cuda()
    print(cuda_zero, '\n')

    print('>>>> torch.tensor([[1, 2, 3], [4, 5, 6]])')
    tensor_a = torch.tensor([[1, 2, 3], [4, 5, 6]]).cuda()
    print(tensor_a, '\n')

    print('>>>> tensor_a + cuda_zero')
    sum = tensor_a + cuda_zero
    print(sum, '\n')

    print('>>>> tensor_a * cuda_twos')
    tensor_a = tensor_a.to(torch.float)
    cuda_zero = cuda_zero.to(torch.float)
    cuda_twos = (cuda_zero + 1.0) * 2.0
    product = tensor_a * cuda_twos
    print(product, '\n')

    print('>>>> torch.matmul(tensor_a, cuda_twos.T)')
    mat_mul = torch.matmul(tensor_a, cuda_twos.T)
    print(mat_mul, '\n')

try:
    get_version()
except Exception as e:
    print('get_version() failed, exception message below:')
    print(e)

try:
    check_cuda()
except Exception as e:
    print('check_cuda() failed, exception message below:')
    print(e)

try:
    check_cuda_ops()
except Exception as e:
    print('check_cuda_ops() failed, exception message below:')
    print(e)

Tensorflow

import tensorflow as tf

hasGPUSupport = tf.test.is_built_with_cuda()
gpuList = tf.config.list_physical_devices('GPU')

print("Tensorflow Compiled with CUDA/GPU Support:", hasGPUSupport)
print("Tensorflow can access", len(gpuList), "GPU(s)")
print("Accessible GPUs are:")
print(gpuList)

tf.debugging.set_log_device_placement(True)
# Place tensors on the GPU
with tf.device('/GPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Run on the GPU
c = tf.matmul(a, b)
print(c)