JupyterHub with GPU
Create a jupyterhub instance with GPU support enabled.
Setup
Link lustre path to home directory
When working from JupyterHub, the default working directory is your home folder. However, it is recommended to keep your data and code on lustre. To make this easier, create a symbolic link from your home directory to your lustre folder:
ln -s /lustre/[path to your lustre folder] [reference name, for example lustre_folders]
To remove a link:
rm [reference name, for example lustre_folders]
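The same link can also be created and removed from Python, for instance from a notebook cell. The following is a minimal sketch that uses temporary directories as stand-ins for the real lustre folder and home directory, so the paths and the name lustre_folders are placeholders rather than actual cluster paths.

```python
import os
import tempfile
from pathlib import Path

# Stand-ins for the real locations; on the cluster these would be your
# lustre folder and your home directory (placeholders, not real paths).
target = Path(tempfile.mkdtemp(prefix="lustre_demo_"))
home = Path(tempfile.mkdtemp(prefix="home_demo_"))
link = home / "lustre_folders"

# Equivalent of: ln -s <target> <link>
os.symlink(target, link, target_is_directory=True)
link_created = link.is_symlink() and link.resolve() == target.resolve()
print("link points to target:", link_created)

# Equivalent of: rm <link>; only the link is removed, the target stays.
link.unlink()
target_survives = target.is_dir() and not link.exists()
print("target still exists after removing link:", target_survives)
```

Removing the link never touches the data it points to, which is why a plain rm (or unlink) is safe here.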
Create conda environment that we can use for a jupyter kernel
conda create -y -n kernel_test python=3.10 ipykernel
conda activate kernel_test
python -m ipykernel install --user --name kernel_test
NOTE: You can specify the Python version for your conda environment with the python= argument (python=3.10 in the example above). Make sure the Python version you choose is compatible with your required packages.
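Once the kernel is registered, it can help to confirm from inside a notebook that the selected kernel really runs in the intended environment. A minimal sketch using only the standard library follows; the name kernel_test comes from the example above, and the check assumes a default conda layout in which the environment name appears in the interpreter prefix, which may not hold for every setup.

```python
import sys


def describe_interpreter(expected_env="kernel_test"):
    """Report which Python the current kernel runs on."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Heuristic: default conda layouts put the env name in the prefix path.
        "looks_like_expected_env": expected_env in sys.prefix,
    }


info = describe_interpreter()
for key, value in info.items():
    print(key, "->", value)
```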
Install required packages
For pytorch you can find information here and for TensorFlow here.
As an example I use the following pytorch installation:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
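After installation, a quick look at which packages and versions ended up in the environment can save debugging time later. The sketch below uses the standard-library importlib.metadata and reports a package as missing instead of raising; the helper name installed_version is illustrative only.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "torchvision", "torchaudio"):
    version = installed_version(name)
    print(name, "->", version if version is not None else "not installed")
```

For wheels installed from the cu118 index, the torch version string typically carries a +cu118 suffix, which is a quick hint that a CUDA build was picked up.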
Start jupyter notebook with GPU
Go here and select:
- Select a location for your server: on the cluster (default option)
- Partition to use: gpu
- Memory (in MB): desired memory
- Number of CPUs: desired CPU count
- Maximum execution time (hours:minutes:seconds): maximum amount of time the notebook is available
- Extra options: --gres=gpu:1 (default when selecting GPU; use gpu:x to request x GPUs)
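The memory and time fields expect specific units and formats (MB and hours:minutes:seconds). As a small convenience, the values can be computed rather than typed by hand. The helpers below are hypothetical and not part of JupyterHub or the cluster software; they assume 1 GB = 1024 MB.

```python
def memory_mb(gigabytes):
    """Convert GB to MB, assuming 1 GB = 1024 MB."""
    return int(gigabytes * 1024)


def time_limit(hours=0, minutes=0):
    """Format a duration as hours:minutes:seconds for the time field."""
    total_minutes = hours * 60 + minutes
    return "{}:{:02d}:00".format(total_minutes // 60, total_minutes % 60)


def gres_option(gpu_count=1):
    """Build the extra-options string for the requested number of GPUs."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return "--gres=gpu:{}".format(gpu_count)


print(memory_mb(16))
print(time_limit(hours=1, minutes=90))
print(gres_option(2))
```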
Using multiple GPUs
- Request multiple GPUs when starting JupyterHub by setting the extra options field to --gres=gpu:x, where x is the number of GPUs requested
- The notebook should then have multiple GPUs available. Verify this with the GPU tests in the following section.
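On many Slurm-managed clusters, the GPUs granted through --gres are exposed to the job via the CUDA_VISIBLE_DEVICES environment variable. Assuming this cluster behaves that way (the --gres option suggests Slurm, but this is not confirmed here), a quick framework-independent check from a notebook could look like the sketch below. The helper name visible_gpu_ids is illustrative.

```python
import os


def visible_gpu_ids(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or [] if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


gpu_ids = visible_gpu_ids()
print("GPUs visible to this session:", len(gpu_ids), gpu_ids)
```

An empty list only means the variable is unset or empty; the framework-level tests remain the more reliable check.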
Test GPU availability
PyTorch
import torch


def get_version():
    # Report the PyTorch version and the CUDA version it was built against
    print('>>>> torch.__version__')
    print(torch.__version__, '\n')
    print('>>>> torch.version.cuda')
    print(torch.version.cuda, '\n')


def check_all_cuda_devices():
    # Print the device object and name of every visible GPU
    device_count = torch.cuda.device_count()
    for i in range(device_count):
        print('>>>> torch.cuda.device({})'.format(i))
        result = torch.cuda.device(i)
        print(result, '\n')
        print('>>>> torch.cuda.get_device_name({})'.format(i))
        result = torch.cuda.get_device_name(i)
        print(result, '\n')


def check_cuda():
    # Basic availability checks, followed by a per-device listing
    print('>>>> torch.cuda.is_available()')
    result = torch.cuda.is_available()
    print(result, '\n')
    print('>>>> torch.cuda.device_count()')
    result = torch.cuda.device_count()
    print(result, '\n')
    print('>>>> torch.cuda.current_device()')
    result = torch.cuda.current_device()
    print(result, '\n')
    print('>>>> torch.cuda.device(0)')
    result = torch.cuda.device(0)
    print(result, '\n')
    print('>>>> torch.cuda.get_device_name(0)')
    result = torch.cuda.get_device_name(0)
    print(result, '\n')
    check_all_cuda_devices()


def check_cuda_ops():
    # Run a few tensor operations on the GPU
    print('>>>> torch.zeros(2, 3)')
    zeros = torch.zeros(2, 3)
    print(zeros, '\n')
    print('>>>> torch.zeros(2, 3).cuda()')
    cuda_zero = torch.zeros(2, 3).cuda()
    print(cuda_zero, '\n')
    print('>>>> torch.tensor([[1, 2, 3], [4, 5, 6]])')
    tensor_a = torch.tensor([[1, 2, 3], [4, 5, 6]]).cuda()
    print(tensor_a, '\n')
    print('>>>> tensor_a + cuda_zero')
    total = tensor_a + cuda_zero
    print(total, '\n')
    print('>>>> tensor_a * cuda_twos')
    tensor_a = tensor_a.to(torch.float)
    cuda_zero = cuda_zero.to(torch.float)
    cuda_twos = (cuda_zero + 1.0) * 2.0
    product = tensor_a * cuda_twos
    print(product, '\n')
    print('>>>> torch.matmul(tensor_a, cuda_twos.T)')
    mat_mul = torch.matmul(tensor_a, cuda_twos.T)
    print(mat_mul, '\n')


# Run each check separately so that one failure does not hide the others
try:
    get_version()
except Exception as e:
    print('get_version() failed, exception message below:')
    print(e)

try:
    check_cuda()
except Exception as e:
    print('check_cuda() failed, exception message below:')
    print(e)

try:
    check_cuda_ops()
except Exception as e:
    print('check_cuda_ops() failed, exception message below:')
    print(e)
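When the same notebook may run both with and without a GPU, a common pattern is to select the device once and fall back to the CPU. The sketch below is one way to do that; pick_device is a hypothetical helper name, and the import is guarded so the cell also runs in a kernel without PyTorch.

```python
def pick_device():
    """Return 'cuda' when PyTorch can use a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this kernel; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)
```

With PyTorch installed, tensors and models can then be moved with .to(device) instead of calling .cuda() directly.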
TensorFlow
import tensorflow as tf

# Check whether TensorFlow was built with CUDA and which GPUs it can see
hasGPUSupport = tf.test.is_built_with_cuda()
gpuList = tf.config.list_physical_devices('GPU')
print("Tensorflow Compiled with CUDA/GPU Support:", hasGPUSupport)
print("Tensorflow can access", len(gpuList), "GPU")
print("Accessible GPUs are:")
print(gpuList)

# Log which device each operation is placed on
tf.debugging.set_log_device_placement(True)

# Place tensors on the GPU
with tf.device('/device:GPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Run on the GPU
c = tf.matmul(a, b)
print(c)