Compute Nodes

The compute nodes are where your jobs actually run. Unlike the login nodes, you do not log in to a compute node to do your work. Instead, you reach it through the Slurm scheduler, which places your job on a node that matches the resources you request.

For what the different node types offer (CPU, NVIDIA GPU, AMD GPU) and their hardware, see Compute Hardware Overview; for how jobs are placed, see Scheduler Overview (Slurm).

Running work on a compute node

  • For a non-interactive job, submit a batch script — see Batch Jobs.
  • For an interactive session — a shell on a compute node — use sinteractive; see Interactive Jobs.

Connecting directly to a compute node

Once you have an interactive job running, you can connect to the node it landed on directly from your own machine — handy for tools such as VSCode Remote.

First, start an interactive job:

sinteractive

or, for an NVIDIA A100 GPU:

sinteractive -p gpu --gres=gpu:1 --constraint='nvidia&A100'

Check which node it landed on:

squeue -u $USER -l
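
If you only need the node name, the long listing can be trimmed. The following is a minimal sketch: the helper name my_job_nodes is made up for illustration, while -h (no header), -t (state filter) and -o (output format, with %i for the job ID and %N for the node list) are standard squeue options.

```shell
# Hypothetical helper: print "<job id> <node>" for each of your running jobs.
# -h drops the header line, -t RUNNING hides pending jobs,
# -o selects the columns (%i = job ID, %N = node list).
my_job_nodes() {
    squeue -u "$USER" -h -t RUNNING -o '%i %N'
}

# Usage, on a login node:
#   my_job_nodes
```

The node name in the second column is the one to use when connecting.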

Add the following to your local ~/.ssh/config so SSH jumps via the login node:

Host node* gpu*
    Hostname       %h.internal.anunna.wur.nl
    ProxyJump      login.anunna.wur.nl
    User           your_user_id

Host *.anunna.wur.nl
    User           your_user_id

You can now connect to the node directly from your local terminal, for example:

ssh node201
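
If the connection fails, it can help to check how SSH resolves the node name before troubleshooting further. The sketch below assumes a reasonably recent OpenSSH client, whose -G option prints the effective configuration for a host without connecting. The helper name show_ssh_route is hypothetical.

```shell
# Hypothetical helper: show the hostname, user and jump host that ssh
# would use for a given host alias. ssh -G only resolves the
# configuration and does not open a connection.
show_ssh_route() {
    ssh -G "$1" | grep -i -E '^(hostname|user|proxyjump) '
}

# Usage, on your own machine:
#   show_ssh_route node201
```

With the configuration above in place, the output should list the full internal hostname of the node and the login node as the jump host.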

In VSCode, open a remote window, choose Connect to Host…, and type the node name. After a few seconds VSCode is connected to the node.

Be aware that once the interactive job finishes, the SSH connection is lost and any of your processes still running on the node are killed. Everything you run over this connection is also limited to the amount of memory you requested for the job.
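
To see how long a session has left and how much memory it was given, you can ask the scheduler from a login node. This is a sketch only: the helper name my_job_limits is made up, while %L (time left) and %m (minimum memory requested) are standard squeue format fields.

```shell
# Hypothetical helper: print job ID, node, remaining time and requested
# memory for each of your running jobs.
# %i = job ID, %N = node list, %L = time left, %m = requested memory.
my_job_limits() {
    squeue -u "$USER" -h -t RUNNING -o '%i %N %L %m'
}

# Usage, on a login node:
#   my_job_limits
```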

Logging on to a worker node for inspection

You may SSH from a login node to a worker node (that is, a compute node) that is running one of your jobs, to get more insight into what the job is doing. This does not require a password:

ssh node049

This is only for inspecting your own running jobs — running work on a node outside the scheduler is not permitted.
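
Once on the node, ordinary process tools are usually enough for a first look. The sketch below assumes the standard ps utility is available on the worker nodes. The helper name my_processes is hypothetical.

```shell
# Hypothetical helper: list your own processes on the current node with
# their CPU share, memory share, elapsed time and full command line.
my_processes() {
    ps -u "$(id -un)" -o pid,pcpu,pmem,etime,args
}

# Usage, on the worker node:
#   my_processes
```

A process that shows close to zero CPU for a long elapsed time may be waiting on input or I/O rather than computing.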
