This is a multi-container Slurm cluster using Kubernetes. The Helm chart creates a named volume for persistent storage of MySQL data files as well as an NFS volume for shared storage.
Requires:

* A Kubernetes cluster
* `kubectl`, configured with a suitable `kubeconfig` file
* Helm
The Helm chart will run the following containers:

* login
* slurmctld
* slurmd
* slurmdbd
* mysql
The Helm chart will create the following named volumes:

* `var-lib-mysql`, mounted to `/var/lib/mysql` for persistent storage of MySQL data files
A named RWM (ReadWriteMany) volume mounted to `/home` is also expected; this can be external or can be deployed using the scripts in the `/nfs` directory (see “Deploying the Cluster”).
All config files in `slurm-cluster-chart/files` will be mounted into the containers to configure their respective services on startup. Note that changes to these files will not all be propagated to existing deployments (see “Reconfiguring the Cluster”).
Additional parameters can be found in the `values.yaml` file, which will be applied on a Helm chart deployment. Note that some of these values will also not propagate until the cluster is restarted (see “Reconfiguring the Cluster”).
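Values can also be overridden at deployment time without editing the chart defaults; as a sketch, using a hypothetical overrides file named `overrides.yaml`:

```console
# Supply an overrides file when installing or upgrading the release
# ("overrides.yaml" is a hypothetical file name)
helm install <deployment-name> slurm-cluster-chart -f overrides.yaml
```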
On initial deployment ONLY, run

```console
./generate-secrets.sh
```

This generates a set of secrets. If these need to be regenerated, see “Reconfiguring the Cluster”.
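To check what was created, one option (assuming `kubectl` is already pointed at the target namespace) is to list the secrets:

```console
kubectl get secrets
```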
An RWM volume is required. If a suitable named volume already exists, set `nfs.claimName` in the `values.yaml` file to its name. If not, manifests to deploy a Rook NFS volume are provided in the `/nfs` directory. You can deploy this by running

```console
/nfs/deploy-nfs.sh
```

and leaving `nfs.claimName` as the provided value.
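As a sketch, assuming an existing RWM claim named `home-nfs` (a hypothetical name), you could confirm it exists and pass its name to the chart without editing `values.yaml`:

```console
# Check that the claim exists and supports ReadWriteMany access
kubectl get pvc
# Point the chart at it ("home-nfs" is a hypothetical claim name)
helm install <deployment-name> slurm-cluster-chart --set nfs.claimName=home-nfs
```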
To access the cluster via `ssh`, you will need to make your public keys available. Do this by running

```console
./publish-keys.sh
```
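If you do not already have an SSH key pair, you can generate one first, for example:

```console
# Generate an SSH key pair (accept the default path when prompted)
ssh-keygen -t ed25519
./publish-keys.sh
```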
After configuring `kubectl` with the appropriate `kubeconfig` file, deploy the cluster using the Helm chart:

```console
helm install <deployment-name> slurm-cluster-chart
```
Subsequent releases can be deployed using:

```console
helm upgrade <deployment-name> slurm-cluster-chart
```
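Either way, a quick sanity check that the release has come up (not part of the chart itself) is:

```console
kubectl get pods
helm status <deployment-name>
```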
Retrieve the external IP address of the login node using:

```console
LOGIN=$(kubectl get service login -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
```
and connect to the cluster as the `rocky` user with

```console
ssh rocky@$LOGIN
```
From the shell, execute Slurm commands, for example:

```console
[root@slurmctld /]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 5-00:00:00      2   idle c[1-2]
```
The Intel MPI Benchmarks are included in the containers. These can be run both with `mpirun` and `srun`. Example job scripts:
* srun

```console
#!/usr/bin/env bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
echo $SLURM_JOB_ID: $SLURM_JOB_NODELIST
srun /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 pingpong
```
* mpirun
```console
#!/usr/bin/env bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
echo $SLURM_JOB_ID: $SLURM_JOB_NODELIST
/usr/lib64/openmpi/bin/mpirun --prefix /usr/lib64/openmpi mpitests-IMB-MPI1 pingpong
```
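Assuming one of the above scripts is saved on the cluster as, say, `imb.sh` (a hypothetical file name), it can be submitted and monitored with the usual Slurm commands:

```console
# Submit the job script and note the job ID it reports
sbatch imb.sh
# Check the queue; once the job finishes, output lands in slurm-<jobid>.out by default
squeue
cat slurm-<jobid>.out
```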
Note: The mpirun script assumes you are running as user ‘rocky’. If you are running as root, you will need to include the `--allow-run-as-root` argument.
To guarantee changes to config files are propagated to the cluster, use

```console
kubectl rollout restart deployment <deployment-names>
```
Generally, restarts to `slurmd`, `slurmctld`, `login` and `slurmdbd` will be required.
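You can confirm a restart has completed with `kubectl rollout status`, for example:

```console
kubectl rollout status deployment slurmctld
```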
Regenerate secrets by rerunning

```console
./generate-secrets.sh
```
Some secrets are persisted in volumes, so cycling them requires a full teardown and recreation of those volumes and of the pods they are mounted on. Run

```console
kubectl delete deployment mysql
kubectl delete pvc var-lib-mysql
helm upgrade <deployment-name> slurm-cluster-chart
```
and then restart the other dependent deployments to propagate changes:

```console
kubectl rollout restart deployment slurmd slurmctld login slurmdbd
```