Details
- Feature
- Should have
- SRCnet
- 1
- 1
- 0
- Team_DAAC
- Sprint 1
- Accepted by FO
- SRC23-PB Teal-D site-provisioning team_DAAC tests-compilation
Description
This is urgent because it will impact the hardware that needs to be procured for SRCNet v0.1 sites that want to run Kubernetes VMs in OpenStack.
Some workloads require low-latency networking to help them scale out across multiple nodes, e.g. Dask UCX, MPI, NVIDIA cuFFTMp. Some storage systems need RDMA to work efficiently.
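As an illustration of the kind of workload this affects, here is a minimal sketch (not part of the ticket) of a Dask computation using the UCX transport between workers. It assumes ucx-py is installed and that a scheduler has already been started with the UCX protocol; the scheduler address is hypothetical.

```python
# Minimal sketch: connect a Dask client over UCX instead of the default TCP
# transport. Assumes ucx-py is installed and a scheduler was started with
# "--protocol ucx"; the address below is a placeholder.
from dask.distributed import Client
import dask.array as da

client = Client("ucx://scheduler.example.internal:8786")

# A communication-heavy operation (transpose + matmul) whose inter-worker
# traffic is the sort of pattern that benefits from RDMA-capable transports.
x = da.random.random((20000, 20000), chunks=(2000, 2000))
result = (x @ x.T).sum().compute()
print(result)
```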
While it is possible to enable Ethernet RDMA networking within OpenStack VMs, and within K8s clusters running on both OpenStack VMs and OpenStack bare metal nodes, this is not well understood and it is hardware dependent.
The Cambridge Arcus OpenStack cloud (on which both CSD3 and Dawn run) currently has limited support for RDMA networking. We aim to apply the latest SR-IOV option, using OVS offloads, on the hardware where that is possible. Hopefully other OpenStack sites can help review the write-up of this work, likely through the HPC and Cloud CoP, and collaborate to document a repeatable way to apply this at other OpenStack sites.
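As a rough sketch of one way an RDMA-capable NIC can be exposed to a VM via SR-IOV, the snippet below uses openstacksdk to create a port with vnic_type=direct and attach it to an instance. The cloud, network, and server names are assumptions, and the appropriate vnic_type and provider network depend entirely on the site's Neutron/OVS offload configuration.

```python
# Minimal sketch, assuming openstacksdk and a cloud entry named "arcus" in
# clouds.yaml. Network and server names below are hypothetical examples.
import openstack

conn = openstack.connect(cloud="arcus")

# Look up the RDMA-capable provider network (name is an assumption).
network = conn.network.find_network("rdma-provider-net")

# Create a port with vnic_type=direct so Nova binds it to an SR-IOV VF.
port = conn.network.create_port(
    network_id=network.id,
    binding_vnic_type="direct",
    name="rdma-test-port",
)

# Attach the port to an existing instance (server name is an assumption).
server = conn.compute.find_server("rdma-test-vm")
conn.compute.create_server_interface(server, port_id=port.id)
print(f"Attached SR-IOV port {port.id} to {server.name}")
```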
To help measure the success of the setup within K8s VMs, StackHPC have already worked with Mellanox to test RDMA within Kubernetes using these tests, which can be reused for this work:
https://github.com/stackhpc/kube-perftest
https://www.stackhpc.com/k8s-rdma-openstack.html
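As a small helper (not part of kube-perftest itself), the sketch below uses the Kubernetes Python client to list which RDMA/SR-IOV extended resources each node advertises before running the benchmarks. The resource-name prefixes are examples only and will vary with the device plugin deployed at a given site.

```python
# Minimal sketch, assuming the official "kubernetes" Python client and a
# working kubeconfig for the test cluster. Lists extended resources (e.g.
# SR-IOV / RDMA devices exposed by a device plugin) per node.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Resource-name prefixes commonly used by RDMA / SR-IOV device plugins
# (assumed examples; adjust for the plugin actually deployed).
INTERESTING_PREFIXES = ("rdma/", "nvidia.com/", "mellanox.com/", "intel.com/")

for node in v1.list_node().items:
    allocatable = node.status.allocatable or {}
    rdma_like = {
        name: qty
        for name, qty in allocatable.items()
        if name.startswith(INTERESTING_PREFIXES)
    }
    print(node.metadata.name, rdma_like or "no RDMA/SR-IOV resources advertised")
```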
Note there is a follow-up piece of work planned, where we look at evaluating the current state of InfiniBand within OpenStack VMs. There have been many changes to UFM since this work was first done as part of the AlaSKA effort. Much of the infrastructure targeting HPC and AI at Cambridge is focused on InfiniBand, so it is critical to get access to that hardware without having to re-wire those nodes with high-speed Ethernet when resources are moved between OpenStack VMs and Dawn/CSD3. This all assumes that we do not want to give untrusted users access to bare metal hardware, due to the risks around securing the firmware on those servers. VMs have been shown to have limited overheads, and to be much easier to operate, when correctly configured.