Details
Feature
Must have
SRCnet
3.5
3.5
0
Team_RED
Sprint 5
24.3
Stories Completed, Outcomes Reviewed
SRC23-PB SRCNet0.1 science-platform-services
Description
This Feature builds on the lessons learned from SP-3885, with an emphasis on low-level scheduling at the Kubernetes level.
This involves having Kueue manage the execution of Jobs on the cluster and prioritise interactive jobs over headless (batch) jobs.
Single Kueue
An additional queue may be needed to support preemption, so that interactive Jobs can launch without waiting for resources.
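One way the additional queue could be expressed in Kueue is sketched below, under stated assumptions: two ClusterQueues share a cohort, so headless Jobs can borrow idle quota, and the interactive queue's `reclaimWithinCohort: Any` policy lets it preempt borrowing workloads to get that quota back. The queue names, namespace, quotas and the `default-flavor` ResourceFlavor are hypothetical; the manifests are built as Python dicts and printed as JSON only to keep the example self-contained, whereas in practice they would live in the Helm chart templates.

```python
import json

# Hypothetical cohort, namespace and quota values, for illustration only.
COHORT = "skaha-cohort"
NAMESPACE = "skaha-workload"


def cluster_queue(name: str, cpu: str, memory: str, reclaim: str) -> dict:
    """ClusterQueue in a shared cohort so idle quota can be borrowed and reclaimed."""
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "cohort": COHORT,
            "namespaceSelector": {},  # empty selector matches every namespace
            "resourceGroups": [{
                "coveredResources": ["cpu", "memory"],
                "flavors": [{
                    "name": "default-flavor",  # assumes this ResourceFlavor exists
                    "resources": [
                        {"name": "cpu", "nominalQuota": cpu},
                        {"name": "memory", "nominalQuota": memory},
                    ],
                }],
            }],
            "preemption": {
                # How this queue may take back quota that other queues in the
                # cohort have borrowed: "Never", "LowerPriority" or "Any".
                "reclaimWithinCohort": reclaim,
            },
        },
    }


def local_queue(name: str, cluster_queue_name: str) -> dict:
    """Namespaced LocalQueue that Jobs select via the kueue.x-k8s.io/queue-name label."""
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "LocalQueue",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {"clusterQueue": cluster_queue_name},
    }


def manifests() -> dict:
    objects = [
        # Interactive sessions may preempt headless Jobs that borrowed their quota.
        cluster_queue("interactive-cq", "32", "128Gi", reclaim="Any"),
        # Headless Jobs never preempt other queues to reclaim quota.
        cluster_queue("headless-cq", "32", "128Gi", reclaim="Never"),
        local_queue("interactive", "interactive-cq"),
        local_queue("headless", "headless-cq"),
    ]
    # A v1 List lets kubectl apply all objects from one JSON document.
    return {"apiVersion": "v1", "kind": "List", "items": objects}


if __name__ == "__main__":
    print(json.dumps(manifests(), indent=2))
```

Because neither ClusterQueue sets a `borrowingLimit`, the headless queue can in principle borrow all idle interactive quota; whether to cap that is a tuning decision for the Helm chart.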
The Job properties and labels this requires are already supported by skaha Jobs.
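For reference, a minimal sketch of the labels Kueue reads from a Job: `kueue.x-k8s.io/queue-name` selects the LocalQueue, and the optional `kueue.x-k8s.io/priority-class` selects a WorkloadPriorityClass. The helper `job_manifest`, the namespace, image and queue names are hypothetical and do not reflect how skaha actually builds its Jobs. Passing `queue=None` omits the queue label; with Kueue's default `manageJobsWithoutQueueName: false`, such a Job is ignored by Kueue and left to the native Kubernetes scheduler.

```python
import json
from typing import Optional

# Label keys that Kueue reads from a Job.
QUEUE_LABEL = "kueue.x-k8s.io/queue-name"
PRIORITY_LABEL = "kueue.x-k8s.io/priority-class"


def job_manifest(name: str, image: str, queue: Optional[str],
                 priority_class: Optional[str] = None,
                 namespace: str = "skaha-workload") -> dict:
    """Build a batch/v1 Job (hypothetical helper); queue=None leaves it unmanaged by Kueue."""
    labels = {}
    if queue is not None:
        labels[QUEUE_LABEL] = queue
    if priority_class is not None:
        labels[PRIORITY_LABEL] = priority_class
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            # Kueue-managed Jobs start suspended and are unsuspended on admission.
            # An unmanaged Job must not be suspended or it would never start.
            "suspend": queue is not None,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "main",
                        "image": image,
                        # Kueue charges admitted workloads against quota using requests.
                        "resources": {"requests": {"cpu": "1", "memory": "2Gi"}},
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    headless = job_manifest("batch-1", "example.org/skaha/batch:latest",
                            queue="headless", priority_class="headless-priority")
    print(json.dumps(headless, indent=2))
```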
This will require Helm chart changes.
Dynamic queue creation at deployment time is out of scope for this story.
There are two possible approaches to ensuring that interactive jobs get resource priority over batch (headless) jobs:
- Use a Kueue WorkloadPriorityClass to prioritise jobs. This requires at least two workload priority classes, each with a priority value; a job with a higher value takes priority over one with a lower value (for example, 90000 takes precedence over 50000). Constraint: all jobs must be scheduled on a single queue.
- Schedule headless jobs through Kueue while interactive jobs use the native Kubernetes scheduler directly. The resulting behaviour needs to be observed and evaluated.
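The first approach can be illustrated with a small sketch. It emits two WorkloadPriorityClass objects using the example values 90000 and 50000, and models how pending workloads in a single queue would be ordered. The class names are hypothetical, and `admission_order` is a simplified model (priority first, then creation time, as in Kueue's FIFO queueing strategies) that ignores quota, flavours and preemption.

```python
import json
from dataclasses import dataclass

# Values from the example above; the class names are hypothetical.
PRIORITY_VALUES = {"interactive-priority": 90000, "headless-priority": 50000}


def workload_priority_class(name: str, value: int) -> dict:
    """Kueue WorkloadPriorityClass manifest; value and description are top-level fields."""
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "WorkloadPriorityClass",
        "metadata": {"name": name},
        "value": value,
        "description": f"Kueue priority for {name.split('-')[0]} jobs",
    }


@dataclass
class Pending:
    name: str
    priority_class: str
    created: int  # creation order, e.g. seconds since some reference time


def admission_order(pending: list) -> list:
    """Simplified queue ordering: higher priority value first, then oldest first."""
    ranked = sorted(pending, key=lambda w: (-PRIORITY_VALUES[w.priority_class], w.created))
    return [w.name for w in ranked]


if __name__ == "__main__":
    for name, value in PRIORITY_VALUES.items():
        print(json.dumps(workload_priority_class(name, value), indent=2))
    queue = [
        Pending("batch-a", "headless-priority", created=1),
        Pending("notebook-1", "interactive-priority", created=5),
        Pending("batch-b", "headless-priority", created=2),
    ]
    print(admission_order(queue))  # ['notebook-1', 'batch-a', 'batch-b']
```

Even though `notebook-1` was submitted last, its higher priority value places it ahead of both headless jobs, which is the behaviour the single-queue constraint relies on.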