Details
-
Feature
-
Not Assigned
-
None
-
None
Description
Some of this is duplicated in SP-4287, so this ticket needs to be stripped back to focus only on why we would investigate ReFrame (Alex).
Going forward, STARS needs to be improved to address the following:
- Workloads are becoming more varied: some are CPU-only, some GPU-only, some short, some long. We need a way to select a subset of them where appropriate (possibly by matching the underlying resources).
- Setup and running need to be properly separated, particularly for environments where worker nodes do not have Internet access.
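To make the first point concrete, here is a minimal sketch of tag-based workload selection combined with a simple resource match. It does not reflect the current STARS code or the ReFrame API; the `Workload` class, the `select` function, the tag names and the `needs_gpu` flag are all hypothetical and exist only to illustrate the kind of filtering being asked for.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one benchmark workload."""
    name: str
    tags: frozenset = field(default_factory=frozenset)
    needs_gpu: bool = False


def select(workloads, include_tags=(), node_has_gpu=False):
    """Return workloads that carry every requested tag and fit the node.

    A workload that needs a GPU is skipped when the node has none.
    """
    wanted = set(include_tags)
    return [
        w for w in workloads
        if wanted <= w.tags and (node_has_gpu or not w.needs_gpu)
    ]


# Illustrative workload list (names and tags are assumptions).
WORKLOADS = [
    Workload("cpu_short", frozenset({"cpu", "short"})),
    Workload("cpu_long", frozenset({"cpu", "long"})),
    Workload("gpu_short", frozenset({"gpu", "short"}), needs_gpu=True),
]

if __name__ == "__main__":
    # Short workloads that can run on a CPU-only node.
    for w in select(WORKLOADS, include_tags=["short"], node_has_gpu=False):
        print(w.name)
```

Part of the evaluation would be checking whether a framework such as ReFrame already provides this kind of selection, so that STARS does not have to maintain it itself.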
The main part of the work here is to evaluate whether it would be worth rewriting STARS with ReFrame, since it could address or simplify some of the points outlined above. As part of this work, we need to answer the following specific questions:
- Why would ReFrame benefit the STARS code?
- Why use ReFrame and not something else?
- What are the goals, and why do we need ReFrame to achieve them?
- Is this actually how we are going to be using STARS?
- Is this something to consider in the next 3-6 months, or is it a longer-term goal whose direction we have not yet decided?
https://reframe-hpc.readthedocs.io/en/stable/
Issue Links
- Child Of: SP-4781 Distributed Data Computing v0.2 - Roadmap
- Funnel