Details
- Architectural Decision
- Resolution: Unresolved
Description
Why now? Software deployment issues are starting to bite
Scope: Data processing software
Quality: Maintainability, Usability
With the SKA, we are developing a large amount of software for high-performance computing. The facilities directly available to the SKA (e.g. the STFC cluster) are at present nowhere near sufficient to test such code at scale, so we need to move to shared HPC resources (i.e. SLURM clusters). Furthermore, we would like to remain portable to other large existing HPC facilities so that we can test system setups for SKA usage.
This presents several challenges:
- These systems are not under our full control, which means that we generally cannot assume a certain operating system or system library (version) to be present
- Depending on the environment, we might want to link against different dependencies (e.g. match the MPI implementation to the interconnect, or the FFT implementation to the CPU vendor)
- We still want to retain reproducibility of scientific results as much as possible
What makes this more pressing is that running such software will increasingly become an end-user issue, as we will want to enable scientific users (both at the SKA and the SRCs) to do this, ideally in an easy way.