Details

Type: Architectural Decision
Resolution: Unresolved
Fix Version/s: None
Component/s: None
Labels: None

Link to Confluence:
https://confluence.skatelescope.org/display/SWSI/ADR-100+Continuous+Integration+and+Deployment+of+HPC+software

Description

Why now? Software deployment issues are starting to bite

Scope: Data processing software

Quality: Maintainability, Usability

For the SKA we are developing a large amount of software for high-performance computing. The facilities directly available to the SKA (e.g. the STFC cluster) are at present not remotely sufficient to test such code at scale, which means we need to move to shared HPC resources (i.e. SLURM clusters). Furthermore, we would like to retain portability to other large existing HPC facilities, so that we can test system setups for SKA usage.

This presents several challenges:

These systems are not under our full control, which means that we generally cannot assume that a particular operating system or system library (version) is present.
Depending on the environment, we may want to link against different dependencies (e.g. match the MPI implementation to the interconnect, or the FFT implementation to the CPU vendor).
We still want to retain reproducibility of scientific results as much as possible.
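
As a rough illustration of how the second and third challenges interact, the sketch below picks dependency variants from a description of the target environment and derives a short fingerprint that could be stored alongside results. Everything in it is an assumption made for illustration: the `Environment` fields, the mapping tables, the fallback defaults, and the function names are hypothetical and are not part of any SKA tooling or of a decision recorded in this ADR.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of a target HPC system."""
    cpu_vendor: str      # e.g. "intel", "amd"
    interconnect: str    # e.g. "infiniband", "slingshot"


# Illustrative mappings only; real choices depend on the site.
FFT_BY_CPU_VENDOR = {"intel": "mkl", "amd": "aocl-fftw"}
MPI_BY_INTERCONNECT = {"infiniband": "openmpi+ucx", "slingshot": "cray-mpich"}


def select_dependencies(env: Environment) -> dict:
    """Choose dependency variants for an environment, with generic fallbacks."""
    return {
        "fft": FFT_BY_CPU_VENDOR.get(env.cpu_vendor, "fftw"),
        "mpi": MPI_BY_INTERCONNECT.get(env.interconnect, "openmpi"),
    }


def build_fingerprint(deps: dict) -> str:
    """Return a short, order-independent hash of the selected dependencies."""
    canonical = json.dumps(deps, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    env = Environment(cpu_vendor="amd", interconnect="infiniband")
    deps = select_dependencies(env)
    print(deps, build_fingerprint(deps))
```

Recording such a fingerprint with each output would at least make it visible when two runs were linked against different dependency sets, even though it does not by itself guarantee identical results.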

What makes this more pressing is that running such software will increasingly become an end-user issue, as we will want to enable scientific users (both at the SKA and at the SRCs) to do this, ideally in an easy way.

People

Assignee: Bartolini, Marco

Reporter: Wortmann, Peter

Votes: 0

Watchers: 0

Dates

Created: 19/Feb/24 2:11 PM

Updated: 30/Apr/24 8:59 AM

Continuous Integration / Deployment of HPC software