Details
-
Feature
-
Not Assigned
-
None
-
None
-
Data Processing
-
-
-
11.5
Description
Data islands are an important concept for storage scalability in the SDP architecture. The idea is to distribute large data objects (such as visibilities) across multiple isolated storage instances. Paired with execution engines placed close to the data they process, this is meant to give SDP guaranteed horizontal scaling.
There are two aspects we should likely investigate:
- Performance scalability: Allocate a number of storage instances and demonstrate that their performance is isolated from one another, at SKA-appropriate scale in terms of throughput and size. The test workload should be an appropriate I/O-centric program such as the imaging I/O benchmark.
- Overheads: To make this work, we will need to bring up and tear down many storage instances and execution engines. It needs to be shown that this can be done without increasing the complexity or reducing the robustness of workflows. The test workload could be a couple of major loops using the ARL or other existing radio astronomy software, using shared storage instances to gather images for cleaning.
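
As a rough illustration of what the two measurements could look like, here is a minimal Python sketch. It is not the imaging I/O benchmark and says nothing about real SDP storage: temporary directories stand in for storage instances, and the function names (`write_throughput`, `isolation_ratios`, `timed_lifecycle`) are hypothetical, chosen only for this example. The first part compares each instance's write throughput when run alone against its throughput when all instances are driven at once; the second times bring-up and tear-down of the stand-in instances.

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_throughput(directory, size_bytes=4 * 1024 * 1024, chunk=1024 * 1024):
    """Write size_bytes to a file in directory; return bytes per second."""
    payload = b"\0" * chunk
    path = os.path.join(directory, "bench.dat")
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < size_bytes:
            n = min(chunk, size_bytes - written)
            f.write(payload[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the cost of reaching the device
    elapsed = time.perf_counter() - start
    return size_bytes / max(elapsed, 1e-9)


def isolation_ratios(directories, **kwargs):
    """Return concurrent/solo throughput for each directory.

    Assumption: each directory is a stand-in for one storage instance.
    """
    # Baseline: drive each instance on its own.
    solo = [write_throughput(d, **kwargs) for d in directories]
    # Then drive all instances at the same time.
    with ThreadPoolExecutor(max_workers=len(directories)) as pool:
        concurrent = list(
            pool.map(lambda d: write_throughput(d, **kwargs), directories)
        )
    return [c / s for c, s in zip(concurrent, solo)]


def timed_lifecycle(n_instances):
    """Time bring-up and tear-down of n stand-in instances, in seconds."""
    start = time.perf_counter()
    dirs = [tempfile.mkdtemp(prefix="island_") for _ in range(n_instances)]
    setup = time.perf_counter() - start
    start = time.perf_counter()
    for d in dirs:
        shutil.rmtree(d)
    teardown = time.perf_counter() - start
    return setup, teardown
```

Ratios close to 1.0 would suggest that the instances do not interfere with each other; directories on the same local disk will usually fall short of that, which is the behaviour the performance test is meant to rule out for real storage instances. A real investigation would replace the directories with actual storage instances and the simple write loop with the chosen I/O workload.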