Details
- Enabler
- Should have
- None
- Data Processing
- 5
- 5
- 5
- 1
- Team_YANDA
- Sprint 5
- 14.1
- Stories Completed, Integrated, Solution Intent Updated, Outcomes Reviewed, NFRs met, Demonstrated, Satisfies Acceptance Criteria, Accepted by FO

SPO-1578
Description
The overall goal is to make a first attempt at assessing how execution frameworks can (or cannot) help us address SDP scaling challenges. Following the consortium work [1] on this topic, we are still working under the assumption that our toughest problem is managing
- storage I/O
- internal I/O
- local memory residency
- workload balance
in a not-yet-entirely-determined priority order. Imaging is best understood in this context, so we should focus on it first. The overall challenge is to implement a pipeline with the following properties:
1. Computational scaling of ~O(n_vis + log(n_image) n_image^3 + n_source), where n_image is the total image resolution (scaling with maximum baseline length).
2. Work with image sizes larger than fit into an individual node's memory.
3. Work with (many) more visibilities than fit into collective node memory, i.e. use background storage for loading them (>= 10 TB class).
4. Load every visibility from storage only once per major loop iteration.
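Property (4) in particular constrains the pipeline's control flow: each major loop iteration must make exactly one pass over the visibility store. A minimal sketch of that access pattern, with toy stand-ins for the actual (de)gridding and FFT work (the function and variable names here are illustrative, not from any SDP codebase):

```python
import numpy as np

def major_loop(vis_chunks, n_iter=2):
    """Toy major loop: every visibility chunk is read exactly once per
    iteration and accumulated into a (trivially small) stand-in image."""
    image = np.zeros((8, 8))
    reads = 0
    for _ in range(n_iter):
        for chunk in vis_chunks:        # one pass over storage per iteration
            reads += len(chunk)
            image[0, 0] += chunk.sum()  # stand-in for (de)gridding + FFT
    return image, reads

# Two chunks of four "visibilities" each, three major loop iterations:
chunks = [np.ones(4), np.ones(4)]
image, reads = major_loop(chunks, n_iter=3)
assert reads == 3 * 8  # property (4): each visibility read once per iteration
```

At real scale `vis_chunks` would be a lazy iterator over >= 10 TB of storage, which is what makes the single-pass constraint bite.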
This is a tough combination of challenges. There are basically two known approaches that fit the bill:
- Facet imaging [2] - obviously fulfills (2); can fulfill (1) by reducing facet visibilities using BDA; can do (3)+(4) by designating nodes to load visibility chunks, phase rotate them, and distribute them to (de)gridding nodes
- Distributed FFT - either using the standard approach or [3] - fulfills (1) natively, mostly fulfills (2) with somewhat increased overhead, and can do (3)+(4) with better quality than BDA
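The per-facet step in the first approach is a phase rotation: shifting the phase centre of the visibilities to each facet's centre before gridding. A self-contained numpy sketch (sign conventions vary between packages; this is one consistent choice, and the function name is illustrative):

```python
import numpy as np

def phase_rotate(vis, uvw, dl, dm):
    """Shift the visibility phase centre by (dl, dm) in direction cosines,
    the core per-facet operation in facet imaging: each facet grids
    visibilities rotated to its own centre (after BDA reduction)."""
    u, v, w = uvw.T
    dn = np.sqrt(1.0 - dl**2 - dm**2) - 1.0
    return vis * np.exp(-2j * np.pi * (u * dl + v * dm + w * dn))
```

With this convention, a unit point source at offset (dl, dm) rotates to a constant visibility of 1, i.e. the source lands exactly on the new phase centre.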
What both approaches have in common is that they put quite a bit of strain on the execution framework: in either case we need to load visibilities on a node and then relate them to the image data, which means scheduling many related tasks across multiple nodes.
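The scheduling shape an execution framework would have to handle can be sketched with stdlib futures as a stand-in for a real framework such as Dask: loader tasks read chunks, and each chunk then fans out to gridding tasks for every facet, so the number of dependent tasks per chunk multiplies quickly (all functions and numbers below are toy placeholders):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def load_chunk(i):
    """Stand-in for a node reading one visibility chunk from storage."""
    return np.full(4, float(i))

def grid_facet(chunk, facet):
    """Stand-in for (de)gridding one facet's share of a chunk."""
    return chunk.sum() * (facet + 1)

with ThreadPoolExecutor(max_workers=4) as pool:
    chunk_futures = [pool.submit(load_chunk, i) for i in range(3)]
    # Fan-out: every loaded chunk feeds a gridding task for every facet,
    # so the scheduler must track chunks x facets dependent tasks.
    grid_futures = [pool.submit(grid_facet, cf.result(), f)
                    for cf in chunk_futures for f in range(2)]
    totals = [g.result() for g in grid_futures]
```

With 3 chunks and 2 facets this already yields 6 dependent tasks; at SDP scale the chunk and facet counts are orders of magnitude larger, which is exactly the strain on the framework described above.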
[1] http://ska-sdp.org/sites/default/files/attachments/pipeline-working-sets.pdf
[2] https://www.aanda.org/articles/aa/abs/2018/03/aa31474-17/aa31474-17.html
[3] https://gitlab.com/ska-telescope/sdp/ska-sdp-exec-iotest, https://gitlab.com/scpmw/crocodile/-/blob/io_benchmark/examples/notebooks/facet-subgrid-impl-new.ipynb and https://arxiv.org/abs/2108.10720 [under review]