Details
- Spike
- Must have
- None
- Data Processing
- Intra Program
- 2
- 0
- 12.6
Description
This should be a first attempt at building a reference pipeline that we can use to assess how execution frameworks can (or cannot) help us address SDP scaling challenges. Following the consortium work [1] on this topic, we are still working off the assumption that our toughest problem is managing:
- storage I/O
- internal I/O
- local memory residency
- workload balance
in a priority order that is not yet entirely determined. To this end, a test pipeline should fulfill the following requirements:
- Operate entirely in image space (as the DFT is not expected to scale)
- Demonstrate the ability to work with images >128k^2 in size
- Load visibilities from storage only once
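To make the scale of the image-size requirement concrete, a quick back-of-the-envelope calculation (an illustration, not from the ticket) shows why 128k^2 images strain single-node memory:

```python
# Rough memory footprint of a 128k x 128k image (illustrative numbers only).
npix = 128 * 1024                  # 128k pixels per side
bytes_per_pixel = 8                # float64, real-valued image
image_bytes = npix**2 * bytes_per_pixel

print(f"{image_bytes / 2**30:.0f} GiB")  # 128 GiB for one real-valued image
# A complex grid of the same size doubles this, and intermediate products
# (PSF, weights, facet buffers) multiply it further -- hence the interest
# in distributed-FFT and faceted approaches below.
```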
For the purpose of this test, this could be either an imaging (visibilities -> image) or a prediction (image -> visibilities) pipeline. It is expected that there would be two candidate approaches:
- Use a distributed FFT as demonstrated in https://gitlab.com/ska-telescope/sdp/ska-sdp-exec-iotest (and documented in https://gitlab.com/scpmw/crocodile/-/blob/io_benchmark/examples/notebooks/facet-subgrid-impl-new.ipynb and https://arxiv.org/abs/2108.10720 [under review])
- Use a faceted imaging approach using phase rotation and per-facet baseline-dependent averaging. This approach yields lower image quality, but has lower memory requirements and has been predicted by the parametric model to scale just as well. It might also be easier to assemble from existing, non-experimental software.
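The phase-rotation step of the faceted approach can be sketched as follows. This is a minimal illustration: the function name, the sign convention, and the use of metre-based uvw coordinates are assumptions, not taken from any existing SDP package, and a real pipeline must match its imager's conventions.

```python
import numpy as np

C = 299792458.0  # speed of light, m/s

def rotate_to_facet(vis, uvw, dl, dm, freq_hz):
    """Shift the phase centre of visibilities by (dl, dm) direction cosines,
    so a small facet around that point can be imaged separately.

    vis     : complex visibilities, shape (nvis,)
    uvw     : baseline coordinates in metres, shape (nvis, 3)
    dl, dm  : direction-cosine offset of the facet centre
    freq_hz : observing frequency in Hz
    """
    u, v, w = uvw[:, 0], uvw[:, 1], uvw[:, 2]
    dn = np.sqrt(1.0 - dl**2 - dm**2) - 1.0
    # Sign convention is illustrative; it differs between imagers.
    phase = -2.0j * np.pi * (u * dl + v * dm + w * dn) * (freq_hz / C)
    return vis * np.exp(phase)
```

After rotation, each facet sees only a small field of view, so its visibilities can be averaged in time and frequency with a baseline-dependent window — long baselines tolerate less averaging than short ones — which is what makes the per-facet data volume manageable.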
[1] http://ska-sdp.org/sites/default/files/attachments/pipeline-working-sets.pdf
Some previous tickets prepared the I/O pipeline for tests: SP-1099 SP-1181 SPO-1196 SPO-971. It should hopefully be easy to pick up from there.