Details
-
Enabler
-
Should have
-
None
-
Data Processing
-
-
-
8
-
8
-
0
-
Team_HIPPO, Team_PANDO
-
Sprint 5
-
-
Overdue
-
-
-
Low G4 Mid G3
Description
See frame in PI21 Backlog board
Who? (Beneficiaries)
- Pipeline developers.
- System scientists (Commissioning).
- Commissioning and Operations staff planning for science commissioning & verification ahead of and during AA2.
Why? (Benefit hypothesis)
- The scaling strategy implemented in the AA2 pipelines might not be enough to cover the AA* Mid ICAL use cases.
- We also need to evaluate technology options:
  - on the storage side (the existing measurement set implementation is a known and recurring bottleneck, even though it likely will not matter until we reach large scales);
  - on execution frameworks and networking (we have demonstrated that Dask can orchestrate a pipeline, but can it provide the necessary throughput?);
  - on portability to compute platforms (e.g. accelerators).
What? (Acceptance criteria)
- Implement a performance prototype that streams visibilities while holding facet data in memory in a distributed fashion.
  - An obvious option would be to start with the "distributed Fourier Transformation / SwiFTly" implementation (gridder: HIPPO; memory access patterns and gridding of zarr stores: PANDO).
- Investigate performance, especially scheduler and network throughput.
- Demonstrate integration with:
  - processing functions;
  - storage backends for loading visibilities.
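
To make the first criterion concrete, the access pattern could be sketched roughly as below. This is a toy illustration only, not the SwiFTly algorithm and not a Dask implementation: threads stand in for distributed workers, the image is 1D, and the prediction is a direct Fourier sum. All names (`make_facets`, `uv_chunks`, `predict_from_facet`, `stream_predict`) are hypothetical and used purely for illustration.

```python
import cmath
import time
from concurrent.futures import ThreadPoolExecutor


def make_facets(n_facets, pixels_per_facet):
    """Build toy 1D facets: lists of (direction cosine l, brightness) pairs.

    Assumption for illustration: one unit point source at the first pixel
    of each facet, pixel spacing of 1e-3 in l.
    """
    facets = []
    for f in range(n_facets):
        facet = []
        for p in range(pixels_per_facet):
            l = (f * pixels_per_facet + p) * 1e-3
            facet.append((l, 1.0 if p == 0 else 0.0))
        facets.append(facet)
    return facets


def uv_chunks(n_chunks, chunk_size):
    """Yield chunks of baseline coordinates u (in wavelengths), one at a time."""
    for c in range(n_chunks):
        yield [float(c * chunk_size + i) for i in range(chunk_size)]


def predict_from_facet(facet, us):
    """Direct Fourier sum of one facet's non-zero pixels for a chunk of u values."""
    return [
        sum(b * cmath.exp(-2j * cmath.pi * u * l) for l, b in facet if b != 0.0)
        for u in us
    ]


def stream_predict(facets, chunks, max_workers=4):
    """Stream visibility chunks past facets that stay resident in memory.

    Returns the predicted visibilities per chunk and a rough throughput
    figure in visibilities per second.
    """
    results = []
    n_vis = 0
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for us in chunks:
            # Each worker computes the contribution of one resident facet.
            partials = list(pool.map(lambda fct: predict_from_facet(fct, us), facets))
            # Sum the per-facet contributions for every visibility in the chunk.
            vis = [sum(col) for col in zip(*partials)]
            results.append(vis)
            n_vis += len(vis)
    elapsed = time.perf_counter() - start
    throughput = n_vis / elapsed if elapsed > 0 else float("inf")
    return results, throughput


if __name__ == "__main__":
    facets = make_facets(n_facets=2, pixels_per_facet=4)
    vis, rate = stream_predict(facets, uv_chunks(n_chunks=2, chunk_size=4))
    print(len(vis), "chunks,", f"{rate:.0f} vis/s")
```

In a real prototype the facets would live on separate nodes and each chunk would travel over the network, which is where the scheduler and network throughput named in the criteria become the quantities to measure.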