Details
-
Enabler
-
Should have
-
None
-
Data Processing
-
-
-
2
-
2
-
0
-
Team_YANDA
-
Sprint 5
-
-
-
-
-
Low-G1 Mid-G1
Description
The ingest pipeline has grown into a complex system with many moving pieces and software components. Over time we have characterised the performance of some of these components, but not all of them. Even for the components whose performance we understand, we have only explored some of the data dimensions and left others untested. For example, we have a good understanding of how the visibility receiver performs along the time and frequency dimensions, but have hardly tested how it scales with the number of antennas/baselines.
Who?
- Ingest pipeline developer
What?
- Write benchmark processing scripts that involve all the pieces of the full ingest pipeline, including: cbf-emulator, vis-receiver, plasma and one or more processors (including, at least, mswriter and qametrics). These benchmarks would work with dummy data, and be able to scale along the different data dimensions.
- For said processing scripts, parameters should propagate in the usual fashion to the cbf-emulator/receiver/processors.
- Define a mode in which the cbf-emulator package sends dummy (zero) data of arbitrary size along all axes (channels, antennas/baselines, polarisations and time), without requiring an input Measurement Set.
- Enable the reception of arbitrary data by allowing the associated metadata to be specified in an easy way, without requiring external input artefacts like full ExecutionBlock definitions, antenna layouts or input Measurement Sets. (Why not? Execution blocks and antenna layouts are not particularly heavy, and the input MS part is covered by the previous point.)
- Check that strided spectral windows / channel definitions work throughout the pipeline as expected.
- Tackle currently-known performance bottlenecks in the ingest pipeline.
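As an illustration of the dummy-data mode and the scaling axes described above, here is a minimal sketch of how a benchmark script might describe the shape of a zero-filled data stream and sweep one axis at a time. Everything in it is hypothetical: `DummyDataShape`, `sweep` and the field names are not part of cbf-emulator or the receiver, and both the 8 bytes per visibility (one complex64 sample) and the inclusion of autocorrelations in the baseline count are assumptions made for the example only.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Iterable, Iterator

# Assumption: each visibility is a single complex64 sample (8 bytes).
BYTES_PER_VISIBILITY = 8


@dataclass(frozen=True)
class DummyDataShape:
    """Hypothetical description of a dummy (all-zero) data stream."""

    num_antennas: int
    num_channels: int
    num_polarisations: int = 4
    num_timesteps: int = 1
    channel_start: int = 0
    channel_stride: int = 1  # values > 1 describe a strided spectral window
    include_autocorrelations: bool = True  # assumption, not taken from the ticket

    @property
    def num_baselines(self) -> int:
        n = self.num_antennas
        cross = n * (n - 1) // 2
        return cross + n if self.include_autocorrelations else cross

    @property
    def channel_ids(self) -> list[int]:
        # Channel IDs covered by the (possibly strided) spectral window.
        stop = self.channel_start + self.num_channels * self.channel_stride
        return list(range(self.channel_start, stop, self.channel_stride))

    @property
    def bytes_per_timestep(self) -> int:
        return (
            self.num_baselines
            * self.num_channels
            * self.num_polarisations
            * BYTES_PER_VISIBILITY
        )

    def zero_timestep(self) -> bytes:
        # A zero-filled payload for one timestep; no input Measurement Set needed.
        return bytes(self.bytes_per_timestep)


def sweep(
    base: DummyDataShape, axis: str, values: Iterable[int]
) -> Iterator[DummyDataShape]:
    """Yield copies of `base` with a single axis varied, for scaling benchmarks."""
    for value in values:
        yield replace(base, **{axis: value})


if __name__ == "__main__":
    base = DummyDataShape(num_antennas=4, num_channels=8)
    for shape in sweep(base, "num_antennas", [4, 8, 16]):
        print(shape.num_antennas, shape.num_baselines, shape.bytes_per_timestep)
```

Keeping the shape description separate from the sweep helper means the same parameters could be passed on to the emulator, receiver and processors, while each benchmark run varies only one data dimension.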
Why?
- Enhance our ability to produce and scale testing scenarios with ease.
- Benchmark and profile the ingest pipeline, and its components, when scaling different data axes.
- Characterise and understand the resource needs of the ingest pipeline as it currently stands for AA0.5 for the different known processors.
- Explore and understand the next bottlenecks of the ingest pipeline on the way to AA2.