Details

Type: Enabler
Priority: Should have
Fix Version/s: PI20
Component/s: COM SDP SW
Labels:
None

ARTs:

Data Processing
Benefit hypothesis:
See Why?
Acceptance criteria:
See What?
Feature Points:
2
Initial Size:
2
WSJF:
0
Epic Link:
AA0.5 Observation Execution
Agile Teams:

Team_YANDA
Due Sprint:
Sprint 5
Story Point Burn-up:
Overdue:
Outcomes:

A large part of this effort consisted of developing and extending an “integration repository”:
SKAO / Science Data Processor / SDP Realtime Receive Integration · GitLab

This repository allows automatic integration tests to run without complicated orchestration.
The intent is to provide lightweight infrastructure that visibility receive developers can easily deploy for the first stages of ‘integration’ within the visibility receive component. There is probably still a need to streamline integration at a higher level than this. For example, the FO commented early on that new processing scripts could be adapted to provide this functionality; the decision was made to limit the scope of this particular feature to a new repository within the component.

The Acceptance Criteria were satisfied within this framework.

Acceptance Criteria:

1) Write benchmark processing scripts that involve all the pieces of the full ingest pipeline, including: cbf-emulator, vis-receiver, plasma and one or more processors (including, at least, mswriter and qametrics). These benchmarks would work with dummy data, and be able to scale along the different data dimensions.

This is satisfied by a number of linked tickets that widen the capability of the integration repo, including:

Initial tickets to build the repo, move some integration tests from elsewhere in the visibility receive domain, and consolidate them here (YAN-1511, YAN-1516, YAN-1517, YAN-1541, YAN-1576).

The cbf-emulator has also been extended (YAN-1540) to allow more configurable data dimensions, including dummy data with configurable dimensions, and a test was written to exercise this (YAN-1552).

2) Check that strided spectral windows / channel definitions work throughout the pipeline as expected.

This has been widely tested, revealing some corner cases and other issues that were solved by YAN-1584, with more fixes required in YAN-1433 and YAN-1581.

3) Tackle currently known bottlenecks.

This work is still in progress, but it was deemed more practical to close out this feature and deal with further performance bottlenecks as they arise in PI 21. A follow-on feature is likely.

Some preliminary profiling was performed in YAN-1574, which backed up our intuition that memory copies are the primary receiver bottleneck. It was also observed that aggregation might need to be tunable to maximise plasma usage (e.g. aggregated payloads of >1GB in a 2GB plasma store allow only one in-flight RPC call).
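The aggregation observation above comes down to simple arithmetic, sketched below. This is only an illustration under stated assumptions: the function name is hypothetical, sizes are taken in GiB rather than GB, each in-flight RPC call is assumed to hold exactly one aggregated payload in the store, and any per-object overhead in the plasma store is ignored.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_in_flight_payloads(store_bytes: int, payload_bytes: int) -> int:
    """Upper bound on aggregated payloads that fit in the store at once.

    Hypothetical helper: assumes one payload per in-flight RPC call and
    ignores any bookkeeping overhead in the store itself.
    """
    if store_bytes <= 0 or payload_bytes <= 0:
        raise ValueError("sizes must be positive")
    return store_bytes // payload_bytes


# A payload just over 1 GiB in a 2 GiB store leaves room for only one call.
print(max_in_flight_payloads(2 * GIB, GIB + 1))    # 1
# Smaller aggregation lets more calls overlap.
print(max_in_flight_payloads(2 * GIB, 256 * MIB))  # 8
```

Under these assumptions, making the aggregation size tunable is what lets a deployment trade payload size against the number of overlapping calls.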


Requirement Status:

PI22 - UNCOVERED
Goals_MIRO:
Low-G1 Mid-G1

Description

The ingest pipeline has grown to be a complex system with many moving pieces and software components. Over time we have characterised the performance of some of these components, but not all of them. Similarly, for those systems whose performance we do understand, we have only explored some of the data dimensions, leaving others unattended. For example, we have a good understanding of the performance of the visibility receiver in the time and frequency dimensions, but haven't tested it much along the antennas/baselines dimension.
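To illustrate why the antennas/baselines dimension deserves its own benchmarks, here is a minimal sketch of how the number of visibilities per integration grows with antenna count. The function names are hypothetical, and whether autocorrelations are included and the default of four polarisation products are assumptions, not taken from the pipeline.

```python
def num_baselines(num_antennas: int, include_autocorrelations: bool = True) -> int:
    """Number of antenna pairs; grows quadratically with the antenna count."""
    if num_antennas < 1:
        raise ValueError("need at least one antenna")
    cross = num_antennas * (num_antennas - 1) // 2
    return cross + num_antennas if include_autocorrelations else cross


def visibilities_per_integration(
    num_antennas: int,
    num_channels: int,
    num_pols: int = 4,  # assumption: four polarisation products
    include_autocorrelations: bool = True,
) -> int:
    """Visibilities produced for a single time step."""
    baselines = num_baselines(num_antennas, include_autocorrelations)
    return baselines * num_channels * num_pols


# Doubling the antenna count roughly quadruples the data per integration.
for antennas in (4, 8, 16):
    print(antennas, num_baselines(antennas), visibilities_per_integration(antennas, 1024))
```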

Who?

Ingest pipeline developer

What?

Write benchmark processing scripts that involve all the pieces of the full ingest pipeline, including: cbf-emulator, vis-receiver, plasma and one or more processors (including, at least, mswriter and qametrics). These benchmarks would work with dummy data, and be able to scale along the different data dimensions.
- For said processing scripts, parameters should propagate in the usual fashion to the cbf-emulator/receiver/processors.
- Define a mode in which the cbf-emulator package sends dummy (zero) data for arbitrary sizes in all axes (channels, antennas/baselines, polarisations and time), without requiring an input Measurement Set.
- Enable the reception of arbitrary data by making the associated metadata easy to specify, without requiring external input artefacts like full ExecutionBlock definitions, antenna layouts or input Measurement Sets. (Why not? Execution blocks and antenna layouts are not particularly heavy, and the input MS part is covered by the previous point.)
Check that strided spectral windows / channel definitions work throughout the pipeline as expected.
Tackle currently-known performance bottlenecks on the ingest pipeline.
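The dummy-data mode and the strided channel definitions listed above could be pictured roughly as follows. This is a sketch and not the cbf-emulator's actual interface: `ChannelSpec`, `zero_payload` and the 8-byte visibility size (complex64) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

BYTES_PER_VISIBILITY = 8  # assumption: one complex64 value per visibility


@dataclass(frozen=True)
class ChannelSpec:
    """Hypothetical strided channel definition: start, count and stride."""

    start: int
    count: int
    stride: int = 1

    def channel_ids(self) -> List[int]:
        """Expand the definition into explicit channel IDs."""
        if self.count < 1 or self.stride < 1:
            raise ValueError("count and stride must be positive")
        return [self.start + i * self.stride for i in range(self.count)]


def zero_payload(num_baselines: int, channels: ChannelSpec, num_pols: int = 4) -> bytes:
    """All-zero visibility payload for one time step, with no input Measurement Set."""
    num_vis = num_baselines * channels.count * num_pols
    return bytes(num_vis * BYTES_PER_VISIBILITY)


spec = ChannelSpec(start=0, count=4, stride=3)
print(spec.channel_ids())           # [0, 3, 6, 9]
print(len(zero_payload(10, spec)))  # 1280 bytes: 10 baselines * 4 channels * 4 pols * 8
```

A strided-channel test could then compare the IDs produced by `channel_ids()` on the sending side with those recorded by each processor downstream.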

Why?

Enhance our ability to produce and scale testing scenarios with ease.
Benchmark and profile the ingest pipeline, and its components, when scaling different data axes.
Characterise and understand the resource needs of the ingest pipeline as it currently stands for AA0.5 for the different known processors.
Explore and understand the next bottlenecks of the ingest pipeline towards AA2.

People

Assignee: Wortmann, Peter

Reporter: Ashdown, Mark

Votes: 0

Watchers: 2

Feature Progress

Story Point Burn-up: (100.00%)

Feature Estimate: 2.0

	Issues	Story Points
To Do	0	0.0
In Progress	0	0.0
Complete	6	11.0
Total	6	11.0

Dates

Created: 09/Aug/23 1:25 PM

Updated: 13/Feb/24 2:02 PM

Develop visibility receive benchmarks