SAFe Program / SP-3626

Develop visibility receive benchmarks


Details

    • Enabler
    • Should have
    • PI20
    • COM SDP SW
    • None
    • Data Processing
    • 2
    • 2
    • 0
    • Team_YANDA
    • Sprint 5

      A large part of this effort consisted of developing and extending an “integration repository”:
      SKAO / Science Data Processor / SDP Realtime Receive Integration · GitLab

      This repository allows automatic integration tests to run without complicated orchestration. The intent is to provide lightweight infrastructure that visibility receive developers can easily deploy for the first stages of ‘integration’ within the visibility receive component. There is probably also a need to streamline integration at a higher level than this; for example, the FO made some early comments that new processing scripts could be adapted to provide this functionality. However, the decision was made to limit the scope of this particular feature to a new repository within the component.

      The Acceptance Criteria were satisfied within this framework.

      Acceptance Criteria:

      1) Write benchmark processing scripts that involve all the pieces of the full ingest pipeline, including: cbf-emulator, vis-receiver, plasma and one or more processors (including, at least, mswriter and qametrics). These benchmarks would work with dummy data, and be able to scale along the different data dimensions.

      This is satisfied by a number of linked tickets that widen the capability of the integration repo, including:

      Initial tickets to build the repo, and to move some integration tests from elsewhere in the visibility receive domain and consolidate them here (YAN-1511, YAN-1516, YAN-1517, YAN-1541, YAN-1576).

      The functionality of the cbf-emulator has also been extended to allow more configurable data dimensions (YAN-1540), including dummy data with configurable dimensions, and a test was written to exercise this (YAN-1552).
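Generating dummy data purely from configurable dimensions can be sketched as below. This is a minimal illustration, not the cbf-emulator's actual interface: `DummyDims` and `make_zero_payload` are hypothetical names, visibilities are assumed to be complex64 (8 bytes each), and baselines are counted as n(n-1)/2 cross-correlations plus n autocorrelations unless the latter are disabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DummyDims:
    """Hypothetical dummy-data configuration (names are illustrative)."""

    num_antennas: int
    num_channels: int
    num_polarisations: int
    num_timesteps: int
    include_autocorrelations: bool = True
    bytes_per_visibility: int = 8  # assumption: complex64 (2 x float32)

    @property
    def num_baselines(self) -> int:
        n = self.num_antennas
        cross = n * (n - 1) // 2  # each unordered antenna pair once
        return cross + n if self.include_autocorrelations else cross

    def payload_bytes(self) -> int:
        """Total size of the visibilities across all axes, in bytes."""
        return (
            self.num_timesteps
            * self.num_baselines
            * self.num_channels
            * self.num_polarisations
            * self.bytes_per_visibility
        )


def make_zero_payload(dims: DummyDims) -> bytearray:
    """Zero-filled buffer of the right size; no Measurement Set required."""
    return bytearray(dims.payload_bytes())


dims = DummyDims(num_antennas=4, num_channels=4, num_polarisations=4, num_timesteps=1)
print(dims.num_baselines, dims.payload_bytes())  # prints 10 1280
```

Because the payload is derived only from the dimensions, each axis can be scaled independently; the baseline axis grows quadratically with the number of antennas.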

      2) Check that strided spectral windows / channel definitions work throughout the pipeline as expected.

      This has been widely tested, revealing some corner cases and other issues that were solved by YAN-1584; further fixes are required in YAN-1433 and YAN-1581.
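A check of a strided channel definition against what arrives downstream can be sketched as follows. The sketch assumes a hypothetical definition made of a start channel ID, a count and a stride, with frequencies derived from the channel ID; `StridedChannels` and `check_received_channels` are illustrative names, not part of the pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StridedChannels:
    """Hypothetical strided channel definition (names are illustrative)."""

    start: int  # first channel ID
    count: int  # number of channels selected
    stride: int = 1  # step between selected channel IDs
    freq_channel0_hz: float = 0.0  # frequency of channel ID 0
    channel_width_hz: float = 1.0  # native (unstrided) channel width

    def channel_ids(self) -> List[int]:
        return [self.start + i * self.stride for i in range(self.count)]

    def frequencies_hz(self) -> List[float]:
        # The stride shows up as a wider frequency step between selected channels
        return [self.freq_channel0_hz + cid * self.channel_width_hz for cid in self.channel_ids()]


def check_received_channels(expected: StridedChannels, received_ids: List[int]) -> List[str]:
    """Compare received channel IDs with the definition; [] means they match."""
    want = expected.channel_ids()
    problems = []
    if len(received_ids) != len(want):
        problems.append(f"expected {len(want)} channels, got {len(received_ids)}")
    for pos, (w, r) in enumerate(zip(want, received_ids)):
        if w != r:
            problems.append(f"position {pos}: expected channel {w}, got {r}")
    return problems


spw = StridedChannels(start=10, count=3, stride=4, freq_channel0_hz=100.0)
print(spw.channel_ids())  # prints [10, 14, 18]
print(check_received_channels(spw, [10, 14, 17]))
```

An end-to-end test would apply the same comparison at each stage (emulator output, receiver, written Measurement Set), which this sketch does not model.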

      3) Tackle currently known bottlenecks.

      This work is still in progress, but it was deemed more practical to close out this feature and deal with further performance bottlenecks as they arise in PI 21. A follow-on feature is likely.

      Some preliminary profiling was performed in YAN-1574, which backed up our intuition that memory copies are the primary receiver bottleneck. It was also observed that aggregation might need to be tunable to maximise plasma usage (e.g. aggregated payloads larger than 1GB in a 2GB plasma store allow only one in-flight RPC call).
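The plasma example is simple division, sketched below under the assumption that each in-flight RPC call holds one aggregated payload in the store for its duration and that per-object overhead is negligible. The function names are hypothetical; they only illustrate why the aggregation size may need to be tunable.

```python
GB = 1000 ** 3  # decimal gigabytes, matching the figures quoted above


def max_in_flight_payloads(store_bytes: int, payload_bytes: int) -> int:
    """Upper bound on payloads the store can hold at the same time.

    Ignores per-object overhead and fragmentation, so the real number can
    only be equal or lower.
    """
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return store_bytes // payload_bytes


def max_payload_for(store_bytes: int, in_flight: int) -> int:
    """Largest payload that still allows `in_flight` concurrent calls."""
    if in_flight <= 0:
        raise ValueError("in_flight must be positive")
    return store_bytes // in_flight


print(max_in_flight_payloads(2 * GB, 1_100_000_000))  # prints 1
print(max_payload_for(2 * GB, 4))  # prints 500000000
```

Inverting the relation gives the tuning knob: to keep four calls in flight in a 2GB store, aggregated payloads would need to stay at or below 500MB.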

    • PI22 - UNCOVERED

    • Low-G1 Mid-G1

    Description

      The ingest pipeline has grown into a complex system with many moving parts and software components. Over time we have characterised the performance of some of these components, but not all of them. Similarly, for the components whose performance we do understand, we have explored only some of the data dimensions and left others unexamined. For example, we have a good understanding of the visibility receiver's performance along the time and frequency dimensions, but have not tested it much as the number of antennas/baselines grows.
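One way to cover the unexplored dimensions systematically is a one-factor-at-a-time sweep: start from a base configuration and vary a single axis (for example the number of antennas) while holding the others fixed. The sketch below is a hypothetical illustration; `BenchConfig`, its default values and `sweep` are not part of the ingest pipeline.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class BenchConfig:
    """Hypothetical base benchmark configuration; defaults are arbitrary."""

    num_antennas: int = 4
    num_channels: int = 64
    num_polarisations: int = 4
    num_timesteps: int = 10


def sweep(base: BenchConfig, axes: Dict[str, List[int]]) -> Iterator[Tuple[str, BenchConfig]]:
    """Vary one axis at a time, holding every other axis at its base value."""
    for axis, values in axes.items():
        for value in values:
            yield axis, replace(base, **{axis: value})


runs = list(sweep(BenchConfig(), {"num_antennas": [4, 8, 16], "num_channels": [128, 256]}))
for axis, cfg in runs:
    print(axis, cfg)
```

This keeps the number of runs linear in the number of values tested, at the cost of missing interactions between axes; a full grid (e.g. built with `itertools.product`) would capture those but grows multiplicatively.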

      Who?

      • Ingest pipeline developer

      What?

      • Write benchmark processing scripts that involve all the pieces of the full ingest pipeline, including: cbf-emulator, vis-receiver, plasma and one or more processors (including, at least, mswriter and qametrics). These benchmarks would work with dummy data, and be able to scale along the different data dimensions.
        • For said processing scripts, parameters should propagate in the usual fashion to the cbf-emulator/receiver/processors.
        • Define a mode in which the cbf-emulator package sends dummy (zero) data of arbitrary size along all axes (channels, antennas/baselines, polarisations and time), without requiring an input Measurement Set.
        • Enable the reception of arbitrary data by allowing the associated metadata to be specified easily, without requiring external input artefacts such as full ExecutionBlock definitions, antenna layouts or input Measurement Sets. (Why not? Execution blocks and antenna layouts are not particularly heavy, and the input MS part is covered by the previous point.)
      • Check that strided spectral windows / channel definitions work throughout the pipeline as expected.
      • Tackle currently known performance bottlenecks in the ingest pipeline.
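The first sub-point implies that a single set of processing script parameters reaches several components. A minimal sketch, assuming a hypothetical `component.key` naming convention (the ticket does not spell out the "usual fashion"): unprefixed keys such as data dimensions go to every component so that emulator and receiver agree, while prefixed keys override them for one component. `COMPONENTS`, `split_parameters` and the prefix names are illustrative only.

```python
from typing import Any, Dict, Optional

# Hypothetical component prefixes; the real processing-script parameter
# schema is not described in this ticket.
COMPONENTS = ("cbf_emulator", "receiver", "processors")


def _component_of(key: str) -> Optional[str]:
    prefix, sep, rest = key.partition(".")
    return prefix if sep and rest and prefix in COMPONENTS else None


def split_parameters(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Route 'component.key' entries to per-component dictionaries.

    Unprefixed keys are shared and copied to every component first, so a
    component-specific value overrides the shared one.
    """
    shared = {k: v for k, v in params.items() if _component_of(k) is None}
    routed = {c: dict(shared) for c in COMPONENTS}
    for key, value in params.items():
        component = _component_of(key)
        if component is not None:
            routed[component][key.partition(".")[2]] = value
    return routed


print(split_parameters({"num_channels": 64, "receiver.port": 9000}))
```

Sharing dimension parameters by default means a mismatch between what the emulator sends and what the receiver expects has to be introduced deliberately rather than by omission.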

      Why?

      • Enhance our ability to produce and scale testing scenarios with ease.
      • Benchmark and profile the ingest pipeline, and its components, when scaling different data axes.
      • Characterise and understand the resource needs of the ingest pipeline as it currently stands for AA0.5 for the different known processors.
      • Explore and understand the next bottlenecks of the ingest pipeline towards AA2.


    People

      Wortmann, Peter (p.wortmann)
      Ashdown, Mark (m.ashdown)

    Feature Progress

      Story Point Burn-up: 100.00%
      Feature Estimate: 2.0

      Status        Issues   Story Points
      To Do         0        0.0
      In Progress   0        0.0
      Complete      6        11.0
      Total         6        11.0
