SP-1585

Standardise SDP processing performance benchmarking

Details

    • Enabler
    • Not Assigned
    • PI10
    • COM SDP SW
    • None
    • Services
    • The SDP hardware and software environment is expected to be continuously evolving, so we need to make sure that we are able to track its performance on typical SDP tasks.
    • Automated pipeline/script that can establish the performance and scaling properties of a radio-astronomy related benchmark (such as the imaging I/O test); a minimal submission sketch follows this list.
      • Onboard the benchmark (repository is included in the SKA GitLab organisation; code matches coding guidelines; basic unit/regression/integration tests are implemented in the GitLab CI/CD pipeline; code documentation matches SKA guidelines).
      • Characterise the scalability of the prototype beyond 16 nodes (~100 nodes).
      • Establish the relative performance of the parallel file systems (LUSTRE and GPFS?).
    • Identify requirements for and demonstrate portability across different HPC environments; to this end we might want to show support for e.g. SLURM and/or containerisation.
    • Show how we can systematically compare cluster environments and benchmark configurations using this framework.
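      A minimal sketch of such an orchestrator, assuming a SLURM cluster: it submits the benchmark at several node counts via sbatch and records the job IDs for a later scaling analysis. The binary path, walltime and node counts are illustrative placeholders, not the actual benchmark-suite configuration.

          #!/usr/bin/env python3
          """Scaling-sweep sketch: submit a benchmark at several node counts
          via SLURM's sbatch and record the job IDs for later analysis."""

          import csv
          import subprocess

          BENCHMARK = "./iotest"  # placeholder path to the imaging I/O test binary
          NODE_COUNTS = [1, 2, 4, 8, 16, 32, 64, 128]

          def submit(nodes: int) -> str:
              """Submit one batch job and return its SLURM job ID."""
              script = (
                  "#!/bin/bash\n"
                  f"#SBATCH --nodes={nodes}\n"
                  "#SBATCH --time=01:00:00\n"
                  f"#SBATCH --job-name=iotest-{nodes}\n"
                  f"srun {BENCHMARK}\n"
              )
              # sbatch reads the job script from stdin; --parsable prints the job ID.
              result = subprocess.run(["sbatch", "--parsable"], input=script,
                                      capture_output=True, text=True, check=True)
              return result.stdout.strip().split(";")[0]

          if __name__ == "__main__":
              with open("jobs.csv", "w", newline="") as f:
                  writer = csv.writer(f)
                  writer.writerow(["nodes", "job_id"])
                  for n in NODE_COUNTS:
                      writer.writerow([n, submit(n)])

      Once the jobs finish, wall-clock times can be pulled from SLURM accounting (e.g. sacct) and fed into a scaling analysis.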
    • 5
    • 5
    • 6.6
    • Team_PLANET
    • Sprint 5
    • The imaging-iotest code is onboarded into the SKA GitLab repository: https://gitlab.com/ska-telescope/sdp/ska-sdp-exec-iotest
    • Scalability tests were performed up to 128 nodes on the LUSTRE and IBM Spectrum Scale file systems; results are documented at https://confluence.skatelescope.org/pages/viewpage.action?pageId=142970142
    • The prototype is containerised using Singularity, and image building is integrated into the CI pipeline: https://gitlab.com/ska-telescope/sdp/ska-sdp-exec-iotest/container_registry/1884364
    • The prototype can be run on different clusters using orchestrator scripts developed in Python, hosted at https://gitlab.com/ska-telescope/platform-scripts/-/tree/master/ska-sdp-benchmark-suite . Although the scripts include only the imaging I/O test at the moment, any radio-astronomy pipeline can be integrated to run on HPC platforms.
    • A toolkit to monitor performance metrics as pipelines run in an HPC environment is implemented: https://gitlab.com/ska-telescope/platform-scripts/-/tree/master/ska-sdp-monitor-cpu-metrics . It can be used on any HPC machine that submits jobs through batch schedulers such as SLURM, PBS or OAR; a minimal sampling sketch follows this list.
    • Demonstrations of the orchestrator scripts and monitor toolkit have been given in System demos.
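      The core of such a monitor can be as simple as periodic sampling of host metrics while the pipeline runs. The sketch below uses psutil to record CPU and memory utilisation at a fixed interval; it illustrates the sampling idea only and is not the ska-sdp-monitor-cpu-metrics implementation. The interval and output path are placeholders.

          """Periodic CPU/memory sampling sketch for a running pipeline."""

          import csv
          import time

          import psutil

          INTERVAL_S = 5          # sampling period (placeholder)
          OUTPUT = "metrics.csv"  # output path (placeholder)

          def sample_forever() -> None:
              """Append one CPU/memory sample per interval until interrupted."""
              with open(OUTPUT, "w", newline="") as f:
                  writer = csv.writer(f)
                  writer.writerow(["timestamp", "cpu_percent", "mem_percent"])
                  try:
                      while True:
                          # cpu_percent measures utilisation since the previous call,
                          # so the very first sample may read as 0.0.
                          writer.writerow([time.time(),
                                           psutil.cpu_percent(interval=None),
                                           psutil.virtual_memory().percent])
                          f.flush()
                          time.sleep(INTERVAL_S)
                  except KeyboardInterrupt:
                      pass  # stop cleanly when the job wrapper interrupts us

          if __name__ == "__main__":
              sample_forever()

      In practice such a sampler would be started alongside the pipeline by the batch script and stopped when the job ends.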
    • 17.4
    • Stories Completed, Integrated, Outcomes Reviewed, NFRs met, Demonstrated, Satisfies Acceptance Criteria, Accepted by FO
    • PI22 - UNCOVERED

    • SPO-1002

    Description

      Sibling feature to SP-1548, but focused on hardware and software performance testing: just as we need to establish and maintain the scientific performance of our pipelines, it is equally important that we stay on top of computational performance. After all, given the immense volume of data the SDP is meant to ingest, falling behind in processing is equivalent to data loss, so we will need high assurance that we can finish within the allotted time.

      The goal for this feature is to establish a first prototype of the infrastructure that will allow us to track and model the performance of our workflows.
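      To make "track and model performance" concrete, one simple starting point is fitting an Amdahl-style model T(n) = t_serial + t_parallel / n to runtimes measured at different node counts. The sketch below does this with scipy; the timings are made-up illustration values, not measured results.

          """Fit a simple serial + parallel runtime model to scaling measurements."""

          import numpy as np
          from scipy.optimize import curve_fit

          def model(n, t_serial, t_parallel):
              """Runtime as a fixed serial part plus a perfectly parallel part."""
              return t_serial + t_parallel / n

          # Hypothetical (nodes, seconds) measurements from a scaling sweep.
          nodes = np.array([1, 2, 4, 8, 16])
          runtime = np.array([1000.0, 520.0, 280.0, 160.0, 100.0])

          params, _ = curve_fit(model, nodes, runtime)
          t_serial, t_parallel = params
          print(f"serial part: {t_serial:.1f} s, parallel part: {t_parallel:.1f} s")
          print(f"predicted runtime on 128 nodes: {model(128, *params):.1f} s")

      A model like this gives a first-order answer to whether a workflow can finish within its allotted time as the node count grows.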

People

    p.wortmann Wortmann, Peter
    M.Paipuri Paipuri, Mahendra [X] (Inactive)
    Votes: 0
    Watchers: 3

Feature Progress

    Story Point Burn-up: 89.58%
    Feature Estimate: 5.0

                 Issues   Story Points
    To Do             0            0.0
    In Progress       1            5.0
    Complete         23           43.0
    Total            24           48.0

Dates

    Created:
    Updated:
    Resolved: