  1. SAFe Program
  2. SP-4288

Add benchmark metrics into a dashboard


Details

    • SRCnet

      The SRC workloads GitLab repository hosts workloads that are representative of what users will be running at SRC sites. The STARS script collects metrics (currently only processing duration) for the tasks, and these metrics are recorded on Confluence. This feature would use existing monitoring infrastructure (likely Elastic + Grafana) to store the metric data output from STARS. This will be the first step towards enabling us to view in real time the performance of different SRC machines, a key component of the eventual compute tests described in the SRCNet 0.1 Implementation Plan.

      AC1: At least one task in the src-workloads repo has a means of verifying that the task completed successfully each time it is run (via checksum or hardcoded success criteria)

      AC2: Script is added to the src-workloads repo, or STARS is modified, such that when the task(s) in AC1 are run, the performance data is pushed (along with easily obtainable machine data) to a user-specified Elastic (or other NoSQL document store) DB.

      AC3: New dashboard added to existing Grafana instance to display results from task runs at different sites.

    • 2.5
    • 2
    • 0
    • Team_MAGENTA
    • Sprint 5
    • AC1: Source finding task has an integration test script: https://gitlab.com/ska-telescope/src/src-workloads/-/blob/master/tasks/source-finding-pybdsf/scripts/test/integration-test-sourcefinding.sh
      AC2/3: Workloads dashboard: https://monit.srcdev.skao.int/grafana/d/adqjpmxh900lcc/workload-tasks?orgId=1
      Demo: https://drive.google.com/file/d/1Xp7zgH2f33wLxNWR1OP3DgRb2__txh2K/view?usp=drive_link
      See repo for documentation: https://gitlab.com/ska-telescope/src/src-workloads/-/tree/master/bench
    • 24.3
    • Stories Completed, Outcomes Reviewed, Satisfies Acceptance Criteria
    • PI24 - UNCOVERED

    • PI23 SRC23-PB SRCNet0.x example-workflows-and-benchmarks tests-compilation

    Description

      The deployment of new monitoring infrastructure has been removed from the previous version of this feature, since it is possible to use existing DB/dashboard instances.

      This feature now focuses on the development needed within the src-workloads repository. Before automated performance metrics about task runs can be captured, it needs to be possible to automatically determine whether a task ran successfully or failed. For now, this can use e.g. hardcoded checksum validation, or even just verify that the expected output files were created successfully.
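
      The success check described above could be sketched as follows. This is a minimal illustration, not the integration test in the repository: the function names, the `expected` mapping, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def task_succeeded(output_dir: Path, expected: Mapping[str, Optional[str]]) -> bool:
    """Hypothetical success check for one task run.

    `expected` maps a file name (relative to output_dir) to its hardcoded
    SHA-256 hex digest, or to None when only the file's existence and
    non-zero size should be verified.
    """
    for name, checksum in expected.items():
        path = output_dir / name
        # The file must exist and must not be empty.
        if not path.is_file() or path.stat().st_size == 0:
            return False
        # When a checksum is hardcoded, the content must match it exactly.
        if checksum is not None and sha256_of(path) != checksum:
            return False
    return True
```

      A checksum comparison only works for tasks whose outputs are byte-for-byte reproducible; for outputs that embed timestamps or depend on floating-point ordering, the existence-only check (a `None` entry) is the fallback.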

      Once this has been done, the STARS script should be modified so that it can push the performance score, together with other easily obtainable (or user-set) metadata, to a running Elastic instance (or other document-type NoSQL DB). During local development this can use an ephemeral Elastic container.
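
      As a rough sketch of this step, the snippet below builds one result document and posts it to an Elastic index over the document REST endpoint (`POST /<index>/_doc`). It is not the STARS implementation: the field names, function names, and the unauthenticated connection are assumptions chosen for illustration, and only the Python standard library is used.

```python
import json
import os
import platform
import socket
import urllib.request
from datetime import datetime, timezone


def build_result_document(task, duration_s, succeeded, site=None):
    """Combine one task run's result with easily obtainable machine data.

    All field names here are hypothetical; adjust them to the index mapping
    actually in use.
    """
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "task": task,
        "duration_s": float(duration_s),
        "succeeded": bool(succeeded),
        "site": site,  # user-set metadata
        "machine": {
            "hostname": socket.gethostname(),
            "os": platform.system(),
            "arch": platform.machine(),
            "cpu_count": os.cpu_count(),
        },
    }


def push_document(doc, base_url, index):
    """POST the document to <base_url>/<index>/_doc and return the HTTP status.

    Assumes an instance reachable without authentication, e.g. an ephemeral
    local container; a secured instance would need credentials added.
    """
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

      Against a local ephemeral container on the default Elasticsearch port, a call might look like `push_document(build_result_document("source-finding-pybdsf", 12.5, True, site="example-site"), "http://localhost:9200", "workload-tasks")`, where the site and index names are placeholders.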

      Finally, a simple dashboard view of this data can be added to an existing Grafana instance.

      The Feature Point estimate is sized to allow time to learn about the relevant technologies.

       

              People

                Jesus.Salgado Salgado, Jesus
                A.Clarke Clarke, Alex
                Votes: 0
                Watchers: 1

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 2.5

                  Status        Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      10       18.0
                  Total         10       18.0

                  Dates

                    Created:
                    Updated:
                    Resolved:
