  1. SAFe Program
  2. SP-4250

Adapt UKSRC profiling tool for quantitative and analytical feedback


Details

    • SRCnet

      Validating the efficacy of the UKSRC profiling tool on SRCNet workloads demonstrates how all SRCs can easily obtain information on the computing usage of their workloads. For workload developers, this can help identify where a workload can be optimised.

      Collecting and analysing profiling data from multiple workloads also informs hardware procurement, because identifying the most common bottlenecks across workloads highlights which hardware would be most beneficial to purchase. Additionally, knowing the hardware usage of various workloads tells users which systems can run a desired workload, so they avoid requesting excessive hardware that the workload cannot use effectively.

      The outcome of this feature will inform the compilation of Infrastructure Compute tests for V0.1.

      1. Demonstrate the UKSRC profiling tool by measuring an SRCNet repository workload.

      2. Measure the STARScore of UKSRC / SPSRC resources (where possible).

    • Intra Program
    • 2
    • 2
    • 0
    • Team_CORAL
    • Sprint 5
    • PI23 - UNCOVERED

    • SRCNet0.1 compute-tests tests-compilation

    Description

      To measure the hardware usage of the workflows used by demonstrator cases in the UKSRC, the PyProfQueue package was created. This package takes in bash scripts and uses them to submit jobs to HPC queuing systems, adding the declarations and initialisation needed to track hardware usage with Prometheus and Node Exporter and to measure performance with Likwid. The hardware usage data can be used to understand the resources a workflow requires and to track hardware usage throughout the lifetime of the workload execution.
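      The wrapping idea described above can be illustrated with a minimal sketch. This is not PyProfQueue's actual API: the function `wrap_with_profiling`, the `#SBATCH` directives, the `run_workload.sh` script, and the Node Exporter start/stop commands are all assumptions chosen for illustration. The sketch keeps the shebang and scheduler directives at the top of the job script, inserts setup commands before the workload, and appends teardown commands after it. A Likwid measurement could be added in the same way.

```python
from typing import List


def wrap_with_profiling(script: str, prolog: List[str], epilog: List[str]) -> str:
    """Return a copy of a bash job script with setup and teardown lines added.

    The leading shebang, scheduler directives (e.g. '#SBATCH ...'), comments
    and blank lines stay at the top so the queuing system still reads them first.
    """
    lines = script.splitlines()
    # Skip the leading block of '#' lines and blank lines to find the first command.
    split = 0
    while split < len(lines) and (lines[split].startswith("#") or not lines[split].strip()):
        split += 1
    header, body = lines[:split], lines[split:]
    return "\n".join(header + prolog + body + epilog) + "\n"


# Hypothetical job script; the directives and workload name are placeholders.
job = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:10:00

./run_workload.sh
"""

# Hypothetical setup and teardown; a real tool would also handle Prometheus and Likwid.
prolog = [
    "node_exporter --web.listen-address=:9100 &",
    "EXPORTER_PID=$!",
]
epilog = [
    "kill $EXPORTER_PID",
]

wrapped = wrap_with_profiling(job, prolog, epilog)
print(wrapped)
```

      Keeping the directive block untouched matters because schedulers generally stop reading directives once the first command appears, so the setup lines have to go after it.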

      To test the robustness of this package and adapt it to the wider community's needs, we aim to use it with SRCNet repository workloads. Doing so lets us validate whether it performs as expected on a wider range of workloads and change it where it falls short, and also learn more about the metrics that other workload developers may require or desire. By combining this with a STARScore measure of UKSRC resources, we can centralise information on the SRC-owned or SRC-sponsored hardware available in the UK, while also providing feedback for the development of the STARS suite.

              People

                Robert.Perry Perry, Robert
                M.Keil Keil, Marcus

                Feature Progress

                  Story Point Burn-up: (35.00%)

                  Feature Estimate: 2.0

                  Status        Issues   Story Points
                  To Do         2        10.0
                  In Progress   1        3.0
                  Complete      4        7.0
                  Total         7        20.0
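                  As a quick consistency check, the burn-up figure can be reproduced from the progress rows. This sketch assumes the rows read as 2 issues / 10.0 points (To Do), 1 issue / 3.0 points (In Progress) and 4 issues / 7.0 points (Complete), and that burn-up is defined as completed story points divided by total story points; the tracker's exact formula is not stated here.

```python
# Rows from the Feature Progress panel: status -> (issues, story points).
progress = {
    "To Do": (2, 10.0),
    "In Progress": (1, 3.0),
    "Complete": (4, 7.0),
}

total_issues = sum(issues for issues, _ in progress.values())
total_points = sum(points for _, points in progress.values())

# Assumed definition: share of story points that are complete.
burn_up = 100.0 * progress["Complete"][1] / total_points

print(f"{total_issues} issues, {total_points} points, burn-up {burn_up:.2f}%")
```

                  Under these assumptions the totals come to 7 issues and 20.0 points, and the burn-up to 35.00%, matching the figures shown in the panel.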
