SAFe Program / SP-4293

Develop the Storage Benchmarking functionality required for v0.1


Details

    • SRCnet

      Several different storage options can be used at SRCNet nodes. Understanding storage performance per SRC site and storage type will enable SRCNet to plan and optimise storage utilisation and to identify future requirements.

      Whilst some storage performance tests are documented on Confluence, all SRCNet storage benchmarking tests need to be made reusable and easily reproducible by all SRCs.

      These tests are mentioned in the SRCNet documents Implementation plan v0.1 and SRCNet v0.1 Node requirements.

      This feature enables the participating v0.1 sites to run the existing tests and develops further storage benchmark tests, so that we can build a collective understanding of SRCNet storage performance.


      AC 0: A set of storage benchmark workflow tests is identified and developed. These will give sufficient information to identify v0.1 storage requirements and optimisation opportunities.
      (Profiling all of the v0.1 sites and actually identifying requirements and/or optimisation opportunities are not requirements for PI23, but likely will be for PI24.)

      AC 1: Benchmark workflows are run (statistical performance data gathered) for at least 3 types of storage.
      AC 2: Benchmark workflows are run (statistical performance data gathered) for at least 3 sites, belonging to at least 2 SRCs.
      AC 3: The outcome of these benchmarks is shared with SRCNet participants.
      AC 4: The benchmark workflow tests, container images, container image definitions, and scripts for running the tests are made available via an appropriate SRCNet test repository, with usage guidelines included.

      (Container images must not be .sif images, i.e. it should be possible to run them with any container engine, including but not limited to Singularity. They can be made available either in a new repo or consolidated here: https://gitlab.com/ska-telescope/src/src-workloads/-/tree/master/bench?ref_type=heads The latter option would be subject to discussion with the maintainers of the repo.)
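
      AC 1 and AC 2 ask for statistical performance data rather than single measurements. As a minimal sketch of what that could look like, the helper below summarises repeated throughput samples from one benchmark on one storage type. The function name `summarise` and the sample values are illustrative assumptions, not measured SRCNet results or part of the existing test suite.

```python
import statistics


def summarise(samples_mib_s):
    """Summarise repeated throughput measurements (MiB/s) for one test.

    Hypothetical helper: at least two samples are needed so that a
    sample standard deviation can be computed.
    """
    if len(samples_mib_s) < 2:
        raise ValueError("need at least two samples to report spread")
    return {
        "n": len(samples_mib_s),
        "mean": statistics.mean(samples_mib_s),
        "median": statistics.median(samples_mib_s),
        "stdev": statistics.stdev(samples_mib_s),  # sample standard deviation
        "min": min(samples_mib_s),
        "max": max(samples_mib_s),
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real benchmark output.
    example = [100.0, 110.0, 90.0, 105.0, 95.0]
    for key, value in summarise(example).items():
        print(f"{key}: {value}")
```

      Reporting the spread alongside the mean makes results from different sites and storage types easier to compare, since a single run can be dominated by caching or contention.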

    • 3.5
    • 4.5
    • 0
    • Team_DAAC, Team_TEAL
    • Sprint 5
      AC 0: A set of storage benchmark workflow tests is identified and developed.

      • Main documentation: https://confluence.skatelescope.org/x/OGVEEQ

      AC 1: Benchmark workflows are run (statistical performance data gathered) for at least 3 types of storage.

      • DP3 and IO500 tests run on 5 different types of storage.

      AC 2: Benchmark workflows are run (statistical performance data gathered) for at least 3 sites, belonging to at least 2 SRCs.

      • Benchmarks run at one site (DP3 run and IO500); one more site (espSRC) is expected.

      AC 3: The outcome of these benchmarks is shared with SRCNet participants.

      • TBC.

      AC 4: The benchmark workflow tests, container images, container image definitions, and scripts for running the tests are made available via an appropriate SRCNet test repository, with usage guidelines included.

      • DP3 and IO500 code repositories.
    • PI24 - UNCOVERED

    • PI24-PB SRC23-PB SRCNet0.1 storage-tests team_DAAC team_TEAL tests-compilation
    • SPO-3479

    Description

      Benchmarks will be tested against a number of potential storage options, including Slurm BeeGFS and others to establish best practice for configuring storage at SRCNet nodes.

      One set of tests can be CASA measurement set tests on a single node, covering serial read, serial write, parallel read and parallel write operations. These CASA tests will use a representative set of visibilities and image cubes, likely combining publicly available LOFAR data and simulated SKA data sets.
      The initial version of those tests can be found here: https://confluence.skatelescope.org/display/SRCSC/Benchmarking
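
      The four single-node access patterns named above (serial write, serial read, parallel write, parallel read) can be sketched with plain file I/O. This is a generic stand-in under stated assumptions: it does not use CASA or measurement sets, and the names `run_benchmark`, `write_file`, `read_file`, the 1 MiB chunk size, and the file counts are all hypothetical choices made for illustration.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # assumed block size: 1 MiB per write/read call


def write_file(path, n_chunks):
    """Write n_chunks blocks and fsync so data reaches storage; return bytes written."""
    payload = os.urandom(CHUNK)
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    return n_chunks * CHUNK


def read_file(path):
    """Read a file in CHUNK-sized pieces; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total


def timed(fn):
    """Run fn() and return (bytes moved, throughput in MiB/s)."""
    start = time.perf_counter()
    nbytes = fn()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return nbytes, nbytes / (1024 * 1024) / elapsed


def run_benchmark(target_dir, n_files=4, n_chunks=8):
    """Time serial and parallel write/read of n_files files in target_dir."""
    paths = [os.path.join(target_dir, f"bench_{i}.dat") for i in range(n_files)]
    results = {}
    results["serial_write"] = timed(
        lambda: sum(write_file(p, n_chunks) for p in paths))
    # Note: reads straight after writes may be served from the page cache.
    results["serial_read"] = timed(lambda: sum(read_file(p) for p in paths))
    with ThreadPoolExecutor(max_workers=n_files) as pool:
        results["parallel_write"] = timed(
            lambda: sum(pool.map(lambda p: write_file(p, n_chunks), paths)))
        results["parallel_read"] = timed(
            lambda: sum(pool.map(read_file, paths)))
    for p in paths:
        os.remove(p)
    return results


if __name__ == "__main__":
    # Point target_dir at a directory on the storage under test.
    with tempfile.TemporaryDirectory() as d:
        for name, (nbytes, mib_s) in run_benchmark(d).items():
            print(f"{name}: {nbytes} bytes, {mib_s:.1f} MiB/s")
```

      Threads are used for the parallel cases because file I/O releases Python's global interpreter lock. A realistic run would use files much larger than node memory, or drop caches between phases, so that read figures reflect the storage rather than the page cache.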

       

              People

                Parra, Manuel
                Watson, Duncan
                Votes: 0
                Watchers: 3

                Feature Progress

                  Story Point Burn-up: (94.74%)

                  Feature Estimate: 3.5

                  Status        Issues   Story Points
                  To Do         0        0.0
                  In Progress   1        1.0
                  Complete      15       18.0
                  Total         16       19.0

                  Dates

                    Created:
                    Updated:
