SAFe Program / SP-1048

Assess performance latency needed to launch a containerised (receive) workflow using k8s


Details

    • Services
    • There is still concern, and an unresolved risk, that launching a set of containers for a time-critical workflow (such as receive) incurs more latency than the current design allows. Making an initial quantitative assessment that can be repeated and scaled in future is therefore an important step towards mitigating this risk.
    • <To be discussed with the FO at PI planning>

      • Benchmark scripts added to a suitable repo in SKA GitLab.
      • Benchmark results captured on Confluence and socialised with stakeholders at a system demo.
    • 2
    • 2
    • 6.5
    • Team_PLANET
    • Sprint 5
    • 7.5
    • PI24 - UNCOVERED

    • Team_PLANET
    • SPO-575

    Description

      Assess the latency incurred when launching a containerised (receive) workflow using k8s. "Launch" here means the time until we know the receive addresses and have reserved all required resources (especially buffer), to the point where SDP can indicate that it is okay to proceed with the scheduling block. This feature is basically about whether we can do this through Kubernetes or need an intermediate reservation/allocation process of our own to buffer delays.

      The ultimate goal might be to test this for 500 nodes and ~6 PB of storage, but we should scale incrementally towards this. This is related to ADR-8: we would ideally like a <5 second response time for the assign resources command (this might become an NFR).
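A minimal sketch of what a repeatable, incrementally scaled timing harness could look like, assuming Python and the standard library only. The names `time_launch`, `run_benchmark` and `fake_launch` are hypothetical and not taken from any SKA repo. `fake_launch` is a stand-in that just sleeps; in a real benchmark it would be replaced by a call that blocks until the receive addresses are known and all resources are reserved. The 5 second target is the aspiration quoted above, not an agreed NFR.

```python
import statistics
import time
from typing import Callable, Dict, List

TARGET_SECONDS = 5.0  # the "<5 second" aspiration from the description (assumption: not yet an NFR)


def time_launch(launch: Callable[[int], None], n_nodes: int, repeats: int = 5) -> List[float]:
    """Call launch(n_nodes) `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        launch(n_nodes)  # expected to block until the workflow counts as "launched"
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def run_benchmark(launch: Callable[[int], None], node_counts: List[int],
                  repeats: int = 5, target: float = TARGET_SECONDS) -> Dict[int, dict]:
    """Run the timing at each scale step and flag whether the worst case met the target."""
    results = {}
    for n in node_counts:
        summary = summarise(time_launch(launch, n, repeats))
        summary["within_target"] = summary["max"] <= target
        results[n] = summary
    return results


def fake_launch(n_nodes: int) -> None:
    """Hypothetical stand-in for a real launch: sleeps 1 ms per node."""
    time.sleep(0.001 * n_nodes)


if __name__ == "__main__":
    # Scale up in steps rather than jumping straight to the full node count.
    for n, s in run_benchmark(fake_launch, [1, 2, 4, 8]).items():
        print(f"{n:>4} nodes: median {s['median']:.3f}s, max {s['max']:.3f}s, "
              f"within target: {s['within_target']}")
```

Keeping the launch step behind a plain callable means the same harness can be rerun as the node count grows, and the timing logic can be checked without a cluster.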

              People

                m.deegan Deegan, Miles
                b.mort Mort, Ben

                Feature Progress

                  Story Point Burn-up: (41.94%)

                  Feature Estimate: 2.0

                  Status        Issues   Story Points
                  To Do         3        9.0
                  In Progress   0        0.0
                  Complete      3        6.5
                  Total         6        15.5
