SAFe Program / SP-1259

Establish resource requirements for each application workload in terms of CPU, memory, ephemeral storage, and persistent storage

Details

    • Enabler
    • Not Assigned
    • PI8
    • None
    • None
    • Services
    • To migrate testing and integration workloads away from isolated environments, the resourcing requirements must be established so that the shared services can be sized accordingly.

    • Each team will provide resource estimates for each containerised application it delivers, covering:

      • CPU (millicores)
      • Memory (MB/GB)
      • Ephemeral storage (temporary storage used by the running application)
      • Persistent storage (block, network filesystem, and object storage)
      • Any additional requirements, such as GPUs or high-performance networks

      The values files and Helm charts associated with Skampi are updated to reflect these estimates (requests and limits for CPU, memory, and ephemeral storage).
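
      As an illustration of how such estimates can be sanity-checked before they go into a chart, the sketch below converts Kubernetes-style quantity strings into plain numbers and verifies that no request exceeds its limit. The function names and the sample values are hypothetical and are not taken from the Skampi charts; only a subset of the quantity suffixes Kubernetes accepts is handled.

```python
from decimal import Decimal

# Subset of the suffixes Kubernetes accepts for memory and storage quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as '250m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity):
    """Convert a memory or storage quantity such as '128Mi' to bytes."""
    # Try two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(Decimal(quantity))


def requests_exceeding_limits(resources):
    """Return a message for every resource whose request is above its limit."""
    problems = []
    for name, request in resources["requests"].items():
        limit = resources["limits"].get(name)
        if limit is None:
            continue  # no limit set for this resource, nothing to compare
        convert = cpu_to_millicores if name == "cpu" else memory_to_bytes
        if convert(request) > convert(limit):
            problems.append(f"{name}: request {request} exceeds limit {limit}")
    return problems


# Hypothetical estimate for one container (illustrative values only).
example_resources = {
    "requests": {"cpu": "100m", "memory": "128Mi", "ephemeral-storage": "256Mi"},
    "limits": {"cpu": "200m", "memory": "256Mi", "ephemeral-storage": "1Gi"},
}

if __name__ == "__main__":
    print(requests_exceeding_limits(example_resources))
```

      The dictionary layout mirrors the requests/limits structure used for container resources, so the same check could be run over values loaded from a chart's values file.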

    • 4
    • 4
    • 0
    • Team_BUTTONS, Team_CIPA, Team_CREAM, Team_KAROO, Team_MCCS, Team_NCRA, Team_PERENTIE, Team_SYSTEM
    • Sprint 5
    • NCRA Team: Application resource requirements were logged and provided to the System Team.
      Resource estimates are captured in the following spreadsheet:
      https://docs.google.com/spreadsheets/d/14agPTjY88eqjL3rBuxHxnPwPPND3QmawoT7hXScyCdk/edit#gid=0
      Resource stats were merged into SKAMPI master through this MR:
      https://gitlab.com/ska-telescope/skampi/-/merge_requests/106
      We later updated the CPU value (the request setting) to resolve the CORBA exception in AssignResources on the Central Node, through this MR:
      https://gitlab.com/ska-telescope/skampi/-/merge_requests/116

      CREAM Team:

      Values for CSP.LMC:
      Reference stories are CT-111 and CT-112. Outcomes are summarised here for convenience.
      CT-111: see the document linked to the story, also linked here:
      https://www.dropbox.com/s/aif3n5vh3afyd6t/CT-111-k8s-resources.pdf?dl=0

      The k8s resource values have been set in the CSP.LMC and MID.CBF Helm charts (see links below):
      https://gitlab.com/ska-telescope/skampi/-/blob/master/charts/skampi/charts/csp-proto/values.yaml
      https://gitlab.com/ska-telescope/skampi/-/blob/master/charts/skampi/charts/cbf-proto/values.yaml

      The updated charts have been added to the skampi master repo and the tests ran successfully (see link below):
      https://gitlab.com/ska-telescope/skampi/-/pipelines/188670445

      Values for WebJive Suite:
      Reference stories are CT-114, CT-131, CT-132, and CT-133; see the comments on the stories.
      Merge request:
      https://gitlab.com/ska-telescope/skampi/-/merge_requests/114
      Chart: https://gitlab.com/ska-telescope/skampi/-/blob/master/charts/skampi/charts/webjive/values.yaml

      Buttons Team:
      Summary provided in comments of AT2-563
      Merge request: https://gitlab.com/ska-telescope/observation-execution-tool/-/merge_requests/49

      Perentie Team:
      As our Tango devices are not yet integrated in skampi, we have not measured usage. We have applied 'default' values from tango-example to our charts, and will update them once we are integrated with skampi and can use the monitoring tools there to measure actual resource usage (see also AT6-661).

      MCCS Team:
      We initially set out to establish metrics as requested by this feature via MCCS-152. A Confluence page records the process we followed to establish our values (https://confluence.skatelescope.org/display/SE/MCCS-152+Analyse+MCCS+k8s+metrics+to+add+resource+settings). However, once this was done, we found it difficult to exercise MCCS to provide realistic values because many areas of the software are relatively immature (this is only our second PI). Following advice, MCCS-207 was created and implemented to refactor the MCCS Helm chart to use the official tango-util library. This provided a set of defaults that were appropriate at this stage to fulfil what was required of MCCS by this feature.

      System Team:
      Applied ResourceQuotas automatically to Skampi namespaces, and developed LimitRanges (defaults) that enable Skampi to run and test in its current form.
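
      To make the interplay between the two mechanisms concrete, the sketch below models, in simplified form, how LimitRange defaults fill in missing container requests and limits and how the resulting totals are compared with a ResourceQuota. It follows the defaulting behaviour described in the Kubernetes documentation as we understand it (a container with an explicit limit but no request gets its request copied from the limit). All names and numbers are hypothetical, quantities are plain integers (CPU in millicores, memory in MiB), and min/max validation is left out.

```python
from copy import deepcopy


def apply_limit_range(container, limit_range):
    """Return a copy of the container with missing requests/limits filled in."""
    result = deepcopy(container)
    requests = result.setdefault("requests", {})
    limits = result.setdefault("limits", {})
    for resource, default_limit in limit_range["default"].items():
        had_limit = resource in limits
        if not had_limit:
            limits[resource] = default_limit
        if resource not in requests:
            # An explicit limit without a request: the request copies the limit.
            # Otherwise the LimitRange defaultRequest is used.
            requests[resource] = (
                limits[resource]
                if had_limit
                else limit_range["defaultRequest"][resource]
            )
    return result


def namespace_usage(containers):
    """Sum requests and limits over all containers, keyed like quota entries."""
    totals = {}
    for container in containers:
        for kind in ("requests", "limits"):
            for resource, amount in container[kind].items():
                key = f"{kind}.{resource}"
                totals[key] = totals.get(key, 0) + amount
    return totals


def quota_violations(containers, hard):
    """Return {quota key: (used, hard cap)} for every exceeded quota entry."""
    usage = namespace_usage(containers)
    return {
        key: (usage.get(key, 0), cap)
        for key, cap in hard.items()
        if usage.get(key, 0) > cap
    }


# Hypothetical namespace defaults and quota (illustrative values only).
limit_range = {
    "defaultRequest": {"cpu": 100, "memory": 128},
    "default": {"cpu": 200, "memory": 256},
}
hard_quota = {"requests.cpu": 1000, "limits.memory": 1000}

raw_containers = [
    {},                                                 # nothing specified
    {"requests": {"cpu": 300}, "limits": {"cpu": 600}},  # CPU set, memory not
    {"limits": {"memory": 512}},                        # limit only
]
defaulted = [apply_limit_range(c, limit_range) for c in raw_containers]

if __name__ == "__main__":
    print(namespace_usage(defaulted))
    print(quota_violations(defaulted, hard_quota))
```

      Defaults of this kind let charts that do not yet declare their own resources still be admitted to a namespace that enforces a quota.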

    • 9.3
    • Stories Completed, Integrated, BDD Testing Passes (no errors), Outcomes Reviewed, NFRS met, Demonstrated, Satisfies Acceptance Criteria, Accepted by FO

People

    a.bridger Bridger, Alan
    s.valame Valame, Snehal
    Votes: 0
    Watchers: 12

Feature Progress

    Story Point Burn-up: 100.00%
    Feature Estimate: 4.0

    Status        Issues   Story Points
    To Do         0        0.0
    In Progress   0        0.0
    Complete      19       27.300001
    Total         19       27.300001
