SAFe Program / SP-3133

Explore and implement options for handling failure scenarios


Details

    • Obs Mgt & Controls
    • One or more approaches for handling failure scenarios are identified. The applicability of each approach to specific scenarios is understood.
    • The analysis of approaches for handling failure scenarios in TMC is documented.
    • 3
    • 3
    • 0
    • Team_HIMALAYA, Team_SAHYADRI
    • Sprint 2
    • Proposal for failure handling in TMC: https://confluence.skatelescope.org/display/SWSI/Failure+Handling+in+TMC
    • Created a POC based on BC ver 13 (use of the longrunningcommandResult attribute) to report and propagate exceptions in TMC.
    • Created a POC on implementing a timeout while processing a command. The POC is extended to the AssignResources command in all TMC nodes, thus providing end-to-end command timeout detection for one command.
    • Relevant POC work MRs:
      • POC on Sdpleafnodes: https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-sdpleafnodes/-/merge_requests/336
      • POC on cspleafnodes: https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-cspleafnodes/-/merge_requests/56
      • POC on SubarrayNode: https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-subarraynode/-/merge_requests/87
      • POC on CentralNode: https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-centralnode/-/merge_requests/89
    • System Demo provided for failure propagation in TMC and for the command timeout.
    • The implementation of failure handling (error propagation) and command timeout for the AssignResources command is merged into the tmc integration repo.
    • 19.1
    • Stories Completed, Solution Intent Updated, Satisfies Acceptance Criteria, Accepted by FO
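
    The command-timeout idea mentioned in the POC notes above can be illustrated with a minimal sketch. This is not the TMC implementation: the helper `run_with_timeout`, the `(result_code, message)` pair it returns, and the result-code strings are all hypothetical names chosen for illustration. It assumes a command can be modelled as a plain Python callable.

```python
import threading
import time
from typing import Callable, Dict, Tuple


def run_with_timeout(command: Callable[[], str], timeout_s: float) -> Tuple[str, str]:
    """Run `command` in a worker thread; report a timeout if it does not finish in time.

    Returns a hypothetical (result_code, message) pair.
    """
    outcome: Dict[str, Tuple[str, str]] = {}

    def worker() -> None:
        try:
            outcome["value"] = ("OK", command())
        except Exception as exc:
            # Report the failure as a result instead of losing it in the thread.
            outcome["value"] = ("FAILED", f"{type(exc).__name__}: {exc}")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # wait at most timeout_s seconds
    if thread.is_alive():
        return ("TIMEOUT", f"no completion within {timeout_s} s")
    return outcome["value"]


def slow_command() -> str:
    """Stand-in for a command that takes longer than the caller is willing to wait."""
    time.sleep(0.5)
    return "done"
```

    Calling `run_with_timeout(slow_command, 0.05)` would report a timeout, while a command that raises is reported with a `FAILED` code and the exception text. A real device would more likely publish such a result through an attribute than return it directly.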

    Description

      The spike will explore options for handling failure scenarios in TMC that result in a subarray going into an inconsistent state (e.g. TMC, SDP and CSP devices/systems report inconsistent observation states). This may occur due to exceptions detected and raised in segment systems that are not recognized or handled by the TMC.
      The spike will look at aspects such as:
      1. Exception handling and propagation
      2. Reporting failures to the user
      3. Recovering from failures
      The spike shall come up with one or more approaches, which can be prototyped separately.
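
      As a rough illustration of the first two aspects, the sketch below collects exceptions raised by lower-level devices and reports them upward, and flags a subarray whose devices disagree on observation state. It is a sketch under stated assumptions, not one of the approaches the spike produced: `CommandFailed`, `assign_resources`, `inconsistent_states`, the device names and the result-code strings are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class CommandFailed(Exception):
    """Hypothetical exception carrying the name of the device that failed."""

    def __init__(self, device: str, reason: str) -> None:
        super().__init__(f"{device}: {reason}")
        self.device = device
        self.reason = reason


def assign_resources(
    leaf_commands: Dict[str, Callable[[], None]]
) -> Tuple[str, List[str]]:
    """Invoke each leaf-level command and propagate failures to the caller.

    Returns a hypothetical (result_code, error_messages) pair so the user
    sees which device failed instead of a silently inconsistent subarray.
    """
    errors: List[str] = []
    for command in leaf_commands.values():
        try:
            command()
        except CommandFailed as exc:
            errors.append(str(exc))
    return ("FAILED" if errors else "OK", errors)


def inconsistent_states(obs_states: Dict[str, str]) -> bool:
    """True if the devices do not all report the same observation state."""
    return len(set(obs_states.values())) > 1
```

      For example, if a hypothetical SDP leaf command raises `CommandFailed("sdp-leaf", "resources unavailable")`, `assign_resources` returns a `FAILED` code together with the message `sdp-leaf: resources unavailable`. Recovery (the third aspect) is left out of the sketch.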

              People

                Adam.Avison Avison, Adam
                m.patil Patil, Mangesh

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 3.0

                                Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      16       55.0
                  Total         16       55.0
