SAFe Program / SP-3279

Verify and improve robustness of the TMC state machine implementation


Details

    • Obs Mgt & Controls

      By the end of PI18, TMC LOW and MID will implement the full operational state and observing state machines. The set of tests has to be reviewed and expanded to verify that the state machine implementation is robust and can gracefully handle invalid and unexpected commands, as well as misbehaving sub-systems.

      TMC must be able to operate in the presence of errors and failures, and correctly derive the overall state of resources and subarrays.

      Why now

      By the end of PI18, TMC LOW and MID will implement the full operational state and observing state machines. It is now time to verify that the TMC implementation is robust and able to function in the presence of errors and failures.


      Agreed that the XTP tests listed in the attachment are the priority.

      The list of telescope-level tests can be used as guidance, but TMC testing should not be limited to what is tested at the telescope level; it should expand the range of tests to provide resiliency.

      Outcomes of this feature are: 

      1. TMC Tests.
      2. Documentation: the list of test cases implemented. It would be beneficial to add to the list the test cases that, due to shortage of capacity or other reasons, cannot be implemented in PI18.
      3. If a test fails, investigate and clearly document what does not work and why, and provide suggestions for improvements to TMC, other components and sub-systems, infrastructure, and test cases.
      4. Report bugs, create stories, Features and Enablers as appropriate.

      Need to identify a set of tests that can be implemented during PI18.

      Suggested tests:

      I) TMC correctly handles unexpected commands, i.e. commands that cannot be executed in the current state/obsState.

      Expected behaviour: TMC rejects the command, logs the reason why the command was rejected, remains in the same state, and is able to successfully execute (or reject) the next command as appropriate. In other words, rejection of a command does not leave a lasting effect on TMC.

      Note 1: I would expect that a complete set of unit tests verifying this for all command/state/obsState combinations already exists. It may be acceptable to have a reduced set of test cases for the integrated TMC.

      Note 2: When the Base Classes are modified to accept commands and insert them in the queue without checking against the current obsState, these tests will have to be updated, but that's OK. The goal is to have a resilient and reliable TMC at the end of PI18.
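
      The expected behaviour in I) could be checked along the following lines. This is only a sketch against a toy model: ToySubarray, CommandRejected, the reduced obsState set and the table of allowed commands are all assumptions made for illustration, not the TMC or Base Classes interface.

```python
from enum import Enum, auto


class ObsState(Enum):
    EMPTY = auto()
    IDLE = auto()
    READY = auto()
    SCANNING = auto()


# Hypothetical, reduced table: command -> (states that accept it, resulting state).
ALLOWED = {
    "AssignResources": ({ObsState.EMPTY, ObsState.IDLE}, ObsState.IDLE),
    "Configure": ({ObsState.IDLE, ObsState.READY}, ObsState.READY),
    "Scan": ({ObsState.READY}, ObsState.SCANNING),
    "EndScan": ({ObsState.SCANNING}, ObsState.READY),
}


class CommandRejected(Exception):
    """Raised when a command is not allowed in the current obsState."""


class ToySubarray:
    """Stand-in for a subarray device; not the real TMC interface."""

    def __init__(self):
        self.obs_state = ObsState.EMPTY
        self.log = []

    def execute(self, command):
        allowed_states, next_state = ALLOWED[command]
        if self.obs_state not in allowed_states:
            reason = f"{command} not allowed in obsState {self.obs_state.name}"
            self.log.append(reason)  # the rejection reason must be logged
            raise CommandRejected(reason)
        self.obs_state = next_state


def check_rejection_has_no_lasting_effect(subarray, bad_command, good_command):
    """Send a command that must be rejected, then one that must succeed."""
    state_before = subarray.obs_state
    try:
        subarray.execute(bad_command)
    except CommandRejected:
        pass
    else:
        raise AssertionError(f"{bad_command} was unexpectedly accepted")
    assert subarray.obs_state == state_before, "rejection changed obsState"
    assert subarray.log, "rejection reason was not logged"
    subarray.execute(good_command)  # the next valid command must still work


subarray = ToySubarray()
check_rejection_has_no_lasting_effect(subarray, "Scan", "AssignResources")
```

      In an integrated test the same three checks (state unchanged, reason logged, next command succeeds) would be made against the deployed devices rather than a toy object.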

      II) TMC correctly handles invalid commands:

      a) JSON validation. Does TMC plan to upgrade in PI18 to the latest version of the Base Classes, which implements JSON validation? If not, the tests related to JSON validation can be postponed until PI19.

      b) The JSON string passes validation, but contains an invalid or unsupported combination of parameter values, parameters that are out of range, or other errors. For each command, identify the parameters that are verified by TMC and create test cases to verify that TMC indeed detects the errors.

      When TMC detects the error, what does TMC do, and which status is reported? Does TMC perform a state transition? Generate a log? Generate an event or alarm?

      Provide a summary of the implemented test cases, containing at least the following: command, current TMC Controller and Subarray state/obsState/healthState, type of error (e.g. invalid Dish ID), reject or accept, command status code returned, log, state transition (yes/no), and state/obsState/healthState after the command execution ended.
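
      One possible way to keep the requested summary uniform is to record each test case as a fixed set of fields and export the result as CSV. The sketch below is an assumption about how that could look; the class name, the field names and every value in the example row are placeholders for illustration, not observed TMC results.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class NegativeTestRecord:
    """One row of the test-case summary; field names are illustrative."""

    command: str
    controller_state: str
    subarray_state: str
    subarray_obs_state: str
    subarray_health_state: str
    error_type: str
    accepted: bool
    status_code: str
    log_message: str
    state_transition: bool
    state_after: str
    obs_state_after: str
    health_state_after: str


def to_csv(records):
    """Render the records as CSV text with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(NegativeTestRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only; a real row would be filled in from a test run.
example = NegativeTestRecord(
    command="AssignResources",
    controller_state="ON",
    subarray_state="ON",
    subarray_obs_state="EMPTY",
    subarray_health_state="OK",
    error_type="invalid Dish ID",
    accepted=False,
    status_code="<status code returned>",
    log_message="<log text>",
    state_transition=False,
    state_after="ON",
    obs_state_after="EMPTY",
    health_state_after="OK",
)
print(to_csv([example]))
```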

      Implement TMC tests for:

      • JSON schema version
      • syntactic error - the command passes a malformed JSON script
      • syntactic error - parameter out of range (at least one test for each supported parameter)
      • semantics  - invalid/unsupported combination of parameter values (e.g. parameter value not supported for the specified observing mode)
      • semantics - an attempt to use an unavailable resource (not deployed or not assigned to the subarray - depending on the context - where applicable test for both).
      • semantics - an attempt to use a non-existent resource (e.g. DISH not deployed, FSP not deployed, station not deployed).
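
      The error categories listed above lend themselves to a table-driven test, with one input per category. The following is a minimal sketch under stated assumptions: the parameter names ("scan_duration", "dishes"), the limits and the dish identifiers are hypothetical, and classify_configure_input stands in for whatever checks TMC actually performs.

```python
import json

# Hypothetical limits and resources, chosen only for this sketch.
DEPLOYED_DISHES = {"dish-1", "dish-2", "dish-3"}
ASSIGNED_DISHES = {"dish-1", "dish-2"}
SCAN_DURATION_RANGE = (1.0, 3600.0)


def classify_configure_input(raw):
    """Return an error category string, or None when no error is found."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError:
        return "malformed JSON"
    if not isinstance(config, dict):
        return "malformed JSON"
    duration = config.get("scan_duration")
    low, high = SCAN_DURATION_RANGE
    # A missing or non-numeric value is reported as out of range here.
    if not isinstance(duration, (int, float)) or not low <= duration <= high:
        return "parameter out of range"
    for dish in config.get("dishes", []):
        if dish not in DEPLOYED_DISHES:
            return "non-existent resource"
        if dish not in ASSIGNED_DISHES:
            return "unavailable resource"
    return None


# One input per error category, plus one valid input as a control.
CASES = [
    ('{"scan_duration": 10, "dishes": ["dish-1"]}', None),
    ('{"scan_duration": 10, "dishes": [', "malformed JSON"),
    ('{"scan_duration": -5, "dishes": ["dish-1"]}', "parameter out of range"),
    ('{"scan_duration": 10, "dishes": ["dish-3"]}', "unavailable resource"),
    ('{"scan_duration": 10, "dishes": ["dish-9"]}', "non-existent resource"),
]

for raw, expected in CASES:
    assert classify_configure_input(raw) == expected, raw
```

      Adding a new negative case then only requires a new row in CASES, which makes it easier to reach "at least one test for each supported parameter".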

      III) TMC accepts and correctly executes successive Configure commands. At this time we cannot exercise the command queue; send at least three Configure commands one after the other, using both the same and different configuration scripts, but wait for each command to complete before issuing the next one. Verify that the commands are accepted and correctly executed (and that the configuration is forwarded to the subsystems as expected). Verify that correct Delay Models are generated. (Himalaya)
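
      The "wait for each command to complete before issuing the next one" pattern in III) can be sketched with a polling helper. ToySubarray below is a hypothetical stand-in that finishes Configure on a timer; the "band" key and the timings are assumptions, and the Delay Model check is not modelled.

```python
import threading
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


class ToySubarray:
    """Stand-in that completes Configure asynchronously; not the real TMC."""

    def __init__(self):
        self.obs_state = "IDLE"
        self.forwarded = []  # configurations passed on to imaginary subsystems
        self._lock = threading.Lock()

    def configure(self, config):
        with self._lock:
            if self.obs_state not in ("IDLE", "READY"):
                raise RuntimeError(f"Configure rejected in {self.obs_state}")
            self.obs_state = "CONFIGURING"
        # Completion happens later, on another thread.
        threading.Timer(0.05, self._finish, args=(config,)).start()

    def _finish(self, config):
        with self._lock:
            self.forwarded.append(config)
            self.obs_state = "READY"


def run_successive_configures(subarray, configs):
    """Issue each Configure only after the previous one has reached READY."""
    for config in configs:
        subarray.configure(config)
        reached_ready = wait_until(lambda: subarray.obs_state == "READY")
        assert reached_ready, "Configure did not complete in time"


subarray = ToySubarray()
configs = [{"band": "1"}, {"band": "1"}, {"band": "2"}]  # same, then different
run_successive_configures(subarray, configs)
assert subarray.forwarded == configs
```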

      IV) Abort - verify that the commands Abort and Restart return the TMC to a state where it is fully operational and ready to process further commands. In other words, Abort and Restart do not leave lingering effects. Verify that the commands, events, and state transitions are all logged and reported to all registered clients. If this is not the case, investigate and clearly document what does not work well and why, and provide suggestions for improvements to TMC, other components and sub-systems, and test cases. (Himalaya)

      Suggested test cases for Abort():

      1. Issue a command Abort while TMC Subarray obsState=EMPTY. Verify that TMC behaviour is as expected.
      2. Issue a command Abort while TMC Subarray is in obsState=RESOURCING. After TMC Subarray enters ABORTED, issue Restart(). After TMC Subarray enters EMPTY, issue a command AssignResources. Verify that the command AssignResources is correctly executed, then issue command Configure(), and execute a scan (Scan(), endScan()). Create two test cases: a) use exactly the same resources in both AssignResources commands; b) use different resources (different dishes, different FSPs, and if possible different SDP resources).
      3. Issue a command Abort while TMC Subarray is in obsState=RESOURCING. After TMC Subarray enters ABORTING, issue a second command Abort(). Verify that TMC behaviour is as expected. Verify that the first command Abort is successfully completed and the Subarray transitions to ABORTED. Issue Restart(). After TMC Subarray enters EMPTY, issue a command AssignResources. Verify that the command AssignResources is correctly executed, then issue command Configure(), and execute a scan (Scan(), endScan()).
      4. Issue a command Abort while TMC Subarray obsState=IDLE. Verify that TMC behaviour is as expected.
      5. Issue a command Abort while TMC Subarray obsState=CONFIGURING. After TMC Subarray enters ABORTED, issue Restart(). After TMC Subarray enters EMPTY, issue a command AssignResources. Verify that the command AssignResources is correctly executed, then issue command Configure(), and execute a scan (Scan(), endScan()). The purpose of this test is to verify that a scan can be successfully executed after Abort/Restart.
      6. Issue a command Abort while TMC Subarray obsState=READY. Verify that TMC behaviour is as expected.
      7. Issue a command Abort while TMC Subarray obsState=SCANNING. After TMC Subarray enters ABORTED, issue Restart(). After TMC Subarray enters EMPTY, issue a command AssignResources: a) using exactly the same resources; b) using different resources (different dishes, different FSPs, and if possible different SDP resources). Verify that the command AssignResources is correctly executed, then issue command Configure(), and execute a scan (Scan(), endScan()). The purpose of this test is to verify that a scan can be successfully executed after Abort/Restart.
      8. Issue a command Abort while TMC Subarray obsState=ABORTING. Verify that TMC behaviour is as expected.
      9. Issue a command Abort while TMC Subarray obsState=RESTARTING. Verify that TMC behaviour is as expected.
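
      Cases 2, 5 and 7 above share the same recovery sequence (Abort, Restart, AssignResources, Configure, Scan, endScan), so they could be driven from one parametrised helper. The sketch below uses a simplified, hypothetical obsState model: the set of states that accept Abort is an assumption, the transient ABORTING and RESTARTING states are skipped, and the dish names are made up.

```python
class ToySubarray:
    """Simplified obsState model for illustration; not the real TMC behaviour."""

    # States in which this sketch assumes Abort is accepted.
    ABORTABLE = {"RESOURCING", "IDLE", "CONFIGURING", "READY", "SCANNING"}

    def __init__(self, obs_state="EMPTY"):
        self.obs_state = obs_state
        self.resources = set()

    def _require(self, command, allowed):
        if self.obs_state not in allowed:
            raise RuntimeError(f"{command} rejected in {self.obs_state}")

    def abort(self):
        self._require("Abort", self.ABORTABLE)
        self.obs_state = "ABORTED"  # transient ABORTING is skipped here

    def restart(self):
        self._require("Restart", {"ABORTED"})
        self.resources.clear()
        self.obs_state = "EMPTY"  # transient RESTARTING is skipped here

    def assign_resources(self, resources):
        self._require("AssignResources", {"EMPTY", "IDLE"})
        self.resources |= set(resources)
        self.obs_state = "IDLE"

    def configure(self):
        self._require("Configure", {"IDLE", "READY"})
        self.obs_state = "READY"

    def scan(self):
        self._require("Scan", {"READY"})
        self.obs_state = "SCANNING"

    def end_scan(self):
        self._require("EndScan", {"SCANNING"})
        self.obs_state = "READY"


def scan_after_abort_and_restart(start_state, resources):
    """Abort from start_state, restart, then run a complete scan."""
    subarray = ToySubarray(start_state)
    subarray.abort()
    subarray.restart()
    assert subarray.obs_state == "EMPTY"
    subarray.assign_resources(resources)
    subarray.configure()
    subarray.scan()
    subarray.end_scan()
    return subarray


# Variants a) and b): the same or a different resource set after Restart.
for state in ("RESOURCING", "CONFIGURING", "SCANNING"):
    for resources in ({"dish-1", "dish-2"}, {"dish-3", "dish-4"}):
        result = scan_after_abort_and_restart(state, resources)
        assert result.obs_state == "READY"
        assert result.resources == resources
```

      The remaining cases (Abort in EMPTY, ABORTING, RESTARTING and so on) only state "as expected", so the expected outcome would need to be agreed before they can be asserted in the same way.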

       

       

    • 4.5
    • 4.5
    • 10
    • 2.222
    • Team_SAHYADRI
    • Sprint 3

      Implemented the listed negative test case scenarios:

      • TMC handles unexpected and invalid commands
      • TMC is able to perform successive Configure commands
      • TMC is able to perform multiple Scan commands
      • TMC is able to perform Scan after a failed attempt of the AssignResources command (XTP-20320)
      • TMC is able to perform Scan after a failed attempt of the Configure command (XTP-20321)
      • etc.

      The related MRs merged into ska-tmc-integration are:

      • https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-integration/-/merge_requests/71
      • https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-integration/-/merge_requests/72
      • https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-integration/-/merge_requests/69
      • https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-integration/-/merge_requests/66
      • https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-integration/-/merge_requests/67
      • https://gitlab.com/ska-telescope/ska-tmc/ska-tmc-integration/-/merge_requests/61

      A System Demo was provided for the TMC robustness and the test cases developed.

      The implementation and utilization of the command queue was analyzed. Some limitations with queuing multiple commands at a time were found. The findings are documented at: https://confluence.skatelescope.org/display/SE/Command+Queue+Implementation

      Based on the findings, the SKA base classes need to be updated; the work will be planned separately.

      XTP-20322 is discarded by Gerhard.
    • 19.3
    • Stories Completed, Integrated, Solution Intent Updated, Outcomes Reviewed, Demonstrated, Satisfies Acceptance Criteria, Accepted by FO

    Description

      The list of telescope-level tests can be used as guidance, but TMC testing should not be limited to what is tested at the telescope level; it should expand the range of tests to provide resiliency.

      If there is time:

      V) Command Restart() - Verify that the command Restart is correctly handled in all states, and that TMC functions without any issues after Restart is rejected.

      VI) obsState=FAULT - Test cases that bring the TMC Subarray into obsState=FAULT and then verify that TMC can return to obsState=EMPTY and continue to function correctly. Are such tests already implemented? Identify when the TMC Subarray transitions to FAULT.

      VII) state=FAILED - When does TMC transition to FAILED? Do we have tests to verify the related behaviour?

       

              People

                s.vrcic Vrcic, Sonja

                Feature Progress

                  Story Point Burn-up: (93.33%)

                  Feature Estimate: 4.5

                  Status        Issues   Story Points
                  To Do              1            3.0
                  In Progress        0            0.0
                  Complete          13           42.0
                  Total             14           45.0
