SAFe Program / SP-1716

MCCS: Plan to, and/or "fix" instability caused by long-running commands on real hardware

Details

    • Obs Mgt & Controls
    • Links to the documentation and code repository are provided in the Outcomes field. If the test suite has been created or updated, please provide the link(s) to the repository.
    • 1
    • 1
    • 8
    • Team_MCCS
    • Sprint 5
    • This outcome is shared by SP-1716 and SP-1717.

      SP-1716 aimed to understand what was causing instabilities in the MCCS code when executing 'long-running' commands and, once that was understood, to find a way to address them so that commands did not time out before completion. The cause was well understood by the end of PI10, so the 'fix' plan was ready to go.

      SP-1717 aimed to adopt the most recent version (0.11) of the tango_base_classes. In many ways this was a revolutionary step, because it pushed forward the adoption of asynchronicity in code execution, an initiative not only in MCCS but across the whole SKAO OMC ART.

      Crossover. The first instance of crossover between SP-1717 and SP-1716 was that, because v0.11 of the tango_base_classes introduced asynchronicity, time-outs during command execution were to a great extent no longer a problem: each event in the sequence required to execute a command would start and then complete before the next in the chain began.
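
      The chained behaviour described above can be illustrated with a minimal sketch. It is not the tango_base_classes API: the Device class, the Status codes, the "On" command and the step labels are all hypothetical, and the immediate QUEUED acknowledgement is an assumption about how a caller avoids timing out.

```python
import queue
import threading
import time
from enum import Enum


class Status(Enum):
    """Hypothetical result codes for this sketch."""
    QUEUED = "QUEUED"
    OK = "OK"


class Device:
    """Hypothetical device; a stand-in, not a tango_base_classes class."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.completed = []
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, name, steps):
        """Queue a command and acknowledge at once, without waiting for it."""
        self._tasks.put((name, steps))
        return Status.QUEUED

    def _run(self):
        while True:
            name, steps = self._tasks.get()
            for step in steps:
                step()  # each step completes before the next one starts
            self.completed.append((name, Status.OK))
            self._tasks.task_done()

    def wait_idle(self):
        """Block until every queued command has finished."""
        self._tasks.join()


def make_step(label, seconds, log):
    """Build a step that takes `seconds` to run, then records its label."""
    def step():
        time.sleep(seconds)
        log.append(label)
    return step


def demo():
    log = []
    device = Device()
    steps = [make_step("program firmware", 0.05, log),
             make_step("initialise", 0.05, log)]
    reply = device.submit("On", steps)  # returns without waiting for the steps
    device.wait_idle()
    return reply, log, device.completed
```

      Calling demo() returns the acknowledgement together with the step labels in the order they ran, so the caller is never blocked for the full duration of the command.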

      As a result, the implementation work of SP-1716 became redundant until SP-1717 was complete and we could evaluate whether further action was required or desired. SP-1716 was therefore refocused on supporting the Karoo team in forming an understanding of, and documenting, the Reference Design and Implementation for asynchronous (long-running) commands (SP-1640).

      The aspiration was to demonstrate the refactored MCCS software on real hardware. However, this was not possible in PI11, as there was insufficient time to make the required changes to the TPM drivers following the release of the 'post MCCS-400' v0.11 tango_base_classes adoption.

      Highlights of support activities:

      Drew:
        • Reviewed the "SKA Tango Base design to support long running commands" proposal at: https://confluence.skatelescope.org/display/SWSI/SKA+Tango+Base+design+to+support+long+running+commands (see the comments section for feedback provided under this story).
        • Contributed to discussions recorded on #ska-base-classes.
        • Engaged in discussion of Karoo's proposal at the cop-tango meeting on 2021-07-12. Drew commented that "My overall feeling is that we have had a substantial positive influence on the thinking/design."
        • The MCCS approach to message queues was presented, followed by discussion, at CoP Tango on 10 August. Notes can be viewed at: https://confluence.skatelescope.org/display/SE/TANGO+CoP+Meeting+%2321+-+2021-08-10
        • Involved in the follow-up discussion at the MVP meeting on 12 August. Notes are captured here: https://confluence.skatelescope.org/display/SE/2021-08-12+MVP+Meeting

      Ross:
      We have offered comments on the long-running command Confluence page. We also attended and contributed to the Tango CoP that was exclusively discussing the Karoo team's design. I had a 1-2-1 meeting with Paul Swart to discuss the MCCS design and implementation, whereby we send messages back to requestor Tango devices rather than wait for, and act on, events from subservient devices.

      Review comments were given for the Karoo implementation of long-running commands on this merge request: https://gitlab.com/ska-telescope/ska-tango-examples/-/merge_requests/197

      No directly related code commits or tests associated with this feature work were made to https://gitlab.com/ska-telescope/ska-low-mccs/
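
      The message-passing approach mentioned in Ross's notes above, in which a subservient device sends a message back to the requestor when its work is done, might look roughly like the sketch below. It uses only the Python standard library rather than Tango. The class names, the device names "controller" and "tile-1", the "Initialise" command and the message tuple layout are all assumptions made for illustration, not the MCCS implementation.

```python
import queue
import threading


class MessageDevice:
    """Hypothetical device whose inbox is drained by one worker thread."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.received = []
        threading.Thread(target=self._drain, daemon=True).start()

    def send(self, message):
        """Deliver a message to this device without blocking the sender."""
        self.inbox.put(message)

    def _drain(self):
        while True:
            message = self.inbox.get()
            self.handle(message)
            self.inbox.task_done()

    def handle(self, message):
        # Default behaviour: record the message (used by the requestor).
        self.received.append(message)


class Subordinate(MessageDevice):
    """Hypothetical subservient device that reports back to the requestor."""

    def handle(self, message):
        command, reply_to = message
        # ... the slow work for `command` would happen here ...
        reply_to.send((self.name, command, "COMPLETED"))


def demo():
    controller = MessageDevice("controller")
    tile = Subordinate("tile-1")
    tile.send(("Initialise", controller))  # request carries its return address
    tile.inbox.join()        # the subordinate has handled the request
    controller.inbox.join()  # the requestor has handled the reply
    return controller.received
```

      The requestor never subscribes to or polls the subservient device. It simply processes the completion message when it arrives in its own inbox.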
    • 11.6
    • Stories Completed, Outcomes Reviewed, Satisfies Acceptance Criteria, Accepted by FO

    Description

      Plan to, and/or "fix" instability caused by long-running commands on real hardware

      (implementing transitional states, execution of commands running longer than 3 seconds)

      Spike to assess/plan; implement scheme; adapt all devices to use chosen scheme

              People

                s.vrcic Vrcic, Sonja
                v.mohile Mohile, Vivek

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 1.0

                  Status        Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      4        6.0
                  Total         4        6.0
