SAFe Program / SP-1667

Deployment & integration of persistent SDP

Details

    • Feature
    • Must have
    • PI15
    • COM SDP SW
    • None
    • Data Processing
    • (see why in the description) SDP will need to be able to manage long-running processing jobs that will be semi-independent of the overall observatory schedule. This means that it is critical to be able to operate SDP long-term without the need for regular shut-downs, whether for internal reasons (storage filling up) or external reasons (interface changes).
    • Deploy SDP into a shared namespace on the DP testing platform (and/or the new k8s cluster provided by the system team for SS-94), and document how it would be done in different environments
      • Especially be ready to give some support to others in replicating it as needed (e.g. help SKA operations to set up a persistent demonstration system in their operations room). The timescale for this isn't quite clear and might have to be clarified as part of PI planning.
    • Expose operator interfaces (e.g. notebooks?) that allow operators to run processing blocks on SDP
    • Demonstrate that long-term usage will not cause unmanageable "leaks", such as configuration database entries, storage usage or Kubernetes entities.
    • Stretch: Show how we can migrate to new SDP (LMC?) versions without interrupting ongoing processing.
    • 4
    • 4
    • 0
    • Team_ORCA
    • Sprint 5
    • We have made a persistent stand-alone deployment of the SDP in shared namespaces on the DP cluster (called dp-shared and dp-shared-p).

      In order to exercise the deployment and get data on its behaviour, we run an automated test once every hour using a scheduled run of the SDP integration CI pipeline. The test runs the visibility receive processing script only. See the pipeline runs in GitLab.
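
As a rough sketch of how data from the hourly runs could be used to look for slow growth (for example in the number of configuration database entries), the following estimates the per-hour trend of a sampled quantity with a least-squares fit. The function names, the sample values and the tolerance are illustrative assumptions and not part of the SDP tooling.

```python
from typing import Sequence


def hourly_growth(samples: Sequence[float]) -> float:
    """Least-squares slope of samples taken one hour apart (units per hour)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2          # mean of the sample indices 0..n-1
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples: Sequence[float], tolerance: float = 0.0) -> bool:
    """Flag a quantity whose fitted growth per hour exceeds the tolerance.

    The tolerance is a hypothetical threshold; a suitable value would depend
    on the quantity being tracked.
    """
    return hourly_growth(samples) > tolerance


# Hypothetical counts recorded after four consecutive hourly test runs.
entries_after_each_run = [10, 12, 14, 16]
leaking = looks_like_leak(entries_after_each_run)
```

A quantity that returns to the same level after every run gives a slope of zero, so only sustained growth across runs is flagged.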

      The configuration of the persistent deployment differs from the "ephemeral" deployments for testing in a number of ways:

      • Persistence is enabled for the configuration database, so if the server is restarted then the state of the system is not lost. We have the option of deploying multiple servers for even greater reliability, but so far that has not proved necessary.
      • Two subarrays are deployed, one for the automated testing and the other for manual testing by users. If it becomes necessary to support multiple simultaneous manual testers, we can deploy more subarrays (the Helm chart supports an arbitrary number).

      More information is available in Confluence in Persistent SDP. We have set up a preliminary page to capture feedback from users and a Slack channel for users to ask for help: #help-sdp.

      We have implemented continuous deployment for the SDP integration repository. The CI pipeline is configured to deploy the SDP to the following environments:

      • Integration, which is upgraded when a branch is merged to master (namespaces: sdp-integration and sdp-integration-p), and
      • Staging, which is upgraded when a release is tagged (namespaces: sdp-staging and sdp-staging-p).

      If we follow the integration → staging → production deployment model, we could consider the shared persistent deployment as one of the production environments. The deployment to "production" is not automated at present, so it is done manually.
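
The deployment rules above can be summarised in a short sketch. The namespace names are those given in the text; the function, its parameters and the treatment of the shared deployment as "production" are assumptions made for illustration, and this is not the actual CI configuration.

```python
from typing import Optional

# Namespaces per environment, as named in the text. Listing the shared
# persistent deployment under "production" is an assumption.
ENVIRONMENTS = {
    "integration": ("sdp-integration", "sdp-integration-p"),
    "staging": ("sdp-staging", "sdp-staging-p"),
    "production": ("dp-shared", "dp-shared-p"),
}


def automated_target(branch: Optional[str], tag: Optional[str]) -> Optional[str]:
    """Return the environment a pipeline would upgrade automatically.

    `branch` and `tag` are hypothetical inputs describing what triggered the
    pipeline. Returns None when nothing is deployed automatically, which
    includes "production", since that deployment is done manually.
    """
    if tag:
        return "staging"        # a release was tagged
    if branch == "master":
        return "integration"    # a branch was merged to master
    return None


target = automated_target(branch="master", tag=None)
namespaces = ENVIRONMENTS[target] if target else ()
```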

      We have created Jupyter notebooks to explain how to control the SDP via its Tango interface and the ska-sdp CLI. They can be run on the DP cluster using the BinderHub set up by the System Team. The notebooks are in the ska-sdp-notebooks repository and they are documented in the developer portal.

    • 17.4
    • Stories Completed, Integrated, Solution Intent Updated, BDD Testing Passes (no errors), Outcomes Reviewed, NFRS met, Demonstrated, Satisfies Acceptance Criteria, Accepted by FO
    • PI24 - UNCOVERED

    • SPO-1775 SPO-1782

    Description

      Who?

      • AA0.5 Operators
      • AIV Engineers

      What? (outcomes)

      • Contribution to SS-94 for SDP being available in a persistent sandbox deployment (i.e. the PI15 "operations sandbox")
      • Coordinate and contribute to the publishing of two SDP releases:
        1. REL-9 (Targeting 15.2-3)
          • SDP (or SDP as part of SKAMPI) is deployed to one or more persistent environments in a way that allows long-term maintenance and is configurable to be deployed for either MID or LOW control device naming
        2. REL-106 (Targeting 15.4) 
          1. Operator-oriented interfaces to use and manage the instance (collaboration with SP-2551)
          2. A script/notebook is provided along with sufficient documentation (similar to that for the standalone SDP walkthrough) to
            • Start up and shut down SDP in standalone mode and as part of a SKAMPI deployment (USE-1, USE-21)
            • Run the visibility receive workflow to capture and inspect data using a CBF emulator (USE-22, USE-3, USE-24) in standalone mode and as part of a SKAMPI deployment, ideally using the latest release of visibility receive capable of running multiple successive scans (SP-2483)
      • Help support exploratory testing (15.5)

      Why?

      • We believe that a persistent sandbox deployment of the SDP system will provide valuable experience and feedback needed to deliver a robust and resilient system for use at early Array Assemblies, starting with AA0.5.
      • We also believe that the developer experience and knowledge needed for introducing new workflows or other service components necessary to roll out new SDP capabilities will be greatly improved by developing these against a well-tested deployment of the latest stable SDP system.

      References

      • Related solution goal: SPO-1775
      • Services ART are planning to roll out a notebook service by ~15.3 (SP-2590) which should be ready for use with REL-106 scripts 

    People

      • p.wortmann Wortmann, Peter
      • f.graser Graser, Ferdl

    Feature Progress

      Story Point Burn-up: (100.00%)

      Feature Estimate: 4.0

                     Issues   Story Points
      To Do          0        0.0
      In Progress    0        0.0
      Complete       11       20.5
      Total          11       20.5
