SP-4494

UKSRC - Deploy all v0.1 local compulsory services


Details

    • SRCnet

      Deployed, monitorable, documented and supportable services: will enable initial operations activities, including test campaigns; will (later) enable test users to interact with our systems; will provide the SOG with valuable operating experience; and will provide stakeholders with a tangible, assessable demonstration of SRCNet's progress so far.

      Deploying all local compulsory services is one requirement for SRCNet to "confirm" that an SRC will be a v0.1 node.

      A sufficient number of nodes must be "confirmed" for SRCNet to begin the v0.1 phase of work. This will be a landmark achievement for SRCNet.


      For this SRC, all v0.1 local compulsory services are:

      AC1: Deployed locally

      AC2: Local integration test(s) are completed to check the connection with global services - as informed by deployment documentation (to be populated)

      AC3: Monitorable via the centralised service monitoring dashboards and shown to be running successfully

      AC4: Documented - meaning that any SRC-specific deployment instructions / troubleshooting are captured and (where applicable) added to ska-src-docs-operator

      AC5: Supportable - meaning that an operator (a SOG member if possible) is able to access the service deployment and provide support next PI

      AC6: Deployed via a GitOps Tool (e.g. ArgoCD, FluxCD) OR repeatable deployment methods fully documented with deployment velocity expectations (e.g. new versions are deployed within 1 working day)
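
      The deployment velocity expectation in AC6 (e.g. "within 1 working day") can be checked mechanically. The following is a minimal sketch, not an agreed SRCNet tool: the function names are hypothetical, working days are assumed to be Monday to Friday, and public holidays are ignored.

```python
from datetime import datetime, timedelta


def add_working_days(start: datetime, days: int) -> datetime:
    """Return the moment `days` working days (Mon-Fri) after `start`.

    Assumption: weekends are skipped, public holidays are not considered.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current


def deployed_within_target(
    released: datetime, deployed: datetime, working_days: int = 1
) -> bool:
    """True if `deployed` is no later than `working_days` after `released`."""
    deadline = add_working_days(released, working_days)
    return released <= deployed <= deadline


# A version released on a Friday afternoon and deployed on Monday morning
# still meets a 1-working-day expectation under these assumptions.
released = datetime(2024, 9, 6, 15, 0)   # Friday
deployed = datetime(2024, 9, 9, 10, 0)   # Monday
print(deployed_within_target(released, deployed))
```

      An SRC that documents a different expectation would pass its own value for `working_days`.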

      v0.1 local compulsory services:

      • Rucio Storage Element (configured to use SKA IAM)
      • JupyterHub (specifically JupyterHub, not another service that provides Jupyter Notebooks)
      • at least one Visualisation Service out of: CARTA, VisIVO, Aladin
        note: CANFAR can provide this visualisation service
      • SODA Service
      • Data Management API (see the features under SP-4654)
      • cavern (see Description)
      • an Orchestrator Service (e.g. Kubernetes)
      • perfSONAR
      • Local Service Monitoring stack (e.g. Prometheus)
    • Intra Program
    • 6.5
    • 5
    • 0
    • Team_DAAC, Team_PURPLE
    • Sprint 5
      Service                   AC1       AC2         AC3                  AC4         AC5          AC6
                                Deployed  Integrated  Monitorable/Running  Documented  Supportable  GitOps (or ...)
      Rucio Storage Element
      JupyterHub
      Visualisation Service
      SODA Service
      Data Management API
      cavern
      Orchestrator Service
      perfSONAR
      Local Service Monitoring
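
      The service-by-AC status table above can also be tracked as data. This is a small sketch under stated assumptions: the service and AC names come from this ticket, while the function names and the use of booleans for each cell are illustrative choices, not an existing SRCNet tool.

```python
ACCEPTANCE_CRITERIA = ("AC1", "AC2", "AC3", "AC4", "AC5", "AC6")

SERVICES = (
    "Rucio Storage Element",
    "JupyterHub",
    "Visualisation Service",
    "SODA Service",
    "Data Management API",
    "cavern",
    "Orchestrator Service",
    "perfSONAR",
    "Local Service Monitoring",
)


def empty_matrix() -> dict:
    """Build a status matrix with every criterion initially unmet."""
    return {s: {ac: False for ac in ACCEPTANCE_CRITERIA} for s in SERVICES}


def outstanding(matrix: dict) -> dict:
    """Return {service: [unmet ACs]} for services that still have gaps."""
    return {
        service: [ac for ac, met in acs.items() if not met]
        for service, acs in matrix.items()
        if not all(acs.values())
    }


def all_criteria_met(matrix: dict) -> bool:
    """True only when every service satisfies every acceptance criterion."""
    return not outstanding(matrix)


matrix = empty_matrix()
matrix["perfSONAR"]["AC1"] = True  # e.g. perfSONAR has been deployed locally
for service, unmet in outstanding(matrix).items():
    print(f"{service}: {', '.join(unmet)}")
```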
    • PI24 - UNCOVERED

    • PI24-PB SRCNet0.1 Team_PURPLE team_DAAC team_TEAL
    • SPO-3480

    Description

      For this feature, these deployments may be hosted on temporary infrastructure and/or infrastructure that does not meet the full operational requirements committed to via the v0.1 EoIs.

      Some of these deployments may already be completed. The size of this feature will vary per SRC. Do not include optional services as part of this feature.

      See the implementation plan doc and Miro for additional information and ongoing SRC plans.

      cavern has been conditionally added to the compulsory service list. Early PI24 development work / architectural decisions will determine whether it is required to deliver the Data Management API and/or SODA Service.

      Deployment is intended to be a simple process. Additional development work required, per service or per SRC, may be split out into other features or contained here - whichever approach best enables teams to plan their efforts.

      For some local compulsory services, the technology stack is not fixed. If a technology is chosen that differs from proven implementations, it is the SRC's responsibility to plan and complete any additional work necessary to integrate their deployment with SRCNet's global services, including service monitoring.

      Multiple other features are required to enable this work and should be linked and sequenced accordingly, including but not limited to SP-4598, SP-4570, SP-4569 and SP-4517.

       


      Cross-team details are available in Miro:
      https://miro.com/app/board/uXjVK0qVHVs=/

       


      The DAAC team have been identified as the team to help operate a number of the services required for SRCNet 0.1 at RAL:

      • Implement GitOps for managing staging and production K8s clusters
      • SRCNet Azimuth GitOps managed across both RAL and Cambridge from a single Git repo.
      • Deployment of Data Management API, SODA, JupyterHub, Visualisation Service, and any associated monitoring.
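
      As an illustration of managing several sites and environments from a single Git repo, the sketch below generates Argo CD `Application` manifests as Python dictionaries. It assumes Argo CD is the chosen GitOps tool (this ticket only lists it as an example). The repository URL, directory layout, site names, namespace choices and in-cluster destination are all hypothetical.

```python
import json

# Hypothetical repository URL; the real repo is not named in this ticket.
REPO_URL = "https://example.org/srcnet/gitops.git"


def argocd_application(site: str, environment: str, service: str) -> dict:
    """Build one Argo CD Application manifest for a site/environment/service."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": f"{site}-{environment}-{service}",
            "namespace": "argocd",
        },
        "spec": {
            "project": "default",
            "source": {
                "repoURL": REPO_URL,
                "targetRevision": "main",
                # Assumed layout: one directory per site, environment, service.
                "path": f"clusters/{site}/{environment}/{service}",
            },
            "destination": {
                # Assumption: Argo CD runs inside the cluster it deploys to.
                "server": "https://kubernetes.default.svc",
                "namespace": service,
            },
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def all_applications(sites, environments, services) -> list:
    """Build one manifest per combination of site, environment and service."""
    return [
        argocd_application(site, env, svc)
        for site in sites
        for env in environments
        for svc in services
    ]


apps = all_applications(
    ["ral", "cambridge"], ["staging", "production"], ["jupyterhub", "soda"]
)
print(json.dumps(apps[0], indent=2))
```

      Keeping every site and environment under one directory tree means a single merge request can show how a change propagates from staging to production.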

      Do not do this former stretch AC as part of this feature:

      • Hopefully set up SRCNet Gateway, CANFAR with Cavern

      If you think the team has the capacity to deploy optional services, propose this as a separate, Could Have feature.

       


      The Purple team (with help from other RAL teams) will be responsible for:

      • (Running central SKA IAM service)
      • Compute hardware enrolled into the STFC RAL OpenStack based Cloud
      • Storage hardware turned into a Ceph cluster (assuming we start with Manila+xrootd; what is Team DAAC given?)
      • FTS and Rucio RSE created on top of the above storage hardware
      • RAL cloud should authenticate via SKA IAM, with group membership within SKA IAM authorizing access to OpenStack projects
      • Providing Team DAAC with appropriate access to run the other SRCNet services, particularly those requiring access to the Rucio RSE.
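
      The rule that SKA IAM group membership authorizes access to OpenStack projects can be sketched as a simple lookup. This only illustrates the logic: the group names, project names and function name are hypothetical, and a real deployment would presumably express the mapping in the cloud's own identity configuration rather than in application code.

```python
# Hypothetical mapping from SKA IAM groups to OpenStack project names.
GROUP_TO_PROJECTS = {
    "srcnet/uksrc/daac": {"srcnet-services-staging", "srcnet-services-production"},
    "srcnet/uksrc/purple": {"srcnet-storage"},
}


def authorised_projects(token_groups, mapping=GROUP_TO_PROJECTS) -> set:
    """Return the OpenStack projects granted by a user's IAM groups.

    Groups that do not appear in the mapping grant no access.
    """
    projects = set()
    for group in token_groups:
        projects |= mapping.get(group, set())
    return projects


# A user in the (hypothetical) DAAC group plus an unrelated group.
print(sorted(authorised_projects(["srcnet/uksrc/daac", "some/other/group"])))
```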

      The DAAC team need to bring up all the other services at RAL, building on the above storage, including:


              People

                Jesus.Salgado Salgado, Jesus
                Robert.Perry Perry, Robert

                Feature Progress

                  Story Point Burn-up: (0%)

                  Feature Estimate: 6.5

                               Issues  Story Points
                  To Do        0       0.0
                  In Progress  0       0.0
                  Complete     0       0.0
                  Total        0       0.0
