  1. SAFe Program
  2. SP-4554

NLSRC - Deploy all v0.1 local compulsory services


Details

    • SRCnet

      Deployed, monitorable, documented and supportable services: will enable initial operations activities, including test campaigns; will (later) enable test users to interact with our systems; will provide the SOG with valuable operating experience; and will provide stakeholders with a tangible, assessable demonstration of SRCNet's progress so far.

      Deploying all local compulsory services is one requirement for SRCNet to "confirm" that an SRC will be a v0.1 node.

      A sufficient number of nodes must be "confirmed" for SRCNet to begin the v0.1 phase of work. This will be a landmark achievement for SRCNet.


      For this SRC, all v0.1 local compulsory services are:

      AC1: Deployed locally

      AC2: Local integration test(s) are completed to check the connection with global services - as informed by deployment documentation (to be populated)

      AC3: Monitorable via the centralised service monitoring dashboards and shown to be running successfully

      AC4: Documented - meaning that any SRC-specific deployment instructions / troubleshooting are captured and (where applicable) added to ska-src-docs-operator

      AC5: Supportable - meaning that an operator (a SOG member if possible) is able to access the service deployment and provide support in the next PI

      AC6: Deployed via a GitOps Tool (e.g. ArgoCD, FluxCD) OR repeatable deployment methods fully documented with deployment velocity expectations (e.g. new versions are deployed within 1 working day)

      v0.1 local compulsory services:

      • Rucio Storage Element (configured to use SKA IAM)
      • JupyterHub (specifically JupyterHub, not another service that provides Jupyter Notebooks)
      • at least one Visualisation Service out of: CARTA, VisIVO, Aladin
        note: CANFAR can provide this visualisation service
      • SODA Service
      • Gate-Keeper API (see the features under SP-4654)
      • cavern (see description)
      • an Orchestrator Service (e.g. Kubernetes)
      • perfSONAR
      • Local Service Monitoring stack (e.g. Prometheus)
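The criteria and service list above amount to a checklist of nine services against six acceptance criteria. A minimal sketch of that checklist, assuming a hypothetical `ServiceRecord` type and `outstanding_by_service` helper (neither is part of any SRCNet tooling), could look like this:

```python
from dataclasses import dataclass, field

# Acceptance criteria as listed in this feature (AC1-AC6).
ACCEPTANCE_CRITERIA = ("AC1", "AC2", "AC3", "AC4", "AC5", "AC6")

# Compulsory services from the list above. The visualisation entry is
# satisfied by any one of CARTA, VisIVO or Aladin.
COMPULSORY_SERVICES = (
    "Rucio Storage Element",
    "JupyterHub",
    "Visualisation Service",
    "SODA Service",
    "Gate-Keeper API",
    "cavern",
    "Orchestrator Service",
    "perfSONAR",
    "Local Service Monitoring stack",
)


@dataclass
class ServiceRecord:
    """Hypothetical per-service record of which criteria have been met."""
    name: str
    met: set = field(default_factory=set)

    def outstanding(self):
        # Preserve AC1..AC6 ordering in the result.
        return [ac for ac in ACCEPTANCE_CRITERIA if ac not in self.met]


def outstanding_by_service(records):
    """Map each compulsory service to the criteria it has not yet met.

    A service with no record at all is reported with every criterion
    outstanding; a service that meets all six is left out of the result.
    """
    by_name = {record.name: record for record in records}
    result = {}
    for name in COMPULSORY_SERVICES:
        record = by_name.get(name, ServiceRecord(name))
        missing = record.outstanding()
        if missing:
            result[name] = missing
    return result
```

An empty result from `outstanding_by_service` would correspond to every compulsory service meeting AC1 to AC6. The sketch does not model "N/A" entries, which an SRC would need to agree on separately.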
    • 6.5
    • 18
    • 0
    • Team_NLSRC
      Status per service against AC1 (Deployed), AC2 (Integrated), AC3 (Monitorable/Running), AC4 (Documented), AC5 (Supportable) and AC6 (GitOps (or ...)):

      • Rucio Storage Element
        AC1 Deployed: done
        AC2 Integrated: done
        AC3 Monitorable/Running: done (we expect a central service to monitor it globally, e.g. the data lake dashboard, and wonder whether there will be global alerting)
        AC4 Documented: TBD (the level of detail needs to be clarified and agreed upon)
        AC5 Supportable: done
        AC6 GitOps (or ...): N/A
        Notes: (blank)
      • JupyterHub
        AC1 Deployed: TBD (notebooks can be started via modules and/or containers)
        AC2 Integrated: TBD (need to discuss what this means)
        AC3 Monitorable/Running: TBD (likely not possible from the outside, as it is delivered as a module or container inside the Slurm platform)
        AC4 Documented: TBD (the level of detail needs to be clarified and agreed upon)
        AC5 Supportable: TBD (via EESSI and/or other external repos/parties)
        AC6 GitOps (or ...): N/A
        Notes: In the EoI we agreed to standard Jupyter notebooks (in line with the docs at that time) and not a Hub. Standard notebooks are better aligned with the platform and can be provided if agreed upon. How/where do we discuss this?
      • Visualisation Service
        AC1 Deployed: In progress
        AC2 Integrated: In progress
        AC3 Monitorable/Running: In progress
        AC4 Documented: In progress
        AC5 Supportable: In progress
        AC6 GitOps (or ...): (blank)
        Notes: We will likely provide Aladin (Desktop) via CVMFS in containerized form.
      • SODA Service
        AC1 Deployed: TBD (determining whether a WebDAV mount will allow dCache RSE hosted files to be visible as POSIX-like for SODA, and whether ACLs are properly preserved)
        AC2 Integrated: TBD
        AC3 Monitorable/Running: TBD
        AC4 Documented: TBD
        AC5 Supportable: TBD
        AC6 GitOps (or ...): (blank)
        Notes: Procedural (incl. ownership), functional and security descriptions are missing at the moment for this intended service. Also, in the EoI this was grouped in the options with the other visualisation services.
      • Gate-Keeper API
        AC1 Deployed: TBD
        AC2 Integrated: TBD
        AC3 Monitorable/Running: TBD
        AC4 Documented: TBD
        AC5 Supportable: TBD
        AC6 GitOps (or ...): (blank)
        Notes: This service appears not to be ready yet from a development perspective. Hence, we cannot deploy it at this time.
      • cavern
        AC1 Deployed: TBD (the Slurm platform comes with collaborative RBAC project spaces)
        AC2 Integrated: TBD
        AC3 Monitorable/Running: TBD
        AC4 Documented: TBD
        AC5 Supportable: TBD
        AC6 GitOps (or ...): (blank)
        Notes: It is unlikely that this can be deployed on the chosen platform for NL-SRC. Hence this requires discussion and potential alternatives. We will plan a meeting with P. Dowler to investigate.
      • Orchestrator Service
        AC1 Deployed: N/A (internally we use GitLab, Ansible and Terraform with our private cloud)
        AC2 Integrated: N/A
        AC3 Monitorable/Running: N/A
        AC4 Documented: N/A
        AC5 Supportable: N/A
        AC6 GitOps (or ...): (blank)
        Notes: Not applicable for NL-SRC
      • perfSONAR
        AC1 Deployed: done
        AC2 Integrated: done
        AC3 Monitorable/Running: done (but the central service and the technical validation criteria, if any, should be specified)
        AC4 Documented: TBD (the level of detail needs to be clarified and agreed upon)
        AC5 Supportable: done
        AC6 GitOps (or ...): (blank)
        Notes: (blank)
      • Local Service Monitoring
        AC1 Deployed: TBD (we use a combination of different monitoring tools internally, e.g. Zabbix, Prometheus; monitoring of the Slurm cluster nodes and jobs is available to authenticated users)
        AC2 Integrated: N/A?
        AC3 Monitorable/Running: TBD (monitor a local monitor?)
        AC4 Documented: N/A (parts can be documented, e.g. https://servicedesk.surf.nl/wiki/display/WIKI/NLSRC)
        AC5 Supportable: TBD
        AC6 GitOps (or ...): (blank)
        Notes: Likely it is done at NL-SRC, but the service needs a definition as well as a description and validation criteria.
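To get a quick overview of the status table above, the leading keyword of each cell can be tallied. The following is a rough sketch only: the `STATUS` mapping transcribes those keywords by hand (parenthetical remarks dropped, an empty string for a blank cell), and the `tally` and `open_items` helpers are illustrative names, not existing tooling.

```python
from collections import Counter

# Status keywords transcribed from the table above, one row per service,
# columns AC1..AC6. "" marks a cell that is blank in the table.
STATUS = {
    "Rucio Storage Element": ["done", "done", "done", "TBD", "done", "N/A"],
    "JupyterHub": ["TBD", "TBD", "TBD", "TBD", "TBD", "N/A"],
    "Visualisation Service": ["In progress"] * 5 + [""],
    "SODA Service": ["TBD"] * 5 + [""],
    "Gate-Keeper API": ["TBD"] * 5 + [""],
    "cavern": ["TBD"] * 5 + [""],
    "Orchestrator Service": ["N/A"] * 5 + [""],
    "perfSONAR": ["done", "done", "done", "TBD", "done", ""],
    "Local Service Monitoring": ["TBD", "N/A?", "TBD", "N/A", "TBD", ""],
}


def tally(status):
    """Count how often each status keyword occurs across all cells."""
    counts = Counter()
    for cells in status.values():
        counts.update(cell or "blank" for cell in cells)
    return counts


def open_items(status):
    """Return (service, criterion) pairs still marked TBD or In progress."""
    return [
        (service, f"AC{index}")
        for service, cells in status.items()
        for index, cell in enumerate(cells, start=1)
        if cell in ("TBD", "In progress")
    ]


if __name__ == "__main__":
    for keyword, count in tally(STATUS).most_common():
        print(f"{keyword}: {count}")
    print(f"open items: {len(open_items(STATUS))}")
```

The "N/A?" entry is kept as its own keyword rather than merged into "N/A", since the table marks it as an open question.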
    • PI24 - UNCOVERED

    • PI24-PB SRCNet0.1
    • SPO-3480

    Description

      For this feature, these deployments may be hosted on temporary infrastructure and/or infrastructure that does not meet the full operational requirements committed to via the v0.1 EoIs.

      Some of these deployments may already be completed. The size of this feature will vary per SRC. Do not include optional services as part of this feature.

      See the implementation plan doc and the Miro board for additional information and ongoing SRC plans.

      cavern has been conditionally added to the compulsory service list. Early PI24 development work / architectural decisions will answer if this is required to deliver the Data Management API and/or SODA Service.

      Deployment is intended to be a simple process. Additional development work required, per service or per SRC, may be split out into other features or contained here - whichever approach best enables teams to plan their efforts.

      For some local compulsory services, the technology stack is not fixed. If a technology is chosen that differs from proven implementations, it is the SRC's responsibility to plan and complete any additional work necessary to integrate its deployment with SRCNet's global services, including service monitoring.

      Multiple other features are required to enable this work and should be linked and sequenced accordingly, including but not limited to SP-4598, SP-4570, SP-4569 and SP-4517.

              People

                Jesus.Salgado Salgado, Jesus
                Robert.Perry Perry, Robert

                Feature Progress

                  Story Point Burn-up: (0%)

                  Feature Estimate: 6.5

                  Status        Issues  Story Points
                  To Do         0       0.0
                  In Progress   0       0.0
                  Complete      0       0.0
                  Total         0       0.0
