SAFe Program / SP-4658

CNSRC - Deploy all v0.1 local compulsory services

Details

    • SRCnet

      Deployed, monitorable, documented and supportable services: will enable initial operations activities, including test campaigns; will (later) enable test users to interact with our systems; will provide the SOG with valuable operating experience; and will provide stakeholders with a tangible, assessable demonstration of SRCNet's progress so far.

      Deploying all local compulsory services is one requirement for SRCNet to "confirm" that an SRC will be a v0.1 node.

      A sufficient number of nodes must be "confirmed" for SRCNet to begin the v0.1 phase of work. This will be a landmark achievement for SRCNet.


      For this SRC, all v0.1 local compulsory services are:

      AC1: Deployed locally

      AC2: Local integration test(s) are completed to check the connection with global services - as informed by deployment documentation (to be populated)

      AC3: Monitorable via the centralised service monitoring dashboards and shown to be running successfully

      AC4: Documented - meaning that any SRC-specific deployment instructions / troubleshooting are captured and (where applicable) added to ska-src-docs-operator

      AC5: Supportable - meaning that an operator (a SOG member if possible) is able to access the service deployment and provide support in the next PI

      AC6: Deployed via a GitOps Tool (e.g. ArgoCD, FluxCD) OR repeatable deployment methods fully documented with deployment velocity expectations (e.g. new versions are deployed within 1 working day)

      v0.1 local compulsory services:

      • Rucio Storage Element (configured to use SKA IAM)
      • JupyterHub (specifically JupyterHub, not another service that provides Jupyter Notebooks)
      • at least one Visualisation Service out of: CARTA, VisIVO, Aladin
        note: CANFAR can provide this visualisation service
      • SODA Service
      • Gate-Keeper API (see the features under SP-4654)
      • cavern (see description)
      • an Orchestrator Service (e.g. Kubernetes)
      • perfSONAR
      • Local Service Monitoring stack (e.g. Prometheus)
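
One way to track the criteria above per service is a small checklist structure. The following is a minimal sketch only: the `ServiceStatus` class and `node_ready` helper are hypothetical names introduced for illustration, and the status values in the usage note are placeholders, not the actual state of any SRC.

```python
from dataclasses import dataclass, field

# Acceptance criteria labels as listed in this feature.
CRITERIA = {
    "AC1": "Deployed",
    "AC2": "Integrated",
    "AC3": "Monitorable/Running",
    "AC4": "Documented",
    "AC5": "Supportable",
    "AC6": "GitOps (or documented repeatable deployment)",
}


@dataclass
class ServiceStatus:
    """Hypothetical record of which criteria one service has met."""
    name: str
    met: set = field(default_factory=set)

    def outstanding(self):
        # Criteria not yet met, in AC1..AC6 order.
        return [ac for ac in CRITERIA if ac not in self.met]

    def is_complete(self):
        return not self.outstanding()


def node_ready(services):
    # True only when every listed service meets all six criteria.
    return all(s.is_complete() for s in services)
```

For example, `ServiceStatus("ExampleService", {"AC1", "AC3"}).outstanding()` lists the four criteria still open for that placeholder service, and `node_ready([...])` stays false until every compulsory service is complete.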
    • 3
    • 3
    • 0
    • Team_GOLD
      Status per service. Table columns: Service; AC1 Deployed; AC2 Integrated; AC3 Monitorable/Running; AC4 Documented; AC5 Supportable; AC6 GitOps (or ...)

      Rucio Storage Element
        23/09 - Access issue; a whitelist of IPs could be helpful. Can be accessed from within China. Otherwise the RSE is available.
        08/10 - Problem with CNSRC: log files sent to Rob Berwick; error message in the dashboard, to be followed up in a couple of days.
        Discussion with Ian (Purple Team) in Beijing on 9th October regarding the certification process and IHEP, CERN - AAI and Rucio.
        Yes, cnSRC-XRD [DONE].
        cnSRC-STORM [in the RSE list]: data transfer is OK, but some checksums had issues; to be investigated further with Pablo.
        22/10 - Blocked, and in conversation with James Walder. ETA: to be resolved this week.
        29/10 - Data transfer is happening. The script provided by James W will be tried and the results reported. The issue still exists.

      JupyterHub
        23/09 - Yes. Completed.
        29/10 - To be redeployed using ArgoCD this week.

      Visualisation Service
        23/09 - Problem running the software; to be taken up in Iteration 4. Giuseppe/Fabio - please talk to the Orange Team.
        08/10 - No update yet; Gold Team will be reaching out in Sprint 3.
        For services documentation, please check with Manu/Coral Team.
        08/10 - Discussion to be had in Sprint 3.
        22/10 - To contact via Slack for possible solutions. Please talk to the Tangerine/Orange Team - potential blocker.
        29/10 - The CARTA service has been running on the CANFAR platform but stops at the opening page. Error reported to the Slack channel; input welcome from Manu or anyone in the channel.

      SODA Service
        23/09 - To be done in Iteration 4.
        22/10 - TBD
        29/10 - To be deployed this week.

      Data Management API
        08/10 - Dependency on completion of the API. TBD (SOG Update)

      cavern
        (no entries)

      Orchestrator Service
        23/09 - K8s installed and working. Completed.

      perfSONAR
        08/10 - Please refer to the attached architecture diagram.
        15/10 - To be deployed on a separate machine and connected to the compute machines that host the K8s instance. Integrate statistics with the Grafana dashboard. Contact with Purple Team.
        29/10 - In progress: Mathias Myer has been contacted about which port needs to be exposed for monitoring.

      Local Service Monitoring
        23/09 - VictoriaMetrics has been installed; to be checked whether it can be used instead of Prometheus.
        08/10 - Prometheus also installed. Completed.
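
For the checksum issue noted against cnSRC-STORM, a local check of a transferred file can help narrow down where the mismatch arises. The sketch below is an assumption-laden illustration, not the script provided by James W: it assumes the catalogue records Adler-32 as an 8-character lowercase hex string alongside an MD5 digest, and the function name `file_checksums` is hypothetical.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1024 * 1024):
    """Compute Adler-32 and MD5 of a file, reading it in chunks."""
    adler = 1  # Adler-32 starting value defined by zlib
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            adler = zlib.adler32(chunk, adler)
            md5.update(chunk)
    return {
        # Zero-padded 8-digit hex (assumed catalogue format).
        "adler32": f"{adler & 0xFFFFFFFF:08x}",
        "md5": md5.hexdigest(),
    }
```

Comparing the returned values with those recorded for the replica would show whether the file content or the recorded checksum is at fault; a dropped leading zero in the Adler-32 string is one possible cause worth ruling out.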
    • PI24 - UNCOVERED

    • PI24-PB SRCNet0.1
    • SPO-3480

    Description

      For this feature, these deployments may be hosted on temporary infrastructure and/or infrastructure that does not meet the full operational requirements committed to via the v0.1 EoIs.

      Some of these deployments may already be completed. The size of this feature will vary per SRC. Do not include optional services as part of this feature.

      See the implementation plan doc and Miro board for additional information and ongoing SRC plans.

      cavern has been conditionally added to the compulsory service list. Early PI24 development work / architectural decisions will determine whether it is required to deliver the Data Management API and/or SODA Service.

      Deployment is intended to be a simple process. Additional development work required, per service or per SRC, may be split out into other features or contained here - whichever approach best enables teams to plan their efforts.

      For some local compulsory services, the technology stack is not fixed. If a technology is chosen that differs from proven implementations, it is the SRC's responsibility to plan and complete any additional work necessary to integrate their deployment with SRCNet's global services, including service monitoring.

      Multiple other features are required to enable this work and should be linked and sequenced accordingly, including but not limited to SP-4598, SP-4570, SP-4569, SP-4517.

      Attachments

        1. cnSRC-SRCNet-Operation-Group-SOG-ppt_2024-09-18.pdf
          1.43 MB
          Mitra, Debashis
        2. image-2024-10-08-15-47-07-453.png
          245 kB
          Mitra, Debashis
        3. image-2024-10-15-11-22-36-001.png
          80 kB
          Mitra, Debashis

              People

                Jesus.Salgado Salgado, Jesus
                Robert.Perry Perry, Robert

                Feature Progress

                  Story Point Burn-up: (0%)

                  Feature Estimate: 3.0

                  Status        Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      0        0.0
                  Total         0        0.0
