  1. SAFe Program
  2. SP-4326

Service monitoring v0.1 infrastructure


Details

    • SRCnet

      To enable a centralised dashboard view of the local and global services running at the v0.1 SRCNodes, we need a data source containing information from these services. This monitoring 'plumbing' infrastructure is necessary to:

      • capture SRCNode services running at Sites as they start to come online in preparation for v0.1
      • provide the ability to monitor service health centrally
      • provide historical data for services running at Sites, to capture the reliability of services over time

      A data source is configured on a Grafana instance and is collecting persistent information, and the 'explore' functionality has been used to view the information from 2 sets of services running at 2 different sites.

    • 1.5
    • 2
    • 0
    • Team_CHOCOLATE, Team_CORAL
    • Sprint 4
    • Dashboard: https://grafana.dev.skach.org/goto/HZ0E1mqIR?orgId=1
      Deploying the monitoring stack, scraping SPSRC and CHSRC, and deploying persistent metrics storage:
      https://confluence.skatelescope.org/display/SRCSC/%5BCHOC-16%5D+Setup+prometheus+exporter+for+CHSRC+services
      https://confluence.skatelescope.org/display/SRCSC/CHOC-40%3A+Migrate+Prometheus%2C+DB%2C+Grafana+setup+to+ArgoCD
      https://confluence.skatelescope.org/display/SRCSC/%5BCHOC-33%2C+CHOC-53%5D+Deploy+long-time+and+persistent+DB+for+Prometheus
    • 24.3
    • Stories Completed, Outcomes Reviewed
    • PI24 - UNCOVERED

    • SRC23-PB SRCNet0.1 Team_Chocolate multi-team operations-and-infrastructure tests-compilation

    Description

      To enable a centralised dashboard view on the local and global services running at the v0.1 SRCNodes, we need a data source containing information from these services.

      This will entail information being collected in a central data store and a dashboard (Grafana) to build a meaningful view of SRCNet service monitoring metrics.
      There are several ways of doing this:

      • In its simplest form, services could push events into a central DB that serves as a Prometheus data source used to build dashboard views.
      • Alternatively, sites could run a metrics/events exporter service that is probed or scraped by the central monitoring service.

      In either approach (push vs pull, ActiveMQ, Kafka, etc.), an initial format for the corresponding events needs to be established.

              People

                j.collinson Collinson, James
                r.joshi Joshi, Rohini

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 1.5

                  Status         Issues   Story Points
                  To Do               0            0.0
                  In Progress         0            0.0
                  Complete           14           25.0
                  Total              14           25.0
