  1. SAFe Program
  2. SP-2693

Extract/refactor monitoring and logging for Infra layer


Details

    • Services

      To support the efficient use of AA0.5 ITF Platform Environments, a solution is required that provides observability and notification of platform state and events.

      • This requires decoupling the infrastructure monitoring/logging from the services and application layers
      • The introduction of targeted alerts, e.g. network/compute/storage/accessibility (including core services, e.g. ceph/elastic/k8s)
      • Log tagging, views, and dashboards that provide platform insight
      • Visualisations that provide infra and core services insight
      • Monitoring and logging deployment and node/service integration is abstracted away (in Ansible) so that it can be independently deployed
      • Probes exist for basic compute, network, and storage for each node - can be used for VMs and bare metal
      • Probes exist for core service availability - ceph, elastic, core databases (Prometheus?), k8s (platform availability and loading)
      • Alerts exist that will show core service accessibility/health issues
      • Alerts exist that expose core infra/service resource issues such as excess load and disk space
      • Log tagging, views, and dashboards that provide platform insight
      • Visualisations that provide infra and core services insight - OS level metrics
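
The probe and alert criteria above could be sketched roughly as follows. This is a standard-library illustration only, not the team's implementation: the names `ProbeResult`, `disk_probe`, `load_probe`, and `tcp_probe`, and all threshold values, are assumptions. A real deployment would more likely rely on existing exporters and alert rules than on hand-written checks.

```python
import os
import shutil
import socket
from dataclasses import dataclass


@dataclass
class ProbeResult:
    name: str
    value: float
    threshold: float  # assumed alert threshold, not taken from the ticket

    @property
    def alerting(self) -> bool:
        # An alert fires when the measured value exceeds its threshold.
        return self.value > self.threshold


def disk_probe(path: str = "/", threshold: float = 0.9) -> ProbeResult:
    """Fraction of space used on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total if usage.total else 0.0
    return ProbeResult(f"disk_used:{path}", used, threshold)


def load_probe(threshold_per_cpu: float = 1.5) -> ProbeResult:
    """1-minute load average divided by CPU count (Unix only)."""
    load1, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    return ProbeResult("load1_per_cpu", load1 / cpus, threshold_per_cpu)


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> ProbeResult:
    """Reachability of a service endpoint: 0.0 if reachable, 1.0 if not."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            value = 0.0
    except OSError:
        value = 1.0
    return ProbeResult(f"tcp_unreachable:{host}:{port}", value, 0.5)
```

Because the probes touch only the operating system and a TCP endpoint, the same checks apply to VMs and bare metal alike. For example, `[p for p in (disk_probe("/"), load_probe()) if p.alerting]` yields the probes that currently breach their thresholds.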

       

       

       

       

    • 3
    • 3
    • 0
    • Team_BANG
    • Sprint 5
    • 17.6
    • Stories Completed, Integrated, Outcomes Reviewed, Satisfies Acceptance Criteria, Accepted by FO
    • PI24 - UNCOVERED


    Description

      Refactor existing platform monitoring/logging/alerting to provide infra observability: service performance, availability, and capacity

              People

                m.deegan Deegan, Miles
                P.Harding Harding, Piers

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 3.0

                                Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      5        10.5
                  Total         5        10.5
