Details

Type: Feature
Priority: Should have
Fix Version/s: PI23
Component/s: SRCnet Science Enabling
Labels:

ARTs:

SRCnet
Benefit hypothesis:

The SRC workloads GitLab repository hosts workloads that are representative of what users will run at SRC sites. The STARS script collects metrics for the tasks (currently only processing duration), and these are recorded on Confluence. This feature would use existing monitoring infrastructure (likely Elastic + Grafana) to store the metric data output by STARS. This is the first step towards viewing the performance of different SRC machines in real time, a key component of the eventual compute tests described in the SRCNet 0.1 Implementation Plan.

Acceptance criteria:

AC1: At least one task in the src-workloads repo has a means of verifying that the task completed successfully each time it is run (via checksum or hardcoded success criteria).

AC2: A script is added to the src-workloads repo, or STARS is modified, such that when the task(s) in AC1 are run, the performance data is pushed (along with easily obtainable machine data) to a user-specified Elastic (or other NoSQL document store) DB.

AC3: A new dashboard is added to the existing Grafana instance to display results from task runs at different sites.

Feature Points:
2.5
Initial Size:
2
WSJF:
0
Epic Link:
SKA Regional Centre Workflow Pack
Agile Teams:

Team_MAGENTA
Due Sprint:
Sprint 5
Story Point Burn-up:
Overdue:
Outcomes:

AC1: Source finding task has an integration test script: https://gitlab.com/ska-telescope/src/src-workloads/-/blob/master/tasks/source-finding-pybdsf/scripts/test/integration-test-sourcefinding.sh

AC2/3: Workloads dashboard: https://monit.srcdev.skao.int/grafana/d/adqjpmxh900lcc/workload-tasks?orgId=1

Demo: https://drive.google.com/file/d/1Xp7zgH2f33wLxNWR1OP3DgRb2__txh2K/view?usp=drive_link
See repo for documentation: https://gitlab.com/ska-telescope/src/src-workloads/-/tree/master/bench

Resolved PI.Sprint:
24.3

Feature Checklist:

Stories Completed, Outcomes Reviewed, Satisfies Acceptance Criteria

Requirement Status:

PI24 - UNCOVERED
Labels_MIRO:
PI23 SRC23-PB SRCNet0.x example-workflows-and-benchmarks tests-compilation

Description

The deployment of new monitoring infrastructure, which was part of the previous version of this feature, has been removed because existing DB/dashboard instances can be used.

This feature now focuses on the development needed within the src-workloads repository. Before performance metrics for task runs can be captured automatically, it must be possible to determine automatically whether a task ran successfully or failed. For now this can use, e.g., hardcoded checksum validation, or even just a check that the expected output files were created.
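A minimal sketch of such a success check, assuming the task writes its products to an output directory and that a reference SHA-256 digest can be hardcoded per file. The function names, file names, and digests here are hypothetical and are not taken from the src-workloads repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_outputs(output_dir: Path, expected: dict) -> dict:
    """Check each expected output file and return {file name: passed}.

    `expected` maps a file name to its reference digest, or to None when
    only existence of a non-empty file should be checked (hypothetical
    convention for this sketch).
    """
    results = {}
    for name, reference in expected.items():
        path = output_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            results[name] = False  # missing or empty output counts as failure
        elif reference is None:
            results[name] = True  # existence-only check
        else:
            results[name] = sha256_of(path) == reference
    return results
```

A task run could then be treated as successful only if `all(verify_outputs(...).values())` is true. Checksums only work for tasks with bit-for-bit reproducible output; for others, the existence-only mode (reference `None`) is the fallback mentioned above.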

Once this has been done, the STARS script should be modified so that it can push the performance score, together with other easily obtainable (or user-set) metadata, to a running Elastic instance (or other document-type NoSQL DB). During local development this can use an ephemeral Elastic container.
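As an illustration of the push step, the sketch below builds a metrics document and sends it with the Elasticsearch document index endpoint (`POST /<index>/_doc`) using only the Python standard library. It is not how STARS is implemented: the field names, the index name `workload-metrics`, the `site` metadata key, and the unauthenticated `http://localhost:9200` endpoint are all assumptions made for the example.

```python
import json
import os
import platform
import urllib.request
from datetime import datetime, timezone


def build_document(task, duration_s, succeeded, extra=None):
    """Assemble one metrics document (hypothetical schema for this sketch)."""
    doc = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "task": task,
        "duration_s": float(duration_s),
        "succeeded": bool(succeeded),
        # Easily obtainable machine data from the standard library.
        "machine": {
            "hostname": platform.node(),
            "architecture": platform.machine(),
            "os": platform.system(),
            "cpu_count": os.cpu_count(),
        },
    }
    doc.update(extra or {})  # user-set metadata, e.g. the site name
    return doc


def build_index_request(base_url, index, doc):
    """Build (but do not send) the POST request that indexes `doc`."""
    url = f"{base_url.rstrip('/')}/{index}/_doc"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push(base_url, index, doc, timeout=10.0):
    """Send the document and return the HTTP status code."""
    request = build_index_request(base_url, index, doc)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Against an ephemeral local container, a call might look like `push("http://localhost:9200", "workload-metrics", build_document("source-finding-pybdsf", 42.0, True, {"site": "example-site"}))`. A secured instance would additionally need credentials and TLS settings, which are left out here.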

Finally, a simple dashboard view of this data can be added to an existing Grafana instance.

The Feature Point estimate is sized to allow time to learn the relevant technologies.

Attachments

Issue Links

relates to

SP-4326 Service monitoring v0.1 infrastructure

Done

SP-4394 Retain src-workloads execution statistics in a DB that enables performance monitoring

Discarded

People

Assignee: Salgado, Jesus

Reporter: Clarke, Alex

Votes: 0

Watchers: 1

Feature Progress

Story Point Burn-up: (100.00%)

Feature Estimate: 2.5

	Issues	Story Points
To Do	0	0.0
In Progress	0	0.0
Complete	10	18.0
Total	10	18.0

Dates

Created: 10/May/24 4:00 PM

Updated: 28/Oct/24 11:44 PM

Resolved: 15/Oct/24 6:15 AM

Due Sprint Date: 20/Aug/24

Add benchmark metrics into a dashboard