Details

Type: Spike
Priority: Not Assigned
Fix Version/s: None
Component/s: None
Labels:
- CSP.LMC
- goal_O3

ARTs:

Obs Mgt & Controls
Benefit hypothesis:

Detect the limits of performance and, if possible, identify whether the source of the performance issues is the current deployment architecture. Identifying issues at an early stage of development means that, if they cannot be fixed within the current design, the SKA project can still pivot to more suitable design choices.

Acceptance criteria:

The primary acceptance criterion is to "get a baseline metric of the extent to which we can scale on the integration k8s cluster, using trivial devices". Although it is not required to use actual CSP code, the setup should closely replicate the way CSP.LMC currently proposes to deploy (in terms of the expected hierarchy of devices, containers, Pods, etc.).

Synchronous commands replaced with asynchronous commands where required (see feature description).

Source of the performance issues identified. This is a nice-to-have; otherwise, document the scale up to which no performance degradation was observed.

Plan for mitigation/resolution proposed if the analysis clearly reveals what the issues are; otherwise, next steps for further work identified.

Feature Points:
2
WSJF:
7.5
Epic Link:
MVP Consolidation and Extension
Due Sprint:
Sprint 5
Story Point Burn-up:
Overdue:
Resolved PI.Sprint:
16.3

Description

We propose a short Spike, timeboxed to one Sprint, to look at the measurement possibilities and the as-is performance. Further work could be covered later (a cloned Feature, ~~SP-712~~, was created to capture all the initial descriptions).

Serious performance issues have been detected when a realistic number of CSP sub-arrays and other TANGO Servers and Devices are instantiated. The cause has not been identified; it may be inherent to the pyTango implementation, or caused by sub-optimal container configuration, blocking due to extensive use of forwarded attributes, or the use of synchronous (as opposed to asynchronous) commands.

In this Spike, the team can baseline the current performance limits using a deployment architecture that replicates, as closely as possible, the number and hierarchy of Pods, Containers and Tango Devices expected in CSP.LMC, and document the results. If possible, the deployment should be extended and scaled until performance degrades seriously or devices crash. This should be run on the Engage cluster.
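A baseline sweep of the kind described above could be structured as follows. This is a minimal sketch under stated assumptions: `simulated_device_call` is a hypothetical stand-in for one command on a trivial device (in a real run it would be replaced by a call through a PyTango `DeviceProxy`), and the doubling schedule, `max_n` and the latency budget `budget_s` are illustrative choices, not values taken from this issue.

```python
import time
from typing import Callable, List, Tuple


def simulated_device_call(latency_s: float = 0.001) -> None:
    """Hypothetical stand-in for one command on a trivial device.

    Assumption: in a real run this would be a call on a PyTango DeviceProxy.
    """
    time.sleep(latency_s)


def measure_fanout(n_devices: int, call: Callable[[], None]) -> float:
    """Issue one command to each of n_devices in turn; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n_devices):
        call()
    return time.perf_counter() - start


def scaling_sweep(
    call: Callable[[], None],
    start_n: int = 1,
    max_n: int = 64,
    budget_s: float = 0.05,
) -> List[Tuple[int, float]]:
    """Double the device count until max_n is reached or the budget is exceeded.

    Returns (device count, elapsed seconds) pairs; the last pair is the first
    measurement over budget_s if degradation was hit before max_n.
    """
    results: List[Tuple[int, float]] = []
    n = start_n
    while n <= max_n:
        elapsed = measure_fanout(n, call)
        results.append((n, elapsed))
        if elapsed > budget_s:
            break  # performance has degraded past the (illustrative) budget
        n *= 2
    return results


if __name__ == "__main__":
    for n, elapsed in scaling_sweep(simulated_device_call):
        print(f"{n:4d} devices: {elapsed * 1000:.1f} ms")
```

The recorded `(device count, elapsed time)` pairs form the baseline curve, and the first point over budget marks roughly where performance starts to degrade.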

Some of the work described below may not be done in this Spike but could be part of the follow-up Feature:

These are some of the options that will be explored:

- Where appropriate, replace synchronous with asynchronous commands (in particular when command completion depends on a large number of other devices or involves parsing large JSON objects).
- Experiment using CSP.LMC and Mid.CBF images that instantiate a large number of TANGO Servers/Devices.
- Experiment with different container configurations.
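The motivation for replacing synchronous commands can be illustrated with a small simulation, assuming a parent device that must wait for many subordinate devices to complete a command. The sketch uses plain `asyncio` with a hypothetical `simulated_command` instead of real Tango devices; pyTango's own asynchronous mechanisms (for example `command_inout_asynch` or its asyncio green mode) are not shown. Waiting on each device in turn costs roughly `n × latency`, while issuing the commands concurrently costs roughly one latency.

```python
import asyncio
import time


async def simulated_command(latency_s: float) -> str:
    """Hypothetical stand-in for a subordinate device command taking latency_s."""
    await asyncio.sleep(latency_s)
    return "OK"


async def sequential_fanout(n: int, latency_s: float) -> float:
    """Wait for each subordinate device in turn, mimicking blocking synchronous calls."""
    start = time.perf_counter()
    for _ in range(n):
        await simulated_command(latency_s)
    return time.perf_counter() - start


async def concurrent_fanout(n: int, latency_s: float) -> float:
    """Issue all commands at once and wait for them together."""
    start = time.perf_counter()
    await asyncio.gather(*(simulated_command(latency_s) for _ in range(n)))
    return time.perf_counter() - start


async def main() -> None:
    n, latency = 20, 0.05  # illustrative values, not measurements
    seq = await sequential_fanout(n, latency)
    conc = await concurrent_fanout(n, latency)
    print(f"sequential: {seq:.2f} s, concurrent: {conc:.2f} s")


if __name__ == "__main__":
    asyncio.run(main())
```

With 20 simulated devices at 50 ms each, the sequential variant takes about one second and the concurrent one about 50 ms; real devices add network, serialization and server-side costs that this simulation ignores.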

Issue Links

is cloned by

SP-712 Measure CSP.LMC performance, investigate causes of performance degradation: Further work after SPIKE

Discarded

SP-932 Measure CSP.LMC performance, investigate causes of performance degradation CLONE of SP-636

Discarded

People

Assignee: Mohile, Vivek

Reporter: Vrcic, Sonja

Votes: 0

Watchers: 3

Feature Progress

Story Point Burn-up: (0%)

Feature Estimate: 2.0

	Issues	Story Points
To Do	0	0.0
In Progress	0	0.0
Complete	0	0.0
Total	0	0.0

Dates

Created: 13/Nov/19 1:36 AM

Updated: 18/Oct/22 10:16 AM

Resolved: 18/Oct/22 10:16 AM

Measure performance for CSP.LMC-like deployment architecture