SAFe Program / SP-1675

Define processing function library interface design

Details

    • Spike
    • Must have
    • PI11
    • COM SDP SW
    • None
    • Data Processing
    • A processing function library, with well-defined interfaces and abstraction, is needed to provide the foundation and building blocks for future SDP workflow development. Addressing the remaining design questions early is important to be able to proceed with SDP workflow development during construction.
    • Conduct a spike meeting with the involved teams, leading to the following agreed-on artefacts:
      • Processing component library repository created
      • First kernel interface defined (gridding?) - add documentation of the interface, i.e. data models and abstract calling convention
    • 3
    • 3
    • 20
    • Team_HIPPO, Team_NZAPP, Team_SCHAAP, Team_YANDA
    • Sprint 3
    • Two broad meetings were held where the scope of the processing functions, together with their interfaces, was discussed. Without clear consensus on the scope of the processing functions, we have proposed a unified interface that would allow a scheduler to execute processing functions without explicitly knowing their attributes, which can be defined elsewhere, independently of the scheduler - for example, in a DAG together with a list of functions to execute (see the sketch at the end of this Details section). The code is at this GitHub repository: https://github.com/KAdamek/task_runner and its description at https://confluence.skatelescope.org/display/SE/HIP-27+-+Proposal+of+the+Interface+for+Processing+Functions
    • 11.6
    • Stories Completed, Solution Intent Updated, Outcomes Reviewed, Satisfies Acceptance Criteria, Accepted by FO
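
    A minimal Python sketch of the unified interface proposed in the implementation note above: a scheduler executes processing functions from a DAG without inspecting their attributes. The names (ProcessingFunction, run_dag) and layout are illustrative assumptions, not taken from the actual task_runner code:

        from dataclasses import dataclass, field
        from typing import Any, Callable, Dict, List

        @dataclass
        class ProcessingFunction:
            # Uniform wrapper: the scheduler only sees a name, the names of
            # upstream functions, and a callable. All function-specific
            # attributes live in `attributes`, defined independently of the
            # scheduler (hypothetical layout, for illustration only).
            name: str
            inputs: List[str]
            call: Callable[..., Any]
            attributes: Dict[str, Any] = field(default_factory=dict)

        def run_dag(functions: List[ProcessingFunction]) -> Dict[str, Any]:
            # Execute in dependency order; the scheduler never looks inside
            # `attributes`, only at the declared inputs. (No cycle detection -
            # this is a sketch, not production code.)
            results: Dict[str, Any] = {}
            pending = list(functions)
            while pending:
                for fn in list(pending):
                    if all(dep in results for dep in fn.inputs):
                        results[fn.name] = fn.call(*(results[d] for d in fn.inputs))
                        pending.remove(fn)
            return results

        # Example: a two-step pipeline the scheduler knows nothing about.
        dag = [
            ProcessingFunction("load", [], lambda: [1.0, 2.0, 3.0]),
            ProcessingFunction("scale", ["load"], lambda xs: [2 * x for x in xs]),
        ]
        print(run_dag(dag)["scale"])   # [2.0, 4.0, 6.0]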

    Description

      Spike: review, discuss, and progress the detailed processing function library interface design.

      • Discuss, define and document ( ! ) a number of processing function interfaces (see the interface sketch after this list). At minimum:
        • "High level", like visibility stream currently implemented by receive (containing by design all meta-data to interpret data)
        • "Low level", like gridding (focusing on a very particular problem)
        • Especially consider modifiability mechanisms:
          • How do we make sure that we can adjust interfaces if we find them to hinder progress?
          • How do we handle having multiple implementations of the same interface?
      • Define API(s) we would use to interact with them ("wrappers") - see the wrapper sketch after this list
        • We likely want at minimum an easy-to-use way to call these from Python; there should be a Python library you can install to gain access to the processing functions.
        • However, we might also want to consider providing more general-purpose interfaces to allow execution engines to bind against it (i.e. something for tools like EAGLE to work from).
        • Note that the Python library might just act as a thin interface layer to external tools. E.g. for visibility streaming it might just wrap the ska-sdp-dal call with some schema checks.
      • Data model considerations (see the data model sketch after this list)
        • Are we still okay with Arrow as the base data models?
        • Do we need a lower-level representation for CUDA/C?
        • Specifically for the Python implementation, wrapping/unwrapping into more useful high-level data wrappers (numpy/awkward) is likely appropriate. Should we aim for Xarray?
        • Consider interaction with accelerators - e.g. would we try to treat on-accelerator memory objects the same, or have separate types/kernels? Responsibility for managing this would normally be on execution engines, but maybe we would also have wrappers do this automatically? (See the accelerator sketch after this list.)
      • Consideration of design constraints from current processing interfaces compared against SDP architecture.
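
      For the interface-definition bullet above, a sketch of how a "low level" kernel interface (gridding, as named in the acceptance criteria) could support multiple implementations behind one abstract calling convention. The class and registry names are assumptions for illustration, not an agreed design:

          from abc import ABC, abstractmethod
          from typing import Dict, Type

          import numpy as np

          class Gridder(ABC):
              # Abstract "low level" kernel interface: one narrowly defined
              # operation. An explicit version makes later interface changes
              # visible to callers (one of the modifiability questions).
              INTERFACE_VERSION = 1

              @abstractmethod
              def grid(self, uvw: np.ndarray, vis: np.ndarray,
                       image_size: int) -> np.ndarray:
                  """Grid visibilities onto an image_size x image_size grid."""

          # A registry lets multiple implementations of the same interface
          # coexist and be selected by name.
          GRIDDERS: Dict[str, Type[Gridder]] = {}

          def register_gridder(name: str):
              def wrap(cls):
                  GRIDDERS[name] = cls
                  return cls
              return wrap

          @register_gridder("reference")
          class ReferenceGridder(Gridder):
              def grid(self, uvw, vis, image_size):
                  # Nearest-neighbour gridding, purely for illustration.
                  out = np.zeros((image_size, image_size), dtype=complex)
                  u = np.clip(uvw[:, 0].astype(int), 0, image_size - 1)
                  v = np.clip(uvw[:, 1].astype(int), 0, image_size - 1)
                  np.add.at(out, (v, u), vis)
                  return out

          gridder = GRIDDERS["reference"]()   # select an implementation by name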
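
      For the wrapper bullet, a sketch of the "thin interface layer" idea: a Python entry point that validates a declared data model and then hands off to external transport. The ska-sdp-dal call is stubbed out here, and the schema below is invented for illustration:

          import numpy as np

          # Hypothetical data model for a visibility payload; the real schema
          # would come from the interface documentation.
          VIS_SCHEMA = {"uvw": (np.float64, 2), "vis": (np.complex128, 1)}

          def check_schema(payload: dict) -> None:
              # Reject payloads that do not match the declared data model.
              for key, (dtype, ndim) in VIS_SCHEMA.items():
                  arr = payload.get(key)
                  if arr is None or arr.dtype != dtype or arr.ndim != ndim:
                      raise TypeError(f"field {key!r} must be a {ndim}-d {dtype} array")

          def stream_visibilities(payload: dict) -> None:
              # Thin wrapper: schema checks live here; in the real library this
              # would wrap the ska-sdp-dal call rather than printing.
              check_schema(payload)
              print(f"would stream {payload['vis'].shape[0]} visibilities")

          stream_visibilities({
              "uvw": np.zeros((10, 3)),
              "vis": np.ones(10, dtype=np.complex128),
          })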
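
      For the data model bullet, a sketch of keeping Arrow as the base data model while unwrapping into a high-level xarray wrapper on the Python side (assumes pyarrow and xarray are installed; the column names are invented):

          import numpy as np
          import pyarrow as pa
          import xarray as xr

          # Arrow as the language-neutral base data model ...
          table = pa.table({
              "time": np.arange(4, dtype=np.float64),
              "amp": np.ones(4, dtype=np.float64),
          })

          # ... unwrapped into an xarray Dataset for Python users.
          ds = xr.Dataset(
              {"amp": ("time", table.column("amp").to_numpy())},
              coords={"time": table.column("time").to_numpy()},
          )
          print(ds["amp"].mean().item())   # 1.0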
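
      For the accelerator bullet, one of the two options ("separate types") sketched out: device memory gets a distinct wrapper type, and transfers happen only at the interface boundary, so Python-level wrappers can insert them automatically while execution engines that manage placement themselves can skip the copies. DeviceArray here is a stand-in, not a real CUDA binding:

          import numpy as np

          class DeviceArray:
              # Stand-in for an on-accelerator buffer (e.g. a CUDA allocation).
              # A distinct type makes host/device transfers explicit.
              def __init__(self, host: np.ndarray):
                  self._data = np.array(host)   # pretend this is device memory

              def to_host(self) -> np.ndarray:
                  return np.array(self._data)

          def ensure_host(x) -> np.ndarray:
              # Wrapper-level auto-transfer: accept either memory type.
              return x.to_host() if isinstance(x, DeviceArray) else np.asarray(x)

          def double(x) -> np.ndarray:
              # A host-side kernel; the wrapper normalises the input first.
              return 2 * ensure_host(x)

          print(double(DeviceArray(np.arange(3))))   # [0 2 4]
          print(double(np.arange(3)))                # [0 2 4]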

      People

        p.wortmann Wortmann, Peter
        f.graser Graser, Ferdl
        Votes: 0
        Watchers: 5

      Feature Progress

        Story Point Burn-up: 100.00%
        Feature Estimate: 3.0

                      Issues   Story Points
        To Do              0            0.0
        In Progress        0            0.0
        Complete           8            1.0
        Total              8            1.0
