SAFe Program / SP-1675

Define processing function library interface design

Details

    • Spike
    • Must have
    • PI11
    • COM SDP SW
    • None
    • Data Processing
    • A processing function library, with well-defined interfaces and abstraction, is needed to provide the foundation and building blocks for future SDP workflow development. Addressing the remaining design questions early is important to be able to proceed with SDP workflow development during construction.
    • Conduct a spike meeting with the involved teams, leading to the following agreed-on artefacts:
      • Processing component library repository created
      • First kernel interface defined (gridding?) - add documentation of the interface, i.e. data models and abstract calling convention
    • 3
    • 3
    • 20
    • Team_HIPPO, Team_NZAPP, Team_SCHAAP, Team_YANDA
    • Sprint 3
    • Two broad meetings were held where the scope of the processing functions, together with their interfaces, was discussed. Without clear consensus on the scope of the processing functions, we have proposed a unified interface that would allow a scheduler to execute processing functions without explicitly knowing their attributes, which can be defined elsewhere, independently of the scheduler - for example, in a DAG together with a list of functions to execute (see the sketch at the end of this Details section). The code is at this GitHub repository: https://github.com/KAdamek/task_runner and its description at https://confluence.skatelescope.org/display/SE/HIP-27+-+Proposal+of+the+Interface+for+Processing+Functions
    • 11.6
    • Stories Completed, Solution Intent Updated, Outcomes Reviewed, Satisfies Acceptance Criteria, Accepted by FO
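
    A minimal Python sketch of the unified interface proposed in the implementation note above: a scheduler executes processing functions from a DAG without inspecting their attributes. The names (ProcessingFunction, run_dag) and layout are illustrative assumptions, not taken from the actual task_runner code:

        from dataclasses import dataclass, field
        from typing import Any, Callable, Dict, List

        @dataclass
        class ProcessingFunction:
            # Uniform wrapper: the scheduler only sees a name, the names of
            # upstream functions, and a callable. All function-specific
            # attributes live in `attributes`, defined independently of the
            # scheduler (hypothetical layout, for illustration only).
            name: str
            inputs: List[str]
            call: Callable[..., Any]
            attributes: Dict[str, Any] = field(default_factory=dict)

        def run_dag(functions: List[ProcessingFunction]) -> Dict[str, Any]:
            # Execute in dependency order; the scheduler never looks inside
            # `attributes`, only at the declared inputs. (No cycle detection -
            # this is a sketch, not production code.)
            results: Dict[str, Any] = {}
            pending = list(functions)
            while pending:
                for fn in list(pending):
                    if all(dep in results for dep in fn.inputs):
                        results[fn.name] = fn.call(*(results[d] for d in fn.inputs))
                        pending.remove(fn)
            return results

        # Example: a two-step pipeline the scheduler knows nothing about.
        dag = [
            ProcessingFunction("load", [], lambda: [1.0, 2.0, 3.0]),
            ProcessingFunction("scale", ["load"], lambda xs: [2 * x for x in xs]),
        ]
        print(run_dag(dag)["scale"])   # [2.0, 4.0, 6.0]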

    Description

      Spike: review, discuss, and progress the detailed processing function library interface design.

      • Discuss, define and document ( ! ) a number of processing function interfaces (see the interface sketch after this list). At minimum:
        • "High level", like visibility stream currently implemented by receive (containing by design all meta-data to interpret data)
        • "Low level", like gridding (focusing on a very particular problem)
        • Especially consider modifiability mechanisms:
          • How do we make sure that we can adjust interfaces if we find them to hinder progress?
          • How do we handle having multiple implementations of the same interface?
      • Define API(s) we would use to interact with them ("wrappers") - see the wrapper sketch after this list
        • We likely want at minimum an easy-to-use way to call these from Python; there should be a Python library you can install to gain access to the processing functions.
        • However, we might also want to consider providing more general-purpose interfaces to allow execution engines to bind against it (i.e. something for tools like EAGLE to work from).
        • Note that the Python library might just act as a thin interface layer to external tools. E.g. for visibility streaming it might just wrap the ska-sdp-dal call with some schema checks.
      • Data model considerations (see the data model sketch after this list)
        • Are we still okay with Arrow as the base data models?
        • Do we need a lower-level representation for CUDA/C?
        • Specifically for the Python implementation, wrapping/unwrapping into more useful high-level data wrappers (numpy/awkward) is likely appropriate. Should we aim for Xarray?
        • Consider interaction with accelerators - e.g. would we try to treat on-accelerator memory objects the same, or have separate types/kernels? Responsibility for managing this would normally be on execution engines, but maybe we would also have wrappers do this automatically? (See the accelerator sketch after this list.)
      • Consideration of design constraints from current processing interfaces compared against SDP architecture.
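
      For the interface-definition bullet above, a sketch of how a "low level" kernel interface (gridding, as named in the acceptance criteria) could support multiple implementations behind one abstract calling convention. The class and registry names are assumptions for illustration, not an agreed design:

          from abc import ABC, abstractmethod
          from typing import Dict, Type

          import numpy as np

          class Gridder(ABC):
              # Abstract "low level" kernel interface: one narrowly defined
              # operation. An explicit version makes later interface changes
              # visible to callers (one of the modifiability questions).
              INTERFACE_VERSION = 1

              @abstractmethod
              def grid(self, uvw: np.ndarray, vis: np.ndarray,
                       image_size: int) -> np.ndarray:
                  """Grid visibilities onto an image_size x image_size grid."""

          # A registry lets multiple implementations of the same interface
          # coexist and be selected by name.
          GRIDDERS: Dict[str, Type[Gridder]] = {}

          def register_gridder(name: str):
              def wrap(cls):
                  GRIDDERS[name] = cls
                  return cls
              return wrap

          @register_gridder("reference")
          class ReferenceGridder(Gridder):
              def grid(self, uvw, vis, image_size):
                  # Nearest-neighbour gridding, purely for illustration.
                  out = np.zeros((image_size, image_size), dtype=complex)
                  u = np.clip(uvw[:, 0].astype(int), 0, image_size - 1)
                  v = np.clip(uvw[:, 1].astype(int), 0, image_size - 1)
                  np.add.at(out, (v, u), vis)
                  return out

          gridder = GRIDDERS["reference"]()   # select an implementation by name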
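
      For the wrapper bullet, a sketch of the "thin interface layer" idea: a Python entry point that validates a declared data model and then hands off to external transport. The ska-sdp-dal call is stubbed out here, and the schema below is invented for illustration:

          import numpy as np

          # Hypothetical data model for a visibility payload; the real schema
          # would come from the interface documentation.
          VIS_SCHEMA = {"uvw": (np.float64, 2), "vis": (np.complex128, 1)}

          def check_schema(payload: dict) -> None:
              # Reject payloads that do not match the declared data model.
              for key, (dtype, ndim) in VIS_SCHEMA.items():
                  arr = payload.get(key)
                  if arr is None or arr.dtype != dtype or arr.ndim != ndim:
                      raise TypeError(f"field {key!r} must be a {ndim}-d {dtype} array")

          def stream_visibilities(payload: dict) -> None:
              # Thin wrapper: schema checks live here; in the real library this
              # would wrap the ska-sdp-dal call rather than printing.
              check_schema(payload)
              print(f"would stream {payload['vis'].shape[0]} visibilities")

          stream_visibilities({
              "uvw": np.zeros((10, 3)),
              "vis": np.ones(10, dtype=np.complex128),
          })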
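
      For the data model bullet, a sketch of keeping Arrow as the base data model while unwrapping into a high-level xarray wrapper on the Python side (assumes pyarrow and xarray are installed; the column names are invented):

          import numpy as np
          import pyarrow as pa
          import xarray as xr

          # Arrow as the language-neutral base data model ...
          table = pa.table({
              "time": np.arange(4, dtype=np.float64),
              "amp": np.ones(4, dtype=np.float64),
          })

          # ... unwrapped into an xarray Dataset for Python users.
          ds = xr.Dataset(
              {"amp": ("time", table.column("amp").to_numpy())},
              coords={"time": table.column("time").to_numpy()},
          )
          print(ds["amp"].mean().item())   # 1.0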
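
      For the accelerator bullet, one of the two options ("separate types") sketched out: device memory gets a distinct wrapper type, and transfers happen only at the interface boundary, so Python-level wrappers can insert them automatically while execution engines that manage placement themselves can skip the copies. DeviceArray here is a stand-in, not a real CUDA binding:

          import numpy as np

          class DeviceArray:
              # Stand-in for an on-accelerator buffer (e.g. a CUDA allocation).
              # A distinct type makes host/device transfers explicit.
              def __init__(self, host: np.ndarray):
                  self._data = np.array(host)   # pretend this is device memory

              def to_host(self) -> np.ndarray:
                  return np.array(self._data)

          def ensure_host(x) -> np.ndarray:
              # Wrapper-level auto-transfer: accept either memory type.
              return x.to_host() if isinstance(x, DeviceArray) else np.asarray(x)

          def double(x) -> np.ndarray:
              # A host-side kernel; the wrapper normalises the input first.
              return 2 * ensure_host(x)

          print(double(DeviceArray(np.arange(3))))   # [0 2 4]
          print(double(np.arange(3)))                # [0 2 4]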

      People

        p.wortmann Wortmann, Peter
        f.graser Graser, Ferdl
        Votes: 0
        Watchers: 5

      Feature Progress

        Story Point Burn-up: 100.00%
        Feature Estimate: 3.0

                      Issues   Story Points
        To Do              0            0.0
        In Progress        0            0.0
        Complete           8            1.0
        Total              8            1.0
