SP-2588

Objective evaluation of DALiuGE (vs Dask) for a workflow from RASCIL based on its Python functions


Details

    • Enabler
    • Should have
    • PI15
    • COM SDP SW
    • None
    • Data Processing

      The SDP approach to execution frameworks is that, by having common processing functions, we can leverage off-the-shelf solutions, custom solutions, or both, wherever each is appropriate. Now that RASCIL processing functions are reasonably well integrated with DALiuGE, we can highlight this flexibility by demonstrating where one might have advantages over the other.

      • Objective scenarios (with measures) where DALiuGE would provide advantages over an off-the-shelf solution like Dask
      • Demonstration of such measures using a (prototype of a) realistic workflow
    • Intra Program
    • 5
    • 5
    • 0
    • Team_YANDA
    • Sprint 5

      All artifacts are linked to this Miro board: https://miro.com/app/board/uXjVOrMlbNU=/

      • Frame 1 has a summary of use cases, key actors and required skill sets.
      • Frame 2 to the right has a spider diagram representing the pedigree of the comparison between DALiuGE and Dask in the context of workflow management.
      • Frames 3 and 4 provide reference to system engineering artifacts, that is the architectural scope and requirement traceability.
      • Further frames depict five use cases, some with sub-variants.
      • Further down are runtime scenarios S1 and S2, also with variants.
      • An article about the SDP workflow scheduling problem was specifically written up: https://confluence.skatelescope.org/pages/viewpage.action?pageId=188646031

      AC1: Four use cases demonstrating DALiuGE-only capabilities have been identified and analyzed: navigate to the Miro frames titled W1*, W3*, W4* and W5*.

      AC2: A fifth use case W2* and associated runtime scenario S2* based on a continuum imaging workflow using RASCIL contrasts DALiuGE against Dask, i.e., against a COTS solution. Again, see the spider diagram in Miro frame 2.

    • 17.5
    • Accepted by FO
    • PI22 - UNCOVERED

    • SPO-1784

    Description

      The overall aim of this ticket is to demonstrate that a custom solution has concrete benefits over an off-the-shelf solution in a situation that is relevant to our application.

      This is inspired by https://docs.google.com/presentation/d/1HniykqYz0FV2REUKLOQmQVKg7mtGU2DAajgjxvY9_DQ, which attempts to highlight the strengths and weaknesses of the different software systems, specifically:

      DALiuGE provides a clean separation between components, workflows, scheduling, partitioning and execution. It scales inversely proportional to the number of nodes up to at least 20 million graph nodes*. EAGLE provides a fairly complete workflow management system. DALiuGE does not (want to) touch on complex data structure parallelization directly, but leaves that up to the components. DALiuGE implements static workflow scheduling. It has a small user base.

      vs

      Dask is really strong when it comes to implementing complex, data-structure based algorithms. Widely used around the world. Does not (want to) provide a clean separation between components, workflows and execution. It scales directly proportional with number of nodes and number of edges. Does not provide any workflow management, or more abstract workflow development.

      Dask uses dynamic task scheduling.

      The observation is that much of this is quite subjective:

      • What counts as a "clean separation"? Is it even universally desirable in every situation? Integrated approaches are often unreasonably effective, and can inherently cover more edge cases.
      • The fact that DALiuGE provides a "complete framework" could also be characterised as a disadvantage architecturally, as it makes it trickier to integrate without subscribing to many of its design decisions. If we are willing to believe that scheduling and executing tasks is the hardest part of it, presumably adding the remaining bits around it would be just as much work for off-the-shelf software as it was for DALiuGE (sunk cost notwithstanding).
      • How exactly "workflow scheduling" differs from "task scheduling" for practical purposes is also not quite clear. The argument seems to be that we can embed one into the other?
      • The performance claims are hard to believe unless basically all interesting execution framework responsibilities (like scheduling and graph partitioning) are excluded.
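One way to make the static versus dynamic scheduling question above concrete is a toy sketch. It uses only Python's standard-library `graphlib` and is not how either DALiuGE or Dask schedules work; the graph contents and the names `static_plan` and `dynamic_run` are hypothetical and exist only for illustration. A static scheduler fixes the whole execution order before anything runs, while a dynamic one decides what to run next as predecessors complete.

```python
from graphlib import TopologicalSorter

# Hypothetical toy workflow: each key depends on the tasks in its set.
graph = {
    "grid_a": {"load"},
    "grid_b": {"load"},
    "image": {"grid_a", "grid_b"},
}

def static_plan(graph):
    """Fix the complete execution order before anything runs."""
    return list(TopologicalSorter(graph).static_order())

def dynamic_run(graph, execute):
    """Decide what to run next only as predecessors finish."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    order = []
    while sorter.is_active():
        for task in sorter.get_ready():
            execute(task)      # run the task (serially, in this sketch)
            order.append(task)
            sorter.done(task)  # may unlock successors for the next round
    return order
```

In this serial sketch both functions yield a valid order. The practical difference would only show up once task durations, failures or resource availability are unknown in advance, which is the kind of scenario the ticket asks to be measured.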

      There are clearly different valid viewpoints on this that have been discussed in the past. However, the fact that we now have a RASCIL pipeline implemented in DALiuGE should offer us the possibility of exploring some of these points in a bit more depth. What we want to show are scenarios, relevant for SKA, where the off-the-shelf solution (Dask) is not sufficiently good but where the custom solution (DALiuGE) does better.

      Who?

      • Workflow developers

      What? (outcomes)

      • Objective measures of the situations in which using DALiuGE over an off-the-shelf solution like Dask would provide advantages
      • Demonstration of such measures using a realistic workflow
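As an illustration of what one such objective measure might look like, the sketch below estimates framework overhead per task by comparing wall-clock time with a perfect-scaling baseline. This is an assumption about a possible metric, not one defined by the ticket; `overhead_per_task`, `run_workflow` and `serial_baseline` are hypothetical names, and a real comparison would plug in a DALiuGE or Dask execution of the same workflow in place of the stand-in loop.

```python
import time

def overhead_per_task(run_workflow, n_tasks, task_seconds, n_workers=1):
    """Estimate framework overhead per task, in seconds.

    run_workflow is a hypothetical callable that executes n_tasks tasks,
    each costing roughly task_seconds of pure compute, on n_workers workers.
    """
    start = time.perf_counter()
    run_workflow(n_tasks)
    wall = time.perf_counter() - start
    ideal = n_tasks * task_seconds / n_workers  # perfect-scaling baseline
    return (wall - ideal) / n_tasks

def serial_baseline(n_tasks):
    """Stand-in 'framework': a plain loop over no-op tasks."""
    for _ in range(n_tasks):
        pass
```

Calling `overhead_per_task(serial_baseline, 1000, 0.0)` gives the per-iteration cost of the bare loop, which could serve as a floor when comparing frameworks on the same graph size.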

      Why?

      • Demonstrate that we can make use of the best execution framework technology where and when appropriate

      References

      • ...

              People

                b.nikolic Nikolic, Bojan
                b.mort Mort, Ben

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 5.0

                                Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      15       30.0
                  Total         15       30.0
