Details
-
Enabler
-
Should have
-
None
-
Data Processing
-
-
- Objective scenarios (with measures) where DALiuGE would provide advantages over an off-the-shelf solution like Dask
- Demonstration of such measures using a (prototype of a) realistic workflow
-
Intra Program
-
5
-
5
-
0
-
Team_YANDA
-
Sprint 5
-
-
-
-
17.5
-
Accepted by FO
-
-
SPO-1784
Description
The overall aim of this ticket is to demonstrate that a custom solution has concrete benefits over an off-the-shelf solution in a situation that is relevant to our application.
This is inspired by https://docs.google.com/presentation/d/1HniykqYz0FV2REUKLOQmQVKg7mtGU2DAajgjxvY9_DQ, which attempts to highlight the strengths and weaknesses of the different software systems, specifically:
DALiuGE provides a clean separation between components, workflows, scheduling, partitioning and execution. It scales inversely proportional to the number of nodes up to at least 20 million graph nodes*. EAGLE provides a fairly complete workflow management system. DALiuGE does not (want to) touch on complex data structure parallelization directly, but leaves that up to the components. DALiuGE is implementing static workflow scheduling. Small user base.
vs
Dask is really strong when it comes to implementing complex, data-structure based algorithms. Widely used around the world. Does not (want to) provide a clean separation between components, workflows and execution. It scales directly proportional with number of nodes and number of edges. Does not provide any workflow management, or more abstract workflow development. Dask is using dynamic task scheduling.
The observation is that much of this is subjective:
- What counts as a "clean separation"? Is it even desirable in every situation? Integrated approaches are often unreasonably effective, and can inherently cover more edge cases.
- The fact that DALiuGE provides a "complete framework" could also be characterised as an architectural disadvantage, as it makes it trickier to integrate without subscribing to many of its design decisions. If we are willing to believe that scheduling and executing tasks is the hardest part, then adding the remaining pieces around off-the-shelf software would presumably be no more work than it was for DALiuGE (sunk cost notwithstanding).
- How exactly "workflow scheduling" differs from "task scheduling" for practical purposes is also not quite clear. The argument seems to be that we can embed one into the other?
- The performance claims are hard to believe unless basically all interesting execution framework responsibilities (like scheduling and graph partitioning) are excluded.
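One way to put the scaling claims above on an objective footing would be to measure wall-clock overhead per task as the graph grows. The following is only a minimal sketch of such a harness, not an agreed benchmark: the names `overhead_per_task`, `run_independent_tasks` and `measure` are hypothetical, and a standard-library thread pool stands in for the execution framework so that the example runs without Dask or DALiuGE installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def noop():
    """A task that does no work, so measured time is framework overhead."""
    return None


def run_independent_tasks(executor, n_tasks):
    """Submit n_tasks independent no-op tasks and wait for all of them."""
    futures = [executor.submit(noop) for _ in range(n_tasks)]
    for future in futures:
        future.result()


def overhead_per_task(run_graph, sizes):
    """Return {n_tasks: wall-clock seconds per task} for each graph size.

    run_graph is any callable taking the number of tasks; swapping it is
    how a different framework would be plugged in (an assumption of this
    sketch, not an existing interface).
    """
    results = {}
    for n in sizes:
        start = time.perf_counter()
        run_graph(n)
        results[n] = (time.perf_counter() - start) / n
    return results


def measure(sizes=(100, 1000, 10000)):
    """Measure the stand-in thread pool; sizes are illustrative only."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return overhead_per_task(
            lambda n: run_independent_tasks(pool, n), sizes
        )


if __name__ == "__main__":
    for n, per_task in measure().items():
        print(f"{n:>8} tasks: {per_task * 1e6:.1f} us/task")
```

To compare frameworks, `run_graph` would be replaced by a callable that builds and executes an equivalent graph in each of them, ideally including scheduling and partitioning time. A roughly constant time per task across sizes would suggest linear scaling; a growing one would suggest worse than linear.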
There are clearly different valid viewpoints on this, and they have been discussed in the past. However, the fact that we now have a RASCIL pipeline implemented in DALiuGE should let us explore some of these points in more depth. What we want to show are scenarios, relevant for SKA, where the off-the-shelf solution (Dask) is not sufficiently good but where the custom solution (DALiuGE) does better.
Who?
- Workflow developers
What? (outcomes)
- Objective measures of the situations in which using DALiuGE rather than an off-the-shelf solution like Dask would provide advantages
- Demonstration of such measures using a realistic workflow
Why?
- Demonstrate that we can make use of the best execution framework technology where and when appropriate
References
- ...