SAFe Program / SP-673

Gain experience with Jupyter + Dask + k8s workflows in an SRC-like environment


Details

    • Data Processing
    • Jupyter notebooks, accelerated by a scalable data processing framework such as Dask and deployed using Kubernetes, have already been adopted by big-data geoscience platforms like Pangeo. This model may prove useful for developing and deploying SKA workflows. To assess whether this is the case, it is useful to gain more experience with these technologies.

      • Identify, demonstrate, and document the steps taken to run a simple Jupyter notebook, accelerated by a Dask cluster deployed using Kubernetes. The notebook should implement a simple but relevant radio astronomy data processing script, examples of which can already be found in the SDP ARL (see the sketch after this list).
      • Stretch: Helm deployment script developed and added to the SKA Helm chart repo.
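      A minimal sketch of the kind of notebook cell this criterion asks for, assuming the dask-kubernetes package; the pod template file name, worker count, and array computation are illustrative placeholders, not part of this ticket:

          # Sketch: launch Dask workers as Kubernetes pods and run a trivial
          # computation from a notebook. Assumes dask-kubernetes is installed
          # and worker-spec.yml is a (hypothetical) worker pod template.
          from dask.distributed import Client
          from dask_kubernetes import KubeCluster
          import dask.array as da

          cluster = KubeCluster.from_yaml("worker-spec.yml")
          cluster.scale(4)           # request 4 worker pods
          client = Client(cluster)   # point Dask at the new cluster

          # Stand-in for a radio astronomy processing step: a chunked
          # array reduction executed across the Kubernetes workers.
          x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
          print(x.mean().compute())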
    • 2
    • 2
    • 2
    • Team_ESCAPEES
    • Sprint 5
    • See: https://confluence.skatelescope.org/pages/viewpage.action?pageId=96174408
    • 5.6
    • PI24 - UNCOVERED

    • Team_ESCAPEES goal_D1

    Description

      Gain the experience required to test and demonstrate the use of Jupyter notebooks (possibly via JupyterHub), accelerated by a Dask cluster and deployed using Kubernetes, for executing SKA workflows in an SRC-like environment.
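      As a rough sketch of the interactive side, a notebook served by JupyterHub could attach to a Dask scheduler already running in the same Kubernetes cluster. The service address below follows a typical in-cluster DNS pattern and is purely illustrative:

          # Sketch: connect a notebook session to an existing Dask scheduler
          # exposed as a Kubernetes Service. The address is a placeholder,
          # not one defined by this ticket.
          from dask.distributed import Client

          client = Client("tcp://dask-scheduler.dask.svc.cluster.local:8786")
          print(client)  # reports workers/cores once connected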

      This approach and technology stack has gained rapid adoption and ever-increasing sophistication in recent years, and has already been taken up by big-data geoscience platforms such as Pangeo. It would therefore be useful to assess whether this model might also prove a useful approach for developing SKA1 workflows. It is an attractive option for workflow development because it would allow semi-interactive data analysis during development and commissioning of the telescope, as well as batch execution (e.g. using papermill) in the SDP system using the same representation.
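      For the batch side, a hedged illustration of papermill executing such a notebook; the notebook file names and parameter are hypothetical:

          # Sketch: batch-execute a notebook with papermill, passing
          # parameters instead of editing cells interactively. File names
          # and the n_workers parameter are placeholders.
          import papermill as pm

          pm.execute_notebook(
              "dask_workflow.ipynb",         # hypothetical input notebook
              "dask_workflow_run001.ipynb",  # executed copy with outputs
              parameters={"n_workers": 4},
          )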

      Note that a number of Dask-based Jupyter notebook workflows are available as part of the SDP ARL and could be used for this test.
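      As a stand-in for one of those ARL notebooks (this does not use the ARL API; the shapes and the FFT step are illustrative only), a chunked Dask computation with a radio astronomy flavour might look like:

          # Sketch: FFT a stack of (random) visibility grids into
          # dirty-image-like arrays. Each grid is a single chunk, so the
          # 2-D FFTs parallelise across the stack on the Dask cluster.
          import dask.array as da

          grids = da.random.random((64, 1024, 1024), chunks=(1, 1024, 1024))
          images = da.fft.fftshift(da.fft.fft2(grids), axes=(-2, -1))
          print(da.absolute(images).max().compute())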

      People

        Assignee: b.mort Mort, Ben
        Reporter: b.mort Mort, Ben
        Votes: 0
        Watchers: 5

      Feature Progress

        Story Point Burn-up: (100.00%)

        Feature Estimate: 2.0

        Status         Issues   Story Points
        To Do          0        0.0
        In Progress    0        0.0
        Complete       5        23.0
        Total          5        23.0

      Dates

        Created:
        Updated:
        Resolved:
