SAFe Program / SP-3521

DM integration with compute with a Dask workflow discovering Rucio data with IVOA metadata (Dask + Rucio + IVOA)


Details

    • Feature
    • Must have
    • PI19
    • None
    • SRCnet
    • Previously, JupyterHub integration with Rucio data was demonstrated by discovering data via the Rucio plugin (based on DIDs) and downloading it to a locally mounted Rucio storage element. This feature explores other aspects of running user workflows on JupyterHub, namely processing data discovered via IVOA metadata with Dask data frames, utilising auth tokens rather than downloading it all locally. Uploading processed data back to the DL has not been demonstrated in the past, and this is essential to complete the data lifecycle loop.
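      The token-based access pattern described above can be sketched as follows; this is a minimal sketch assuming the replicas are reachable over HTTPS and read through Dask/fsspec (the URL and token below are placeholders, not actual SRC endpoints):

      ```python
      def make_storage_options(token: str) -> dict:
          """Build fsspec-style storage options that attach a bearer token
          to every HTTP request Dask workers make for remote file reads."""
          return {"client_kwargs": {"headers": {"Authorization": f"Bearer {token}"}}}

      # With Dask installed, the options would be passed when building a data
      # frame, so workers stream the remote files rather than downloading them
      # all locally first, e.g.:
      #
      #   import dask.dataframe as dd
      #   ddf = dd.read_csv("https://rse.example.org/scope/data-*.csv",
      #                     storage_options=make_storage_options(access_token))
      ```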
    • Demo showing: in a JupyterHub notebook, list Rucio DIDs based on IVOA metadata, obtain Rucio PFNs from the Datalink service, point Dask workers at these storage locations with a valid access token, 'process' the data to generate new data with Dask, and upload the new data with new metadata to the DL.
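      The "IVOA metadata to Rucio PFNs" part of the demo could look roughly like the sketch below. The PFN extraction assumes Datalink-style rows carrying the standard `semantics` and `access_url` fields; the TAP endpoint and query in the comment are hypothetical:

      ```python
      def extract_pfns(datalink_rows: list) -> list:
          """Keep only the direct-access URLs from Datalink-style rows:
          '#this' is the DataLink semantics term for the dataset itself."""
          return [row["access_url"] for row in datalink_rows
                  if row.get("semantics") == "#this"]

      # Upstream, the rows would come from an IVOA discovery step, e.g.:
      #
      #   import pyvo
      #   tap = pyvo.dal.TAPService("https://ivoa.example.org/tap")
      #   results = tap.search("SELECT * FROM ivoa.obscore WHERE obs_collection = 'SKA'")
      #   # ...resolve each result through the Datalink service to get Rucio PFNs,
      #   # then hand the PFN list to the Dask workers with a valid token.
      ```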
    • 1.5
    • 1.5
    • 0
    • Team_MAGENTA
    • Sprint 5
    • Workflow temporarily placed at: https://confluence.skatelescope.org/display/SRCSC/SP-3521+DM+integration+with+compute+with+a+Dask+workflow
      Demoed in: https://confluence.skatelescope.org/display/SRCSC/2023-08-17+SRC+ART+System+Demo+19.5+Part+2
      Pre-recorded demo: https://drive.google.com/file/d/1ZL0TuxLmLXBbPBZ4gJqRKONEoTfgmiIR/view?usp=drive_link
    • 20.4
    • Stories Completed, Integrated, Outcomes Reviewed, Demonstrated, Satisfies Acceptance Criteria, Accepted by FO
    • PI24 - UNCOVERED

    • PI19-PB

    Description

      This will involve:

      • building a user environment with both a Dask client and a Rucio client, so that data and compute actions can be performed from one place (this may or may not involve a CentOS-based image)
      • building Dask data frames with tokens in request headers
      • understanding how token-based access to remote files would work for Dask when building data frames, by passing tokens in the header
      • understanding how to run the workflow (using Bob's workflow built in PI18)
      • adding the data and metadata, and replicating as needed
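      The last step in the list, putting the new data and metadata back into the DL, might be described to Rucio roughly as below. This is a sketch assuming the Rucio client's `UploadClient` item format; the scope, RSE, and dataset names are placeholders:

      ```python
      def make_upload_item(path, scope, rse, dataset_name=None):
          """Describe one local file for a Rucio upload: its path, the scope
          the new DID should live in, and the RSE the replica goes to."""
          item = {"path": path, "did_scope": scope, "rse": rse}
          if dataset_name:
              # Attach the new file to a dataset so it can be replicated as a unit.
              item["dataset_scope"] = scope
              item["dataset_name"] = dataset_name
          return item

      # With a configured Rucio client, the upload itself would be something like:
      #
      #   from rucio.client.uploadclient import UploadClient
      #   UploadClient().upload([make_upload_item("/tmp/out.parquet", "user.alice",
      #                                           "SOME_RSE", dataset_name="results_ds")])
      #   # ...then set extra metadata on the new DID and add replication rules as needed.
      ```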

      Attachments

      Issue Links

      Structure

      Activity

      People

      Jesus.Salgado Salgado, Jesus
      r.bolton Bolton, Rosie
      Votes: 0
      Watchers: 1

      Feature Progress

      Story Point Burn-up: (100.00%)

      Feature Estimate: 1.5

                    Issues   Story Points
      To Do              0            0.0
      In Progress        0            0.0
      Complete          10           13.0
      Total             10           13.0

      Dates

      Created:
      Updated:
      Resolved: