  SAFe Program / SP-3795

Prototype distributed visibility retrieval from buffer

Details

    • Feature
    • Should have
    • PI20
    • COM SDP SW
    • None
    • Data Processing

      Who?

      • Pipeline developers
      • Platform developers

      What?

      • A better idea of how effectively the considered technologies could utilise storage infrastructure for retrieving visibilities

      Why?

      • Persistent source of bottlenecks in existing software
      • Potentially far-reaching consequences for design of new pipeline software
      • Might also affect platform design and interfaces (object stores?)
      • Develop a prototype that loads a visibility data set from disk in a distributed fashion
        • Sized appropriately for an SKA use case - at least AA2, ideally AA3+
        • Utilise technologies currently envisioned to be used mid-term (xarray, zarr, dask)
        • Implement different access patterns (base: frequency slices, then time order; advanced: by u/v/w chunks); see the sketch after this list
      • Identify platforms to run benchmarks on
      • Measure throughput of prototype for identified datasets and access patterns
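
      The base access patterns above could look roughly like the following with xarray + zarr + dask. This is a hypothetical sketch only: the store name "vis.zarr", the variable name "VISIBILITY" and the dimension names are placeholders, not the converter's actual schema.

          import xarray as xr

          # Lazily open the zarr store; dask chunking follows the on-disk layout.
          ds = xr.open_zarr("vis.zarr")
          vis = ds["VISIBILITY"]  # assumed dims: (time, baseline, frequency, polarisation)

          # Base pattern: one frequency slice at a time ...
          for chan in range(vis.sizes["frequency"]):
              slice_sum = vis.isel(frequency=chan).sum().compute()

          # ... then in time order within a slice (the advanced u/v/w-chunked pattern is not sketched here).
          for t in range(vis.sizes["time"]):
              row = vis.isel(frequency=0, time=t).compute()
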
    • 2
    • 2
    • 0
    • Team_PANDO
    • Sprint 4

      Links

      Demo slides: /edit?usp=drive_link&ouid=105107180717003823827&rtpof=true&sd=true

      GitHub repo (to be migrated to SKA GitLab): https://github.com/sstansill/msconverter/tree/main

      GitHub repo including demo notebook: https://github.com/sstansill/msconverter/tree/converter_demo

       

      Outcomes

      We have produced a prototype that converts a MeasurementSet to a chunked and compressed zarr store. Parallel reading is supported through the xarray library, and chunk sizes are tuned to at least ~100 MB, as recommended by Dask best practices.
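
      For illustration, writing such a store with xarray might look like the sketch below. This is not the msconverter code: the toy dimensions, chunk sizes and the zstd/Blosc compressor are assumptions chosen only to show the chunking-plus-compression idea, and encoding keys may differ between zarr/xarray versions.

          import numpy as np
          import xarray as xr
          import numcodecs

          # Toy visibility cube (the real converter reads this from a MeasurementSet).
          ntime, nbl, nchan, npol = 64, 351, 128, 4
          data = (np.random.rand(ntime, nbl, nchan, npol)
                  + 1j * np.random.rand(ntime, nbl, nchan, npol)).astype("complex64")
          ds = xr.Dataset({"VISIBILITY": (("time", "baseline", "frequency", "polarisation"), data)})

          # Chunk and compress on write; real chunk sizes should target >~100 MB.
          encoding = {"VISIBILITY": {"compressor": numcodecs.Blosc(cname="zstd", clevel=5)}}
          ds.chunk({"time": 16, "frequency": 32}).to_zarr("vis.zarr", mode="w", encoding=encoding)
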

      The converter was demoed to the DP ART (slides: https://docs.google.com/presentation/d/1MPrI3OMuR3_Zn5T7U31TMZ2z_Sc4k88T). Local benchmarking on an M2 Pro MacBook Pro (512 GB SSD) shows that we can saturate the SSD I/O using 7 of 8 cores (~3 GB/s read). Preliminary CSD3 benchmarking shows that the Lustre I/O is not saturated with 32 Cascade Lake CPU cores.

      CSD3 effective read rate on a zarr store converted from a 43 GB MeasurementSet:

       

      Cores   Memory (GiB)   Read throughput (GiB/s)
          8             16                      1.52
         16             32                      2.18
         32             64                      3.31

       

      The benchmark was an embarrassingly parallel summation of the complex visibilities, used as a proxy for raw read speed. Casacore read rates on a MeasurementSet peak at ~250 MB/s on CSD3's Lustre storage, so the zarr approach gives a ~10x speedup.
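
      The benchmark loop was roughly of the following shape. This is a minimal sketch rather than the exact script used: the store path, variable name and LocalCluster sizing are placeholders.

          import time
          import xarray as xr
          from dask.distributed import Client, LocalCluster

          # Spin up local dask workers (replace with the cluster sizes from the table above).
          client = Client(LocalCluster(n_workers=8, threads_per_worker=1))

          vis = xr.open_zarr("vis.zarr")["VISIBILITY"]

          start = time.perf_counter()
          total = vis.sum().compute()              # embarrassingly parallel reduction
          elapsed = time.perf_counter() - start

          # Effective read rate: bytes that had to be read, divided by wall-clock time.
          print(f"sum = {total.values}, {vis.nbytes / elapsed / 2**30:.2f} GiB/s")
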

       

      Next Steps

      • Migrate the GitHub repo to GitLab (risk: we should merge into xradio and mirror xradio)
      • Add tests to comply with SKA coding standards
      • Improve chunk management for AA*-scale data sets, which will produce billions of tasks in a Dask graph (see the back-of-envelope sketch after this list)
      • Implement partitions as well as chunks to work towards the planned self-calibration design (https://docs.google.com/document/d/1AtTlGSHDuwnPDL_c8aTp1oJNDWPyLEIRvlev81UInfc/edit?usp=sharing)
      • Investigate u/v/w traversal, either within a chunk or across the whole data set, to work towards the planned self-calibration design (same document as above)
      • Improve metadata coverage by the converter (risk 1: the SKA MeasurementSet standard is not finalised; risk 2: compatibility with other radio telescopes is not guaranteed, which prevents merging with xradio)
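
      To make the chunk-management point concrete, a back-of-envelope sketch is given below. The array dimensions are hypothetical placeholders for an AA*-scale visibility cube, not agreed SKA figures; only the scaling argument matters.

          # Rough scaling: the number of Dask tasks grows with the number of zarr chunks.
          ntime, nbl, nchan, npol = 86400, 130816, 65536, 4   # hypothetical AA*-scale dims
          bytes_total = ntime * nbl * nchan * npol * 8        # complex64 visibilities
          chunk_bytes = 128 * 2**20                           # ~128 MiB chunks
          n_chunks = bytes_total // chunk_bytes
          print(f"{bytes_total / 2**50:.1f} PiB -> ~{n_chunks:,} chunks, "
                f"so a few passes over the data already approach 10^9 tasks")
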
    • PI22 - UNCOVERED

    Description

      For visibility processing we will need to read back petabytes of visibility data from the SDP buffer, potentially from hundreds of nodes in parallel. This is not what existing data access layers (e.g. casacore) were designed for, and we therefore expect to need to investigate alternative technologies.

      As part of the NRAO collaboration, we are specifically considering using xarray APIs with zarr backends. This feature is to develop prototypes to investigate whether this can achieve throughput appropriate for SKA-style use cases.

      This could inform a number of ongoing efforts, both long-term (data modelling / API design, technology selection) and short-term (an option for fixing WSClean bottlenecks, or use in benchmarking prospective platforms).
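
      At that scale, the same xarray/zarr read pattern would be driven from many nodes at once. A hedged sketch of how this might be scaled out with dask-jobqueue on a SLURM system such as CSD3 is given below; the queue name, resource figures and store path are assumptions, not project settings.

          from dask.distributed import Client
          from dask_jobqueue import SLURMCluster
          import xarray as xr

          # Request SLURM jobs as dask workers (partition name is hypothetical).
          cluster = SLURMCluster(queue="cclake", cores=32, memory="64GiB", walltime="01:00:00")
          cluster.scale(jobs=4)                    # e.g. 4 nodes -> 128 cores
          client = Client(cluster)

          # Each worker reads only the zarr chunks backing its own tasks.
          vis = xr.open_zarr("vis.zarr")["VISIBILITY"]
          print(vis.sum().compute())
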

              People

                Assignee: Wortmann, Peter (p.wortmann)
                Reporter: Wortmann, Peter (p.wortmann)

                Feature Progress

                  Story Point Burn-up: (86.36%)

                  Feature Estimate: 2.0

                                Issues   Story Points
                  To Do              1            3.0
                  In Progress        0            0.0
                  Complete           6           19.0
                  Total              7           22.0
