SAFe Program / SP-4651

Further increase parallelism in ICAL pipeline


Details

    • Enabler
    • High
    • PI25
    • COM SDP SW
    • True
    • Data Processing
    • Benefit Hypothesis:

      We want our ICAL pipelines to be able to keep up with the Low and Mid telescopes at AA2 scale, i.e., these pipelines should take at most twice the duration of the processed observation.
    • Acceptance Criteria:
      1. Systematic profile analysis, identifying phases and what currently bottlenecks their CPU utilisation
      2. Resolve the most significant bottleneck
      3. Provide benchmarks for Low and Mid self-calibration pipelines, including extrapolation to AA2 scale
    • 8
    • 8
    • 0
    • PI24 - UNCOVERED

    • AA2

    Description

      Profiles of our runs show that we have largely succeeded in distributing the expensive operations of ICAL. However, a lot of time is still spent in phases where very little parallel work is happening at all - in some cases we appear to spend hours using just a few cores (or even a single core) on the master node. The net result is that we are likely using less than 5% of the compute available to us.
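
      One lightweight way to make these idle phases visible - assuming the runs use a dask.distributed cluster, which is an assumption here, as are the scheduler address and the run_ical_pipeline() entry point below - would be to wrap a run in Dask's performance report, which records the task stream and worker occupancy:

        # Minimal sketch: capture a task-stream / occupancy report for one run.
        from dask.distributed import Client, performance_report

        client = Client("tcp://scheduler:8786")  # placeholder scheduler address

        # The resulting HTML report makes long stretches where only one or two
        # tasks are active - i.e. the low-utilisation phases - easy to spot.
        with performance_report(filename="ical-run-profile.html"):
            run_ical_pipeline()  # placeholder for the actual pipeline invocation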

      What?

      • Identify all major phases where CPU utilisation drops below ~50% for the 3-node run. This might require improving instrumentation / logging (see the instrumentation sketch after this list). Ideally we would use more representative dataset sizes where possible, and check why we sometimes see different results despite using the same parameters.
      • Determine (informally) why these phases currently take as long as they do, and why they don't use more nodes (or threads)
      • Resolve the most significant bottleneck by working on and possibly re-distributing processing functions - doing (at least) one of the following:
        • Take a serious look at sky model filtering and whether it can be sped up or distributed effectively (or integrate an existing solution?)
        • Attempt to distribute deconvolution processing functions (thinking about how to make this work with RADLER would be very valuable long-term) - see the sub-image sketch further below
        • Reduce memory usage of calibration to prevent swapping (average visibilities / normal eqs?). Ideally have a mechanism to balance gridding efficiency (favours large time and frequency intervals) against memory usage (favours short time and frequency intervals) - see the back-of-envelope sketch at the end of this description
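
      For the first bullet, the kind of extra instrumentation meant there could be as simple as a context manager that logs wall time and mean CPU utilisation per phase and flags anything below the threshold. This is only a sketch: the phase name, the 50% threshold and the use of psutil are illustrative, not existing pipeline code.

        # Sketch: per-phase CPU-utilisation logging on a node.
        import logging
        import time
        from contextlib import contextmanager

        import psutil

        log = logging.getLogger("ical.profiling")

        @contextmanager
        def phase(name, threshold=50.0):
            """Log wall time and mean CPU utilisation for one pipeline phase."""
            psutil.cpu_percent(interval=None)          # reset the counter
            start = time.perf_counter()
            yield
            elapsed = time.perf_counter() - start
            util = psutil.cpu_percent(interval=None)   # mean since the reset
            level = logging.WARNING if util < threshold else logging.INFO
            log.log(level, "phase=%s elapsed=%.1fs cpu=%.1f%%", name, elapsed, util)

        # Usage (phase name is an example only):
        # with phase("sky-model-filtering"):
        #     ...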

       
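      For the deconvolution bullet, one possible shape - purely a sketch, with sub-image splitting as just one option, deconvolve_sub_image() as a placeholder rather than an existing pipeline or RADLER function, and cross-boundary effects ignored - would be to farm independent sub-images out as delayed tasks:

        # Sketch: distribute minor cycles over independent sub-images.
        from dask import compute, delayed

        def deconvolve_sub_image(dirty_tile, psf):
            """Placeholder for a per-sub-image minor cycle (e.g. wrapping RADLER)."""
            raise NotImplementedError

        def distributed_deconvolve(dirty, psf, n_split=4):
            """Split the dirty image into an n_split x n_split grid and deconvolve
            each tile as a separate task; results come back as a list of tiles."""
            ny, nx = dirty.shape
            tasks = []
            for j in range(n_split):
                for i in range(n_split):
                    tile = dirty[j * ny // n_split:(j + 1) * ny // n_split,
                                 i * nx // n_split:(i + 1) * nx // n_split]
                    tasks.append(delayed(deconvolve_sub_image)(tile, psf))
            return compute(*tasks)
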

      See frame on DP ART board: https://miro.com/app/board/uXjVK6Lrdw4=/?moveToWidget=3458764597687428890&cot=14 
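
      For the memory / gridding trade-off in the last bullet, the back-of-envelope calculation looks roughly like the sketch below. All names and sizes are illustrative, and only complex64 visibilities are counted (weights and flags would add to the estimate).

        # Sketch: pick the largest time interval per chunk that stays under a
        # per-node memory budget (large intervals favour gridding efficiency,
        # short intervals favour memory usage).
        def vis_chunk_gb(n_times, n_channels, n_baselines, n_pols=4, bytes_per_vis=8):
            """Memory of one complex64 visibility chunk, in GB."""
            return n_times * n_channels * n_baselines * n_pols * bytes_per_vis / 1e9

        def largest_time_interval(n_channels, n_baselines, budget_gb, max_times=512):
            """Largest number of integrations per chunk that fits the budget."""
            for n_times in range(max_times, 0, -1):
                if vis_chunk_gb(n_times, n_channels, n_baselines) <= budget_gb:
                    return n_times
            return 1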


            People

              p.wortmann Wortmann, Peter
              f.graser Graser, Ferdl

              Feature Progress

                Story Point Burn-up: (0%)

                Feature Estimate: 8.0

                              Issues   Story Points
                To Do              0            0.0
                In Progress        0            0.0
                Complete           0            0.0
                Total              0            0.0
