SP-2926: Consolidate / evolve SDP (AA2) Low self-calibration pipeline


Details

    • Data Processing

      A configurable and maintainable self-calibration workflow for doing direction-dependent self-calibration and imaging at AA2+ scale is needed for SKA-LOW to demonstrate the SDP strategy for scaling beyond current data processing systems and to SKA scale. 

      The current epic completes well before AA2 starts observing; the goal is not complete functionality but rather a demonstration of the scaling on the challenging problem.

      • A bash script is developed that uses the command line interface of DP3 and WSClean to define a direction-dependent calibration and imaging pipeline suitable for reducing the LOFAR test data set.
      • This bash script is used to make an initial assessment of performance bottlenecks in this pipeline. This assessment is augmented with experience from the LOFAR production pipelines, which will be collected on a Confluence page.
      • A RASCIL workflow is developed reproducing the results obtained on the LOFAR test data set by DP3 and WSClean as presented in PI16. This workflow is the starting point for evolving towards the SKA-LOW workflow needed to test performance scaling of the SKA architecture towards AA2 scale and beyond.
      • As a first step in this evaluation, multiple major cycles are added in deconvolution, thereby enabling a performance comparison between a monolithic implementation (WSClean) and a more modular implementation adhering to the SKA architecture.
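The two command-line stages named in the criteria above can be sketched as follows. This is a minimal illustration, not the team's actual script: it only builds and prints the command lines instead of running them. The file names (`test.ms`, `skymodel.sourcedb`, `solutions.h5`, `facets.reg`) and all numeric values are hypothetical placeholders, and the DP3 parset keys and WSClean flags reflect my understanding of those tools and should be checked against the installed versions.

```python
import shlex


def dp3_ddecal_command(ms, sourcedb, h5parm, mode="scalarphase", solint=10):
    """Build a DP3 command line for one direction-dependent solve.

    The parset keys below are assumptions about the DP3 DDECal step;
    verify them against the DP3 documentation before use.
    """
    return [
        "DP3",
        f"msin={ms}",
        "msout=",  # assumed: empty means only the solutions are written
        "steps=[ddecal]",
        f"ddecal.sourcedb={sourcedb}",
        f"ddecal.h5parm={h5parm}",
        f"ddecal.mode={mode}",
        f"ddecal.solint={solint}",
    ]


def wsclean_command(ms, name, h5parm, facets, size=1024, scale="10asec",
                    niter=10000, mgain=0.8):
    """Build a WSClean command line for facet-based imaging.

    An mgain below 1 requests multiple major cycles during deconvolution.
    """
    return [
        "wsclean",
        "-name", name,
        "-size", str(size), str(size),
        "-scale", scale,
        "-niter", str(niter),
        "-mgain", str(mgain),
        "-facet-regions", facets,
        "-apply-facet-solutions", h5parm, "phase000",
        ms,
    ]


# Hypothetical inputs, for illustration only.
dp3_cmd = dp3_ddecal_command("test.ms", "skymodel.sourcedb", "solutions.h5")
wsclean_cmd = wsclean_command("test.ms", "selfcal", "solutions.h5", "facets.reg")

print(shlex.join(dp3_cmd))
print(shlex.join(wsclean_cmd))
```

Building the commands as argument lists keeps the settings in one place, which makes it easier to time each stage separately when looking for bottlenecks.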
      Show
      A bash script is developed that uses the command line interface of DP3 and WSClean to define a direction-dependent calibration and imaging pipeline suitable for reducing the LOFAR test data set. This bash script is used to make an initial assessment of performance bottlenecks in this pipeline. This assessment is augmented with experience from the LOFAR production pipelines, which will be collected on a Confluence page. A RASCIL workflow is developed reproducing the results obtained on the LOFAR test data set by DP3 and WSClean as presented in PI16. This workflow is the starting point for evolving towards the SKA-LOW workflow needed to test performance scaling of the SKA architecture towards AA2 scale and beyond. As a first step in this evaluation, multiple major cycles are added in deconvolution, thereby enabling a performance comparison between a monolithic implementation (WSClean) and a more modular implementation adhering to the SKA architecture.
    • Inter Program
    • 13
    • 13
    • 0
    • Team_SCHAAP
    • Sprint 5
      • A bash script was developed that performs DD calibration and imaging on a LOFAR test data set of limited size; this required some careful tuning. The bash script is described at https://confluence.skatelescope.org/display/SE/Selfcal+bash+script and the tuning efforts are described at https://confluence.skatelescope.org/display/SE/DD+calibration+pipeline
      • The tuning efforts indicate that the AA2 pipeline should follow a strategy like the LOFAR production pipeline, in which amplitude and phase calibration are done separately to ensure both perform robustly, thus avoiding a trial-and-error process to find suitable algorithm settings.
        Experience with performance bottlenecks in the current LOFAR production pipelines was shared during the architecture workshop at ASTRON. Notes from this workshop can be found at https://confluence.skatelescope.org/display/SE/2022-12-15+ASTRON+architecture+workshop. Visibility prediction is a major bottleneck. A significant step forward would be the ability to buffer predicted visibilities for re-use across a number of consecutive phase-only and amplitude-only calibration steps. This should not require buffering predicted visibilities for the full data set, only for the number of solution intervals required by the calibration functions. A feature (AST-3186) has been proposed for this.
      • We have developed a pipeline in RASCIL that combines direction-independent (DI) calibration using DP3's GainCal step, application of the calibration solutions using DP3's ApplyCal step to obtain calibrated visibilities, gridding of the calibrated visibilities and FFT using native RASCIL functions, and image-based deconvolution in Radler. The results are reported at https://confluence.skatelescope.org/pages/viewpage.action?pageId=212309056. Unfortunately, we discovered a bug (AST-1190) which prevented us from running the workflow successfully on a LOFAR test data set. Although we progressed less far than we anticipated, these integration issues and the associated efforts prompted a number of useful discussions on how to progress the SKA-LOW pipeline over the coming PIs.
      • As integration issues prevented us from running a linear workflow on a LOFAR test data set (demonstrating AA2 scale), and as the calibration part rather than the imaging part was found to be the bottleneck, we started working on a transition from Casacore arrays to XTensor, as described at https://confluence.skatelescope.org/display/SE/Transition+to+XTensor, among other things to facilitate further performance improvements to DP3. We have raised a feature (SP-3205) to finish this work in the next PI at a sensible point (enabling use of XTensor, but without cutting the dependency on Casacore arrays and without making XTensor optimisations to the code). Although this deviates from the original plan for this PI, we felt this was a better direction than spending even more development capacity unproductively waiting for pipeline runs to complete on a compute node. As discussed with our PM (Danielle), this was one of the main things we learned during this PI: to use our resources effectively, we need to strike a balance between pipeline development and code development that improves or adds functionality.
      • This feature was demoed during System Demo 17.6.
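The buffering idea described in the outcomes above (keep predicted visibilities for a few solution intervals and re-use them across consecutive phase-only and amplitude-only steps) can be sketched as follows. This is a toy under stated assumptions, not the design proposed in AST-3186: `PredictionBuffer`, `toy_predict`, the scalar-gain solvers and all numbers are hypothetical names and values introduced for illustration.

```python
import cmath
from collections import OrderedDict


class PredictionBuffer:
    """Keep model visibilities for only the most recent solution intervals."""

    def __init__(self, predict, max_intervals):
        self._predict = predict
        self._max_intervals = max_intervals
        self._cache = OrderedDict()
        self.predict_calls = 0

    def get(self, interval):
        if interval in self._cache:
            self._cache.move_to_end(interval)
            return self._cache[interval]
        self.predict_calls += 1  # count the expensive predictions
        model = self._predict(interval)
        self._cache[interval] = model
        if len(self._cache) > self._max_intervals:
            self._cache.popitem(last=False)  # drop the oldest interval
        return model

    def __len__(self):
        return len(self._cache)


def toy_predict(interval):
    # Stand-in for the expensive prediction: four unit-amplitude visibilities.
    return [cmath.rect(1.0, 0.1 * k * (interval + 1)) for k in range(4)]


def solve_phase(data, model):
    # Phase-only solve for a single scalar gain.
    return cmath.phase(sum(d * m.conjugate() for d, m in zip(data, model)))


def solve_amplitude(data, model, phase):
    # Amplitude-only solve with the phase held fixed.
    rotation = cmath.rect(1.0, phase)
    numerator = sum((d * (rotation * m).conjugate()).real
                    for d, m in zip(data, model))
    return numerator / sum(abs(m) ** 2 for m in model)


TRUE_GAIN = cmath.rect(1.5, 0.3)  # simulated gain error
N_INTERVALS = 6

buffer = PredictionBuffer(toy_predict, max_intervals=2)
solutions = []
for interval in range(N_INTERVALS):
    data = [TRUE_GAIN * m for m in toy_predict(interval)]  # simulated data
    phase = solve_phase(data, buffer.get(interval))
    amplitude = solve_amplitude(data, buffer.get(interval), phase)
    solutions.append((amplitude, phase))

print(f"predictions: {buffer.predict_calls} for {2 * N_INTERVALS} solver steps")
```

In this toy, each interval is predicted once even though two solver steps use it, and the buffer never holds more than two intervals, which mirrors the requirement that the full data set need not be buffered.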
    • 19.3
    • Outcomes Reviewed, Demonstrated
    • PI24 - UNCOVERED

    Description


      At the end of PI16, a rudimentary SKA-LOW pipeline consisting of a direction-independent calibration step, correction for direction-independent effects, imaging and deconvolution (single major cycle) was demonstrated on a small LOFAR data set (outcome of SP-2697). This pipeline needs to evolve / be consolidated further towards a self-calibration pipeline for SKA-LOW at AA2+ scale, including the required documentation and sample data. Possible next steps are:

      • Direction-dependent calibration – this is required for the scaling demonstration
      • For self-calibration, an initial source model for the observed field is required. LOFAR solves this problem by transferring solutions from an observation of a nearby calibrator source, so that initial direction-independent corrections can be made to bootstrap the self-calibration cycle. Although calibration transfer is not mentioned explicitly in the use cases for SKA-LOW, it is mentioned in the use case for SKA-MID, making this a potentially interesting feature.
      • Extend / augment the rudimentary pipeline with a self-calibration loop and demonstrate this on a LOFAR observation (~AA2 scale).
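The self-calibration loop mentioned in the steps above can be sketched as a generic driver that alternates calibration against the current model with imaging of the corrected data. This is a minimal toy, assuming a single point source at the phase centre and one scalar phase error; `selfcal_loop` and the `toy_*` functions are hypothetical names, and the initial flux of 1.0 stands in for an initial source model such as one obtained via calibration transfer.

```python
import cmath


def selfcal_loop(visibilities, initial_model, calibrate, apply_gain, image,
                 max_cycles=5, tol=1e-6):
    """Alternate calibration and imaging until the model stops changing."""
    model = initial_model
    history = [model]
    for _ in range(max_cycles):
        gain = calibrate(visibilities, model)
        corrected = apply_gain(visibilities, gain)
        new_model = image(corrected)
        history.append(new_model)
        converged = abs(new_model - model) < tol
        model = new_model
        if converged:
            break
    return model, history


def toy_calibrate(visibilities, model_flux):
    # Phase-only solve of one scalar gain against a real, positive model flux.
    return cmath.rect(1.0, cmath.phase(sum(v * model_flux for v in visibilities)))


def toy_apply(visibilities, gain):
    return [v / gain for v in visibilities]


def toy_image(visibilities):
    # For a point source at the phase centre the flux is the mean real part.
    return sum(v.real for v in visibilities) / len(visibilities)


# Simulated data: a 2.0 flux-unit source seen through a 0.4 rad phase error.
observed = [cmath.rect(2.0, 0.4)] * 8
final_flux, history = selfcal_loop(observed, 1.0, toy_calibrate,
                                   toy_apply, toy_image)
print(final_flux, history)
```

Passing the calibration, correction and imaging steps in as callables reflects the coarse-grained, modular composition the description asks for; real steps would replace the toy functions.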

      Who?

      • SKA Software Stakeholders

      What?

      • The goal is to reach the scaling demonstration. The starting point is a first step: constructing a reference pipeline entirely from existing software.
      • The strategy for reaching the scaling is to evolve the existing LOFAR software toward better scaling according to the SDP software architecture.
        • The goal is not blind pursuit of the strategy. If following the strategy does not seem to be achieving the goal, we change the strategy.
      • Tested on LOFAR data or other low-frequency data with qualities similar to Low
      • Defines a sequence of processing steps that forms a minimal workflow producing a limited-quality but meaningful end product.
      • Can be iterated into a full deconvolution and DD calibration pipeline
      • Can use coarse-grained processing components adapted from existing software

      Why?

      • The key challenge of SKA data processing that we have identified in advance is the scaling challenge. Even at AA2, observations are going to be of order 100 TB, which is difficult to process in good time with existing software. Demonstrating scaling to AA2 scale in test by Q1 2023 is essential in order to have confidence of scaling to the full SKA scale, in production, in time for the completion of the arrays.
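To give a feel for what "of order 100 TB" means in processing time, the following back-of-the-envelope sketch computes how long a single pass over the data takes at a given sustained throughput. The throughput values are hypothetical, chosen only for illustration, and the volume is taken as 100 decimal terabytes.

```python
DATA_VOLUME_BYTES = 100e12  # order of magnitude quoted for an AA2 observation


def hours_per_pass(volume_bytes, throughput_bytes_per_s):
    """Time in hours to stream the data once at a sustained throughput."""
    return volume_bytes / throughput_bytes_per_s / 3600.0


# Hypothetical aggregate throughputs in GB/s.
for rate_gb_per_s in (1, 10, 100):
    hours = hours_per_pass(DATA_VOLUME_BYTES, rate_gb_per_s * 1e9)
    print(f"{rate_gb_per_s:>3} GB/s: {hours:.2f} h per pass")
```

At 1 GB/s a single pass already takes about 28 hours, and an iterative self-calibration workflow reads the data several times, which is why throughput has to scale with the data volume.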

              People

                b.nikolic Nikolic, Bojan
                s.wijnholds Wijnholds, Stefan

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 13.0

                  Status         Issues   Story Points
                  To Do          0        0.0
                  In Progress    0        0.0
                  Complete       12       32.0
                  Total          12       32.0
