Details

    • Feature
    • Must have
    • PI20
    • COM SDP SW
    • None
    • Data Processing
    • See description
      • The threading performance issue in DP3 identified during benchmarking in PI19 has been resolved.
      • The I/O performance of WSClean has been improved, thereby reducing the I/O performance bottleneck identified in PI19.
      • New benchmarking results for the SKA-Low workflow have been provided after addressing the performance issues identified in PI19. The results are reported in SI (https://confluence.skatelescope.org/display/SWSI/Scaling+progress+report), including evidence to demonstrate the performance improvements.
      • In addition, a more detailed report on the benchmarking results is provided, assessing the extent to which the current workflow is representative of what is needed for the AA2(+) scaling challenge defined at https://confluence.skatelescope.org/pages/viewpage.action?pageId=200031968. This enables a reasonable estimate of where we stand with respect to achieving that goal, identifies the "next bottlenecks" in the workflow, and thereby informs work for the next PI, starting the next Plan-Do-Check-Adjust cycle.
      • A pipeline release is provided, containing details of the pipeline, instructions for accessing a test dataset, and instructions for pipeline setup and usage (with parameter settings), enabling those outside the development team to run the pipeline.
      • Regular discussion with FO to communicate progress.
    • 18
    • 18
    • 0
    • Team_PANDO, Team_SCHAAP
    • Sprint 5
      • The threading issue in DP3 was resolved. This was demonstrated during System Demo 20.4.
      • Team Pando investigated the raw I/O performance of WSClean and concluded that it is not an issue. Team Pando therefore picked up the work on more efficient reading of the visibilities in facet imaging, aiming to read the data once for all facets instead of once per facet (a small sketch contrasting the two I/O patterns follows this list). They have developed a prototype that is currently undergoing architectural review. Continuation of this work is a good match for SP-3821, so we hope to continue this effort in PI21.
      • New benchmarking results for the SKA-Low pipeline are available. These are reported in a new, more informative template, see https://confluence.skatelescope.org/display/SE/November+2023+SKA-LOW+SelfCal+Benchmarks. These results have been discussed during the Processing CoP on Monday, November 20 and presented during the PI Demo on Thursday, November 23.
      • Due to several cases of illness in Team Schaap, we did not manage to make all pipeline improvements that we planned for (AST-1397, AST-1398, AST-1399 and AST-1401) and did not manage to make a release of the SKA-Low pipeline (AST-1430). This will roll over to the next PI and fits well under SP-3859.
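
      To illustrate the I/O pattern change Team Pando is prototyping, a minimal Python sketch is given below. It is only a structural illustration under assumed placeholder names (read_visibilities, grid_facet, the data sizes); WSClean itself is C++ and its internals differ.

      import numpy as np

      N_VIS, N_FACETS = 1_000_000, 25

      def read_visibilities():
          # Stand-in for an expensive read of the MeasurementSet from disk.
          rng = np.random.default_rng(0)
          return rng.standard_normal(N_VIS) + 1j * rng.standard_normal(N_VIS)

      def grid_facet(facet_id, vis):
          # Stand-in for gridding a subset of the visibilities onto one facet.
          return np.abs(vis[facet_id::N_FACETS]).sum()

      # Pattern identified as the bottleneck: one full read of the data per facet.
      images_per_facet_read = [grid_facet(f, read_visibilities()) for f in range(N_FACETS)]

      # Prototyped pattern: a single read, shared in memory by all facets.
      vis = read_visibilities()
      images_single_read = [grid_facet(f, vis) for f in range(N_FACETS)]
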
    • PI22 - UNCOVERED

    • SDP-G1

    Description

      We need to be able to self-calibrate AA2-scale data (define what this means) efficiently, and given the large datasets, this will require highly parallelised and distributed processing. In order to understand how to scale effectively (i.e. distribute the processing), we need to provide a pipeline that includes the minimal functionality required to reflect the computationally and I/O intensive parts of this processing. For the Mid calibration pipeline this includes DI calibration, DD calibration, sky model prediction, imaging (gridding/degridding) and deconvolution (optionally distributed imaging?).
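
      As a rough structural illustration of how these stages fit together in a self-calibration loop, a minimal Python sketch is given below. The function names, cycle count and file names are placeholders, not the actual pipeline API; each placeholder would wrap the real DP3/WSClean step.

      from dataclasses import dataclass

      @dataclass
      class State:
          ms_path: str       # input MeasurementSet
          sky_model: str     # current sky model
          cycle: int = 0

      def predict(state):
          print(f"[cycle {state.cycle}] predict model visibilities from {state.sky_model}")

      def di_calibrate(state):
          print(f"[cycle {state.cycle}] direction-independent calibration of {state.ms_path}")

      def dd_calibrate(state):
          print(f"[cycle {state.cycle}] direction-dependent calibration")

      def image_and_deconvolve(state):
          print(f"[cycle {state.cycle}] gridding/degridding and deconvolution")
          return f"sky_model_cycle{state.cycle}.txt"   # refined model for the next cycle

      def selfcal(ms_path, initial_model, n_cycles=3):
          state = State(ms_path, initial_model)
          for cycle in range(n_cycles):
              state.cycle = cycle
              predict(state)
              di_calibrate(state)
              dd_calibrate(state)
              state.sky_model = image_and_deconvolve(state)
          return state.sky_model

      if __name__ == "__main__":
          selfcal("example.ms", "initial_sky_model.txt")
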

      Current progress has provided this functionality for Low, with a recently established ability to use multiple nodes (SP-3446) by distributing over time (in calibration) and frequency (in imaging). Given the necessary lead time to optimise and incorporate the full functionality eventually needed, it is important that we test this pipeline distribution as soon as possible, to ensure it will scale as required and to make improvements as needed.
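
      The chunking idea behind this distribution can be sketched as below. The use of dask.distributed is an illustrative assumption rather than necessarily how SP-3446 is implemented, and the chunk sizes and function bodies are placeholders.

      from dask.distributed import Client

      def calibrate_time_chunk(t0, t1):
          # Placeholder for calibrating visibilities in the time range [t0, t1) seconds.
          return {"time_range": (t0, t1), "solutions": "gains"}

      def image_frequency_chunk(f0, f1):
          # Placeholder for imaging the frequency range [f0, f1) MHz.
          return {"freq_range": (f0, f1), "image": "facet"}

      if __name__ == "__main__":
          # In production the scheduler would span multiple nodes; a local cluster
          # is used here so the sketch runs on a single machine.
          client = Client(n_workers=4, processes=True)

          time_edges = list(range(0, 3600, 600))   # six 10-minute calibration chunks
          freq_edges = list(range(50, 350, 50))    # six 50 MHz imaging chunks

          cal_futures = [client.submit(calibrate_time_chunk, t, t + 600) for t in time_edges]
          img_futures = [client.submit(image_frequency_chunk, f, f + 50) for f in freq_edges]

          print(client.gather(cal_futures))
          print(client.gather(img_futures))
          client.close()
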

      In order to achieve this aim, in PI20 we need to test this pipeline over multiple (3-5) nodes with the LOFAR test data as well as a Low-representative (scaled down for development) simulated dataset. It is also key that we can establish and understand the performance of this pipeline (both per node and over multiple nodes), so that we are able to confirm that it scales as expected. As part of this work, we need to make improvements/optimisations to the overall performance based on the areas of the processing shown to be limiting performance. One primary area highlighted so far is the imaging stage, for which improved intra- and inter-node distribution and performance should be investigated. Specifically, for the current implementation with WSClean this could build on potential optimisations of the I/O usage within WSClean by in-memory storage and re-use of information between sub-tasks.
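
      One simple way to establish where the time is going per run is to wrap each stage in a timer, as in the hedged sketch below. The stage names and sleep calls are placeholders for the real DP3/WSClean steps; this is not the agreed benchmarking procedure.

      import time
      from contextlib import contextmanager

      timings = {}

      @contextmanager
      def timed(stage):
          start = time.perf_counter()
          try:
              yield
          finally:
              timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

      def run_pipeline_once():
          # Each block would wrap the real call for that stage of the pipeline.
          with timed("di_calibration"):
              time.sleep(0.1)
          with timed("dd_calibration"):
              time.sleep(0.2)
          with timed("imaging"):
              time.sleep(0.4)

      if __name__ == "__main__":
          run_pipeline_once()
          total = sum(timings.values())
          for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
              print(f"{stage:>16}: {seconds:6.2f} s ({100 * seconds / total:4.1f}% of run)")
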

      The scope of this will need to be finalised during planning, following discussions with the contributing teams. It should ensure that a specific parameter set and dataset are agreed for any testing and performance comparisons, and that contributions from different teams are captured in updated acceptance criteria.

      As we are now at a stage of development where there are strong internal (multiple-team contribution) and external (e.g. SRC) requests for pipeline releases, this work should also include a minimal effort to establish a pipeline release artefact to be placed in the CAR.

      Continuation of work in PI19 to demonstrate and assess the performance of the distributed Low self-calibration pipeline. Should consider:

      • Testing with representative simulated Low data as well as the LOFAR test set currently being used
      • Comparing overall performance from a single node to several nodes to assess scaling effectiveness (see the scaling-efficiency sketch below)
      • Assessing the distribution design
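
      A small sketch of how scaling effectiveness could be quantified from single-node and multi-node wall-clock times is given below; the runtimes are placeholders, not measured results.

      # Hypothetical wall-clock times in seconds, keyed by node count.
      runtimes = {1: 5400.0, 2: 2950.0, 4: 1700.0}

      baseline = runtimes[1]
      for nodes, t in sorted(runtimes.items()):
          speedup = baseline / t
          efficiency = speedup / nodes
          print(f"{nodes} node(s): {t:7.1f} s  speed-up {speedup:4.2f}x  efficiency {efficiency:5.1%}")
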

    People

      Fenech, Danielle

    Feature Progress

      Story Point Burn-up: 100.00%

      Feature Estimate: 18.0

                     Issues   Story Points
      To Do               0            0.0
      In Progress         0            0.0
      Complete           28           92.0
      Total              28           92.0
