Details
- Enabler
- High
- True
- Data Processing
- 8 / 8 / 0
- AA2
Description
Profiles of our runs show that we have largely succeeded in distributing the expensive operations of ICAL. However, a lot of time is still spent in phases where little parallel work is happening at all - in some cases we spend hours using just a few (or even a single) core on the master node. The net result is that we are likely using less than 5% of the compute available to us.
What?
- Identify all major phases where CPU utilisation drops below ~50% for the 3-node run. This might require improved instrumentation / logging. Ideally we would use more representative dataset sizes where possible, and investigate why we sometimes see different results despite using the same parameters.
- Determine (informally) why these phases currently take as long as they do, and why they don't use more nodes (or threads).
- Resolve the most significant bottleneck by reworking and possibly re-distributing processing functions - by doing (at least) one of the following:
  - Take a serious look at sky model filtering, and whether it can be sped up or distributed effectively (or whether an existing solution can be integrated)
  - Attempt to distribute the deconvolution processing functions (thinking about how to make this work with RADLER would be very valuable long-term)
  - Reduce the memory usage of calibration to prevent swapping (average visibilities / normal equations?). Ideally have a mechanism to balance gridding efficiency (which favours large time and frequency intervals) against memory usage (which favours short time and frequency intervals).
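As a starting point for the first item, low-utilisation phases could be flagged automatically from sampled cluster CPU utilisation. A minimal sketch, assuming we can export `(timestamp, utilisation)` samples from the profiler or node monitoring; the sample format, threshold, and minimum duration are illustrative assumptions:

```python
def low_utilisation_phases(samples, threshold=0.5, min_duration=60.0):
    """Return (start, end) pairs where utilisation stays below `threshold`
    for at least `min_duration` seconds.

    `samples` is a time-sorted list of (t, util) tuples, util in [0, 1].
    """
    phases = []
    start = None
    end = None
    for t, util in samples:
        if util < threshold:
            if start is None:
                start = t  # a low-utilisation phase begins
            end = t
        elif start is not None:
            # phase ended; keep it only if it lasted long enough
            if end - start >= min_duration:
                phases.append((start, end))
            start = None
    # handle a phase still open at the end of the run
    if start is not None and end - start >= min_duration:
        phases.append((start, end))
    return phases
```

This would give a shortlist of phases to cross-reference against the pipeline's own logging, which is where the improved instrumentation comes in.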
See frame on DP ART board: https://miro.com/app/board/uXjVK6Lrdw4=/?moveToWidget=3458764597687428890&cot=14
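For the gridding-efficiency / memory balance in the last bullet, a rough sizing helper could guide the choice of solution interval. This is a sketch under assumed data layout (complex64 visibilities, 4 polarisations); the sizes and budget are illustrative, not measured values:

```python
def visibility_memory_bytes(n_baselines, n_channels, n_times,
                            bytes_per_vis=8 * 4):
    """Rough footprint of one visibility buffer: complex64 (8 bytes)
    times 4 polarisations per visibility (assumed layout)."""
    return n_baselines * n_channels * n_times * bytes_per_vis

def largest_interval_fitting(n_baselines, n_channels, mem_budget_bytes):
    """Largest number of time samples per solution interval that keeps
    the buffer under the budget. Longer intervals favour gridding
    efficiency; the budget caps memory to avoid swapping."""
    per_time = visibility_memory_bytes(n_baselines, n_channels, 1)
    return max(1, mem_budget_bytes // per_time)
```

A balancing mechanism could then pick the longest interval that fits each node's memory budget, rather than a fixed interval for all dataset sizes.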