Details
-
Feature
-
Could have
-
None
-
None
-
Data Processing
-
-
-
Inter Program, Intra Program
-
3
-
3
-
0
-
-
-
-
DP-SDP: G3
Description
Who?
- AA0.5 Operator
- AIV engineer
- Commissioning Scientist
What? (outcomes)
- An updated, packaged, and released visibility receive workflow that can be configured to receive data from thousands of streams in a timely manner and is ready for testing at PSI MID and/or PSI LOW.
Why?
In AA0.5, with so few baselines, the SPEAD2 heaps are likely to be very small. Although there has been some agreement to aggregate channels into the heaps to mitigate this, it is not at all clear how much aggregation LOW and MID will be able to support. With many thousands of channels and limited aggregation, this will result in many hundreds, if not thousands, of SPEAD streams. Testing has revealed that, because the aggregate bandwidth is low, a receiver can be configured to capture this data rate even with a single I/O thread. However, there are other limitations:
- The current scheme of many thousands of small writes per second to a single measurement set is an anti-pattern: it is the worst way to access a table-based storage format like the measurement set. This can be alleviated either by writing multiple measurement sets (from different threads or processes) or by some form of data aggregation, either buffering before writing or aggregating the data before it goes into the plasma store.
- While heaps are received from the network quickly, further testing reveals that unpacking the heap data into their respective ItemGroups can become a CPU bottleneck. For the ICD heap definition our tests show that the receiver can unpack ~3000 heaps/s. At a dump time of 0.9s, and assuming each stream delivers one heap per dump, this means a single receiver can handle at most a number of streams of the same order (roughly 2700). For more streams one would need to investigate different strategies, such as starting separate receivers to handle different streams. Other potential solutions would be to investigate whether we can shard the load within a single receiver process, and/or to double-check that we are unpacking the heaps as efficiently as possible.
- Likewise, the sender is limited by the generation of the heaps to be sent to the network, in this case to ~3500 heaps/s. Again, the simplest solution when more streams need to be sent would be to start more senders, but we can study other options as well.
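The throughput figures above can be turned into a rough sizing estimate. The following is a minimal sketch, assuming each stream delivers exactly one heap per dump; the function names are hypothetical and not part of any existing SDP or spead2 API.

```python
import math


def max_streams_per_process(heaps_per_second: float, dump_time_s: float) -> int:
    """Upper bound on streams one process can sustain.

    Assumption: each stream produces one heap per dump, so a process that
    handles `heaps_per_second` heaps can serve that many streams per second
    of dump time.
    """
    return math.floor(heaps_per_second * dump_time_s)


def processes_needed(n_streams: int, heaps_per_second: float, dump_time_s: float) -> int:
    """Number of receiver (or sender) processes needed for `n_streams`."""
    per_process = max_streams_per_process(heaps_per_second, dump_time_s)
    if per_process < 1:
        raise ValueError("a single process cannot sustain even one stream")
    return math.ceil(n_streams / per_process)


if __name__ == "__main__":
    # Measured rates quoted in this ticket: ~3000 heaps/s (receiver),
    # ~3500 heaps/s (sender), with a 0.9 s dump time.
    print(max_streams_per_process(3000, 0.9))
    print(max_streams_per_process(3500, 0.9))
    # Hypothetical workload of 10000 streams, for illustration only.
    print(processes_needed(10000, 3000, 0.9))
```

These are upper bounds: they ignore the cost of writing to the measurement set and any contention between processes on the same host.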
This feature supports developing the flexibility in visibility receive needed to accommodate some or all of these elements. In particular, this entails making sure that the use case of starting many receiver processes (and senders) during a scan is supported in SDP integration, as this is the option most likely to be useful in the longer term.
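As an illustration of the many-receivers option, the sketch below splits a set of streams into balanced, contiguous groups, one per receiver process. It is only a sketch of the partitioning step: the stream identifiers (plain integers here, standing in for something like UDP ports) and the function name are assumptions, and how each group is handed to an actual receiver process is left open.

```python
from typing import List, Sequence


def partition_streams(stream_ids: Sequence[int], n_receivers: int) -> List[List[int]]:
    """Split streams into `n_receivers` contiguous groups of near-equal size.

    Group sizes differ by at most one, so no receiver is loaded much more
    heavily than the others (assuming all streams carry a similar load).
    """
    if n_receivers < 1:
        raise ValueError("n_receivers must be at least 1")
    base, extra = divmod(len(stream_ids), n_receivers)
    groups: List[List[int]] = []
    start = 0
    for i in range(n_receivers):
        # The first `extra` groups take one additional stream each.
        size = base + (1 if i < extra else 0)
        groups.append(list(stream_ids[start:start + size]))
        start += size
    return groups


if __name__ == "__main__":
    # Hypothetical example: 10 streams shared between 3 receivers.
    for index, group in enumerate(partition_streams(range(10), 3)):
        print(f"receiver {index}: streams {group}")
```

Contiguous groups are used here so that each receiver could be configured with a simple range of streams; a round-robin assignment would balance the load equally well.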
References