The aim of this exercise is to carry out calculations in support of SP-1428: Review Compute Requirements (rollout plan) for AA0.5-3, and to double-check the accuracy of long-held assumptions regarding the computing requirements for early array releases. This is to ensure that: a) we have adequate rack space and power; b) we can estimate the costs and confirm that running NEC4 contracts for the early array releases for a subset of products is not necessary; and c) the networking roll-out plans provide sufficient point-to-point bandwidth between facilities and to the outside world, and sufficient Ethernet switching capability within facilities.

In fact the aim is now somewhat different due to the introduction of AA3*, and the scope of this exercise will therefore be reduced to cover AA0.5-2. In terms of the set of products covered, we are looking specifically at SDP, PSS and PST. A future piece of work is needed to analyse the sizing and BoM for TM (including the EDA) and MCCS.

AA3 roll-out requirements will be looked at separately and will be quite an involved exercise, especially in the case of SDP. A proper analysis requires an understanding of the impact on the HPSOs and on the scheduling policies of the observatory should AA3 become the end point for SKA1. The aim will be to calculate the impact of the proposed cuts to the hardware budgets.

Let us start by calculating the number of visibilities produced per second and the resulting data rate that the SDP must ingest.

The unaveraged number of visibilities (each with a complex 32-bit floating-point representation) produced per second is given by:

N_vis/s = N_ant (N_ant + 1) / 2 × N_chan × N_pol / t_int

where N_ant is the number of antennas (the baseline count includes autocorrelations), N_chan the number of frequency channels, N_pol the number of polarisation products, and t_int the correlator integration (dump) time.

This formula is effectively a worst-case scenario because no averaging is applied. However, according to the roll-out plans the early array releases will not have an averaging capability, so the formula is adequate for our purposes.

How much does SDP have to ingest from the correlator? The input from CSP is 10 bytes per visibility (source: SDP Parametric Model memo SKA-TEL-SDP-0000013). The numbers for the two telescopes and the various array releases are as follows.
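As a quick sanity check, the visibility rate and the corresponding CSP-SDP ingest rate can be computed in a few lines. This is a minimal sketch: the 10 bytes per visibility comes from the Parametric Model memo cited above, but the antenna count, channel count and dump time used in the example are purely illustrative assumptions, not the roll-out values.

```python
def visibility_rate(n_ant, n_chan, n_pol=4, t_int=0.9):
    """Unaveraged visibilities per second: all baselines (including
    autocorrelations) x channels x polarisation products per dump."""
    n_baselines = n_ant * (n_ant + 1) // 2
    return n_baselines * n_chan * n_pol / t_int

def ingest_rate_bytes(n_ant, n_chan, n_pol=4, t_int=0.9, bytes_per_vis=10):
    """CSP->SDP ingest in bytes/s, at 10 bytes per visibility
    (SDP Parametric Model memo SKA-TEL-SDP-0000013)."""
    return visibility_rate(n_ant, n_chan, n_pol, t_int) * bytes_per_vis

# Illustrative parameters only: 4 dishes, 13824 channels, 0.9 s dump.
rate = ingest_rate_bytes(n_ant=4, n_chan=13824)
print(f"Ingest rate: {rate / 1e6:.1f} MB/s")
```

At this scale the ingest rate is of order a few MB/s, which is consistent with the conclusion below that commodity servers can cope comfortably.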

Now let's look at the processing and storage requirements for some rudimentary pipelines for the early array releases.

In the case of AA0.5 there will be a limited set of imaging capabilities and no non-imaging (pulsar) capability. According to the current MID and LOW roll-out plans, basic continuum and spectral-line imaging will be available, carried out offline using the CASA package. We expect that the main technical requirement for SDP hardware at this point is the ability to ingest the visibilities and write the measurement sets to disk. As the calculations above show, there is no scenario in which typical current servers with LOM Ethernet cards cannot cope with the data ingest. If the server is then configured with a few tens of TB of hard drives and a typical amount of RAM (64/128 GB), the set-up should be more than sufficient. (If RDMA is adopted for the CSP-SDP link, the servers will need to be supplied with compatible NICs at an additional cost of around 500-1000 Euros per node.) We suggest that the equipment used for the PSI work be kept on and redeployed from the ITF to the KAPB in the case of MID and the CPF in the case of LOW.
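The claim that a LOM Ethernet card and a few tens of TB of disk suffice can be checked with simple arithmetic. In this sketch the 10 GbE NIC rate, the ~6 MB/s ingest figure and the 40 TB disk pool are illustrative assumptions chosen to match the scales discussed above, not specified values.

```python
def nic_utilisation(ingest_bytes_per_s, nic_gbit=10):
    """Fraction of a NIC's line rate consumed by the ingest stream."""
    nic_bytes_per_s = nic_gbit * 1e9 / 8
    return ingest_bytes_per_s / nic_bytes_per_s

def hours_to_fill(disk_tb, ingest_bytes_per_s):
    """Hours of continuous ingest before disk_tb terabytes are full."""
    return disk_tb * 1e12 / ingest_bytes_per_s / 3600

# Illustrative: ~6 MB/s ingest, one 10 GbE LOM port, 40 TB of disk.
print(f"NIC utilisation: {nic_utilisation(6e6):.2%}")
print(f"Disk fill time:  {hours_to_fill(40, 6e6):.0f} hours")
```

Under these assumptions the ingest uses well under 1% of a single 10 GbE port and the disks take weeks of continuous observing to fill, supporting the "more than sufficient" assessment.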

AA1 sees the addition of further dishes and antennas, and the data rates out of the correlator increase due to the higher number of baselines compared to AA0.5. It is planned that SDP shall be able to run basic real-time calibration pipelines and support calibration for pulsar timing beams (although the PST work might not require this capability). Again, it is not anticipated that this will increase the workload enough to require an expanded set of SDP hardware.

The data rates clearly increase at AA2 for both telescopes due to the significant growth in the number of dishes and antennas over AA1. Even so, the tasks of ingest and subsequent offline processing using CASA should be easily accomplished with a small COTS cluster using current technology. As there is no need to buffer visibilities or to keep up in near real-time with a telescope observing most of the time, the processing can proceed at a more leisurely pace than will be the case in full operations.

We conclude that the notion of a 'milli SDP' put forward by the SDP consortium, i.e. one one-thousandth of the full system at AA4, is about right for AA0.5, AA1 and AA2. A simple cluster of two servers, along with networking and a few hundred TB of external storage for capturing both scientific and engineering data for the duration of AA0.5-2, should therefore be sufficient. Profiling the upcoming PSI tests for both telescopes to confirm resource usage by simple pipelines will help define a tighter, cost-effective specification and BoM ahead of the proposed cash procurement to be done outside of the NEC4 framework. A SAFe feature for this work will need to be drafted. It should be noted that the servers will initially need to be installed in the KAPB/CPF for AA0.5 and AA1 and then moved to the SPCs for AA3/AA3*; this should not be a difficult operation.

In terms of costs, this is clearly a small figure, similar to the expenditure on PSI equipment. Indeed, we might ask whether the PSI equipment could be recycled to avoid a procurement ahead of AA0.5. If this is not possible, it should be a straightforward matter to run a small cash procurement for this purpose. Running a full NEC4 ECC procurement for such a small system is clearly a waste of time and money; the time expended on such an effort could cost more than the equipment itself.

Let us now turn our attention to the sizing of PSS and PST systems.

We know that for both MID and LOW there is no requirement to do searching or timing for either AA0.5 or AA1, so our concern is just AA2. In the case of MID, due to ECP-200048 there is no requirement to host PSS or PST equipment in the KAPB, and the installation will therefore be carried out in Cape Town alongside that of the SDP. The decision for the LOW equivalent (ECP-200049) is pending; at the time of writing the baseline is still to host the PSS and PST systems in the CPF.

The sizing problem for PSS is more straightforward: take the number of beams for AA2 and divide by 3 (the number of beams processed by each server), rounding up, to give the number of servers required.

For MID: AA2 - 16 beams to be processed, therefore 6 servers are needed.

For LOW: AA2 - according to the latest roll-out plans the CBF will not produce beamformed data for pulsar search, but there will be beamformed data from LFAA. It is estimated that 2 servers are required for this.

For PST the number of beams to be processed and the number of servers required have a one-to-one relationship. The figures are as follows.

For MID:

AA1 - 1 beam, 1 server

AA2 - 6 beams, 6 servers

For LOW:

AA2 - 4 beams, 4 servers
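The sizing rules above reduce to one-line calculations. A minimal sketch, using the beam counts quoted in this section:

```python
import math

def pss_servers(n_beams, beams_per_server=3):
    """PSS: each server processes up to 3 beams, so round up."""
    return math.ceil(n_beams / beams_per_server)

def pst_servers(n_beams):
    """PST: one server per beam."""
    return n_beams

print(pss_servers(16))  # MID AA2: 16 search beams -> 6 servers
print(pst_servers(6))   # MID AA2: 6 timing beams -> 6 servers
print(pst_servers(4))   # LOW AA2: 4 timing beams -> 4 servers
```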

Conclusions

For the early array releases, the proposed approach of a lightweight procurement model outside of the NEC4 framework is confirmed to be correct, given the very small sizes of the systems in question. We have confirmed that the data ingest, processing and data storage requirements can be adequately met by the proposed approach.