Final status of this feature at the end of PI23:
Very significant progress has been achieved, but not all of the acceptance criteria have been met, mostly due to integration issues and uncertainties around the deployment infrastructure. In summary, the individual teams have completed their parts of the work as far as possible, but the integration onto the actual LOW platform, as well as the integration between the various subsystems (DLM, DPD, SDP rcv, MCCS), has not been done. This points to a deficiency in system integration activities, or at least in the support for them. Since features cannot cross PI boundaries, and since the follow-up features are already scheduled for PI24, we are releasing this feature now.
There are multiple sets of acceptance criteria for this feature: one overall set, and one for each of the teams involved (YANDA, NALEDI, BANG and SST). The following sections address the status of each of them:
Overall Feature Acceptance Criteria
Acceptance criterion: For SDP final and intermediate data products, generated by the SDP processing pipelines, descriptions of data products are available on Confluence (one Confluence page per product, following the standard template under https://confluence.skatelescope.org/display/SWSI/Data+Products).
Status: The definition of data products is the responsibility of the subsystem generating those data products, not of the DLM or any of the supporting technical teams. Currently the page lists only a single product.

Acceptance criterion: Details about how to deploy the ska-data-lifecycle solution are sorted with respect to deployment locations, database access and configuration. Deployment of ska-data-lifecycle and all auxiliary services is described on Confluence in the Solution Intent space.
Status: Not finalized, but we made a very good start for SKA-LOW by identifying the actual target locations. There are far too many unknowns and planned changes to the SKA-LOW platform to finalize this: the cluster will be expanded and amended with AAVS machines, and even the simple question of where the DLM and the Dashboard would run on the cluster could not be resolved. This is the main goal for the new features in PI24.

Acceptance criterion: Introduce a service that can be used to track intermediate and final data products by ID (as per ADR-54).
Status: We have not tackled this at all, since there are far more fundamental issues to be solved before we can even think about that level of tracking detail.

Acceptance criterion: Integrate into SDP:
- the data management services are available in the DP Integration platform
- have the Data Product Dashboard show information from it
- consider authentication (this would need to go via execution block ID metadata)
Status: The DLM is deployed on the DP cluster in the YANDA namespace. The integration with the Data Product Dashboard failed due to integration issues, but we have now marked that as technical debt and will resolve it as soon as possible.

Acceptance criterion: Enable data migration from the MCCS server.
Status: This is fully related to the
YANDA Acceptance Criteria
Acceptance criterion: Review with SDP the necessary APIs to initialise and register data products.
Status: Done, but no final agreement reached: https://confluence.skatelescope.org/display/SWSI/ADR-101+SDP+buffer+management+and+interface+to+Data+Lifecycle+Management

Acceptance criterion: Configure the DLM for the necessary storage locations: Ceph storage on the data processing cluster and AA0.5, plus the MCCS.
Status: Pending; targets unclear/undefined.

Acceptance criterion: Develop and integrate the workflow to integrate with the DPD (as a starting point, when a data product is "finished").
Status: Done: the DLM is sending metadata to the DPD.

Acceptance criterion: Update the RCV pipeline to integrate with the DLM service.
Status: Done. However, there is no automatic integration test for this system-level integration.

Acceptance criterion: Demonstrate the DLM functionalities in the DP integration environment.
Status: Partially done: the DLM is on the DP cluster, but we have not done an official demo. We now have a ticket to write a guideline on how to use the DLM inside the DP cluster.

Acceptance criterion: Create documentation about how to publish new data products to the DLM.
Status: Done: the documentation is auto-generated. There is likely still room for improvement, since the current documentation is still fairly low level and there are multiple ways to 'publish' data products to the DLM.

Acceptance criterion: Define a list of groups that need to be implemented in the AAA system for integration with the DLM.
Status: Done: we pointed again to the extended group system used by ALMA, which is almost certainly also applicable to SKA. The actual implementation is not a DLM issue, but an observatory-wide activity.
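Since the documentation criterion above notes that there are multiple ways to 'publish' data products to the DLM, a minimal sketch may help illustrate the general shape of such a call. Note that the endpoint path, payload field names and URL below are assumptions for illustration only, not the actual ska-data-lifecycle REST API:

```python
import json

# Hypothetical sketch of registering ("publishing") a new data product with
# the DLM over a REST interface. All names below are illustrative assumptions.

DLM_URL = "http://dlm.example/api"  # placeholder, not a real deployment


def build_register_payload(item_name: str, storage_id: str, uri: str,
                           metadata: dict) -> dict:
    """Assemble the JSON body for a hypothetical DLM register call."""
    return {
        "item_name": item_name,    # logical name of the data product
        "storage_id": storage_id,  # DLM storage location holding the payload
        "uri": uri,                # path of the payload on that storage
        "metadata": metadata,      # e.g. execution block metadata for the DPD
    }


payload = build_register_payload(
    "eb-test-20240101-00001/vis.ms",
    "ceph-dp-cluster",
    "/data/eb-test-20240101-00001/vis.ms",
    {"execution_block": "eb-test-20240101-00001"},
)

# The actual call would then be something like (requires a running DLM):
# requests.post(f"{DLM_URL}/ingest/register_data_item", json=payload)
print(json.dumps(payload, indent=2))
```

The real interface should always be taken from the auto-generated ska-data-lifecycle documentation mentioned above.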
NALEDI Acceptance Criteria
Acceptance criterion: Implement and expose Update Metadata API endpoints as needed.
Status: Done.

Acceptance criterion: Integrate the Elasticsearch persistence layer.
Status: ??
BANG Acceptance Criteria
Acceptance criterion: Support YANDA in configuring the storage locations.
Status: Pending.

Acceptance criterion: Provide a PostgreSQL instance to be used in the DP integration cluster, as requested by the teams.
Status: Done.

Acceptance criterion: Support the DLM integration on the LOW AA0.5 PostgreSQL instance.
Status: Pending.

Acceptance criterion: Provide DLM and DPD with DNS name entries in the different environments.
Status: Partially done.

Acceptance criterion: Provide a dedicated Elasticsearch index for DPD integration.
Status: ??
SST Acceptance Criteria
Acceptance criterion: Initiate the conversation about the need and schedule to expand our storage capacity.
Status: Done, but the actual implementation is still pending, which blocks the registration of the DLM target endpoints.
NALEDI
In this PI, the way that metadata is handled in the DPD has been changed. It now makes use of a persistent PostgreSQL database for the storage of metadata, if available; the in-memory functionality has been updated and maintained to assist users during the changeover to PostgreSQL. The search implementation has also been updated to make use of the Elasticsearch instance made available by the BANG team (again, the in-memory search has been updated and maintained to assist users during the changeover to Elasticsearch).
To enable these changes, restructuring of the API and Dashboard was needed. These changes included updates to the data structure used to serve data to the MUI DataGrid component on the dashboard, a rework of the API project structure to improve the logical separation between the different modes of operation (in-memory, or making use of the databases when available), updates to all supporting methods to align with the requirements of saving data in PostgreSQL and Elasticsearch, improved handling of environment variables, and various other improvements.
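The mode selection described above can be sketched as follows. This is a hedged illustration, not the actual ska-sdp-dataproduct-api code: the class names, the environment variable `SDP_DATAPRODUCT_POSTGRES_HOST` and the fallback logic are assumptions chosen to show the pattern of preferring a database backend when configured and otherwise falling back to the maintained in-memory implementation:

```python
import os

class InMemoryStore:
    """Fallback metadata store used when no database is configured."""
    def __init__(self):
        self._items = {}

    def save(self, key, metadata):
        self._items[key] = metadata

    def search(self, term):
        # Naive substring search, standing in for the Elasticsearch query.
        return [m for m in self._items.values() if term in str(m)]

class PostgresStore(InMemoryStore):
    """Placeholder for the PostgreSQL-backed store; real code would open a
    connection using its configured host/port instead of a local dict."""

def select_store(env=os.environ):
    # The variable name is an illustrative assumption: if a database host is
    # configured, use the persistent backend, otherwise stay in-memory.
    if env.get("SDP_DATAPRODUCT_POSTGRES_HOST"):
        return PostgresStore()
    return InMemoryStore()

store = select_store({})  # no database configured -> in-memory fallback
store.save("eb-001", {"execution_block": "eb-001"})
print(type(store).__name__, store.search("eb-001"))
```

Keeping both backends behind one interface is what allows the in-memory mode to remain available to users during the changeover, as noted above.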
For a full change log of all the changes done as part of this feature, please see the respective change logs of each application:
Updates to the SDP Data Product Dashboard API:
Changelog — ska-sdp-dataproduct-api 0.8.0 documentation (skao.int)
Updates to the SDP Data Product Dashboard:
Changelog — ska-sdp-dataproduct-dashboard 0.8.2 documentation (skao.int)
Releases:
<PENDING improvements of test coverage as part of NAL-1157>
A temporary deployment can be accessed here: https://sdhp.stfc.skao.int/dp-naledi-andre/dashboard/
https://sdhp.stfc.skao.int/dp-naledi-andre/api/status