SAFe Program / SP-4276

Find and access AA0.5 Data Products for further analysis


Details

    • Data Processing
      • This feature is intended to integrate and bring to fruition the data management products that have been prototyped so far: https://gitlab.com/ska-telescope/ska-data-lifecycle , https://gitlab.com/ska-telescope/sdp/ska-sdp-dataproduct-dashboard . The objectives we had in PI21 have not been reached, and the intention is to resolve the blockers that prevented that from happening.
      • The immediate benefit hypothesis is to enable discoverability and access to data for the Low AA0.5 telescope (July 2024). Initially this applies to data products generated by the SDP (ingested visibilities, calibration solutions) as well as to station-level data products generated by the MCCS (station visibilities, unchannelised data, channelised data).
      • This provides a fundamental service enabling the ingestion, tracking, tracing, finding and accessing of data products of any kind within the SKAO. It will initially use a generic data management schema, but allow for product-specific schemas as well.
      • For SDP final and intermediate data products, generated by the SDP processing pipelines, descriptions of data products are available on confluence (one confluence page per product, following the standard template under https://confluence.skatelescope.org/display/SWSI/Data+Products)
      • Details about how to deploy the ska-data-lifecycle solution are sorted out with respect to deployment locations, database access and configuration. Deployment of the ska-data-lifecycle and all auxiliary services is described on confluence in the Solution Intent space.
      • Introduce a service that can be used to track intermediate and final data products by ID (as per ADR-54). The DLM API allows one to (a hypothetical usage sketch follows this list):
        • Add new data products
        • Register new storage locations / remove storage locations for these data products (i.e. also notify of moves/copies)
        • Update data products on their status (e.g. finished)
        • Obtain location information about a specific data product (should allow a number of storage backends, including informal ones such as storage on an HPC system like CSD3, the STFC cloud or Pawsey)
        • Obtain lifecycle information for a specific data product (at minimum its lifetime)
        • (stretch) Update user annotations
      • Integrate into SDP
        • the data management services are available in the DP Integration platform
        • have the data product dashboard show information from it
        • consider authentication (this would need to go via execution block ID metadata)
      • Enable data migration from the MCCS server
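
      A minimal sketch of how a client might exercise the DLM API capabilities listed above, assuming a REST-style interface reachable over HTTP. The base URL, endpoint paths, payload fields and storage identifier are hypothetical illustrations, not the actual ska-data-lifecycle interface.

```python
# Hypothetical DLM client calls; endpoint names and fields are illustrative only.
import requests

DLM_URL = "https://dlm.example.skao.int"  # hypothetical deployment URL

# Register (initialise) a new data product under an execution-block-scoped name.
resp = requests.post(
    f"{DLM_URL}/data_item/init",
    json={
        "item_name": "eb-m001-20240701-00001/vis.ms",
        "uri": "/mnt/data/eb-m001-20240701-00001/vis.ms",
        "storage_id": "dp-ceph",  # hypothetical storage location ID
        "metadata": {"eb_id": "eb-m001-20240701-00001"},
    },
    timeout=10,
)
resp.raise_for_status()
item_id = resp.json()["item_id"]

# Update the product's status to "finished" so downstream consumers can pick it up.
requests.patch(f"{DLM_URL}/data_item/{item_id}", json={"state": "finished"}, timeout=10)

# Query storage location and lifecycle information for the product.
location = requests.get(f"{DLM_URL}/data_item/{item_id}/location", timeout=10).json()
lifecycle = requests.get(f"{DLM_URL}/data_item/{item_id}/lifecycle", timeout=10).json()
print(location, lifecycle)
```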

      Based on https://confluence.skatelescope.org/display/SE/PI23+Planning+-+DP+-+DLM+design+session

      YANDA

      • Review with SDP the necessary APIs to Initialise and Register data products.
      • Configure the DLM for the necessary storage locations: Ceph storage on the data processing cluster and AA0.5, plus the MCCS (see the sketch after this list)
      • Develop and integrate the workflow that connects to the DPD (as a starting point, when a data product is "finished")
      • Update the RCV pipeline to integrate with the DLM service
      • Demonstrate the DLM functionalities in the DP integration environment
      • Create documentation about how to publish new data products to the DLM
      • Define the list of groups that need to be implemented in the AAA system for integration with the DLM
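
      A minimal sketch of the storage-location configuration task above, under the assumption that locations are registered through a REST-style admin call. The endpoint path, field names and location identifiers are hypothetical; the real DLM configuration mechanism may differ.

```python
# Hypothetical registration of the storage locations named in the YANDA task above.
import requests

DLM_URL = "https://dlm.example.skao.int"  # hypothetical deployment URL

storage_locations = [
    {"storage_id": "dp-ceph", "type": "ceph", "root": "cephfs://dp-cluster/data"},  # DP cluster Ceph
    {"storage_id": "low-aa05", "type": "posix", "root": "/mnt/aa05/data"},          # Low AA0.5
    {"storage_id": "mccs-server", "type": "posix", "root": "/srv/mccs/data"},       # MCCS server
]

for location in storage_locations:
    # Endpoint path and payload are illustrative; the real DLM admin API may differ.
    response = requests.post(f"{DLM_URL}/storage/init", json=location, timeout=10)
    response.raise_for_status()
    print(f"registered {location['storage_id']}: {response.json()}")
```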

      NALEDI

      • Implement and expose Update Metadata API endpoints as needed (see the sketch after this list).
      • Integrate the Elastic persistence layer
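
      A minimal sketch of an "Update Metadata" endpoint of the kind named above, written with FastAPI. The route, request model and in-memory stand-in store are hypothetical and are not the actual ska-sdp-dataproduct-api implementation.

```python
# Hypothetical FastAPI endpoint for updating data product metadata.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Stand-in persistence layer; the real service would write to PostgreSQL/Elasticsearch.
METADATA_STORE: dict[str, dict] = {}


class MetadataUpdate(BaseModel):
    execution_block: str
    metadata: dict


@app.patch("/data_products/{product_id}/metadata")
def update_metadata(product_id: str, update: MetadataUpdate) -> dict:
    """Merge new metadata fields into the record for a data product."""
    record = METADATA_STORE.setdefault(product_id, {"execution_block": update.execution_block})
    record.update(update.metadata)
    return {"product_id": product_id, "metadata": record}
```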

      BANG

      • Support YANDA in configuring the storage locations
      • Provide PGSQL instances to be used in the DP integration cluster as requested by teams.
      • Support DLM integration on the LOW AA0.5 PGSQL instance
      • Provide DLM and DPD with DNS name entries in the different environments
      • Provide a dedicated Elastic index for DPD integration (see the sketch after this list)
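
      A minimal sketch of creating a dedicated Elasticsearch index for DPD metadata, as in the last BANG item above, using the official Python client (8.x API assumed). The host, index name and mapping fields are hypothetical.

```python
# Hypothetical creation of a dedicated index for DPD metadata (elasticsearch-py 8.x).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://elastic.example.skao.int:9200")  # hypothetical host

INDEX_NAME = "sdp-dataproduct-metadata"  # hypothetical dedicated index for the DPD

if not es.indices.exists(index=INDEX_NAME):
    es.indices.create(
        index=INDEX_NAME,
        mappings={
            "properties": {
                "execution_block": {"type": "keyword"},
                "date_created": {"type": "date"},
                "metadata_file": {"type": "keyword"},
            }
        },
    )
```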

      SST

      • Initiate the conversation about the need and schedule to expand our storage capacity
    • 5
    • 5
    • 0
    • Team_NALEDI, Team_YANDA
    • Sprint 5

      Final, end of PI23, status of this feature:

      Very significant progress has been achieved, but not all of the acceptance criteria have been met, mostly due to integration issues and uncertainties around the deployment infrastructure. In summary, the individual teams have done their part of the work as far as possible, but the integration onto the actual LOW platform, as well as the integration between the various subsystems (DLM, DPD, SDP receive, MCCS), has not been done; this points to a shortage of system integration activities, or at least of support for them. Since features cannot cross PI boundaries and the follow-up features are already in PI24, we are releasing this one now.

      There are multiple sets of acceptance criteria for this feature: one overall, and one for each of the teams involved (YANDA, NALEDI, BANG and SST). The following tables address the status of each:

      Overall Feature Acceptance Criteria

      • Acceptance criterion: For SDP final and intermediate data products, generated by the SDP processing pipelines, descriptions of data products are available on confluence (one confluence page per product, following the standard template under https://confluence.skatelescope.org/display/SWSI/Data+Products).
        Status: The definition of data products is the responsibility of the subsystem generating those data products, not the DLM or any of the supporting technical teams. Currently the page only lists a single product.
      • Acceptance criterion: Details about how to deploy the ska-data-lifecycle solution are sorted out with respect to deployment locations, database access and configuration. Deployment of the ska-data-lifecycle and all auxiliary services is described on confluence in the Solution Intent space.
        Status: Not finalized, but we made a very good start for SKA-LOW, identifying the actual target locations. There are far too many unknowns and planned changes to the SKA-LOW platform to finalize this. The cluster will be expanded and amended with AAVS machines, and even the simple question of where the DLM and the Dashboard would run on the cluster could not be resolved. This is the main goal for the new features in PI24.
      • Acceptance criterion: Introduce a service that can be used to track intermediate and final data products by ID (as per ADR-54).
        Status: We have not tackled this at all, since there are far more fundamental issues to be solved before we can even think about that level of tracking detail.
      • Acceptance criterion: Integrate into SDP (the data management services are available in the DP Integration platform; have the data product dashboard show information from it; consider authentication, which would need to go via execution block ID metadata).
        Status: The DLM is deployed on the DP cluster in the YANDA namespace. The integration with the Data Product Dashboard failed due to integration issues, but we have now marked that as technical debt and will resolve it as soon as possible.
      • Acceptance criterion: Enable data migration from the MCCS server.
        Status: This is fully related to the

      YANDA Acceptance Criteria

      • Acceptance criterion: Review with SDP the necessary APIs to Initialise and Register data products.
        Status: Done, but no final agreement reached: https://confluence.skatelescope.org/display/SWSI/ADR-101+SDP+buffer+management+and+interface+to+Data+Lifecycle+Management
      • Acceptance criterion: Configure the DLM for the necessary storage locations: Ceph storage on the data processing cluster and AA0.5, plus the MCCS.
        Status: Pending; targets unclear/undefined.
      • Acceptance criterion: Develop and integrate the workflow that connects to the DPD (as a starting point, when a data product is "finished").
        Status: Done; the DLM is sending metadata to the DPD.
      • Acceptance criterion: Update the RCV pipeline to integrate with the DLM service.
        Status: Done. However, there is no automatic integration test for this system-level integration.
      • Acceptance criterion: Demonstrate the DLM functionalities in the DP integration environment.
        Status: Partially done; the DLM is on the DP cluster, but we have not done an official demo. We now have a ticket to write a guideline on how to use the DLM inside the DP cluster.
      • Acceptance criterion: Create documentation about how to publish new data products to the DLM.
        Status: Done; the documentation is auto-generated. There is likely still room for improvement, since the current documentation is still fairly low level and there are multiple ways to 'publish' data products to the DLM.
      • Acceptance criterion: Define a list of groups that need to be implemented in the AAA system for integration with the DLM.
        Status: Done; pointed again to the extended group system used by ALMA, which is almost certainly also applicable to SKA. The actual implementation is not a DLM issue, but an observatory-wide activity.

      NALEDI Acceptance Criteria

      • Acceptance criterion: Implement and expose Update Metadata API endpoints as needed.
        Status: Done.
      • Acceptance criterion: Integrate the Elastic persistence layer.
        Status: ??

       

      BANG Acceptance Criteria

      • Acceptance criterion: Support YANDA in configuring the storage locations.
        Status: Pending.
      • Acceptance criterion: Provide PGSQL instances to be used in the DP integration cluster as requested by teams.
        Status: Done.
      • Acceptance criterion: Support DLM integration on the LOW AA0.5 PGSQL instance.
        Status: Pending.
      • Acceptance criterion: Provide DLM and DPD with DNS name entries in the different environments.
        Status: Partially done.
      • Acceptance criterion: Provide a dedicated Elastic index for DPD integration.
        Status: ??

       

      SST Acceptance Criteria

      • Acceptance criterion: Initiate the conversation about the need and schedule to expand our storage capacity.
        Status: Done, but the actual implementation is still pending, which blocks the registration of the DLM target endpoints.

       

      NALEDI

      In this PI, the way that metadata is handled in the DPD has been changed. It now makes use of a persistent PostgreSQL database for storage of metadata where available; the in-memory functionality has been updated and maintained to assist users during the changeover to PostgreSQL. The search implementation has also been updated to make use of the Elasticsearch instance made available by the !Bang team. (Again, the in-memory search has been updated and maintained to assist users during the changeover to Elasticsearch.)

      To enable these changes, restructuring of the API and Dashboard was needed. This included updates to the data structure used to serve data to the MUI DataGrid component on the dashboard, a rework of the API project structure to improve the logical separation between different modes of operation (in-memory, or using the databases when available), updates to all the supporting methods to align with the requirements of saving data in PostgreSQL and Elasticsearch, improved handling of environment variables, and various other improvements.
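
      A minimal sketch of the mode-of-operation selection described above: use a persistent backend when it is configured, otherwise fall back to the in-memory implementation. Class names and the environment variable are hypothetical, not the actual ska-sdp-dataproduct-api code.

```python
# Hypothetical backend selection for DPD metadata storage: persistent if configured,
# in-memory otherwise. Names are illustrative, not the actual application code.
import os


class InMemoryMetadataStore:
    """Fallback store used when no database backend is available."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def save(self, product_id: str, metadata: dict) -> None:
        self._items[product_id] = metadata


class PostgresMetadataStore:
    """Persistent store; connection details come from the environment."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn

    def save(self, product_id: str, metadata: dict) -> None:
        ...  # write to PostgreSQL (omitted in this sketch)


def build_metadata_store():
    # Environment variable name is illustrative only.
    dsn = os.getenv("SDP_DATAPRODUCT_POSTGRES_DSN")
    if dsn:
        return PostgresMetadataStore(dsn)
    return InMemoryMetadataStore()


store = build_metadata_store()
store.save("eb-m001-20240701-00001", {"obscore": {"dataproduct_type": "visibility"}})
```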

      For a full record of all the changes made as part of this feature, please see the respective change logs of each application:

       

      Updates to the SDP Data Product Dashboard API:

      Changelog — ska-sdp-dataproduct-api 0.8.0 documentation (skao.int)

      Updates to the SDP Data Product Dashboard:

      Changelog — ska-sdp-dataproduct-dashboard 0.8.2 documentation (skao.int)

       

      Releases:

      <PENDING improvements of test coverage as part of NAL-1157>

      A temporary deployment can be accessed here: https://sdhp.stfc.skao.int/dp-naledi-andre/dashboard/

      https://sdhp.stfc.skao.int/dp-naledi-andre/api/status

    • PI24 - UNCOVERED


            People

              m.bartolini Bartolini, Marco
              b.mort Mort, Ben
               Votes: 0
               Watchers: 1

              Feature Progress

                Story Point Burn-up: (88.57%)

                Feature Estimate: 5.0

                              Issues   Story Points
                 To Do             2            4.0
                 In Progress       0            0.0
                 Complete         15           31.0
                 Total            17           35.0
