Details
Feature
Not Assigned
None
SRCnet
2
0
23.4
data-lifecycle team_DAAC
Description
While creating the v0.1 node specification, it became clear that we need further work to build consensus around a system architecture that can support the implementation of the existing SRCNet architecture document.
The plan is to make specific, concrete proposals that we can discuss in the following areas:
- Storage
- Stage 1 - Define the storage tiers (building on the definition from the top level roadmap)
- Stage 1 - Define how Rucio can help track where data is currently accessible, where data is archived and could be made accessible, and how to request a change in what data is currently accessible
- Later Stages - Define how Rucio is expected to move ODPs and SDPs
- Later Stages - Define how data is distributed to sites that do not have archive storage, or that have only smaller amounts of online and/or scratch storage
- Later Stages - Link to higher-level discussions around metadata and data discovery, where global URIs can then be resolved to a local location via Rucio.
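To make the Stage 1 storage questions easier to discuss, here is a minimal sketch of the three things we want to be able to ask: where is a data set currently accessible, where is it archived, and how do we request a change. It is plain Python and does not call Rucio. The tier names, the `Catalogue` class and its methods are all hypothetical placeholders for illustration, not a proposed design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    # Hypothetical tier names; the real tiers are what Stage 1 is meant to define.
    ARCHIVE = "archive"
    ONLINE = "online"
    SCRATCH = "scratch"


# Assumption: data on online or scratch storage counts as "currently accessible".
ACCESSIBLE_TIERS = {Tier.ONLINE, Tier.SCRATCH}


@dataclass(frozen=True)
class Replica:
    dataset: str
    site: str
    tier: Tier


@dataclass
class Catalogue:
    """Toy stand-in for the tracking role we expect Rucio to play."""
    replicas: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def add(self, dataset, site, tier):
        self.replicas.add(Replica(dataset, site, tier))

    def accessible_at(self, dataset):
        # Sites where the data set can be read right now.
        return sorted({r.site for r in self.replicas
                       if r.dataset == dataset and r.tier in ACCESSIBLE_TIERS})

    def archived_at(self, dataset):
        # Sites holding an archived copy that could be made accessible.
        return sorted({r.site for r in self.replicas
                       if r.dataset == dataset and r.tier is Tier.ARCHIVE})

    def request_accessible(self, dataset, site):
        # Record a request to change what is currently accessible at a site.
        if site in self.accessible_at(dataset):
            return "already-accessible"
        if not self.archived_at(dataset):
            raise LookupError(f"no archived copy of {dataset} is known")
        self.pending.append((dataset, site))
        return "staging-requested"
```

For example, with an archived copy at a hypothetical `site-a` and an online copy at `site-b`, `request_accessible("obs-001", "site-a")` queues a staging request, while the same call for `site-b` reports that the data is already accessible.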
THE FOLLOWING ALL TO BE ADDRESSED IN LATER STAGES ...
- Compute
- Services and Interactive Compute
- Define how long-running services and APIs are made available at each site
- For currently accessible data, define how to manage the resources needed to visualize it, ensuring those resources are available at a time that is convenient for the users who want to visualize the data.
- Define how a user can request that an archived data set, not currently accessible via online storage, is made accessible for visualization
- Discuss the management of "scratch" storage local to the compute
- Batch Compute
- Discuss how users request the creation of ADPs and, once the ADPs are ready, get access to visualize them.
- Discuss the adoption of the IVOA Execution Broker, or similar, to implement a global, federation-wide job queue.
- Discuss how jobs can be run on pre-existing Slurm clusters that are shared with users and workloads from outside SKA SRCNet.
- Workflow Development
- We will need to support CI/CD workflows to update and publish the services, interactive environments and batch compute templates
- Direct access to development environments, with appropriate data access, to create new interactive and batch compute templates
- SDP pipeline development, with appropriate access to visibilities
- Resource accounting
- We need to document how we track resource usage
- We should document how we implement the recommendations for resource allocation, and the architectural constraints we need to place on that allocation process
- Security and User Traceability
- This should not overlap with the ongoing policy work.
- Rather, we will discuss the expectations placed on various aspects of the system regarding the provenance of the binaries being executed and the traceability of which specific user is triggering specific resource usage
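As a starting point for the resource accounting and traceability discussions, here is a small sketch of a usage record that ties consumed resources to both a user and the binary that ran. Every field name here is an assumption made for illustration (for example `image_digest` as the provenance identifier and core-hours as the unit); the real schema is part of what this work needs to agree.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    user_id: str           # hypothetical: identity of the user who triggered the work
    site: str              # hypothetical: site where the resources were consumed
    image_digest: str      # hypothetical: content digest of the executed binary or image
    cpu_core_hours: float  # hypothetical unit of accounting


def usage_by_user(records):
    # Sum the consumed core-hours per user across all sites.
    totals = defaultdict(float)
    for record in records:
        totals[record.user_id] += record.cpu_core_hours
    return dict(totals)


def untraceable(records):
    # Records that cannot be tied to both a user and a known binary.
    return [r for r in records if not r.user_id or not r.image_digest]
```

Keeping the user and the binary identifier on the same record means one query can answer both "who used what" and "what exactly was run", which is the link between accounting and traceability described above.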
The success of this exercise will be measured by the acceptance of an architecture paper by the SRCNet Architecture group and the wider SRCNet community.