Details
- Epic
- Not Assigned
- None
- Distributed Data Computing v0.1 - Roadmap
- SRCnet
- 0
Description
Key Functionalities Achieved
Unified API Endpoint:
- The API has been streamlined to operate through a single ExecutionBroker web service endpoint, moving away from the previous dual-endpoint structure (ExecutionBroker and ExecutionWorker). This simplifies the interaction for users and clients accessing the service.
OpenAPI Specification:
- An OpenAPI specification has been created, formally defining the service methods and messages. This provides a clear framework for developers, ensuring that the API is well-documented and easier to integrate with various client applications.
Initial Prototype Functionalities:
- Container Compatibility Check: The Execution Broker can respond to requests like "Can you run this container?" by checking if the specified container is supported by the underlying execution platform (initially CANFAR).
- Data Availability Check: It can verify if the necessary data for a requested computation is locally accessible. For example, if asked, "Can you run this container with this DATA?" the Broker checks if the data is available in the local Rucio node, providing immediate feedback.
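The two checks above can be illustrated with a minimal Python sketch. It is not the ExecutionBroker implementation and does not follow its OpenAPI schema: the names `OfferRequest`, `OfferResponse` and `check_offer`, the container type strings, and the data identifiers are all hypothetical, and the local Rucio node is stood in for by a plain set of identifiers rather than a real Rucio client.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OfferRequest:
    """Hypothetical request: 'Can you run this container with this data?'"""
    container_type: str
    data_ids: tuple = ()


@dataclass
class OfferResponse:
    """Hypothetical answer: YES or NO, plus the reasons for a NO."""
    result: str
    reasons: list = field(default_factory=list)


def check_offer(request, supported_types, local_data):
    """Run both checks and collect every reason the request cannot be served.

    supported_types: container types the execution platform accepts (assumed).
    local_data: identifiers held locally, standing in for a Rucio lookup.
    """
    reasons = []

    # Container compatibility check.
    if request.container_type not in supported_types:
        reasons.append(f"unsupported container type: {request.container_type}")

    # Data availability check.
    missing = [d for d in request.data_ids if d not in local_data]
    if missing:
        reasons.append("data not available locally: " + ", ".join(missing))

    return OfferResponse("NO" if reasons else "YES", reasons)
```

Collecting all reasons, rather than stopping at the first failure, gives the client the immediate feedback described above in a single round trip.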
Modular Design:
- The architecture of the Execution Broker has been designed to be modular. This allows for future extensions to easily integrate with other storage and compute platforms without major overhauls to the core system. For instance, it can be adapted to work with different backend systems like S3 or Slurm in the future.
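One way to picture the modular design is as a pair of narrow interfaces that the core depends on, with each platform supplied as a plug-in. The sketch below is an assumption-laden illustration in Python, not the Execution Broker's actual plug-in API: `ComputePlatform`, `StoragePlatform`, `Broker` and the in-memory classes are hypothetical names, and the in-memory classes merely stand in for real backends such as CANFAR, Rucio, S3 or Slurm.

```python
from typing import Iterable, Protocol


class ComputePlatform(Protocol):
    """Hypothetical plug-in interface for a compute backend."""

    def supports(self, container_type: str) -> bool: ...


class StoragePlatform(Protocol):
    """Hypothetical plug-in interface for a storage backend."""

    def has_data(self, data_id: str) -> bool: ...


class InMemoryCompute:
    """Stand-in compute plug-in backed by a fixed set of container types."""

    def __init__(self, container_types: Iterable[str]):
        self._types = frozenset(container_types)

    def supports(self, container_type: str) -> bool:
        return container_type in self._types


class InMemoryStorage:
    """Stand-in storage plug-in backed by a fixed set of data identifiers."""

    def __init__(self, data_ids: Iterable[str]):
        self._data = frozenset(data_ids)

    def has_data(self, data_id: str) -> bool:
        return data_id in self._data


class Broker:
    """Core logic that only talks to the two interfaces above."""

    def __init__(self, compute: ComputePlatform, storage: StoragePlatform):
        self.compute = compute
        self.storage = storage

    def can_run(self, container_type: str, data_ids: Iterable[str] = ()) -> bool:
        # The core never needs to know which backend is behind each plug-in.
        return self.compute.supports(container_type) and all(
            self.storage.has_data(d) for d in data_ids
        )
```

Under this arrangement, supporting a new backend means writing one class that satisfies the relevant interface and passing it to `Broker`, leaving the core unchanged.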
Resource Management Improvements:
- Integration of resource management through Blazar has been explored, so that users can reserve cloud resources more effectively and avoid errors such as "no valid host" when allocating resources.
Networking Evaluation:
- Initial assessments of long-haul data movement and networking capabilities have been conducted using both manual scripts and perfSONAR. This comparative analysis helps identify potential issues with the networking infrastructure.
Container Execution Analysis:
- Investigation of the "two-level container" problem (executing containers within containers) has begun. This includes analyzing existing solutions from other communities and looking at potential strategies for managing nested container execution effectively.
Documentation and Reference Library:
- A comprehensive reference library is being developed to document the workflows and to guide users and developers in using the Execution Broker effectively. This is essential for ensuring consistency and usability across the SRCNet ecosystem.
Collaboration with Other Teams:
- Progress and findings have been shared with SRCNet teams to ensure that the work is aligned and contributes to broader objectives, fostering a collaborative environment for troubleshooting and knowledge sharing.
Attachments
Issue Links
Child Of:
- SP-4746 Distributed Data Computing All Versions - Roadmap [Funnel]
- SP-4873 Development VS v0.1 - Roadmap [Funnel]
Parent Of:
- SP-4294 Draw out the internal compute APIs as a plugin infrastructure [Funnel]
- SP-4411 Analyse usefulness of Workload Management Systems for SRCNet workloads [Funnel]
- SP-4536 Compare networking tests with perfSONAR mesh results [Funnel]
- SP-4635 Analyse relation between EB and WMS [Funnel]
- SP-4282 Resource Allocation OpenStack: help users pick when the cloud has space [Program Backlog]
- SP-4511 ExecutionBroker modular plug-in design [Program Backlog]
- SP-4279 Resource Allocation OpenStack: clarity about when the cloud is full [Implementing]
- SP-4280 Resource Allocation OpenStack: Cloud credits limit size of resource reservations [Implementing]
- SP-4513 Establish a plan for SRCNet sites to run SDP pipelines (from PI25) [Implementing]
- SP-4643 Update the Execution Broker IVOA standard [Implementing]
- SP-4283 Verify Execution Broker functionality by executing a "real" SRCNet task [Implementing]
- SP-4644 Report on OpenAPI code generation for the Execution Broker service interface [Implementing]
- SP-4493 Investigate when and where the two-level container issue will (may) impact SRCNet [Releasing]
- SP-3317 Completion of federated OS work [Done]
- SP-4060 Initialise Higher Performance Networking Inside OpenStack Environments [Done]
- SP-4241 Prototype an Execution broker within CANFAR Science platform [Done]
- SP-3296 Accessing FITS Metadata in an Object-store [Done]