SAFe Program / SP-4868

Distributed Data Computing v0.1 - Roadmap


Details

    • Distributed Data Computing v0.1 - Roadmap
    • SRCnet
    • 0

    Description

      Key Functionalities Achieved

      Unified API Endpoint:

      • The API now operates through a single ExecutionBroker web service endpoint, replacing the previous dual-endpoint structure (ExecutionBroker and ExecutionWorker). This simplifies interaction for users and clients of the service.
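
      A minimal client-side sketch of what talking to a single endpoint could look like. The URL, the `ExecutionRequest` class, and its field names are hypothetical and are not taken from the actual OpenAPI specification; the point is only that one request to one service carries everything the client needs to ask.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical endpoint URL; the real service address is not given in this ticket.
BROKER_URL = "https://broker.example.org/offers"


@dataclass
class ExecutionRequest:
    """Illustrative request body; the field names are assumptions."""
    container_image: str
    data_id: Optional[str] = None  # left out of the payload when no data is needed


def build_request(req: ExecutionRequest) -> tuple[str, bytes]:
    """Return the (url, JSON body) pair a client would POST to the broker."""
    # Drop unset optional fields so the payload only carries what was asked.
    payload = {k: v for k, v in asdict(req).items() if v is not None}
    return BROKER_URL, json.dumps(payload, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    url, body = build_request(
        ExecutionRequest("example/image:1.0", "example:dataset-001")
    )
    print(url)
    print(body.decode("utf-8"))
```

      The sketch deliberately stops short of sending the request, since the real paths and message schemas are defined by the service's own specification.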

      OpenAPI Specification:

      • An OpenAPI specification now formally defines the service methods and messages. This gives developers a documented contract and makes the API easier to integrate with client applications.

      Initial Prototype Functionalities:

      • Container Compatibility Check: The Execution Broker can respond to requests like "Can you run this container?" by checking if the specified container is supported by the underlying execution platform (initially CANFAR).
      • Data Availability Check: The Broker can verify whether the data needed for a requested computation is locally accessible. For example, if asked "Can you run this container with this data?", the Broker checks whether the data is available in the local Rucio node and provides immediate feedback.
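
      The two checks above can be sketched as a single decision function. This is an illustration under stated assumptions, not the prototype's code: the fixed sets stand in for queries to the execution platform (initially CANFAR) and to the local Rucio node, and the names `SUPPORTED_IMAGES`, `LOCAL_DATA`, `Answer`, and `can_run` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins: a real broker would query the execution platform
# and the local Rucio node instead of consulting fixed sets.
SUPPORTED_IMAGES = {"example/image:1.0"}
LOCAL_DATA = {"example:dataset-001"}


@dataclass
class Answer:
    can_run: bool
    reason: str


def can_run(image: str, data_id: Optional[str] = None) -> Answer:
    """Answer "Can you run this container (with this data)?"."""
    # Container compatibility check.
    if image not in SUPPORTED_IMAGES:
        return Answer(False, "container not supported by the execution platform")
    # Data availability check, applied only when the request names data.
    if data_id is not None and data_id not in LOCAL_DATA:
        return Answer(False, "data not available locally")
    return Answer(True, "ok")


if __name__ == "__main__":
    print(can_run("example/image:1.0"))
    print(can_run("example/image:1.0", "example:missing"))
```

      Returning a reason alongside the yes/no answer is a design choice of this sketch; it lets a client tell an unsupported container apart from missing data.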

      Modular Design:

      • The Execution Broker architecture is modular, so future extensions can integrate other storage and compute platforms without major changes to the core system. For instance, it could be adapted to work with backends such as S3 or Slurm.
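
      One common way to get this kind of modularity is to have the core depend on small interfaces rather than on a concrete platform. The sketch below assumes that approach; the ticket does not describe the actual mechanism, and every class and method name here (`ComputePlatform`, `StoragePlatform`, `supports`, `has_local`, `Broker`) is hypothetical.

```python
from typing import Protocol


class ComputePlatform(Protocol):
    def supports(self, image: str) -> bool: ...


class StoragePlatform(Protocol):
    def has_local(self, data_id: str) -> bool: ...


class StaticCompute:
    """Toy compute backend; a CANFAR or Slurm adapter would expose the same method."""

    def __init__(self, images: set[str]) -> None:
        self._images = images

    def supports(self, image: str) -> bool:
        return image in self._images


class StaticStorage:
    """Toy storage backend; a Rucio or S3 adapter would expose the same method."""

    def __init__(self, data_ids: set[str]) -> None:
        self._data_ids = data_ids

    def has_local(self, data_id: str) -> bool:
        return data_id in self._data_ids


class Broker:
    """Core logic that depends only on the two interfaces, not on any backend."""

    def __init__(self, compute: ComputePlatform, storage: StoragePlatform) -> None:
        self.compute = compute
        self.storage = storage

    def can_run(self, image: str, data_id: str) -> bool:
        return self.compute.supports(image) and self.storage.has_local(data_id)


if __name__ == "__main__":
    broker = Broker(StaticCompute({"example/image:1.0"}),
                    StaticStorage({"example:dataset-001"}))
    print(broker.can_run("example/image:1.0", "example:dataset-001"))
```

      With this shape, supporting a new platform means writing one adapter class; `Broker` itself is unchanged.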

      Resource Management Improvements:

      • Resource management through Blazar has been explored as a way for users to reserve cloud resources more effectively and to reduce issues such as "no valid host" errors during resource allocation.

      Networking Evaluation:

      • Initial assessments of long-haul data movement and networking capabilities have been conducted using both manual scripts and perfSONAR. This comparative analysis helps identify potential issues with the networking infrastructure.

      Container Execution Analysis:

      • Investigation of the "two-level container" problem (executing containers within containers) is under way. This includes analyzing existing solutions from other communities and evaluating strategies for managing nested container execution effectively.

      Documentation and Reference Library:

      • A comprehensive reference library is being developed that documents the workflows and guides users and developers in using the Execution Broker effectively. This is essential for consistency and usability across the SRCNet ecosystem.

      Collaboration with Other Teams:

      • Progress and findings have been shared with SRCNet teams to ensure that the work is aligned and contributes to broader objectives, fostering a collaborative environment for troubleshooting and knowledge sharing.

              People

                Debashis.Mitra Mitra, Debashis