SAFe Program / SP-4098

Add capability to ingestion service to ingest data into "non-deterministic" Rucio storage elements


Details

    • SRCnet
    • BH1: Having ingestion use a nondeterministic RSE will reduce data duplication during ingestion.
      BH2: Experience in setting up and using nondeterministic RSEs fills a knowledge gap.
    • AC: Document and demo the use of a non-deterministic RSE working with the ingestion service.
    • 1
    • 2
    • 0
    • Team_MAGENTA
    • Sprint 5
    • Demoed: https://confluence.skatelescope.org/pages/viewpage.action?pageId=265846042
      Documentation updated: https://gitlab.com/ska-telescope/src/ska-src-ingestion/-/tree/sp4098-ingest-nondeterministic?ref_type=heads#rucio-non-det-backend---rucio-non-deterministic-ingestion
    • 22.6
    • Stories Completed, Outcomes Reviewed, Demonstrated, Satisfies Acceptance Criteria, Accepted by FO
    • PI23 - UNCOVERED

    • data-ingestion-dissemination-and-replication

    Description

      By default, Rucio uses "deterministic" Rucio Storage Elements (RSEs). All but one of the RSEs in the prototype Rucio datalake are deterministic, but there is another option, "non-deterministic", which may be a more appropriate type to use in some SRCNet contexts, especially data ingest.

      For deterministic RSEs, Rucio handles the translation from a logical file name (LFN) to a physical file name (PFN). This is the mapping between the internal data identifier used by Rucio and the replica path on storage. In this mode, to ingest data one has to both upload the data (so that Rucio can generate this mapping) and register it (flag it as belonging to the datalake). The upload step means that the data is duplicated.
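
As a rough illustration of what "deterministic" means here, the sketch below derives a replica path from the data identifier alone. It assumes a hash-style layout (an MD5 of `scope:name`, with the first two pairs of hex digits used as directory levels); the function name and the example scope and file name are hypothetical, and the algorithm configured on a given RSE may differ.

```python
import hashlib


def deterministic_path(scope: str, name: str) -> str:
    """Derive a replica path from scope and name only (illustrative sketch)."""
    # Hash the full data identifier so the path depends on nothing else.
    digest = hashlib.md5(f"{scope}:{name}".encode("utf-8")).hexdigest()
    # Use the first two pairs of hex digits as directory levels.
    return f"{scope}/{digest[0:2]}/{digest[2:4]}/{name}"


# Hypothetical data identifier, used only for illustration.
print(deterministic_path("testing", "obs_0001.fits"))
```

Because the path is a pure function of the identifier, the operator cannot choose where the replica lives, which is why existing data has to be uploaded to the computed location.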

      For "non-deterministic" RSEs, the operator tells Rucio the mapping between the LFN and the PFN. This way, data that already exists can be ingested by simply registering it; no upload (and corresponding data duplication) is required.
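
A minimal sketch of the register-only approach follows. It builds the kind of record an operator might hand to Rucio for a file that is already on storage: size, Adler-32 checksum, and an explicit PFN. The helper name, the storage URL prefix, and the example file are hypothetical, and the exact fields expected by the Rucio client should be checked against its documentation; no Rucio call is made here.

```python
import os
import tempfile
import zlib


def describe_existing_file(local_path, scope, name, pfn_prefix):
    """Build a registration record for a file already on storage (sketch)."""
    checksum = 1  # Adler-32 starting value
    size = 0
    with open(local_path, "rb") as handle:
        # Read in 1 MiB chunks so large files are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            checksum = zlib.adler32(chunk, checksum)
            size += len(chunk)
    return {
        "scope": scope,
        "name": name,
        "bytes": size,
        "adler32": f"{checksum & 0xFFFFFFFF:08x}",
        # The operator supplies the PFN; nothing is computed from the LFN.
        "pfn": f"{pfn_prefix.rstrip('/')}/{os.path.basename(local_path)}",
    }


# Hypothetical staging area and storage prefix, for illustration only.
with tempfile.TemporaryDirectory() as staging:
    existing = os.path.join(staging, "obs_0001.fits")
    with open(existing, "wb") as handle:
        handle.write(b"example payload")
    record = describe_existing_file(
        existing, "testing", "obs_0001.fits",
        "davs://storage.example.org/staging",
    )
    print(record)
```

The file is read once to compute its size and checksum but is never copied, which is the duplication saving described above.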

      Currently, the ingestion service (for Rucio) uses deterministic RSEs.


              People

                r.bolton Bolton, Rosie

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 1.0

                                Issues  Story Points
                  To Do         0       0.0
                  In Progress   0       0.0
                  Complete      5       9.0
                  Total         5       9.0
