Details
Feature
Not Assigned
SRC-Multi-Team SRCNet0.x
Description
If the SKA precursor experience is anything to go by, there will be a need within the SRCNet for a Science Platform that supports HPC-based data reduction and the generation of Level 7 science products, but at a much larger scale. Such a platform would complement the CANFAR science platform currently under development by the SRCNet.
This platform would distinguish itself by:
- supporting MPI and multi-processing compute (see the sketch after this list),
- supporting Singularity containerisation,
- scaling up to data-reduction analyses on datasets of 1 TB and above,
- running pipelines across multiple VMs, and
- supporting on-the-fly data compression/decompression.
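As a minimal sketch of the first capability, the pattern below distributes an embarrassingly parallel, per-channel workload across MPI ranks, assuming mpi4py is available; process_channel() and the channel count are illustrative stand-ins, not part of any existing SRCNet pipeline.

```python
# Hedged sketch of the MPI/multi-processing pattern the platform would
# need to host; assumes mpi4py. process_channel() is a stand-in for a
# real reduction step (e.g. per-channel source finding).
from mpi4py import MPI

def process_channel(channel: int) -> float:
    """Hypothetical per-channel computation."""
    return float(channel) * 0.5

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Interleave spectral channels across ranks (embarrassingly parallel).
n_channels = 4096
local = sum(process_channel(c) for c in range(rank, n_channels, size))

# Combine the partial results on rank 0.
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"combined result from {size} ranks: {total}")
```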
Rationale:
Considering the assumed data output rate from the SKAO, timely data processing is fundamental to the success of the project - data cannot be allowed to back up over days (or probably even hours) when only limited fast storage (Tier 0 or 1) is available.
The CANFAR platform addresses the subset of processing that can be run within a VM or an orchestrated Docker container environment. However, best-in-class algorithms (such as SoFiA-2 source finding, and LINMOS and MIRIAD ImMerge mosaicking), when used against datacubes of 500 GB or more, require memory and CPU resources on a scale only available across multiple cores and VMs. This is typically met by running these algorithms within an HPC parallel environment; no single compute core (or node) could support timely processing at such a scale.
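To illustrate why such cubes cannot simply be loaded on one node, the sketch below reads a datacube chunk-by-chunk via a memory map, assuming astropy; "cube.fits", the chunk size, and the per-chunk statistic are placeholders for real pipeline steps.

```python
# Hedged sketch: channel-wise processing of a datacube too large to
# hold in memory on one node. Assumes astropy; "cube.fits" and the
# per-chunk statistic are placeholders for real science code.
import numpy as np
from astropy.io import fits

CHUNK = 64  # spectral channels per chunk; tune to available RAM

with fits.open("cube.fits", memmap=True) as hdul:  # memmap avoids loading the full cube
    cube = hdul[0].data                            # shape: (n_chan, ny, nx)
    n_chan = cube.shape[0]
    peaks = []
    for start in range(0, n_chan, CHUNK):
        block = np.asarray(cube[start:start + CHUNK])  # only this slice is read from disk
        peaks.append(block.max())                      # stand-in for source finding, etc.
    print(f"cube peak over {n_chan} channels: {max(peaks)}")
```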
Although some SRC nodes have already developed aspects of HPC processing, the SRCNet would benefit from a science platform that can be deployed at SRC nodes with access to the appropriate HPC compute resources, providing a single user interface that the rest of the SRCNet user community can access.
Requirements:
Such a platform would minimally require:
- a flexible HPC workflow/pipeline orchestrator,
- a single user-portal interface for running user-defined algorithms,
- a jobs database,
- AAI,
- ability to auto-trigger on external events (e.g. incoming data),
- ability to automatically deploy on HPC resources such as SLURM (sketched below),
- ability to monitor running user jobs,
- ability to run MPI processes (either containerised or natively),
- ability to run embarrassingly parallel multi-processing workloads, and
- ability to interface with Tier 0, 1 and 2 storage (S3 object store or POSIX).
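As a hedged sketch of the auto-deployment requirement, the snippet below generates a SLURM batch script that runs an MPI application inside a Singularity container, submits it with sbatch, and captures the job ID for monitoring. The image path, resource requests, and application name are placeholders; a real platform would template these from the jobs database.

```python
# Hedged sketch of auto-deploying a containerised MPI job to SLURM.
# All names (image path, node counts, application) are illustrative.
import subprocess
import tempfile

BATCH = """#!/bin/bash
#SBATCH --job-name=srcnet-demo
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --time=04:00:00

# Run the MPI application inside a Singularity container.
srun singularity exec /images/pipeline.sif mpi_app --input /data/cube.fits
"""

with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(BATCH)
    script = f.name

# sbatch prints e.g. "Submitted batch job 123456"; keep the job ID
# so the platform can monitor the run.
out = subprocess.run(["sbatch", script], capture_output=True, text=True, check=True)
job_id = out.stdout.strip().split()[-1]
print(f"submitted SLURM job {job_id}")
```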
Planning:
This feature will span many Program Increments (PIs) and will act as the parent of individual features that each span one or two PIs.