  1. SAFe Program
  2. SP-4169

Source finding on larger LOTSS dataset (100 images, ~400 GB)


Details

    • SRCnet

      Tasks that use a significant amount of data are important for testing the hardware infrastructure at SRC sites. Currently, all the tasks in the SRC workload GitLab repository are minimal examples.

       

      • AC1: Curate 100 LOFAR LOTSS images to run source-finding on with PYBDSF, and include a mechanism to download this data and run it.
      • AC2: Write a script that can do this in parallel, e.g. use a SLURM cluster or use python multiprocessing on a CPU cluster with 100 cores and 100+ GB RAM.
      • AC3: Add this to the PYBDSF task in the SRC workloads repo, consistent with the format of that task.
      • AC4: Provide a demo.
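For the SLURM option in AC2, one way to spread the images over a job array is to let each array task pick its own subset of the image list. The following is a minimal sketch, not the script delivered for this feature. The helper names `shard_for_task` and `shard_from_slurm_env` are hypothetical, and it assumes the array is submitted with zero-based indices (for example `--array=0-9`), so that `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` can be used directly.

```python
import os


def shard_for_task(items, task_id, task_count):
    """Return the items assigned to one array task (round-robin split)."""
    if task_count < 1 or not 0 <= task_id < task_count:
        raise ValueError("task_id must satisfy 0 <= task_id < task_count")
    # Task k takes items k, k + task_count, k + 2 * task_count, ...
    return list(items)[task_id::task_count]


def shard_from_slurm_env(items, env=os.environ):
    """Pick this task's items from the SLURM job-array environment variables.

    Assumes zero-based array indices. Outside SLURM the variables are
    missing, so the defaults make a single task process every item.
    """
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", 0))
    task_count = int(env.get("SLURM_ARRAY_TASK_COUNT", 1))
    return shard_for_task(items, task_id, task_count)
```

Each array task would then run source-finding only on the images returned by `shard_from_slurm_env`, so no coordination between tasks is needed.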
    • 1
    • 1
    • 0
    • Team_LAVENDER
    • Sprint 2
      • AC1 was met.
      • AC2 was met using Python multiprocessing. A report is here: https://confluence.skatelescope.org/display/SRCSC/%28LAV-317%29+Write+a+report+on+Confluence%2C+add+the+task+to+the+repository%2C+and+provide+a+demo
      • AC3 was met.
      • AC4 was met. Demo here: https://confluence.skatelescope.org/pages/viewpage.action?pageId=265846037
    • 24.3
    • Stories Completed, Outcomes Reviewed
    • PI24 - UNCOVERED

    • PI22 PI23 SRC23-PB example-workflows-and-benchmarks tests-compilation

    Description

      The source-finding task currently runs on a single image, and we have a script that loops it over 15 images. This feature would scale it up to ~100 images and include a way to run them in parallel. We would use a VM with significant hardware attached, e.g. 100 CPU cores and 200 GB RAM. Alternatively, provide an example using a Slurm cluster. This can be added to the PYBDSF task in the SRC workloads repo as a separate runnable script.
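The parallel run described above could look roughly like the sketch below, which maps a per-image PyBDSF call over a pool of worker processes. This is an illustration under stated assumptions, not the script in the SRC workloads repo. The function names `find_sources` and `run_all` and the `.srl.fits` output naming are hypothetical, and PyBDSF is run with its default settings apart from `quiet=True`.

```python
import multiprocessing as mp
from pathlib import Path


def find_sources(image_path):
    """Run PyBDSF on one FITS image and write a source list next to it."""
    import bdsf  # imported lazily so this module loads without PyBDSF installed

    image_path = Path(image_path)
    # Hypothetical naming scheme: image.fits -> image.srl.fits
    catalog = image_path.with_suffix(".srl.fits")
    img = bdsf.process_image(str(image_path), quiet=True)
    img.write_catalog(outfile=str(catalog), format="fits",
                      catalog_type="srl", clobber=True)
    return str(catalog)


def run_all(image_paths, worker=find_sources, processes=None):
    """Apply `worker` to every image, one image per worker process at a time.

    processes=None uses every available core; processes <= 1 runs
    serially in the calling process, which is convenient for debugging.
    """
    image_paths = [str(p) for p in image_paths]
    if processes is not None and processes <= 1:
        return [worker(p) for p in image_paths]
    with mp.Pool(processes=processes) as pool:
        # chunksize=1 hands out images one at a time, so a slow image
        # does not hold up a whole batch assigned to the same worker.
        return pool.map(worker, image_paths, chunksize=1)
```

A call such as `run_all(sorted(Path("images").glob("*.fits")), processes=100)` would match the 100-core VM mentioned above, where `images` is a placeholder directory. On platforms that start workers with `spawn`, the call needs to sit under an `if __name__ == "__main__":` guard. Memory use grows with the number of concurrent workers, so the process count may need to be lower than the core count for large images.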

              People

                A.Clarke Clarke, Alex

                Feature Progress

                  Story Point Burn-up: (100.00%)

                  Feature Estimate: 1.0

                                Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      7        7.0
                  Total         7        7.0
