SAFe Program · SP-4170

We are able to test workflows against a huge (real) data set


Details

    • SRCnet

      To update:

      Having a GPU-heavy astronomical task would be beneficial for testing GPU clusters at SRC sites. Currently we only have the radio galaxy image classifier, which uses a single GPU, is not data-heavy (300 images, completes in < 2 minutes), and can't be scaled up without significant development time.

      A spectrum classifier could use 4 million spectra from SDSS and a GPU cluster to classify them. It would be much easier to develop than scaling up the image classifier. This data-intensive task would be highly beneficial for testing both CPU and GPU clusters at SRC sites.
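      As a rough illustration of the data-handling side of such a task, cleaning and resampling a spectrum onto a common grid might look like the following numpy-only sketch (the wavelength grid, normalisation scheme, and function name are assumptions for illustration, not project decisions):

```python
import numpy as np

def preprocess(wavelength, flux, grid):
    """Resample a spectrum onto a common grid and normalise it.

    Hypothetical cleaning step: mask non-finite flux values, linearly
    interpolate onto `grid`, then scale to zero mean / unit variance so
    spectra with different native sampling are comparable as inputs.
    """
    ok = np.isfinite(flux)
    resampled = np.interp(grid, wavelength[ok], flux[ok])
    return (resampled - resampled.mean()) / (resampled.std() + 1e-8)

# Toy example: a synthetic spectrum with simulated bad pixels.
grid = np.linspace(4000.0, 9000.0, 512)   # Angstroms, assumed common grid
wl = np.linspace(3800.0, 9200.0, 3000)
flux = np.sin(wl / 500.0) + 0.1
flux[100:110] = np.nan                    # simulated bad pixels
x = preprocess(wl, flux, grid)
print(x.shape)
```

      Running millions of spectra through a step like this is what makes the task data-intensive and a useful throughput test, independent of the classifier itself.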

      • Ingest a large dataset O(1M files) into SRCNet Data Lake
      • Enable at least one workflow to run against the dataset on CPU
      • Report scores from one site

      Stretch:

      • Enable at least one workflow to run against the dataset on GPU clusters
      • Add a spectrum classifier to the SRC workloads repo
      • Report scores from multiple sites
    • PI24 - UNCOVERED

    • PI23 SRC23-PB example-workflows-and-benchmarks tests-compilation

    Description

      To update:

      The Sloan Digital Sky Survey has spectra for millions of sources. These are mainly analysed by template fitting, which is slow. A neural network classifier could achieve this much faster, and such classifiers are becoming mainstream for these research problems. Such a workflow lets us test GPU clusters with high data throughput, a capability we currently lack.

      This will involve downloading 4 million spectra from SDSS, cleaning the data, and building a CNN classifier to assign labels to them. Star/galaxy/quasar are the broadest labels, but finer labels (e.g. stellar type) could be assigned depending on the quality of the spectra. Such a workflow is highly applicable to radio spectral data as well.
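      The CNN step can be sketched in miniature as a forward pass over a 1D spectrum: convolution, ReLU, global average pooling, then a linear softmax head. This numpy-only sketch uses untrained random weights and assumed layer sizes purely to show the shape of the computation; the real classifier would be trained on a GPU framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    """Valid-mode 1D convolution: x (n_points,), kernels (n_filters, k)."""
    n_filters, k = kernels.shape
    n_out = x.size - k + 1
    out = np.empty((n_filters, n_out))
    for f in range(n_filters):
        for i in range(n_out):
            out[f, i] = np.dot(x[i:i + k], kernels[f])
    return out

def classify_spectrum(flux, n_classes=3):
    """Toy forward pass: conv -> ReLU -> global average pool -> softmax."""
    kernels = rng.normal(size=(8, 16))         # 8 filters, width 16 (untrained)
    w = rng.normal(size=(n_classes, 8))        # linear head (untrained)
    h = np.maximum(conv1d(flux, kernels), 0.0) # ReLU activation
    pooled = h.mean(axis=1)                    # global average pooling
    logits = w @ pooled
    p = np.exp(logits - logits.max())          # stable softmax
    return p / p.sum()

flux = rng.normal(size=1000)                   # stand-in for an SDSS spectrum
probs = classify_spectrum(flux)                # star/galaxy/quasar probabilities
print(probs.shape)
```

      The three output probabilities correspond to the broad star/galaxy/quasar labels; a finer label set would only change `n_classes`.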

      70% of this task has already been completed by Alex Clarke. They anticipate continuing this in I&P and pulling it as a feature when near completion.

       

      Former title: Get a spectrum CNN classifier working on optical SDSS data

      People

        Assignee: Unassigned
        Reporter: Clarke, Alex (A.Clarke)
        Votes: 0
        Watchers: 0

                Feature Progress

                  Story Point Burn-up: (0%)

                  Feature Estimate: 1.0

                                Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      0        0.0
                  Total         0        0.0

                  Dates

                    Created:
                    Updated:
