  1. SAFe Program
  2. SP-3869

Workload tracking - wandb.ai / Nsight / Intel Advisor

Details

    • SRCnet

      The SRC Workload GitLab repository contains various projects that allow us to test the functionality of SRC-like services. These tasks and workflows perform scientific analysis, giving the user a result. They do not currently provide any information that tracks the compute load or intermediate metrics. Wandb.ai (Weights & Biases) provides a remote server that tracks metrics when a task/workflow is run, allowing us to tune parameters and refine how hardware is used. It is designed primarily for machine learning applications, allowing you to keep track of your models, loss functions and metrics for any run. In our case, we can make use of it for the ML task, but it should also benefit the other tasks in the repo. We also want to compare this to other metric tracking systems, which may provide a more complete picture of the compute load without bloat - this could in future include testing tools like Nsight and Intel Advisor, and collecting the metric tracking information in a consistent way.


      AC1: Add wandb.ai to the CNN task

      AC2: Demo it in use

      AC3: (stretch) include other metric tracking - Nsight/Intel Advisor

    • 0.5
    • 1
    • 0
    • PI23 - UNCOVERED

    • example-workflows-and-benchmarks tests-compilation

    Description

      Wandb.ai (Weights & Biases) provides a remote server that tracks metrics when a task/workflow is run, allowing us to tune parameters and refine how hardware is used (e.g. CPU/GPU/RAM use). It is designed primarily for machine learning applications, allowing you to keep track of your models, loss functions and metrics for any run. In our case, we can make use of it for the ML task, but it should also benefit the other tasks in the repo.

      It is free for academic institutions and personal accounts. 
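
As a rough illustration of what AC1 could look like, here is a minimal sketch of a training loop that reports per-epoch metrics through the wandb Python client. It is not the CNN task's actual code: the function name train_with_tracking, the project name "src-workload-cnn" and the synthetic loss values are all assumptions made for the example. It defaults to mode="disabled" so it runs without an account or network access, and it skips logging entirely if the wandb package is not installed.

```python
import random

try:
    import wandb  # the Weights & Biases client, if available
except ImportError:
    wandb = None  # fall back to local-only metric collection


def train_with_tracking(epochs=3, lr=0.01, mode="disabled"):
    """Toy loop that logs one metrics dict per epoch.

    The loss is a synthetic placeholder, not output from the real CNN task.
    """
    config = {"epochs": epochs, "learning_rate": lr}
    run = None
    if wandb is not None:
        # "src-workload-cnn" is a hypothetical project name.
        # mode may be "online", "offline" or "disabled".
        run = wandb.init(project="src-workload-cnn", config=config, mode=mode)

    rng = random.Random(0)
    history = []
    for epoch in range(epochs):
        # Stand-in for a real training step: loss shrinks with each epoch.
        loss = 1.0 / (epoch + 1) + rng.uniform(0.0, 0.01)
        metrics = {"epoch": epoch, "loss": loss}
        history.append(metrics)
        if run is not None:
            run.log(metrics)  # sent to the server when mode="online"

    if run is not None:
        run.finish()
    return history


history = train_with_tracking()
```

Switching mode to "online" (after logging in with an API key) would send the same metrics to the remote server, where runs with different parameters can be compared side by side.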

      We also want to compare this to other metric tracking systems, which may provide a more complete picture of the compute load without bloat - this includes testing tools like Nsight and Intel Advisor, and collecting the metric tracking information in a consistent way.
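
One way to read "collecting the metric tracking information in a consistent way" is to have every tracker emit records with the same fields. The sketch below is an assumption about how that might look, not an agreed design: it uses only the Python standard library to measure wall time, CPU time and peak Python-heap memory for a block of code, and the record layout (task, source, wall_seconds, cpu_seconds, peak_python_bytes) is hypothetical. It does not call Nsight or Intel Advisor; output from those tools would need its own adapter to produce records of the same shape.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_workload(task_name, records):
    """Measure the wrapped block and append one metrics record to `records`."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        cpu_seconds = time.process_time() - cpu_start
        wall_seconds = time.perf_counter() - wall_start
        # Peak size of memory blocks allocated by Python since start().
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        records.append({
            "task": task_name,
            "source": "python-stdlib",  # which tracker produced the record
            "wall_seconds": wall_seconds,
            "cpu_seconds": cpu_seconds,
            "peak_python_bytes": peak_bytes,
        })


records = []
with track_workload("toy-sum", records):
    total = sum(i * i for i in range(100_000))
```

Each record is a plain dict, so it could be passed to a wandb run's log call or written to a file for comparison with other profilers.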

              People

                Llopis, Pablo (P.Llopis)
                Clarke, Alex (A.Clarke)
                Votes: 0
                Watchers: 1

                Feature Progress

                  Story Point Burn-up: (0%)

                  Feature Estimate: 0.5

                  Status        Issues   Story Points
                  To Do         0        0.0
                  In Progress   0        0.0
                  Complete      0        0.0
                  Total         0        0.0
