  1. SAFe Program
  2. SP-3109

Configure PSS machines Kelvin and Tengu, and write first version of CI script for PSS to run on these machines


Details

    • Feature
    • Should have
    • PI17
    • COM PSS SW
    • None
    • Data Processing
      • Extending our PSS CI coverage to include running executables on specialised hardware
      • Triggering CI pipelines directly from panda and cheetah repository events (e.g. commit/MR/merge)

      Given: We currently have no running CI/CD for code check-ins
      And: We can't use SKAO GitLab docker images for FPGA CI/CD

      When: We set up our own reference platform
      And: Configure GitLab runners to use it

      Then: Our GitLab runners will correctly publish their available resources as GitLab tags (os, cuda, nasm, boost, gcc, cmake, rabbit, intel_fpga)
      And: Code check-in events on the panda and cheetah repos will launch CI pipelines that include building and unit testing.
      And: We will have a set of recipes we can reuse to configure extra runners or restore a damaged machine
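The "publish available resources as tags" criterion can be illustrated with a minimal sketch that probes a machine for a few of the listed tools and builds a tag list from what it finds. The tag-to-executable pairings and the function name `discover_tags` are assumptions made for illustration; they are not taken from the project's Ansible recipes, and tags such as os, boost, rabbit and intel_fpga would need other kinds of probes.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from a runner tag to the executable whose presence
# on PATH is taken to imply that resource. Illustrative only.
TAG_PROBES: Dict[str, str] = {
    "cuda": "nvcc",
    "nasm": "nasm",
    "gcc": "gcc",
    "cmake": "cmake",
}


def discover_tags(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the sorted tags whose probe executable is found."""
    return sorted(tag for tag, exe in TAG_PROBES.items() if which(exe) is not None)


if __name__ == "__main__":
    # Comma-separated form, as accepted by `gitlab-runner register --tag-list`.
    print(",".join(discover_tags()))
```

The `which` parameter exists only so the lookup can be replaced in tests; on a real machine the default `shutil.which` is used.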

    • 2.5
    • 2.5
    • 0
    • Team_PSS
    • Sprint 5

      We fully set up our Quad machine to act as a GitLab runner for CUDA and non-CUDA builds, in release and debug modes, for Panda and Cheetah. These builds now run on feature branches and the dev branch whenever there are commits.

      The runners publish the currently relevant tags so that our CI/CD pipelines can pick them up.
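As a rough illustration of how published tags route jobs: GitLab only assigns a tagged job to a runner that carries every one of the job's tags. The sketch below models that rule as a subset check. The tag set given for Quad is a made-up example, and the handling of untagged jobs (a separate runner setting) is not modelled.

```python
from typing import Iterable


def runner_can_take(job_tags: Iterable[str], runner_tags: Iterable[str]) -> bool:
    """True if every tag requested by the job is advertised by the runner."""
    return set(job_tags) <= set(runner_tags)


# Hypothetical tag set, for illustration only.
quad_tags = {"gcc", "cmake", "boost", "cuda"}

print(runner_can_take({"cuda", "cmake"}, quad_tags))  # job needs only tags Quad has
print(runner_can_take({"intel_fpga"}, quad_tags))     # job needs a tag Quad lacks
```

Under this rule, a job that asks for `intel_fpga` stays pending until a runner advertising that tag is available.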

      We had a couple of issues with testing in both cases. There is an issue with the drivers on the Quad machine, so the Panda CUDA tests fail on this machine; for this reason only the non-CUDA tests work. We set up the Cheetah unit tests to run using the GitLab runner on Quad, but because some of them fail intermittently we decided to remove them from the configuration for the time being.

      In the end we used a combination of manual installs (e.g. CUDA, boost) and Ansible recipes (GitLab runner) to configure Quad for now. A number of the recipes required for the future (e.g. boost, cmake) were written but have not yet made it past MR into the dev branch; the GitLab runner recipe, however, is fully usable.

      MRs for Panda and Cheetah:

      https://gitlab.com/ska-telescope/pss/ska-pss-panda/-/merge_requests/1

      https://gitlab.com/ska-telescope/pss/ska-pss-cheetah/-/merge_requests/24

      Screenshots of the Panda and Cheetah pipelines, respectively:

    • 18.1
    • Stories Completed, Outcomes Reviewed, Satisfies Acceptance Criteria, Accepted by FO
    • PI22 - UNCOVERED

    Description

      • Create verified and tested Ansible recipes to configure GitLab CI runners suitable for running the PSS software product (within the ska-pss-ci-systems repo). These recipes will be deployed on Kelvin and Tengu.
      • Extend the existing PSS GitLab runners Kelvin and Tengu to be compliant with these recipes, thus enabling them to publish the full spectrum of available resources as GitLab tags (currently they only advertise cuda).
      • Write a GitLab manifest for building branches of panda and cheetah for a single spin tailored to each of our reference platforms. The spin chosen will depend on the available resources advertised by the runners, and we can scale up manually as Ansible recipes are added. There will need to be a trigger for executing this manifest. Ideally this would be on commit, but given the nature of the runners as developer machines we may resort to overnight builds.
      • [STRETCH] A manual trigger with the option to specify the explicit version of the panda and cheetah branches will also be made available. Note that this manifest is a stop-gap solution pending completion of a full-coverage and cloud-efficient solution being built in the pss-pipeline repository.
      • [STRETCH] Set up triggers in the cheetah repository to execute the existing pss-pipeline repository CI pipeline whenever the above manifest is run.
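The idea that the spin depends on the resources a runner advertises can be sketched as follows. This is an illustration under assumptions, not the project's manifest: `build_spins` is a hypothetical helper, and `ENABLE_CUDA` is a placeholder option name that may not match the real panda or cheetah CMake options. `CMAKE_BUILD_TYPE` is the standard CMake variable.

```python
from itertools import product
from typing import Iterable, List


def build_spins(runner_tags: Iterable[str]) -> List[List[str]]:
    """List candidate CMake configure commands for a runner's advertised tags."""
    tags = set(runner_tags)
    # A CUDA spin is only offered when the runner advertises the cuda tag.
    cuda_options = [False, True] if "cuda" in tags else [False]
    spins = []
    for use_cuda, build_type in product(cuda_options, ["Release", "Debug"]):
        spins.append([
            "cmake",
            f"-DCMAKE_BUILD_TYPE={build_type}",
            # ENABLE_CUDA is a hypothetical option name used for illustration.
            f"-DENABLE_CUDA={'ON' if use_cuda else 'OFF'}",
            "..",
        ])
    return spins


if __name__ == "__main__":
    for cmd in build_spins({"gcc", "cmake", "cuda"}):
        print(" ".join(cmd))
```

A manifest that builds a single spin per platform would pick one entry from this list; the full list corresponds to scaling up as more resources are advertised.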

      We can't rely on using the SKAO GitLab cloud-based docker images for the CUDA or FPGA executable parts of our CI/CD, so we need a reference platform set up for our specific CI/CD needs. We can then configure runners to run on this platform.

            People

              A.Noutsos Noutsos, Aristeidis
              L.Levin-Preston Levin-Preston, Lina

              Feature Progress

                Story Point Burn-up: (100.00%)

                Feature Estimate: 2.5

                              Issues  Story Points
                To Do              0           0.0
                In Progress        0           0.0
                Complete           5          11.0
                Total              5          11.0
