SAFe Program / SP-4087

Collecting, evaluating and sharing K8s on OpenStack good practice


Details

    • SRCnet

      Ideally, all SRCNet services can run on Kubernetes across all SRCNet sites.

      It is believed that many SRCNet sites are already running OpenStack.

      This feature is looking at evaluating whether Azimuth can help OpenStack sites run performant Kubernetes clusters with minimal local effort required.

      In addition, more consistency across SRCNet sites might help simplify rolling out new versions of SRCNet services.


      AC1: Document what HPC and Cloud CoP members do today around supporting (or not supporting) Kubernetes, including details about how that runs on OpenStack or something else.

      AC2: Demo providing Kubernetes using Azimuth at a site other than Cambridge Arcus; the RAL STFC cloud team are currently offering to do that.

      AC3: (stretch) Get one or more non-UK sites testing whether the Azimuth installation docs are sufficient to get Azimuth running at their site, ideally updating the docs with any gaps we find.

      AC4: (stretch) Ideally at least one SRCNet site is able to test and demo Azimuth's existing built-in support for at least one of: NVIDIA GPUs and RDMA networking using Mellanox ConnectX-5 (or above) NICs.

    • 2
    • 2
    • 0
    • Team_CORAL, Team_TEAL
    • Sprint 4
    • Overdue
    • PI23 - UNCOVERED

    • SRCNet0.1 Teal-D operations-and-infrastructure site-provisioning

    Description

      Some SRCNet services are expected to be best run on Kubernetes. For SRCNet sites that are already looking to use OpenStack, it would be good if we can share efforts on good practices for creating a performant, maintainable and upgradable Kubernetes cluster running on OpenStack. There are currently many ways to achieve that aim, all with different trade-offs depending on the use case.

      Azimuth can provide performant, on-demand and upgradable Kubernetes clusters, using regular project-level user access to an OpenStack project. (There is a work-in-progress effort for a Magnum driver to expose this stack via the Magnum API, but that requires cloud changes, which cannot be assumed.)

      Details on how to install Azimuth on an OpenStack cloud are already available. Additional feedback on how repeatable this process is would be very welcome:
      https://stackhpc.github.io/azimuth-config/

      We are proposing that we work with the UK STFC Cloud team to try to get this up and running while following all required security policies. They already make some use of the capi-helm-charts directly. It would be good to see if there are other SRCNet sites where sharing effort around Kubernetes on OpenStack would be helpful, to compare the approach with other efforts globally, and to start building consensus on how best to share efforts in this area.

      In particular it would be good to cover areas such as:

      • Bare-metal cluster support (via Ironic + Nova)
      • GPU support (VMs with passthrough, not vGPU)
      • RDMA support, both Ethernet and InfiniBand
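      As a concrete check for the GPU/RDMA area (and AC4), a short sketch of how a site could confirm which cluster nodes actually advertise accelerator resources via their Kubernetes device plugins. The resource names are assumptions: `nvidia.com/gpu` is the standard name exposed by the NVIDIA device plugin, and `rdma/hca_shared_devices_a` is a typical (configurable) name from the Mellanox RDMA shared device plugin; adjust both for your site.

```python
# Sketch: summarise which nodes expose NVIDIA GPU or Mellanox RDMA
# resources. Resource names below are assumptions; see lead-in.
ACCEL_RESOURCES = ("nvidia.com/gpu", "rdma/hca_shared_devices_a")

def accelerator_summary(nodes):
    """Map node name -> {resource: count} for any accelerator resources.

    `nodes` is an iterable of (name, allocatable) pairs, where
    `allocatable` mirrors `node.status.allocatable` from the Kubernetes API
    (string-valued quantities).
    """
    summary = {}
    for name, allocatable in nodes:
        found = {r: int(allocatable[r]) for r in ACCEL_RESOURCES if r in allocatable}
        if found:
            summary[name] = found
    return summary

# Against a live cluster this could be fed from the official Python client:
#   from kubernetes import client, config
#   config.load_kube_config()
#   nodes = [(n.metadata.name, n.status.allocatable)
#            for n in client.CoreV1Api().list_node().items]
print(accelerator_summary([
    ("gpu-node-0", {"cpu": "16", "nvidia.com/gpu": "2"}),
    ("cpu-node-0", {"cpu": "32"}),
]))  # → {'gpu-node-0': {'nvidia.com/gpu': 2}}
```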

              People

                r.bolton Bolton, Rosie
                D.Watson Watson, Duncan

                Feature Progress

                  Story Point Burn-up: (43.14%)

                  Feature Estimate: 2.0

                  Status         Issues   Story Points
                  To Do          6        20.0
                  In Progress    5        9.0
                  Complete       4        22.0
                  Total          15       51.0
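                  As a sanity check, the roll-up above is internally consistent (6 + 5 + 4 = 15 issues, 20 + 9 + 22 = 51 points, and 22/51 gives the stated burn-up), which a few lines of Python confirm:

```python
# Recompute the Feature Progress roll-up from the per-status rows.
rows = {
    "To Do":       (6, 20.0),   # (issues, story points)
    "In Progress": (5, 9.0),
    "Complete":    (4, 22.0),
}

total_issues = sum(issues for issues, _ in rows.values())
total_points = sum(points for _, points in rows.values())
burn_up = 100 * rows["Complete"][1] / total_points

print(total_issues, total_points, round(burn_up, 2))  # → 15 51.0 43.14
```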

                  Dates

                    Created:
                    Updated:
