Kubernetes Cluster

The Kubernetes Cluster blueprint triggers the required workflows to provide a Kubernetes cluster with TKGi. The version of Kubernetes cannot be chosen and depends on the version of TKGi deployed by Swisscom. The currently installed versions are:

The Kubernetes cluster deployed by TKGi is a standard Kubernetes cluster and most of the standard concepts are available. Please note that there might be some differences from other Kubernetes offerings, such as the available annotations or pre-configured Custom Resource Definitions. Documentation regarding TKGi can be found in the VMware documentation. Documentation on Kubernetes is available on its official website.

The cluster is deployed within the chosen Kubernetes environment, and its API is reachable on port 8443 through an IP taken at random from the VIP Pool specified in your environment, for instance with the kubectl CLI tool. Please note that you should configure your DNS to point the domain selected at provisioning to this IP; alternatively, you may edit your local hosts file or disable host verification in your local Kubernetes CLI configuration.
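
A minimal sketch of the local alternative, assuming a hypothetical domain k8s.example.internal chosen at provisioning and a hypothetical API VIP of 10.0.0.10:

# Map the domain chosen at provisioning to the assigned API VIP (instead of creating a DNS record)
$ echo "10.0.0.10 k8s.example.internal" | sudo tee -a /etc/hosts
# Check that the API answers on port 8443
$ kubectl cluster-info --server=https://k8s.example.internal:8443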

The plans are described within the service description. Depending on the plan selected, the cluster will be deployed with 1 or 3 Master nodes, on which no containers can be scheduled, and at least 1 Worker node. A Load Balancer is provisioned as well, which is by default configured to handle the API service on one IP, with another IP reserved for Ingress traffic, as TKGi comes with a default Ingress Controller provided by VMware NCP.
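
Since the default Ingress Controller is already deployed, exposing an HTTP workload only requires an Ingress resource pointing at a Service. A minimal sketch, assuming a hypothetical Service named web-svc on port 80 and a hypothetical hostname resolving to the reserved Ingress IP:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress               # hypothetical name
spec:
  rules:
  - host: web.example.com         # hypothetical host pointing to the reserved Ingress IP
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc         # hypothetical Service exposing the workload
            port:
              number: 80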

In this version of the service, privileged containers are enabled.

Plans

Flex cluster plans

You can customize the compute resources (CPU, Memory, Storage) for your clusters via Worker Node Pools.

During cluster creation, you need to choose between these two plans:

Plan       Control Plane                                    Nodes
basic      1 Master w/ 2 vCPUs, 8GB RAM, 64GB Storage       Customizable via worker node pools
advanced   3 Masters w/ 4 vCPUs, 16GB RAM, 128GB Storage    Customizable via worker node pools

Legacy Plans

Before the introduction of flex cluster plans (01.06.2023), the service used T-shirt-sized plans.

Plan          Master                         Worker              Worker Count   Worker Storage
2c8r.dev      1 Master w/ 2 CPU, 8GB RAM     2 CPU, 8GB RAM      1 - 10         30GB
2c8r.std      3 Masters w/ 2 CPU, 8GB RAM    2 CPU, 8GB RAM      1 - 10         30GB
4c16r.std     3 Masters w/ 4 CPU, 16GB RAM   4 CPU, 16GB RAM     1 - 30         30GB
4c32r.mem     3 Masters w/ 4 CPU, 16GB RAM   4 CPU, 32GB RAM     1 - 50         80GB
8c32r.std     3 Masters w/ 4 CPU, 16GB RAM   8 CPU, 32GB RAM     1 - 40         80GB
8c64r.mem     3 Masters w/ 4 CPU, 16GB RAM   8 CPU, 64GB RAM     1 - 50         150GB
16c64r.std    3 Masters w/ 4 CPU, 16GB RAM   16 CPU, 64GB RAM    1 - 60         150GB
16c128r.mem   3 Masters w/ 4 CPU, 16GB RAM   16 CPU, 128GB RAM   1 - 50         200GB

Default Storage

If you created, updated or upgraded your cluster after 02.03.2023, it has been migrated to the VMware CSI storage driver. This means that you should use the caas-persistent-storage storage class. Please read more here.

If you are still running on the legacy VCP storage driver, then use the pks-default-thick-storage storage class.
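
You can check which storage classes your cluster currently exposes; the result depends on whether the cluster has already been migrated to CSI:

$ kubectl get storageclass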

Creating a persistent volume claim will trigger the creation of a ReadWriteOnce Persistent Volume represented as a vmdk volume in the underlying vSphere Datastore associated with your cluster. Each cluster comes with an allowance of 1TB. This allowance can be extended as a day-2 action.
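
A minimal sketch of a PersistentVolumeClaim against the CSI storage class described above; the claim name and size are hypothetical:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce                 # the access mode backed by vmdk volumes
  storageClassName: caas-persistent-storage
  resources:
    requests:
      storage: 10Gi               # counts against the 1TB cluster allowance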

Using hostPath as storage is highly discouraged, since updates performed by Swisscom might delete your worker node.

You can attach up to 45 PVs per worker node.

Kubernetes Services

Services of type LoadBalancer take an IP from the floating pool of your environment. Your workload will be accessible from this IP address. Please note that you can ignore the first IP (169.254.x.y), which is an internal artifact.

$ kubectl get svc http-lb
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP                   PORT(S)        AGE
http-lb   LoadBalancer   10.100.200.155   169.254.128.3,172.16.200.27   80:32499/TCP   28s
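
For reference, a minimal sketch of a Service manifest that would produce output similar to the above; the selector and ports are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: http-lb
spec:
  type: LoadBalancer              # requests an IP from the floating pool
  selector:
    app: http-app                 # hypothetical label of the backing pods
  ports:
  - port: 80                      # port exposed on the external IP
    targetPort: 8080              # hypothetical container port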

The Service type NodePort is not supported by TKGi.

Upgrade procedure

Upgrading a Swisscom-provisioned (and TKGi-based) Kubernetes cluster updates the Tanzu Kubernetes Grid Integrated Edition version (TKGi system processes and pods) and the Kubernetes version of the cluster.

The upgrade procedure includes the following steps:

  1. The cluster is positioned in a queue for upgrading (this allows simultaneous triggering of upgrades on many clusters). The cluster can stay in the backend queue for days before the upgrade is eventually processed.
  2. One extra node is added to compensate for the rolling upgrade, during which the nodes of the cluster are drained and updated one after another. Important: the extra node is not billed.
  3. The upgrade of the cluster is started:
    1. The masters are updated one by one for availability. This includes TKGi system processes and Kubernetes processes (e.g. kube-apiserver, kube-controller-manager, etc.).

    2. The nodes are updated one by one. Drain might get stuck because of customer-defined PodDisruptionBudgets (see the sketch after this list). In such cases, the drain is forced after 30 minutes.

      The node update includes TKGi system processes and Kubernetes processes (e.g. kubelet).

      Once the node is updated, it is marked as schedulable again.

  4. The extra node is removed.
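
As referenced in step 3 above, a PodDisruptionBudget that demands more available replicas than the workload can spare will block a drain until it is forced. A minimal sketch of such a budget; the name, selector and replica count are hypothetical:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                   # hypothetical name
spec:
  minAvailable: 2                 # with only 2 replicas running, this blocks every voluntary eviction
  selector:
    matchLabels:
      app: web                    # hypothetical label of the protected pods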

Upgrading a cluster can be useful even when the cluster is already on the latest version.

  1. It propagates any platform changes made by Swisscom.
  2. It can be used as a trigger for one-time migrations (e.g. storage driver migration, container runtime migration). In such cases, additional information is announced.
  3. It can be used as a trigger for regular maintenance operations (e.g. certificate rotation).
  4. It corrects any configuration drift on the cluster: manual changes to system processes will be reverted.

Important: Skipping MINOR versions when upgrading is unsupported. For example, upgrading directly from TKGi 1.11.x to 1.13.x is strongly discouraged.
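
Before requesting an upgrade, you can check which Kubernetes version your cluster currently runs:

$ kubectl version
# The Server Version line reports the Kubernetes version of the API server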

Migration to containerd CRI

A container runtime is software that can execute the containers that make up a Kubernetes pod. The kubelet process uses the container runtime interface (CRI) as an abstraction so that you can use any compatible container runtime.

In its earliest releases, Kubernetes offered compatibility with one container runtime: Docker. Later in the Kubernetes project's history, cluster operators wanted to adopt additional container runtimes. The CRI was designed to allow this kind of flexibility - and the kubelet began supporting CRI. However, because Docker existed before the CRI specification was invented, the Kubernetes project created an adapter component, dockershim. The dockershim adapter allows the kubelet to interact with Docker as if Docker were a CRI compatible runtime.

Kubernetes v1.24 removes support for Docker as a container runtime by removing the built-in dockershim component. Upgrading your cluster to Kubernetes v1.24 will force the migration to containerd.

However, you can migrate to containerd before Kubernetes v1.24 (TKGi 1.14) is released.

The migration happens automatically if you update or upgrade a cluster after 02.03.2023.

You can continue using the same Docker images after the migration, since images built with Docker are standard OCI images and run unchanged under containerd.
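
You can check which runtime your nodes currently report by looking at the CONTAINER-RUNTIME column:

$ kubectl get nodes -o wide
# The CONTAINER-RUNTIME column shows e.g. docker://20.10.x before and containerd://1.6.x after the migration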

You can find more information regarding the migration process on the official Kubernetes website.

Before Kubernetes v1.24 (TKGi 1.14) is released, we are able to roll back a cluster to Docker. If you need this, please open a support ticket.

Migration to VMware CSI storage driver

The Container Storage Interface (CSI) was designed to help Kubernetes replace its existing, in-tree storage driver mechanisms - especially vendor specific plugins. Support for using CSI drivers was introduced to make it easier to add and maintain new integrations between Kubernetes and storage backend technologies.

VMware's in-tree VCP storage driver is deprecated and should be replaced with VMware's CSI storage driver.

The migration of the existing legacy PVCs to CSI happens seamlessly once triggered.

The migration happens automatically if you update or upgrade a cluster after 02.03.2023.

However, after the cluster is migrated to use CSI, you should start using the new caas-persistent-storage storage class when requesting new PVCs. Using the old pks-default-thick-storage storage class will result in an error.

If you use additional persistent storage, you no longer need to define your custom storage class. This is explained here.

Once all clusters are migrated to CSI, the new "caas-persistent-storage" storage class will become the default one.

The CSI driver includes many improvements over the legacy VCP driver, including support for features such as volume expansion. Please read more on the official VMware website.
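
For instance, assuming volume expansion is enabled on the caas-persistent-storage class, an existing claim can be grown by patching its requested size; the claim name and sizes are hypothetical:

$ kubectl patch pvc data-claim -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'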

Volume snapshots with VMware's CSI driver will be available with TKGi 1.16.

ETCD encryption

All Kubernetes clusters created after 02.03.2023 have an encrypted ETCD database. The encryption key is managed on the provider side.

Foundation & Failure domains

Once your cluster is created, you will find 2 fields called Foundation and Failure Domains. Named after Swiss rivers such as Limmat, Aare and Sihl, the Foundation field represents the production stack where your Kubernetes clusters are running. All clusters on the same foundation are subject to the same maintenance operations, such as new Kubernetes versions, general maintenance and so on. This field will be used in our communication regarding maintenance operations or any other relevant information. Please specify it when creating a support ticket.

The Failure Domains field indicates the 3 geographic locations of the datacenters used to provide the HA setup. Master and Worker nodes are distributed across the 3 locations, so roughly one third of your compute resources (workers) runs in each location. You can influence the pod placement strategy by using the Kubernetes topology labels in your scheduling strategy, as shown in the sketch below.
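
A minimal sketch of such a placement strategy, assuming the failure domains are exposed through the standard topology.kubernetes.io/zone node label; the deployment name, labels and image are hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                       # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                # keep the per-zone replica counts within 1 of each other
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web
      containers:
      - name: web
        image: nginx:1.25         # hypothetical image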
