Upgrading Kubernetes Clusters
This document describes the manual process for upgrading the server operating systems, upgrading Kubernetes to 1.18.8, and applying the additional configuration upgrades. It includes example output, which should help with troubleshooting if the automated processes run into a problem.
All of the steps required to prepare for an installation should be completed prior to starting this process.
Server and Kubernetes Upgrades
Patch Servers
As part of quarterly upgrades, the Operating Systems for all servers need to be upgraded.
For the control plane, there isn’t a “pool” so just patch each server and reboot it. Do one server at a time and check the status of the cluster before moving to subsequent master servers in the control plane.
For the worker nodes, you’ll need to drain each worker before patching and rebooting it. First run the following command to confirm that the cluster is currently at version 1.17.6 and that all nodes are in a Ready state:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ndld0cuomkube1.internal.pri Ready master 259d v1.17.6
ndld0cuomkube2.internal.pri Ready master 259d v1.17.6
ndld0cuomkube3.internal.pri Ready master 259d v1.17.6
ndld0cuomknode1.internal.pri Ready <none> 259d v1.17.6
ndld0cuomknode2.internal.pri Ready <none> 259d v1.17.6
ndld0cuomknode3.internal.pri Ready <none> 259d v1.17.6
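If you would rather not eyeball the list, the same check can be scripted. This is a minimal sketch and not part of the original procedure: check_nodes is a hypothetical helper, and it assumes the column order shown above (STATUS in column 2, VERSION in column 5). A drained node reports Ready,SchedulingDisabled and is flagged as well.

```shell
#!/bin/sh
# Sketch: confirm every node is Ready and at the expected version.
# check_nodes is a hypothetical helper. It reads the output of
# "kubectl get nodes --no-headers" on stdin.

check_nodes() {
    expected="$1"
    awk -v want="$expected" '
        # Column 2 is STATUS, column 5 is VERSION.
        $2 != "Ready" || $5 != want { print "NOT OK: " $0; bad = 1 }
        END { exit bad + 0 }
    '
}

# Usage (run where kubectl is configured):
#   kubectl get nodes --no-headers | check_nodes v1.17.6 && echo "all nodes ready"
```

The function prints only the nodes that fail the check and returns non-zero if there are any.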
To drain a server, patch it, and then return it to the pool, follow the steps below:
$ kubectl drain [nodename] --delete-local-data --ignore-daemonsets
Then patch the server and reboot:
# yum upgrade -y
# shutdown -t 0 now -r
Finally bring the node back into the pool.
$ kubectl uncordon [nodename]
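The three steps above can be strung together for one worker at a time. The following is only a sketch under stated assumptions: patch_worker and run are hypothetical helpers, root SSH access to the node is assumed, and the 60 second pause and 15 minute timeout are illustrative values. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: drain, patch, reboot, and uncordon a single worker node.
# Assumptions (not from the original): root SSH access to the node,
# the DRY_RUN switch, the 60s pause, and the 15m timeout.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

patch_worker() {
    node="$1"
    run kubectl drain "$node" --delete-local-data --ignore-daemonsets || return 1
    run ssh "root@$node" "yum upgrade -y" || return 1
    # The reboot drops the SSH session, so a non-zero exit is expected here.
    run ssh "root@$node" "shutdown -r now" || true
    # Give the node time to go down, then wait until it reports Ready again.
    run sleep 60
    run kubectl wait --for=condition=Ready "node/$node" --timeout=15m || return 1
    run kubectl uncordon "$node"
}

# Usage:
#   DRY_RUN=1 patch_worker ndld0cuomknode1.internal.pri   # preview only
#   patch_worker ndld0cuomknode1.internal.pri
```

Run it for one node, confirm the cluster is healthy, and only then move on to the next node.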
Update Versionlock Information
Currently the clusters have locked kubernetes to version 1.17.6, kubernetes-cni to version 0.7.5, and docker to 1.13.1-161. The locks on each server need to be removed and new locks put into place for the new version of kubernetes, kubernetes-cni, and docker where appropriate.
Versionlock file location: /etc/yum/pluginconf.d/
Simply delete the existing locks:
/usr/bin/yum versionlock delete "kubelet.*"
/usr/bin/yum versionlock delete "kubectl.*"
/usr/bin/yum versionlock delete "kubeadm.*"
/usr/bin/yum versionlock delete "kubernetes-cni.*"
/usr/bin/yum versionlock delete "docker.*"
/usr/bin/yum versionlock delete "docker-common.*"
/usr/bin/yum versionlock delete "docker-client.*"
/usr/bin/yum versionlock delete "docker-rhel-push-plugin.*"
And then add in the new locks at the desired levels:
/usr/bin/yum versionlock add "kubelet-1.18.8-0.*"
/usr/bin/yum versionlock add "kubectl-1.18.8-0.*"
/usr/bin/yum versionlock add "kubeadm-1.18.8-0.*"
/usr/bin/yum versionlock add "docker-1.13.1-162.*"
/usr/bin/yum versionlock add "docker-common-1.13.1-162.*"
/usr/bin/yum versionlock add "docker-client-1.13.1-162.*"
/usr/bin/yum versionlock add "docker-rhel-push-plugin-1.13.1-162.*"
/usr/bin/yum versionlock add "kubernetes-cni-0.8.6-0.*"
Then install the updated kubernetes and docker binaries. Note that the versionlocked versions and the installed versions must match:
/usr/bin/yum install kubelet-1.18.8-0.x86_64
/usr/bin/yum install kubectl-1.18.8-0.x86_64
/usr/bin/yum install kubeadm-1.18.8-0.x86_64
/usr/bin/yum install docker-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-common-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-client-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-rhel-push-plugin-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install kubernetes-cni-0.8.6-0.x86_64
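Since the locked and installed versions must match, it is worth checking each package after the install. A minimal sketch follows; verify_installed is a hypothetical helper, and it compares only the version-release prefix so that the longer docker release string still matches.

```shell
#!/bin/sh
# Sketch: confirm an installed package matches the version that was locked.
# verify_installed is a hypothetical helper, not part of the original steps.

verify_installed() {
    pkg="$1"
    want="$2"    # version-release prefix, for example 1.18.8-0
    have="$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' "$pkg")" || {
        echo "$pkg: not installed"
        return 1
    }
    case "$have" in
        "$want"*) echo "$pkg: $have OK" ;;
        *) echo "$pkg: have $have, want $want"; return 1 ;;
    esac
}

# Usage (as root on the server being upgraded):
#   verify_installed kubelet 1.18.8-0
#   verify_installed kubernetes-cni 0.8.6-0
#   verify_installed docker 1.13.1-162
#   /usr/bin/yum versionlock list
```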
Upgrade Kubernetes
Using the kubeadm command on the first master server, you can review the plan and then upgrade the cluster:
[root@ndld0cuomkube1 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.17.6
[upgrade/versions] kubeadm version: v1.18.8
I0901 16:37:26.141057 32596 version.go:252] remote version is much newer: v1.19.0; falling back to: stable-1.18
[upgrade/versions] Latest stable version: v1.18.8
[upgrade/versions] Latest stable version: v1.18.8
[upgrade/versions] Latest version in the v1.17 series: v1.17.11
[upgrade/versions] Latest version in the v1.17 series: v1.17.11
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT AVAILABLE
Kubelet 9 x v1.17.6 v1.17.11
Upgrade to the latest version in the v1.17 series:
COMPONENT CURRENT AVAILABLE
API Server v1.17.6 v1.17.11
Controller Manager v1.17.6 v1.17.11
Scheduler v1.17.6 v1.17.11
Kube Proxy v1.17.6 v1.17.11
CoreDNS 1.6.5 1.6.7
Etcd 3.4.3 3.4.3-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.17.11
_____________________________________________________________________
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT AVAILABLE
Kubelet 9 x v1.17.6 v1.18.8
Upgrade to the latest stable version:
COMPONENT CURRENT AVAILABLE
API Server v1.17.6 v1.18.8
Controller Manager v1.17.6 v1.18.8
Scheduler v1.17.6 v1.18.8
Kube Proxy v1.17.6 v1.18.8
CoreDNS 1.6.5 1.6.7
Etcd 3.4.3 3.4.3-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.18.8
_____________________________________________________________________
There are likely newer versions of Kubernetes control plane containers available. In order to maintain consistency across all clusters, only upgrade the masters to 1.18.8.
[root@ndld0cuomkube1 ~]# kubeadm upgrade apply 1.18.8
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.18.8"
[upgrade/versions] Cluster version: v1.17.6
[upgrade/versions] kubeadm version: v1.18.8
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.8"...
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: bd6dbccfa412f07652db6f47485acd35
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 825ea808f14bdad0c2d98e038547c430
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: 1caf2ef6d0ddace3294395f89153cef9
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.8" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests766631209"
W0901 16:44:07.979317 10575 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: bd6dbccfa412f07652db6f47485acd35
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: 19eda19deaac25d2bb9327b8293ac498
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 825ea808f14bdad0c2d98e038547c430
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 9dda1d669f9a43cd117cb5cdf36b6582
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: 1caf2ef6d0ddace3294395f89153cef9
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: cb2a7e4997f70016b2a80ff8f1811ca8
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.18.8". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
Update Control Planes
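On the second and third masters, run the kubeadm upgrade apply 1.18.8 command and the control plane on each will be upgraded. Note that the upstream kubeadm documentation runs kubeadm upgrade node, rather than kubeadm upgrade apply, on the additional control plane nodes.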
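Once all three masters are done, you can confirm that every kube-apiserver pod is running the new image. This is a sketch rather than part of the original procedure: check_apiserver_images is a hypothetical helper, and the component=kube-apiserver label selector is the one that appears in the kubeadm output above.

```shell
#!/bin/sh
# Sketch: report whether each kube-apiserver pod runs the target version.
# check_apiserver_images is a hypothetical helper.

check_apiserver_images() {
    target="$1"    # for example v1.18.8
    kubectl get pods -n kube-system -l component=kube-apiserver \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[0].image}{"\n"}{end}' |
    awk -v tag=":$target" '
        NF == 0 { next }
        {
            # The image reference must end with ":<target version>".
            ok = (substr($2, length($2) - length(tag) + 1) == tag)
            print (ok ? "OK " : "NOT UPGRADED ") $0
            if (!ok) bad = 1
        }
        END { exit bad + 0 }
    '
}

# Usage:
#   check_apiserver_images v1.18.8
```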
Update File and Directory Permissions
Verify the permissions match the table below once the upgrade is complete:
Path or File | user:group | Permissions |
/etc/kubernetes/manifests/etcd.yaml | root:root | 0644 |
/etc/kubernetes/manifests/kube-apiserver.yaml | root:root | 0644 |
/etc/kubernetes/manifests/kube-controller-manager.yaml | root:root | 0644 |
/etc/kubernetes/manifests/kube-scheduler.yaml | root:root | 0644 |
/var/lib/etcd | root:root | 0700 |
/etc/kubernetes/admin.conf | root:root | 0644 |
/etc/kubernetes/scheduler.conf | root:root | 0644 |
/etc/kubernetes/controller-manager.conf | root:root | 0644 |
/etc/kubernetes/pki | root:root | 0755 |
/etc/kubernetes/pki/ca.crt | root:root | 0644 |
/etc/kubernetes/pki/apiserver.crt | root:root | 0644 |
/etc/kubernetes/pki/apiserver-kubelet-client.crt | root:root | 0644 |
/etc/kubernetes/pki/front-proxy-ca.crt | root:root | 0644 |
/etc/kubernetes/pki/front-proxy-client.crt | root:root | 0644 |
/etc/kubernetes/pki/sa.pub | root:root | 0644 |
/etc/kubernetes/pki/ca.key | root:root | 0600 |
/etc/kubernetes/pki/apiserver.key | root:root | 0600 |
/etc/kubernetes/pki/apiserver-kubelet-client.key | root:root | 0600 |
/etc/kubernetes/pki/front-proxy-ca.key | root:root | 0600 |
/etc/kubernetes/pki/front-proxy-client.key | root:root | 0600 |
/etc/kubernetes/pki/sa.key | root:root | 0600 |
/etc/kubernetes/pki/etcd | root:root | 0755 |
/etc/kubernetes/pki/etcd/ca.crt | root:root | 0644 |
/etc/kubernetes/pki/etcd/server.crt | root:root | 0644 |
/etc/kubernetes/pki/etcd/peer.crt | root:root | 0644 |
/etc/kubernetes/pki/etcd/healthcheck-client.crt | root:root | 0644 |
/etc/kubernetes/pki/etcd/ca.key | root:root | 0600 |
/etc/kubernetes/pki/etcd/server.key | root:root | 0600 |
/etc/kubernetes/pki/etcd/peer.key | root:root | 0600 |
/etc/kubernetes/pki/etcd/healthcheck-client.key | root:root | 0600 |
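Checking thirty paths by hand is error prone, so the table can be verified with a small helper. This is a minimal sketch: check_perm is a hypothetical function, and it relies on the GNU coreutils form of stat.

```shell
#!/bin/sh
# Sketch: compare a path's owner and mode against the table above.
# check_perm is a hypothetical helper; stat -c is the GNU coreutils form.

check_perm() {
    path="$1"
    want_owner="$2"    # user:group, for example root:root
    want_mode="$3"     # octal as written in the table, for example 0644
    have="$(stat -c '%U:%G %a' "$path")" || return 1
    # stat prints the mode without a leading zero, so strip it from the target.
    want="$want_owner ${want_mode#0}"
    if [ "$have" = "$want" ]; then
        echo "OK   $path"
    else
        echo "FAIL $path: have $have, want $want"
        return 1
    fi
}

# Usage (as root on each master):
#   check_perm /etc/kubernetes/manifests/etcd.yaml root:root 0644
#   check_perm /var/lib/etcd root:root 0700
#   check_perm /etc/kubernetes/pki/ca.key root:root 0600
```

Any FAIL line can then be corrected with chown and chmod to match the table.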
Update Manifests
During the kubeadm upgrade, the current control plane manifests are moved from /etc/kubernetes/manifests into /etc/kubernetes/tmp and new manifest files are deployed. There are multiple settings and permissions that need to be reviewed and updated before the task is considered complete.
The kubeadm-config configmap has been updated to point to bldr0cuomrepo1.internal.pri:5000; however, it and the various container configurations should be checked anyway. If the configmap was not updated or was not used, you’ll have to make the changes manually, including editing the kube-proxy daemonset configuration.
Note that when a manifest is updated, the kubelet restarts the associated static pod with the new image. There is no need to manage the pods yourself once the manifests are updated.
etcd Manifest
Verify and update etcd.yaml
- Change imagePullPolicy to Always
- Change image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000
kube-apiserver Manifest
Verify and update kube-apiserver.yaml
- Add the AlwaysPullImages and ResourceQuota admission controllers to the --enable-admission-plugins line
- Change imagePullPolicy to Always
- Change image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000
kube-controller-manager Manifest
Verify and update kube-controller-manager.yaml
- Add "- --cluster-name=kubecluster-[site]" after "- --cluster-cidr=192.168.0.0/16"
- Change imagePullPolicy to Always
- Change image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000
kube-scheduler Manifest
Verify and update kube-scheduler.yaml
- Change imagePullPolicy to Always
- Change image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000
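The two edits common to all four manifests (registry and imagePullPolicy) can be previewed with sed before touching the live files. This is only a sketch: retarget_manifest is a hypothetical helper, and it does not cover the admission plugin or cluster-name changes, which still need to be made by hand. Write the draft outside /etc/kubernetes/manifests, because the kubelet treats every file in that directory as a static pod manifest.

```shell
#!/bin/sh
# Sketch: apply the registry and imagePullPolicy edits to one manifest.
# retarget_manifest is a hypothetical helper. It writes the result to stdout
# so the change can be reviewed before replacing the live manifest.

REPO="bldr0cuomrepo1.internal.pri:5000"

retarget_manifest() {
    sed -e "s#k8s\.gcr\.io#${REPO}#" \
        -e 's#\(imagePullPolicy:\).*#\1 Always#' \
        "$1"
}

# Usage (on each master; review the diff before copying the file into place):
#   retarget_manifest /etc/kubernetes/manifests/etcd.yaml > /tmp/etcd.yaml
#   diff /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
```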
Update kube-proxy
You’ll need to edit the kube-proxy daemonset to change the imagePullPolicy. Check the image tag at the same time.
$ kubectl edit daemonset kube-proxy -n kube-system
- Change imagePullPolicy to Always.
- Change image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000
Save the changes.
Update coredns
You’ll need to edit the coredns deployment to change the imagePullPolicy. Check the image tag at the same time.
$ kubectl edit deployment coredns -n kube-system
- Change imagePullPolicy to Always
- Change image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000
Save the changes.
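After saving, you can read the values back for both kube-proxy and coredns without reopening the editor. A minimal sketch, assuming the container of interest is the first one in each pod template; show_image is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the image and pull policy of the first container in a workload.
# show_image is a hypothetical helper.

show_image() {
    kind="$1"
    name="$2"
    kubectl get "$kind" "$name" -n kube-system \
        -o jsonpath='{.spec.template.spec.containers[0].image}{" "}{.spec.template.spec.containers[0].imagePullPolicy}{"\n"}'
}

# Usage:
#   show_image daemonset kube-proxy
#   show_image deployment coredns
# Each line should start with bldr0cuomrepo1.internal.pri:5000 and end with Always.
```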
Restart kubelet
Once done, kubelet and docker need to be restarted on all nodes.
systemctl daemon-reload
systemctl restart kubelet
systemctl restart docker
Verify
Once kubelet has been restarted on all nodes, verify all nodes are at 1.18.8.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ndld0cuomkube1.internal.pri Ready master 259d v1.18.8
ndld0cuomkube2.internal.pri Ready master 259d v1.18.8
ndld0cuomkube3.internal.pri Ready master 259d v1.18.8
ndld0cuomknode1.internal.pri Ready <none> 259d v1.18.8
ndld0cuomknode2.internal.pri Ready <none> 259d v1.18.8
ndld0cuomknode3.internal.pri Ready <none> 259d v1.18.8
Configuration Upgrades
Configuration files are on the tool servers (lnmt1cuomtool11) in the /usr/local/admin/playbooks/cschelin/kubernetes/configurations directory. The steps below expect you to be in that directory when directed to apply configurations.
Calico Upgrade
In the calico directory, run the following command:
$ kubectl apply -f calico.yaml
configmap/calico-config configured
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org configured
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrole.rbac.authorization.k8s.io/calico-node unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-node unchanged
daemonset.apps/calico-node configured
serviceaccount/calico-node unchanged
deployment.apps/calico-kube-controllers configured
serviceaccount/calico-kube-controllers unchanged
After calico.yaml is applied, the calico-kube-controllers pod restarts, and then the calico-node pods restart to retrieve the updated image.
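Rather than watching the pods by hand, you can wait for both rollouts to finish. This is a sketch under assumptions: wait_for_calico is a hypothetical helper, the kube-system namespace is assumed (check calico.yaml for the namespace actually used), and the 10 minute timeout is an illustrative value. The workload names come from the apply output above.

```shell
#!/bin/sh
# Sketch: block until the Calico workloads finish rolling out.
# wait_for_calico is a hypothetical helper. The kube-system namespace and
# the 10 minute timeout are assumptions; adjust them to match calico.yaml.

wait_for_calico() {
    kubectl rollout status deployment/calico-kube-controllers -n kube-system --timeout=10m &&
    kubectl rollout status daemonset/calico-node -n kube-system --timeout=10m
}

# Usage:
#   wait_for_calico && echo "calico rollout complete"
```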
Pull the calicoctl binary and copy it to /usr/local/bin, then verify the version. Note that this has likely already been done on the tool server, so check the installed version before pulling the binary.
$ curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.16.0/calicoctl
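The copy and verify steps are not shown above, so here is one way they might look. This is a minimal sketch: install_calicoctl is a hypothetical helper, and the destination defaults to /usr/local/bin as described in the text.

```shell
#!/bin/sh
# Sketch: install a downloaded calicoctl binary.
# install_calicoctl is a hypothetical helper; the destination directory
# defaults to /usr/local/bin.

install_calicoctl() {
    src="$1"
    dest="${2:-/usr/local/bin}"
    # install(1) copies the file and sets the executable mode in one step.
    install -m 0755 "$src" "$dest/calicoctl"
}

# Usage (as root, from the directory holding the downloaded binary):
#   install_calicoctl ./calicoctl
#   calicoctl version
```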
Verification
Verify the permissions of the files once the upgrade is complete.
Path or File | user:group | Permissions |
/etc/cni/net.d/10-calico.conflist | root:root | 0644 |
/etc/cni/net.d/calico-kubeconfig | root:root | 0644 |
metrics-server Upgrade
In the metrics-server directory, run the following command:
$ kubectl apply -f components.yaml
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
serviceaccount/metrics-server unchanged
deployment.apps/metrics-server configured
service/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
Once the metrics-server deployment has been updated, the pod will restart.
kube-state-metrics Upgrade
In the kube-state-metrics directory, run the following command:
$ kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics configured
clusterrole.rbac.authorization.k8s.io/kube-state-metrics configured
deployment.apps/kube-state-metrics configured
serviceaccount/kube-state-metrics configured
service/kube-state-metrics configured
Once the kube-state-metrics deployment is updated, the pod will restart.
Filebeat Upgrade
Filebeat sends logs to Elastic Stack clusters in four environments. Filebeat itself is installed on all clusters. Ensure you’re managing the correct cluster when upgrading the filebeat container, as configurations are specific to each cluster.
Change to the appropriate cluster context directory and run the following command:
$ kubectl apply -f filebeat-kubernetes.yaml
Verification
Monitor each cluster after applying the configuration. You should see the filebeat containers restart and return to a Running state.
$ kubectl get pods -n monitoring -o wide
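To spot stragglers quickly, the pod list can be filtered down to anything that is not fully up. A minimal sketch, assuming the default column order (NAME, READY, STATUS); pods_not_ready is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list pods that are not Running or not fully ready.
# pods_not_ready is a hypothetical helper. It reads the output of
# "kubectl get pods --no-headers" on stdin.

pods_not_ready() {
    awk '
        {
            # Column 2 is READY in the form ready/total, column 3 is STATUS.
            split($2, ready, "/")
            if ($3 != "Running" || ready[1] != ready[2]) {
                print $1, $2, $3
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage:
#   kubectl get pods -n monitoring --no-headers | pods_not_ready
```

No output and a zero exit status mean every pod in the namespace is Running and ready.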