Backing Up A vCenter Appliance

Backing up the server data is actually built into the 6.5 vCenter Appliance. Log in to https://appliance:5480 and, under Summary, select Backup.

A couple of interesting tips came up as I proceeded, though.

Backup logs are written to /var/log/vmware/applmgmt as backup.log, which can help determine why a backup isn’t working.

In the Appliance, check your settings. My DNS server and Default Gateway were incorrect and no Time Server was set. I made those updates, but kept getting an error for the Default Gateway. I was able to correct it from the command line on the server itself.

# /opt/vmware/share/vami/vami_config_net

 Main Menu

0)      Show Current Configuration (scroll with Shift-PgUp/PgDown)
1)      Exit this program
2)      Default Gateway
3)      Hostname
4)      DNS
5)      Proxy Server
6)      IP Address Allocation for eth0
Enter a menu number [0]: 2

Warning: if any of the interfaces for this VM use DHCP,
the Hostname, DNS, and Gateway parameters will be
overwritten by information from the DHCP server.

Type Ctrl-C to go back to the Main Menu

0)      eth0
Choose the interface to associate with default gateway [0]:
Gateway will be associated with eth0
IPv4 Default Gateway [192.168.1.254]:
IPv6 Default Gateway []:
Reconfiguring eth0...
net.ipv6.conf.eth0.disable_ipv6 = 1
Network parameters successfully changed to requested values

 Main Menu

0)      Show Current Configuration (scroll with Shift-PgUp/PgDown)
1)      Exit this program
2)      Default Gateway
3)      Hostname
4)      DNS
5)      Proxy Server
6)      IP Address Allocation for eth0
Enter a menu number [0]: 1

For the actual backup, the Location field needs a leading slash, and the path is absolute. So 192.168.104.60/home/cschelin/vcenter/backups. The process also creates the directory for you, the equivalent of mkdir -p /home/cschelin/vcenter/backups.
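Since a missing leading slash is easy to overlook, the path can be sanity-checked before it goes into the Location field. A minimal sketch (the check_backup_path helper is mine, not part of VMware’s tooling):

```shell
# Hypothetical helper: verify a backup path is absolute (leading slash)
# before it is entered into the appliance's Location field.
check_backup_path() {
  case "$1" in
    /*) echo "ok: $1 is absolute" ;;
    *)  echo "error: $1 must start with /" ; return 1 ;;
  esac
}

check_backup_path /home/cschelin/vcenter/backups
```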

The other error was that the statsmonitor service wasn’t running. A bit of hunting turned that up as well.

# service-control --start vmware-statsmonitor
Perform start operation. vmon_profile=None, svc_names=['vmware-statsmonitor'], include_coreossvcs=False, include_leafossvcs=False
2020-11-25T16:39:26.028Z   Service statsmonitor state STOPPED

Successfully started service statsmonitor

And once that was done, I had a successful backup of the database.

Posted in Computers, VMware

Upgrading VMware vCenter

I’m upgrading my VMware cluster to support VMware 7. Currently I’m running 6.5 on three older Dell R710s and a Dell R410. I’ve rebuilt the R410 to run Ubuntu and KVM, since it’s a different VM management solution and my new work uses KVM.

The installation was going fine following the VMware instructions, and my environment is somewhat small (but not tiny 🙂 ). I did get an error though: the installer complained that my credentials for the existing 6.5 vCenter Appliance had expired. I tried logging in and was successful, so they should have been fine. The annoying thing, as noted in the log, is that it specifically says the password has not expired. So: head scratch and Google search.

2020-11-25T15:26:36.420Z - info: Password not expired
2020-11-25T15:26:36.421Z - error: sourcePrecheck: error in getting source Info: ServerFaultCode: Failed to authenticate with the guest operating system using the supplied credentials.

Found a solution. It turns out the root credentials actually had expired, but expired credentials don’t prevent me from logging in.

root@lnmt1cuomvcenter [ ~ ]# grep -i chage /var/log/messages
2020-11-25T03:51:01.687836+00:00 lnmt1cuomvcenter chage[43395]: pam_unix(chage:account): expired password for user root (root enforced)
2020-11-25T03:51:01.688283+00:00 lnmt1cuomvcenter chage[43395]: Authentication token is no longer valid; new one required
2020-11-25T03:51:01.693029+00:00 lnmt1cuomvcenter chage[43398]: pam_unix(chage:account): expired password for user root (root enforced)
2020-11-25T03:51:01.693399+00:00 lnmt1cuomvcenter chage[43398]: Authentication token is no longer valid; new one required

Well shoot. Log in, change password, and try the installation again. And it works! Success. How annoying though 🙂
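To catch this before the next upgrade, root’s password aging can be checked ahead of time with chage. A sketch (the wrapper function is mine; per the chage man page, -M -1 removes the maximum-age check):

```shell
# Hypothetical wrapper: report whether root's password has an expiry set.
# chage -l prints an account's aging information; run this as root.
check_root_expiry() {
  if chage -l root | grep -q 'Password expires.*: never'; then
    echo "root password never expires"
  else
    echo "root password has an expiry date - consider 'chage -M -1 root'"
  fi
}
```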

Posted in Computers, VMware

kubeadm kubelet Problem

Over the past year and a bit, I’ve been using kubeadm to build and upgrade my clusters, starting with Kubernetes 1.14. I switched to kubeadm from the home-grown scripts I’d initially created for the 1.2 installation and carried through 1.12, in large part for the automatic certificate renewals done during upgrades, but also because of all the changes that needed to be tracked between versions. The upgrade from 1.9 to 1.10 required a total rebuild of the clusters due to changes in the underlying networking tools, and at 1.12 the certificates had expired, causing no end of problems.

Every quarter, I’d research the upgrades, write up a page on what was changing, and create a doc on the upgrade process.

Recently, when the first Master Node was rebooted, it failed to start up. The second and third Master Nodes started without a problem. Comparing the nodes, /etc/kubernetes on the working masters had both a kubelet.conf and a bootstrap.kubelet.conf file, and the bootstrap.kubelet.conf file was refreshing the kubelet certificate. There is no bootstrap.kubelet.conf file on the first Master Node.

While certificate management is handled automatically by the kubeadm upgrade process, kubelet is not part of that process. It’s a separate binary that needs the certificate, but the certificate isn’t updated by kubeadm.

Further review found that a bug was fixed as of Kubernetes 1.17. The kubelet.conf file in older versions embeds the certificate used to access the cluster. The developers had identified that as a bug: the certificate was being renewed, but into a separate file, /var/lib/kubelet/pki/kubelet-client-current.pem, and kubelet.conf wasn’t being updated to point to it. It still contained the old, expired certificate.

Modifying the file to point to the current certificate took care of the problem and resolved it for future upgrades as well.
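For clusters bitten by this, the edit boils down to replacing the embedded certificate data in kubelet.conf with a reference to the rotated PEM file. A sketch of the substitution, assuming the stock kubeadm paths (the helper is mine; back up the file first, then restart kubelet):

```shell
# Rewrite a kubeadm kubelet.conf to reference the auto-rotated client
# certificate instead of the embedded (expired) base64 data.
# Usage: fix_kubelet_conf /etc/kubernetes/kubelet.conf
fix_kubelet_conf() {
  sed -i \
    -e 's|client-certificate-data:.*|client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem|' \
    -e 's|client-key-data:.*|client-key: /var/lib/kubelet/pki/kubelet-client-current.pem|' \
    "$1"
}
```

The rotated PEM contains both the certificate and the key, which is why both entries can point at the same file.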

The bootstrap.kubelet.conf file was identified as a security issue, and at least as of 1.18.8 (what I’m currently running), it is deleted after being used to bootstrap a new Master Node into the cluster.

Posted in Computers, Kubernetes

FreeIPA/Red Hat IDM

I’m working on bringing my 100+ servers under FreeIPA, a centralized Identity Management system. Since FreeIPA is the upstream source for Red Hat IDM, I added that to the title.

Installing the FreeIPA client on servers is dead simple. Run:

# ipa-client-install
WARNING: ntpd time&date synchronization service will not be configured as
conflicting service (chronyd) is enabled
Use --force-ntpd option to disable it and force configuration of ntpd

Discovery was successful!
Client hostname: bldr0cuomshift.internal.pri
Realm: INTERNAL.PRI
DNS Domain: internal.pri
IPA Server: lnmt1cuomifidm1.internal.pri
BaseDN: dc=internal,dc=pri

Continue to configure the system with these values? [no]: yes
Skipping synchronizing time with NTP server.
User authorized to enroll computers: admin
Password for admin@INTERNAL.PRI:
Successfully retrieved CA cert
    Subject:     CN=Certificate Authority,O=INTERNAL.PRI
    Issuer:      CN=Certificate Authority,O=INTERNAL.PRI
    Valid From:  2020-06-27 03:52:06
    Valid Until: 2040-06-27 03:52:06

Enrolled in IPA realm INTERNAL.PRI
Created /etc/ipa/default.conf
New SSSD config will be created
Configured sudoers in /etc/nsswitch.conf
Configured /etc/sssd/sssd.conf
Configured /etc/krb5.conf for IPA realm INTERNAL.PRI
trying https://lnmt1cuomifidm1.internal.pri/ipa/json
[try 1]: Forwarding 'schema' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/json'
trying https://lnmt1cuomifidm1.internal.pri/ipa/session/json
[try 1]: Forwarding 'ping' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/session/json'
[try 1]: Forwarding 'ca_is_enabled' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/session/json'
Systemwide CA database updated.
Adding SSH public key from /etc/ssh/ssh_host_rsa_key.pub
Adding SSH public key from /etc/ssh/ssh_host_ed25519_key.pub
Adding SSH public key from /etc/ssh/ssh_host_ecdsa_key.pub
[try 1]: Forwarding 'host_mod' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/session/json'
Could not update DNS SSHFP records.
SSSD enabled
Configured /etc/openldap/ldap.conf
Configured /etc/ssh/ssh_config
Configured /etc/ssh/sshd_config
Configuring internal.pri as NIS domain.
Client configuration complete.
The ipa-client-install command was successful

Then I migrate local accounts over to use IDM instead. This has been working just fine on CentOS and Red Hat 7. The script I use runs:

# getent -s sss passwd [account]

This returns only accounts that are managed in IDM, so I can then change the user and group ownership of files the local account owns.
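That IDM check can be wrapped as a guard so the destructive steps only run for accounts IDM actually knows about. A sketch (the function names and the elided migration steps are mine):

```shell
# Hypothetical guard: only proceed with migration if the account resolves
# through the sss (IDM) source rather than just the local files.
in_idm() {
  getent -s sss passwd "$1" > /dev/null 2>&1
}

migrate_account() {
  if in_idm "$1"; then
    echo "migrating $1: re-owning files, removing local entry"
    # find / -user "$1" -exec chown ... ; userdel "$1"   (elided)
  else
    echo "skipping $1: not in IDM"
  fi
}
```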

There’s an issue, though. With CentOS 8 (and I assume Red Hat 8), the command also returns information for local, non-IDM accounts, which breaks my script. Not a killer, of course, but it means I’d have to manually identify local users and make sure they’re in IDM before trying to convert them. And since the script deletes the local user, deleting a non-IDM local user would cause other problems.

# getent -s sss passwd bin
bin:x:1:1:bin:/bin:/sbin/nologin

This is unexpected behavior: with CentOS 7 the command returns nothing, but with CentOS 8 it returns the bin entry.

It turns out the sssd behavior changed. The enable_files_domain option in the [sssd] section of /etc/sssd/sssd.conf defaults to false in CentOS 7, but in CentOS 8 it defaults to true. That means local accounts are also cached by sssd and returned when querying with getent.
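The fix is a single line in the [sssd] section of /etc/sssd/sssd.conf (restart sssd afterward):

```ini
[sssd]
enable_files_domain = false
```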

After setting enable_files_domain to false, the command returns nothing, as expected:

# getent -s sss passwd bin

And I can continue adding servers to IDM.

Posted in Computers, FreeIPA

Kubernetes Manual Upgrade to 1.18.8

Upgrading Kubernetes Clusters

This documentation provides the manual process for upgrading the server Operating Systems, upgrading Kubernetes to 1.18.8, and any additional upgrades. It includes example output and should help with troubleshooting should the automated processes run into a problem.

All of the steps required to prepare for an installation should be completed prior to starting this process.

Server and Kubernetes Upgrades

Patch Servers

As part of quarterly upgrades, the Operating Systems for all servers need to be upgraded.

For the control plane, there isn’t a “pool” so just patch each server and reboot it. Do one server at a time and check the status of the cluster before moving to subsequent master servers in the control plane.

For the worker nodes, you’ll need to drain each worker before patching and rebooting. Run the following command to confirm both the current version (1.17.6) and that all nodes are in a Ready state:

$ kubectl get nodes
NAME                           STATUS   ROLES    AGE    VERSION
ndld0cuomkube1.internal.pri    Ready    master   259d   v1.17.6
ndld0cuomkube2.internal.pri    Ready    master   259d   v1.17.6
ndld0cuomkube3.internal.pri    Ready    master   259d   v1.17.6
ndld0cuomknode1.internal.pri   Ready    <none>   259d   v1.17.6
ndld0cuomknode2.internal.pri   Ready    <none>   259d   v1.17.6
ndld0cuomknode3.internal.pri   Ready    <none>   259d   v1.17.6

To drain a server, patch, and then return the server to the pool, follow the steps below:

$ kubectl drain [nodename] --delete-local-data --ignore-daemonsets

Then patch the server and reboot:

# yum upgrade -y
# shutdown -r now

Finally, bring the node back into the pool:

$ kubectl uncordon [nodename]
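The drain/patch/uncordon cycle can be wrapped in a small helper and run per node — a sketch, assuming passwordless ssh to each node (the helper and the ssh step are my own, not from the Kubernetes docs):

```shell
# Hypothetical per-node patch cycle: drain, patch over ssh, reboot, uncordon.
patch_node() {
  node="$1"
  kubectl drain "$node" --delete-local-data --ignore-daemonsets
  ssh "root@$node" 'yum upgrade -y && shutdown -r now'
  # ...wait for the node to come back up and report Ready, then:
  kubectl uncordon "$node"
}
```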

Update Versionlock Information

Currently the clusters have locked kubernetes to version 1.17.6, kubernetes-cni to version 0.7.5, and docker to 1.13.1-161. The locks on each server need to be removed and new locks put into place for the new version of kubernetes, kubernetes-cni, and docker where appropriate.

Versionlock file location: /etc/yum/pluginconf.d/

Simply delete the existing locks:

/usr/bin/yum versionlock delete "kubelet.*"
/usr/bin/yum versionlock delete "kubectl.*"
/usr/bin/yum versionlock delete "kubeadm.*"
/usr/bin/yum versionlock delete "kubernetes-cni.*"
/usr/bin/yum versionlock delete "docker.*"
/usr/bin/yum versionlock delete "docker-common.*"
/usr/bin/yum versionlock delete "docker-client.*"
/usr/bin/yum versionlock delete "docker-rhel-push-plugin.*"

And then add in the new locks at the desired levels:

/usr/bin/yum versionlock add "kubelet-1.18.8-0.*"
/usr/bin/yum versionlock add "kubectl-1.18.8-0.*"
/usr/bin/yum versionlock add "kubeadm-1.18.8-0.*"
/usr/bin/yum versionlock add "docker-1.13.1-162.*"
/usr/bin/yum versionlock add "docker-common-1.13.1-162.*"
/usr/bin/yum versionlock add "docker-client-1.13.1-162.*"
/usr/bin/yum versionlock add "docker-rhel-push-plugin-1.13.1-162.*"
/usr/bin/yum versionlock add "kubernetes-cni-0.8.6-0.*"

Then install the updated kubernetes and docker binaries. Note that the versionlocked versions and the installed version must match:

/usr/bin/yum install kubelet-1.18.8-0.x86_64
/usr/bin/yum install kubectl-1.18.8-0.x86_64
/usr/bin/yum install kubeadm-1.18.8-0.x86_64
/usr/bin/yum install docker-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-common-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-client-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-rhel-push-plugin-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install kubernetes-cni-0.8.6-0.x86_64
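Since the lock and install versions must match, it can help to drive both from a single version string. A sketch for the kubernetes packages (the helper is mine; the docker packages can be handled the same way):

```shell
# Keep the versionlock and the installed package in lockstep by deriving
# both from one version string, e.g. lock_and_install 1.18.8-0
lock_and_install() {
  for pkg in kubelet kubectl kubeadm; do
    yum versionlock add "${pkg}-$1.*"
    yum install -y "${pkg}-$1.x86_64"
  done
}
```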

Upgrade Kubernetes

Using the kubeadm command on the first master server, you can review the plan and then upgrade the cluster:

[root@ndld0cuomkube1 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.17.6
[upgrade/versions] kubeadm version: v1.18.8
I0901 16:37:26.141057   32596 version.go:252] remote version is much newer: v1.19.0; falling back to: stable-1.18
[upgrade/versions] Latest stable version: v1.18.8
[upgrade/versions] Latest stable version: v1.18.8
[upgrade/versions] Latest version in the v1.17 series: v1.17.11
[upgrade/versions] Latest version in the v1.17 series: v1.17.11

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     9 x v1.17.6   v1.17.11

Upgrade to the latest version in the v1.17 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.17.6   v1.17.11
Controller Manager   v1.17.6   v1.17.11
Scheduler            v1.17.6   v1.17.11
Kube Proxy           v1.17.6   v1.17.11
CoreDNS              1.6.5     1.6.7
Etcd                 3.4.3     3.4.3-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.17.11

_____________________________________________________________________

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     9 x v1.17.6   v1.18.8

Upgrade to the latest stable version:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.17.6   v1.18.8
Controller Manager   v1.17.6   v1.18.8
Scheduler            v1.17.6   v1.18.8
Kube Proxy           v1.17.6   v1.18.8
CoreDNS              1.6.5     1.6.7
Etcd                 3.4.3     3.4.3-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.18.8

_____________________________________________________________________

There are likely newer versions of Kubernetes control plane containers available. In order to maintain consistency across all clusters, only upgrade the masters to 1.18.8.

[root@ndld0cuomkube1 ~]# kubeadm upgrade apply 1.18.8
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.18.8"
[upgrade/versions] Cluster version: v1.17.6
[upgrade/versions] kubeadm version: v1.18.8
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.8"...
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: bd6dbccfa412f07652db6f47485acd35
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 825ea808f14bdad0c2d98e038547c430
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: 1caf2ef6d0ddace3294395f89153cef9
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.8" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests766631209"
W0901 16:44:07.979317   10575 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: bd6dbccfa412f07652db6f47485acd35
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: 19eda19deaac25d2bb9327b8293ac498
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 825ea808f14bdad0c2d98e038547c430
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 9dda1d669f9a43cd117cb5cdf36b6582
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: 1caf2ef6d0ddace3294395f89153cef9
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: cb2a7e4997f70016b2a80ff8f1811ca8
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.18.8". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

Update Control Planes

On the second and third masters, run the kubeadm upgrade apply 1.18.8 command and the control plane will be upgraded.

Update File and Directory Permissions

Verify the permissions match the table below once the upgrade is complete:

Path or File user:group Permissions
/etc/kubernetes/manifests/etcd.yaml root:root 0644
/etc/kubernetes/manifests/kube-apiserver.yaml root:root 0644
/etc/kubernetes/manifests/kube-controller-manager.yaml root:root 0644
/etc/kubernetes/manifests/kube-scheduler.yaml root:root 0644
/var/lib/etcd root:root 0700
/etc/kubernetes/admin.conf root:root 0644
/etc/kubernetes/scheduler.conf root:root 0644
/etc/kubernetes/controller-manager.conf root:root 0644
/etc/kubernetes/pki root:root 0755
/etc/kubernetes/pki/ca.crt root:root 0644
/etc/kubernetes/pki/apiserver.crt root:root 0644
/etc/kubernetes/pki/apiserver-kubelet-client.crt root:root 0644
/etc/kubernetes/pki/front-proxy-ca.crt root:root 0644
/etc/kubernetes/pki/front-proxy-client.crt root:root 0644
/etc/kubernetes/pki/sa.pub root:root 0644
/etc/kubernetes/pki/ca.key root:root 0600
/etc/kubernetes/pki/apiserver.key root:root 0600
/etc/kubernetes/pki/apiserver-kubelet-client.key root:root 0600
/etc/kubernetes/pki/front-proxy-ca.key root:root 0600
/etc/kubernetes/pki/front-proxy-client.key root:root 0600
/etc/kubernetes/pki/sa.key root:root 0600
/etc/kubernetes/pki/etcd root:root 0755
/etc/kubernetes/pki/etcd/ca.crt root:root 0644
/etc/kubernetes/pki/etcd/server.crt root:root 0644
/etc/kubernetes/pki/etcd/peer.crt root:root 0644
/etc/kubernetes/pki/etcd/healthcheck-client.crt root:root 0644
/etc/kubernetes/pki/etcd/ca.key root:root 0600
/etc/kubernetes/pki/etcd/server.key root:root 0600
/etc/kubernetes/pki/etcd/peer.key root:root 0600
/etc/kubernetes/pki/etcd/healthcheck-client.key root:root 0600

Update Manifests

During the kubeadm upgrade, the current control plane manifests are moved from /etc/kubernetes/manifests into /etc/kubernetes/tmp and new manifest files deployed. There are multiple settings and permissions that need to be reviewed and updated before the task is considered completed.

The kubeadm-config configmap has been updated to point to bldr0cuomrepo1.internal.pri:5000, but it and the various container configurations should be checked anyway. If it isn’t updated or used, you’ll have to make the update manually, including manually editing the kube-proxy daemonset configuration.

Note that when a manifest is updated, the associated image is reloaded. No need to manage the pods once manifests are updated.

etcd Manifest

Verify and update etcd.yaml

  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000
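Both edits are mechanical enough to script. A sketch of the substitutions (the helper is mine; review the resulting diff before relying on it):

```shell
# Hypothetical helper: swap the registry and force imagePullPolicy in a
# static pod manifest. Usage: patch_manifest /etc/kubernetes/manifests/etcd.yaml
patch_manifest() {
  sed -i \
    -e 's|k8s.gcr.io|bldr0cuomrepo1.internal.pri:5000|' \
    -e 's|imagePullPolicy:.*|imagePullPolicy: Always|' \
    "$1"
}
```

The same helper applies to the other manifests below, since they need the identical registry and pull-policy changes.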

kube-apiserver Manifest

Verify and update kube-apiserver.yaml

  • Add the AlwaysPullImages and ResourceQuota admission controllers to the --enable-admission-plugins line
  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

kube-controller-manager Manifest

Verify and update kube-controller-manager.yaml

  • Add "- --cluster-name=kubecluster-[site]" after "- --cluster-cidr=192.168.0.0/16"
  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

kube-scheduler Manifest

Verify and update kube-scheduler.yaml

  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Update kube-proxy

You’ll need to edit the kube-proxy daemonset to change the imagePullPolicy. Check the image tag at the same time.

$ kubectl edit daemonset kube-proxy -n kube-system
  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.

Update coredns

You’ll need to edit the coredns deployment to change the imagePullPolicy. Check the image tag at the same time.

$ kubectl edit deployment coredns -n kube-system
  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.

Restart kubelet

Once done, kubelet and docker need to be restarted on all nodes:

systemctl daemon-reload
systemctl restart kubelet
systemctl restart docker

Verify

Once kubelet has been restarted on all nodes, verify all nodes are at 1.18.8.

$ kubectl get nodes
NAME                          STATUS   ROLES    AGE    VERSION
ndld0cuomkube1.intrado.sqa    Ready    master   259d   v1.18.8
ndld0cuomkube2.intrado.sqa    Ready    master   259d   v1.18.8
ndld0cuomkube3.intrado.sqa    Ready    master   259d   v1.18.8
ndld0cuomknode1.intrado.sqa   Ready    <none>   259d   v1.18.8
ndld0cuomknode2.intrado.sqa   Ready    <none>   259d   v1.18.8
ndld0cuomknode3.intrado.sqa   Ready    <none>   259d   v1.18.8

Configuration Upgrades

Configuration files are on the tool servers (lnmt1cuomtool11) in the /usr/local/admin/playbooks/cschelin/kubernetes/configurations directory and the expectation is you’ll be in that directory when directed to apply configurations.

Calico Upgrade

In the calico directory, run the following command:

$ kubectl apply -f calico.yaml
configmap/calico-config configured
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org configured
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrole.rbac.authorization.k8s.io/calico-node unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-node unchanged
daemonset.apps/calico-node configured
serviceaccount/calico-node unchanged
deployment.apps/calico-kube-controllers configured
serviceaccount/calico-kube-controllers unchanged

After calico is applied, the calico-kube-controllers pod will restart and then the calico-node pod restarts to retrieve the updated image.

Pull the calicoctl binary and copy it to /usr/local/bin, then verify the version. Note that this has likely already been done on the tool server. Verify it before pulling the binary.

$ curl -O -L  https://github.com/projectcalico/calicoctl/releases/download/v3.16.0/calicoctl
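After the download, the binary still needs to be made executable and moved into place. A sketch with the destination parameterized (the helper is mine; calicoctl version is the real verification subcommand):

```shell
# Hypothetical install step for the downloaded calicoctl binary.
# Usage: install_calicoctl /usr/local/bin
install_calicoctl() {
  chmod +x calicoctl
  mv calicoctl "$1/calicoctl"
  "$1/calicoctl" version
}
```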

Verification

Verify the permissions of the files once the upgrade is complete.

Path or File user:group Permissions
/etc/cni/net.d/10-calico.conflist root:root 0644
/etc/cni/net.d/calico-kubeconfig root:root 0644

metrics-server Upgrade

In the metrics-server directory, run the following command:

$ kubectl apply -f components.yaml
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
serviceaccount/metrics-server unchanged
deployment.apps/metrics-server configured
service/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged

Once the metrics-server deployment has been updated, the pod will restart.

kube-state-metrics Upgrade

In the kube-state-metrics directory, run the following command:

$ kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics configured
clusterrole.rbac.authorization.k8s.io/kube-state-metrics configured
deployment.apps/kube-state-metrics configured
serviceaccount/kube-state-metrics configured
service/kube-state-metrics configured

Once the kube-state-metrics deployment is updated, the pod will restart.

Filebeat Upgrade

Filebeat ships logs to Elastic Stack clusters in four environments, and Filebeat itself is installed on all clusters. Make sure you’re working with the correct cluster when upgrading the filebeat container, as the configurations are specific to each cluster.

Change to the appropriate cluster context directory and run the following command:

$ kubectl apply -f filebeat-kubernetes.yaml

Verification

Monitor each cluster; you should see the filebeat containers restarting and returning to a Running state.

$ kubectl get pods -n monitoring -o wide
Posted in Computers, Kubernetes

Kubernetes Ansible Upgrade to 1.18.8

Upgrading Kubernetes Clusters

This document provides a guide to upgrading the Kubernetes clusters in the quickest manner. Much of the upgrade process can be done using Ansible Playbooks. A few processes need to be done centrally on the tool server, and the OS and control plane updates are partly manual due to the requirement to manually remove servers from the Kubernetes API pool.

In most cases, examples are not provided, as it is assumed that you are familiar with the processes and can perform and verify the updates without step-by-step reminders.

For any process that is performed with an Ansible Playbook, it is assumed you are on the lnmt1cuomtool11 server in the /usr/local/admin/playbooks/cschelin/kubernetes directory. All Ansible related steps expect to start from that directory. In addition, the application of pod configurations will be in the configurations subdirectory.

Perform Upgrades

Patch Servers

Patch the control plane master servers one at a time and ensure the cluster is healthy before continuing to the second and third master servers.

Drain each worker prior to patching and rebooting the worker node.

$ kubectl drain [nodename] --delete-local-data --ignore-daemonsets

Patch the server and reboot

yum upgrade -y
shutdown -r now

Rejoin the worker node to the pool.

kubectl uncordon [nodename]
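The drain/patch/reboot/uncordon cycle can be wrapped in a small per-worker helper. This is a sketch of the steps above, not part of the documented process; the function name, the ssh access, and the readiness wait loop are my assumptions:

```shell
# Hypothetical wrapper: drain, patch and reboot over ssh, wait, rejoin.
cycle_worker() {
  local node="$1"
  kubectl drain "$node" --delete-local-data --ignore-daemonsets || return 1
  ssh "$node" 'yum upgrade -y && shutdown -r now'
  # Wait for the node to report Ready again before rejoining it.
  until kubectl get node "$node" --no-headers | grep -q ' Ready'; do
    sleep 10
  done
  kubectl uncordon "$node"
}
```

Run it one worker at a time (e.g. `cycle_worker worker1`) so only a single node is out of the pool.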

Update Versionlock And Components

In the upgrade directory, run the update -t [tag] script. This will install yum-plugin-versionlock if missing, remove the old versionlocks, create new versionlocks for kubernetes, kubernetes-cni, and docker, and then upgrade the components.

Upgrade Kubernetes

Using the kubeadm command, upgrade the first master server.

# kubeadm upgrade apply 1.18.8

Upgrade Control Planes

On the second and third master, run the kubeadm upgrade apply 1.18.8 command and the control plane will be upgraded.

Update kube-proxy

Check the kube-proxy daemonset and update the image tag if required.

$ kubectl edit daemonset kube-proxy -n kube-system
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.
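The edit itself is a simple registry-host swap (the same swap applies to the coredns deployment in the next step). A minimal sed sketch, demonstrated on a sample image line since the tag in your daemonset may differ:

```shell
# Demonstrate the registry swap on a sample line, not the live daemonset.
echo 'image: k8s.gcr.io/kube-proxy:v1.18.8' > /tmp/image-line.txt
sed -i 's|k8s.gcr.io|bldr0cuomrepo1.internal.pri:5000|' /tmp/image-line.txt
cat /tmp/image-line.txt    # image: bldr0cuomrepo1.internal.pri:5000/kube-proxy:v1.18.8
rm -f /tmp/image-line.txt
```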

Update coredns

Check the coredns deployment and update the image tag if required.

$ kubectl edit deployment coredns -n kube-system
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.

Restart kubelet and docker

In the restart directory, run the update -t [tag] script. This will restart kubelet and docker on all servers.

Calico Upgrade

In the configurations/calico directory, run the following command:

$ kubectl apply -f calico.yaml

calicoctl Upgrade

Pull the updated calicoctl binary and copy it to /usr/local/bin.

$ curl -O -L  https://github.com/projectcalico/calicoctl/releases/download/v3.16.0/calicoctl

Update File and Directory Permissions and Manifests

In the postinstall directory, run the update -t [tag] script. This will perform the following steps.

  • Add the cluster-name to the kube-controller-manager.yaml file
  • Update the imagePullPolicy and image lines in all manifests
  • Add the AlwaysPullImages and ResourceQuota admission controllers to the kube-apiserver.yaml file.
  • Update the permissions of all files and directories.

Filebeat Upgrade

In the configurations directory, change to the appropriate cluster context directory (bldr0-0, cabo0-0, tato0-1, or lnmt1-2) and run the following command.

$ kubectl apply -f filebeat-kubernetes.yaml
Posted in Computers, Kubernetes

Kubernetes Preparation Steps For 1.18.8

Upgrading Kubernetes Clusters

The purpose of the document is to provide the background information on what is being upgraded, what versions, and the steps required to prepare for the upgrade itself. These steps are only done once. Once all these steps have been completed and all the configurations checked into gitlab, all clusters are then ready to be upgraded.

Upgrade Preparation Steps

Upgrades to the sandbox environment are done a few weeks before the official release for more in-depth testing. This includes checking the release docs, changelogs, and general operational status of the various tools in use.

Server Preparations

With the possibility of an upgrade to Spacewalk and to ensure the necessary software is installed prior to the upgrade, make sure all repositories are enabled and that the yum-plugin-versionlock software is installed.

Enable Repositories

Check the Spacewalk configuration and ensure that upgrades are coming from the local server and not from the internet.

Install yum versionlock

The critical components of Kubernetes are locked into place using the versionlock yum plugin. If not already installed, install it before beginning work.

# yum install yum-plugin-versionlock -y

Software Preparations

This section describes the updates that need to be made to the various containers that are installed in the Kubernetes clusters. Most of the changes involve updating the location to point to the local Docker repository vs pulling directly from the internet.

Ansible Playbooks

This section isn’t going to be instructions on setting up or using Ansible Playbooks. The updates to the various configurations are also saved with the Ansible Playbooks repo. You’ll make the appropriate changes to the updated configuration files and then push them back up to the gitlab server.

Update calico.yaml

In the calico directory, run the following command to get the current calico.yaml file.

$ curl https://docs.projectcalico.org/manifests/calico.yaml -O

Edit the file, search for image:, and insert the path to the local repository in front of calico:

bldr0cuomrepo1.internal.pri:5000/
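This prefix insertion can also be scripted. A minimal sed sketch, demonstrated on a sample line (the calico image name and tag are illustrative):

```shell
# Demonstrate inserting the local-repository prefix on a sample line.
echo 'image: calico/node:v3.16.0' > /tmp/calico-line.txt
sed -i 's|image: calico/|image: bldr0cuomrepo1.internal.pri:5000/calico/|' /tmp/calico-line.txt
cat /tmp/calico-line.txt    # image: bldr0cuomrepo1.internal.pri:5000/calico/node:v3.16.0
rm -f /tmp/calico-line.txt
```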

Make sure you follow the documentation to update calicoctl to 3.16.0.

Update metrics-server

In the metrics-server directory, run the following command to get the current components.yaml file:

$ wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml

Edit the file, search for image:, and replace k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Update kube-state-metrics

Updating kube-state-metrics is a bit more involved as there are several files in the distribution, but you only need a small subset. Clone the kube-state-metrics repo, or pull it if you already have it.

$ git clone https://github.com/kubernetes/kube-state-metrics.git

Once you have the repo, in the kube-state-metrics/examples/standard directory, copy all the files into the playbooks kube-state-metrics directory.

Edit the deployment.yaml file and replace quay.io with bldr0cuomrepo1.internal.pri:5000

Update filebeat-kubernetes.yaml

In the filebeat directory, run the following command to get the current filebeat-kubernetes.yaml file:

$ curl -L -O https://raw.githubusercontent.com/elastic/beats/7.9/deploy/kubernetes/filebeat-kubernetes.yaml

Change all references in the filebeat-kubernetes.yaml file from kube-system to monitoring. For a new installation, create the monitoring namespace.
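For a new installation, the monitoring namespace can be created with kubectl create namespace monitoring, or applied as a manifest:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```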

Then copy the file into each of the cluster directories and make the following changes.

DaemonSet Changes

In the DaemonSet section, replace the image location docker.elastic.co/beats/filebeat:7.9.2 with bldr0cuomrepo1.internal.pri:5000/beats/filebeat:7.9.2. This pulls the image from our local repository rather than from the Internet.

In order for the search and replace script to work the best, make the following changes:

        - name: ELASTICSEARCH_HOST
          value: "<elasticsearch>"
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: ""
        - name: ELASTICSEARCH_PASSWORD
          value: ""

In addition, remove the following lines. They confuse the container if they exist.

        - name: ELASTIC_CLOUD_ID
          value:
        - name: ELASTIC_CLOUD_AUTH
          value:

Add the default username and password to the following lines as noted:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME:elastic}
      password: ${ELASTICSEARCH_PASSWORD:changeme}

ConfigMap Changes

In the ConfigMap section, activate the filebeat.autodiscover section by uncommenting it and delete the filebeat.inputs configuration section. In the filebeat.autodiscover section, make the following three changes as noted with comments.

filebeat.autodiscover:
  providers:
    - type: kubernetes
      host: ${NODE_NAME}                          # rename node to host
      hints.enabled: true
      hints.default_config.enabled: false         # add this line
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
        exclude_lines: ["^\\s+[\\-`('.|_]"]  # drop asciiart lines  # add this line

In the processors section, remove the cloud.id and cloud.auth lines, add the following uncommented lines, and change DEPLOY_ENV to the environment filebeat is being deployed to: dev, sqa, staging, or prod.

# Add deployment environment field to every event to make it easier to sort between Dev and SQA logs.
# DEPLOY_ENV values: dev, sqa, staging, or prod
   - add_fields:
       target: ''
       fields:
         environment: 'DEPLOY_ENV'

Elastic Stack in Dev and QA

This Elastic Stack cluster is used by the Development and QA Kubernetes clusters. Update the files in the bldr0-0 and cabo0-0 subdirectories.

- name: ELASTICSEARCH_HOST
  value: bldr0cuomemstr1.internal.pri

Elastic Stack in Staging

This Elastic Stack cluster is used by the Staging Kubernetes cluster. Update the files in the tato0-1 subdirectory.

- name: ELASTICSEARCH_HOST
  value: tato0cuomelkmstr1.internal.pri

Elastic Stack in Production

This Elastic Stack cluster is used by the Production Kubernetes Cluster. Update the file in the lnmt1-2 subdirectory.

- name: ELASTICSEARCH_HOST
  value: lnmt1cuomelkmstr1.internal.pri
Posted in Computers, Kubernetes

Kubernetes Upgrade to 1.18.8

Upgrading Kubernetes Clusters

The following lists what software and pods will be upgraded during this quarter.

  • Upgrade the Operating System
  • Upgrade Kubernetes
    • Upgrade kubeadm, kubectl, and kubelet RPMs from 1.17.6 to 1.18.8.
    • Upgrade kubernetes-cni RPM from 0.7.5-0 to 0.8.6-0.
    • kube-apiserver is upgraded from 1.17.6 to 1.18.8.
    • kube-controller-manager is upgraded from 1.17.6 to 1.18.8.
    • kube-scheduler is upgraded from 1.17.6 to 1.18.8.
    • kube-proxy is upgraded from 1.17.6 to 1.18.8.
    • coredns is upgraded from 1.6.5 to 1.6.7.
    • etcd maintains at the current version of 3.4.3-0.
  • Upgrade Calico from 3.14.1 to 3.16.0.
  • Upgrade Filebeat from 7.8.0 to 7.9.2.
  • Upgrade docker from 1.13.1-161 to 1.13.1-162.
  • metrics-server is upgraded from 0.3.6 to 0.3.7.
  • kube-state-metrics is upgraded from 1.9.5 to 1.9.7.

Unchanged Products

There are no unchanged products this quarter.

Upgrade Notes

The following notes provide information on what changes might affect users of the clusters when upgrading from one version to the next. The notes reflect what I think is relevant to the environment, so no notes on Azure or OpenShift will be listed. For more details, click on the provided links. If something is found that might be relevant, please respond and I'll check it out and add it in.

Kubernetes Core

The following notes reflect changes that might be relevant between the currently installed 1.17.6 and 1.18.8, the target upgrade for Q4. While I'm trying not to miss anything, if you're not sure, check the links to see if any changes apply to your product or project.

  • 1.17.7 – kubernetes-cni upgraded to 0.8.6.
  • 1.17.8 – Nothing of interest. Note that there’s a 1.17.8-rc1 as well.
  • 1.17.9 – Privilege escalation patch: CVE-2020-8559. DOS patch: CVE-2020-8557.
  • 1.17.10 – Do not use this release; artifacts are not complete.
  • 1.17.11 – A note that Kubernetes is built with go 1.13.15. No other updates.
  • 1.18.0 – Lots of notes as always. Most are cloud specific (Azure mainly). Some interesting bits though:
    • kubectl debug command added, permits the creation of a sidecar in a pod to assist with troubleshooting a problematic container.
    • IPv6 support is now beta in 1.18.
    • Deprecated APIs
      • apps/v1beta1, apps/v1beta2 – apps/v1
      • daemonsets, deployments, replicasets under extensions/v1beta1 – use apps/v1
    • New IngressClass resource added to enable better Ingress configuration
    • autoscaling/v2beta2 HPA added spec.behavior
    • startupProbe (beta) for slow starting containers.
  • 1.18.1 – Nothing much to note
  • 1.18.2 – Fix conversion error for HPA objects with invalid annotations
  • 1.18.3 – init containers are now considered for calculation of resource requests when scheduling
  • 1.18.4 – kubernetes-cni upgraded to 0.8.6
  • 1.18.5 – Nothing of interest. Note there’s a 1.18.5-rc1 as well.
  • 1.18.6 – Privilege escalation patch; CVE-2020-8559. DOS patch; CVE-2020-8557.
  • 1.18.7 – Do not use this release; artifacts are not complete.
  • 1.18.8 – Kubernetes now built with go 1.13.15. Nothing else.

kubernetes-cni

Still searching for the release notes covering the upgrade from 0.7.5 to 0.8.6.

coredns

  • 1.6.6 – Mainly a fix for DNS Flag Day 2020, the bufsize plugin. A fix related to CVE-2019-19794.
  • 1.6.7 – Adding an expiration jitter. Resolve TXT records via CNAME.

Calico

The major release notes are on a single page; the versions are noted here to describe the upgrade for each version. For example, 3.14.1 and 3.14.2 both point to the 3.14 Release Notes. Here I'm describing the changes, if relevant, between the .0, .1, and .2 releases.

Note that currently many features of Calico haven’t been implemented yet so improvements, changes, and fixes for Calico probably don’t impact the current clusters.

  • 3.14.1 – Fix CVE-2020-13597 – IPv6 rogue router advertisement vulnerability. Added port 6443 to failsafe ports.
  • 3.14.2 – Remove unnecessary packages from cni-plugin and pod2daemon images.
  • 3.15.0 – WireGuard enabled to secure on the wire in-cluster pod traffic. The ability to migrate key/store data from etcd to use the kube-apiserver.
  • 3.15.1 – Fix service IP advertisement breaking host service connectivity.
  • 3.15.2 – Add monitor-addresses option to calico-node to continually monitor IP addresses. Handle CNI plugin panics more gracefully. Remove unnecessary packages from cni-plugin and pod2daemon images to address CVEs.
  • 3.16.0 – Supports eBPF, which requires RHEL 8.2 (not currently available to my clusters). Removed more unnecessary packages from the pod2daemon image.

Filebeat

  • 7.8.1 – Corrected base64 encoding of the monitoring.elasticsearch.api_key. Added support for timezone offsets.
  • 7.9.0 – Fixed handling for Kubernetes Update and Delete watcher events. Fixed memory leak in tcp and unix input sources. Fixed file ownership in docker images so they can be used in a secure environment. Logstash module can automatically detect the log format and process accordingly.
  • 7.9.1 – Nothing really jumped out as relevant.
  • 7.9.2 – Nothing in the release notes yet.

docker

This release addresses a CVE for a vulnerability in 1.13.1-108.

metrics-server

  • 0.3.7 – New image location. Image run as a non-root user. Single file now vs a group of files (components.yaml).

kube-state-metrics

Like Calico, the CHANGELOG is a single file. The different bullet points point to the same file, but describe the changes if relevant.

  • 1.9.6 – Just a single change related to an API mismatch.
  • 1.9.7 – Switched an apiVersion to v1 for the mutatingwebhookconfiguration file.

References

Posted in Computers, Kubernetes

Cinnamon Buns

I tried using the recipe on the website but there were so many ads making constant changes to the webpage that it was impossible to stay where the instructions were. As such, I’m copying the basic instructions here and I’ll use it for the baking attempt.

Dough

  • 1 cup warm milk
  • 2 1/2 teaspoons instant dry yeast
  • 2 large eggs at room temperature
  • 1/3 cup of salted butter (softened)
  • 4 1/2 cups all-purpose flour
  • 1 teaspoon salt
  • 1/2 cup granulated sugar
  1. Pour the warm milk in the bowl of a stand mixer and sprinkle the yeast over the top.
  2. Add the eggs, butter, salt, and sugar
  3. Add in 4 cups of the flour and mix using the beater blade just until the ingredients are barely combined. Allow the mixture to rest for 5 minutes so the ingredients can soak together.
  4. Scrape the dough off of the beater blade and remove it. Attach the dough hook.
  5. Beat the dough on medium speed, adding in up to 1/2 cup more flour if needed to form a dough. Knead for up to 7 minutes until the dough is elastic and smooth. The dough should be a little tacky and still sticking to the side of the bowl. Don’t add too much flour though.
  6. Spray a large bowl with cooking spray.
  7. Use a rubber spatula to remove the dough from the mixer bowl and place it in the greased large bowl.
  8. Cover the bowl with a towel or wax paper.
  9. Set the bowl in a warm place and allow the dough to rise until doubled. A good approach is to start the oven at a low setting, 100°F for example, turn it off when it's warm, and then put the bowl into the oven. Figure about 30 minutes for the dough to rise.
  10. When ready, put the dough on a well floured pastry mat or parchment paper and sprinkle more flour on the dough.
  11. Flour up a rolling pin and spread the dough out. It should be about 2′ by 1 1/2′ when done.
  12. Smooth the filling evenly over the rectangle.
  13. Roll the dough up starting on the long, 2′ end.
  14. Cut into 12 pieces and place in a greased baking pan.
  15. Cover the pan and let the rolls rise for 20 minutes or so.
  16. Preheat the oven to 375 degrees.
  17. Pour 1/2 cup of heavy cream over the risen rolls.
  18. Bake for 20-22 minutes or until the rolls are golden brown and the center cooked.
  19. Allow the rolls to cool.
  20. Spread the frosting over the rolls.

Filling

Simple enough. Combine the three ingredients in a bowl and mix until well combined.

  • 1/2 cup of salted butter (almost melted)
  • 1 cup packed brown sugar
  • 2 tablespoons of cinnamon

Frosting

  • 6 ounces of cream cheese (softened)
  • 1/3 cup salted butter (softened)
  • 2 cups of powdered sugar
  • 1/2 tablespoon of vanilla or maple extract
  1. Combine cream cheese and salted butter. Blend well.
  2. Add the powdered sugar and extract.

Posted in Cooking

Kubernetes Pod Schedule Prioritization

Introduction

Currently Kubernetes is not configured to treat any pod as more or less important than any other pod with the exception of critical Kubernetes pods such as the kube-apiserver, kube-scheduler, and kube-controller-manager.

Multiple products with different Service Class requirements are hosted on Kubernetes but there is no configuration that provides any prioritization of these products.

The research goal is to identify a process or configuration that lets the Applications and Operations teams ensure their products have priority when using cluster resources, for example during an unintentional failure such as a worker node crash, or an intentional one such as removing a worker node from a cluster pool for maintenance.

A secondary goal is to determine if overcommitting the Kubernetes clusters is a viable solution to resource availability.

As always, this is a summation that generally applies to my environment. For full details, links to documents are provided at the end of this document.

Service Class

Service Class is used to define service availability. This is not relevant to individual components of a product but to the overall service itself. This is a list of Service Class definitions.

  • Mission Critical Service (MCS) – 99.999% up-time.
  • Business Critical Service (BCS) – 99.9% up-time.
  • Business Essential Service (BES) – 99% up-time.
  • Business Support Service (BSS) – 98% up-time.
  • Unsupported Business Service (UBS) – No guaranteed service up-time.
  • LAB – No guaranteed service up-time.

Note that the PriorityClass design does not ensure the hosted Product satisfies the contracted Service Class. PriorityClass Objects ensure that resources are available to more critical Products should there be resource exhaustion due to overcommitment or worker node failure.

PriorityClass Objects

Kubernetes introduced PriorityClass Objects as stable in version 1.14. This object assigns a scheduling priority to a pod, allowing it to jump ahead in the scheduling queue.

  • 2,000,001,000 – This is used for critical pods running on Kubernetes nodes (system-node-critical).
  • 2,000,000,000 – This is used for critical pods which manage Kubernetes clusters (system-cluster-critical).
  • 1,000,000,000 – This level and lower is available for any product to use.
  • 0 – This is the default level for all non-critical pods.

Linux:cschelin@lnmt1cuomtool11$ kubectl get priorityclasses -A
NAME                      VALUE        GLOBAL-DEFAULT   AGE
system-cluster-critical   2000000000   false            22d
system-node-critical      2000001000   false            22d

system-node-critical Object

The following pods are assigned to the system-node-critical Object.

  • calico-node
  • kube-proxy

system-cluster-critical Object

The following pods are assigned to the system-cluster-critical Object.

  • calico-kube-controllers
  • coredns
  • etcd
  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler

PriorityClass Definitions

A PriorityClass Object lets us define a set of values which applications can use to ensure availability based on Service Class. The following values are recommended for the Kubernetes environments.

  • 7,000,000 – Critical Infrastructure Service
  • 6,000,000 – Mission Critical Service
  • 5,000,000 – Infrastructure Service
  • 4,000,000 – Business Critical Plus Service (a product that requires 99.99% up-time)
  • 3,000,000 – Business Critical Service
  • 2,000,000 – Business Essential Service
  • 1,000,000 – Business Support Service
  • 500,000 – Unsupported Business Service and LAB Services (global default)

Most of the items in the list are well-known Service Class definitions. Additional details follow for the ones I've added.

Critical Infrastructure Service

Any pod that is used by any or all other pods in the cluster, especially if the pod is used by an MCS product.

Infrastructure Service

Standard infrastructure pods such as kube-state-metrics and the metrics-server pods. This includes other services such as Prometheus and Filebeat.

Business Critical Plus Service

Currently there is no 4 9’s Service Class defined however some products have been deployed as requiring 4 9’s support. For this reason, a PriorityClass Object was created to satisfy that Service Class request.

Testing

In testing:

  1. MCS pods in a deployment will run as long as resources are available.
  2. If there are not enough resources for the lower PriorityClass deployments, pods will be started until resources are exhausted. Remaining pods will be put in a Pending state.
  3. If additional MCS pods need to start, lower PriorityClass pods will be terminated; the new MCS pods will start, and the displaced pods will remain in a Pending state.
  4. Once the additional MCS pods are not needed, they will be deleted and any Pending pods will start.
  5. For multiple MCS deployments there is no further PriorityClass ordering. If there are insufficient resources for all MCS pods to start, the remaining MCS pods will be put in a Pending state.
  6. If a lower PriorityClass pod has sufficient resources to start where a higher PriorityClass pod does not, the lower PriorityClass pod will start.

Pod Preemption

There is a PriorityClass option called preemptionPolicy which has been made available in Kubernetes 1.15. This option lets you configure a PriorityClass to not evict pods of a lower PriorityClass. The option moves pods up in the scheduling queue, however it doesn’t evict pods if cluster resources are running low.

PodDisruptionBudget

This Object lets you specify the number of pods that must remain running. However, in testing this doesn't appear to apply to PriorityClass evictions. If there are insufficient resources, pods in a lower PriorityClass will be evicted regardless of this setting. It will prevent a voluntary disruption, such as draining a worker node, if there aren't sufficient remaining pods.

Configuration Settings

For Deployments, you'd add the below defined name in the pod template as spec.template.spec.priorityClassName: [name].
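As a sketch, a Deployment carrying one of the classes defined in this document might look like this (the app name and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # hypothetical application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      priorityClassName: business-critical   # one of the PriorityClasses defined here
      containers:
      - name: example-app
        image: bldr0cuomrepo1.internal.pri:5000/example-app:1.0   # hypothetical image
```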

The following configurations are recommended for the environment.

Critical Infrastructure Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-infrastructure
value: 7000000
globalDefault: false
description: "This priority class is reserved for infrastructure services that all pods use."

Mission Critical Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mission-critical
value: 6000000
globalDefault: false
description: "This priority class is reserved for services that require 99.999% uptime."

Infrastructure Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: infrastructure
value: 5000000
globalDefault: false
description: "This priority class is reserved for infrastructure services."

Business Critical Plus Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical-plus
value: 4000000
globalDefault: false
description: "This priority class is reserved for services that require 99.99% uptime."

Business Critical Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical
value: 3000000
globalDefault: false
description: "This priority class is reserved for services that require 99.9% uptime."

Business Essential Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-essential
value: 2000000
globalDefault: false
description: "This priority class is reserved for services that require 99% uptime."

Business Support Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-support
value: 1000000
globalDefault: false
description: "This priority class is reserved for services that require 98% uptime."

Unsupported Business Service

Note the globalDefault setting here; it applies to any pod that doesn't set a PriorityClass in its Deployment.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: unsupported-business
value: 500000
globalDefault: true
description: "This priority class is reserved for services that have no uptime requirements."

PriorityClass Object Table

Linux:cschelin@lnmt1cuomtool11$ kubectl get pc -A
NAME                              VALUE        GLOBAL-DEFAULT   AGE
business-critical                 3000000      false            3d9h
business-critical-plus            4000000      false            3d9h
business-essential                2000000      false            3d9h
business-support                  1000000      false            3d9h
critical-infrastructure           7000000      false            3s
infrastructure                    5000000      false            6s
mission-critical                  6000000      false            14s
system-cluster-critical           2000000000   false            25d
system-node-critical              2000001000   false            25d
unsupported-business              500000       true             3d9h

Conclusion

The above recommendations provide a reliable way of ensuring critical products that are deployed to Kubernetes will have the necessary resources to respond appropriately to requests.

In order to prevent service disruption, ensure any deployed product doesn’t consume more resources than the minimum required for all deployed products.

This might also permit overcommitting resources in the clusters.

References

Posted in Computers, Kubernetes