FreeIPA/Red Hat IDM

I’m working on bringing my 100+ servers under FreeIPA, a centralized Identity Management system. Since FreeIPA is the upstream project for Red Hat IDM, I added that to the title.

Installing the FreeIPA client on servers is bog simple. Run:

# ipa-client-install
WARNING: ntpd time&date synchronization service will not be configured as
conflicting service (chronyd) is enabled
Use --force-ntpd option to disable it and force configuration of ntpd

Discovery was successful!
Client hostname: bldr0cuomshift.internal.pri
Realm: INTERNAL.PRI
DNS Domain: internal.pri
IPA Server: lnmt1cuomifidm1.internal.pri
BaseDN: dc=internal,dc=pri

Continue to configure the system with these values? [no]: yes
Skipping synchronizing time with NTP server.
User authorized to enroll computers: admin
Password for admin@INTERNAL.PRI:
Successfully retrieved CA cert
    Subject:     CN=Certificate Authority,O=INTERNAL.PRI
    Issuer:      CN=Certificate Authority,O=INTERNAL.PRI
    Valid From:  2020-06-27 03:52:06
    Valid Until: 2040-06-27 03:52:06

Enrolled in IPA realm INTERNAL.PRI
Created /etc/ipa/default.conf
New SSSD config will be created
Configured sudoers in /etc/nsswitch.conf
Configured /etc/sssd/sssd.conf
Configured /etc/krb5.conf for IPA realm INTERNAL.PRI
trying https://lnmt1cuomifidm1.internal.pri/ipa/json
[try 1]: Forwarding 'schema' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/json'
trying https://lnmt1cuomifidm1.internal.pri/ipa/session/json
[try 1]: Forwarding 'ping' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/session/json'
[try 1]: Forwarding 'ca_is_enabled' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/session/json'
Systemwide CA database updated.
Adding SSH public key from /etc/ssh/ssh_host_rsa_key.pub
Adding SSH public key from /etc/ssh/ssh_host_ed25519_key.pub
Adding SSH public key from /etc/ssh/ssh_host_ecdsa_key.pub
[try 1]: Forwarding 'host_mod' to json server 'https://lnmt1cuomifidm1.internal.pri/ipa/session/json'
Could not update DNS SSHFP records.
SSSD enabled
Configured /etc/openldap/ldap.conf
Configured /etc/ssh/ssh_config
Configured /etc/ssh/sshd_config
Configuring internal.pri as NIS domain.
Client configuration complete.
The ipa-client-install command was successful

Then I migrate local accounts over to use IDM instead. This has been working just fine on CentOS and Red Hat 7. The script I use runs:

# getent -s sss passwd [account]

This returns only accounts that are managed in IDM, so I can then change the user and group ownership of any files the local account owns.

There’s an issue though. With CentOS 8 (and I assume Red Hat 8), the command also returns information for local, non-IDM accounts. This is unexpected behavior and breaks my script. Not a killer of course, but it means I’d have to manually identify local users and make sure they’re in IDM before trying to convert them. And since the script deletes the local user, it causes other problems if it deletes a non-IDM local user.

# getent -s sss passwd bin
bin:x:1:1:bin:/bin:/sbin/nologin

With CentOS 7, this returns nothing, but with CentOS 8, it returns the local bin account.

What happened is that the sssd behavior changed. The enable_files_domain option under [sssd] in /etc/sssd/sssd.conf defaults to false on CentOS 7, but on CentOS 8 the default is now true. This means local accounts are also cached by sssd and are returned when querying with getent.
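
A minimal sketch of the fix on CentOS 8, assuming the stock IPA-generated /etc/sssd/sssd.conf (the sed one-liner and cache flush are my additions, not part of my original script):

# Disable the implicit files domain so 'getent -s sss' only returns IDM accounts
grep -q '^enable_files_domain' /etc/sssd/sssd.conf || \
  sed -i '/^\[sssd\]/a enable_files_domain = false' /etc/sssd/sssd.conf
systemctl restart sssd
sss_cache -E    # flush any already-cached local entries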

After making the change, the following now happens as expected:

# getent -s sss passwd bin

Then I can continue adding servers to IDM.

If you find you need to remove a system from IDM, first check /home to see what accounts exist and compare against /etc/passwd. Basically, you need to change the ownership of every file to match the new local user you’ll create.

# Note accounts with home directories, then compare against /etc/passwd
cd /home
ls -l
# Remove the system from IDM
ipa-client-install --uninstall
# Find everything owned by the IDM UID/GID so it can be re-owned later
find / -gid [id] -print
find / -uid [id] -print
# Recreate the local account and set its password
useradd -c "comment" -d /home/[homedir] -s /bin/ksh -m [username]
passwd [username]
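
The re-owning step described above isn’t in the command list, so here’s a hedged sketch using the same placeholder style (run after the new local account exists):

# Re-own anything still tagged with the old IDM UID/GID found above
find / -xdev -uid [id] -exec chown [username] {} +
find / -xdev -gid [id] -exec chgrp [username] {} +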

Kubernetes Manual Upgrade to 1.18.8

Upgrading Kubernetes Clusters

This documentation provides the manual process for upgrading the server operating systems, upgrading Kubernetes to 1.18.8, and performing any additional upgrades. It includes example output and should help with troubleshooting should the automated process experience a problem.

All of the steps required to prepare for an installation should be completed prior to starting this process.

Server and Kubernetes Upgrades

Patch Servers

As part of quarterly upgrades, the Operating Systems for all servers need to be upgraded.

For the control plane, there isn’t a “pool”, so just patch each server and reboot it. Do one server at a time and check the status of the cluster before moving to the next master server in the control plane.
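
A quick health check between masters (my own addition, not part of the original runbook) might be:

$ kubectl get nodes
$ kubectl get pods -n kube-system -o wide | egrep 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd'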

For the worker nodes, you’ll need to drain each worker before patching and rebooting. Run the following command to confirm both the current version, 1.17.6, and that all nodes are in a Ready state to be patched:

$ kubectl get nodes
NAME                           STATUS   ROLES    AGE    VERSION
ndld0cuomkube1.internal.pri    Ready    master   259d   v1.17.6
ndld0cuomkube2.internal.pri    Ready    master   259d   v1.17.6
ndld0cuomkube3.internal.pri    Ready    master   259d   v1.17.6
ndld0cuomknode1.internal.pri   Ready    <none>   259d   v1.17.6
ndld0cuomknode2.internal.pri   Ready    <none>   259d   v1.17.6
ndld0cuomknode3.internal.pri   Ready    <none>   259d   v1.17.6

To drain a server, patch, and then return the server to the pool, follow the steps below:

$ kubectl drain [nodename] --delete-local-data --ignore-daemonsets

Then patch the server and reboot:

# yum upgrade -y
# shutdown -t 0 -r now

Finally bring the node back into the pool.

$ kubectl uncordon [nodename]

Update Versionlock Information

Currently the clusters have locked kubernetes to version 1.17.6, kubernetes-cni to version 0.7.5, and docker to 1.13.1-161. The locks on each server need to be removed and new locks put into place for the new version of kubernetes, kubernetes-cni, and docker where appropriate.

Versionlock file location: /etc/yum/pluginconf.d/

Simply delete the existing locks:

/usr/bin/yum versionlock delete "kubelet.*"
/usr/bin/yum versionlock delete "kubectl.*"
/usr/bin/yum versionlock delete "kubeadm.*"
/usr/bin/yum versionlock delete "kubernetes-cni.*"
/usr/bin/yum versionlock delete "docker.*"
/usr/bin/yum versionlock delete "docker-common.*"
/usr/bin/yum versionlock delete "docker-client.*"
/usr/bin/yum versionlock delete "docker-rhel-push-plugin.*"

And then add in the new locks at the desired levels:

/usr/bin/yum versionlock add "kubelet-1.18.8-0.*"
/usr/bin/yum versionlock add "kubectl-1.18.8-0.*"
/usr/bin/yum versionlock add "kubeadm-1.18.8-0.*"
/usr/bin/yum versionlock "docker-1.13.1-162.*"
/usr/bin/yum versionlock "docker-common-1.13.1-162.*"
/usr/bin/yum versionlock "docker-client-1.13.1-162.*"
/usr/bin/yum versionlock "docker-rhel-push-plugin-1.13.1-162.*"
/usr/bin/yum versionlock "kubernetes-cni-0.8.6-0.*"

Then install the updated kubernetes and docker binaries. Note that the versionlocked versions and the installed version must match:

/usr/bin/yum install kubelet-1.18.8-0.x86_64
/usr/bin/yum install kubectl-1.18.8-0.x86_64
/usr/bin/yum install kubeadm-1.18.8-0.x86_64
/usr/bin/yum install docker-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-common-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-client-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install docker-rhel-push-plugin-1.13.1-162.git64e9980.el7_8.x86_64
/usr/bin/yum install kubernetes-cni-0.8.6-0.x86_64

Upgrade Kubernetes

Using the kubeadm command on the first master server, you can review the plan and then upgrade the cluster:

[root@ndld0cuomkube1 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.17.6
[upgrade/versions] kubeadm version: v1.18.8
I0901 16:37:26.141057   32596 version.go:252] remote version is much newer: v1.19.0; falling back to: stable-1.18
[upgrade/versions] Latest stable version: v1.18.8
[upgrade/versions] Latest stable version: v1.18.8
[upgrade/versions] Latest version in the v1.17 series: v1.17.11
[upgrade/versions] Latest version in the v1.17 series: v1.17.11

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     9 x v1.17.6   v1.17.11

Upgrade to the latest version in the v1.17 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.17.6   v1.17.11
Controller Manager   v1.17.6   v1.17.11
Scheduler            v1.17.6   v1.17.11
Kube Proxy           v1.17.6   v1.17.11
CoreDNS              1.6.5     1.6.7
Etcd                 3.4.3     3.4.3-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.17.11

_____________________________________________________________________

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     9 x v1.17.6   v1.18.8

Upgrade to the latest stable version:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.17.6   v1.18.8
Controller Manager   v1.17.6   v1.18.8
Scheduler            v1.17.6   v1.18.8
Kube Proxy           v1.17.6   v1.18.8
CoreDNS              1.6.5     1.6.7
Etcd                 3.4.3     3.4.3-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.18.8

_____________________________________________________________________

There are likely newer versions of Kubernetes control plane containers available. In order to maintain consistency across all clusters, only upgrade the masters to 1.18.8.

[root@ndld0cuomkube1 ~]# kubeadm upgrade apply 1.18.8
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.18.8"
[upgrade/versions] Cluster version: v1.17.6
[upgrade/versions] kubeadm version: v1.18.8
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 3 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.8"...
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: bd6dbccfa412f07652db6f47485acd35
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 825ea808f14bdad0c2d98e038547c430
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: 1caf2ef6d0ddace3294395f89153cef9
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.8" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests766631209"
W0901 16:44:07.979317   10575 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: bd6dbccfa412f07652db6f47485acd35
Static pod: kube-apiserver-ndld0cuomkube1.internal.pri hash: 19eda19deaac25d2bb9327b8293ac498
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 825ea808f14bdad0c2d98e038547c430
Static pod: kube-controller-manager-ndld0cuomkube1.internal.pri hash: 9dda1d669f9a43cd117cb5cdf36b6582
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-09-01-16-44-07/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: 1caf2ef6d0ddace3294395f89153cef9
Static pod: kube-scheduler-ndld0cuomkube1.internal.pri hash: cb2a7e4997f70016b2a80ff8f1811ca8
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.18.8". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

Update Control Planes

On the second and third master, run the kubeadm upgrade apply 1.18.8 command and the control plane will be upgraded.
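
To confirm all three masters are running the 1.18.8 control plane images afterwards, a check along these lines (my addition) works:

$ kubectl get pods -n kube-system -l component=kube-apiserver \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'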

Update File and Directory Permissions

Verify the permissions match the table below once the upgrade is complete:

Path or File                                              user:group   Permissions
/etc/kubernetes/manifests/etcd.yaml                       root:root    0644
/etc/kubernetes/manifests/kube-apiserver.yaml             root:root    0644
/etc/kubernetes/manifests/kube-controller-manager.yaml    root:root    0644
/etc/kubernetes/manifests/kube-scheduler.yaml             root:root    0644
/var/lib/etcd                                             root:root    0700
/etc/kubernetes/admin.conf                                root:root    0644
/etc/kubernetes/scheduler.conf                            root:root    0644
/etc/kubernetes/controller-manager.conf                   root:root    0644
/etc/kubernetes/pki                                       root:root    0755
/etc/kubernetes/pki/ca.crt                                root:root    0644
/etc/kubernetes/pki/apiserver.crt                         root:root    0644
/etc/kubernetes/pki/apiserver-kubelet-client.crt          root:root    0644
/etc/kubernetes/pki/front-proxy-ca.crt                    root:root    0644
/etc/kubernetes/pki/front-proxy-client.crt                root:root    0644
/etc/kubernetes/pki/sa.pub                                root:root    0644
/etc/kubernetes/pki/ca.key                                root:root    0600
/etc/kubernetes/pki/apiserver.key                         root:root    0600
/etc/kubernetes/pki/apiserver-kubelet-client.key          root:root    0600
/etc/kubernetes/pki/front-proxy-ca.key                    root:root    0600
/etc/kubernetes/pki/front-proxy-client.key                root:root    0600
/etc/kubernetes/pki/sa.key                                root:root    0600
/etc/kubernetes/pki/etcd                                  root:root    0755
/etc/kubernetes/pki/etcd/ca.crt                           root:root    0644
/etc/kubernetes/pki/etcd/server.crt                       root:root    0644
/etc/kubernetes/pki/etcd/peer.crt                         root:root    0644
/etc/kubernetes/pki/etcd/healthcheck-client.crt           root:root    0644
/etc/kubernetes/pki/etcd/ca.key                           root:root    0600
/etc/kubernetes/pki/etcd/server.key                       root:root    0600
/etc/kubernetes/pki/etcd/peer.key                         root:root    0600
/etc/kubernetes/pki/etcd/healthcheck-client.key           root:root    0600
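
A quick way to spot-check these values (a sketch of my own; extend the path list as needed):

# Print mode, owner:group, and path for the files and directories in the table
for f in /etc/kubernetes/manifests/*.yaml /etc/kubernetes/*.conf /var/lib/etcd \
         /etc/kubernetes/pki /etc/kubernetes/pki/* /etc/kubernetes/pki/etcd/*; do
  stat -c '%a %U:%G %n' "$f"
done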

Update Manifests

During the kubeadm upgrade, the current control plane manifests are moved from /etc/kubernetes/manifests into /etc/kubernetes/tmp and new manifest files deployed. There are multiple settings and permissions that need to be reviewed and updated before the task is considered completed.

The kubeadm-config configmap has been updated to point to bldr0cuomrepo1.internal.pri:5000; however, it and the various container configurations should be checked anyway. One issue is that if it’s not updated or used, you’ll have to make the update manually, including manually editing the kube-proxy daemonset configuration.
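
One quick check (my addition): the registry kubeadm uses is the imageRepository field in the kubeadm-config configmap.

$ kubectl -n kube-system get cm kubeadm-config -o yaml | grep imageRepository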

Note that when a manifest is updated, the associated image is reloaded. No need to manage the pods once manifests are updated.

etcd Manifest

Verify and update etcd.yaml

  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

kube-apiserver Manifest

Verify and update kube-apiserver.yaml

  • Add AlwaysPullImages and ResourceQuota admission controllers to the --enable-admission-plugins line
  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

kube-controller-manager Manifest

Verify and update kube-controller-manager.yaml

  • Add "- --cluster-name=kubecluster-[site]" after "- --cluster-cidr=192.168.0.0/16"
  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

kube-scheduler Manifest

Verify and update kube-scheduler.yaml

  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000 (a scripted sketch covering all four manifests follows)
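
The imagePullPolicy and image edits are the same in all four manifests, so a scripted sketch (assuming the manifests still reference k8s.gcr.io and imagePullPolicy: IfNotPresent, the kubeadm defaults) could be:

cd /etc/kubernetes/manifests
# Point the images at the local registry and force pulls on restart
sed -i 's|image: k8s.gcr.io/|image: bldr0cuomrepo1.internal.pri:5000/|' etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
sed -i 's|imagePullPolicy: IfNotPresent|imagePullPolicy: Always|' etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml

The cluster-name and admission-plugin additions are easier to make by hand in kube-controller-manager.yaml and kube-apiserver.yaml.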

Update kube-proxy

You’ll need to edit the kube-proxy daemonset to change the imagePullPolicy. Check the image tag at the same time.

$ kubectl edit daemonset kube-proxy -n kube-system
  • Change imagePullPolicy to Always.
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.

Update coredns

You’ll need to edit the coredns deployment to change the imagePullPolicy. Check the image tag at the same time.

$ kubectl edit deployment coredns -n kube-system
  • Change imagePullPolicy to Always
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.

Restart kubelet

Once done, kubelet and docker need to be restarted on all nodes.

systemctl daemon-reload
systemctl restart kubelet
systemctl restart docker

Verify

Once kubelet has been restarted on all nodes, verify all nodes are at 1.18.8.

$ kubectl get nodes
NAME                          STATUS   ROLES    AGE    VERSION
ndld0cuomkube1.intrado.sqa    Ready    master   259d   v1.18.8
ndld0cuomkube2.intrado.sqa    Ready    master   259d   v1.18.8
ndld0cuomkube3.intrado.sqa    Ready    master   259d   v1.18.8
ndld0cuomknode1.intrado.sqa   Ready    <none>   259d   v1.18.8
ndld0cuomknode2.intrado.sqa   Ready    <none>   259d   v1.18.8
ndld0cuomknode3.intrado.sqa   Ready    <none>   259d   v1.18.8

Configuration Upgrades

Configuration files are on the tool servers (lnmt1cuomtool11) in the /usr/local/admin/playbooks/cschelin/kubernetes/configurations directory and the expectation is you’ll be in that directory when directed to apply configurations.

Calico Upgrade

In the calico directory, run the following command:

$ kubectl apply -f calico.yaml
configmap/calico-config configured
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org configured
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrole.rbac.authorization.k8s.io/calico-node unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-node unchanged
daemonset.apps/calico-node configured
serviceaccount/calico-node unchanged
deployment.apps/calico-kube-controllers configured
serviceaccount/calico-kube-controllers unchanged

After calico is applied, the calico-kube-controllers pod will restart and then the calico-node pod restarts to retrieve the updated image.

Pull the calicoctl binary and copy it to /usr/local/bin, then verify the version. Note that this has likely already been done on the tool server, so verify before pulling the binary.

$ curl -O -L  https://github.com/projectcalico/calicoctl/releases/download/v3.16.0/calicoctl
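
Assuming the download landed in the current directory, the follow-up would be roughly:

$ chmod +x calicoctl
$ mv calicoctl /usr/local/bin/calicoctl
$ calicoctl version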

Verification

Verify the permissions of the files once the upgrade is complete.

Path or File                          user:group   Permissions
/etc/cni/net.d/10-calico.conflist     root:root    0644
/etc/cni/net.d/calico-kubeconfig      root:root    0644

metrics-server Upgrade

In the metrics-server directory, run the following command:

$ kubectl apply -f components.yaml
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
serviceaccount/metrics-server unchanged
deployment.apps/metrics-server configured
service/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged

Once the metrics-server deployment has been updated, the pod will restart.

kube-state-metrics Upgrade

In the kube-state-metrics directory, run the following command:

$ kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics configured
clusterrole.rbac.authorization.k8s.io/kube-state-metrics configured
deployment.apps/kube-state-metrics configured
serviceaccount/kube-state-metrics configured
service/kube-state-metrics configured

Once the kube-state-metrics deployment is updated, the pod will restart.

Filebeat Upgrade

Filebeat ships logs to Elastic Stack clusters in four environments, and Filebeat itself is installed on all Kubernetes clusters. Ensure you’re managing the correct cluster when upgrading the filebeat container, as configurations are specific to each cluster.

Change to the appropriate cluster context directory and run the following command:

$ kubectl apply -f filebeat-kubernetes.yaml

Verification

Essentially monitor each cluster. You should see the filebeat containers restarting and returning to a Running state.

$ kubectl get pods -n monitoring -o wide

Kubernetes Ansible Upgrade to 1.18.8

Upgrading Kubernetes Clusters

This document provides a guide to upgrading the Kubernetes clusters in the quickest manner. Much of the upgrade process can be done using Ansible playbooks. A few processes need to be done centrally on the tool server, and the OS and control plane updates are partly manual due to the requirement to manually remove servers from the Kubernetes API pool.

In most cases, examples are not provided as it is assumed that you are familiar with the processes and can perform the updates without having to be reminded of how to verify.

For any process that is performed with an Ansible Playbook, it is assumed you are on the lnmt1cuomtool11 server in the /usr/local/admin/playbooks/cschelin/kubernetes directory. All Ansible related steps expect to start from that directory. In addition, the application of pod configurations will be in the configurations subdirectory.

Perform Upgrades

Patch Servers

Patch the control plane master servers one at a time and ensure the cluster is healthy before continuing to the second and third master servers.

Drain each worker prior to patching and rebooting the worker node.

$ kubectl drain [nodename] --delete-local-data --ignore-daemonsets

Patch the server and reboot

yum upgrade -y
shutdown -t 0 -r now

Rejoin the worker node to the pool.

kubectl uncordon [nodename]

Update Versionlock And Components

In the upgrade directory, run the update -t [tag] script. This will install yum-plugin-versionlock if missing, remove the old versionlocks, create new versionlocks for kubernetes, kubernetes-cni, and docker, and then upgrade the components.
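
For example, assuming the tag matches the cluster directory names used elsewhere in these documents, an invocation might look like:

$ cd upgrade
$ ./update -t bldr0-0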

Upgrade Kubernetes

Using the kubeadm command, upgrade the first master server.

# kubeadm upgrade apply 1.18.8

Upgrade Control Planes

On the second and third master, run the kubeadm upgrade apply 1.18.8 command and the control plane will be upgraded.

Update kube-proxy

Check the kube-proxy daemonset and update the image tag if required.

$ kubectl edit daemonset kube-proxy -n kube-system
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.

Update coredns

Check the coredns deployment and update the image tag if required.

$ kubectl edit deployment coredns -n kube-system
  • Change the image, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000

Save the changes.

Restart kubelet and docker

In the restart directory, run the update -t [tag] script. This will restart kubelet and docker on all servers.

Calico Upgrade

In the configurations/calico directory, run the following command:

$ kubectl apply -f calico.yaml

calicoctl Upgrade

Pull the updated calicoctl binary and copy it to /usr/local/bin.

$ curl -O -L  https://github.com/projectcalico/calicoctl/releases/download/v3.16.0/calicoctl

Update File and Directory Permissions and Manifests

In the postinstall directory, run the update -t [tag] script. This will perform the following steps.

  • Add the cluster-name to the kube-controller-manager.yaml file
  • Update the imagePullPolicy and image lines in all manifests
  • Add the AlwaysPullImages and ResourceQuota admission controllers to the kube-apiserver.yaml file.
  • Update the permissions of all files and directories.

Filebeat Upgrade

In the configurations directory, change to the appropriate cluster context directory (bldr0-0, cabo0-0, tato0-1, or lnmt1-2) and run the following command.

$ kubectl apply -f filebeat-kubernetes.yaml

Kubernetes Preparation Steps For 1.18.8

Upgrading Kubernetes Clusters

The purpose of this document is to provide background information on what is being upgraded, which versions, and the steps required to prepare for the upgrade itself. These steps are only done once. Once all of these steps have been completed and all the configurations are checked into gitlab, all clusters are ready to be upgraded.

Upgrade Preparation Steps

Upgrades to the sandbox environment are done a few weeks before the official release for more in-depth testing. This includes checking the release docs, changelogs, and general operational status of the various tools in use.

Server Preparations

With the possibility of an upgrade to Spacewalk and to ensure the necessary software is installed prior to the upgrade, make sure all repositories are enabled and that the yum-plugin-versionlock software is installed.

Enable Repositories

Check the Spacewalk configuration and ensure that upgrades are coming from the local server and not from the internet.

Install yum versionlock

The critical components of Kubernetes are locked into place using the versionlock yum plugin. If not already installed, install it before beginning work.

# yum install yum-plugin-versionlock -y

Software Preparations

This section describes the updates that need to be made to the various containers that are installed in the Kubernetes clusters. Most of the changes involve updating the location to point to the local Docker repository vs pulling directly from the internet.

Ansible Playbooks

This section isn’t going to be instructions on setting up or using Ansible Playbooks. The updates to the various configurations are also saved with the Ansible Playbooks repo. You’ll make the appropriate changes to the updated configuration files and then push them back up to the gitlab server.

Update calico.yaml

In the calico directory, run the following command to get the current calico.yaml file.

$ curl https://docs.projectcalico.org/manifests/calico.yaml -O

Edit the file, search for image:, and insert the path to the local repository in front of calico:

bldr0cuomrepo1.internal.pri:5000/
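
An equivalent one-liner (assuming the image references in calico.yaml all start with calico/):

$ sed -i 's|image: calico/|image: bldr0cuomrepo1.internal.pri:5000/calico/|' calico.yaml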

Make sure you follow the documentation to update calicoctl to 3.16.0.

Update metrics-server

In the metrics-server directory, run the following command to get the current components.yaml file:

$ wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml

Edit the file, search for image:, and replace k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000/.

Update kube-state-metrics

Updating kube-state-metrics is a bit more involved as there are several files in the distribution, but you only need a small subset. You’ll need to clone the kube-state-metrics repo, or pull it if you already have it.

$ git clone https://github.com/kubernetes/kube-state-metrics.git

Once you have the repo, in the kube-state-metrics/examples/standard directory, copy all the files into the playbooks kube-state-metrics directory.

Edit the deployment.yaml file and replace quay.io with bldr0cuomrepo1.internal.pri:5000/
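
A sketch of the copy and edit, assuming the playbook layout described earlier (adjust the paths to match your checkout):

$ cp kube-state-metrics/examples/standard/*.yaml /usr/local/admin/playbooks/cschelin/kubernetes/configurations/kube-state-metrics/
$ sed -i 's|quay.io|bldr0cuomrepo1.internal.pri:5000|' /usr/local/admin/playbooks/cschelin/kubernetes/configurations/kube-state-metrics/deployment.yaml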

Update filebeat-kubernetes.yaml

In the filebeat directory, run the following command to get the current filebeat-kubernetes.yaml file:

$ curl -L -O https://raw.githubusercontent.com/elastic/beats/7.9/deploy/kubernetes/filebeat-kubernetes.yaml

Change all references in the filebeat-kubernetes.yaml file from kube-system to monitoring. If a new installation, create the monitoring namespace.
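
Something like the following covers both steps (the namespace only needs to be created once per cluster):

$ kubectl create namespace monitoring
$ sed -i 's/namespace: kube-system/namespace: monitoring/' filebeat-kubernetes.yaml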

Then copy the file into each of the cluster directories and make the following changes.

DaemonSet Changes

In the DaemonSet section, replace the image location docker.elastic.co/beats/filebeat:7.9.2 with bldr0cuomrepo1.internal.pri:5000/beats/filebeat:7.9.2. This pulls the image from our local repository instead of from the Internet.

In order for the search and replace script to work the best, make the following changes:

        - name: ELASTICSEARCH_HOST
          value: "<elasticsearch>"
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: ""
        - name: ELASTICSEARCH_PASSWORD
          value: ""

In addition, remove the following lines. They confuse the container if they exist.

        - name: ELASTIC_CLOUD_ID
          value:
        - name: ELASTIC_CLOUD_AUTH
          value:

Add the default username and password to the following lines as noted:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME:elastic}
      password: ${ELASTICSEARCH_PASSWORD:changeme}

ConfigMap Changes

In the ConfigMap section, activate the filebeat.autodiscover section by uncommenting it and delete the filebeat.inputs configuration section. In the filebeat.autodiscover section, make the following three changes as noted with comments.

filebeat.autodiscover:
  providers:
    - type: kubernetes
      host: ${NODE_NAME}                          # rename node to host
      hints.enabled: true
      hints.default_config.enabled: false         # add this line
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
        exclude_lines: ["^\\s+[\\-`('.|_]"]  # drop asciiart lines  # add this line

In the processors section, remove the cloud.id and cloud.auth lines, add the following uncommented lines, and change DEPLOY_ENV to the environment filebeat is being deployed to: dev, sqa, staging, or prod.

# Add deployment environment field to every event to make it easier to sort between Dev and SQA logs.
# DEPLOY_ENV values: dev, sqa, staging, or prod
   - add_fields:
       target: ''
       fields:
         environment: 'DEPLOY_ENV'

Elastic Stack in Dev and QA

This Elastic Stack cluster is used by the Development and QA Kubernetes clusters. Update the files in the bldr0-0 and cabo0-0 subdirectories.

- name: ELASTICSEARCH_HOST
  value: bldr0cuomemstr1.internal.pri

Elastic Stack in Staging

This Elastic Stack cluster is used by the Staging Kubernetes cluster. Update the files in the tato0-1 subdirectory.

- name: ELASTICSEARCH_HOST
  value: tato0cuomelkmstr1.internal.pri

Elastic Stack in Production

This Elastic Stack cluster is used by the Production Kubernetes Cluster. Update the file in the lnmt1-2 subdirectory.

- name: ELASTICSEARCH_HOST
  value: lnmt1cuomelkmstr1.internal.pri

Kubernetes Upgrade to 1.18.8

Upgrading Kubernetes Clusters

The following lists what software and pods will be upgraded during this quarter.

  • Upgrade the Operating System
  • Upgrade Kubernetes
    • Upgrade kubeadm, kubectl, and kubelet RPMs from 1.17.6 to 1.18.8.
    • Upgrade kubernetes-cni RPM from 0.7.5-0 to 0.8.6-0.
    • kube-apiserver is upgraded from 1.17.6 to 1.18.8.
    • kube-controller-manager is upgraded from 1.17.6 to 1.18.8.
    • kube-scheduler is upgraded from 1.17.6 to 1.18.8.
    • kube-proxy is upgraded from 1.17.6 to 1.18.8.
    • coredns is upgraded from 1.6.5 to 1.6.7.
    • etcd maintains at the current version of 3.4.3-0.
  • Upgrade Calico from 3.14.1 to 3.16.0.
  • Upgrade Filebeat from 7.8.0 to 7.9.2.
  • Upgrade docker from 1.13.1-161 to 1.13.1-162.
  • metrics-server is upgraded from 0.3.6 to 0.3.7.
  • kube-state-metrics is upgraded from 1.9.5 to 1.9.7.

Unchanged Products

There are no unchanged products this quarter.

Upgrade Notes

The following notes provide information on what changes might affect users of the clusters when upgrading from one version to the next. The notes I’m adding reflect what I think is relevant to the environment, so no notes on Azure or OpenShift will be listed. For more details, click the provided links. If something is found that might be relevant, please respond and I’ll check it out and add it in.

Kubernetes Core

The following notes will reflect changes that might be relevant between the currently installed 1.17.6 up through 1.18.8, the target upgrade for Q4. While I’m working to not miss something, if we’re not sure, check the links to see if any changes apply to your product or project.

  • 1.17.7 – kubernetes-cni upgraded to 0.8.6.
  • 1.17.8 – Nothing of interest. Note that there’s a 1.17.8-rc1 as well.
  • 1.17.9 – Privilege escalation patch: CVE-2020-8559. DOS patch: CVE-2020-8557.
  • 1.17.10 – Do not use this release; artifacts are not complete.
  • 1.17.11 – A note that Kubernetes is built with go 1.13.15. No other updates.
  • 1.18.0 – Lots of notes as always. Most are cloud specific (Azure mainly). Some interesting bits though:
    • kubectl debug command added, permits the creation of a sidecar in a pod to assist with troubleshooting a problematic container.
    • IPv6 support is now beta in 1.18.
    • Deprecated APIs
      • apps/v1beta1 and apps/v1beta2 – use apps/v1
      • daemonsets, deployments, and replicasets under extensions/v1beta1 – use apps/v1
    • New IngressClass resource added to enable better Ingress configuration
    • autoscaling/v2beta2 HPA added spec.behavior
    • startupProbe (beta) for slow starting containers.
  • 1.18.1 – Nothing much to note
  • 1.18.2 – Fix conversion error for HPA objects with invalid annotations
  • 1.18.3 – init containers are now considered for calculation of resource requests when scheduling
  • 1.18.4 – kubernetes-cni upgraded to 0.8.6
  • 1.18.5 – Nothing of interest. Note there’s a 1.18.5-rc1 as well.
  • 1.18.6 – Privilege escalation patch; CVE-2020-8559. DOS patch; CVE-2020-8557.
  • 1.18.7 – Do not use this release; artifacts are not complete.
  • 1.18.8 – Kubernetes now built with go 1.13.15. Nothing else.

kubernetes-cni

Still searching for release notes for the upgrade from 0.7.5 to 0.8.6.

coredns

  • 1.6.6 – Mainly a fix for DNS Flag Day 2020, the bufsize plugin. A fix related to CVE-2019-19794.
  • 1.6.7 – Adding an expiration jitter. Resolve TXT records via CNAME.

Calico

The major release notes are on a single page; the versions noted here describe the upgrade for each version. For example, 3.14.1 and 3.14.2 both point to the 3.14 Release Notes. Here I’m describing the changes, if relevant, between the .0, .1, and .2 releases.

Note that currently many features of Calico haven’t been implemented yet so improvements, changes, and fixes for Calico probably don’t impact the current clusters.

  • 3.14.1 – Fix CVE-2020-13597 – IPv6 rogue router advertisement vulnerability. Added port 6443 to failsafe ports.
  • 3.14.2 – Remove unnecessary packages from cni-plugin and pod2daemon images.
  • 3.15.0 – WireGuard enabled to secure on the wire in-cluster pod traffic. The ability to migrate key/store data from etcd to use the kube-apiserver.
  • 3.15.1 – Fix service IP advertisement breaking host service connectivity.
  • 3.15.2 – Add monitor-addresses option to calico-node to continually monitor IP addresses. Handle CNI plugin panics more gracefully. Remove unnecessary packages from cni-plugin and pod2daemon images to address CVEs.
  • 3.16.0 – Supports eBPF, which is an RHEL 8.2 feature (not currently available to my clusters). Removed more unnecessary packages from the pod2daemon image.

Filebeat

  • 7.8.1 – Corrected base64 encoding of the monitoring.elasticsearch.api_key. Added support for timezone offsets.
  • 7.9.0 – Fixed handling for Kubernetes Update and Delete watcher events. Fixed memory leak in tcp and unix input sources. Fixed file ownership in docker images so they can be used in a secure environment. Logstash module can automatically detect the log format and process accordingly.
  • 7.9.1 – Nothing really jumped out as relevant.
  • 7.9.2 – Nothing in the release notes yet.

docker

This release is related to a CVE, addressing a vulnerability present in 1.13.1-108.

metrics-server

  • 0.3.7 – New image location. Image runs as a non-root user. Single file now (components.yaml) vs a group of files.

kube-state-metrics

Like Calico, the CHANGELOG is a single file. The different bullet points point to the same file, but describe the changes if relevant.

  • 1.9.6 – Just a single change related to an API mismatch.
  • 1.9.7 – Switched an apiVersion to v1 for the mutatingwebhookconfiguration file.

References


Cinnamon Buns

I tried using the recipe on the website, but there were so many ads making constant changes to the webpage that it was impossible to stay where the instructions were. As such, I’m copying the basic instructions here and I’ll use them for the baking attempt.

Dough

  • 1 cup warm milk
  • 2 1/2 teaspoons instant dry yeast
  • 2 large eggs at room temperature
  • 1/3 cup of salted butter (softened)
  • 4 1/2 cups all-purpose flour
  • 1 teaspoon salt
  • 1/2 cup granulated sugar
  1. Pour the warm milk in the bowl of a stand mixer and sprinkle the yeast over the top.
  2. Add the eggs, butter, salt, and sugar
  3. Add in 4 cups of the flour and mix using the beater blade just until the ingredients are barely combined. Allow the mixture to rest for 5 minutes so the ingredients can soak together.
  4. Scrape the dough off of the beater blade and remove it. Attach the dough hook.
  5. Beat the dough on medium speed, adding in up to 1/2 cup more flour if needed to form a dough. Knead for up to 7 minutes until the dough is elastic and smooth. The dough should be a little tacky and still sticking to the side of the bowl. Don’t add too much flour though.
  6. Spray a large bowl with cooking spray.
  7. Use a rubber spatula to remove the dough from the mixer bowl and place it in the greased large bowl.
  8. Cover the bowl with a towel or wax paper.
  9. Set the bowl in a warm place and allow the dough to rise until doubled. A good place might be to start the oven at a low setting, 100°F for example, turn it off when it’s warm, and then put the bowl into the oven. Figure about 30 minutes for the dough to rise.
  10. When ready, put the dough on a well floured pastry mat or parchment paper and sprinkle more flour on the dough.
  11. Flour up a rolling pin and spread the dough out. It should be about 2′ by 1 1/2′ when done.
  12. Smooth the filling evenly over the rectangle.
  13. Roll the dough up starting on the long, 2′ end.
  14. Cut into 12 pieces and place in a greased baking pan.
  15. Cover the pan and let the rolls rise for 20 minutes or so.
  16. Preheat the oven to 375 degrees.
  17. Pour 1/2 cup of heavy cream over the risen rolls.
  18. Bake for 20-22 minutes or until the rolls are golden brown and the center cooked.
  19. Allow the rolls to cool.
  20. Spread the frosting over the rolls.

Filling

Simple enough. Combine the three ingredients in a bowl and mix until well combined.

  • 1/2 cup of salted butter (almost melted)
  • 1 cup packed brown sugar
  • 2 tablespoons of cinnamon

Frosting

  • 6 ounces of cream cheese (softened)
  • 1/3 cup salted butter (softened)
  • 2 cups of powdered sugar
  • 1/2 tablespoon of vanilla or maple extract
  1. Combine cream cheese and salted butter. Blend well.
  2. Add the powdered sugar and extract.


Summer Access Road

We live in the mountains surrounded by pine and aspens and visited by elk, deer, moose, foxes, and bobcats plus the semi-domesticated animals like dogs and cats.

The other thing we’re visited by are fires. Either by acts of nature like lightning strikes or acts of idiots like the homeless or just reckless folks.

The year before we moved here the area had a pretty large fire called the Cold Springs fire. It was started by a couple of young men who failed to put out their campfire properly.

https://wildfirepartners.org/cold-springs-fire/

Back when the subdivision was created in 1984, an egress route was required by Boulder County so folks up on Ridge Road could have an alternate way of escaping a fire like the Cold Springs fire. It was called the Summer Access Road, in part because there’s no maintenance on the road during the winter months.

At a yearly HOA meeting, we heard the following story.

The property to the east of the Summer Access Road was purchased. The folks who purchased the property cut in a driveway at the half way point down the Summer Access Road going uphill to a ridge where they intended to build a house.

Unfortunately, they didn’t touch base with the HOA, and when they were confronted, they indicated not only that they were going to build there, but that if the HOA didn’t maintain the road in the winter, they’d block it above their driveway, which would of course prevent anyone else from using the road.

Reminder, of course, that this road was created so the folks up on Ridge Road could escape a fire in an emergency. If it’s blocked, that’s going to be a problem.

As a result, the HOA went to court and was able to block their access to the Summer Access Road.

The property does connect with Ridge Road at the topmost corner, however the fire department said the piece is too steep for fire engines and required that the owners work with the adjacent property owners to create an easement for a driveway a fire engine can navigate.

Unfortunately, they had so alienated the neighbors that they were denied. As a result, the owners filed a quiet claim to the Summer Access Road. A quiet claim is basically intended to claim ownership of the Summer Access Road and quiet all other claims. This is different from a quit claim, which is used when a co-owner indicates they have no claim to shared property.

So back to court the HOA went. Up to this point, the Summer Access Road was an easement on the three properties that make up that area. The folks who own those three properties allowed the easement. But there wasn’t an official court document that indicated the HOA was fully responsible for the road. So there was some concern about the future of the road.

However, the court ruled that the HOA does in fact own the maintenance and management of the road. The property owners were forced to restore the lower driveway and block its use (the location was poorly chosen and runoff would have damaged the Summer Access Road). They were permitted to create a driveway pointing downhill from the first turn of the Summer Access Road, and they were required to follow the HOA rules regarding access to the Summer Access Road, meaning the HOA can close the road due to inclement weather or simple maintenance.

The property owners were also forced to put up a large bond for the upper driveway, $26,000 as I understand. And pay a chunk of the legal fees incurred by the HOA, again $42,000 as I recall. They also had to get an okay from the county before proceeding. Apparently they were pretty abusive towards the county when they were visited back at the start of all this.

The bad part in general is that the HOA fees doubled for two years to pay for this. But the good news, as noted above, is that the HOA is now officially responsible for the Summer Access Road. I’ve not seen any further activity on this property. I don’t know if they’re researching how to put in the new road, if the money has tapped them out short term, or what. We’ll see what the future holds.


Docker Registry

Overview

I have a requirement to create a local Docker Registry. I’m doing this because I have four Kubernetes clusters that somewhat mirror the work environment. This lets me test out various new bits I want to apply to the work environment without using up work resources or involving multiple groups. In addition, we’re on high-speed WiFi, so we have a pretty small pipe in general. So that I’m not constantly using up bandwidth, hosting it locally is the next best thing.

Currently work is using Artifactory. Artifactory has some cool features for Docker in that I can create a Virtual Repository that consists of multiple Remote Repositories. So I can have a group specific Virtual Repository to be used in hosting images and when I try to pull a new image, such as kube-apiserver v1.18.8, Artifactory automatically pulls it into the Virtual Repository. Very nice.

Unfortunately, the Docker management features of Artifactory are a paid-for product, and looking at the costs, I can’t justify paying that for my own learning purposes. Hence I’m installing the default Docker Registry.

Installation

It’s actually a pretty simple process overall. I have a CentOS 7 server, bldr0cuomrepo1.internal.pri, and installed the docker-distribution RPM, which is part of the extras repository.
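
The install itself is a single package from the extras repository:

# yum install -y docker-distribution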

Check the configuration file located in /etc/docker-distribution/registry/config.yaml for any changes I might want to make. In my case, the default is fine.

version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /var/lib/registry
http:
    addr: :5000

And finally, enable and start the docker-distribution service.

# systemctl enable docker-distribution
# systemctl start docker-distribution

Insecure

Okay, well, this is an insecure registry as far as Docker and Kubernetes are concerned. As such, I need to make a change to the /etc/docker/daemon.json file.

{
        "insecure-registries" : ["bldr0cuomrepo1.internal.pri:5000"]
}

And of course, restart docker.
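
To pick up the daemon.json change and confirm the registry is now listed as insecure (the grep is just a convenience):

# systemctl restart docker
# docker info | grep -A1 'Insecure Registries'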

Image Management

Now if you want to host your own images for Kubernetes, you can host them locally and have your deployments point to the local registry. In addition, you can pull commonly used images from the internet and host them locally.

# docker pull nginx:alpine

Then you need to tag the image. This involves changing the location from the various internet sites like docker.io, k8s.gcr.io, and quay.io, to your new local repository.

# docker tag nginx:alpine bldr0cuomrepo1.internal.pri:5000/nginx:alpine

Once pulled, you can run docker image ls to see the installed images. Then you can push an image up to your repository.

[root@bldr0cuomifdock1 ~]# docker push bldr0cuomrepo1.internal.pri:5000/llamas-image:v1
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/llamas-image]
ca01cce58e28: Pushed
a181cbf898a0: Pushed
570fc47f2558: Pushed
5d17421f1571: Pushed
7bb2a9d37337: Pushed
3e207b409db3: Pushed
v1: digest: sha256:4a0a5e1d545b9ac88041e9bb751d2e2f389d313ac5a59f7f4c3ce174cd527110 size: 1568

And now that it’s hosted locally, you can pull it to any server where docker is installed.

[root@bldr0cuomifdock1 data]# docker pull bldr0cuomrepo1.internal.pri:5000/llamas-image:v1
v1: Pulling from llamas-image
cbdbe7a5bc2a: Pull complete
10c113fb0c77: Pull complete
9ba64393807b: Pull complete
262f9908119d: Pull complete
c4a057508f96: Pull complete
e044fc51fea0: Pull complete
Digest: sha256:4a0a5e1d545b9ac88041e9bb751d2e2f389d313ac5a59f7f4c3ce174cd527110
Status: Downloaded newer image for bldr0cuomrepo1.internal.pri:5000/llamas-image:v1
bldr0cuomrepo1.internal.pri:5000/llamas-image:v1

Kubernetes Pod Schedule Prioritization

Introduction

Currently Kubernetes is not configured to treat any pod as more or less important than any other pod with the exception of critical Kubernetes pods such as the kube-apiserver, kube-scheduler, and kube-controller-manager.

Multiple products with different Service Class requirements are hosted on Kubernetes but there is no configuration that provides any prioritization of these products.

The research goal is to identify a process or configuration which would let the Applications and Operations teams identify and ensure their products have priority when using cluster resources. For example, in the event of an unintentional failure such as a worker node failure, or an intentional failure such as removing a worker node from a cluster pool for maintenance.

A secondary goal is to determine if overcommitting the Kubernetes clusters is a viable solution to resource availability.

As always, this is a summation that generally applies to my environment. For full details, links to documents are provided at the end of this document.

Service Class

Service Class is used to define service availability. This is not relevant to individual components of a product but to the overall service itself. The following is a list of Service Class definitions.

  • Mission Critical Service (MCS) – 99.999% up-time.
  • Business Critical Service (BCS) – 99.9% up-time.
  • Business Essential Service (BES) – 99% up-time.
  • Business Support Service (BSS) – 98% up-time.
  • Unsupported Business Service (UBS) – No guaranteed service up-time
  • LAB – No guaranteed service up-time.

Note that the PriorityClass design does not ensure the hosted product satisfies the contracted Service Class. PriorityClass objects ensure that resources are available to more critical products should there be resource exhaustion due to overcommitment or worker node failure.

PriorityClass Objects

Kubernetes, as of version 1.14, has PriorityClass objects generally available. This object lets us assign a scheduling priority to a pod so it can jump ahead in the scheduling queue.

  • 2,000,001,000 – This is used for critical pods running on Kubernetes nodes (system-node-critical).
  • 2,000,000,000 – This is used for critical pods which manage Kubernetes clusters (system-cluster-critical)
  • 1,000,000,000 – This level and lower is available for any product to use.
  • 0 – This is the default level for all non-critical pods.

Linux:cschelin@lnmt1cuomtool11$ kubectl get priorityclasses -A
NAME                      VALUE        GLOBAL-DEFAULT   AGE
system-cluster-critical   2000000000   false            22d
system-node-critical      2000001000   false            22d

system-node-critical Object

The following pods are assigned to the system-node-critical Object.

  • calico-node
  • kube-proxy

system-cluster-critical Object

The following pods are assigned to the system-cluster-critical Object.

  • calico-kube-controllers
  • coredns
  • etcd
  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler

PriorityClass Definitions

A PriorityClass Object lets us define a set of values which can be used by applications to ensure availability based on Service Class. The following values are recommended for the Kubernetes environments.

  • 7,000,000 – Critical Infrastructure Service
  • 6,000,000 – Mission Critical Service
  • 5,000,000 – Infrastructure Service
  • 4,000,000 – Business Critical Plus Service (a product that requires 99.99% up-time)
  • 3,000,000 – Business Critical Service
  • 2,000,000 – Business Essential Service
  • 1,000,000 – Business Support Service
  • 500,000 – Unsupported Business Service and LAB Services (global default)

Most of the items in the list are well-known Service Class definitions. For the ones that I’ve added, additional details follow.

Critical Infrastructure Service

Any pod that is used by any or all other pods in the cluster, especially if the pod is used by an MCS product.

Infrastructure Service

Standard infrastructure pods such as kube-state-metrics and the metrics-server pods. This includes other services such as Prometheus and Filebeat.

Business Critical Plus Service

Currently there is no four nines (99.99%) Service Class defined; however, some products have been deployed as requiring four nines of support. For this reason, a PriorityClass Object was created to satisfy that Service Class request.

Testing

In testing, the following behavior was observed (a reproduction sketch follows the list):

  1. MCS pods in a deployment will run as long as resources are available.
  2. If there are not enough resources for the lower PriorityClass deployments, pods will be started until resources are exhausted. Remaining pods will be put in a Pending state.
  3. If additional MCS pods need to start, lower PriorityClass pods will be terminated to free resources; the MCS pods will start and the replacement lower PriorityClass pods will remain in a Pending state.
  4. Once the additional MCS pods are no longer needed, they will be deleted and any Pending pods will start.
  5. Between multiple MCS deployments there is no further prioritization. If there are insufficient resources for all MCS pods to start, any remaining MCS pods will be put in a Pending state.
  6. If a lower PriorityClass pod fits in the remaining resources where a higher PriorityClass pod does not, the lower PriorityClass pod will start.
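
As a rough sketch of how this behavior can be reproduced (this is not the exact manifest used for the testing above; the llamas image, replica count, and request sizes are only illustrative), a deployment needs both a priorityClassName and resource requests, since the scheduler uses the requests to decide when resources are exhausted and preemption is required:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llamas-mcs-test
spec:
  replicas: 4
  selector:
    matchLabels:
      app: llamas-mcs-test
  template:
    metadata:
      labels:
        app: llamas-mcs-test
    spec:
      priorityClassName: mission-critical
      containers:
      - name: llamas
        image: bldr0cuomrepo1.dev.internal.pri:5000/llamas:v1.4.1
        # Requests sized so a few replicas exhaust a worker node, forcing the
        # scheduler to preempt lower PriorityClass pods.
        resources:
          requests:
            cpu: "1"
            memory: 1Gi

Pairing this with a similar deployment that uses a lower class such as business-support is enough to observe the eviction and Pending behavior described above.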

Pod Preemption

There is a PriorityClass option called preemptionPolicy, available as of Kubernetes 1.15, which lets you configure a PriorityClass that does not evict pods of a lower PriorityClass. Pods using such a class still move up in the scheduling queue; they simply won’t trigger evictions when cluster resources are running low.
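
A minimal sketch of what a non-preempting class could look like, reusing the value scheme above (the class name here is hypothetical and is not one of the recommended objects):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mission-critical-nonpreempting
value: 6000000
# Pods with this class schedule ahead of lower priorities but never evict
# running pods to make room for themselves.
preemptionPolicy: Never
globalDefault: false
description: "Schedules ahead of lower priorities without evicting running pods."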

PodDisruptionBudget

This Object lets you specify the number of pods that must remain running. However, in testing this doesn’t appear to apply to PriorityClass evictions: if there are insufficient resources, pods in a lower PriorityClass will be evicted regardless of this setting. It does prevent a voluntary disruption, such as draining a worker node, from proceeding if it would leave too few pods running.
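
For voluntary disruptions, a minimal PodDisruptionBudget looks like the following sketch; the llamas selector and minAvailable value are illustrative assumptions rather than settings from this environment:

# On clusters older than Kubernetes 1.21, use apiVersion: policy/v1beta1.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llamas-pdb
spec:
  # At least two matching pods must stay running during voluntary disruptions
  # such as a node drain; this does not protect against PriorityClass eviction.
  minAvailable: 2
  selector:
    matchLabels:
      app: llamas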

Configuration Settings

For Deployments, you add one of the names defined below to the pod template as spec.priorityClassName: [name].

The following configurations are recommended for the environment.

Critical Infrastructure Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-infrastructure
value: 7000000
globalDefault: false
description: "This priority class is reserved for infrastructure services that all pods use."

Mission Critical Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mission-critical
value: 6000000
globalDefault: false
description: "This priority class is reserved for services that require 99.999% uptime."

Infrastructure Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: infrastructure
value: 5000000
globalDefault: false
description: "This priority class is reserved for infrastructure services."

Business Critical Plus Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical-plus
value: 4000000
globalDefault: false
description: "This priority class is reserved for services that require 99.99% uptime."

Business Critical Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical
value: 3000000
globalDefault: false
description: "This priority class is reserved for services that require 99.9% uptime."

Business Essential Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-essential
value: 2000000
globalDefault: false
description: "This priority class is reserved for services that require 99% uptime."

Business Support Service

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-support
value: 1000000
globalDefault: false
description: "This priority class is reserved for services that require 98% uptime."

Unsupported Business Service

Note the globalDefault setting here: any pod whose Deployment doesn’t set a priorityClassName receives this class.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: unsupported-business
value: 500000
globalDefault: true
description: "This priority class is reserved for services that have no uptime requirements."

PriorityClass Object Table

Linux:cschelin@lnmt1cuomtool11$ kubectl get pc -A
NAME                              VALUE        GLOBAL-DEFAULT   AGE
business-critical                 3000000      false            3d9h
business-critical-plus            4000000      false            3d9h
business-essential                2000000      false            3d9h
business-support                  1000000      false            3d9h
critical-infrastructure           7000000      false            3s
infrastructure                    5000000      false            6s
mission-critical                  6000000      false            14s
system-cluster-critical           2000000000   false            25d
system-node-critical              2000001000   false            25d
unsupported-business              500000       true             3d9h

Pod Configuration

In order to assign a priority to pods, you’ll need to add the priorityClassName to the deployment or pod configuration. For a Deployment’s pod template:

    spec:
      containers:
      - image: bldr0cuomrepo1.dev.internal.pri:5000/llamas:v1.4.1
        imagePullPolicy: Always
        name: llamas
      priorityClassName: business-essential

And for a standalone pod configuration:

spec:
  containers:
  - name: llamas
    image: bldr0cuomrepo1.dev.internal.pri:5000/llamas:v1.4.1
    imagePullPolicy: Always
  priorityClassName: business-essential
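
Once pods are running, you can confirm the class was applied and resolved to its numeric priority; the app=llamas label here is an assumption about how the deployment is labeled:

kubectl get pods -l app=llamas -o custom-columns=NAME:.metadata.name,CLASS:.spec.priorityClassName,PRIORITY:.spec.priority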

Conclusion

The above recommendations provide a reliable way of ensuring that critical products deployed to Kubernetes have the necessary resources to respond appropriately to requests.

To prevent service disruption, ensure no deployed product consumes so many resources that the minimum requirements of the other deployed products can no longer be met.

With that in place, overcommitting resources in the clusters may also be viable.

References


Jenkins And Build Agents

Overview

In this article, I’ll provide instructions on how I installed Jenkins and the two Jenkins Build Agents in my environment.

System Requirements

I used one of my standard templates in vCenter to create the three Jenkins nodes. All three servers have 2 CPUs and 4 Gigs of memory. For the main Jenkins server, 64 Gigs of storage is sufficient. For Build Agents, 200 Gigs of storage is recommended; basically, as much as you need for storing deployment jobs. My photos website has about 30 Gigs of pictures, and with deployments going to three sites (local for testing, Docker for the future, and the remote publicly visible site), the photos website alone takes almost 100 Gigs. Jenkins also requires Java 1.8 to be installed before Jenkins itself is installed.

Firewall Configuration

As part of Zero Trust Networking, each system has a firewall. You’ll need to configure the firewall for the Jenkins nodes.

firewall-cmd --permanent --new-service=jenkins
firewall-cmd --permanent --service=jenkins --set-short="Jenkins ports"
firewall-cmd --permanent --service=jenkins --set-description="Jenkins port exceptions"
firewall-cmd --permanent --service=jenkins --add-port=8080/tcp
firewall-cmd --permanent --add-service=jenkins
firewall-cmd --zone=public --add-service=http --permanent
firewall-cmd --reload
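
To confirm the service definition and port took effect after the reload, the following checks should list the new jenkins service:

firewall-cmd --list-services
firewall-cmd --info-service=jenkins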

Installing Jenkins

You’ll need to install the repository configuration and GPG keys, then install Java and finally Jenkins.

wget -O /etc/yum.repos.d/jenkins.repo \
    https://pkg.jenkins.io/redhat-stable/jenkins.repo
rpm --import https://pkg.jenkins.io/redhat-stable/jenkins.io.key
yum upgrade
yum install java-1.8.0-openjdk
yum install jenkins
systemctl daemon-reload

Enable and Start Jenkins

Pretty simple process here. You enable and start Jenkins.

systemctl enable jenkins
systemctl start jenkins
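
Jenkins can take a minute or two to come up the first time. One way to confirm the service is running and listening on port 8080:

systemctl status jenkins
ss -tlnp | grep 8080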

Configure Users

During the installation process, a password was created so you can Unlock the Jenkins installation. Copy it from /var/lib/jenkins/secrets/initialAdminPassword and paste it into the Unlock screen. Once Jenkins is unlocked, you’ll be presented with a Create Administrator page. Fill it in and save it. Once done, you can then access Jenkins and install any plugins you want to use.
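
To display the initial password mentioned above from the command line:

cat /var/lib/jenkins/secrets/initialAdminPassword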

Build Agents

In order to effectively use Jenkins, the main node that you installed shouldn’t be processing any jobs. If it processes jobs, it can be overwhelmed and other jobs might queue up, delaying deployments. For my homelab it’s not so critical; however, I am trying to emulate a production-like environment, so having Build Agents satisfies that requirement.

Configuring the Build Agents

Jenkins requires a few things before the main system can incorporate a Build Agent. You’ll need to create the jenkins account.

useradd -d /var/lib/jenkins -c "Jenkins Remote Agent" -m jenkins

Of course, set a password; something long and hard to guess. Then create a public/private key pair, which is used to communicate with the necessary servers.

ssh-keygen -t rsa

This creates the key pair in the Jenkins .ssh directory.
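
If this agent will be pushing deployments over SSH, the public half of the key also needs to land in authorized_keys on each target server. One way to do that (user@target-host is a placeholder for the account and server this agent deploys to, not a hostname from this environment):

# Switch to the jenkins account, then copy the public key to the target.
su - jenkins
ssh-copy-id user@target-host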

Next, install Java 1.8. Jenkins uses Java on the agent in order to run jobs.

yum install -y java-1.8.0-openjdk-headless
Installed:
  java-1.8.0-openjdk-headless.x86_64 1:1.8.0.272.b10-1.el7_9

Dependency Installed:
  copy-jdk-configs.noarch 0:3.3-10.el7_5             javapackages-tools.noarch 0:3.4.1-11.el7              lksctp-tools.x86_64 0:1.0.17-2.el7
  pcsc-lite-libs.x86_64 0:1.8.8-8.el7                python-javapackages.noarch 0:3.4.1-11.el7             python-lxml.x86_64 0:3.2.1-4.el7
  tzdata-java.noarch 0:2020d-2.el7

Complete!
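
A quick check that the agent’s Java installation is in place before connecting it to Jenkins:

java -version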

Now that the node is prepared, you’ll need to add it in Jenkins. Click Manage Jenkins, then Manage Nodes and Clouds, then New Node. Once you name it, fill out the form as follows:

  • Name: Remote Build 2 (this is my first Jenkins Build Agent)
  • Description: Access to the remote server for local files
  • Number of Executors: 2 (rule of thumb is 1 per CPU)
  • Remote root directory: /var/lib/jenkins (the jenkins account home dir)
  • Labels: guardian (the label you’ll use in jobs to determine which Build Agent to use)
  • Usage: Select Only build jobs with label expressions matching this node
  • Launch method: Select Launch agents via SSH
  • Host: 192.168.104.82 (I tend to avoid using DNS as it can be unreliable)
  • Credentials: Select Remote schelin.org server
  • Host Key Verification Strategy: Select Known hosts file Verification Strategy
  • Availability: Select Keep this agent online as much as possible

When you’re done, click Save and the node will show up in the list of nodes. Then create the second Build Agent following the above installation instructions and configure it as follows.

  • Name: Local Build 3 (this is my second Jenkins Build Agent)
  • Description: Local Server Builds
  • Number of Executors: 2 (rule of thumb is 1 per CPU)
  • Remote root directory: /var/lib/jenkins (the jenkins account home dir)
  • Labels: local (the label you’ll use in jobs to determine which Build Agent to use)
  • Usage: Select Only build jobs with label expressions matching this node
  • Launch method: Select Launch agents via SSH
  • Host: 192.168.104.81 (I tend to avoid using DNS as it can be unreliable)
  • Credentials: Select Local environment
  • Host Key Verification Strategy: Select Known hosts file Verification Strategy
  • Availability: Select Keep this agent online as much as possible

And when done, click Save and the node will show up in the list of nodes.

References
