Kubernetes Networking

Overview

This article provides instructions on installing the networking layer on the Kubernetes cluster.

Calico Networking

You’ll need to install Calico, which is the network layer for the cluster. There are two files you’ll retrieve from Tigera, the company behind Calico: tigera-operator.yaml and custom-resources.yaml.

In the custom-resources.yaml file, update the spec.calicoNetwork.ipPools cidr line to point to the pod network. In my case, that’s 10.42.0.0/16.
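For reference, here’s roughly what that block looks like with my pod network filled in; fields like blockSize and encapsulation come from Tigera’s defaults and may differ depending on the Calico version you download.

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - blockSize: 26
      cidr: 10.42.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()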

In the tigera-operator.yaml file, update the image: line to point to the on-prem insecure registry and any imagePullPolicy lines to Always.
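The edit in the operator Deployment ends up looking something like this; the tag shown is only an example, so match it to the version in the tigera-operator.yaml you downloaded.

      containers:
        - name: tigera-operator
          # example tag; use the version from the downloaded tigera-operator.yaml
          image: bldr0cuomrepo1.dev.internal.pri:5000/operator:v1.28.1
          imagePullPolicy: Always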

Once done, use kubectl to create the two sets of resources: first the tigera-operator.yaml file, then the custom-resources.yaml file.

kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml

When everything is working, you should see several Calico pods start up.

$ kubectl get pods -A | grep -E "(calico|tigera)"
calico-apiserver   calico-apiserver-6fd86fcb4b-77tld                         1/1     Running   0             32m
calico-apiserver   calico-apiserver-6fd86fcb4b-p6bzc                         1/1     Running   0             32m
calico-system      calico-kube-controllers-dd6c88556-zhg6b                   1/1     Running   0             45m
calico-system      calico-node-66fkb                                         1/1     Running   0             45m
calico-system      calico-node-99qs2                                         1/1     Running   0             45m
calico-system      calico-node-dtzgf                                         1/1     Running   0             45m
calico-system      calico-node-ksjpr                                         1/1     Running   0             45m
calico-system      calico-node-lhhrl                                         1/1     Running   0             45m
calico-system      calico-node-w8nmx                                         1/1     Running   0             45m
calico-system      calico-typha-69f9d4d5b4-vp7mp                             1/1     Running   0             44m
calico-system      calico-typha-69f9d4d5b4-xv5tg                             1/1     Running   0             45m
calico-system      calico-typha-69f9d4d5b4-z65kn                             1/1     Running   0             44m
calico-system      csi-node-driver-5czsp                                     2/2     Running   0             45m
calico-system      csi-node-driver-ch746                                     2/2     Running   0             45m
calico-system      csi-node-driver-gg9f4                                     2/2     Running   0             45m
calico-system      csi-node-driver-kwbwp                                     2/2     Running   0             45m
calico-system      csi-node-driver-nh564                                     2/2     Running   0             45m
calico-system      csi-node-driver-rvfd4                                     2/2     Running   0             45m
tigera-operator    tigera-operator-7d89d9444-4scfq                           1/1     Running   0             45m

It does take a bit, so give it some time to get going.

Troubleshooting

I did have a problem with the installation the first time, as I hadn’t updated the cidr line in the custom-resources.yaml file with my pod network configuration. After rebuilding the cluster, I updated the file, reapplied it, and it worked. One other issue was that CRI-O wasn’t enabled or started on the first control node for some reason. Once it was enabled and started, everything worked as expected.


Kubernetes Metrics Server

Overview

The metrics-server collects resource metrics from your Kubernetes cluster. It’s also used by the Horizontal Pod Autoscaler (HPA) to scale pods based on usage.

Installation

For my clusters, it’s a pretty simple configuration. I retrieve the components.yaml file from the metrics-server github site (see References below), compare it with the previous version if any, then retrieve the images, tag them, and push them to the local repository. I then update the components.yaml file to point to the local repository. When done, simply apply it to the cluster.

kubectl apply -f components.yaml
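For completeness, staging the metrics-server image locally looks roughly like this on the docker build server; the version tag below is just an example, so match it to the tag referenced in your components.yaml.

docker pull registry.k8s.io/metrics-server/metrics-server:v0.6.3
docker tag registry.k8s.io/metrics-server/metrics-server:v0.6.3 bldr0cuomrepo1.dev.internal.pri:5000/metrics-server:v0.6.3
docker push bldr0cuomrepo1.dev.internal.pri:5000/metrics-server:v0.6.3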

Issue

Note: I found that if I add the serverTLSBootstrap: true line to the KubeletConfiguration block when initializing the cluster, it’s added to the appropriate config.yaml files and to the kubelet-config configmap automatically. I’m leaving the manual steps below as a reminder in case the issue pops up again.

There is one issue that has to be addressed (see the References section for a link). Basically, the metrics-server’s --kubelet-preferred-address-types flag prefers node IP addresses over hostnames when scraping the kubelets. Since the kubelet serving certificates in this cluster don’t cover those IP addresses, metrics-server won’t start and generates a stream of certificate errors. You can move Hostname to the front of that list, but then you’re adding a DNS lookup to the mix. You can also add the --kubelet-insecure-tls flag, which of course isn’t secure.

kube-system       metrics-server-5597479f8d-fn8xm                           0/1     Running   0               13h
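For reference, the flags in question live in the metrics-server Deployment’s args in components.yaml. This is a trimmed sketch showing just the relevant two; the insecure flag is commented out since it’s the not-recommended workaround.

        args:
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        # - --kubelet-insecure-tls    # skips kubelet cert verification; not recommended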

What to do?

First you’ll need to edit the kubelet-config configmap and add serverTLSBootstrap: true right after the kind: KubeletConfiguration line and save it.

$ kubectl edit configmap kubelet-config -n kube-system
configmap/kubelet-config edited

Next you’ll have to edit every control and worker node’s /var/lib/kubelet/config.yaml file, add the same line in the same place, and restart kubelet.
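On each node that’s just a quick edit and a restart; a minimal sketch, run as root:

vi /var/lib/kubelet/config.yaml      # add serverTLSBootstrap: true right below kind: KubeletConfiguration
systemctl restart kubelet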

Finally, Certificate Signing Requests (CSRs) will be created for each node. You’ll need to approve each one.

$ kubectl get csr
NAME        AGE   SIGNERNAME                      REQUESTOR                                      REQUESTEDDURATION   CONDITION
csr-4kr8m   20s   kubernetes.io/kubelet-serving   system:node:bldr0cuomkube3.dev.internal.pri    <none>              Pending
csr-fqpvs   28s   kubernetes.io/kubelet-serving   system:node:bldr0cuomknode3.dev.internal.pri   <none>              Pending
csr-m526d   27s   kubernetes.io/kubelet-serving   system:node:bldr0cuomkube2.dev.internal.pri    <none>              Pending
csr-nc6t7   27s   kubernetes.io/kubelet-serving   system:node:bldr0cuomkube1.dev.internal.pri    <none>              Pending
csr-wxhfd   28s   kubernetes.io/kubelet-serving   system:node:bldr0cuomknode1.dev.internal.pri   <none>              Pending
csr-z42x4   28s   kubernetes.io/kubelet-serving   system:node:bldr0cuomknode2.dev.internal.pri   <none>              Pending
$ kubectl certificate approve csr-4kr8m
certificatesigningrequest.certificates.k8s.io/csr-4kr8m approved
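If you’d rather not approve them one at a time, something like this approves everything currently pending; only do this when you’re sure all the pending CSRs are ones you just generated.

kubectl get csr -o name | xargs kubectl certificate approve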

During this process, if you’re monitoring the pods, you’ll see the metrics-server start. That’s because you’ve approved the CSR for the node where the metrics-server is running. Make sure you approve the CSRs for all the nodes.

kube-system       metrics-server-5597479f8d-fn8xm                           1/1     Running   0               13h

Issue

I’m still working through this, but whether I start the metrics-server before or after Calico, the pod has to be deleted once before it actually returns metrics. I’ll try a few more installations to see if I can identify exactly when the metrics-server should be started.

References

  • https://github.com/kubernetes-sigs/metrics-server
  • https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#kubelet-serving-certs
  • https://github.com/kubernetes-sigs/metrics-server/issues/196 – This helped me resolve the issue, mainly by pointing to the actual docs, but there is good troubleshooting info here as well.

Installing Kubernetes

Overview

This article provides instructions on building the Kubernetes cluster using kubeadm and covers the post-installation requirements.

Build Cluster

On the first control plane node run the kubeadm command.

kubeadm init --config kubeadm-config.yaml --upload-certs

After the first node has been initialized, the join strings for the other two control plane nodes and the three worker nodes will be printed. Add the second control plane node using the control plane string, then the third. Do them in order; the third one will time out if it’s waiting while the second is still pulling images.

When all three control plane nodes are up, use the worker join string from the first control plane node and add all three worker nodes. They can be added in parallel or sequentially, but either way they join quickly.
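The join strings look roughly like this; the token, CA cert hash, and certificate key are generated during kubeadm init and are elided here.

# control plane nodes (uses the certificate key from --upload-certs)
kubeadm join bldr0cuomvip1.dev.internal.pri:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>

# worker nodes
kubeadm join bldr0cuomvip1.dev.internal.pri:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>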

You can then check the status of the cluster.

$ kubectl get nodes
NAME                               STATUS   ROLES           AGE   VERSION
bldr0cuomknode1.dev.internal.pri   Ready    <none>          8d    v1.25.7
bldr0cuomknode2.dev.internal.pri   Ready    <none>          8d    v1.25.7
bldr0cuomknode3.dev.internal.pri   Ready    <none>          8d    v1.25.7
bldr0cuomkube1.dev.internal.pri    Ready    control-plane   8d    v1.25.7
bldr0cuomkube2.dev.internal.pri    Ready    control-plane   8d    v1.25.7
bldr0cuomkube3.dev.internal.pri    Ready    control-plane   8d    v1.25.7

And check all the pods as well to make sure everything is running as expected.

$ kubectl get pods -A
NAMESPACE         NAME                                                      READY   STATUS    RESTARTS     AGE
kube-system       coredns-565d847f94-bp2c7                                  1/1     Running   2            8d
kube-system       coredns-565d847f94-twlvf                                  1/1     Running   0            3d17h
kube-system       etcd-bldr0cuomkube1.dev.internal.pri                      1/1     Running   0            4d
kube-system       etcd-bldr0cuomkube2.dev.internal.pri                      1/1     Running   1 (4d ago)   4d
kube-system       etcd-bldr0cuomkube3.dev.internal.pri                      1/1     Running   0            18h
kube-system       kube-apiserver-bldr0cuomkube1.dev.internal.pri            1/1     Running   0            4d
kube-system       kube-apiserver-bldr0cuomkube2.dev.internal.pri            1/1     Running   0            4d
kube-system       kube-apiserver-bldr0cuomkube3.dev.internal.pri            1/1     Running   0            18h
kube-system       kube-controller-manager-bldr0cuomkube1.dev.internal.pri   1/1     Running   0            4d
kube-system       kube-controller-manager-bldr0cuomkube2.dev.internal.pri   1/1     Running   0            4d
kube-system       kube-controller-manager-bldr0cuomkube3.dev.internal.pri   1/1     Running   0            18h
kube-system       kube-proxy-bpcfh                                          1/1     Running   1            8d
kube-system       kube-proxy-jl469                                          1/1     Running   1            8d
kube-system       kube-proxy-lrbh6                                          1/1     Running   2            8d
kube-system       kube-proxy-n9q4f                                          1/1     Running   2            8d
kube-system       kube-proxy-tf9wt                                          1/1     Running   1            8d
kube-system       kube-proxy-v66pt                                          1/1     Running   2            8d
kube-system       kube-scheduler-bldr0cuomkube1.dev.internal.pri            1/1     Running   0            4d
kube-system       kube-scheduler-bldr0cuomkube2.dev.internal.pri            1/1     Running   0            4d
kube-system       kube-scheduler-bldr0cuomkube3.dev.internal.pri            1/1     Running   0            18h

Certificate Signing Requests

When the cluster is up, you’ll need to approve some CSRs because of the kubelet configuration updates. It’s an easy process with one caveat: the certificates are only good for a year, so you’ll need to do this again next year. Make a note.

$ kubectl get csr
NAME        AGE   SIGNERNAME                      REQUESTOR                                      REQUESTEDDURATION   CONDITION
csr-4kr8m   20s   kubernetes.io/kubelet-serving   system:node:bldr0cuomkube3.dev.internal.pri    <none>              Pending
csr-fqpvs   28s   kubernetes.io/kubelet-serving   system:node:bldr0cuomknode3.dev.internal.pri   <none>              Pending
csr-m526d   27s   kubernetes.io/kubelet-serving   system:node:bldr0cuomkube2.dev.internal.pri    <none>              Pending
csr-nc6t7   27s   kubernetes.io/kubelet-serving   system:node:bldr0cuomkube1.dev.internal.pri    <none>              Pending
csr-wxhfd   28s   kubernetes.io/kubelet-serving   system:node:bldr0cuomknode1.dev.internal.pri   <none>              Pending
csr-z42x4   28s   kubernetes.io/kubelet-serving   system:node:bldr0cuomknode2.dev.internal.pri   <none>              Pending
$ kubectl certificate approve csr-4kr8m
certificatesigningrequest.certificates.k8s.io/csr-4kr8m approved

Security Settings

Per the CIS Kubernetes Benchmark, several of the installed files and directories need their permissions and ownership updated to ensure proper settings. Review the CIS documentation to see which files and directories need to be changed.

Image Updates

As noted earlier, update the Kubernetes manifests to point to the local image registry. These files are on each of the control nodes in the /etc/kubernetes/manifests directory. In addition, update the imagePullPolicy to Always; this ensures you always get the correct, uncorrupted image. The kube and etcd containers will restart automatically when their manifest files are updated.
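As an example, the relevant lines in /etc/kubernetes/manifests/kube-apiserver.yaml end up looking like this after the edit:

    image: bldr0cuomrepo1.dev.internal.pri:5000/kube-apiserver:v1.25.7
    imagePullPolicy: Always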

Conclusion

The cluster is up now. Next we’ll need to add the network layer (Calico), the metrics-server, an ingress controller, and, for the development cluster, a continuous delivery tool (ArgoCD).


Preparing Kubernetes

Overview

This article provides a howto on preparing hosts to install Kubernetes 1.25.7 on CentOS 7 using kubeadm. I’ll be using CRI-O as the container runtime and Calico for the network layer. A follow-up article provides instructions on building the cluster and the post-installation needs.

Note that I tried Rocky Linux 8, but Podman there isn’t current enough for CRI-O and throws errors due to a change in the configuration file from a single entry to multiple entries.

Insecure Registries

Currently I’m using an on-prem insecure registry. I installed the Docker Distribution software, which works well enough to host local images. On a docker server, I pull the necessary images, tag them with the local registry information, and push them to the local registry. I then update the Kubernetes manifests and other tools to point to the local registry. With this, I’m not pulling images from the internet every time I make a change.

Prepare Hosts

There are a few things that need to be done with the hosts to make them ready.

Container Runtime

In order to use a container runtime, you’ll need to create a couple of files: one to load the br_netfilter module and one to load the overlay module. You’ll also modify some kernel settings with sysctl.

First, in /etc/modules-load.d, create a br_netfilter.conf file containing:

br_netfilter

Next create the /etc/modules-load.d/overlay.conf file containing:

overlay

You can either restart the system or simply use modprobe to load the modules.

modprobe overlay
modprobe br_netfilter

Next create /etc/sysctl.d/kubernetes.conf and add the following lines:

net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1

Again, restart the system or simply reload the sysctl table:

sysctl --system

Disable swap

First off, disable and remove swap from all the nodes, control and worker. Since Kubernetes manages resources itself, swap is not needed.

  • Remove the /dev/mapper/vg00-swap line from /etc/fstab
  • Remove rd.lvm.lv=vg00/swap from /etc/default/grub and run grub2-mkconfig -o /boot/grub2/grub.cfg to rebuild the grub.cfg file.
  • Disable swap by running swapoff -v /dev/mapper/vg00-swap
  • Run the umount /dev/mapper/vg00-swap command to remove swap, then run lvremove /dev/mapper/vg00-swap to recover the space.

If SELinux is configured, ensure the SELINUX line is set to permissive in /etc/selinux/config. You’ll need to reboot for this to take effect.

You may want to do some Quality of Service management. If so, install the iproute-tc tool. See the References section for further information on the software.

Firewalls

I have firewalls running on all my servers since I follow the zero-trust networking model; however, because I’m using Calico for the network layer and it manages its own rules, you need to disable the firewall on all cluster nodes.
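On the Kubernetes nodes that’s a one-liner, assuming firewalld is the firewall in use:

systemctl disable --now firewalld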

Docker

At least for CentOS 7, you’ll install docker and its tools on all nodes.

yum install -y docker docker-common docker-client

Configure docker to allow access to the on-prem insecure registries. Without this, docker will not pull the images. In addition, you want to use journald for logging. Update the /etc/docker/daemon.json file as follows:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  },
  "insecure-registries": ["bldr0cuomrepo1.dev.internal.pri:5000"]
}

In addition, update the docker system startup file and add the following flag.

--log-driver=journald
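After changing daemon.json and the unit file, reload systemd and restart docker so the changes take effect; a minimal sketch:

systemctl daemon-reload
systemctl enable docker
systemctl restart docker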

Container Runtime

Now the systems are ready for CRI-O. You’ll need to add a couple of repositories to your nodes before doing the installation. Also, as of 1.24.0 you have the option of selecting a CNI plugin. I’ll be using containernetworking-plugins since that’s how it was set up, but you can select a different one if you like.

Configure Repositories

You’ll need to add the two repositories provided below. While you can pull the files from the CRI-O website, as always we want consistency across the clusters. We’re installing CRI-O 1.24.0 on CentOS 7.

First the crio.repo file. Save it in /etc/yum.repos.d/crio.repo

[devel_kubic_libcontainers_stable_cri-o_1.24]
name=devel:kubic:libcontainers:stable:cri-o:1.24 (CentOS_7)
type=rpm-md
baseurl=https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.24/CentOS_7/
gpgcheck=1
gpgkey=https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.24/CentOS_7/repodata/repomd.xml.key
enabled=1

Next is the stable.repo. Again save it in /etc/yum.repos.d/stable.repo

[devel_kubic_libcontainers_stable]
name=Stable Releases of Upstream github.com/containers packages (CentOS_7)
type=rpm-md
baseurl=https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_7/
gpgcheck=1
gpgkey=https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_7/repodata/repomd.xml.key
enabled=1

Install the crio package.

yum install crio

Then install the CNI of choice.

yum install containernetworking-plugins

In order for CRI-O to know about the on-prem insecure registry, you’ll need to update the /etc/containers/registries.conf file. Add the following TOML-formatted block:

[[registry]]
prefix = "bldr0cuomrepo1.dev.internal.pri:5000"
insecure = true
location = "bldr0cuomrepo1.dev.internal.pri:5000"

The pause container isn’t displayed when listing pods, but Kubernetes uses it to hold each pod’s network namespace so that containers restarting or crashing within the pod don’t lose their network configuration. In order to point to the local insecure registry, you have to update the /etc/crio/crio.conf file with the following line:

pause_image = "bldr0cuomrepo1.dev.internal.pri:5000/pause:3.6"

When everything is installed, enable and start crio.

systemctl enable crio
systemctl start crio

Kubernetes Binaries

In order to install the Kubernetes binaries, you’ll first need to install the Kubernetes repository into /etc/yum.repos.d. Create the file kubernetes.repo and add the following lines.

[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl

And now, install the necessary binaries.

yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

Next enable kubelet. You won’t be able to start it as the config.yaml file doesn’t exist yet. That’s created when you run kubeadm.

systemctl enable kubelet

Build kubeadm Config

There are multiple options for the kubeadm-config.yaml file; here is the one I’m using when building the cluster. This file only needs to be on the first control node, since once the cluster is started you’ll be given the commands to join the other control and worker nodes to it.

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  imagePullPolicy: Always
---
apiVersion: kubeadm.k8s.io/v1beta3
clusterName: "bldr"
controlPlaneEndpoint: "bldr0cuomvip1.dev.internal.pri:6443"
etcd:
  local:
    imageRepository: "bldr0cuomrepo1.dev.internal.pri:5000"
imageRepository: "bldr0cuomrepo1.dev.internal.pri:5000"
kind: ClusterConfiguration
kubernetesVersion: "1.25.7"
networking:
  podSubnet: "10.42.0.0/16"
  serviceSubnet: "10.69.0.0/16"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true

There are three sections here to detail.

InitConfiguration

For security purposes, we want every image pulled each time it needs to be loaded, so imagePullPolicy is set to Always. Since the image repository is on-prem, that isn’t a big issue.

ClusterConfiguration

There are several options we’ll set to make sure we are running properly when initializing the cluster.

clusterName: I have four environments that will have clusters. The sites are bldr (dev), cabo (qa), tato (stage), and lnmt (production). Set this to one of the environments.

controlPlaneEndpoint: This is the HAProxy VIP along with the port of 6443.

imageRepository: This is the local image repository, in this case bldr0cuomrepo1.dev.internal.pri:5000. It’s set for the etcd image and the Kubernetes control plane images.

kubernetesVersion: Set it to the version being installed, in this case 1.25.7.

networking.podSubnet: Set to the network all the pods will be started on.

networking.serviceSubnet: Set to the network all internal services will use.

KubeletConfiguration

This is used by the metrics-server in order to access the cluster and return statistics. The setting will be applied to every server’s kubelet config.yaml file plus the cluster’s kubelet-config configmap.

As a note, Certificate Signing Requests (CSRs) will need to be approved once the cluster is up.

Conclusion

The servers are all prepared and ready to be started. Log in to the first control node and follow the instructions for building the cluster.

References


Kubernetes Storage

Overview

This article provides some quick instructions on creating an NFS server for use as persistent storage in Kubernetes. A separate article will discuss creating the Persistent Storage objects in the cluster.

Firewall Configuration

The NFS server will only be accessed by Kubernetes, so we’ll restrict access to the NFS share to the environment’s network. To do that without blocking ssh access, we’ll create a new firewall zone called nfs, add the nfs, rpc-bind, and mountd services to it, and add the network range as a source. Ultimately we’ll have the following configuration.

# firewall-cmd --zone nfs --list-all
nfs (active)
  target: default
  icmp-block-inversion: no
  interfaces:
  sources: 192.168.101.0/24
  services: mountd nfs rpc-bind
  ports:
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
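Getting to that configuration looks roughly like this; the zone matches on the source network rather than an interface, and the commands assume firewalld is already running.

firewall-cmd --permanent --new-zone=nfs
firewall-cmd --permanent --zone=nfs --add-service=nfs --add-service=rpc-bind --add-service=mountd
firewall-cmd --permanent --zone=nfs --add-source=192.168.101.0/24
firewall-cmd --reload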

NFS Configuration

To prepare the storage, we’ll create three directories. We’re creating a registry directory for OpenShift/OKD4 even though it’s not used by Kubernetes; I have an OKD4 cluster that will use this storage as well.

mkdir -p /srv/nfs4
chmod 755 /srv/nfs4
chown -R root:root /srv

mkdir /srv/nfs4/registry
chmod 755 /srv/nfs4/registry
chown nobody:nobody /srv/nfs4/registry

mkdir /srv/nfs4/storage
chmod 755 /srv/nfs4/storage
chown nobody:nobody /srv/nfs4/storage

NFS Installation

Install the nfs-utils and python3-libselinux packages. Then create the /etc/exports file that defines the shared file systems.

/srv/nfs4              192.168.101.0/24(rw,sync,no_subtree_check,crossmnt,fsid=0)
/srv/nfs4/registry     192.168.101.0/24(rw,sync,no_subtree_check,no_root_squash,no_all_squash,insecure,fsid=1)
/srv/nfs4/storage     192.168.101.0/24(rw,sync,no_subtree_check,no_root_squash,no_all_squash,insecure,fsid=2)

Export the file systems.

exportfs -ra

Enable and start the nfs-server.

systemctl enable nfs-server
systemctl start nfs-server

Verification

To make sure the shares are ready, run the following command.

# showmount --exports
Export list for bldr0cuomnfs1.dev.internal.pri:
/srv/nfs4/storage  192.168.101.0/24
/srv/nfs4/registry 192.168.101.0/24
/srv/nfs4          192.168.101.0/24

And finished.


Load Balancing Kubernetes

Overview

This article provides instructions on how I set up my HAProxy servers (yes, two) to provide access to the Kubernetes cluster.

Configuration

To emulate a production-like environment, I’m configuring two HAProxy servers to provide access to the Kubernetes cluster. To ensure continued access to Kubernetes if one goes down, I’m also installing keepalived. In addition, I’m using a tool called monit to make sure the haproxy binary is restarted if it stops.

The server configuration isn’t gigantic. I am using my default CentOS 7.9 image, though, so it’s 2 CPUs, 4 gigs of memory, and 100 gigs of storage with only 32 gigs allocated.

HAProxy

I am making a few changes to the default installation of haproxy. In the global block the following configuration is in place.

global
        log /dev/log local0
        log /dev/log local1 notice
        chroot /var/lib/haproxy
        stats socket /var/lib/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # Default ciphers to use on SSL-enabled listening sockets.
        # For more information, see ciphers(1SSL). This list is from:
        #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
        # An alternative list with additional directives can be obtained from
        #  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3

In the defaults block of the haproxy.cfg file, the following configuration is in place.

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect 5s
        timeout client  50s
        timeout server  50s

I also added a listener so you can go to a web page and see various statistics on port 1936. Don’t forget to open the firewall so you can access the stats.

listen stats
        bind *:1936
        mode http
        log  global
        maxconn 10
        stats enable
        stats hide-version
        stats refresh 30s
        stats show-node
        stats show-desc Stats for the k8s cluster
        stats uri /
        monitor-uri /healthz/ready
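Opening the stats port is quick; this sketch assumes the default firewalld zone on the HAProxy servers.

firewall-cmd --permanent --add-port=1936/tcp
firewall-cmd --reload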

There are two ports that need to be open for the Kubernetes control plane nodes: 6443 for the API server and 22623 for the machine config server. Set up the frontend and backend configurations as follows:

frontend kubernetes-api-server
        bind *:6443
        default_backend kubernetes-api-server
        mode tcp
        option tcplog

backend kubernetes-api-server
        mode tcp
        server bldr0cuomkube1 192.168.101.160:6443 check
        server bldr0cuomkube2 192.168.101.161:6443 check
        server bldr0cuomkube3 192.168.101.162:6443 check


frontend machine-config-server
        bind *:22623
        default_backend machine-config-server
        mode tcp
        option tcplog

backend machine-config-server
        mode tcp
        server bldr0cuomkube1 192.168.101.160:22623 check
        server bldr0cuomkube2 192.168.101.161:22623 check
        server bldr0cuomkube3 192.168.101.162:22623 check

For the worker nodes, the following configuration for ports 80 and 443 is required.

frontend ingress-http
        bind *:80
        default_backend ingress-http
        mode tcp
        option tcplog

backend ingress-http
        balance source
        mode tcp
        server bldr0cuomknode1-http-router0 192.168.101.163:80 check
        server bldr0cuomknode2-http-router1 192.168.101.164:80 check
        server bldr0cuomknode3-http-router2 192.168.101.165:80 check


frontend ingress-https
        bind *:443
        default_backend ingress-https
        mode tcp
        option tcplog

backend ingress-https
        balance source
        mode tcp
        server bldr0cuomknode1-http-router0 192.168.101.163:443 check
        server bldr0cuomknode2-http-router1 192.168.101.164:443 check
        server bldr0cuomknode3-http-router2 192.168.101.165:443 check

Before starting haproxy, you’ll need to do some configuration work. For logging, create the /var/log/haproxy directory as logs will be stored there.

Since we’re using chroot to isolate haproxy, create the /var/lib/haproxy/dev directory. Then create a socket for the logs:

python3 -c "import socket as s; sock = s.socket(s.AF_UNIX); sock.bind('/var/lib/haproxy/dev/log')"

To point to this new device, add the following configuration file, 49-haproxy.conf, to /etc/rsyslog.d and restart rsyslog.

# Create an additional socket in haproxy's chroot in order to allow logging via
# /dev/log to chroot'ed HAProxy processes
$AddUnixListenSocket /var/lib/haproxy/dev/log

# Send HAProxy messages to a dedicated logfile
if $programname startswith 'haproxy' then /var/log/haproxy/haproxy.log
&~

keepalived

Since there are two servers, I have hap1 as the primary and hap2 as the secondary server. On the primary server, use the following configuration.

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface ens192
    state MASTER
    priority 200

    virtual_router_id 33
    unicast_src_ip 192.168.101.61
    unicast_peer {
        192.168.101.62
    }

    advert_int 1
    authentication {
        auth_type PASS
        auth_pass [unique password]
    }

    virtual_ipaddress {
        192.168.101.100
    }

    track_script {
        chk_haproxy
    }
}

And on the backup server:

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface ens192
    state BACKUP
    priority 100

    virtual_router_id 33
    unicast_src_ip 192.168.101.62
    unicast_peer {
        192.168.101.61
    }

    advert_int 1
    authentication {
        auth_type PASS
        auth_pass [Unique password]
    }

    virtual_ipaddress {
        192.168.101.100
    }

    track_script {
        chk_haproxy
    }
}
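Once both configurations are in place, enable and start keepalived on both servers and verify the VIP lands on the primary; the interface name and VIP below come from the configuration above.

systemctl enable --now keepalived
ip addr show ens192 | grep 192.168.101.100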

monit

The monit tool watches running processes and if the process ceases to exist, the tool restarts the process. It can be configured to notify admins as well. The following changes were made to the default monit configuration.

Note that the username and password appear to be hard-coded into monit. The best I could do was ensure access is read-only.

set daemon  120              # check services at 2 minute intervals

set log /var/log/monit.log

set idfile /var/lib/monit/.monit.id

set statefile /var/lib/monit/.monit.state

set eventqueue
    basedir /var/lib/monit/events  # set the base directory where events will be stored
    slots 100                      # optionally limit the queue size

set httpd
    port 2812
     address 192.168.101.62                  # only accept connections on this server's address
     allow 192.168.101.62/255.255.255.255    # allow this server to connect
     allow 192.168.101.90/255.255.255.255    # allow connections from the tool server
     allow 192.168.0.0/255.255.0.0           # allow connections from the internal servers
     allow admin:monit read-only   # require authentication

include /etc/monit.d/*



Docker Distribution

Overview

At a prior job I used Artifactory to manage images. The nice thing about Artifactory is you can create a Virtual Repository: you configure it to automatically pull images from a Remote Repository and make them available as if they were in a Local Repository.

Docker Distribution is a simple Docker registry. Since I’m on high-speed Wi-Fi here in the mountains, I don’t want to keep pulling images and disrupting both our bandwidth and our neighbors’ bandwidth.

Installation

Installing the software is easy enough.

# yum install -y docker-distribution

Once installed, check the /etc/docker-distribution/registry/config.yml file for settings; for me, the defaults are fine. When finished, enable and start the tool.

# systemctl enable docker-distribution
# systemctl start docker-distribution

Listing

I wanted to be able to view what was in the repository and there wasn’t an easy way to do it without going into the /var/lib/registry/docker/registry/v2/repositories directory and parsing out the _manifests and tags directories. As a result, I created an index.php script that parses them out and displays them. In order to do so, I installed a web server and PHP.

Docker Images

Docker Image    Docker Tag    Pull String
argocd          v2.6.6        bldr0cuomrepo1.dev.internal.pri:5000/argocd:v2.6.6
centos          latest        bldr0cuomrepo1.dev.internal.pri:5000/centos:latest
cni             v3.17.1       bldr0cuomrepo1.dev.internal.pri:5000/cni:v3.17.1
cni             v3.18.2       bldr0cuomrepo1.dev.internal.pri:5000/cni:v3.18.2
cni             v3.20.1       bldr0cuomrepo1.dev.internal.pri:5000/cni:v3.20.1
coredns         1.3.1         bldr0cuomrepo1.dev.internal.pri:5000/coredns:1.3.1
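As an alternative (or a quick sanity check), the registry’s v2 HTTP API can list repositories and tags directly without the PHP page:

curl http://bldr0cuomrepo1.dev.internal.pri:5000/v2/_catalog
curl http://bldr0cuomrepo1.dev.internal.pri:5000/v2/argocd/tags/list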

Managing Images

The steps involved in managing local images are pretty easy overall. I have a separate docker server I use for building images. There I pull the image, tag it, and push it to the local repository.

docker pull registry.k8s.io/kube-apiserver:v1.25.7
docker tag registry.k8s.io/kube-apiserver:v1.25.7 bldr0cuomrepo1.dev.internal.pri:5000/kube-apiserver:v1.25.7
docker push bldr0cuomrepo1.dev.internal.pri:5000/kube-apiserver:v1.25.7

Once all the images are uploaded, you can then delete any images on the docker server to keep things cleaned up.

docker rmi registry.k8s.io/kube-apiserver:v1.25.7
docker rmi bldr0cuomrepo1.dev.internal.pri:5000/kube-apiserver:v1.25.7

Finally you’ll update the Kubernetes manifests and anything else that loads images from the internet.


Terraform Builds

Overview

This article provides instructions on how to use Terraform to build virtual machines on VMware.

Preparation

I use templates to build virtual machines and have several of them so I can build systems quickly. I keep a template for each type of system that I want to use to test things:

  • CentOS 7
  • CentOS 8
  • Debian 10
  • Red Hat Enterprise Linux 8
  • Rocky Linux 8
  • SUSE 15
  • Ubuntu 20

Each of these template images is updated periodically or replaced, has my personal account on it, has my service account, and is configured so my service account is able to run ansible playbooks against any built machine.

For my build process, I’ve created a unique template for each environment that begins with the environment name (bldr0, cabo0, and so on). In addition, to speed up builds, I have duplicates in each environment for each of my R720XD servers (Monkey, Morgan, and Slash). That cuts provisioning a VM from around 6 minutes down to 2 minutes. If you have shared storage, a single template would work fine. I’ve configured each template to have a base IP address (192.168.101.42, and so on) so the Terraform script can communicate with and configure the new machine. The templates also carry some standard configuration such as the service account, ssh keys, sudoers access, and the necessary standard groups (sysadmin and tmproot).

Process

I’m a learn-by-example person and tend to look for a page where someone has documented a successful build process. The problem I find with official documentation is that it has to address all possibilities, so it’s a lot more complicated to parse and apply. That’s one of the reasons for this documentation: it’s just what’s needed to build the environment for my purposes. Sometimes even the best docs don’t explain everything; for example, Gary’s docs don’t provide information on the example shell script he listed at the end.

Per the Terraform docs, using a provisioner to reconfigure your new virtual machine isn’t recommended. The suggestion is to mount a virtual CD, similar to a cloud-init image, to automatically configure the VM. The reason for this makes sense: with the provisioner approach you’re logging into the new VM, so you have to have the credentials in the Terraform script plus a unique shell script with whatever commands you want to run to reconfigure the new VM.

Personally in a homelab environment, I don’t have a problem with this and it is easier than building a CD image and a process for accessing it and reconfiguring the VM.

Configuration

The first part is to set the variables used to build the virtual machine. I would suggest using HashiCorp Vault to manage credentials. Note that in my KVM Terraform setup I use a variables file and fill in a module when building systems. I plan to do that here as well, so this document will be updated (again) when I figure that out.

For the configuration below, I’ll use my own information since I’m using this document to rebuild my environment if needed; update it with yours. I will hide the credentials of course.

I’m using a pretty basic vCenter configuration. I don’t have centralized storage and am not using centrally managed network configurations. It’s just easier and less complicated.

About the only thing I haven’t figured out yet is how to place the new VM in the correct folder. If I figure it out, I’ll update this document.

provider "vsphere" {
  # If you use a domain, set your login like this "Domain\\User"
  user           = "administrator@vcenter.local"
  password       = "[password]"
  vsphere_server = "lnmt1cuomvcenter.internal.pri"

  # If you have a self-signed cert
  allow_unverified_ssl = true
}

data "vsphere_datacenter" "dc" {
  name = "Colorado"
}

# If you don't have any resource pools, put "/Resources" after cluster name
data "vsphere_resource_pool" "pool" {
  name          = "192.168.1.15/Resources"
  datacenter_id = data.vsphere_datacenter.dc.id
}

# Retrieve datastore information on vsphere
data "vsphere_datastore" "datastore" {
  name          = "NikiVMs"
  datacenter_id = data.vsphere_datacenter.dc.id
}

# Retrieve network information on vsphere
data "vsphere_network" "network" {
  name          = "QA Network"
  datacenter_id = data.vsphere_datacenter.dc.id
}

# Retrieve template information on vsphere
data "vsphere_virtual_machine" "template" {
  name          = "cabo0cuomcentos7_monkey"
  datacenter_id = data.vsphere_datacenter.dc.id
}

System Built

The second section defines the build for Terraform and creates the VM. At the end, the remote-exec provisioner pushes up the shell script that reconfigures the new VM.

The resource name, ‘cabo-02’ gives me a unique name in the Terraform state file so that I can manage different machines. This one is for cabo0cuomkube1. The cabo0cuomkube2 system will be ‘cabo-03’ and so on.

We define the number of CPUs and RAM in case it’s different from the main template (2 CPUs and 4 Gigs of RAM).

In the disk block, define the template to be used. The clone block will then build the system using that image.

The remote-exec provisioner provides the credentials to upload the script.sh shell script. The host line is the template IP address, which is the same for every VM created in this environment (cabo).

# Set vm parameters
resource "vsphere_virtual_machine" "cabo-02" {
  name             = "cabo0cuomkube1"
  num_cpus         = 4
  memory           = 8192
  datastore_id     = data.vsphere_datastore.datastore.id
  resource_pool_id = data.vsphere_resource_pool.pool.id
  guest_id         = data.vsphere_virtual_machine.template.guest_id
  scsi_type        = data.vsphere_virtual_machine.template.scsi_type

  # Set network parameters
  network_interface {
    network_id = data.vsphere_network.network.id
  }

  # Use a predefined vmware template as main disk
  disk {
    label = "cabo0cuomcentos7_monkey.vmdk"
    size = "100"
  }

  # create the VM from a template
  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }

# Execute script on remote vm after this creation
  provisioner "remote-exec" {
    script = "script.sh"
    connection {
      type     = "ssh"
      user     = "root"
      password = "[password]"
      host     = "192.168.102.42"
    }
  }
}

Shell Script

Finally the shell script simply has commands used to reconfigure the VM.

#!/bin/bash

hostnamectl set-hostname cabo0cuomkube1.qa.internal.pri

nmcli con mod ens192 ipv4.method manual ipv4.addresses \
  192.168.102.160 ipv4.gateway 192.168.102.254 ipv4.dns \
  192.168.1.254 ipv4.dns-search 'qa.internal.pri schelin.org'

shutdown -r now

Because of the shutdown command, the script exits with a non-zero value, so Terraform will report an error even though the build succeeded.
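One way to keep that expected failure from marking the apply as failed, assuming you’re comfortable ignoring any real script errors too, is Terraform’s on_failure setting on the provisioner:

  provisioner "remote-exec" {
    script     = "script.sh"
    on_failure = continue    # don't fail the apply when the script exits non-zero
    connection {
      type     = "ssh"
      user     = "root"
      password = "[password]"
      host     = "192.168.102.42"
    }
  }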

And that’s it. The VM should be created and accessible via the new IP address after it reboots.

References


Kubernetes Index

Overview

I keep trying to find documents and videos that cover an installation of Kubernetes, along with the associated configuration and ArgoCD, for an environment like mine. This index shows the final Kubernetes installation (a list of all pods) plus an index of the articles I’ve written (with references) to document the build and support process.

Note that I do have other Kubernetes articles here; this is a start-to-finish build series. While the articles are all dated November 20th, that’s to let you go from one to the next by clicking on ‘next article’ at the bottom. I did include a couple of prior articles and didn’t change their dates, so for those you’ll need to return to the index to continue.

My environment consists of a vCenter cluster of three Dell R720XD servers. Virtual machines are created with Terraform from templates which already have the OS (CentOS 7 or Rocky Linux 8) and some basic configurations such as my personal account and a service account that specifically permits NOPASSWD access for the Ansible playbooks.

Note that there’s a problem with Podman (the docker replacement for Red Hat based systems) and my container network software (CNI), so I went with CentOS 7 for now. With automation, once it’s working I can rebuild everything quickly. 🙂

System descriptions will be provided in the individual documents.

Repositories

These are my github sites with my terraform, ansible playbooks, and gitops yaml files used to build and configure Kubernetes.

I will note that the repos on github are copies of those on my internal gitlab server. A couple of them are connected to github to push updates when I update a master branch, but for most I need to switch to the main branch, git pull, and then git push to github to make the updates publicly available, meaning I have to remember to do so.

I do have a separate git repo for configurations and ansible-vault files which are applied after the main software is deployed. For this project, the Llamas website doesn’t have a separate configuration but other repos such as the Ansible and GitOps repos have separate configuration repos.

In this scenario, you’ll use the Terraform scripts to build the Virtual Machines. When all are up and running, you can use Ansible to configure the new VMs.

Next up is to further initialize the new VMs. The process is to run the newserver/initialize Ansible playbook, which installs more common configurations plus configurations that are subject to change, so they aren’t baked into the template.

Finally, I have a suite of scripts used to manage the servers, such as extracting information for the inventory. Run the utility/unixsuite playbook to install the scripts.

When the core playbooks are done, follow the articles to prepare, create, and configure Kubernetes.

Article List

This is a list of the articles for creating the on-prem infrastructure, building a cluster, installing ArgoCD, and ultimately installing the Llamas band website. I also want to install AWX, which is the upstream of Red Hat’s Ansible Automation Platform (formerly Ansible Tower). If there’s no link below, then I’m still working on the installation or the page.

Example Output

NAMESPACE            NAME                                                      READY   STATUS    RESTARTS      AGE
argocd               argocd-application-controller-0                           1/1     Running   0             11d
argocd               argocd-applicationset-controller-6c64d9f677-rrd5l         1/1     Running   0             11d
argocd               argocd-dex-server-57c6485f6f-5gswb                        1/1     Running   0             11d
argocd               argocd-notifications-controller-6cc686cd6f-lw8q4          1/1     Running   0             11d
argocd               argocd-redis-679bb4b7bd-kxgzg                             1/1     Running   0             11d
argocd               argocd-repo-server-645f954984-8zb8x                       1/1     Running   0             11d
argocd               argocd-server-889549b4-ggx97                              1/1     Running   0             11h
calico-apiserver     calico-apiserver-6fd86fcb4b-77tld                         1/1     Running   1 (11d ago)   11d
calico-apiserver     calico-apiserver-6fd86fcb4b-p6bzc                         1/1     Running   3 (11d ago)   11d
calico-system        calico-kube-controllers-dd6c88556-zhg6b                   1/1     Running   0             12d
calico-system        calico-node-66fkb                                         1/1     Running   0             12d
calico-system        calico-node-99qs2                                         1/1     Running   0             12d
calico-system        calico-node-dtzgf                                         1/1     Running   0             12d
calico-system        calico-node-ksjpr                                         1/1     Running   1             12d
calico-system        calico-node-lhhrl                                         1/1     Running   0             12d
calico-system        calico-node-w8nmx                                         1/1     Running   0             12d
calico-system        calico-typha-69f9d4d5b4-vp7mp                             1/1     Running   0             12d
calico-system        calico-typha-69f9d4d5b4-xv5tg                             1/1     Running   0             12d
calico-system        calico-typha-69f9d4d5b4-z65kn                             1/1     Running   0             12d
calico-system        csi-node-driver-5czsp                                     2/2     Running   2             12d
calico-system        csi-node-driver-ch746                                     2/2     Running   0             12d
calico-system        csi-node-driver-gg9f4                                     2/2     Running   0             12d
calico-system        csi-node-driver-kwbwp                                     2/2     Running   0             12d
calico-system        csi-node-driver-nh564                                     2/2     Running   0             12d
calico-system        csi-node-driver-rvfd4                                     2/2     Running   0             12d
default              echoserver-6f54957b4d-6qc8n                               1/1     Running   0             2d20h
default              my-nginx-66689dbf87-jkgjk                                 1/1     Running   0             3d14h
ingress-controller   haproxy-ingress-7bc69b8cc-wq2hc                           1/1     Running   0             9d
kube-system          coredns-9b6bfc8df-fh4kr                                   1/1     Running   1             12d
kube-system          coredns-9b6bfc8df-sn8dj                                   1/1     Running   1             12d
kube-system          etcd-bldr0cuomkube1.dev.internal.pri                      1/1     Running   1             11d
kube-system          etcd-bldr0cuomkube2.dev.internal.pri                      1/1     Running   0             11d
kube-system          etcd-bldr0cuomkube3.dev.internal.pri                      1/1     Running   0             11d
kube-system          kube-apiserver-bldr0cuomkube1.dev.internal.pri            1/1     Running   1             11d
kube-system          kube-apiserver-bldr0cuomkube2.dev.internal.pri            1/1     Running   0             11d
kube-system          kube-apiserver-bldr0cuomkube3.dev.internal.pri            1/1     Running   0             11d
kube-system          kube-controller-manager-bldr0cuomkube1.dev.internal.pri   1/1     Running   1             11d
kube-system          kube-controller-manager-bldr0cuomkube2.dev.internal.pri   1/1     Running   0             11d
kube-system          kube-controller-manager-bldr0cuomkube3.dev.internal.pri   1/1     Running   0             11d
kube-system          kube-proxy-4mr9m                                          1/1     Running   1             12d
kube-system          kube-proxy-gqrd6                                          1/1     Running   0             12d
kube-system          kube-proxy-kg899                                          1/1     Running   0             12d
kube-system          kube-proxy-nwsw8                                          1/1     Running   0             12d
kube-system          kube-proxy-rm7lg                                          1/1     Running   0             12d
kube-system          kube-proxy-zj4sg                                          1/1     Running   0             12d
kube-system          kube-scheduler-bldr0cuomkube1.dev.internal.pri            1/1     Running   1             11d
kube-system          kube-scheduler-bldr0cuomkube2.dev.internal.pri            1/1     Running   0             11d
kube-system          kube-scheduler-bldr0cuomkube3.dev.internal.pri            1/1     Running   0             11d
kube-system          metrics-server-5597479f8d-lwwbg                           1/1     Running   0             11d
llamas               llamas-6b44d5cd5d-9v52b                                   1/1     Running   0             43m
llamas               llamas-6b44d5cd5d-cpd2h                                   1/1     Running   0             42m
llamas               llamas-6b44d5cd5d-dw7pq                                   1/1     Running   0             42m
tigera-operator      tigera-operator-7d89d9444-4scfq                           1/1     Running   3 (11d ago)   12d

Python and Postgresql Index

Overview

This is the index article that provides links to individual articles on the conversion of one of my projects from PHP/MySQL (MariaDB) to Python and PostgreSQL. There shouldn’t be too many; however, like my Kubernetes and ArgoCD series of articles, I’ll be covering everything from the server build using Terraform, through the installation of PostgreSQL 15 (current as of this writing), to the various tasks I need to perform to migrate my Status Management app.

Repositories

I have a github site that contains a bunch of my projects. For this one, I’ll be adding the server build to the existing terraform repository and will use the server configuration repository to configure the new server. Once done, I’ll create a new repository for the Status Management application itself and as I progress, I’ll update it until it works with Python and Postgresql.

Goals

What am I trying to achieve here? Well, I want to learn Python beyond the quick dabbling I’ve done. I’m more comfortable with Perl and shell scripts, but more and more jobs require Python knowledge. I think I could get up to speed quickly enough if I were put in such a position, but it’d be nice to be reasonably familiar first. Secondly, I learn best when I have a project to work on. Going through tutorials and building yet another hello-world application in Python is boring, and most of the books start off teaching everything from the beginning. I don’t need to learn how to code; I’ve been writing programs since the early 80s.

And the same goes for PostgreSQL. I’m pretty familiar with MySQL and of course MariaDB, but PostgreSQL is another one I should know. This project will show me how to migrate my existing data from MySQL over to PostgreSQL and, of course, how to properly craft statements in Python to access the data and present it as a web page. It’ll be fun as always, as I enjoy such things immensely.

Index Listing
