Installing OKD4

Overview

What I’m trying to accomplish here is to set up a homelab OpenShift-type environment. I’m installing OKD 4.7 in this case. OKD is the upstream project for Red Hat OpenShift, so it should be just fine for my use case.

This installation is focused on my own requirements, so it doesn’t have nearly the detail of the Red Hat documentation. Even so, working through it has been extremely helpful in understanding the Red Hat installation process.

I intend to document what I currently have, environment-wise, and how to configure it appropriately. If you don’t have a similar environment, I highly recommend the article in the References section below. It greatly helped me understand the installation process.

Environment

I am running this project initially on VMware but will also be working on a second KVM server.
I run pfSense to manage my firewall, plus the DNS and DHCP services used for this project.
I’m using HAProxy to provide Load Balancing for the API-Server in the cluster. This is running on CentOS 8 Server.
I’m using a Service box to provide the necessary web server and NFS storage services. This is running on CentOS 8.
For the bootstrap node and cluster elements, I’ll be using a Fedora CoreOS image.

Virtual Machines

The servers are built in my Development environment (192.168.101.0/24). This build assumes you already have the Gold Images for the non-CoreOS nodes; building the operating systems is beyond the scope of this article. Note that the two non-CoreOS nodes have fairly minor roles. The CPU, RAM, and storage settings for the cluster nodes are Red Hat’s recommendations.

Some general specifications, to keep the table a bit smaller:

Bootstrap/Control Plane – 4 CPU, 16 Gigs of RAM, 120 Gigs of storage.
Compute – 2 CPU, 8 Gigs of RAM, 120 Gigs of storage.
Operating System is Fedora CoreOS for the above three server types.
For the service and haproxy node, CentOS 8, 2 CPU, 4 Gigs RAM, and 30 Gigs of storage.
For the Machine Names, all are prefixed with bldr0cuomokd. So boot is bldr0cuomokdboot.

Machine Type	Machine Name	IP Address	MAC
Bootstrap Server	boot	192.168.101.107	00:50:56:b3:a3:7e
Control Plane	mst1	192.168.101.101	00:50:56:b3:02:f4
Control Plane	mst2	192.168.101.102	00:50:56:b3:2a:62
Control Plane	mst3	192.168.101.103	00:50:56:b3:1a:11
Compute	wrk1	192.168.101.104	00:50:56:b3:42:30
Compute	wrk2	192.168.101.105	00:50:56:b3:8b:67
Compute	wrk3	192.168.101.106	00:50:56:b3:d8:81
Compute	wrk4	192.168.101.109	00:50:56:91:93:c3
Compute	wrk5	192.168.101.110	00:50:56:91:a7:21
Compute	wrk6	192.168.101.111	00:50:56:91:2f:be
Compute	wrk7	192.168.101.112	00:50:56:91:7e:7a
NFS/Web	svc1	192.168.101.100	00:50:56:b3:38:31
HAProxy	hap1	192.168.101.108	00:50:56:b3:e0:0a

Note that worker nodes 4 and 5 were added after the cluster was built but before the 24-hour certificate window expired. Worker nodes 6 and 7 were added after the 24-hour window expired.

DNS

The above Machine Names have been entered into DNS. In addition, the following aliases need to be added to DNS.

The cluster domain name will be [site].internal.pri. For example, bldr0-0.internal.pri.

api – The api server. Aliased to the haproxy hostname.
api-int – The internal api server alias. Aliased to the haproxy hostname.
console-openshift-console – Console access to the cluster. Aliased to the haproxy hostname.
oauth-openshift – The OAuth authentication endpoint. Aliased to the haproxy hostname.
etcd-0 – The first etcd node. Aliased to the first master node.
etcd-1 – The second etcd node. Aliased to the second master node.
etcd-2 – The third etcd node. Aliased to the third master node.
bootstrap – The bootstrap server. Aliased to the bootstrap server’s hostname.
address=/okd.internal.pri/192.168.101.108 – This needs to be added to the Options in the DNS Forwarder section of pfSense. It resolves every name under okd.internal.pri (including application routes) to the HAProxy node, which lets the ingress router work.

Note that I used okd.internal.pri as I didn’t quite understand the naming convention for the domain. If I were to create clusters in all four of my environments, I’d be using bldr0-0, cabo0-0, tato0-0, and lnmt1-2 which matches my existing Kubernetes cluster naming convention.
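Before booting anything, it’s worth confirming that every alias resolves. The following is a minimal sketch, assuming the aliases sit directly under the cluster domain (okd.internal.pri here); okd_dns_names and check_okd_dns are hypothetical helper names, not part of OKD. It resolves each name with getent and flags any that come back empty.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumes the aliases live directly under the
# cluster domain, e.g. api.okd.internal.pri.

# Print one FQDN per alias listed in the DNS section.
okd_dns_names() {
  local cluster_domain="$1" host
  for host in api api-int console-openshift-console oauth-openshift \
              etcd-0 etcd-1 etcd-2 bootstrap; do
    printf '%s.%s\n' "$host" "$cluster_domain"
  done
}

# Resolve each name with getent (the host's normal resolver) and print the
# first address, or MISSING. Returns 1 if any name fails to resolve.
check_okd_dns() {
  local cluster_domain="$1" fqdn addr rc=0
  while read -r fqdn; do
    addr=$(getent hosts "$fqdn" | awk '{ print $1; exit }')
    if [[ -n "$addr" ]]; then
      printf '%-45s %s\n' "$fqdn" "$addr"
    else
      printf '%-45s MISSING\n' "$fqdn"
      rc=1
    fi
  done < <(okd_dns_names "$cluster_domain")
  return "$rc"
}
```

Run it as check_okd_dns okd.internal.pri from the Service node. A MISSING line points at a record that still needs to be added.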

DHCP

When the nodes boot, they get their IP addresses from the DHCP server. Each node needs a static DHCP mapping from its MAC address to its IP address, and the reserved IPs need to be outside the DHCP server’s dynamic range.

Note that I used the wrong MAC addresses for the last two worker nodes, and they came up with IPs from the dynamic DHCP range. So make sure you have the correct MAC addresses: apparently VMware uses both the 00:50:56:b3 and 00:50:56:91 prefixes, so watch for the b3 versus 91 difference. Just a note of caution.
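Since a single mistyped MAC is enough to drop a node into the dynamic range, a quick format check before entering the static mappings can help. This is a minimal sketch with hypothetical helper names; the two VMware prefixes are the ones observed in this environment, not a complete list of VMware’s assigned ranges.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking MACs before adding DHCP mappings.

# True if $1 is six colon-separated pairs of hex digits.
is_valid_mac() {
  local re='^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$'
  [[ "$1" =~ $re ]]
}

# True if $1 is valid and uses one of the two VMware prefixes seen in this
# environment (00:50:56:b3 and 00:50:56:91). ${1,,} needs bash 4+.
is_expected_vmware_mac() {
  is_valid_mac "$1" || return 1
  local mac="${1,,}"
  [[ "$mac" == 00:50:56:b3:* || "$mac" == 00:50:56:91:* ]]
}
```

For example, is_expected_vmware_mac 00:50:56:b3:a3:7e succeeds, while an address with a period in place of a colon fails is_valid_mac.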

Fedora CoreOS

We’ll be downloading two images. You’ll need the ISO image so you can boot to it to start the installation and you’ll need the raw image for the actual component installation.

On the Fedora CoreOS download site, click the Bare Metal & Virtualized tab, then download the ISO image plus its signature and SHA256 checksum. In addition, download the raw image plus its signature and SHA256 checksum.
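Checking the downloads against the published SHA256 values catches a corrupt or truncated image before it ends up on a node. A minimal sketch, assuming GNU coreutils sha256sum and that you paste in the checksum shown on the download page; verify_sha256 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA256 digest with the expected value
# (case-insensitive). Prints OK on a match, returns 1 on a mismatch.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  if [[ "${actual,,}" == "${expected,,}" ]]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

For example: verify_sha256 fedora-coreos-33.20210426.3.0-metal.x86_64.raw.xz [sha256 from the download page]. The signature can be checked separately with gpg --verify [file].sig [file] once Fedora’s signing keys are in your keyring.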

Prepare Nodes

Again, I’m on a VMware environment. To prepare for the cluster build, create 7 blank VMs using the configuration as noted in the Virtual Machines section. In the VM Options tab under Boot Options, check the Force BIOS Setup checkbox. This lets you attach the CoreOS image to the server so you can install CoreOS.

Load Balancer

You need to install haproxy on the HAProxy node and configure it to be a Load Balancer for the cluster.

# yum install -y haproxy

For the global and defaults sections, increase the maxconn line to 20000.

Delete everything after the end of the defaults section.

For statistics, add:

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /

For the API Server (port 6443), add the following front end and back end sections:

frontend okd4_k8s_api_fe
    bind :6443
    default_backend okd4_k8s_api_be
    mode tcp
    option tcplog

backend okd4_k8s_api_be
    balance source
    mode tcp
    server      bldr0cuomokdboot 192.168.101.107:6443 check
    server      bldr0cuomokdmst1 192.168.101.101:6443 check
    server      bldr0cuomokdmst2 192.168.101.102:6443 check
    server      bldr0cuomokdmst3 192.168.101.103:6443 check

The Machine Config Server (port 22623) supplies the node configurations while the servers are being built, so the following section is needed as well:

frontend okd4_machine_config_server_fe
    bind :22623
    default_backend okd4_machine_config_server_be
    mode tcp
    option tcplog

backend okd4_machine_config_server_be
    balance source
    mode tcp
    server      bldr0cuomokdboot 192.168.101.107:22623 check
    server      bldr0cuomokdmst1 192.168.101.101:22623 check
    server      bldr0cuomokdmst2 192.168.101.102:22623 check
    server      bldr0cuomokdmst3 192.168.101.103:22623 check

For port 80 traffic, if any, the following section is needed:

frontend okd4_http_ingress_traffic_fe
    bind :80
    default_backend okd4_http_ingress_traffic_be
    mode tcp
    option tcplog

backend okd4_http_ingress_traffic_be
    balance source
    mode tcp
    server      bldr0cuomokdwrk1 192.168.101.104:80 check
    server      bldr0cuomokdwrk2 192.168.101.105:80 check
    server      bldr0cuomokdwrk3 192.168.101.106:80 check

And for port 443 traffic, the following section is needed:

frontend okd4_https_ingress_traffic_fe
    bind *:443
    default_backend okd4_https_ingress_traffic_be
    mode tcp
    option tcplog

backend okd4_https_ingress_traffic_be
    balance source
    mode tcp
    server      bldr0cuomokdwrk1 192.168.101.104:443 check
    server      bldr0cuomokdwrk2 192.168.101.105:443 check
    server      bldr0cuomokdwrk3 192.168.101.106:443 check
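With seven workers in the table, the ingress backends grow quickly, and hand-typed server lines are easy to get wrong. A minimal sketch of a hypothetical helper that prints backend server lines in the same layout as above from "name ip" pairs. It only prints text, so you still paste the result into /etc/haproxy/haproxy.cfg yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name ip" pairs on stdin and print HAProxy
# "server" lines for the given port, in the same layout as the config above.
haproxy_server_lines() {
  local port="$1" name ip
  while read -r name ip; do
    # Skip blank lines and lines starting with '#'.
    [[ -z "$name" || "$name" == \#* ]] && continue
    printf '    server      %s %s:%s check\n' "$name" "$ip" "$port"
  done
}
```

For example, printf '%s\n' 'bldr0cuomokdwrk1 192.168.101.104' 'bldr0cuomokdwrk2 192.168.101.105' | haproxy_server_lines 443. Before starting or restarting haproxy, haproxy -c -f /etc/haproxy/haproxy.cfg checks the configuration for errors.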

Once edited, enable and start haproxy.

systemctl enable haproxy
systemctl start haproxy

Web Server

In order for the OpenShift nodes to retrieve the image and ignition files, you’ll need to install a web server on the Service node and configure it to listen on port 8080.

dnf install -y httpd
sed -i 's/Listen 80/Listen 8080/' /etc/httpd/conf/httpd.conf

Once done, enable and start the server.

systemctl enable httpd
systemctl start httpd

OpenShift Binaries

You need the oc binary plus the openshift-install binary on your Service node.

OpenShift Downloads

While not necessary for the installation, you should retrieve the kubectl binary as well.

OpenShift install-config.yaml File

The following file is used for building the cluster.

apiVersion: v1
baseDomain: [domain]                     # update the domain info
metadata:
  name: [sub-domain]                     # update the sub-domain info

compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0

controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14                   # Verify your network
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16                         # Verify the service network.

platform:
  none: {}

fips: false

pullSecret: '{"auths": ...}'               # add the pullSecret
sshKey: 'ssh-ed25519 AAAA...'              # add your ssh public key

OpenShift Pull Secret

In the configuration file, replace the pullSecret value with the pull secret from the download link below.

OpenShift Pull Secret Download

SSH Key

To be able to access the nodes once they’re built, you’ll need to provide the public SSH key (the sshKey line in install-config.yaml) of the account that will log in to them. If you don’t already have a key pair, generate one.

ssh-keygen -t rsa

To log in to the servers, ssh to them as core@[servername].
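If you rebuild the cluster more than once, filling in install-config.yaml by hand each time gets tedious. This is a minimal sketch of a hypothetical helper that writes the same file as above with the placeholders substituted, assuming the pull secret and SSH public key are saved in local files; the file names in the usage line are only examples. With baseDomain internal.pri and a cluster name of okd, the cluster domain becomes okd.internal.pri, matching the DNS section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write install-config.yaml with the placeholders filled.
# Args: base domain, cluster name, pull secret file, SSH public key file, output.
write_install_config() {
  local base_domain="$1" cluster_name="$2" pull_secret_file="$3"
  local ssh_key_file="$4" out="$5" pull_secret ssh_key
  # Both values must be single-line strings inside the YAML quotes.
  pull_secret=$(tr -d '\n' < "$pull_secret_file")
  ssh_key=$(tr -d '\n' < "$ssh_key_file")
  cat > "$out" <<EOF
apiVersion: v1
baseDomain: ${base_domain}
metadata:
  name: ${cluster_name}
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '${pull_secret}'
sshKey: '${ssh_key}'
EOF
}
```

For example: write_install_config internal.pri okd pull-secret.txt ~/.ssh/id_rsa.pub install-config.yaml. Keep this generated copy outside install_dir, since the installer consumes the copy it reads.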

Perform The Installation

Create an install_dir directory and copy the install-config.yaml file into it. Keep the original somewhere safe, as the installer consumes (deletes) the copy in install_dir. Then run the installer.

openshift-install create manifests --dir=install_dir
INFO Consuming Install Config from target directory
INFO Manifests created in: install_dir/manifests and install_dir/openshift

Rename install_dir to manifests. Then create a new install_dir and copy install-config.yaml into it again, since the next step consumes it as well.

Run Installer Again

This time you’re creating the ignition configs. Note that the ignition files contain a certificate that expires in 24 hours. If you need to rebuild the cluster or add nodes after the 24 hours have passed, you’ll need a new certificate. See the Adding Workers section at the end of this document.

$ openshift-install create ignition-configs --dir=install_dir/
INFO Consuming Install Config from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings
INFO Ignition-Configs created in: install_dir and install_dir/auth

Install Configuration

In the /var/www/html directory, create the okd4 directory, copy in the new configuration files, move the Fedora CoreOS raw image and its signature into it under shorter names (which are easier to type on the console), and set the permissions.

mkdir /var/www/html/okd4
cp -r manifests/* /var/www/html/okd4/
cp -r install_dir/* /var/www/html/okd4/
mv fedora-coreos-33.20210426.3.0-metal.x86_64.raw.xz /var/www/html/okd4/fcos.raw.xz
mv fedora-coreos-33.20210426.3.0-metal.x86_64.raw.xz.sig /var/www/html/okd4/fcos.raw.xz.sig 
chown -R apache: /var/www/html
chmod -R 755 /var/www/html
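Before booting the first node, you can confirm that the web server actually serves every file the kernel parameters will reference. A minimal sketch, assuming curl is installed and the files were named as above; check_okd_web_files is a hypothetical helper. It prints the HTTP status for each file and returns non-zero if any is not 200.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<status> <file>" for each file the nodes fetch.
# curl prints 000 as the status if it cannot connect at all.
check_okd_web_files() {
  local base_url="$1" file code rc=0
  for file in fcos.raw.xz fcos.raw.xz.sig bootstrap.ign master.ign worker.ign; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "${base_url}/${file}")
    printf '%s %s\n' "$code" "$file"
    [[ "$code" == 200 ]] || rc=1
  done
  return "$rc"
}
```

Run it as check_okd_web_files http://192.168.101.100:8080/okd4, ideally from another machine on the 192.168.101.0/24 network so that firewall problems show up too.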

Build The bootstrap Server

Boot the VM from the Fedora CoreOS Live ISO, press Tab at the boot menu to edit the kernel command line, and append the following parameters (on the same line, separated by spaces):

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.101.100:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.101.100:8080/okd4/bootstrap.ign

Now The Control Plane

Start the master servers, press Tab, and append the following parameters.

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.101.100:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.101.100:8080/okd4/master.ign

It can take 10 or 15 minutes for the masters to register.

Now The Compute Nodes

Same here: start the workers, press Tab, and append the following parameters.

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.101.100:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.101.100:8080/okd4/worker.ign

It can take quite some time for the workers to register, up to 30 minutes.
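The three parameter sets differ only in the ignition file name, and a single mistyped character in the URL (such as a slash in place of a dot in the IP) is enough to stall a node. This is a minimal sketch of a hypothetical helper that prints the parameters for a role so you can check them before typing them at the console; the default base URL is the Service node address used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the coreos.inst.* kernel arguments for a role.
# Roles match the ignition file names: bootstrap, master, worker.
okd_kargs() {
  local role="$1" base_url="${2:-http://192.168.101.100:8080/okd4}"
  case "$role" in
    bootstrap|master|worker) ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  printf '%s %s %s\n' \
    "coreos.inst.install_dev=/dev/sda" \
    "coreos.inst.image_url=${base_url}/fcos.raw.xz" \
    "coreos.inst.ignition_url=${base_url}/${role}.ign"
}
```

For example, for r in bootstrap master worker; do okd_kargs "$r"; done prints all three lines for a side-by-side check.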

Update HAProxy

When all nodes have been bootstrapped, you need to remove the bootstrap server entries from the HAProxy configuration (the bldr0cuomokdboot lines in the 6443 and 22623 backends). Just comment them out and restart haproxy.
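Commenting out the bootstrap lines can also be done with sed. A minimal sketch, assuming GNU sed (for -i and -E) and the bldr0cuomokdboot hostname used in the backends above; comment_out_bootstrap is a hypothetical helper. It keeps a .bak copy and skips lines that are already commented, so running it twice is harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix "#" to every uncommented "server <host> ..." line.
# Args: config file (default /etc/haproxy/haproxy.cfg), host (default bldr0cuomokdboot).
comment_out_bootstrap() {
  local cfg="${1:-/etc/haproxy/haproxy.cfg}" host="${2:-bldr0cuomokdboot}"
  cp -p "$cfg" "${cfg}.bak"
  # Only lines that start with optional whitespace then "server <host>" match,
  # so lines that already begin with "#" are left alone.
  sed -i -E "s/^([[:space:]]*server[[:space:]]+${host}[[:space:]])/#\1/" "$cfg"
}
```

Then check and restart: comment_out_bootstrap && haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl restart haproxy.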

Certificate Signing Requests

Once the workers have booted and contacted the cluster, you’ll need to approve their certificate signing requests (CSRs) so they can join and start running pods. Get the jq tool so you can approve a batch of CSRs in one command.

wget -O jq https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
chmod +x jq
sudo mv jq /usr/local/bin/
jq --version

Once you have jq installed, check for pending CSRs. At the start, there will be a lot of them. For new nodes, it can take several minutes for CSRs to appear, as new nodes tend to upgrade CoreOS during the bootstrap process. Once things settle, new CSRs should show as Pending.

oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-6n8c6   91s     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-85mmn   31m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8tn26   6m38s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-lgxlv   16m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

Once you see the pending CSRs, run this command to approve all of them:

oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs -r oc adm certificate approve
certificatesigningrequest.certificates.k8s.io/csr-6n8c6 approved
certificatesigningrequest.certificates.k8s.io/csr-85mmn approved
certificatesigningrequest.certificates.k8s.io/csr-8tn26 approved
certificatesigningrequest.certificates.k8s.io/csr-lgxlv approved

For some things, such as new worker nodes, it takes two passes to approve all the CSRs: once the first (client) CSRs are approved, each node submits a second (serving) CSR.
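Since new workers usually need a second pass, the approval can be wrapped in a small loop. This is a minimal sketch with a hypothetical function name. Instead of jq it uses an oc go-template to list CSRs with no status (that is, pending), and it waits between passes to give nodes time to submit their next CSRs. The default of two passes and 60 seconds is an assumption, not a measured value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approve all pending CSRs, repeating for several passes.
# Args: number of passes (default 2), seconds to wait between passes (default 60).
approve_pending_csrs() {
  local passes="${1:-2}" wait_secs="${2:-60}" pass
  local -a names
  for ((pass = 1; pass <= passes; pass++)); do
    # CSRs that have no status yet are the pending ones.
    mapfile -t names < <(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}')
    if ((${#names[@]} > 0)); then
      oc adm certificate approve "${names[@]}"
    else
      echo "pass ${pass}: no pending CSRs"
    fi
    # Wait for the next round of CSRs, except after the last pass.
    if ((pass < passes)); then
      sleep "$wait_secs"
    fi
  done
}
```

For example, approve_pending_csrs 2 120 runs two passes two minutes apart. Check the result afterwards with oc get csr and oc get nodes.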

Console Access

Finally, check the status of the cluster operators, specifically the console. Once it’s up and running, you can get the kubeadmin password from the install_dir/auth/kubeadmin-password file. Log in to the console as kubeadmin and you’re in!

$ oc get clusteroperators
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.okd-2021-04-24-103438   True        False         False      13h
baremetal                                  4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
cloud-credential                           4.7.0-0.okd-2021-04-24-103438   True        False         False      5d16h
cluster-autoscaler                         4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
config-operator                            4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
console                                    4.7.0-0.okd-2021-04-24-103438   True        False         False      4d23h
csi-snapshot-controller                    4.7.0-0.okd-2021-04-24-103438   True        False         False      4d23h
dns                                        4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
etcd                                       4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
image-registry                             4.7.0-0.okd-2021-04-24-103438   True        False         True       2d17h
ingress                                    4.7.0-0.okd-2021-04-24-103438   True        False         True       5d14h
insights                                   4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
kube-apiserver                             4.7.0-0.okd-2021-04-24-103438   True        False         False      5d13h
kube-controller-manager                    4.7.0-0.okd-2021-04-24-103438   True        False         False      5d13h
kube-scheduler                             4.7.0-0.okd-2021-04-24-103438   True        False         False      5d14h
kube-storage-version-migrator              4.7.0-0.okd-2021-04-24-103438   True        False         False      13h
machine-api                                4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
machine-approver                           4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
machine-config                             4.7.0-0.okd-2021-04-24-103438   True        False         False      46h
marketplace                                4.7.0-0.okd-2021-04-24-103438   True        False         False      4d23h
monitoring                                 4.7.0-0.okd-2021-04-24-103438   True        False         False      45h
network                                    4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
node-tuning                                4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
openshift-apiserver                        4.7.0-0.okd-2021-04-24-103438   True        False         False      13h
openshift-controller-manager               4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
openshift-samples                          4.7.0-0.okd-2021-04-24-103438   True        False         False      5d13h
operator-lifecycle-manager                 4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
operator-lifecycle-manager-catalog         4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
operator-lifecycle-manager-packageserver   4.7.0-0.okd-2021-04-24-103438   True        False         False      13h
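Scanning that table by eye is error-prone; in the listing above, for example, image-registry and ingress both report DEGRADED as True. A minimal sketch of a hypothetical filter that prints the operators that are not Available or are Degraded, assuming the default column layout shown above (a row with an empty VERSION column would shift the fields and confuse it).

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "oc get clusteroperators" output on stdin and
# print each operator whose AVAILABLE column ($3) is not True or whose
# DEGRADED column ($5) is True. Assumes the default columns:
# NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 == "True") { print $1 }'
}
```

Pipe the live output into it: oc get clusteroperators | unhealthy_operators. Separately, oc whoami --show-console prints the console URL.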

Persistent Storage

Next up is to create some persistent storage.

NFS Server

In order to set up a persistent volume, you’ll need to configure an NFS export on an accessible server.

On the Service server:

dnf install -y nfs-utils
systemctl enable nfs-server rpcbind
systemctl start nfs-server rpcbind
mkdir -p /var/nfsshare/registry
chmod -R 777 /var/nfsshare
chown -R nobody:nobody /var/nfsshare

Then set up the share.

echo '/var/nfsshare 192.168.101.0/24(rw,sync,no_root_squash,no_all_squash,no_wdelay)' > /etc/exports

Assuming SELinux and firewalld are running, you’ll need to make the following changes. Skip the setsebool and firewall-cmd lines if either one is not enabled.

sudo setsebool -P nfs_export_all_rw 1
sudo systemctl restart nfs-server
sudo firewall-cmd --permanent --zone=public --add-service mountd
sudo firewall-cmd --permanent --zone=public --add-service rpc-bind
sudo firewall-cmd --permanent --zone=public --add-service nfs
sudo firewall-cmd --reload
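The echo command above replaces /etc/exports entirely, which is fine on a dedicated Service node but would remove any existing exports elsewhere. A minimal sketch of a hypothetical helper that appends the export line only if it isn’t already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an export line only if an identical line is not
# already in the file. Args: export line, exports file (default /etc/exports).
ensure_export() {
  local line="$1" file="${2:-/etc/exports}"
  touch "$file"
  if grep -qxF -- "$line" "$file"; then
    echo "already present: $line"
  else
    printf '%s\n' "$line" >> "$file"
    echo "added: $line"
  fi
}
```

For example, ensure_export '/var/nfsshare 192.168.101.0/24(rw,sync,no_root_squash,no_all_squash,no_wdelay)', followed by exportfs -ra to re-export and showmount -e 192.168.101.100 to confirm the share is visible.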

Image Registry

Apply the following registry file to the cluster. Make sure the server IP is accurate.

$ cat registry_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /var/nfsshare/registry
    server: 192.168.101.100

Then apply it:

$ oc apply -f registry_pv.yaml

By default on this kind of bare metal install, persistent storage for the image registry isn’t configured, so the operator sets the registry’s managementState to ‘Removed’. You’ll need to edit the configuration to tell OpenShift that persistent storage is available.

$ oc edit configs.imageregistry.operator.openshift.io

Update the following settings.

managementState: Removed

  storage: {}

Change to:

managementState: Managed

  storage:
    pvc:
      claim:
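The same change can be made without an interactive editor. This is a minimal sketch using oc patch with a merge patch; enable_registry_storage is a hypothetical wrapper name. Leaving claim empty asks the operator to create its own PVC, which should then bind to the 100Gi ReadWriteMany registry-pv created above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: set the image registry to Managed and request PVC
# storage with an empty claim name, so the operator creates the PVC itself.
enable_registry_storage() {
  oc patch configs.imageregistry.operator.openshift.io cluster --type merge \
    --patch '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":""}}}}'
}
```

Afterwards, oc get pvc -n openshift-image-registry should show the claim as Bound.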

Create Accounts

To create accounts for users to access the cluster, you create an htpasswd file and submit it to the cluster as a secret. You’ll also need to create a rolebinding or clusterrolebinding to provide permissions.

htpasswd

Simply create a file that contains your username and password for accessing the OKD cluster:

htpasswd -c -B -b htpasswd [username] [password]

-c = Create a new file. This overwrites an existing file, so use caution.
-B = Use bcrypt, the most secure algorithm htpasswd supports.
-b = Accept the username and password on the command line.

Create Secret

Next create the htpass-secret secret.

oc create secret generic htpass-secret --from-file=htpasswd=htpasswd -n openshift-config

Provider

You’ll need to configure an identity provider for the htpasswd credentials. Use the following file, which tells the cluster’s OAuth configuration to read credentials from the htpass-secret secret, and apply it with oc apply -f [filename].

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret

Bindings

If you log in to the console as yourself now, you can get in but have no permission to see or do anything. You’ll need to bind your account to a set of RBAC permissions. The following grants the cluster-admin role:

oc adm policy add-cluster-role-to-user cluster-admin [username]

And now I have access.

Adding Workers

When the ignition files are created, they contain a certificate that expires 24 hours later. Within that window, you can add multiple worker nodes. After the certificate expires, you’ll need to get a new certificate from the cluster. That certificate expires in 10 years, so you can keep adding worker nodes from then on.

Things to remember:

Add the bootstrap servers back into the haproxy server configuration and restart.
Also add the new workers to the worker section in the haproxy configuration.
Create the DNS entries.
Update the DHCP configuration for the new Workers.

Within The Window

You can simply add the new worker nodes by following the above instructions: create a blank VM, start it with the Fedora CoreOS Live ISO, and enter the kernel parameters to bootstrap the worker. The same worker.ign file is valid for any number of worker nodes.

Window Has Expired

You need to extract the new certificate from the cluster and add it to the worker.ign file. Make sure you back up the current worker.ign file, just in case.

Extract certificate:

openssl s_client -connect api-int.okd.internal.pri:22623 -showcerts </dev/null 2>/dev/null|openssl x509 -outform PEM > api-int.pem

This creates the api-int.pem file, which now needs to be converted into a base64 string. The --wrap=0 parameter turns the block into a single line.

base64 --wrap=0 ./api-int.pem 1> ./api.int.base64

Now back up the worker.ign file.

cp worker.ign worker.ign.backup

And replace the current certificate with the new one located in the api.int.base64 file.

{"ignition":{"config":{"merge":[{"source":"https://api-int.okd.internal.pri:22623/config/worker"}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,[ADD CERTIFICATE HERE]"}]}},"version":"3.2.0"}}
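Editing the long single-line JSON by hand is fiddly. As an alternative, this minimal sketch rewrites worker.ign from scratch using the structure shown above; write_worker_ign is a hypothetical helper, and it assumes GNU base64 for --wrap=0. It backs up any existing file to worker.ign.backup first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write worker.ign with the given PEM embedded as the CA.
# Args: PEM file, merge source URL, output file.
write_worker_ign() {
  local pem="$1" source_url="$2" out="$3" b64
  b64=$(base64 --wrap=0 "$pem")
  # Keep a copy of the current file, as recommended above.
  [[ -f "$out" ]] && cp -p "$out" "${out}.backup"
  printf '{"ignition":{"config":{"merge":[{"source":"%s"}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,%s"}]}},"version":"3.2.0"}}\n' \
    "$source_url" "$b64" > "$out"
}
```

For example: write_worker_ign api-int.pem https://api-int.okd.internal.pri:22623/config/worker worker.ign. Since jq is already installed from the CSR step, jq . worker.ign is a quick way to confirm the result is valid JSON.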

And finally follow the process above to add a new worker to the cluster.

References

I used the following link as it was more focused on what I’m running than the Red Hat site, which has a ton of options to consider. I’ve built the cluster three times now, and with the third build I rewrote this article as instructions for my specific use case. If you have a similar environment, this article might be helpful. The following link lets you create a similar environment but firewalled away from your central homelab environment. Ultimately it made me a bit more skilled and better able to understand the more extensive Red Hat docs.

Guide Installing an OKD4.5 Cluster

The following links are the Red Hat docs for a Bare Metal deployment and copies of the links in the above article so you’re not hunting for the pullSecret link.