Convert From CentOS 8 to CentOS Stream

Overview

This article provides brief instructions on how to convert a CentOS 8 system to CentOS Stream.

Background

In December 2021, Red Hat retired the CentOS 8 AppStream, BaseOS, Extras, and other CentOS mirrors in favor of moving to a Stream model. In this model, CentOS Stream becomes part of the path from Fedora to Red Hat Enterprise Linux instead of a redistribution of Red Hat Enterprise Linux. There are alternatives if we want to continue with the old model, such as AlmaLinux and Rocky Linux.

Conversion Process

It’s a pretty simple process to make the conversion. If the conversion is done after December 2021, you’ll need to modify the Extras repo. Otherwise you can simply run the commands that follow this quick edit.

Modify Extras

In the /etc/yum.repos.d directory, edit the Extras repo file (named CentOS-Linux-Extras.repo or CentOS-Extras.repo depending on the point release; the commands below use CentOS-Extras.repo), comment out the mirrorlist entry, uncomment the baseurl entry, and point the URL to one of the mirror sites. In my case, since it’s only a small file that needs to be downloaded, I changed it to mirror.clarkson.edu, but any mirror will do.

cd /etc/yum.repos.d
# disable every repo, then re-enable just the Extras repo
sed -i "s/enabled=1/enabled=0/g" *
sed -i "s/enabled=0/enabled=1/g" CentOS-Extras.repo
# switch the Extras repo from the retired mirrorlist to a direct baseurl on a live mirror
sed -i "s/^mirrorlist/#mirrorlist/g" CentOS-Extras.repo
sed -i "s/^#baseurl/baseurl/g" CentOS-Extras.repo
sed -i "s/mirror.centos.org/mirror.clarkson.edu/g" CentOS-Extras.repo

Install The Stream

Next, install the centos-release-stream rpm.

dnf install centos-release-stream -y

Swap Repositories

Swap from the Linux to the Stream repositories

dnf swap centos-{linux,stream}-repos -y

Sync Distributions

This step updates or downgrades the installed packages, as appropriate, to match the Stream repositories.

dnf distro-sync -y
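
Once the sync completes, a quick sanity check is to confirm the release file and the enabled repos now reference Stream (the comment shows what I'd expect to see, not captured output):

cat /etc/centos-release
# expect: CentOS Stream release 8
dnf repolist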


Increase Ingress Routers

A problem I found with my OKD4 cluster is that the HAProxy statistics page was showing 5 of my 7 worker nodes as red, aka down. After some searching, I found HAProxy is reporting on the ingress router pods, and further checking of the cluster showed only two ingress routers were running. You would think this would be a daemonset so an ingress router would be available on every worker. Two may seem sufficient, but consider my setup: I have three physical hosts running the OKD4 cluster with 7 workers spread across them. If both router pods land on one host and that host fails, applications that use the ingress router will be unavailable until OKD4 realizes the pods are gone and spins up two more.

At first I figured it was the deployment that needed to be updated. However, updating the deployment replicas from 2 to 7 failed; the number of replicas simply reverted back to 2.

After some hunting, I found the solution. You actually have to patch the IngressController resource managed by the ingress operator, not the deployment.

oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 7}}' --type=merge 

And success. Now there are 7 ingress pods running on my cluster.
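
The listing below came from something along these lines (an all-namespaces query filtered to the default router pods):

oc get pods --all-namespaces | grep router-default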

openshift-ingress   router-default-6b8b455c59-56gk5   1/1     Running   0          16d
openshift-ingress   router-default-6b8b455c59-6z678   1/1     Running   0          16d
openshift-ingress   router-default-6b8b455c59-dhrgx   1/1     Running   0          16d
openshift-ingress   router-default-6b8b455c59-kgs5n   1/1     Running   0          16d
openshift-ingress   router-default-6b8b455c59-ngvdx   1/1     Running   2          16d
openshift-ingress   router-default-6b8b455c59-t8zmd   1/1     Running   0          16d
openshift-ingress   router-default-6b8b455c59-wbh2z   1/1     Running   0          16d

References

  • https://access.redhat.com/solutions/5393521 – You need a Red Hat account to access this page.
  • https://docs.openshift.com/container-platform/4.9/networking/ingress-operator.html#nw-ingress-controller-configuration_configuring-ingress – Openshift Documentation

Migrating KVM Guests

Overview

This article describes the process of migrating a virtual machine from one physical host to another.

Background

There are two methods by which the virtual machines were built on the current hosts. The old way is to create an LVM slice on the disk and lay a base image over the top of it using dd. The second, more common method is to create and store the image as a file on the host.

Guest Shutdown

For any of the non-OpenShift (OCP) systems, you have a couple of methods for shutting them down. You can log in to the server and shut it down.

ssh tato0cuomifnag02
sudo su -
shutdown -t 0 now -h 

Or use virsh console from the underlying host to log in and shut it down. (Reminder: the _domain and _pxe names are assignments created by the new automation process.)

virsh console tato0cuomifnag02
login: root
password:
shutdown -t 0 now -h  

Openshift/Kubernetes

An interesting difference between a Kubernetes control node and an OCP control node is the extra pods used to manage the OCP cluster: the oauth pods, registry pods, console pods, and others. This means that while a drain isn’t strictly necessary on a Kubernetes control node, you should drain an OCP control node so that any control pod such as oauth continues to be available to the cluster.

This is a concern though: if a control node fails for whatever reason, parts of the cluster may be unavailable until replacement pods are created. OCP should notice the loss of an important pod like oauth and start it up on a different master; I suspect it would happen eventually.

In any case, evict control and worker nodes from the cluster before migrating them.

$ oc adm drain bldr0cuomocpwrk02.dev.internal.pri --delete-emptydir-data --ignore-daemonsets --force
node/bldr0cuomocpwrk02.dev.internal.pri evicted
WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-marketplace/redhat-operators-8kqpc; ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-w5r84, openshift-dns/dns-default-th2ql, openshift-dns/node-resolver-vbw7d, openshift-image-registry/node-ca-j2nrk, openshift-ingress-canary/ingress-canary-d6l42, openshift-machine-config-operator/machine-config-daemon-z5hzf, openshift-monitoring/node-exporter-rqj52, openshift-multus/multus-additional-cni-plugins-h8vcd, openshift-multus/multus-mqg5z, openshift-multus/network-metrics-daemon-npcjh, openshift-network-diagnostics/network-check-target-lflxb, openshift-sdn/sdn-zgqrt
evicting pod openshift-monitoring/thanos-querier-7c8bb4cdbd-n97pv
evicting pod default/llamas-6-p84z2
evicting pod default/inventory-4-szhvw
evicting pod default/photo-manager-4-cqqbc
evicting pod openshift-marketplace/redhat-operators-8kqpc
evicting pod openshift-monitoring/alertmanager-main-1
evicting pod openshift-monitoring/prometheus-adapter-66ff97555b-x92r2
pod/redhat-operators-8kqpc evicted
pod/inventory-4-szhvw evicted
pod/alertmanager-main-1 evicted
pod/llamas-6-p84z2 evicted
pod/photo-manager-4-cqqbc evicted
pod/thanos-querier-7c8bb4cdbd-n97pv evicted
pod/prometheus-adapter-66ff97555b-x92r2 evicted
node/bldr0cuomocpwrk02.dev.internal.pri evicted
$ oc get nodes
NAME                                 STATUS                     ROLES    AGE   VERSION
bldr0cuomocpctl01.dev.internal.pri   Ready                      master   13d   v1.22.3+e790d7f
bldr0cuomocpctl02.dev.internal.pri   Ready                      master   13d   v1.22.3+e790d7f
bldr0cuomocpctl03.dev.internal.pri   Ready                      master   13d   v1.22.3+e790d7f
bldr0cuomocpwrk01.dev.internal.pri   Ready                      worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk02.dev.internal.pri   Ready,SchedulingDisabled   worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk03.dev.internal.pri   Ready                      worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk04.dev.internal.pri   Ready                      worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk05.dev.internal.pri   Ready,SchedulingDisabled   worker   13d   v1.22.3+e790d7f

The delete-emptydir-data option is needed when a pod is using emptyDir storage. Moving a pod that uses emptyDir deletes any data in that emptyDir location.

The ignore-daemonsets option is needed because DaemonSet-managed pods run on every node and can’t be evicted. You’re acknowledging that, yes, you know there are DaemonSet pods on the node and it’s fine to cordon it anyway.

The force option allows the eviction of pods that aren’t managed by a controller (like the redhat-operators pod in the warning above), which would otherwise block the drain.

Once the node is drained, log in to each OCP/K8S server you’re migrating and shut it down.

ssh tato0cuomocpbld01
sudo su -
cd /home/ocp4
ssh -i id_rsa core@tato0cuomocpctl01
sudo su -
shutdown -t 0 now -h 

Migrate LVM Guests

This section details migrating an LVM-built guest.

First identify the guests on the host so you know which ones to migrate, for example ahead of an upcoming event where the physical hosts are being moved to a different data center.

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     tato0cuomifnag01               running
 4     tato0cuomifnag02               running

For this example, we’ll migrate tato0cuomifnag02. You’ll need to know the device path in order to get the LVM information.

# ls -la /dev/pool2
total 0
drwxr-xr-x.  2 root root  200 Feb  7 01:40 .
drwxr-xr-x. 23 root root 4180 Feb  7 02:07 ..
lrwxrwxrwx.  1 root root    8 Feb  7 01:40 tato0cuomifnag01 -> ../dm-44
lrwxrwxrwx.  1 root root    8 Feb  7 01:06 tato0cuomifnag02 -> ../dm-45

Now you can run lvdisplay to get the size of the image. The value you want is the Current LE value.

# lvdisplay /dev/pool2/tato0cuomifnag02
  --- Logical volume ---
  LV Path                /dev/pool2/tato0cuomifnag02
  LV Name                tato0cuomifnag02
  VG Name                pool2
  LV UUID                MFBxt1-8yFR-EOd4-TVZD-nQlh-RUIu-GweC8c
  LV Write Access        read/write
  LV Creation host, time tato0cuomifnag02, 2018-01-30 15:02:24 -0600
  LV Status              available
  # open                 1
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:46

Create a new logical volume of the same size on the destination server (the same number of extents, assuming the destination volume group uses the same extent size).

lvcreate -l5120 -ntato0cuomifnag02 vg00

Run the following command to migrate the image. Obviously you need to be able to ssh as root to the destination server.

dd if=/dev/pool2/tato0cuomifnag02 | pv | ssh -C root@destination dd of=/dev/vg00/tato0cuomifnag02

The nice thing is that -C compresses the data in transit, and since it runs over ssh the copy is encrypted.
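
If you want to confirm the copy, one option is to checksum both block devices afterward (this reads each device end to end, so it takes a while, and the sums only match if the source and destination volumes are exactly the same size):

sha256sum /dev/pool2/tato0cuomifnag02
ssh root@destination "sha256sum /dev/vg00/tato0cuomifnag02"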

Migrate Images

This section covers migrating a file-backed guest and starting it up on the other host.

When you shut down the guest, libvirt considers the guest stopped, but you’ll also need to stop the storage pool.

virsh pool-destroy tato0cuomifnag01_pool

Now that both the guest and the storage pool have been stopped, copy the images from the /opt/libvirt_images/tato0cuomifnag01_pool directory to the destination server. Use the /opt/libvirt_images directory as the initial target since it has sufficient space for larger images such as the katello server.

scp commoninit.iso [yourusername]@nikodemus:/opt/libvirt_images/
scp tato0cuomifnag01_amd64.qcow2 [yourusername]@nikodemus:/opt/libvirt_images/

On the destination server, create the pool directory and move the images into the /opt/libvirt_images/tato0cuomifnag01_pool/ directory. You’ll need to set ownership and permissions as well.

mkdir /opt/libvirt_images/tato0cuomifnag01_pool
cd /opt/libvirt_images
mv commoninit.iso tato0cuomifnag01_pool/
mv tato0cuomifnag01_amd64.qcow2 tato0cuomifnag01_pool/
# reset ownership and permissions on the pool directory
chown -R root:root tato0cuomifnag01_pool
find tato0cuomifnag01_pool -type f -exec chmod 644 {} \;

Extract Definitions

Once the images have been copied to the destination host, you’ll need to extract the domain definitions and, for the guests that are images, the storage pool definitions.

Extract the guest definition.

virsh dumpxml tato0cuomifnag01_domain > ~/tato0cuomifnag01.xml

For the guests that are images (the new automation process), extract the storage pool definition.

virsh pool-dumpxml tato0cuomifnag01_pool > ~/tato0cuomifnag01_pool.xml

Copy Definitions

Once you have the definitions, copy the xml files to the destination server.

scp tato0cuomifnag01.xml [yourusername]@nikodemus:/var/tmp
scp tato0cuomifnag01_pool.xml [yourusername]@nikodemus:/var/tmp

Import Definitions

Log into the destination server and import the domain definition. An LVM-based guest may require editing the xml file if the source LVM slice differs from the destination LVM slice.

virsh define /var/tmp/tato0cuomifnag01.xml

For the image based guests, import the storage pool definition as well.

virsh pool-define /var/tmp/tato0cuomifnag01_pool.xml

Activate Guests

For the image-based guests, activate the storage pool first; the guest won’t start if the storage pool hasn’t been started. Also configure the pool to start automatically when the underlying host boots.

virsh pool-start tato0cuomifnag01_pool
virsh pool-autostart tato0cuomifnag01_pool

Then start the guest.

virsh start tato0cuomifnag01_domain

Openshift/Kubernetes

Rejoin the migrated node to the cluster.

$ oc adm uncordon bldr0cuomocpwrk02.dev.internal.pri
node/bldr0cuomocpwrk02.dev.internal.pri uncordoned

Then check the cluster status to see that the migrated node is up and Ready.

$ oc get nodes
NAME                                 STATUS  ROLES    AGE   VERSION
bldr0cuomocpctl01.dev.internal.pri   Ready   master   13d   v1.22.3+e790d7f
bldr0cuomocpctl02.dev.internal.pri   Ready   master   13d   v1.22.3+e790d7f
bldr0cuomocpctl03.dev.internal.pri   Ready   master   13d   v1.22.3+e790d7f
bldr0cuomocpwrk01.dev.internal.pri   Ready   worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk02.dev.internal.pri   Ready   worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk03.dev.internal.pri   Ready   worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk04.dev.internal.pri   Ready   worker   13d   v1.22.3+e790d7f
bldr0cuomocpwrk05.dev.internal.pri   Ready   worker   13d   v1.22.3+e790d7f

Cleanup

Finally remove the xml files.

rm /var/tmp/tato0cuomifnag01.xml
rm /var/tmp/tato0cuomifnag01_pool.xml

Recovery

The Recovery process is very similar. In the event the physical host was replaced, we’ll need to migrate all the guests back over to the replacement host.

In order to determine what guests belong on the replaced host, check the installation repositories. Both the terraform and pxeboot repositories are complete installs on all physical hosts for the site. The directory structure is based on the hostname of the physical host. Simply log in to the current hosts, navigate to the repo’s site/hostname directory for the replaced host, and determine which guests need to be migrated back to the replaced host.

Once that’s determined, follow the above process to migrate the guests back to the replaced host.

Removal

After all the guests have been migrated back to the replaced host, you’ll need to remove the guests from the holding physical hosts.

virsh undefine [guest]
virsh pool-undefine [guest]_pool
rm -rf /opt/libvirt_images/[guest]_pool

For LVM-based guests, you’ll need to use the lvremove command to remove the logical volume.
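
A minimal example, assuming the logical volume was created in vg00 as in the earlier lvcreate step:

lvremove /dev/vg00/[guest]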

Troubleshooting

Some information that’s helpful during the work.

If you accidentally pool-destroy (stop) the wrong pool, the guest doesn’t stop working. The command simply marks the storage pool as inactive; it doesn’t actually shut down the storage, and as long as the guest is running the storage remains available to it. If you stop the guest and try to start it again while the storage pool is inactive, however, the guest will not start. To recover, run pool-start for the storage pool and it’s active again.
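
To check which storage pools are active at any point, list them all:

virsh pool-list --all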

References

  • virt-backup.pl – an alternative Perl script for migrating LVM-backed guests.
  • https://docs.openshift.com/container-platform/4.9/nodes/nodes/nodes-nodes-working.html

Computers And Me

Over the past 12 months, I’ve deep dived into automation. I’d been investigating it for some time prior to that, but this effort was work related. It involved research into using Terraform to automatically build virtual machines and Ansible to configure them. I’ve used Ansible in the past, but this again was a deep dive. Due to the method of building an OpenShift Container Platform (OCP) cluster, I also used tftp and pxe to build the OCP cluster automatically.

As a result, I built 92 virtual machines, including three OCP clusters, in 2 hours.

For perspective, a relatively recent project where I built 100 virtual machines using a more manual process took 18 months.

To be clear, it took 12 months and a ton of experience building machines manually to get to the point where I could build 92 machines in 120 minutes. But in that time I built the systems over and over again as I tested methods and broke environments. It also means I can now rebuild a system, several systems, or even a complete site in a very short period of time. Minutes instead of days.

I’ve been building systems for over 40 years now. From local area networks, personal gaming systems, systems for my clients, to various flavors of Unix and Linux, to cloud based systems such as Amazon and Google cloud services. I also have quite a few programming projects from back when I started all the way to present day. It’s great fun and keeps me on my toes.

My current home environment is pretty extensive. I use it as a lab where I can try things, break them, and try again. I’m running both a VMware vCenter cluster and a standalone Ubuntu server acting as a KVM host. Over 100 TB of storage, a TB of memory, and 144 CPU cores. I have several Kubernetes environments consisting of docker servers, docker repositories, Kubernetes clusters, Elastic Stack clusters, and tools like GitLab and Jenkins. I’m currently researching GitOps tools such as ArgoCD and Flux. I also have quite a few underlying infrastructure and development servers. In total, about 150 servers.

All this has helped me explore and gain experience in development practices and the current work I’m doing with automation and working with developers has increased my knowledge and skills. I look forward to continuing this path and exploring new technologies.


Recognition

I’m a computer geek. I’ve worked as a programmer, local area network installer, Unix admin, and now a DevOps engineer. Over the past 40 years, it’s been understood that folks who work in the computer industry don’t get a lot of recognition for the work we do.

In many places we can get some sort of peer recognition. A tchotchke like a hat, a t-shirt, or other small company branded toy. I’ve received a few over the years. Some are better than others. I have a nice gym bag and a couple of fleece blankets the cats like to sleep on. Occasionally we even get a nice letter from a customer.

But the larger rewards are reserved for customer-facing teams such as sales or customer service. This is understandable; these positions are the face of the company, they’re what customers “see”. Other positions in the company are less visible, so they’re less likely, without effort on the manager’s or employee’s part, to be rewarded for the work they do.

Being paid, being employed is reward enough.

I’ve found that the extra effort required for computer professionals to get these company-wide awards goes to the folks who spend nights and weekends working on a project that’s very visible to management. And even then there’s a good chance the employee won’t be recognized.

That’s expected though, and recognition is appreciated when it does happen.

At the previous job, rewards were tuned to the recipient. Upper management would touch base with coworkers and even family to see what sorts of things might be a good reward. Some I recall were visits to the Vatican, Olympic level personal trainers for marathons, a new bull, a new pig, partial payment on an RV, and even flight school lessons.

These and others were awesome and inspired. But it made me, as a computer professional, feel a bit disconnected from the overall company. When the reward is some cash thing, it can be ignored since it’s the same thing every time. But when it’s personal, it’s a lot more visible and you feel the lack.

Even worse, other things that you might normally ignore are more in your face. For example, during an all-hands, all the business units of the company were recognized, but Operations isn’t a business unit and was totally ignored. Most of the time that’s expected and generally not an issue. But in an environment where things are more personal, being ignored is kind of jarring.

So, what’s the point of this? I like the idea of personal presentations of awards. But ignoring the backbone of the company does generate some resentment, especially when the only possible recognition comes at the cost of many lost weekends and evenings.


Gitlab Runners

Overview

This article provides local gitlab-runner configuration details specific to my site. Links to the relevant documentation are also provided.

Description

The gitlab-runner is a tool that uses the .gitlab-ci.yml file in a repository to build, test, and deploy to a target host. Each job is independent of the others, but if one fails, all subsequent jobs are skipped. A gitlab-runner is similar to a Jenkins agent. You want to install it on a server other than the main gitlab server so workloads don’t impact access to gitlab itself.

Runner Installation

Installation is pretty easy. After you create the new runner server, you pull and install the runner package on that server, then register it.

Pull the runner package.

# curl -LJO "https://gitlab-runner-downloads.s3.amazonaws.com/latest/rpm/gitlab-runner_amd64.rpm"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  418M  100  418M    0     0  1583k      0  0:04:30  0:04:30 --:--:-- 1807k

Install the runner.

# rpm -ivh gitlab-runner_amd64.rpm
warning: gitlab-runner_amd64.rpm: Header V4 RSA/SHA512 Signature, key ID 35dfa027: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:gitlab-runner-14.6.0-1           ################################# [100%]
GitLab Runner: creating gitlab-runner...
Home directory skeleton not used
Runtime platform                                    arch=amd64 os=linux pid=11354 revision=5316d4ac version=14.6.0
gitlab-runner: the service is not installed
Runtime platform                                    arch=amd64 os=linux pid=11363 revision=5316d4ac version=14.6.0
gitlab-ci-multi-runner: the service is not installed
Runtime platform                                    arch=amd64 os=linux pid=11387 revision=5316d4ac version=14.6.0
Runtime platform                                    arch=amd64 os=linux pid=11423 revision=5316d4ac version=14.6.0
INFO: Docker installation not found, skipping clear-docker-cache

Then register the runner (this is internal to my homelab so the token being displayed isn’t an issue).

# gitlab-runner register
Runtime platform                                    arch=amd64 os=linux pid=11468 revision=5316d4ac version=14.6.0
Running in system-mode.

Enter the GitLab instance URL (for example, https://gitlab.com/):
http://lnmt1cuomgitlab.internal.pri/
Enter the registration token:
Li7r2znM5yVedatwy7Uy
Enter a description for the runner:
[lnmt1cuomglrunr1.internal.pri]:
Enter tags for the runner (comma-separated):
local
Registering runner... succeeded                     runner=Li7r2znM
Enter an executor: docker, docker-ssh, virtualbox, docker+machine, docker-ssh+machine, kubernetes, custom, parallels, shell, ssh:
shell
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!

Final Configuration

Once it’s registered, you’ll need to create an RSA key pair and copy the public key to whatever target servers you intend to deploy jobs to. In my example, I have a local server where I can test that things work, and the remote live site. Log in to the two servers once so their host keys are registered. I’m deploying php in this case, so the runner server also needs php installed in order to run my minimal lint test of the php scripts.

Note that the same applies to any artifact servers you pull from. For example, I have artifacts on my two dev servers; that means all the gitlab-runner servers that pull from those two dev servers will need their RSA public keys added to the dev servers.
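
A minimal sketch of that key setup, assuming the runner executes jobs as the gitlab-runner user created by the rpm and using the svcacct@ndld1cuomtool11 target from the example below (adjust the user and hosts for your environment):

# on the runner server, create a key for the gitlab-runner user
sudo -u gitlab-runner mkdir -p -m 700 /home/gitlab-runner/.ssh
sudo -u gitlab-runner ssh-keygen -t rsa -b 4096 -N '' -f /home/gitlab-runner/.ssh/id_rsa
# copy the public key to each deployment and artifact target
sudo -u gitlab-runner ssh-copy-id svcacct@ndld1cuomtool11
# connect once so the target's host key is accepted
sudo -u gitlab-runner ssh svcacct@ndld1cuomtool11 true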

Create Jobs

You’ll need to create a .gitlab-ci.yml file in your repository that contains the steps required to build the project. In this case, I’m using my small Llamas band website as the example, but it could be anything.

Here I define the stages I’ll be using for the deployment.

Each job is independent of the other jobs, so any task you want done everywhere, such as removing the .git directory, will need to be repeated in each stage. Using tags, you can point each stage at a different runner.

stages:
  - test
  - deploy-local
  - deploy-remote

test-job:
  tags:
    - test
  stage: test
  script:
    - |
      for i in $(find "${CI_PROJECT_DIR}" -type f -name \*.php -print)
      do
        php -l ${i}
      done

deploy-local-job:
  tags:
    - home
  stage: deploy-local
  script:
    - rm -rf "${CI_PROJECT_DIR}"/.git
    - rm -f "${CI_PROJECT_DIR}"/.gitlab-ci.yml
    - /usr/bin/rsync -av --delete --no-perms --no-owner --no-group --omit-dir-times --rsync-path=/usr/bin/rsync "${CI_PROJECT_DIR}"/ svcacct@ndld1cuomtool11:/var/www/html/llamas/

deploy-remote-job:
  tags:
    - remote
  stage: deploy-remote
  script:
    - rm -rf "${CI_PROJECT_DIR}"/.git
    - rm -f "${CI_PROJECT_DIR}"/.gitlab-ci.yml
    - /usr/bin/rsync -av --delete --no-perms --no-owner --no-group --omit-dir-times --rsync-path=/usr/bin/rsync "${CI_PROJECT_DIR}"/ svcacct@remote:/usr/local/httpd/llamas/

Pipelines

When you check in the .gitlab-ci.yml file, a pipeline starts. In the project, on the left side, click on CI/CD and then Pipelines to see the pipeline progress. Note that there is a CI Lint button where you can validate your .gitlab-ci.yml file.

In my case, the first pipeline failed due to incorrect spacing in the test script (which I verified in the CI Lint section). After fixing it, the pipeline passed.

Clicking on the Passed or Failed button will take you to the Pipeline.

You can see the progress of the pipeline. Each stage can be rerun by clicking on the arrow-circle and you can see how the task worked by clicking on the stage.

This is the test-job stage. Line 2 of the job log shows it running on the dedicated runner server with the ‘Shell’ executor. It pulls the git repo into the working directory and then runs the quick php lint test on the three files.

Things to Think About

With each stage being a separate task, we could have a runner that only does testing; it would have all the necessary tools to test projects, php in this case. You could also have a dedicated runner that has access to the local QA box but no access to any other server, and the same for remote access. You’d create tags such as test-runner, local-qa, and remote-live, and the three stages in the above example would then run on the appropriate runners.
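
As a rough sketch, a dedicated test runner could be registered non-interactively with one of those tags; the URL is the one from the registration above, while the token, description, and tag are placeholders to adjust:

gitlab-runner register \
  --non-interactive \
  --url http://lnmt1cuomgitlab.internal.pri/ \
  --registration-token <registration-token> \
  --executor shell \
  --description "php test runner" \
  --tag-list test-runner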

References

  • https://docs.gitlab.com/runner/
  • https://docs.gitlab.com/runner/install/linux-manually.html
  • https://docs.gitlab.com/runner/register/index.html#linux
  • https://docs.gitlab.com/ee/ci/yaml/gitlab_ci_yaml.html


Resize KVM Images

Overview

In order to properly support the environment, one set of images for Debian, Ubuntu, and CentOS will be retrieved from the Red Hat OpenShift reference site. These images will then be modified to support the necessary installations. Based on the requirements, each image will be resized to provide sufficient space for the deployed product to operate efficiently.

This document will provide instructions on how to make changes to such images in order to prepare them for use.

Preparing Access

The cloud images don’t have credentials by default. The intention is to use the cloud-init process to inject the account information for the service account, which then permits access to the image. Because the images aren’t configured to use the Linux Volume Manager (LVM), we’ll need to extend the file systems the old-fashioned way. The tool to use is guestfish; it provides access to the image and the ability to mount and edit a file system. In this case, we’ll want to either create a password for the root user or copy our credentials from the local system. In addition, for root to be able to log in over ssh, you may need to edit the /etc/ssh/sshd_config file and set the PermitRootLogin option to yes. With those two changes, you can log in to the image to make any updates. An example session is below.

# guestfish -a centos8.qcow2

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> run
><fs> list-filesystems
/dev/sda1: xfs
><fs> mount /dev/sda1 /
><fs> vi /etc/shadow
><fs> vi /etc/ssh/sshd_config
><fs> quit

When you’re done preparing an image, don’t forget to use guestfish again to replace the root password hash in /etc/shadow with an asterisk.

Extending Images

There are multiple products being deployed and all have different disk space requirements. You’ll use the commands in the following sections to access the consoles and create the necessary images. In preparation, copy each of the images and resize the copies to a specific disk size based on the requirements. Where the product has no underlying operating system requirement, keep the changes to a minimum for consistency.

The sizes below are based on a reasonable base size of 20 Gigabytes and then a review of the existing environment, both as configured and the current utilization.

  • DNS Server – 20 Gigabytes, Any Operating System
  • FreeSwitch Server – 20 Gigabytes, Debian 10 Operating System
  • NFS Server – 50 Gigabytes, Any Operating System
  • MongoDB Server – 75 Gigabytes, XFS requirement for the WiredTiger Storage Engine mandating using the CentOS Operating System
  • HAProxy Server – 20 Gigabytes, Any Operating System
  • Provisioning Server – 50 Gigabytes, Any Operating System
  • OpenShift Boot Node – 20 Gigabytes, CoreOS Operating System
  • OpenShift Master Node – 100 Gigabytes, CoreOS Operating System
  • OpenShift Worker Node – 100 Gigabytes, CoreOS Operating System

You’ll use the qemu-img command to extend the images as noted above.

# qemu-img resize debian10.qcow2 20G
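
The same command applies to the other base images; for example, assuming the centos8.qcow2 and ubuntu18.img images used later in this article and a 20 Gigabyte target (adjust the size per the list above):

# qemu-img resize centos8.qcow2 20G
# qemu-img resize ubuntu18.img 20G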

Accessing a Debian Console

By default, grub on the Debian 9 and 10 cloud images has console access disabled as a security measure to prevent out-of-band (OOB) access to the image. This means you need the ability to run an X server on your laptop; personally, I use Cygwin on my Windows laptops and XQuartz on the Mac. Once prepared, bring up a terminal window and run startx. This should bring up the X server and a graphical terminal console. From there, use Secure Shell with X forwarding to access the target server: ssh -Y (target server). You can verify successful access by checking your DISPLAY variable (echo $DISPLAY). If it is set, you should be able to access the Debian image. Don’t forget to change the graphics flag to spice or VNC when opening the console.

Copy the retrieved debian10.qcow2 image into a common location where you’ll make the necessary changes, such as /var/lib/libvirt/images/debian10/. Run the following command to bring up a graphical console session. Note that the --graphics flag is spice.

# virt-install \
    --memory 2048 \
    --vcpus 2 \
    --name dbtst \
    --disk /var/lib/libvirt/images/debian10/debian10.qcow2,device=disk \
    --os-type Linux \
    --os-variant debian10 \
    --virt-type kvm \
    --graphics spice \
    --network default \
    --import

Once the terminal is up, edit the /etc/default/grub file and uncomment the GRUB_TERMINAL=console line. Save it and run the update-grub command. Once that is done, you will be able to bring up a text console in the future to troubleshoot any issues. For now, continue the disk space modifications through the graphical console.

Accessing the CentOS and Ubuntu Consoles

For the CentOS and Ubuntu images, copy the centos8.qcow2 or ubuntu18.img image into its image directory, in this example /var/lib/libvirt/images/centos8/ or /var/lib/libvirt/images/ubuntu18/. You’ll need to give each guest a unique name when starting it, as noted in the examples below. Once in the image, you can make the necessary changes, such as increasing the available disk space, then shut the image down.

For CentOS

# virt-install \
    --memory 2048 \
    --vcpus 2 \
    --name cotst \
    --disk /var/lib/libvirt/images/centos8/centos8.qcow2,device=disk \
    --os-type Linux \
    --os-variant centos8 \
    --virt-type kvm \
    --graphics none \
    --network default \
    --import

And for Ubuntu

# virt-install \
    --memory 2048 \
    --vcpus 2 \
    --name ubtst \
    --disk /var/lib/libvirt/images/ubuntu18/ubuntu18.img,device=disk \
    --os-type Linux \
    --os-variant ubuntu18.04 \
    --virt-type kvm \
    --graphics none \
    --network default \
    --import

Extending Debian EXT4 File System

By default the Debian image is 2 Gigs in size. This process extends the file system as required. Start the console and log in. This is an EXT4 file system, so you’ll use fdisk, partprobe, and resize2fs to update the partition and file system.

# df -k
Filesystem     1K-blocks   Used Available Use% Mounted on
udev             1014152      0   1014152   0% /dev
tmpfs             204548   2948    201600   2% /run
/dev/vda1        2030416 991160    918068  52% /
tmpfs            1022720      0   1022720   0% /dev/shm
tmpfs               5120      0      5120   0% /run/lock
tmpfs            1022720      0   1022720   0% /sys/fs/cgroup
tmpfs             204544      0    204544   0% /run/user/0

Run fdisk to see that 20 Gigs is available to the system now.

# fdisk -l
Disk /dev/vda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe5c3b0d8

Device     Boot Start     End Sectors Size Id Type
/dev/vda1  *     2048 4194303 4192256   2G 83 Linux

For an EXT4 file system, you’ll need to delete the partition and add it back in at the full available size.

# fdisk /dev/vda

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p
Disk /dev/vda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe5c3b0d8

Device     Boot Start     End Sectors Size Id Type
/dev/vda1  *     2048 4194303 4192256   2G 83 Linux

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-41943039, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-41943039, default 41943039):

Created a new partition 1 of type 'Linux' and of size 20 GiB.
Partition #1 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: n

Command (m for help): p

Disk /dev/vda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe5c3b0d8

Device     Boot Start      End  Sectors Size Id Type
/dev/vda1        2048 41943039 41940992  20G 83 Linux

Command (m for help): w
The partition table has been altered.
Syncing disks.

Unfortunately, partprobe isn’t part of the Debian installation. Simply install the parted package and partprobe will be installed in /sbin.

# aptitude install parted
The following NEW packages will be installed:
  libparted2{a} parted
0 packages upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 473 kB of archives. After unpacking 809 kB will be used.
Do you want to continue? [Y/n/?] y
Get: 1 http://deb.debian.org/debian buster/main amd64 libparted2 amd64 3.2-25 [277 kB]
Get: 2 http://deb.debian.org/debian buster/main amd64 parted amd64 3.2-25 [196 kB]
Fetched 473 kB in 1s (458 kB/s)
Selecting previously unselected package libparted2:amd64.
(Reading database ... 27035 files and directories currently installed.)
Preparing to unpack .../libparted2_3.2-25_amd64.deb ...
Unpacking libparted2:amd64 (3.2-25) ...
Selecting previously unselected package parted.
Preparing to unpack .../parted_3.2-25_amd64.deb ...
Unpacking parted (3.2-25) ...
Setting up libparted2:amd64 (3.2-25) ...
Setting up parted (3.2-25) ...
Processing triggers for libc-bin (2.28-10) ...

Now run partprobe to register the new partition in the kernel.

# partprobe

Finally use resize2fs to extend the filesystem.

# resize2fs /dev/vda1
resize2fs 1.44.5 (15-Dec-2018)
Filesystem at /dev/vda1 is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 3
[ 4014.025845] EXT4-fs (vda1): resizing filesystem from 524032 to 5242624 blocks
[ 4014.172547] EXT4-fs (vda1): resized filesystem to 5242624

And we’re now at 20 Gigs.

# df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             1014152       0   1014152   0% /dev
tmpfs             204548    2948    201600   2% /run
/dev/vda1       20608592 1008764  18723724   6% /
tmpfs            1022720       0   1022720   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            1022720       0   1022720   0% /sys/fs/cgroup
tmpfs             204544       0    204544   0% /run/user/0

Extending CentOS XFS File System

By default, the cloud image for CentOS 8 is 8 Gigs. The file system is XFS and not EXT4 so you’ll use the XFS tools.

# df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
devtmpfs          897776       0    897776   0% /dev
tmpfs             930128       0    930128   0% /dev/shm
tmpfs             930128   16856    913272   2% /run
tmpfs             930128       0    930128   0% /sys/fs/cgroup
/dev/vda1        8181760 1404372   6777388  18% /
tmpfs             186024       0    186024   0% /run/user/0

When running fdisk, you’ll see the current /dev/vda1 partition size of 16,384,000 sectors and the available sectors at 41,943,040 sectors.

# fdisk -l
Disk /dev/vda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xada233c8

Device     Boot Start      End  Sectors  Size Id Type
/dev/vda1  *     2048 16386047 16384000  7.8G 83 Linux

Grow the partition to the available size.

# growpart /dev/vda 1
CHANGED: partition=1 start=2048 old: size=16384000 end=16386047 new: size=41940992 end=41943039

And then extend the file system to use the entire partition.

# xfs_growfs -d /dev/vda1
meta-data=/dev/vda1              isize=512    agcount=4, agsize=512000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2048000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 2048000 to 5242624

When done, the file system is now the new size.

# df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
devtmpfs          897776       0    897776   0% /dev
tmpfs             930128       0    930128   0% /dev/shm
tmpfs             930128   16856    913272   2% /run
tmpfs             930128       0    930128   0% /sys/fs/cgroup
/dev/vda1       20960256 1493912  19466344   8% /
tmpfs             186024       0    186024   0% /run/user/0

Extending Ubuntu GPT File System

The default Ubuntu image is only 2 Gigs in size. The image uses a GPT partition table, so you’ll use the gdisk command in the console rather than fdisk. The process is similar though.

# df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             1007580       0   1007580   0% /dev
tmpfs             204072     680    203392   1% /run
/dev/vda1        2058100 1072008    969708  53% /
tmpfs            1020348       0   1020348   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            1020348       0   1020348   0% /sys/fs/cgroup
/dev/vda15        106858    3696    103162   4% /boot/efi
tmpfs             204068       0    204068   0% /run/user/0

In gdisk, you’ll need to delete the existing partition and recreate it to the new size.

# gdisk /dev/vda
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): p
Disk /dev/vda: 41943040 sectors, 20.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): E1A6C9DD-012D-4943-8697-0FE02F412F36
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 41943006
Partitions will be aligned on 2048-sector boundaries
Total free space is 37332958 sectors (17.8 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1          227328         4612062   2.1 GiB     8300
  14            2048           10239   4.0 MiB     EF02
  15           10240          227327   106.0 MiB   EF00

Command (? for help): d
Partition number (1-15): 1

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (34-41943006, default = 227328) or {+-}size{KMGTP}:
Last sector (227328-41943006, default = 41943006) or {+-}size{KMGTP}:
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300):
Changed type of partition to 'Linux filesystem'

Command (? for help): p
Disk /dev/vda: 41943040 sectors, 20.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): E1A6C9DD-012D-4943-8697-0FE02F412F36
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 41943006
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1          227328        41943006   19.9 GiB    8300  Linux filesystem
  14            2048           10239   4.0 MiB     EF02
  15           10240          227327   106.0 MiB   EF00

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/vda.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.

You’ll need to refresh the partition table in the kernel by running the partprobe command.

# partprobe

With the partition recognized, we now need to resize the file system to the new partition table.

# resize2fs /dev/vda1
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/vda1 is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 3
The filesystem on /dev/vda1 is now 5214459 (4k) blocks long.

And we’re at 20 Gigs.

# df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             1007580       0   1007580   0% /dev
tmpfs             204072     680    203392   1% /run
/dev/vda1       20145724 1077020  19052320   6% /
tmpfs            1020348       0   1020348   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            1020348       0   1020348   0% /sys/fs/cgroup
/dev/vda15        106858    3696    103162   4% /boot/efi
tmpfs             204068       0    204068   0% /run/user/0

Docker Best Practices

Use an official Docker image for your base image. Download the image from the docker.io registry.

Never use the latest tag for an image. For consistency, pin the specific version you want to use. This also prevents potentially breaking changes from being pulled in unexpectedly.

Use the smallest image that satisfies your requirements. Many times something like Alpine, which is a tiny image, has all the functionality you need. Something like Ubuntu/Debian or CentOS will have a ton of extra, unnecessary tools. It’s similar to when we build servers: only start the services that are needed and disable or even remove the services that aren’t used.

Optimize caching of image layers. If you check the Dockerfile for an image, you can see what’s been installed; each command adds a layer to the final container. To optimize, consider how your container is built: whatever changes most often should come later in the Dockerfile. For example, your application code might be best at the end. That way the earlier, unchanged layers won’t need to be rebuilt, since everything after a changed layer is rebuilt as well.

docker history image:tag

Exclude unnecessary content to reduce the size of the image by using a .dockerignore file, ignoring .git and the like.

Remove unnecessary files after the container is built by using multi-stage builds. These let you use staging images so development files stay out of the final image. For example, when you compile a C program you have makefiles and .obj files; with a multi-stage build, an earlier stage compiles the program and the final image contains only the compiled binary.

Set up an appropriate user to run the final application rather than root; running as root is a bad security practice.

# create group and user
RUN groupadd -r tom && useradd -g tom tom

# set ownership and permissions
RUN chown -R tom:tom /app

# switch to user
USER tom

CMD node index.js

Scan images for vulnerabilities. Use the docker scan command from Docker Hub: docker scan image:tag.


Replacing An OKD Master

I’m running an OKD4 cluster (the upstream of Red Hat OpenShift Container Platform v4) at home. Recently my bldr0cuomokdmst1 master node crapped out; I couldn’t even log in to determine the problem. For my current Kubernetes clusters I’m shipping logs to my ELK clusters, but I hadn’t done that for the OKD4 cluster yet, so I have no idea why it failed. Now I need to delete the old master, clear out its configuration, and add it back in again.

Preparation

First off, you should have already replaced the certificate in your installation process. See my OKD4 Installation post for details on how to do that. As a reminder, the certificate is only good for 24 hours for a new cluster build. After that you need to retrieve the cluster certificate and add it to the ignition file.

High Availability Proxy (haproxy)

I removed the failed master node from the haproxy configuration in part because it was causing timeout problems with managing the cluster while I was researching the solution. Sometimes the console would work; other times I suspect it was trying to reach the failed master and timing out. However, deleting the master and adding it back in is a pretty quick process, so removing it from the haproxy configuration might not be necessary.

Log in to your cluster haproxy server (bldr0cuomokdhap1) and, in /etc/haproxy, edit the haproxy.cfg file to comment out the bldr0cuomokdmst1 server entries. Don’t delete them, as we’ll be uncommenting them when the master has been recovered.
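
If you do comment it out, the change is just an edit and a service restart (a sketch; the exact server lines depend on your haproxy.cfg):

vi /etc/haproxy/haproxy.cfg     # comment out the bldr0cuomokdmst1 server lines
systemctl restart haproxy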

Remove Master

Drain the node from the cluster. While the pods won’t be removed, the node will be cleared from the system so it’s not accepting incoming connections any more.

$ oc drain bldr0cuomokdmst1 \
--delete-emptydir-data \
--ignore-daemonsets \
--force

This will take some time as requests to the failed master will need to time out. Now verify bldr0cuomokdmst1 has been removed as an endpoint in the cluster. Its IP is 192.168.101.101, which should be missing from the output below.

$ oc describe svc kubernetes -n default
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Families:       <none>
IP:                172.30.0.1
IPs:               172.30.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.101.102:6443,192.168.101.103:6443
Session Affinity:  None
Events:            <none>

Finally delete bldr0cuomokdmst1 from the cluster.

$ oc delete node bldr0cuomokdmst1
node "bldr0cuomokdmst1" deleted

Clear etcd

Since etcd is on the masters and not a separate cluster, we’ll need to remove the etcd configuration as well. You’ll log in to a working etcd pod, remove bldr0cuomokdmst1, then remove any secrets that belong to the master node from the cluster.

Verify Status

$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'
2 of 3 members are available, bldr0cuomokdmst1 is unhealthy

Working Pods

$ oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd
etcd-bldr0cuomokdmst2                3/3     Running     0          33d
etcd-bldr0cuomokdmst3                3/3     Running     0          33d

Clear Configuration

In this step, you’ll log in to one of the above working pods (2 or 3) and remove bldr0cuomokdmst1 from the etcd configuration.

$ oc rsh -n openshift-etcd etcd-bldr0cuomokdmst3
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-bldr0cuomokdmst3 -n openshift-etcd' to see all of the containers in this pod.

Get the member list

# etcdctl member list -w table
+------------------+---------+------------------+------------------------------+------------------------------+------------+
|        ID        | STATUS  |       NAME       |          PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+------------------+------------------------------+------------------------------+------------+
|  e43c9f92fda4af5 | started | bldr0cuomokdmst3 | https://192.168.101.103:2380 | https://192.168.101.103:2379 | false      |
| ac4ca03e8d200e17 | started | bldr0cuomokdmst2 | https://192.168.101.102:2380 | https://192.168.101.102:2379 | false      |
| c7804a193b578f80 | started | bldr0cuomokdmst1 | https://192.168.101.101:2380 | https://192.168.101.101:2379 | false      |
+------------------+---------+------------------+------------------------------+------------------------------+------------+

You can see that bldr0cuomokdmst1 is in the configuration. Now remove it.

# etcdctl member remove c7804a193b578f80
Member c7804a193b578f80 removed from cluster 617aed10ec5206e3

Finished. You can exit the etcd pod now.

Remove Secrets

There are three secrets for each master node. You’ll need to get the list and then remove them from the cluster.

$ oc get secrets -n openshift-etcd | grep bldr0cuomokdmst1
etcd-peer-bldr0cuomokdmst1              kubernetes.io/tls   2   205d
etcd-serving-bldr0cuomokdmst1           kubernetes.io/tls   2   205d
etcd-serving-metrics-bldr0cuomokdmst1   kubernetes.io/tls   2   205d
$ oc delete secret -n openshift-etcd etcd-peer-bldr0cuomokdmst1
secret "etcd-peer-bldr0cuomokdmst1" deleted
$ oc delete secret -n openshift-etcd etcd-serving-bldr0cuomokdmst1
secret "etcd-serving-bldr0cuomokdmst1" deleted
$ oc delete secret -n openshift-etcd etcd-serving-metrics-bldr0cuomokdmst1
secret "etcd-serving-metrics-bldr0cuomokdmst1" deleted

And the failed bldr0cuomokdmst1 server has been completely removed from the cluster.

Rebuild Master

This process follows the initial build process but for a single node. You’ll boot the server to an ISO image, update the boot line, approve the Certificate Signing Requests (CSRs), and monitor the node.

haproxy Node

If you cleared the master server from haproxy, you’ll need to uncomment that line in the haproxy.cfg file. You’ll also need the boot node for the rebuild, so uncomment those lines as well. Log in to bldr0cuomokdhap1 and edit the haproxy.cfg file in /etc/haproxy. Once updated, restart haproxy.

Start Guest Image

Update the boot settings for the image so it enters the BIOS when started. Once started, attach the Fedora CoreOS ISO, make sure the system boots from the CD-ROM, then save and start the image.

In the live image, tab to the boot line and add the following parameters at the end.

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.101.100:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.101.100:8080/okd4/master.ign

Monitor the console and you’ll see the system start up, download the image, and boot. Remove the ISO from the image after it starts and reboot the system. Initially it will retrieve the newest image, so you may see it reboot again.

Approve CSRs

Now you need to review and approve any outstanding CSRs.

$ oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                    CONDITION
csr-k258q   11m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

There should really only be one, but if there’s more than one, feel free to investigate. Once done, approve the outstanding CSR and pods will start on the bldr0cuomokdmst1 node.

$ oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
certificatesigningrequest.certificates.k8s.io/csr-k258q approved

At this point, the replacement master node is part of the cluster and is creating pods. Since there are quite a few (43 at my last count) on a single master node and I’m on high-speed WiFi, it will take several minutes.
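
To keep an eye on progress while the pods come up, something like the following works:

oc get nodes -w
oc get pods --all-namespaces -o wide | grep bldr0cuomokdmst1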

Verify etcd Configuration

Once all the pods have started and the new cluster member is Ready, verify etcd is also working. First off, check the health of the cluster.

$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'
3 members are available

That looks good. Log in to the same etcd pod you did at the start and check the table output.

$ oc rsh -n openshift-etcd etcd-bldr0cuomokdmst3
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-bldr0cuomokdmst3 -n openshift-etcd' to see all of the containers in this pod.
# etcdctl member list -w table
+------------------+---------+------------------+------------------------------+------------------------------+------------+
|        ID        | STATUS  |       NAME       |          PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+------------------+------------------------------+------------------------------+------------+
|  e43c9f92fda4af5 | started | bldr0cuomokdmst3 | https://192.168.101.103:2380 | https://192.168.101.103:2379 | false      |
| 630bbe550c81b877 | started | bldr0cuomokdmst1 | https://192.168.101.101:2380 | https://192.168.101.101:2379 | false      |
| ac4ca03e8d200e17 | started | bldr0cuomokdmst2 | https://192.168.101.102:2380 | https://192.168.101.102:2379 | false      |
+------------------+---------+------------------+------------------------------+------------------------------+------------+

haproxy Update

The final task is to remove the boot server from the haproxy configuration. Log in to bldr0cuomokdhap1 and edit /etc/haproxy/haproxy.cfg. Comment out the boot server lines, save, and restart haproxy.


Original Car Wars Play

Morning! Jeanne and I played the old Car Wars Saturday night. It took a bit to track down some of the bits, and there are a few others that I haven’t found yet, but I’m still hunting.

Top pic is my chit carrying case. Under are the bags of chits from the game and various expansions.

Top pic is the vehicle record sheet I created back then. I was a graphics artist so that’s an ink and paper creation. Under pic is the same record sheet but after spending about a half hour filling it out. I used one of the sample cars but still had to track down information so we had damage points, ammo, etc.

Amazingly I found my first arena, the top pic. All the post-its are holding chits down where they were. Apparently we were playing a game who knows how long ago and intended on continuing. Clearly not. The lower pic is after all the post-its have been removed leaving the original damage and car in place. I suspect the other vehicle dropped off sometime in the past and it may be in a box somewhere.

To the game itself. Jeanne and I have the same car and I didn’t want to add extra bits to the game so no dropped debris from hits. In the pic you can see the original arena, our vehicles, flaming oil chits, and the official turning key. We both have linked rocket launchers with a targeting laser (+1). To hit is 8 or higher on two dice by default and a 7 with the laser and 2d6 damage. To the rear is a Flaming Oil Dispenser, 3d6 damage. We’re coming in at 15 and 20 mph, both with a handling class of 3 and acceleration of 5mph. We also have the Control Table and Speed Table close at hand.

In the first pic, after doing some maneuvering, Jeanne cut a hard left in front of me, just missing hitting each other (the cars, not the chit dimensions). In the next pic however, she learned and dropped a flaming oil patch in front of me that I can’t avoid! 11 points of damage to my undercarriage!

In the final pic, you see the ending of the game. She decided on a head-on crash and the damage left her with 2 points of front armor. I fired my rockets into her front end (and took damage too, 2″ radius explosive damage). I still had armor after the explosion, but I breached her armor, removed both her rocket launchers, her engine, and her targeting laser, leaving her to scramble out of the wreckage and hotfoot it over to the exit.

In comparing the two games, Jeanne liked playing both. The issue with the chit-based game is the chits were small and tended to stick to the fingers, so moving them around was a bit annoying at times, something 6th fixed with miniatures. We liked the original turning key better than the new turning key that slides under the base of the miniatures. Setup was certainly significantly quicker with 6th, with cards for the various weapons and accessories. If we had pre-generated vehicle record sheets, setup would have been about as quick, I guess. Having to hunt through the book for stats took quite a bit of time, and I had to look up the Difficulty values for some things like taking damage, but once known it was easy enough to manage (taking damage: 1-5 D1, 6-9 D2, 10+ D3). We really liked the MPH for speed versus the 1-5 increments. It made the game feel a bit more realistic.

Overall 6th is better just because of the miniatures and having cards for the weapons and accessories. 1st-4th is better though if part of your joy is building the best car. Think of it as closer to a Collectible Card Game like MtG where you’re creating the “Best” deck. And the earlier game has a ton of adventures. But we like both and Jeanne is looking forward to breaking out either game.
