Recommended Games

Jeanne and I have played several games in the library (see prior post). I did a quick walkthrough for someone who asked for a list of good games. While there are quite a few games that are pretty good, for the walkthrough I did a quick selection rather than an in-depth review of all the games. So here's the list I provided.

Ace of Aces You're flying WWI airplanes. Your opponent has a book with pictures of your plane and you have a book with pictures of theirs. You select a maneuver, which has a page number, go to an intermediate page, then to the final page to see what your opponent is doing (including shooting at you).

B-Movie Card Games You’re creating a B-Movie. There are 8 or 10 different decks. You lay down a location, characters, and accessories like a whip. Then your opponents throw monsters at your movie to prevent you from creating the best B-Movie.

Bunny Kingdom Very little conflict. Gridded and numbered board. You pick cards and place your bunnies. There are a couple of squatter cards you can play while hoping the official card doesn't come up, but mostly there isn't a way to take over a space that already has a bunny on it.

Castles of Burgundy Manage and grow your holdings by rolling dice and selecting resources in order to add farm animals and buildings. This is a very well balanced game for 2, 3, or 4 players.

Cosmic Encounter Take over alien races with skill, no dice rolling. Each player runs a race that modifies the core rules. I think this might have influenced the creation of Magic: The Gathering: core rules plus an alien power that changes a rule, which makes the game different every time you play.

DC Deck Building A deck builder based on the DC comics universe: you use the cards in your hand to 'take over' cards from the displayed row and add them to your deck. I find this much better than Legendary, BTW.

Discoveries of Lewis and Clark Explore the west and gather routes. Recruit native tribes to help you gather them.

The Doom That Came To Atlantic City Basically reverse Monopoly. A Monopoly-style board is set up with two houses on each property. You play an old god and create portals by destroying houses. The first person with 6 portals to the nether realms wins.

Elder Sign I like all three of the Cthulhu-type games (this, Arkham Horror, and Eldritch Horror), but this is the quickest of the three to play. In this one you're at a portal trying to block the coming of the elder gods. Arkham Horror takes something like 6 hours to play, Eldritch Horror about 3 hours, and Elder Sign about 90 minutes. There is a second edition of Arkham Horror we haven't tried yet, so maybe it's more streamlined.

Tiny Epic There are several different games in this series: dinosaurs, space, strategy, etc. We've played the space one several times. Quick and easy to set up and quick to understand. We enjoy these.

Everdell You have workers and place them on locations, events, and such to grow your own little woodland city.

Five Tribes You place a grid of tiles and then three meeples on each tile. You pick up the three (more or less), drop one on each tile and the last one has to match the color of a meeple on the last tile. Then you collect whatever the color describes.

Formula D This one is cool. You have multiple different dice, from a 4-sided up to a 30-sided, each tied to a gear (speed) of your car: slowest is the 4, fastest is the 30. You have to drop your speed for corners or you could spin out and end up in the bushes. Pretty fun game.

Gizmos You’re building engines using marbles from a central pot. Whoever has the best engine at the end of the game, wins.

Gloom You’re trying to kill off your ‘family’ by telling horrible stories about their lives based on the cards drawn. These are clear cards with text and pictures so your negative points can be blocked if someone places a positive card over it. Pretty good storytelling type game.

Horrified Actually a game that's much easier for younger kids. You're playing one or two classic monsters (Invisible Man, Wolfman, Mummy, etc.) and the villagers are trying to stop you.

Mountains of Madness You’re flying a plane into Antarctica and to the Mountains of Madness. You’ll flip cards and deal with the results.

New York 1901 You're creating buildings. Over time you can replace your bronze tiles with silver and gold to grow from a small building into a skyscraper.

The Others This is based on the 7 deadly sins. You are trying to defeat the core monster of each of the sins. Lots of setup but it can be fun.

Pandemic Mainly Pandemic Iberia, although base Pandemic itself is pretty good with the expansion.

Photosynthesis You are building trees. The sun rotates around the board so you only get points if your trees aren’t blocked by other, taller trees.

The Red Dragon Inn Card game where four folks have returned from an adventure and are in a tavern enjoying the spoils, drinking, and playing cards. The last one to pass out, wins 🙂

Resident Evil This is a card game similar to DC Deck Building or Legendary and does a pretty good job matching the Resident Evil video game.

Robo Rally Kind of a programming game where you create paths using cards to move your robot around the board.

Splendor A pretty simple game. You have three rows of cards. Raw gems, polished gems, and final gems (or jewelry, I forget 🙂 ). You’re building up your cards and gems in order to buy the next and final row of cards. 15 points wins the game.

Star Wars X-Wing A miniatures game, with very large models in some of the sets. You have a wheel for movement, and at the end of each turn you could be destroyed.

Talisman You have three realms. An outer, inner, and the final center task to become the ruler of all the realms. You travel around the board (or expansions, I like the City one but the space one is pretty interesting), pick up sufficient gear to beat other players to the center. The main problem here is folks tend to get a lot more than they really need to win so the final encounter is pretty much a done deal. Still a fun game.

The Thing One of several “one of the team is the bad guy” sort of games 🙂 You’re exploring, trying to discover who the bad guy is. If the bad guy escapes on the helicopter, you lose.

Ticket To Ride Mainly Rails and Sails. Train placement to complete routes. The more routes completed, the more points. Rails and Sails lets you place ship routes too on a world map or map of the Great Lakes.

Trains Another deck building type game except you’re building train routes.

Tzolk’in One of the multiple paths to victory type games. You have several wheels that move during the game.

Wings of War WWI air combat. Like X-Wing, you have planes that move around a board. You select your maneuvers (three cards) and flip them. At the end, you might get shot down.

Wingspan This is a pretty cool game for the bird cards. Great pictures. You’re trying to attract birds to your location. You get food, eggs, and birds. Most points wins.

Zombicide A zombie game. Lots of setup and lots of expansions. You're trying to get from point A to point B, collecting keys or whatnot in order to pass into the prison (for example) and defeating zombies. My wife did an excellent job escaping zombies and bringing out the other players while I was at the exit and ready to just abandon everyone. She's great 😀



Mid Year Review

This is more of a 'redo of the game room into a game library' type post. I generally do a COMC as an end of year review, so this one doesn't have the extra information that's typically added.

In this case the old game room had the Kallax shelves against the walls leaving space in the middle of the room for a table and chairs so my wife and I could play a game now and then. Last year though we bought a really big dining room table (8 seater) and we’ve been gaming up in the dining room instead of the game room.

Last week I whipped up a room layout with the existing Kallax shelves so I could move them around on the drawing and see if things would fit in the way I desired. I currently have 4 5×5 shelves, 2 4×4 shelves, 3 2×4 shelves, 2 2×2 shelves and 5 1×4 shelves.

With the change to a game library vs a game room, I was able to add two more 5×5 shelves which gives me a ton more space plus I have room to add two more 1×4 shelves on top of the 5×5 shelves and another 4×4+2×4 combo.

The process was to build the new 5×5 shelves and put them in the hallway, then move the games from the existing shelves into the hall shelves. I had to pull down the 1×4 shelves from on top of the 5×5 shelves into the hallway as well before we could reposition the 5×5 and 4×4 shelves. After the existing shelves were rearranged, the games were put back wherever they'd fit. Then we moved the new shelves into the room (the white ones in the back) and finally moved the second 4×4 shelf to the left side. It holds the mini games and card games plus some display stuff on top.

My wife was a great help in moving things and had a few suggestions for the move and for the final layout. I originally had the two 4×4 shelves at the front of the room, but because of the angle of the wall, some of the games would have been inaccessible and the door couldn't be opened fully. We moved the single 4×4 back enough to let the door shut and put the second 4×4 against the wall, as you can see.

The pictures progress from the hallway with a view of the first shelf, then the side wall shelf, followed by the shelves all the way to the back, and then the last shelves to the left. I was also able to put up a few of my posters. I have tons more, but I'm a Shadowrun geek so it's Shadowrun 🙂

My wife’s main question, “are the new shelves enough space to hold the next 5 years of games?” 😀

My final intention for this year’s End of Year Review is a video review discussing my 50 years of gaming, pointing out specific games, and generally providing a bit more targeted information.

And all the pictures are over here if you want to see larger versions.

Gameroom Pictures

Homelab 2021

It's that time again. It's a year later and there have been quite a few changes in the Homelab. With the release of VMware vSphere 7, I needed to replace my servers with something a bit more current. I repurposed one of the R710s as a KVM server and replaced the others with Dell R720XD servers. As a note, I'm less a hardware/network guy and more an OS and now Kubernetes guy.

Network:

  • High-Speed WiFi connection to the Internet (the black box on the bottom left).
  • Linksys EA9500 Wifi hub for the house.
  • Linksys Wifi extender upstairs.
  • HP 1910 48 Port Gigabit Ethernet Switch.

Servers:

  • Nikodemus: Dell R710, 2 Intel 5680s, 144G RAM, 14 TB RAID 5.
  • Slash: Dell R720XD, 2 Intel E5-2697 v2s, 384G RAM, 27 TB RAID 5 w/Hot Spare.
  • Monkey: Dell R720XD, 2 Intel E5-2697 v2s, 384G RAM, 27 TB RAID 5 w/Hot Spare.
  • Morgan: Dell R720XD, 2 Intel E5-2697 v2s, 384G RAM, 27 TB RAID 5 w/Hot Spare.

UPS:

  • Three APC Back-UPS [XS]1500 units split between the four servers for uninterrupted power. They last about 10 minutes, which is enough time to run the Ansible playbooks that shut down all the servers before the power goes out (a rough sketch of such a playbook is below).
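A minimal sketch of what such a shutdown playbook could look like, assuming an inventory group named homelab and sudo access; the group name and layout here are assumptions for illustration, not the actual playbooks in use.

# shutdown.yml - hypothetical example
- name: Shut down homelab servers before the UPS batteries run out
  hosts: homelab
  become: true
  tasks:
    - name: Power off the host (fire and forget so the play does not hang)
      ansible.builtin.command: /sbin/shutdown -h now
      async: 60
      poll: 0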

Software:

I bought the VMware package from VMUG so I have license keys for a bunch of stuff. vCenter is limited to 6 CPUs. Still trying to figure out Distributed Switches. Each time I try, it fails and I have to rebuild the cluster.

All three servers are booting off an internal 16G USB thumb drive.

  • vSphere 7
  • vCenter 7

Most of what I’m doing fits into two categories. Personal stuff and a duplication of work stuff in order to improve my skills.

I have about 150 virtual machines as of the last time I checked plus another 100 for test builds on the KVM server. Most of my VMs are CentOS or Red Hat since I work in a Red Hat shop, and a few Ubuntu, a SUSE, one Solaris, one OpenBSD, and a couple of Windows workstations. I’m going to set up a Windows server for my DBA wife.

Personal:

  • pfSense. Firewall plus other internal stuff like DNS and Load Balancing. I have all servers cabled to the Internet and House Wifi so I can move pfSense to any of the three to restore access.
  • Jump Servers. I have three jump servers I use basically for utility type work. My Ansible playbooks are on these servers.
  • Hobby Software Development. This is kind of a dual-purpose thing. I'm basically trying to duplicate how work builds projects by applying the same tools to my home development process. CI/CD: GitLab, Jenkins, Ansible Tower, and Artifactory. Development: original server, git server, and a Laravel server for a couple of bigger programs.
  • Identity Management server. Centralized user management. All servers are configured.
  • Spacewalk. I don’t want every server downloading packages from the ‘net. I’m on a high-speed wifi setup where I live. So I’m downloading the bulk of packages to the Spacewalk server and upgrading using it as the source. I have a Katello server that I want to get working due to the addition of Red Hat 8 servers.
  • Inventory. This is a local installation of the inventory program I wrote for my work servers. This has my local servers though. Basically it’s the ‘eat your own dog food’ sort of thing. 🙂
  • Media Servers. With almost 8 TB of space in use, I have two servers to split them between the R720XD’s. Movie Server. About 3 or so TB of movies I’ve ripped from my collection. Television Server. About 5 TB of television shows I’ve ripped from my collection.
  • Backups. I have two backup servers. One for local/desktop backups via Samba and one for remote backups of my physical server which is hosted in Florida.
  • Windows XP. I have two pieces of hardware that are perfectly fine but only work on XP so I have an XP workstation so I can keep using the hardware.
  • Windows 7. Just to see if I could really 🙂
  • Grafana. I’ve been graphing out portions of the environment but am still in the learning phase.

Work/Skills:

  • Work development servers. The scripts and code I write at work, backed up to this server and also spread out amongst my servers.
  • Amazon AWS specific server. For testing building EC2 and EKS servers on AWS.
  • Nagios Servers. I have 6. One to monitor my Florida server, One to monitor my personal servers (above), and four to monitor my Work type servers. All six monitor the other five servers so if one bails, I’ll get notifications.
  • NMIS Servers. Work is using NMIS so I thought I’d spin one up to get familiar with the tool.
  • Docker Server. Basically learning docker.
  • Docker Registry. Holds my Docker images on prem so I’m not hitting the cloud every time I muck with a worker.
  • Kubernetes Servers. I have three Kubernetes clusters for various testing scenarios. Three masters and three workers.
  • OKD Servers. I have an OKD cluster, which is upstream OpenShift, for testing and extended learning. Three masters and seven workers.
  • Elastic Stack clusters. An Elastic Stack here is a Kibana server, multiple Logstash servers, and multiple Elasticsearch servers. Basically centralized log management. Just like Kubernetes, three clusters for testing.
  • A Hashicorp Vault server for testing to see if it’ll work for what we need at work (secrets management).
  • Salt. One salt master for testing. All servers are connected.
  • Terraform. For testing VMware builds.
  • Jira server. Basically trying to get familiar with the software
  • Confluence. Again, trying to get used to it. I normally use a Wiki but work is transferring things over to Confluence.
  • Wiki. This is a duplicate of my work wikis, basically copying all the documentation I’ve written over the past few years.
  • Solaris 2540.

My wife is a DBA so I have a few database servers up, partly for her and partly for my use.

  • Cassandra
  • PostgreSQL
  • MongoDB
  • PostgreSQL – This one I stood up for Jira and Confluence
  • Microsoft SQL Server
  • MySQL Master/Master cluster. This is mainly used by my applications but it's there for my wife to test things on as well.

The KVM server (Nikodemus) is more for Terraform testing. I’ve rebuilt my work server environment (just the OS part) and am currently rebuilding my various home environments to really dig into Infrastructure as Code. This is generally about 100 VMs depending on which environment I’m building.

Of course, this isn't an exhaustive list of the VMs I'm running. I'm constantly spinning up VMs for testing purposes. Some stay up, some are shut down after I'm done mucking with them.

I will note that I’m 64 and have been mucking about with computers for about 40 years now. I have several expired and active certifications. 3Com 3Wizard, Sun System and Network Admin, Cisco CCNA and CCNP, Red Hat RHCSA, and most recently a CKA, CKAD, and Amazon Certified Cloud Practitioner.




Installing OKD4

Overview

What I'm trying to accomplish here is to set up a homelab OpenShift-type environment. I'm installing OKD 4.7 in this case, but OKD is the upstream of Red Hat OpenShift, so it should be just fine for my use case.

This installation is focused on my own requirements; as such, I don't have anywhere near the detail the Red Hat documentation has. Working through it has been extremely helpful in understanding the Red Hat installation process.

I intend to document what I currently have, environment-wise, and how to configure it appropriately. If you don't have a similar environment, I highly recommend the article in the References section below. It greatly helped me understand the installation process.

Environment

  • I am running this project initially on VMware but will also be working on a second KVM server.
  • I currently run a pfSense package to manage my Firewall plus the DNS and DHCP services which are used for this project.
  • I’m using HAProxy to provide Load Balancing for the API-Server in the cluster. This is running on CentOS 8 Server.
  • I’m using a Service box to provide the necessary web server and NFS storage services. This is running on CentOS 8.
  • For the bootstrap node and cluster elements, I’ll be using a Fedora CoreOS image.

Virtual Machines

The servers are built in my Development environment (192.168.101.0/24). The assumption for this build is that you already have the Gold Images for the non-CoreOS nodes; building the operating systems is beyond the scope of this article. Note that the function of the two non-CoreOS nodes is pretty minor. The CPU, RAM, and Storage settings for the cluster are the recommendations from Red Hat.

Some general information to make the table a bit smaller.

  • Bootstrap/Control Plane – 4 CPU, 16 Gigs of RAM, 120 Gigs of storage.
  • Compute – 2 CPU, 8 Gigs of RAM, 120 Gigs of storage.
  • Operating System is Fedora CoreOS for the above three server types.
  • For the service and haproxy node, CentOS 8, 2 CPU, 4 Gigs RAM, and 30 Gigs of storage.
  • For the Machine Names, all are prefixed with bldr0cuomokd. So boot is bldr0cuomokdboot.
Machine Type      Machine Name  IP Address       MAC
Bootstrap Server  boot          192.168.101.107  00:50:56:b3:a3:7e
Control Plane     mst1          192.168.101.101  00:50:56:b3:02:f4
Control Plane     mst2          192.168.101.102  00:50:56:b3:2a:62
Control Plane     mst3          192.168.101.103  00:50:56:b3:1a:11
Compute           wrk1          192.168.101.104  00:50:56:b3:42:30
Compute           wrk2          192.168.101.105  00:50:56:b3:8b:67
Compute           wrk3          192.168.101.106  00:50:56:b3:d8:81
Compute           wrk4          192.168.101.109  00:50:56:91:93:c3
Compute           wrk5          192.168.101.110  00:50:56:91:a7:21
Compute           wrk6          192.168.101.111  00:50:56:91:2f:be
Compute           wrk7          192.168.101.112  00:50:56:91:7e:7a
NFS/Web           svc1          192.168.101.100  00:50:56:b3:38:31
HAProxy           hap1          192.168.101.108  00:50:56:b3:e0:0a

Note that worker nodes 4 and 5 were added after the cluster was built but prior to the 24 hour period expiring. Worker nodes 6 and 7 were added after the 24 hour window expired.

DNS

The above Machine Names have been entered into DNS. In addition, the following aliases need to be added to DNS.

The cluster domain name will be [site].internal.pri. For example, bldr0-0.internal.pri.

  • api – The api server. Aliased to the haproxy hostname.
  • api-int – The internal api server alias. Aliased to the haproxy hostname.
  • console-openshift-console – Console access to the cluster. Aliased to the haproxy hostname.
  • oauth-openshift – The authentication method. Aliased to the haproxy hostname.
  • etcd-0 – The first etcd node. Aliased to the first master node.
  • etcd-1 – The second etcd node. Aliased to the second master node.
  • etcd-2 – The third etcd node. Aliased to the third master node.
  • bootstrap – The bootstrap server. Aliased to the bootstrap machine's hostname.
  • address=/okd.internal.pri/192.168.101.108 – This needs to be added to the Options in the DNS Forwarder section of pfSense. This lets the ingress router work.

Note that I used okd.internal.pri as I didn’t quite understand the naming convention for the domain. If I were to create clusters in all four of my environments, I’d be using bldr0-0, cabo0-0, tato0-0, and lnmt1-2 which matches my existing Kubernetes cluster naming convention.
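To make that concrete, here's a hedged sketch of a few of the aliases expressed in dnsmasq syntax (the engine behind pfSense's DNS Forwarder); the hostnames follow this article's naming, but your resolver's format may well differ, so treat this as illustration only.

# illustrative only - adjust for your own DNS setup
cname=api.okd.internal.pri,bldr0cuomokdhap1.internal.pri
cname=api-int.okd.internal.pri,bldr0cuomokdhap1.internal.pri
cname=etcd-0.okd.internal.pri,bldr0cuomokdmst1.internal.pri
cname=etcd-1.okd.internal.pri,bldr0cuomokdmst2.internal.pri
cname=etcd-2.okd.internal.pri,bldr0cuomokdmst3.internal.pri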

DHCP

The components will need to get an IP address from the DHCP server when they start. The IP needs to be outside the DHCP server's dynamic range, and the reservation needs the MAC address in order for discovery to work.

Note that I used the wrong MACs for the last two worker nodes and their IPs came up inside the DHCP range, so ensure you have the correct MAC addresses (VMware apparently assigns addresses under both 00:50:56:b3:xx:xx and 00:50:56:91:xx:xx; note the b3 vs 91 difference). Just a note of caution.
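pfSense handles these reservations through its GUI, but as an illustration, an ISC-dhcpd-style static mapping for one of the control plane nodes would look roughly like this (values taken from the table above):

host bldr0cuomokdmst1 {
  hardware ethernet 00:50:56:b3:02:f4;
  fixed-address 192.168.101.101;
}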

Fedora CoreOS

We’ll be downloading two images. You’ll need the ISO image so you can boot to it to start the installation and you’ll need the raw image for the actual component installation.

On the Fedora CoreOS download site, click on the Bare Metal & Virtualized tab and then download the ISO image plus its signature and sha256 checksum. In addition, download the raw image plus its signature and sha256 checksum.

Prepare Nodes

Again, I’m on a VMware environment. To prepare for the cluster build, create 7 blank VMs using the configuration as noted in the Virtual Machines section. In the VM Options tab under Boot Options, check the Force BIOS Setup checkbox. This lets you attach the CoreOS image to the server so you can install CoreOS.

Load Balancer

You need to install haproxy on the HAProxy node and configure it to be a Load Balancer for the cluster.

# yum install -y haproxy

For the global and defaults sections, increase the maxconn line to 20000.

Delete everything after the end of the defaults section.

For statistics, add:

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /

For the API Server (port 6443), add the following front end and back end sections:

frontend okd4_k8s_api_fe
    bind :6443
    default_backend okd4_k8s_api_be
    mode tcp
    option tcplog

backend okd4_k8s_api_be
    balance source
    mode tcp
    server      bldr0cuomokdboot 192.168.101.107:6443 check
    server      bldr0cuomokdmst1 192.168.101.101:6443 check
    server      bldr0cuomokdmst2 192.168.101.102:6443 check
    server      bldr0cuomokdmst3 192.168.101.103:6443 check

When the servers are being built, the following section will be needed:

frontend okd4_machine_config_server_fe
    bind :22623
    default_backend okd4_machine_config_server_be
    mode tcp
    option tcplog

backend okd4_machine_config_server_be
    balance source
    mode tcp
    server      bldr0cuomokdboot 192.168.101.107:22623 check
    server      bldr0cuomokdmst1 192.168.101.101:22623 check
    server      bldr0cuomokdmst2 192.168.101.102:22623 check
    server      bldr0cuomokdmst3 192.168.101.103:22623 check

For port 80 traffic, if any, the following section is needed:

frontend okd4_http_ingress_traffic_fe
    bind :80
    default_backend okd4_http_ingress_traffic_be
    mode tcp
    option tcplog

backend okd4_http_ingress_traffic_be
    balance source
    mode tcp
    server      bldr0cuomokdwrk1 192.168.101.104:80 check
    server      bldr0cuomokdwrk2 192.168.101.105:80 check
    server      bldr0cuomokdwrk3 192.168.101.106:80 check

And for port 443 traffic, the following section is needed:

frontend okd4_https_ingress_traffic_fe
    bind *:443
    default_backend okd4_https_ingress_traffic_be
    mode tcp
    option tcplog

backend okd4_https_ingress_traffic_be
    balance source
    mode tcp
    server      bldr0cuomokdwrk1 192.168.101.104:443 check
    server      bldr0cuomokdwrk2 192.168.101.105:443 check
    server      bldr0cuomokdwrk3 192.168.101.106:443 check

Once edited, enable and start haproxy.

systemctl enable haproxy
systemctl start haproxy

Web Server

In order for the OpenShift nodes to retrieve the image and ignition files, you’ll need to install a web server on the Service node. Also configure the node to listen on port 8080.

dnf install -y httpd
sed -i 's/Listen 80/Listen 8080/' /etc/httpd/conf/httpd.conf

Once done, enable and start the server.

systemctl enable httpd
systemctl start httpd

OpenShift Binaries

You need to have the oc binary plus the openshift-install binary on your Service Node.

OpenShift Downloads

While not necessary for the installation, you should retrieve the kubectl binary as well.
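As a sketch, both tarballs can be pulled from the OKD releases page and dropped into your PATH; the version below matches the cluster shown later in this article, but the exact tarball names are an assumption based on the usual OKD release layout, so verify them on the releases page (kubectl ships in the client tarball).

VERSION=4.7.0-0.okd-2021-04-24-103438
curl -LO https://github.com/openshift/okd/releases/download/${VERSION}/openshift-client-linux-${VERSION}.tar.gz
curl -LO https://github.com/openshift/okd/releases/download/${VERSION}/openshift-install-linux-${VERSION}.tar.gz
tar xzf openshift-client-linux-${VERSION}.tar.gz -C /usr/local/bin oc kubectl
tar xzf openshift-install-linux-${VERSION}.tar.gz -C /usr/local/bin openshift-install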

OpenShift install-config.yaml File

The following file is used for building the cluster.

apiVersion: v1
baseDomain: [domain]                     # update the domain info
metadata:
  name: [sub-domain]                     # update the sub-domain info

compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0

controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14                   # Verify your network
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16                         # Verify the service network.

platform:
  none: {}

fips: false

pullSecret: '{"auths": ...}'               # add the pullSecret
sshKey: 'ssh-ed25519 AAAA...'              # add your ssh public key

OpenShift Pull Secret

In the configuration file, update the pullSecret from the download below.

OpenShift Pull Secret Download

SSH Key

When the nodes are built, in order to access the servers you'll need to provide an SSH key from the account that will be accessing them. If you don't already have a public key ready, generate one.

ssh-keygen -t rsa

To log in to the servers, log in as core@[servername].

Perform The Installation

Create an install_dir directory, copy the install-config.yaml file into it, and then run the installer.

openshift-install create manifests --dir=install_dir
INFO Consuming Install Config from target directory
INFO Manifests created in: install_dir/manifests and install_dir/openshift

Rename the install_dir directory to manifests.
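In shell terms, my reading of that shuffle is roughly the following; since the installer consumes install-config.yaml, keep a copy outside the directory so install_dir can be recreated for the next step. This is a sketch of the step as described, not an exact transcript.

mv install_dir manifests
mkdir install_dir
cp install-config.yaml install_dir/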

Run Installer Again

This time you’re creating the ignition-configs. Note that the ignition files contain a certificate that expires in 24 hours. If you need to rebuild the cluster or add nodes after the 24 hours has expired, you’ll need a new certificate. See the Adding Workers section at the end of this document.

$ openshift-install create ignition-configs --dir=install_dir/
INFO Consuming Install Config from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings
INFO Ignition-Configs created in: install_dir and install_dir/auth

Install Configuration

In the /var/www/html directory, create the okd4 directory, copy in the new configuration files, copy in the Fedora CoreOS files and rename them (renaming makes them easier to type at the console), and set the permissions.

mkdir /var/www/html/okd4
cp -r manifests/* /var/www/html/okd4/
cp -r install_dir/* /var/www/html/okd4/
mv fedora-coreos-33.20210426.3.0-metal.x86_64.raw.xz /var/www/html/okd4/fcos.raw.xz
mv fedora-coreos-33.20210426.3.0-metal.x86_64.raw.xz.sig /var/www/html/okd4/fcos.raw.xz.sig 
chown -R apache: /var/www/html
chmod -R 755 /var/www/html

Build The bootstrap Server

You’ll need to boot into the Fedora CoreOS Live image, press tab to jump to the kernel line, and enter in the following information:

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.101.100:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.101.100:8080/okd4/bootstrap.ign

Now The Control Plane

Start the master servers and after hitting tab, enter the following lines.

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.101.100:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.101.100:8080/okd4/master.ign

It can take 10 or 15 minutes for the masters to register.

Now The Compute Nodes

Same here: start the workers and enter the following lines.

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.101.100:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.101.100:8080/okd4/worker.ign

It can take quite some time for the workers to register, up to 30 minutes.

Update HAProxy

When all nodes have been bootstrapped, you need to remove the bootstrap entries from the HAProxy configuration. Just comment them out and restart haproxy.
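A quick way to do that, assuming the default config location:

sed -i '/bldr0cuomokdboot/s/^/#/' /etc/haproxy/haproxy.cfg
systemctl restart haproxy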

Certificate Signing Requests

Once the workers have registered with the cluster, you'll need to approve their CSRs so they can start loading up pods. You'll want the jq tool so you can approve a bunch of CSRs in one pass.

wget -O jq https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
chmod +x jq
sudo mv jq /usr/local/bin/
jq --version

Once you have jq, check for pending CSRs. At the start, there will be a ton. For new nodes it can take several minutes, as they tend to upgrade CoreOS during the bootstrap process. Once things settle out, new CSRs should show as Pending.

oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-6n8c6   91s     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-85mmn   31m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-8tn26   6m38s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-lgxlv   16m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

Once you see the pending CSRs, run this command to approve all of them:

oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
certificatesigningrequest.certificates.k8s.io/csr-6n8c6 approved
certificatesigningrequest.certificates.k8s.io/csr-85mmn approved
certificatesigningrequest.certificates.k8s.io/csr-8tn26 approved
certificatesigningrequest.certificates.k8s.io/csr-lgxlv approved

For some things such as new worker nodes, it will take two passes to approve all the CSRs.

Console Access

Finally check the status of the clusteroperators, specifically the console. Once it’s up and running, you can get your password from the install_dir/auth/kubeadmin_password file. Log in to the console as kubeadmin and you’re in!

$ oc get clusteroperators
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.okd-2021-04-24-103438   True        False         False      13h
baremetal                                  4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
cloud-credential                           4.7.0-0.okd-2021-04-24-103438   True        False         False      5d16h
cluster-autoscaler                         4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
config-operator                            4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
console                                    4.7.0-0.okd-2021-04-24-103438   True        False         False      4d23h
csi-snapshot-controller                    4.7.0-0.okd-2021-04-24-103438   True        False         False      4d23h
dns                                        4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
etcd                                       4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
image-registry                             4.7.0-0.okd-2021-04-24-103438   True        False         True       2d17h
ingress                                    4.7.0-0.okd-2021-04-24-103438   True        False         True       5d14h
insights                                   4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
kube-apiserver                             4.7.0-0.okd-2021-04-24-103438   True        False         False      5d13h
kube-controller-manager                    4.7.0-0.okd-2021-04-24-103438   True        False         False      5d13h
kube-scheduler                             4.7.0-0.okd-2021-04-24-103438   True        False         False      5d14h
kube-storage-version-migrator              4.7.0-0.okd-2021-04-24-103438   True        False         False      13h
machine-api                                4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
machine-approver                           4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
machine-config                             4.7.0-0.okd-2021-04-24-103438   True        False         False      46h
marketplace                                4.7.0-0.okd-2021-04-24-103438   True        False         False      4d23h
monitoring                                 4.7.0-0.okd-2021-04-24-103438   True        False         False      45h
network                                    4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
node-tuning                                4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
openshift-apiserver                        4.7.0-0.okd-2021-04-24-103438   True        False         False      13h
openshift-controller-manager               4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
openshift-samples                          4.7.0-0.okd-2021-04-24-103438   True        False         False      5d13h
operator-lifecycle-manager                 4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
operator-lifecycle-manager-catalog         4.7.0-0.okd-2021-04-24-103438   True        False         False      5d15h
operator-lifecycle-manager-packageserver   4.7.0-0.okd-2021-04-24-103438   True        False         False      13h

Persistent Storage

Next up is to create some persistent storage.

NFS Server

In order to set up a persistent volume, you’ll need to configure an NFS mount on an accessible server.

On the Service server:

dnf install -y nfs-utils
systemctl enable nfs-server rpcbind
systemctl start nfs-server rpcbind
mkdir -p /var/nfsshare/registry
chmod -R 777 /var/nfsshare
chown -R nobody:nobody /var/nfsshare

Then set up the share.

echo '/var/nfsshare 192.168.101.0/24(rw,sync,no_root_squash,no_all_squash,no_wdelay)' > /etc/exports

Assuming SELinux and a firewall are running, you'll need to make the following changes. Skip the setsebool or firewall-cmd lines if one or the other isn't configured.

sudo setsebool -P nfs_export_all_rw 1
sudo systemctl restart nfs-server
sudo firewall-cmd --permanent --zone=public --add-service mountd
sudo firewall-cmd --permanent --zone=public --add-service rpc-bind
sudo firewall-cmd --permanent --zone=public --add-service nfs
sudo firewall-cmd --reload

Image Registry

Apply the following registry file to the cluster. Make sure the server IP is accurate.

$ cat registry_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /var/nfsshare/registry
    server: 192.168.101.100

And

$ oc apply -f registry_pv.yaml

By default, persistent storage isn’t configured. As such the operator sets it up as ‘Removed’. You’ll need to edit the configuration to tell OpenShift that persistent storage is available.

$ oc edit configs.imageregistry.operator.openshift.io

Update the following settings.

managementState: Removed

  storage: {}

Change to:

managementState: Managed

  storage:
    pvc:
      claim:

Create Accounts

In order to create accounts for users to access the cluster, you use the htpasswd program to build a credentials file and submit it to the cluster as a secret. You'll also need to create a rolebinding or clusterrolebinding to provide permissions.

htpasswd

Simply create a file that contains your username and password for accessing the OKD cluster (an example command follows the flag list).

  • -c = Create a new file. This overwrites an existing file, so use caution.
  • -B = Use bcrypt, the most secure algorithm htpasswd offers.
  • -b = Accept the username and password on the command line.
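Putting those flags together (the username, password, and file name here are just placeholders):

htpasswd -c -B -b ./htpasswd myuser mypassword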

Create Secret

Next create the htpass-secret secret.

oc create secret generic htpass-secret --from-file=htpasswd=htpasswd -n openshift-config

Provider

You'll need to apply an identity provider for the htpasswd-supplied credentials. Use the following file; it tells the OAuth object to use the htpass-secret secret for credentials.

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret
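Save that to a file and apply it to the cluster (the file name is arbitrary):

oc apply -f htpasswd_provider.yaml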

Bindings

If you log in to the console as yourself now, you can get in but have no authority to see or do anything. You'll need to bind your account to an RBAC set of permissions.

oc adm policy add-cluster-role-to-user cluster-admin [username]

And now I have access.

Adding Workers

When the ignition files are initially created, they contain a certificate that expires 24 hours after creation. Within that window, you can add multiple worker nodes. After the certificate expires, you'll need to get a new certificate from the cluster. That certificate expires in 10 years, so you can add multiple worker nodes from then on.

Things to remember:

  • Add the bootstrap servers back into the haproxy server configuration and restart.
  • Also add the new workers to the worker section in the haproxy configuration.
  • Create the DNS entries.
  • Update the DHCP configuration for the new Workers.

Within The Window

You can simply add the new Worker Nodes by following the above instructions: create a blank VM, start it with the Fedora CoreOS Live ISO, and enter the kernel parameters to bootstrap the Worker. The worker.ign file is valid for any number of Worker Nodes.

Window Has Expired

You need to extract the new certificate from the cluster and add it to the worker.ign file. Make sure you back up the current worker.ign file, just in case.

Extract certificate:

openssl s_client -connect api-int.okd.internal.pri:22623 -showcerts </dev/null 2>/dev/null|openssl x509 -outform PEM > api-int.pem

This creates the api-int.pem file, which now needs to be converted into a base64 string. The --wrap=0 parameter keeps the output on a single line.

base64 --wrap=0 ./api-int.pem 1> ./api.int.base64

Now back up the worker.ign file.

cp worker.ign worker.ign.backup

And replace the current certificate with the new one located in the api.int.base64 file.

{"ignition":{"config":{"merge":[{"source":"https://api-int.okd.internal.pri:22623/config/worker"}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,[ADD CERTIFICATE HERE"}]}},"version":"3.2.0"}}

And finally follow the process above to add a new worker to the cluster.

References

I used the following link as it was more focused on what I'm running vs the Red Hat site, which has a ton of options to consider. I've built the cluster three times now, and with the third build I rewrote this article as instructions for my specific use case. If you have a similar environment, this article might be helpful. The linked guide lets you create a similar environment, but firewalled away from your central homelab environment. Ultimately it made me a bit more skilled and better able to understand the more extensive Red Hat docs.

The following links are the Red Hat docs for a Bare Metal deployment and copies of the links in the above article so you’re not hunting for the pullSecret link.


Kubernetes Upgrade to 1.20.6

Upgrading Kubernetes Clusters

The following lists what software and pods will be upgraded during this quarter. A rough sketch of the kubeadm portion follows the list.

  • Upgrade the Operating System.
  • Upgrade Kubernetes.
    • Upgrade kubeadm, kubectl, and kubelet RPMs from 1.19.6 to 1.20.6.
    • kube-apiserver is upgraded from 1.19.6 to 1.20.6 automatically.
    • kube-controller-manager is upgraded from 1.19.6 to 1.20.6 automatically.
    • kube-scheduler is upgraded from 1.19.6 to 1.20.6 automatically.
    • kube-proxy is upgraded from 1.19.6 to 1.20.6 automatically.
    • pause is upgraded from 3.2 to 3.4.1
  • Upgrade docker from 1.13.1-203 to 1.13.1-204.
  • Upgrade Calico from 3.17.1 to 3.18.2.
  • Upgrade Filebeat from 7.10.0 to 7.12.1
  • metrics-server is upgraded from 0.4.1 to 0.4.3.
  • kube-state-metrics is upgraded from 1.9.7 to 2.0.0.
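As a reminder of the kubeadm portion only, the typical flow on the first control plane node looks roughly like this once the 1.20.6 RPMs are staged. This is a hedged sketch, not the full documented procedure for these clusters (versionlock changes, node drains, and worker upgrades are handled separately).

kubeadm upgrade plan
kubeadm upgrade apply v1.20.6
systemctl restart kubelet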

Unchanged Products

The following products do not have an upgrade this quarter.

  • kubernetes-cni remains at 0.8.7-0.
  • coredns remains at 1.7.0.
  • etcd remains at 3.4.13-0.

Upgrade Notes

The following notes provide information on what changes might affect users of the clusters when upgrading from one version to the next. The notes I'm adding reflect what I think is relevant to our environment, so no discussions of Azure, although I might call it out briefly. For more details, click on the provided links. If you find something you think is relevant, please let me know and I'll add it in.

Kubernetes Core

The following notes reflect changes that might be relevant between the currently installed 1.19.6 and 1.20.6, the target upgrade for Q2. While I try to make sure I don't miss anything, the checks are for my specific environment; if you're not sure, check the links to see whether any changes apply to your product/project. Reminder that many of the 1.19 updates are the same as the 1.20 updates: as 1.20 is updated and patched, similar 1.19 releases address the same patches.

  • 1.19.7 – CPUmanager bug fix and cadvisor metrics fix.
  • 1.19.8 – Avoid marking a node as ready before it validates all API calls at least once. Static pods are deleted gracefully.
  • 1.19.9 – Counting a pod's overhead resource usage as part of the ResourceQuota.
  • 1.19.10 – Nothing relevant to my environment.
  • 1.20.0 – The biggest change is dockershim being deprecated in favor of other container runtimes such as containerd. The new API Priority and Fairness configuration is in beta; it lets you throttle an overflow of API Server requests that might otherwise impact the API Server.
  • 1.20.1
  • 1.20.2
  • 1.20.3
  • 1.20.4
  • 1.20.5
  • 1.20.6

Calico

The major release notes are on a single page; the versions are noted here to describe the upgrade for each one. For example, 3.17.2 through 3.17.4 all point to the 3.17 Release Notes. Here I'm describing the changes, if relevant, between the point releases.

Note that we're not using many of Calico's features yet, so improvements, changes, and fixes for Calico issues aren't likely to impact any current services.

Filebeat

docker

Run rpm -q --changelog docker
  • 1.13.1-204 –

kube-state-metrics

metrics-server

References


Kubernetes Preparation Steps for 1.20.6

Upgrading Kubernetes Clusters

The purpose of this document is to provide background information on what is being upgraded, to what versions, and the steps required to prepare for the upgrade itself. These steps are only done once. Once they have been completed and all the configurations are checked into github and gitlab, all clusters are ready to be upgraded.

Reference links to product documentation are at the end of this document.

Upgrade Preparation Steps

Upgrades to the Sandbox environment are done a few weeks before the official release for more in-depth testing: checking the release docs, changelogs, and general operational status of the various tools that are in use.

Server Preparations

With the possibility of an upgrade to Spacewalk and to ensure the necessary software is installed prior to the upgrade, make sure all repositories are enabled and that the yum-plugin-versionlock software is installed.

Enable Repositories

Check the Spacewalk configuration and ensure that upgrades are coming from the local server and not from the internet.
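A quick sanity check is to list the enabled repositories and confirm their base URLs point at the Spacewalk server rather than the public mirrors; the repo names will vary with your setup.

yum repolist enabled
yum -v repolist | grep -i baseurl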

Install yum versionlock

The critical components of Kubernetes are locked into place using the versionlock yum plugin. If not already installed, install it before beginning work.

# yum install yum-plugin-versionlock -y

Load Images

The next step is to load all the necessary Kubernetes, etcd, and additional images like coredns into the local repository so that the clusters aren't all pulling images from the internet. As a note, pause:3.2 has been upgraded to pause:3.4.1, so make sure you pull and update that image.

# docker pull k8s.gcr.io/kube-apiserver:v1.20.6
v1.20.6: Pulling from kube-apiserver
d94d38b8f0e6: Pull complete
6ee16ead6dee: Pull complete
ee5e6c27aaae: Pull complete
Digest: sha256:e6d960baa4219fa810ee26da8fe8a92a1cf9dae83b6ad8bda0e17ee159c68501
Status: Downloaded newer image for k8s.gcr.io/kube-apiserver:v1.20.6
k8s.gcr.io/kube-apiserver:v1.20.6
 
# docker pull k8s.gcr.io/kube-controller-manager:v1.20.6
v1.20.6: Pulling from kube-controller-manager
d94d38b8f0e6: Already exists
6ee16ead6dee: Already exists
a484c6338761: Pull complete
Digest: sha256:a1a6e8dbcf0294175df5f248503c8792b3770c53535670e44a7724718fc93e87
Status: Downloaded newer image for k8s.gcr.io/kube-controller-manager:v1.20.6
k8s.gcr.io/kube-controller-manager:v1.20.6
 
# docker pull k8s.gcr.io/kube-scheduler:v1.20.6
v1.20.6: Pulling from kube-scheduler
d94d38b8f0e6: Already exists
6ee16ead6dee: Already exists
1db6741b5f3c: Pull complete
Digest: sha256:ebb0350893fcfe7328140452f8a88ce682ec6f00337015a055d51b3fe0373429
Status: Downloaded newer image for k8s.gcr.io/kube-scheduler:v1.20.6
k8s.gcr.io/kube-scheduler:v1.20.6
 
# docker pull k8s.gcr.io/kube-proxy:v1.20.6
v1.20.6: Pulling from kube-proxy
e5a8c1ed6cf1: Pull complete
f275df365c13: Pull complete
6a2802bb94f4: Pull complete
cb3853c52da4: Pull complete
db342cbe4b1c: Pull complete
9a72dd095a53: Pull complete
a6a3a90a2713: Pull complete
Digest: sha256:7c1710c965f55bca8d06ebd8d5774ecd9ef924f33fb024e424c2b9b565f477dc
Status: Downloaded newer image for k8s.gcr.io/kube-proxy:v1.20.6
k8s.gcr.io/kube-proxy:v1.20.6
 
# docker pull k8s.gcr.io/pause:3.4.1
3.4.1: Pulling from pause
fac425775c9d: Pull complete
Digest: sha256:6c3835cab3980f11b83277305d0d736051c32b17606f5ec59f1dda67c9ba3810
Status: Downloaded newer image for k8s.gcr.io/pause:3.4.1
k8s.gcr.io/pause:3.4.1
 
# docker image ls
REPOSITORY                                                 TAG           IMAGE ID       CREATED         SIZE
k8s.gcr.io/kube-proxy                                      v1.20.6       9a1ebfd8124d   12 days ago     118MB
k8s.gcr.io/kube-scheduler                                  v1.20.6       b93ab2ec4475   12 days ago     47.2MB
k8s.gcr.io/kube-controller-manager                         v1.20.6       560dd11d4550   12 days ago     116MB
k8s.gcr.io/kube-apiserver                                  v1.20.6       b05d611c1af9   12 days ago     122MB
k8s.gcr.io/pause                                           3.4.1         0f8457a4c2ec   3 months ago    683kB

Next up is to tag all the images so they’ll be hosted locally on the bldr0cuomrepo1.internal.pri server.

# docker tag k8s.gcr.io/kube-apiserver:v1.20.6          bldr0cuomrepo1.internal.pri:5000/kube-apiserver:v1.20.6
# docker tag k8s.gcr.io/kube-controller-manager:v1.20.6 bldr0cuomrepo1.internal.pri:5000/kube-controller-manager:v1.20.6
# docker tag k8s.gcr.io/kube-scheduler:v1.20.6          bldr0cuomrepo1.internal.pri:5000/kube-scheduler:v1.20.6
# docker tag k8s.gcr.io/kube-proxy:v1.20.6              bldr0cuomrepo1.internal.pri:5000/kube-proxy:v1.20.6
# docker tag k8s.gcr.io/pause:3.4.1                     bldr0cuomrepo1.internal.pri:5000/pause:3.4.1
 
# docker image ls
REPOSITORY                                                 TAG           IMAGE ID       CREATED         SIZE
bldr0cuomrepo1.internal.pri:5000/kube-proxy                v1.20.6       9a1ebfd8124d   12 days ago     118MB
k8s.gcr.io/kube-proxy                                      v1.20.6       9a1ebfd8124d   12 days ago     118MB
bldr0cuomrepo1.internal.pri:5000/kube-controller-manager   v1.20.6       560dd11d4550   12 days ago     116MB
k8s.gcr.io/kube-controller-manager                         v1.20.6       560dd11d4550   12 days ago     116MB
k8s.gcr.io/kube-scheduler                                  v1.20.6       b93ab2ec4475   12 days ago     47.2MB
bldr0cuomrepo1.internal.pri:5000/kube-scheduler            v1.20.6       b93ab2ec4475   12 days ago     47.2MB
k8s.gcr.io/kube-apiserver                                  v1.20.6       b05d611c1af9   12 days ago     122MB
bldr0cuomrepo1.internal.pri:5000/kube-apiserver            v1.20.6       b05d611c1af9   12 days ago     122MB
bldr0cuomrepo1.internal.pri:5000/pause                     3.4.1         0f8457a4c2ec   3 months ago    683kB
k8s.gcr.io/pause                                           3.4.1         0f8457a4c2ec   3 months ago    683kB

The final step is to push them all up to the local repository.

# docker push bldr0cuomrepo1.internal.pri:5000/kube-apiserver:v1.20.6
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/kube-apiserver]
d88bc16e0414: Pushed
a06ec64d2560: Pushed
28699c71935f: Pushed
v1.20.6: digest: sha256:d21627934fb7546255475a7ab4472ebc1ae7952cc7ee31509ee630376c3eea03 size: 949
 
# docker push bldr0cuomrepo1.internal.pri:5000/kube-controller-manager:v1.20.6
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/kube-controller-manager]
1387661b583c: Pushed
a06ec64d2560: Mounted from kube-apiserver
28699c71935f: Mounted from kube-apiserver
v1.20.6: digest: sha256:ca13f2bf278e3157d75fd08a369390b98f976c6af502d4579a9ab62b97248b5b size: 949
 
# docker push bldr0cuomrepo1.internal.pri:5000/kube-scheduler:v1.20.6
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/kube-scheduler]
f17938017a0a: Pushed
a06ec64d2560: Mounted from kube-controller-manager
28699c71935f: Mounted from kube-controller-manager
v1.20.6: digest: sha256:eee174e9eb4499f31bfb10d0350de87ea90431f949716cc4af1b5c899aab2058 size: 949
 
# docker push bldr0cuomrepo1.internal.pri:5000/kube-proxy:v1.20.6
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/kube-proxy]
0c96004b5be1: Pushed
94812b0f02ce: Pushed
3a90582021f9: Pushed
f6be8a0f65af: Pushed
2b046f2c8708: Pushed
6ee930b14c6f: Pushed
f00bc8568f7b: Pushed
v1.20.6: digest: sha256:1689b5ac14d4d6e202a6752573818ce952e0bd3359b6210707b8b2031fedaa4d size: 1786
 
# docker push bldr0cuomrepo1.internal.pri:5000/pause:3.4.1
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/pause]
915e8870f7d1: Pushed
3.4.1: digest: sha256:9ec1e780f5c0196af7b28f135ffc0533eddcb0a54a0ba8b32943303ce76fe70d size: 526

Software Preparations

This section describes the updates that need to be made to the various containers that are installed in the Kubernetes clusters. Most of the changes involve updating the image locations to point to my Docker repository instead of pulling directly from the internet.

You'll need to clone (if new) or pull the current playbooks repo from gitlab, as all the work will be done in various directories under the kubernetes/configurations directory. You'll want to do that before continuing. All subsequent sections assume you're in the kubernetes/configurations directory.

$ git clone git@lnmt1cuomgitlab.internal.pri:external-unix/playbooks.git
$ git pull git@lnmt1cuomgitlab.internal.pri:external-unix/playbooks.git

Make sure you add and commit the changes to your repo.

$ git add [file]
$ git commit [file] -m "commit comment"

And once done with all the updates, push the changes back up to gitlab.

$ git push

Update calico.yaml

In the calico directory run the following command to get the current calico.yaml file:

$ curl https://docs.projectcalico.org/manifests/calico.yaml -O

Basically, grep out the image lines and pull the new images so they can be tagged and pushed to the local repository.
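For example, to see which images the manifest references:

$ grep 'image:' calico.yaml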

# docker pull docker.io/calico/cni:v3.18.2
v3.18.2: Pulling from calico/cni
69606a78e084: Pull complete
85f85638f4b8: Pull complete
70ce15fa0c8a: Pull complete
Digest: sha256:664e1667fae09516a170ddd86e1a9c3bd021442f1e1c1fad19ce33d5b68bb58e
Status: Downloaded newer image for calico/cni:v3.18.2
docker.io/calico/cni:v3.18.2
 
# docker pull docker.io/calico/pod2daemon-flexvol:v3.18.2
v3.18.2: Pulling from calico/pod2daemon-flexvol
a5a0edbd6170: Pull complete
b10b71798d0d: Pull complete
5c3c4f282980: Pull complete
052e1842c6c3: Pull complete
6f392ce4dbcf: Pull complete
bc1f9a256ba0: Pull complete
fa4be31a19e9: Pull complete
Digest: sha256:7808a18ac025d3b154a9ddb7ca6439565d0af52a37e166cb1a14dcdb20caed67
Status: Downloaded newer image for calico/pod2daemon-flexvol:v3.18.2
docker.io/calico/pod2daemon-flexvol:v3.18.2
 
# docker pull docker.io/calico/node:v3.18.2
v3.18.2: Pulling from calico/node
2aee75817f4e: Pull complete
e1c64009c125: Pull complete
Digest: sha256:c598c6d5f43080f4696af03dd8784ad861b40c718ffbba5536b14dbf3b2349af
Status: Downloaded newer image for calico/node:v3.18.2
docker.io/calico/node:v3.18.2
 
# docker pull docker.io/calico/kube-controllers:v3.18.2
v3.18.2: Pulling from calico/kube-controllers
94ca07728981: Pull complete
c86a87d48320: Pull complete
f257a15e509c: Pull complete
8aad47abc588: Pull complete
Digest: sha256:ae544f188f2bd9d2fcd4b1f2b9a031c903ccaff8430737d6555833a81f4824d1
Status: Downloaded newer image for calico/kube-controllers:v3.18.2
docker.io/calico/kube-controllers:v3.18.2

Then tag the images for local storage.

# docker tag calico/cni:v3.18.2                bldr0cuomrepo1.internal.pri:5000/cni:v3.18.2
# docker tag calico/pod2daemon-flexvol:v3.18.2 bldr0cuomrepo1.internal.pri:5000/pod2daemon-flexvol:v3.18.2
# docker tag calico/node:v3.18.2               bldr0cuomrepo1.internal.pri:5000/node:v3.18.2
# docker tag calico/kube-controllers:v3.18.2   bldr0cuomrepo1.internal.pri:5000/kube-controllers:v3.18.2

Then push them up to the local repository.

# docker push bldr0cuomrepo1.internal.pri:5000/cni:v3.18.2
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/cni]
145c410196dc: Pushed
aec93328a278: Pushed
fd6f5b9d2ec9: Pushed
v3.18.2: digest: sha256:42ffea5056c9b61783423e16390869cdc16a8797eb9231cf7c747fe70371dfef size: 946
 
# docker push bldr0cuomrepo1.internal.pri:5000/pod2daemon-flexvol:v3.18.2
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/pod2daemon-flexvol]
125832445a60: Pushed
682e2fee7907: Pushed
12f496e83a60: Pushed
45acaaeabd00: Pushed
427dd33e9f20: Pushed
76ecd8aaf249: Pushed
63c82d5fed4a: Pushed
v3.18.2: digest: sha256:f243b72138e8e1d0e6399d000c03f38a052f54234f3d3b8a292f3c868a51ab07 size: 1788
 
# docker push bldr0cuomrepo1.internal.pri:5000/node:v3.18.2
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/node]
7c3bf8ac29b3: Pushed
534f69678b53: Pushed
v3.18.2: digest: sha256:d51436d6da50afc73d9de086aa03f7abd6938ecf2a838666a0e5ccb8dee25087 size: 737
 
# docker push bldr0cuomrepo1.internal.pri:5000/kube-controllers:v3.18.2
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/kube-controllers]
5d1855397d0b: Pushed
4769d3354700: Pushed
4ea3707886e0: Pushed
054ba5c2f771: Pushed
v3.18.2: digest: sha256:d8d2c4a98bbdbfd19fe2e4cc9492552852a9d11628e338142b1d1268b51593ce size: 1155

Edit the file, search for image:, and insert the local repository path in front of each image:

bldr0cuomrepo1.internal.pri:5000
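Since the images were tagged above without the calico/ prefix, one way to do the replacement in one shot is with sed. This assumes the manifest references the images as docker.io/calico/<name>, so double-check the result afterwards.

sed -i 's|docker.io/calico/|bldr0cuomrepo1.internal.pri:5000/|g' calico.yaml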

Make sure you follow the documentation to update calicoctl to 3.18.2 as well.

Update metrics-server

In the metrics-server directory, back up the existing components.yaml file and run the following command to get the current components.yaml file:

$ wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.3/components.yaml

Run a diff against the two files to see what might have changed. Then edit the file, search for image:, and replace k8s.gcr.io/metrics-server with bldr0cuomrepo1.internal.pri:5000 so the path matches the image tagged below.

Download the new image and save it locally.

# docker pull k8s.gcr.io/metrics-server/metrics-server:v0.4.3
v0.4.3: Pulling from metrics-server/metrics-server
5dea5ec2316d: Pull complete
ef7ee42a1880: Pull complete
Digest: sha256:eb6b6153494087bde59ceb14e68280f1fbdd17cfff2efc3a68e30a1adfa8807d
Status: Downloaded newer image for k8s.gcr.io/metrics-server/metrics-server:v0.4.3
k8s.gcr.io/metrics-server/metrics-server:v0.4.3

Tag the image.

# docker tag k8s.gcr.io/metrics-server/metrics-server:v0.4.3 bldr0cuomrepo1.internal.pri:5000/metrics-server:v0.4.3

And push the newly tagged image.

# docker push bldr0cuomrepo1.internal.pri:5000/metrics-server:v0.4.3
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/metrics-server]
abc161b95845: Pushed
417cb9b79ade: Pushed
v0.4.3: digest: sha256:2b6814cb0b058b753cb6cdfe906493a8128fabb03d405f60024a47ab49ddaa09 size: 739

Update kube-state-metrics

Updating kube-state-metrics is a bit more involved as there are several files that are part of the distribution; however, you only need a small subset. You’ll need to clone or pull the kube-state-metrics repo.

$ git clone https://github.com/kubernetes/kube-state-metrics.git

Once you have the repo, copy all of the files from the kube-state-metrics/examples/standard directory into the playbooks kube-state-metrics directory.

Edit the deployment.yaml file, search for image:, and replace quay.io with bldr0cuomrepo1.internal.pri:5000.
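As a sketch, assuming the repo was cloned into the current directory and the playbooks tree lives under /usr/local/admin/playbooks/cschelin/kubernetes/configurations (adjust the path to your layout):

$ PLAYBOOKS=/usr/local/admin/playbooks/cschelin/kubernetes/configurations
$ cp kube-state-metrics/examples/standard/*.yaml $PLAYBOOKS/kube-state-metrics/
$ sed -i 's|quay.io|bldr0cuomrepo1.internal.pri:5000|' $PLAYBOOKS/kube-state-metrics/deployment.yaml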

After you’ve updated the files, download the image:

# docker pull k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
v2.0.0: Pulling from kube-state-metrics/kube-state-metrics
5dea5ec2316d: Already exists
2c0aab77c223: Pull complete
Digest: sha256:eb2f41024a583e8795213726099c6f9432f2d64ab3754cc8ab8d00bdbc328910
Status: Downloaded newer image for k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0

Tag the image.

# docker tag k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0 bldr0cuomrepo1.internal.pri:5000/kube-state-metrics:v2.0.0

And push the newly tagged image.

# docker push bldr0cuomrepo1.internal.pri:5000/kube-state-metrics:v2.0.0
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/kube-state-metrics]
d2bc11882435: Pushed
417cb9b79ade: Mounted from metrics-server
v2.0.0: digest: sha256:ee13833414a49b0d2370e8edff5844eba96630cda80cfcd37c444bf88522cc51 size: 738

Update filebeat-kubernetes.yaml

In the filebeat directory, run the following command to get the current filebeat-kubernetes.yaml file:

curl -L -O https://raw.githubusercontent.com/elastic/beats/7.12/deploy/kubernetes/filebeat-kubernetes.yaml

Change all references in the filebeat-kubernetes.yaml file from kube-system to monitoring. If this is a new installation, create the monitoring namespace first.
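For example (a blunt sketch; review the result since this replaces every occurrence of the string, and the namespace only needs to be created once per cluster):

$ sed -i 's/kube-system/monitoring/g' filebeat-kubernetes.yaml
$ kubectl create namespace monitoring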

Update the local repository with the new docker image.

# docker pull docker.elastic.co/beats/filebeat:7.12.1
7.12.1: Pulling from beats/filebeat
a4f595742a5b: Pull complete
f7bc9401458a: Pull complete
ce7f9b59a9d3: Pull complete
e0ba09632c1a: Pull complete
3a0a0a9a5b5f: Pull complete
4f7abff72235: Pull complete
8cf479d85574: Pull complete
3b62c2ebd4b6: Pull complete
79a6ebf558dc: Pull complete
0c22790a6b07: Pull complete
dfd98a660972: Pull complete
Digest: sha256:e9558ca6e2df72a7933d4f175d85e8cf352da08bc32d97943bb844745d4a063a
Status: Downloaded newer image for docker.elastic.co/beats/filebeat:7.12.1
docker.elastic.co/beats/filebeat:7.12.1

Tag the image appropriately.

# docker tag docker.elastic.co/beats/filebeat:7.12.1 bldr0cuomrepo1.internal.pri:5000/filebeat:7.12.1

Finally, push it up to the local repository.

# docker push bldr0cuomrepo1.internal.pri:5000/filebeat:7.12.1
The push refers to repository [bldr0cuomrepo1.internal.pri:5000/filebeat]
446d15d628e2: Pushed
19bc11b9258e: Pushed
8ee55e79c98f: Pushed
851de8b3f92f: Pushed
eacdcb47588f: Pushed
bc27d098296e: Pushed
9c4f2da5ee8b: Pushed
2c278752a013: Pushed
bd82c7b8fd60: Pushed
f9b1f5eda8ab: Pushed
174f56854903: Pushed
7.12.1: digest: sha256:02a034166c71785f5c2d1787cc607994f68aa0521734d11da91f8fbd0cfdc640 size: 2616

Once the image is hosted locally, copy the file into each of the cluster directories and make the following changes.

DaemonSet Changes

The filebeat folder contains two files: a config file and an update file. These automatically apply changes to the filebeat-kubernetes.yaml file based on some of the edits described below. The changes below prepare the file for the script, which populates the different clusters with the correct information.

  • Replaces the docker.elastic.co/beats image prefix with bldr0cuomrepo1.internal.pri:5000.
  • Replaces <elasticsearch> with the actual ELK Master server name.
  • Replaces the kube-system namespace with monitoring. You’ll need to ensure the monitoring namespace has been created before applying this .yaml file.
  • Replaces DEPLOY_ENV with the expected deployment environment name: dev, sqa, staging, or prod. These names are used in the ELK cluster to easily identify where the logs are sourced.

In order for the script to work, change the values in the following lines to match:

        - name: ELASTICSEARCH_HOST
          value: "<elasticsearch>"
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: ""
        - name: ELASTICSEARCH_PASSWORD
          value: ""

In addition, remove the following lines. They confuse the container if they exist.

        - name: ELASTIC_CLOUD_ID
          value:
        - name: ELASTIC_CLOUD_AUTH
          value:

Add the default username and password to the following lines as noted:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME:elastic}
      password: ${ELASTICSEARCH_PASSWORD:changeme}
ConfigMap Changes

In the ConfigMap section, activate the filebeat.autodiscover section by uncommenting it and delete the filebeat.inputs configuration section. Then, in the filebeat.autodiscover section, make the following three changes:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      host: ${NODE_NAME}                          # rename node to host
      hints.enabled: true
      hints.default_config.enabled: false         # add this line
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
        exclude_lines: ["^\\s+[\\-`('.|_]"]  # drop asciiart lines  # add this line

In the processors section, remove the cloud.id and cloud.auth lines, add the following lines, and change DEPLOY_ENV to the environment filebeat is being deployed to: dev, sqa, staging, or prod. Indentation is important!

processors:
  - add_cloud_metadata:
  - add_host_metadata:
  - add_fields:                           # add these 4 lines. pay attention to indentation!
      target: ''
      fields:
        environment: 'DEPLOY_ENV'

Elastic Stack in Development

This Elastic Stack cluster is used by the Development Kubernetes clusters. Update the files in the bldr0-0 directory.

- name: ELASTICSEARCH_HOST
  value: bldr0cuomifem1.internal.pri
Elastic Stack in QA

This Elastic Stack cluster is used by the QA Kubernetes clusters. Update the files in the cabo0-0 directory.

- name: ELASTICSEARCH_HOST
  value: cabo0cuomifem1.internal.pri
Elastic Stack in Staging

This Elastic Stack cluster is used by the Staging Kubernetes clusters. Update the files in the tato0-1 directory.

- name: ELASTICSEARCH_HOST
  value: tato0cuomifem1.internal.pri
Elastic Stack in Production

This Elastic Stack cluster is used by the Production Kubernetes cluster. Update the file in the lnmt1-2 directory.

- name: ELASTICSEARCH_HOST
  value: lnmt1cuelkmstr1.internal.pri

Kubernetes Manual Upgrade to 1.20.6

Upgrading Kubernetes Clusters

This documentation provides the manual process for upgrading the server Operating Systems, upgrading Kubernetes to 1.20.6, and applying any additional updates. It includes example output and should help with troubleshooting should the automated processes experience a problem.

All of the steps required to prepare for an installation should be completed prior to starting this process.

Server and Kubernetes Upgrades

Patch Servers

As part of quarterly upgrades, the Operating Systems for all servers need to be upgraded.

For the control plane, there isn’t a “pool” so just patch each server and reboot it. Do one server at a time and check the status of the cluster before moving to subsequent master servers on the control plane.

For the worker nodes, you’ll need to drain each of the workers before patching and rebooting. Run the following command to confirm both that the current version is 1.19.6 and that all nodes are in a Ready state before patching:

$ kubectl get nodes
NAME                           STATUS   ROLES    AGE    VERSION
bldr0cuomknode1.internal.pri   Ready    <none>   214d   v1.19.6
bldr0cuomknode2.internal.pri   Ready    <none>   214d   v1.19.6
bldr0cuomknode3.internal.pri   Ready    <none>   214d   v1.19.6
bldr0cuomkube1.internal.pri    Ready    master   214d   v1.19.6
bldr0cuomkube2.internal.pri    Ready    master   214d   v1.19.6
bldr0cuomkube3.internal.pri    Ready    master   214d   v1.19.6

To drain a server, patch, and then return the server to the pool, follow the steps below.

kubectl drain [nodename] --delete-local-data --ignore-daemonsets

Then patch the server and reboot:

yum upgrade -y
shutdown -r -t 0 now

Finally bring the node back into the pool.

kubectl uncordon [nodename]
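For example, for the first worker node listed above, the full cycle looks like this (run the kubectl commands from wherever kubectl is configured and the yum/shutdown commands on the node itself):

kubectl drain bldr0cuomknode1.internal.pri --delete-local-data --ignore-daemonsets
yum upgrade -y
shutdown -r -t 0 now
kubectl uncordon bldr0cuomknode1.internal.pri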

Update Versionlock Information

Currently the clusters have locked kubernetes to version 1.19.6, kubernetes-cni to version 0.8.7, and docker to 1.13.1-203. The locks on each server need to be removed and new locks put in place for the new versions of kubernetes, kubernetes-cni, and docker where appropriate.

Versionlock file location: /etc/yum/pluginconf.d/

Simply delete the existing locks:

/usr/bin/yum versionlock delete "kubelet.*"
/usr/bin/yum versionlock delete "kubectl.*"
/usr/bin/yum versionlock delete "kubeadm.*"
/usr/bin/yum versionlock delete "kubernetes-cni.*"
/usr/bin/yum versionlock delete "docker.*"
/usr/bin/yum versionlock delete "docker-common.*"
/usr/bin/yum versionlock delete "docker-client.*"
/usr/bin/yum versionlock delete "docker-rhel-push-plugin.*"

And then add in the new locks at the desired levels:

/usr/bin/yum versionlock add "kubelet-1.20.6-0.*"
/usr/bin/yum versionlock add "kubectl-1.20.6-0.*"
/usr/bin/yum versionlock add "kubeadm-1.20.6-0.*"
/usr/bin/yum versionlock "docker-1.13.1-204.*"
/usr/bin/yum versionlock "docker-common-1.13.1-204.*"
/usr/bin/yum versionlock "docker-client-1.13.1-204.*"
/usr/bin/yum versionlock "docker-rhel-push-plugin-1.13.1-204.*"

Then install the updated kubernetes and docker binaries. Note that the versionlocked versions and the installed version must match:

/usr/bin/yum install kubelet-1.20.6-0.x86_64
/usr/bin/yum install kubectl-1.20.6-0.x86_64
/usr/bin/yum install kubeadm-1.20.6-0.x86_64
/usr/bin/yum install docker-1.13.1-204.git0be3e21.el7_8.x86_64
/usr/bin/yum install docker-common-1.13.1-204.git0be3e21.el7*
/usr/bin/yum install docker-client-1.13.1-204.git0be3e21.el7*
/usr/bin/yum install docker-rhel-push-plugin-1.13.1-204.git0be3e21.el7*
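To confirm the locks took and the installed packages line up, the versionlock plugin can list the active locks and rpm can report the installed versions:

/usr/bin/yum versionlock list
rpm -q kubelet kubectl kubeadm docker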

Upgrade Kubernetes

Using the kubeadm command on the first master server, you can review the plan and then upgrade the cluster:

# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.19.6
[upgrade/versions] kubeadm version: v1.20.6
I0427 17:46:38.139615   20479 version.go:254] remote version is much newer: v1.21.0; falling back to: stable-1.20
[upgrade/versions] Latest stable version: v1.20.6
[upgrade/versions] Latest stable version: v1.20.6
[upgrade/versions] Latest version in the v1.19 series: v1.19.10
[upgrade/versions] Latest version in the v1.19 series: v1.19.10
 
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
kubelet     6 x v1.19.6   v1.19.10
 
Upgrade to the latest version in the v1.19 series:
 
COMPONENT                 CURRENT    AVAILABLE
kube-apiserver            v1.19.6    v1.19.10
kube-controller-manager   v1.19.6    v1.19.10
kube-scheduler            v1.19.6    v1.19.10
kube-proxy                v1.19.6    v1.19.10
CoreDNS                   1.7.0      1.7.0
etcd                      3.4.13-0   3.4.13-0
 
You can now apply the upgrade by executing the following command:
 
kubeadm upgrade apply v1.19.10
 
_____________________________________________________________________
 
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
kubelet     6 x v1.19.6   v1.20.6
 
Upgrade to the latest stable version:
 
COMPONENT                 CURRENT    AVAILABLE
kube-apiserver            v1.19.6    v1.20.6
kube-controller-manager   v1.19.6    v1.20.6
kube-scheduler            v1.19.6    v1.20.6
kube-proxy                v1.19.6    v1.20.6
CoreDNS                   1.7.0      1.7.0
etcd                      3.4.13-0   3.4.13-0
 
You can now apply the upgrade by executing the following command:
 
kubeadm upgrade apply v1.20.6
 
_____________________________________________________________________
 
 
The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.
 
API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

There are likely newer versions of the Kubernetes control plane containers available. In order to maintain consistency across all clusters, only upgrade the masters to 1.20.6:

# kubeadm upgrade apply v1.20.6
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.20.6"
[upgrade/versions] Cluster version: v1.19.6
[upgrade/versions] kubeadm version: v1.20.6
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.20.6"...
Static pod: kube-apiserver-bldr0cuomkube1.internal.pri hash: 2742aa8dcdc3cb47ed265f67f1a04783
Static pod: kube-controller-manager-bldr0cuomkube1.internal.pri hash: dd7adc86b875b67ba03820b12d904fa9
Static pod: kube-scheduler-bldr0cuomkube1.internal.pri hash: 6a43bc71ab534486758c1d56bd907ea3
[upgrade/etcd] Upgrading to TLS for etcd
Static pod: etcd-bldr0cuomkube1.internal.pri hash: 7e320baf6cd06f441f462de7da1d6f05
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Renewing etcd-server certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-04-27-23-31-35/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: etcd-bldr0cuomkube1.internal.pri hash: 7e320baf6cd06f441f462de7da1d6f05
Static pod: etcd-bldr0cuomkube1.internal.pri hash: 7e320baf6cd06f441f462de7da1d6f05
Static pod: etcd-bldr0cuomkube1.internal.pri hash: 7e320baf6cd06f441f462de7da1d6f05
Static pod: etcd-bldr0cuomkube1.internal.pri hash: 7e320baf6cd06f441f462de7da1d6f05
...
[apiclient] Found 3 Pods for label selector component=etcd
[upgrade/staticpods] Component "etcd" upgraded successfully!
[upgrade/etcd] Waiting for etcd to become available
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests040252515"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-04-27-23-31-35/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-bldr0cuomkube1.internal.pri hash: 2742aa8dcdc3cb47ed265f67f1a04783
Static pod: kube-apiserver-bldr0cuomkube1.internal.pri hash: 7426ddce1aafd033ae049eefb6d56b1e
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-04-27-23-31-35/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-bldr0cuomkube1.internal.pri hash: dd7adc86b875b67ba03820b12d904fa9
Static pod: kube-controller-manager-bldr0cuomkube1.internal.pri hash: 281525a644d92747499c625139b84436
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-04-27-23-31-35/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-bldr0cuomkube1.internal.pri hash: 6a43bc71ab534486758c1d56bd907ea3
Static pod: kube-scheduler-bldr0cuomkube1.internal.pri hash: aa70347866b81f5866423fcccb0c6aca
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upgrade/postupgrade] Applying label node-role.kubernetes.io/control-plane='' to Nodes with label node-role.kubernetes.io/master='' (deprecated)
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
 
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.6". Enjoy!
 
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
Update Control Planes

On the second and third masters, run the kubeadm upgrade apply v1.20.6 command to upgrade their control plane components.
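A quick way to confirm each control plane component came back at the new version is to list the kube-system pod images (standard kubectl output formatting, nothing cluster-specific assumed):

$ kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image | grep kube-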

Update File and Directory Permissions

Verify the permissions match the table below once the upgrade is complete:

/etc/kubernetes/manifests/etcd.yaml                       root:root  0644
/etc/kubernetes/manifests/kube-apiserver.yaml             root:root  0644
/etc/kubernetes/manifests/kube-controller-manager.yaml    root:root  0644
/etc/kubernetes/manifests/kube-scheduler.yaml             root:root  0644
/var/lib/etcd                                             root:root  0700
/etc/kubernetes/admin.conf                                root:root  0644
/etc/kubernetes/scheduler.conf                            root:root  0644
/etc/kubernetes/controller-manager.conf                   root:root  0644
/etc/kubernetes/pki                                       root:root  0755
/etc/kubernetes/pki/ca.crt                                root:root  0644
/etc/kubernetes/pki/apiserver.crt                         root:root  0644
/etc/kubernetes/pki/apiserver-kubelet-client.crt          root:root  0644
/etc/kubernetes/pki/front-proxy-ca.crt                    root:root  0644
/etc/kubernetes/pki/front-proxy-client.crt                root:root  0644
/etc/kubernetes/pki/sa.pub                                root:root  0644
/etc/kubernetes/pki/ca.key                                root:root  0600
/etc/kubernetes/pki/apiserver.key                         root:root  0600
/etc/kubernetes/pki/apiserver-kubelet-client.key          root:root  0600
/etc/kubernetes/pki/front-proxy-ca.key                    root:root  0600
/etc/kubernetes/pki/front-proxy-client.key                root:root  0600
/etc/kubernetes/pki/sa.key                                root:root  0600
/etc/kubernetes/pki/etcd                                  root:root  0755
/etc/kubernetes/pki/etcd/ca.crt                           root:root  0644
/etc/kubernetes/pki/etcd/server.crt                       root:root  0644
/etc/kubernetes/pki/etcd/peer.crt                         root:root  0644
/etc/kubernetes/pki/etcd/healthcheck-client.crt           root:root  0644
/etc/kubernetes/pki/etcd/ca.key                           root:root  0600
/etc/kubernetes/pki/etcd/server.key                       root:root  0600
/etc/kubernetes/pki/etcd/peer.key                         root:root  0600
/etc/kubernetes/pki/etcd/healthcheck-client.key           root:root  0600

Update Manifests

During the kubeadm upgrade, the current control plane manifests are moved from /etc/kubernetes/manifests into /etc/kubernetes/tmp and new manifest files are deployed. There are multiple settings and permissions that need to be reviewed and updated before the task is considered complete.

The kubeadm-config configmap has been updated to point to bldr0cuomrepo1.internal.pri:5000; however, it and the various container configurations should be checked anyway. If it wasn’t updated or used, you’ll have to make the changes manually, including editing the kube-proxy daemonset configuration.

Note that when a manifest is updated, kubelet automatically restarts the associated pod with the new image, so there’s no need to manage the pods once the manifests are updated.

etcd Manifest

Verify and update etcd.yaml

  • Change imagePullPolicy to Always.
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

kube-apiserver Manifest

Verify and update kube-apiserver.yaml

  • Add the AlwaysPullImages and ResourceQuota admission controllers to the --enable-admission-plugins line.
  • Change imagePullPolicy to Always.
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

kube-controller-manager Manifest

Verify and update kube-controller-manager.yaml

  • Add "- --cluster-name=kubecluster-[site]" after "- --cluster-cidr=192.168.0.0/16".
  • Change imagePullPolicy to Always.
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

kube-scheduler Manifest

Verify and update kube-scheduler.yaml

  • Change imagePullPolicy to Always.
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

Update kube-proxy

Verify where the kube-proxy image is being loaded from. If it’s not the local repository, you’ll need to edit the kube-proxy daemonset to change the imagePullPolicy. Check the image tag at the same time.
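One way to check the current image source is a jsonpath query against the daemonset (standard kubectl, shown here only as an example):

$ kubectl get daemonset kube-proxy -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'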

kubectl edit daemonset kube-proxy -n kube-system

  • Change imagePullPolicy to Always.
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

Save the changes.

Update coredns

Verify where the coredns image is being loaded from. If it’s not the local repository, you’ll need to edit the coredns deployment to change the imagePullPolicy. Check the image tag at the same time.

kubectl edit deployment coredns -n kube-system

  • Change imagePullPolicy to Always.
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

Save the changes.

Restart kubelet

Once done, kubelet and docker need to be restarted on all nodes.

systemctl daemon-reload
systemctl restart kubelet
systemctl restart docker

Verify

Once kubelet has been restarted on all nodes, verify all nodes are at 1.20.6.

$ kubectl get nodes
NAME                           STATUS   ROLES                  AGE    VERSION
bldr0cuomknode1.internal.pri   Ready    <none>                 215d   v1.20.6
bldr0cuomknode2.internal.pri   Ready    <none>                 215d   v1.20.6
bldr0cuomknode3.internal.pri   Ready    <none>                 215d   v1.20.6
bldr0cuomkube1.internal.pri    Ready    control-plane,master   215d   v1.20.6
bldr0cuomkube2.internal.pri    Ready    control-plane,master   215d   v1.20.6
bldr0cuomkube3.internal.pri    Ready    control-plane,master   215d   v1.20.6

Configuration Upgrades

Configuration files are on the tool servers (lnmt1cuomtool11) in the /usr/local/admin/playbooks/cschelin/kubernetes/configurations directory and the expectation is you’ll be in that directory when directed to apply configurations.

Calico Upgrade

In the calico directory, run the following command:

$ kubectl apply -f calico.yaml
configmap/calico-config unchanged
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org configured
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers configured
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrole.rbac.authorization.k8s.io/calico-node unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-node unchanged
daemonset.apps/calico-node configured
serviceaccount/calico-node unchanged
deployment.apps/calico-kube-controllers configured
serviceaccount/calico-kube-controllers unchanged
poddisruptionbudget.policy/calico-kube-controllers unchanged

After calico.yaml is applied, the calico-kube-controllers pod will restart and then the calico-node pod restarts to retrieve the updated image.

Pull the calicoctl binary and copy it to /usr/local/bin, then verify the version. Note that this has likely already been done on the tool server, so check the installed version before pulling the binary.

$ curl -O -L  https://github.com/projectcalico/calicoctl/releases/download/v3.18.2/calicoctl
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
100   615  100   615    0     0    974      0 --:--:-- --:--:-- --:--:--   974
100 38.1M  100 38.1M    0     0  1505k      0  0:00:25  0:00:25 --:--:-- 1562k
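After the download, the binary still needs to be made executable and moved into place (assuming you have rights to write to /usr/local/bin):

$ chmod +x calicoctl
$ sudo mv calicoctl /usr/local/bin/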
Verification
$ calicoctl version
Client Version:    v3.18.2
Git commit:        528c5860
Cluster Version:   v3.18.2
Cluster Type:      k8s,bgp,kubeadm,kdd
Update CNI File Permissions

Verify the permissions of the files once the upgrade is complete.

/etc/cni/net.d/10-calico.conflist    root:root  644
/etc/cni/net.d/calico-kubeconfig     root:root  644
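A quick way to check them (GNU stat formatting, shown only as an example):

stat -c '%U:%G %a %n' /etc/cni/net.d/10-calico.conflist /etc/cni/net.d/calico-kubeconfig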

metrics-server Upgrade

In the metrics-server directory, run the following command:

$ kubectl apply -f components.yaml
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged

Once the metrics-server deployment is updated, the pod will restart.

kube-state-metrics Upgrade

In this case, we’ll be applying the entire directory, so from the configurations directory, run the following command:

$ kubectl apply -f kube-state-metrics/
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics configured
clusterrole.rbac.authorization.k8s.io/kube-state-metrics configured
deployment.apps/kube-state-metrics configured
serviceaccount/kube-state-metrics configured
service/kube-state-metrics configured

Once the kube-state-metrics deployment is updated, the pod will restart.

Filebeat Upgrade

Filebeat sends logs to Elastic Stack clusters in four environments, and Filebeat itself is installed on all of the Kubernetes clusters. Ensure you’re managing the correct cluster when upgrading the filebeat container, as the configurations are specific to each cluster.

Change to the appropriate cluster context directory and run the following command:

$ kubectl apply -f filebeat-kubernetes.yaml
configmap/filebeat-config configured
daemonset.apps/filebeat configured
clusterrolebinding.rbac.authorization.k8s.io/filebeat unchanged
clusterrole.rbac.authorization.k8s.io/filebeat configured
serviceaccount/filebeat unchanged
Verification

Essentially monitor each cluster. You should see the filebeat containers restarting and returning to a Running state.

$ kubectl get pods -n monitoring -o wide

Kubernetes Ansible Upgrade to 1.20.6

Upgrading Kubernetes Clusters

This document provides a guide to upgrading the Kubernetes clusters in the quickest manner. Much of the upgrade process can be done using Ansible Playbooks. There are a few processes that need to be done centrally on the tool server, and the OS and control plane updates are also partly manual due to the requirement to manually remove servers from the Kubernetes API pool.

In most cases, examples are not provided as it is assumed that you are familiar with the processes and can perform the updates without having to be reminded of how to verify.

For any process that is performed with an Ansible Playbook, it is assumed you are on the lnmt1cuomtool11 server in the /usr/local/admin/playbooks/cschelin/kubernetes directory. All Ansible related steps expect to start from that directory. In addition, the application of pod configurations will be in the configurations subdirectory.

Perform Upgrades

Patch Servers

In the 00-osupgrade directory, you’ll be running the master and worker scripts. I recommend opening two windows, one for master and one for worker, and running each script with master -t [tag] and worker -t [tag]. This will verify a node is Ready, drain the node from the pool if a worker, perform a yum upgrade and reboot, uncordon again if a worker, and verify the nodes are Ready again. Should a node fail to be ready in time, the script will exit.

Update Versionlock

In the 03-packages directory, run the update -t [tag] script. This will install yum-plugin-versionlock if missing, remove old versionlocks, create new versionlocks for kubernetes, kubernetes-cni, and docker, and then the components will be upgraded.

Upgrade Kubernetes

Using the kubeadm command on the first master server, upgrade the first master server.

# kubeadm upgrade apply v1.20.6
Update Control Planes

On the second and third masters, run the kubeadm upgrade apply v1.20.6 command to upgrade their control plane components.

Update kube-proxy

Check the kube-proxy daemonset and update the image tag if required.

kubectl edit daemonset kube-proxy -n kube-system
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

Save the changes.

Update coredns

Check the coredns deployment and update the image tag if required.

kubectl edit deployment coredns -n kube-system
  • Change the image line, replacing k8s.gcr.io with bldr0cuomrepo1.internal.pri:5000.

Save the changes.

Restart kubelet and docker

In the 04-kubelet directory, run the update -t [tag] script. This will restart kubelet and docker on all servers.

Calico Upgrade

In the configurations/calico directory, run the following command:

$ kubectl apply -f calico.yaml

calicoctl Upgrade

Pull the updated calicoctl binary and copy it to /usr/local/bin. It’s likely already there but verify.

$ curl -O -L  https://github.com/projectcalico/calicoctl/releases/download/v3.18.2/calicoctl

kube-state-metrics Upgrade

From the configurations directory, apply the entire kube-state-metrics directory with the following command:

$ kubectl apply -f kube-state-metrics/

metrics-server Upgrade

In the configurations/metrics-server directory, run the following command:

$ kubectl apply -f components.yaml

Filebeat Upgrade

In the configurations directory, change to the appropriate cluster context directory (bldr0-0, cabo0-0, tato0-1, and lnmt1-2) and run the following command:

$ kubectl apply -f filebeat-kubernetes.yaml

Update File and Directory Permissions and Manifests

In the postinstall directory, run the update -s [site] script. This will perform the following steps.

  • Add the cluster-name to the kube-controller-manager.yaml file
  • Update the imagePullPolicy and image lines in all manifests
  • Add the AlwaysPullImages and ResourceQuota admission controllers to the kube-apiserver.yaml file
  • Update the permissions of all files and directories.

Ansible Tags – A Story

Started a new job back in October. The team is just me and another guy and the boss. And the other guy quit in December.

The real good thing is it’s a small single project shop and pretty much all the server work is done with Ansible so lots of playbooks. Of course the bad thing is it’s just me so I’m dissecting the playbooks to see what the previous folks did and why.

One of the things I noticed is the use of tags. There are tags defined in several places, but only in the calling playbooks, and apparently they’re not used when running the playbooks or in the roles. They’re not mentioned in any documentation (what little there is), and the playbooks themselves don’t seem to need the tags.

I pulled up the Ansible docs on tags, checked a couple of youtube videos and an O’Reilly book and really didn’t see a need for Tags. Anything large enough where Tags might be useful probably should be broken down into smaller tasks anyway.

Then the boss made a request: we’re changing the IPs behind the load balancer as well as the load balancer IP itself, and he’d like it done via Ansible.

My first attempt was a task with a list of old IPs and a second task with a list of the new IPs. Use with_items and go. Added a backout task in case there was a problem that just reversed the lists.

Boss updated the request. We bring down Side A first, test to make sure it’s good, then Side B. A sequential list of tasks vs just delete and add. Okay, let’s see…

Started creating a bunch of little playbooks in part because of a manual check between changes.

  • Remove Side A from the Load Balancer
  • Remove the old IP from Side A
  • Add the new IP to Side A
  • Validate
  • Add Side A back to the Load Balancer
  • Remove Side B from the Load Balancer
  • Remove the old IP from Side B
  • Add the new IP to Side B
  • Validate
  • Add Side B back to the Load Balancer
  • Validate

So three playbooks. Well, let’s not forget creating similar playbooks to back out the change in case Validate == Failed. So three more playbooks. Plus a couple of edge cases. For example, if Side A is fine but there’s some network issue with Side B, backing out Side B might mean three of the backout tasks can be run but we’d want to leave the new Side A in the Load Balancer.

That’s a lot of playbooks.

Hey, Tags! Create one Update playbook and tag the tasks appropriately. Then a second Backout playbook and tag those tasks. Then run the Update playbook with --tags delsidealb,delsidea,addsidea.

So tags aren’t necessarily just for a long playbook; they’re also useful for a bunch of simple tasks that need backouts and manual verifications.

Well, I thought it was cool 🙂 Learning new things is always fun and I thought I’d share.


Ansible Tags

Overview

Simply enough, Ansible Tags let you run specific tasks in a play. If you have a lengthy playbook or are testing tasks within a playbook, you can assign tags to tasks that let you run a specific task vs the entire playbook.

This is simply a summary of the uses of Ansible Tags. More of a cheat sheet than trying to instruct you in how to use Ansible Tags. The Ansible Tags Documentation is fairly short and does a good job explaining how to use Ansible Tags.

Uses

Examples

$ ansible-playbook -i inventory dns-update.yaml --tags bind9               # only run tasks tagged with bind9
$ ansible-playbook -i inventory dns-update.yaml --skip-tags bind9          # run all tasks except the ones tagged with bind9
$ ansible-playbook -i inventory dns-update.yaml --tags "bind9,restart"     # run tasks tagged with bind9 and restart
$ ansible-playbook -i inventory dns-update.yaml --tags untagged            # only run untagged tasks
$ ansible-playbook -i inventory dns-update.yaml --tags tagged              # only run tagged tasks
$ ansible-playbook -i inventory dns-update.yaml --tags all                 # run all tasks (default)

You can assign a tag to one or more tasks.

Tasks can have multiple tags.

When you create a block of tasks, you can assign a tag to that block and all tasks within the block are run when the tag is used.

An interesting idea might be to add a debug tag to all the debug statements in your playbooks and then, when ready to run live, pass the --skip-tags debug flag to the playbook. Then only the non-debug tasks are executed.
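For example, reusing the hypothetical dns-update.yaml playbook from the examples above:

$ ansible-playbook -i inventory dns-update.yaml --skip-tags debug          # run everything except debug tasks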

Special Tags

If you assign an always tag to a task, it will always run no matter what the passed --tags value is, unless you specifically pass --skip-tags always.

If you assign a never tag to a task, it will not run unless you call it out specifically, for example by calling the playbook with --tags all,never.

Tag Inheritance

There are two types of statements that add tasks: the dynamic include_role, include_tasks, and include_vars, and the static import_role and import_tasks.

If you tag a task that contains an include_role or include_tasks function, only tasks within that included file that are similarly tagged will run when the tag is passed.

If you tag a task that contains an import_role or import_tasks function, all tasks within that imported file will be run when the tag is passed.

Listing Tags

Using the --list-tags option to ansible-playbook lists all the tags and exits without running anything.
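For example:

$ ansible-playbook -i inventory dns-update.yaml --list-tags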

References

There are several sites that provide information on tags, but the obvious one is the Ansible Documentation.
