Disaster Recovery – Wave Four

Overview

This is the fourth post on restoring my environment after moving from VMware to Proxmox.

Inventory

My main goal is to get the inventory system up and processing. This includes the nightly scripts that the service account runs plus the additional scripts that root runs.

The Inventory generates the hosts file used by Ansible to run playbooks. While I have a hosts file of course, it’s a step forward to get this working.

The Inventory itself has had several updates which are in git but haven’t been deployed to the central Inventory server. As such I’ll need to git pull the updated files into a different directory and then do a comparison with the existing one.
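A minimal sketch of that comparison, assuming GNU diff and awk are available. The repository URL and both directory paths in the usage lines are placeholders, not the ones used on my servers.

```shell
#!/usr/bin/env bash
# Sketch: compare a fresh checkout against the deployed inventory tree.
# Every path and URL below is a hypothetical placeholder.

# List files that exist only in the new checkout (candidates to copy over).
only_in_new() {
  local new_dir="$1" live_dir="$2"
  LC_ALL=C diff -rq --exclude=.git "$new_dir" "$live_dir" |
    awk -v p="Only in $new_dir" 'index($0, p) == 1 {
      c = substr($0, length(p) + 1, 1)
      if (c == ":" || c == "/") print
    }'
}

# List files present in both trees whose contents differ (candidates to review).
changed_files() {
  local new_dir="$1" live_dir="$2"
  LC_ALL=C diff -rq --exclude=.git "$new_dir" "$live_dir" | grep '^Files ' || true
}

# Example usage (commented out so sourcing this file changes nothing):
# git clone https://gitlab.example.internal/inventory.git /tmp/inventory.new
# only_in_new   /tmp/inventory.new /var/www/html/inventory
# changed_files /tmp/inventory.new /var/www/html/inventory
```

Keeping the two lists separate matches the order of work: copy the missing files first, then review the changed ones by hand.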

The updates were pretty easy. The first script was a comparison to find missing scripts. Those were identified and copied into the appropriate directories. Next was a comparison between existing scripts. The differences were mostly where I had extracted dialog boxes that appeared twice into a single file that is included in both places. Third were the database updates: new tables and a few columns added to existing tables.

Once done, the new inventory was up and running without issues.

Regular Scripts

Once the Inventory is up, I can then install the script library across all systems. This does a lot of setup for various tasks and basically ensures all are running as expected.

Scripts are installed via Ansible. With the restored directory, I have the existing playbooks. I had to make a couple of updates and add a few playbooks. I expect I’ll be adding more as I move on.

Config 2 HTML

This is a program that was created years ago which essentially captures a system configuration backup which is then retrieved and placed on the web server where it can be viewed.

Cfg2HTML is installed via Ansible. This includes the cron job which generates the data files.

DNS

As I worked forward, I found DNS wasn’t working, forwarding specifically. It’s working within each environment but not forwarding. I’ve done a lot of research to determine the problem but have run into a wall. I have some deep diving to attempt but in the meantime, with Ansible I can install a common hosts file across all machines.
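As a rough illustration of what goes into a common hosts file, the sketch below renders one entry from an address, a short name, and a domain. The function name, address, and host name are made up for the example; the real file is built from the Inventory.

```shell
#!/usr/bin/env bash
# Sketch: render one /etc/hosts entry as "IP<TAB>FQDN<TAB>shortname".
# hosts_line is a hypothetical helper, not part of the Inventory scripts.
hosts_line() {
  local ip="$1" short="$2" domain="$3"
  printf '%s\t%s.%s\t%s\n' "$ip" "$short" "$domain" "$short"
}

# Example usage with invented values:
# hosts_line 192.168.104.10 exampleweb1 internal.pri
```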

I did make a change where the old production environment was prod.internal.pri and I removed the prod from the configuration and all the servers.

So this is a TODO task!

At this point, we can move into the next post!

Posted in Computers

Disaster Recovery – Wave Three

Overview

This is the third wave. The initial article made references to different waves but as I progress, the waves show what the priorities are for getting the environment back up.

For example, I spent a lot of time getting the Inventory back up since that’s where all my inventory hosts files are located for Ansible. This let me add appropriate tags and rebuild the hosts files so I wasn’t doing any steps manually.

Gitlab

Next up is to get Gitlab Enterprise Edition installed. I did perform a backup of my old system however I realized it was on Gitlab EE 15 and current is Gitlab EE 18. When I tried to locate the v15 software, it’s so old that it’s unavailable.

Realize that the whole point of git itself is everything involved in the project is on everyone’s computer. So I made the decision to just install the current version of Gitlab EE v18, and simply recreate all the Projects.

There are some 45 Projects spread across four systems but again, they’re all complete.

What’s the loss then? Mainly the Pull Request history itself (Gitlab calls these Merge Requests). When you commit a change to your code, you probably created a short log entry. The Pull Request is handled by the overall tool, Gitlab in this case. You can provide a more detailed description of the change, and you can see a graphical representation of the branches and who approved the Pull Request. There are other parts in a more production-like environment that are also missed.

In my case though, it’s just me learning how CI/CD works, and the effort to locate the v15 software and then perform upgrades is a bit unnecessary. So I created all the git projects I’d been working on and loaded them into Gitlab.

Next up is getting Gitlab Runners in place and installing Jenkins and the two Jenkins agents. Then the actual CI/CD process configuration.

Jenkins

Well shoot, restoring Jenkins worked perfectly. I followed the instructions over on the Jenkins.io site.

Add the repository. Install fontconfig and java-21-openjdk. Install Jenkins. Restore Jenkins home directory in /var/lib/jenkins. Fix the firewall. Start Jenkins. Access Website.
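The steps above can be sketched as a script for a Rocky or RHEL family host. Treat it as a sketch under assumptions: the repository and key URLs are the ones I believe jenkins.io publishes for the stable Red Hat packages (check the site for the current key), the backup archive path and its tar format are placeholders, and port 8080 is the Jenkins default. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the Jenkins restore steps. DRY_RUN=1 (the default) prints each
# command instead of running it; set DRY_RUN=0 and run as root to execute.

# Assumed URLs; verify against the jenkins.io install page before use.
JENKINS_REPO_URL="${JENKINS_REPO_URL:-https://pkg.jenkins.io/redhat-stable/jenkins.repo}"
JENKINS_KEY_URL="${JENKINS_KEY_URL:-https://pkg.jenkins.io/redhat-stable/jenkins.io-2023.key}"

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_jenkins() {
  # Placeholder path; the archive is assumed to hold the contents of /var/lib/jenkins.
  local backup="${1:-/path/to/jenkins-home-backup.tar.gz}"
  run curl -fsSL -o /etc/yum.repos.d/jenkins.repo "$JENKINS_REPO_URL"
  run rpm --import "$JENKINS_KEY_URL"
  run dnf install -y fontconfig java-21-openjdk
  run dnf install -y jenkins
  run tar -xzf "$backup" -C /var/lib/jenkins
  run chown -R jenkins:jenkins /var/lib/jenkins
  run firewall-cmd --permanent --add-port=8080/tcp
  run firewall-cmd --reload
  run systemctl enable --now jenkins
}

# Example: print the plan without changing anything.
# install_jenkins /opt/backups/jenkins-home.tar.gz
```

The chown step is there because a restore run as root leaves the files owned by root.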

Since I was running a version 300 Jenkins and this is a version 500, there are a bunch of deprecated plugins and other plugins that needed to be updated. I really do need to pop out and clean up the plugins. Another day.

Anyway, it just works! Now to do the two agent servers.

Jenkins Agents

These were actually pretty easy. Create the Jenkins account, home directory in /var/lib/jenkins. Make sure the directory is properly owned. Restore the backed up Jenkins home directories. And done!

The main thing I needed to do was to make sure the Jenkins Controller had ssh access to the two Jenkins Agents. Since it was a restore, all the information was already there.

The problem though was the home directories weren’t owned by jenkins:jenkins, they were owned by root:root. So the Agents weren’t connecting. I deleted and then added in new keys but it still didn’t work. Docs were talking about agent.jar but I had remoting.jar. I tried renaming it but got a permission denied error. A puzzle until I realized the jenkins account home directory was owned by root:root and changed the ownership to jenkins:jenkins.
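A quick ownership check would have saved me some time. This is a small sketch that assumes GNU stat (as shipped on Rocky Linux); the helper names are my own.

```shell
#!/usr/bin/env bash
# Sketch: confirm a directory is owned by the expected user:group.

# Print "user:group" for a path (GNU stat syntax).
owner_of() {
  stat -c '%U:%G' "$1"
}

# Return 0 when ownership matches, otherwise print a hint and return 1.
check_owner() {
  local path="$1" expected="$2" actual
  actual="$(owner_of "$path")"
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  echo "$path is owned by $actual, expected $expected; try: chown -R $expected $path"
  return 1
}

# Example usage on an agent:
# check_owner /var/lib/jenkins jenkins:jenkins
```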

Next up, connect Jenkins with the brand new Gitlab server.

SSH Keys

Since this is a brand new Gitlab server, some of the original settings are missing. Plus Jenkins is an actual restore so the credentials are configured and associated with the various Jenkins jobs.

But the process is simple enough. Instead of a personal or project token, I simply log into the Jenkins account on the Controller, change to the .ssh directory, and capture the id_rsa.pub key information. This gets added to my account’s list of SSH keys.

And since I recreated the keys, I updated the credentials in Jenkins I was already using with the new id_rsa private key, and the Jenkins jobs were able to successfully access gitlab.

Remote Server

One of the things I need to do is enable various systems such as the Jenkins account to access the remote server. But because the remote server is still rather old, I’m getting the following error message when attempting to connect:

Unable to negotiate with [remote server] port 22: no matching host key type found. Their offer: ssh-rsa,ssh-dss

On the Windows system, I can add the following to my .ssh/config file:

Host [hostname]
        user=[account]
        HostKeyAlgorithms +ssh-rsa
        PubkeyAcceptedAlgorithms +ssh-rsa

While this works with Windows (and my Mac laptop as well), on my Linux servers, I’m getting the following error:

ssh_dispatch_run_fatal: Connection to [remote server] port 22: error in libcrypto

This is related to the crypto policies which are located in the /etc/crypto-policies directory. The initial configuration in the config file is set to DEFAULT. Check the /usr/share/crypto-policies directory for a list of options:

$ ls -al /usr/share/crypto-policies
total 28
drwxr-xr-x.   9 root root  152 Jan 26 03:00 .
drwxr-xr-x. 125 root root 4096 Feb 13 21:01 ..
drwxr-xr-x.   6 root root   61 Oct 31 04:20 back-ends
drwxr-xr-x.   2 root root 4096 Jan 26 03:00 DEFAULT
-rw-r--r--.   1 root root  680 Sep  5  2025 default-config
drwxr-xr-x.   2 root root 4096 Jan 26 03:00 FIPS
drwxr-xr-x.   2 root root 4096 Jan 26 03:00 FUTURE
drwxr-xr-x.   2 root root 4096 Jan 26 03:00 LEGACY
drwxr-xr-x.   3 root root  109 Jan 26 03:00 policies
drwxr-xr-x.   5 root root  136 Jan 26 03:04 python
-rw-r--r--.   1 root root  167 Oct 31 04:20 reload-cmds.sh

You can use the update-crypto-policies command to modify the configuration, but that changes it for every application on the server. The DEFAULT:SHA1 subpolicy re-enables SHA-1 signatures, which the DEFAULT policy disables. Since this is an internal environment, setting it isn’t critical, but be aware of the hazard:

# update-crypto-policies --set DEFAULT:SHA1
Setting system policy to DEFAULT:SHA1
Note: System-wide crypto policies are applied on application start-up.
It is recommended to restart the system for the change of policies
to fully take place.

While it does say to reboot, the command does reload all pertinent applications. Rebooting just ensures a clean loading of the policies. Check the /etc/crypto-policies/config file and the /etc/crypto-policies/state/current file to verify the change.
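The verification can be sketched as below. The helper reads the policy name from the config file while skipping comments and blank lines; its name is my own, and the file path defaults to the system location mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: show which crypto policy is configured.

# Print the first non-comment, non-blank line of the config file.
configured_policy() {
  local config="${1:-/etc/crypto-policies/config}"
  grep -Ev '^[[:space:]]*(#|$)' "$config" | head -n 1
}

# Example usage on the server:
# update-crypto-policies --show          # policy the tool reports
# configured_policy                      # policy named in the config file
# cat /etc/crypto-policies/state/current # policy last applied
```

All three should report DEFAULT:SHA1 after the change.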

Posted in ansible, Computers, Git, gitlab

Disaster Recovery – Wave Two

Overview

This is the second part in a series of posts where I’m restoring my homelab environment. My goal is to automate as much as I can and make notes where it’s not working. This will consist of sections for each bit I’m installing.

I may repeat information as I’ll need to check my previous post.

MariaDB

In order to get the general stuff working, I need to install the web servers. This requires PHP, HTTP, and MARIADB packages and associated packages.

For MariaDB, I did some hunting and found a set of playbooks however they fail on the initialization tasks. The software is installed and started but when it tries to emulate the steps from the mysql_secure_installation program, it fails with a password error.

Eventually I simply ran the program by hand. I’ll have to revisit this.

PHP

This one was pretty simple. I just needed the PHP and PHP-MYSQLND packages. They installed cleanly with no errors.

HTTPD

This one also installed easily enough. We’ll need to follow up with some configuration work though in order to access the Inventory applications as they’re under the inventory DNS name for each site.

Import

Next up is to load up the websites. First follow any instructions such as creating the appropriate accounts and creating the database. For the Inventory:

mysql --user=root -p
Password:
create database inventory;
CREATE USER 'invadmin'@'localhost' IDENTIFIED BY '[password]';
GRANT ALL PRIVILEGES ON inventory.* TO 'invadmin'@'localhost';
FLUSH PRIVILEGES;

Set the appropriate password of course.

Database backups are in /opt/mysqlbackups. Simply uncompress the most recent full backup, drop into the directory then into the directory where the actual data is and import the SQL and data.

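cd /opt/mysqlbackups
tar xzvf mysql.20260115.tar.gz
cd mysql.20260115/inventory
# Load the table definitions first; each run prompts for the root password.
for IMPORT in *.sql
do
  echo "${IMPORT}"
  mysql --user=root -p inventory < "${IMPORT}"
done
# Then load the data files; mysqlimport takes the table name from the file name.
for IMPORT in *.txt
do
  mysqlimport --user=root -p inventory "$(pwd)/${IMPORT}"
done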

One of the changes was removing prod from all the production hostnames. This required a minor update to the settings.php file in the inventory to remove prod from the credential blocks. Once that was done, I was able to log in and view all the servers!

At this point, I can start adding the cronjobs back in to manage the inventory and ansible playbook inventories.

Next up is to hit the other tool servers and get them going plus any non-tool servers such as the wiki server.

Posted in Computers

Disaster Recovery – Wave One

Overview

If you saw my previous article on Disaster Recovery, you know what Wave One is. Mainly it’s the steps needed to be able to start running the Ansible Playbooks. This blog post lists all the steps I need to do to get going, then what additional work I needed to do to be ready to start configurations.

Build Templates

As this is a Proxmox environment, I need to set up templates for use when building systems. I downloaded the following Operating System Installation Media (ISOs):

  • Rocky Linux 9.3
  • OpenSUSE Linux 16.0
  • Ubuntu Linux 24.04.2
  • OpenBSD Install 68
  • FreeBSD 12.1
  • Solaris 10

Of these, I created two templates to start. The other ISOs will be modified and made available for testing. I will note that these are all systems I’ve used in the past in one workplace or another or for personal projects.

I generally set up the file systems out of habit. The default Hard Disk is 50 Gigabytes. I set for 2 CPU cores and 2 Gigabytes of RAM. Most of what I’m poking at doesn’t require much more and I plan on having any additional installations go onto their own drive vs extending the original boot drive.

File System Layout

  • Boot – Location of the kernel and associated files. 2 Gigabytes
  • Root – Root file system. 4 Gigabytes
  • Usr – Utilities. 4 Gigabytes
  • Home – Home Directories. 8 Gigabytes
  • Opt – Optional Applications. 4 Gigabytes
  • Var – Logs and application settings. 8 Gigabytes
  • Swap – 4 Gigabytes

Service Account

For my systems, the service account is ‘unixsvc’. The scripts all expect it to have a 5000 UID and be a member of the sysadmin group.
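A small sketch for creating and checking the account. The groupadd and useradd lines are commented out because they need root; the check helper is my own naming and only reads account data.

```shell
#!/usr/bin/env bash
# Sketch: verify an account has the expected UID and group membership.

# Return 0 when the user exists with the given UID and belongs to the group.
account_ok() {
  local user="$1" uid="$2" group="$3"
  [ "$(id -u "$user" 2>/dev/null)" = "$uid" ] || return 1
  id -nG "$user" | tr ' ' '\n' | grep -qx "$group"
}

# Creating the account (run as root):
# groupadd sysadmin
# useradd -u 5000 -G sysadmin -m unixsvc
#
# Checking it afterwards:
# account_ok unixsvc 5000 sysadmin && echo "unixsvc looks right"
```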

Initialization

I have a backup of the old VMware based VMs. The bad part is I didn’t back up home directories and didn’t have the cron jobs used to manage systems documented. I was able to pull the old VMs off of the old VMware ESXi servers so I recovered most of what I needed.

Once I got pfSense running and properly configured, I set up all the VMs. I have multiple environments, VLAN101 – VLAN105. Since I’m using the pfSense firewall and system level firewalls, I disabled the firewall setting for every network interface for all the systems. In addition, I needed to add the VLAN tag for each network for each network interface. Once that was done and all systems accessible, it was on to the next step.

Setup

As the unixsvc account, I logged into the console for every VM and ran the following commands:

  • hostnamectl set-hostname [hostname].[domain]
  • nmcli con mod ens18 ipv4.method manual ipv4.addresses 192.168.[net].[ip]/24 ipv4.gateway 192.168.[net].254 ipv4.dns 8.8.8.8,192.168.[net].154 ipv4.dns-search '[domain],schelin.org'

Then I rebooted to set all the information. Then since everything was on Rocky 9.3, I ran a dnf upgrade -y on every system.

Restoration

For the unixsvc account, I restored the backups of that account to every system where I felt I needed a backup. Then from the tool servers, I added the IPs and hostnames with domain to the /etc/hosts file. With that, I logged into every system, created the .ssh directory, set it to 0700, then copied the new id_rsa.pub file into the .ssh directory as authorized_keys. This gave the unixsvc account access to all systems in its domain.
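The key installation can be sketched as a small function. It overwrites authorized_keys as described above; the 0600 mode on that file is my assumption of a sensible default, and the paths in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: install a public key as the only authorized key for an account.

install_pubkey() {
  local home_dir="$1" pubkey="$2"
  mkdir -p "$home_dir/.ssh"
  chmod 0700 "$home_dir/.ssh"
  # Copy (not append): any existing authorized_keys is replaced.
  cp "$pubkey" "$home_dir/.ssh/authorized_keys"
  chmod 0600 "$home_dir/.ssh/authorized_keys"
}

# Example usage as unixsvc on a target system:
# install_pubkey "$HOME" /tmp/id_rsa.pub
```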

As I gradually restored directories, I copied backed up files and started getting things going again. I still needed to run the initialize ansible playbook and the unixsuite ansible playbook to get every system under control and reporting. After that, we’re just setting up systems.

Ansible Playbooks

On the tool servers, before we can run any ansible playbooks, ansible has to be installed.

In order for it to be installed, the epel-release package needs to be installed. Once that’s installed, activate the CodeReady Builder (CRB) repository then install Ansible. You might also install Python3.

  • dnf install epel-release -y
  • crb enable
  • dnf install ansible -y

Initialization

The first script to run is the initialization.yaml file. I did have a few errors in my script as my servers are now a lot newer. Plus the root systems are all 192.168.5.0/24 now so I had to update a few files where the configuration was still 192.168.1.0/24.

It took a bit of searching on the error before something clicked. The installation had bailed because I hadn’t used nmcli on every system; I had just manually updated the /etc/resolv.conf file because the DNS servers weren’t up yet. The second error was that the mailx package has been replaced by a different package, s-nail, which provides mailx. Unfortunately I started getting the following error:

fatal: [bldr0cuomknode2]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"ansible.legacy.setup": {"failed": true, "module_stderr": "Shared connection to bldr0cuomknode2 closed.\r\n", "module_stdout": "\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}}, "msg": "The following modules failed to execute: ansible.legacy.setup\n"}

It took some searching but one of the installations was an updated sudoers file on the Red Hat side. I hadn’t created the group or added the users to the group first, so the error was that my service account couldn’t become root as it was no longer in the sudoers file. I had to go fix the servers before I could continue.

One of the more amusing aspects is that I replace the sudoers file and, in the meantime, add the unixsvc account to the new sysadmin group. Unless I log out as unixsvc on the tool server, subsequent runs fail because group changes only take effect when you log out and back in. Once done, rerunning the playbook works fine.

DNS

One of the things I needed to do was get my git server up and running. Since I’m having to make changes to the configurations, the 192.168.1.0/24 to 192.168.5.0/24 for example, I wanted to make sure my repos were up to date.

First though I had to create a git ansible playbook 🙂 There are four servers that need git installed, at least initially. I created it on the tool server and then copied it over to my installation repository. Ran it and the bldr0cuomgit1 server is ready for updates. I added the git playbooks and on to DNS.

The main DNS issue was originally my production environment was prod.internal.pri. I’d modified it but left the template with the prod designation. Since I’m trying to rebuild from scratch, I had to go in and add a check to create the named configurations and zone files properly. For production, just internal.pri, but for the other zones, dev.internal.pri and so on.
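The naming rule the template check implements can be illustrated in a few lines of shell. This is only a sketch of the logic, not the jinja2 template itself, and the environment label prod is my assumption for how production is tagged.

```shell
#!/usr/bin/env bash
# Sketch: map an environment label to its DNS zone name.
# Production gets the bare domain; every other environment gets a prefix.
zone_for() {
  case "$1" in
    prod) echo "internal.pri" ;;
    *)    echo "$1.internal.pri" ;;
  esac
}

# Example usage:
# for env in prod dev qa stage; do zone_for "$env"; done
```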

And as always, errors creep in. The annoying thing about jinja2 is there isn’t an easy way to determine what the errors are. After a bunch of troubleshooting, I finally just backed up the file and copied the one from the git server and gradually updated the template. I reran the ansible-playbook command each time until it matched what I was trying to do, then did a diff against the backed up file and found the errors, in both places! After sorting them out, I was able to get the production DNS servers up along with the dev and qa DNS servers.

Stage wasn’t working though. No hosts found. Looks like my automatically generated inventory didn’t have the stage dns servers tagged. Quick update and stage is working.

And now home isn’t working. Missing name. It took a similar approach, deleting blocks of code until I found the errors. I’d created CNAME entries but it was a dictionary and not designated entries so I changed dv.name to just dv and it built the master zone file, however it failed to start the server. Quick grep of the messages file and I found the error. I have CNAMEs for my development and tool server. Dev has the dev websites and Tool has the finished builds for testing. Unfortunately it’s the same name duplicated so I blocked out the Tool side for now in order to get DNS running and I’ll dig into it later.

There were a few other minor issues with the named.conf files. I had a forwarder block for internal.pri and changing prod.internal.pri to internal.pri had conflicts. Since I have a default forwarder, the internal.pri forwarder block was unnecessary so I removed it and reinstalled.

Finally I copied all the changed files to my git server and added and committed the changes.

Conclusion

Once the necessary files are restored (mainly the website on the tool servers for now), we can start actually configuring the various systems. That mainly means reviewing what I already have configured, followed by creating new playbooks for other systems.

Posted in Computers

Disaster Recovery

Overview

While not really a ‘Disaster’, I am forced to rebuild my servers. Fortunately I have backups plus on the blog, a set of instructions for my Kubernetes and Openshift setups. This article will provide information on getting a DR site up and what the steps need to be in order to quickly get going.

Environment

I have 150 or so servers that do a bunch of things. About 20 of those are experimental stuff or little things I spun up a VM for in order to test or try something out. Do they need to come up? Probably. Do they need to come up first, or even soon? Nah.

Begin

Once I got my Proxmox servers up, templates created, and servers created, it’s time to determine which servers need to be in wave one.

Wave One

There are two main types of servers that need to be configured.

  • Tool Servers – These are the Jump Servers used to access all the other servers. These also run the Ansible playbooks that configure every system. Some servers will require manual bits but automation should be attempted for all the others.
  • Name Servers – Since all the servers actually resolve through the Name Servers, these also need to be brought up so all the other servers can successfully start.

Wave Two

I’m looking at the CI/CD pipeline to be set up next. This consists of a gitlab server, gitlab runners, jenkins servers, git servers, and development servers. At least for now, we’re pulling binary information from the development servers.

We do have a Nexus server so we’ll be investigating that to hold all the binaries in the future. In addition, on Kubernetes we have an AWX set of containers which is the upstream code for the Ansible Automation Platform. We’ll get this going when we work on Kubernetes.

  • git servers – This holds the Ansible playbooks and configurations.
  • gitlab server – This is the repository of all the playbooks. While I do copy to my github account, that server doesn’t always have the most complete list of projects.
  • gitlab runners – These are really only used for testing and building containers for Kubernetes and Openshift, however they are part of the CI/CD pipeline so should be built.
  • jenkins servers – These systems manage the building of sites and deployment of Ansible playbooks to the various Tool Servers.
  • development servers – These two systems have all the binary files in the /opt directory structure. Jenkins retrieves them when building out the final product.

Wave Three

At this point I’m looking at getting the Kubernetes servers up plus the supporting servers.

  • NFS Servers – The Persistent Storage for the applications.
  • HAProxy Servers – Ingress Routers
  • Control Servers – Kubernetes API Binaries
  • Worker Servers – Location for the containers

Once this is up, we can install the main containers.

  • AWX – Ansible Automation
  • ArgoCD – Gitops Tool
  • Ingress – The internal Ingress routers.

Wave Four

At this point, we’re looking at the Personal Servers. Tool Server, git and development servers, and others such as media and backup servers.

Posted in ansible, CI/CD, Computers, Git, gitlab, Jenkins, Kubernetes

Setting Up Proxmox

Background

Since my VMware experience has ended, and I do want to get more experience with other tools, I’ve copied my important files off of the VMware servers to my KVM host (the R710). Once I verified I had everything, I’ve installed Proxmox on the three R720XDs.

I’ve done this in the past on my KVM system and couldn’t easily figure out how to get things set up. A couple of jobs back, I was finally able to do some automation and converted the system to KVM only (libvirt and qemu) and used the terraform provider for libvirt to actually build systems. Worked just peachy.

But I have access to a set up environment now at work so I have some data I can lean on. It works there. I can and have created VMs. So I can lean into it a little, do some searching, and come up with the proper way to set it up. As it’s a Home Lab, I can make mistakes. I do want to see if I can use terraform with Proxmox. I saw a couple of notes saying it wasn’t a great provider so we’ll see.

Proxmox Setup

I downloaded proxmox-ve_9.0-1.iso, used Rufus to format one of the 16G mini thumb drives, and booted each of the three servers to the drive (select F11 to go into the Boot Menu, then select the Cruzer USB drive).

The process was pretty simple. Select country and timezone, set up root’s password and email, then configure the hostname, ip, and gateway. It installs a configured Debian 13 (trixie) system. I will note that at the start, sudo isn’t installed. Not sure what else is missing 🙂

The two things I need to better understand with Proxmox are networking and storage.

Storage

Well, in this case I have three systems with 32 Terabytes of spinning disk in each so 96 total Terabytes of storage. This is instead of a central drive array. Eventually when I need to upgrade the hardware, I may search for more power thin servers and a separate large storage array. Future plans.

Networking

This was the more interesting part. My configuration is an incoming connection from Netlight, city fiber, which goes to this cluster. There’s an outbound connection to the house WiFi router.

On the three servers is a pfSense server VM. This is the middle part of this network path. It’s connected to one of the network ports on each of the R720XD servers. The Proxmox servers use the pfSense network as the route back out to the internet. And pfSense has one of the ad blocker tools installed. See the pfSense blog post for details on the configuration.

When creating VMs, they will be on one of several internal networks. I have these created to emulate a Development to Production corporate configuration. In addition, a fifth network holds my personal servers. Corporate lets me practice automation, kubernetes, and other CI/CD type DevOps tasks. Personal has my pictures and other media.

Virtual Machines

Initially you have to upload the ISOs for the different installations to the Proxmox server where you’ll be creating VMs. In my case, with 150 or so VMs, I have the same ISOs on each Proxmox server.

Next up is I create Templates from the ISOs. Basically 2 gigs of RAM, 50 gigs of Storage, and 2 CPUs. They can be adjusted as required but most systems can operate without issue at this level.

The following VLANs are used for each of the environments.

  • VLAN101 – Development
  • VLAN102 – QA
  • VLAN103 – Staging
  • VLAN104 – Production
  • VLAN105 – Personal

When creating the VM, there are two networking configuration options that must be set. First, for the network interface, uncheck the Firewall option. These are internal systems, pfSense manages the firewalls, and each system has its own firewall configured.

Second, add the appropriate VLAN tag for the interface.
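Both settings can also be applied from the Proxmox shell with qm set. The sketch below only builds the net0 option string; the virtio model, the vmbr0 bridge name, and the VM ID in the usage line are assumptions, so substitute your own.

```shell
#!/usr/bin/env bash
# Sketch: build a net0 value with the firewall off and a VLAN tag set.
# net0_spec is a hypothetical helper; vmbr0 is an assumed bridge name.
net0_spec() {
  local vlan="$1" bridge="${2:-vmbr0}"
  printf 'virtio,bridge=%s,tag=%s,firewall=0\n' "$bridge" "$vlan"
}

# Example usage on a Proxmox host (VM ID 101 is made up).
# Leaving out the MAC address lets Proxmox generate a new one.
# qm set 101 --net0 "$(net0_spec 104)"
```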

Posted in Computers, Proxmox, Virtualization

Potato Bites

Ingredients:

  • 4-6 potatoes
  • Cajun butter (melted butter + Cajun seasoning)
  • Grated Parmesan
  • Chopped parsley

Instructions:

  • Preheat oven to 400°F
  • Slice potatoes long-ways into half-inch slices
  • Use metal skewers to cut about 1/4 inch apart
  • Turn potato 90° and cut the same (crosshatch)
  • Cut that into quarters and square off
  • Drop in ice water to draw off the starch. This can take a while (hours to overnight).
  • Slather in butter and roast for about 30 mins

While they roast, make the ranch. For the last few minutes, set oven to broil until the potatoes have a nice char. Garnish with parsley and Parmesan!

Chipotle Ranch:

  • ½ cup sour cream
  • ¼ cup mayonnaise
  • 1 diced chipotle pepper from a can of chipotles in adobo sauce
  • 1/2 tsp garlic powder
  • 1/2 tsp onion powder
  • 1 tsp fresh dill
  • 1/4 tsp sea salt

Posted in Cooking

Best Pancakes

Preparation is simple enough. Use a griddle if possible as you get more pancakes cooked at once. Get maple syrup, whichever strikes your fancy; real or flavored sugar. Serves 4-6.

In a bowl, whisk the following ingredients:

  • 360g/3 cups of All purpose flour
  • 50g/1/4 cup of sugar
  • 10g/2 teaspoons of salt
  • 11g/2 teaspoons of baking powder
  • 5-6g/1 teaspoon of baking soda

In a bowl, mix the following ingredients:

  • 460ml/2 cups of milk
  • 55ml/1/4 cup of vegetable oil
  • 10g/2 teaspoons of vanilla
  • 55ml/1/4 cup of apple cider vinegar
  • 3 eggs

Once mixed, pour the liquid into the dry and using a spatula, fold the ingredients. Don’t finely stir it, there should be some small lumps when done. Around 20 times with the spatula.

Put a large pat of butter on the griddle. Using a scoop such as an ice cream scoop or soup spoon, carefully, and without stirring the mixture, scoop up an egg sized amount of batter and place it on the griddle.

As always, while it cooks, watch it for bubbles. As bubbles appear on the surface of the pancake, watch for a few popping. At about that point, using a plastic spatula, flip the pancakes. To test, softly push on the surface. As it’s cooked, it should be soft and fluffy and should bounce back from the push.

Posted in Cooking

Cygwin, KVM, and X Window

With the conversion of systems from VMware over to KVM, and of course just managing the KVM servers, I need to document the process for accessing the servers and viewing consoles.

I’ve been using cygwin for many years. I’ve even used the X Window system to access X applications on servers. For the purpose of accessing KVM VMs, we’ll bring up a terminal and start X:

startx

This brings up a full screen window. Generally not bad but I have a 43″ monitor so it’s pretty large. Resize it down to a more manageable size and in the term window, ssh over to the KVM server. Mine is my Nikodemus system at 192.168.5.10. Pass the -Y option to tell ssh this is to be used as a tunnel with the X Window system.

ssh -Y 192.168.5.10

To make sure it worked, verify your DISPLAY environment variable. It should show something like this:

printenv | grep -i display
DISPLAY=localhost:10.0

Since it’s a tunnel, it’ll show localhost.

Now you can start the virt-manager for viewing all the VMs, regardless of status, or virt-viewer which shows only the running VMs. Note that resizing is done by clicking on the double box in the upper right corner of the window.

Posted in KVM

VMware to KVM

I’ve been a member of VMUG and user of VMware on my Dell R710 and then Dell R720XDs for almost 10 years now. It’s been interesting and valuable in helping me understand VMware.

A couple of jobs back, I was reintroduced to KVM. I’d tried it before getting into VMware and couldn’t get the hang of it but with it being my job, and my new skills with virtual machines, I had a better grasp and was even able to build terraform scripts to build sites. Cool stuff.

Broadcom restricting access for VMUG members, my license having expired, plus moving, left my existing VMs out in the cold. While I have terraform scripts and documentation (and backups) for most of my systems, I do need to get some data off of some where I either failed to retrieve the data or backed up the operations but not my home directories, or just verify the backups I have are complete enough.

Some of the files aren’t a big deal. Jenkins and Gitlab, I’d just reinstall and reimport or rebuild the processes. I’m not in a production or developer environment where I need to bring in all the changes over the past 10 years. Just recreate the setup, git push the files, and move forward. Heck, I’ll even have clean installations. When I first installed Jenkins, I installed everything that was suggested. With experience now, I’ll just install what I need.

The first step is to pull them off the VMware systems. I can bring the hosts up, I just can’t start any VMs. I enabled SSH access to the systems and on my R710 KVM box, simply scp’d the ones I wanted to review over to a /opt/vms directory. I reviewed the system specs in order to properly start them and off we go.

The next step is to convert the images to the qcow2 format. Install the qemu-img package and run the following command:

qemu-img convert -f vmdk -O qcow2 /opt/vms/monkey/bldr0cuomaws1/bldr0cuomaws1.vmdk bldr0cuomaws1.qcow2

Next up is to import it into Qemu with virt-install. This makes it visible to KVM in order to run the system. I used the settings I retrieved in order to properly configure the domain.

virt-install --name bldr0cuomaws1 --ram 4096 --vcpus 2 --disk bldr0cuomaws1.qcow2,format=qcow2 --import

Here’s the tricky part. For most of the systems, I just wanted to retrieve the data. Once the domain has been configured, you can try to start the new server but I wasn’t having much success. I did find I could actually just use a command called guestmount and simply mount the image to /mnt and copy the data from the system.

guestmount -d bldr0cuomaws1 --ro -i /mnt

Once done, I changed over to /mnt and simply copied the data from my home directory to a central location. After that, I didn’t really need this image any more so I deactivated it and removed it.

virsh pool-destroy bldr0cuomaws_pool
virsh pool-delete bldr0cuomaws_pool
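The mount and copy steps above could be wrapped up as a small helper. This is only a sketch: it prints the commands rather than running them, the guest path and destination directory are made-up examples, and it assumes names without spaces. The guestunmount step is my addition so the image is released cleanly before anything is removed.

```shell
# Print the commands for a read-only data rescue from one libvirt domain.
# Nothing is executed here; review the output, then pipe it to sh as root.
# $1: domain name
# $2: path inside the guest to copy (hypothetical, e.g. /home/myuser)
# $3: destination directory on the host (hypothetical)
# $4: mount point on the host, defaults to /mnt
plan_rescue() {
    domain=$1
    guest_path=$2
    dest=$3
    mnt=${4:-/mnt}

    # Mount the guest's filesystems read-only so the image cannot be changed.
    echo "guestmount -d $domain --ro -i $mnt"
    # Copy the data out, preserving ownership and timestamps.
    echo "cp -a $mnt$guest_path $dest/"
    # Unmount cleanly before removing or reusing the image.
    echo "guestunmount $mnt"
}

# Usage:
#   plan_rescue bldr0cuomaws1 /home/myuser /opt/rescued/bldr0cuomaws1
#   plan_rescue bldr0cuomaws1 /home/myuser /opt/rescued/bldr0cuomaws1 | sh
```

Printing first and piping to sh second makes it easy to eyeball the paths before anything touches the image.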

Next up, I need to see how this will work with multiple disks.

Oh, one thing. You can make sure the image was copied and converted properly before you delete the VM from VMware.

# qemu-img info bldr0cuomrepo1.qcow2
image: bldr0cuomrepo1.qcow2
file format: qcow2
virtual size: 100 GiB (107374182400 bytes)
disk size: 16.8 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
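If there are a lot of images to check, the virtual size line can be pulled out and compared against what you expect. A minimal sketch, assuming the human-readable qemu-img info output shown above; the function names are mine. The expected byte count is something you supply, for example the size of the original -flat.vmdk file.

```shell
# Extract the virtual size in bytes from `qemu-img info` text on stdin.
virtual_size_bytes() {
    sed -n 's/^virtual size:.*(\([0-9][0-9]*\) bytes).*$/\1/p'
}

# Succeed only when the reported virtual size equals the expected byte count.
# $1: expected size in bytes
check_virtual_size() {
    expected=$1
    actual=$(virtual_size_bytes)
    if [ "$actual" = "$expected" ]; then
        echo "OK: virtual size is $actual bytes"
    else
        echo "MISMATCH: expected $expected bytes, got ${actual:-nothing}"
        return 1
    fi
}

# Usage (on the KVM host, where qemu-img is installed):
#   qemu-img info bldr0cuomrepo1.qcow2 | check_virtual_size 107374182400
```

This only confirms the size survived the conversion. qemu-img check can additionally look for inconsistencies inside a qcow2 file.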

For a multi-disk image setup, we’ll need to convert the disks and then attach them to the primary image. They are all LVM though so you can’t really mount the entire system using guestmount. You’d only mount individual mount points.

For LVM systems, when you add the disks, you’ll need to add them as the appropriate sd device. They’re added sequentially, so for a system with 4 extra drives you’d add sdb, sdc, sdd, and sde.

Once that’s done, you’ll need to look at the mount points and then mount them individually as guestmount won’t actually mount everything.

We’ll walk this through from conversion to mounting.

First off, here’s the original directory and files for this server. It basically held a bunch of Linux images used when kickstarting servers. Kind of an automatic build process from the past. Do I need everything? Probably not. In this case, I’m just seeing what’s there, maybe in the home directory, and copying it off:

# ls -al
total 1656836168
drwxr-xr-x 2 root root         4096 Oct 30 23:02 .
drwxr-xr-x 6 root root         4096 Oct 31 23:30 ..
-rw-r--r-- 1 root root         1298 Oct 30 16:49 lnmt1cuomjs1-52466e53.hlog
-rw------- 1 root root  85899345920 Oct 30 17:08 lnmt1cuomjs1-flat.vmdk
-rw------- 1 root root         8684 Oct 30 23:02 lnmt1cuomjs1.nvram
-rw------- 1 root root          508 Oct 30 17:08 lnmt1cuomjs1.vmdk
-rw-r--r-- 1 root root            0 Oct 30 23:02 lnmt1cuomjs1.vmsd
-rwxr-xr-x 1 root root         4084 Oct 30 23:02 lnmt1cuomjs1.vmx
-rw------- 1 root root 536870912000 Oct 30 19:06 lnmt1cuomjs1_1-flat.vmdk
-rw------- 1 root root          511 Oct 30 19:06 lnmt1cuomjs1_1.vmdk
-rw------- 1 root root 536870912000 Oct 30 21:04 lnmt1cuomjs1_2-flat.vmdk
-rw------- 1 root root          511 Oct 30 21:04 lnmt1cuomjs1_2.vmdk
-rw------- 1 root root 536870912000 Oct 30 23:02 lnmt1cuomjs1_3-flat.vmdk
-rw------- 1 root root          457 Oct 30 23:02 lnmt1cuomjs1_3.vmdk
-rw------- 1 root root       186501 Oct 30 23:02 vmware-89.log
-rw------- 1 root root       413874 Oct 30 23:02 vmware-90.log
-rw-r--r-- 1 root root       309911 Oct 30 23:02 vmware-91.log
-rw-r--r-- 1 root root       226039 Oct 30 23:02 vmware-92.log
-rw-r--r-- 1 root root       281301 Oct 30 23:02 vmware-93.log
-rw-r--r-- 1 root root       334962 Oct 30 23:02 vmware-94.log
-rw-r--r-- 1 root root       191944 Oct 30 23:02 vmware.log
-rw------- 1 root root     85983232 Oct 30 23:02 vmx-lnmt1cuomjs1-24cae1aa722f12da9b70e188df14347036fca212-2.vswp

The files we’re interested in are just the vmdk files. The small ones are descriptor files that describe each disk (the -flat files hold the actual data), like so:

# cat lnmt1cuomjs1.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=bbd23a17
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 167772160 VMFS "lnmt1cuomjs1-flat.vmdk"

# The Disk Data Base
#DDB

ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "10443"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.longContentID = "876a84261b8e8ba71481a111bbd23a17"
ddb.toolsInstallType = "4"
ddb.toolsVersion = "11269"
ddb.uuid = "60 00 C2 9c 1b 27 9b c2-8a a4 da a6 3e ae eb 89"
ddb.virtualHWVersion = "11"
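One line in the descriptor that is handy is the extent description. The number is the disk size in 512-byte sectors, so 167772160 sectors works out to 85899345920 bytes, which matches the size of lnmt1cuomjs1-flat.vmdk in the listing above. A small sketch of that arithmetic follows. The function name is mine and it only handles RW extent lines.

```shell
# Print "<bytes> <extent file>" for each RW extent line in a VMDK descriptor
# read from stdin. Extent sizes are counted in 512-byte sectors.
vmdk_extent_bytes() {
    awk '$1 == "RW" && $2 ~ /^[0-9]+$/ {
        gsub(/"/, "", $4)                 # strip the quotes around the file name
        printf "%.0f %s\n", $2 * 512, $4  # sectors to bytes
    }'
}

# Usage:
#   vmdk_extent_bytes < lnmt1cuomjs1.vmdk
#   prints: 85899345920 lnmt1cuomjs1-flat.vmdk
```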

Honestly, to me, most of these fields don’t mean a whole lot. Once I have a list of the vmdk files (initial, 1, 2, and 3), I can convert them. First I created a directory for the files in /opt/libvirt_images which is where I have all the pool files. Then ran the qemu-img commands to convert all the disk images.

qemu-img convert -f vmdk -O qcow2 /opt/vms/morgan/lnmt1cuomjs1/lnmt1cuomjs1.vmdk lnmt1cuomjs1.qcow2
qemu-img convert -f vmdk -O qcow2 /opt/vms/morgan/lnmt1cuomjs1/lnmt1cuomjs1_1.vmdk lnmt1cuomjs1_disk1.qcow2
qemu-img convert -f vmdk -O qcow2 /opt/vms/morgan/lnmt1cuomjs1/lnmt1cuomjs1_2.vmdk lnmt1cuomjs1_disk2.qcow2
qemu-img convert -f vmdk -O qcow2 /opt/vms/morgan/lnmt1cuomjs1/lnmt1cuomjs1_3.vmdk lnmt1cuomjs1_disk3.qcow2
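With more than a couple of disks, typing these by hand gets old. Here is a sketch that generates the same commands from the directory contents, using the naming I used above (name.qcow2 for the first disk, name_diskN.qcow2 for the rest). It skips the -flat data files since qemu-img is pointed at the descriptors. It only prints the commands, and it assumes paths without spaces.

```shell
# Print one qemu-img command per VMDK descriptor in a directory.
# $1: directory holding the copied VMware files
# $2: VM name, which is also the base name of the descriptor files
plan_convert() {
    src=$1
    vm=$2
    for vmdk in "$src/$vm".vmdk "$src/$vm"_[0-9]*.vmdk; do
        [ -f "$vmdk" ] || continue                     # glob matched nothing
        case $vmdk in *-flat.vmdk) continue ;; esac    # data file, not a descriptor
        name=$(basename "$vmdk" .vmdk)
        case $name in
            "$vm") out=$vm.qcow2 ;;                          # first disk
            *)     out=${vm}_disk${name#"${vm}_"}.qcow2 ;;   # extra disk N
        esac
        echo "qemu-img convert -f vmdk -O qcow2 $vmdk $out"
    done
}

# Usage, from the target directory:
#   plan_convert /opt/vms/morgan/lnmt1cuomjs1 lnmt1cuomjs1
#   plan_convert /opt/vms/morgan/lnmt1cuomjs1 lnmt1cuomjs1 | sh
```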

Once everything is converted, you’ll need to install the main qcow2 file, then attach the additional disks.

You’ll have to get the domain created before you can attach the disks. To do that, you use virt-install.

virt-install --name lnmt1cuomjs1 --ram 4096 --vcpus 2 --disk lnmt1cuomjs1.qcow2,format=qcow2 --import

You can run a virsh list to see the domain once it’s created. Now attach the three disks to the domain.

virsh attach-disk lnmt1cuomjs1 /opt/libvirt_images/lnmt1cuomjs1_pool/lnmt1cuomjs1_disk1.qcow2 sdb --type disk --config
virsh attach-disk lnmt1cuomjs1 /opt/libvirt_images/lnmt1cuomjs1_pool/lnmt1cuomjs1_disk2.qcow2 sdc --type disk --config
virsh attach-disk lnmt1cuomjs1 /opt/libvirt_images/lnmt1cuomjs1_pool/lnmt1cuomjs1_disk3.qcow2 sdd --type disk --config
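A sketch for generating the attach commands, handing out sdb, sdc, and so on in order. Note one difference from the commands above: it adds --driver qemu --subdriver qcow2. That is my addition, not what I originally ran. The virsh man page says the subdriver should match the disk format for QEMU and that the default may not be correct, so spelling it out seems safer for qcow2 files. The function only prints the commands and assumes paths without spaces.

```shell
# Print one virsh attach-disk command per extra disk, assigning sdb, sdc, ...
# $1: domain name; remaining arguments: qcow2 paths in the order to attach.
plan_attach() {
    domain=$1
    shift
    rest=bcdefghijklmnopqrstuvwxyz       # sda is the disk from virt-install
    for disk in "$@"; do
        letter=${rest%"${rest#?}"}       # first remaining letter
        rest=${rest#?}                   # drop it from the list
        if [ -z "$letter" ]; then
            echo "too many disks for sdb..sdz" >&2
            return 1
        fi
        echo "virsh attach-disk $domain $disk sd$letter --driver qemu --subdriver qcow2 --type disk --config"
    done
}

# Usage:
#   pool=/opt/libvirt_images/lnmt1cuomjs1_pool
#   plan_attach lnmt1cuomjs1 $pool/lnmt1cuomjs1_disk1.qcow2 $pool/lnmt1cuomjs1_disk2.qcow2
```

virsh attach-disk also takes a --targetbus option (ide, scsi, virtio, and so on) if the guest needs a particular bus.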

Run the virt-filesystems command to list the filesystems and logical volumes found in the converted image.

virt-filesystems -a /opt/libvirt_images/lnmt1cuomjs1_pool/lnmt1cuomjs1.qcow2
/dev/sda1
/dev/vg00/home
/dev/vg00/opt
/dev/vg00/root
/dev/vg00/tmp
/dev/vg00/usr
/dev/vg00/var
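Since the logical volumes have to be mounted one at a time, that list can drive the mounts. A minimal sketch, assuming each volume gets its own directory under a base directory on the host (the base directory is a made-up example). It uses guestmount’s -m option to name the device instead of -i, skips plain partitions like /dev/sda1, and only prints the commands.

```shell
# Read device names (as printed by virt-filesystems) on stdin and print the
# commands to mount each LVM logical volume read-only under its own directory.
# $1: libvirt domain name
# $2: base directory on the host (hypothetical, e.g. /mnt/lnmt1cuomjs1)
plan_lv_mounts() {
    domain=$1
    base=$2
    while read -r dev; do
        case $dev in
            /dev/*/*) ;;          # /dev/<vg>/<lv>: a logical volume
            *) continue ;;        # plain partitions such as /dev/sda1
        esac
        lv=${dev##*/}
        echo "mkdir -p $base/$lv"
        echo "guestmount -d $domain -m $dev --ro $base/$lv"
    done
}

# Usage:
#   virt-filesystems -d lnmt1cuomjs1 | plan_lv_mounts lnmt1cuomjs1 /mnt/lnmt1cuomjs1
```

Each mount should be released with guestunmount on its directory when you’re done copying.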

Update: This was interesting. After attaching the disks via the command line, I’d start a system and it’d fail, sitting at the maintenance prompt. I started the X Window system and then virt-manager. A few of the systems I’d transferred still weren’t done, so for those I added the converted drive in the GUI instead of using the virsh attach-disk command and, lo and behold, the system came up.

I started going through the systems one at a time. The single-drive systems I was able to start without a problem, other than the Solaris one (I’ll just rebuild that one). For the others, I used virsh edit on the domain and removed the CLI-added disk, then added the disk back in through the GUI. Some of the SCSI drives were still failing, so I changed them to IDE and the system came up.

So now all the transferred systems have come up with no problem and I can either transfer them into the Proxmox cluster or, for some, just copy the data over to a fresh installation. I do want to make all the systems current as most are running CentOS 7. Plus I want to use different Linux distros for different network zones.
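I haven’t confirmed exactly why the GUI-added disks worked where the CLI-added ones didn’t, but one way to compare them is to summarize how each disk is recorded in the domain XML. This sketch reads virsh dumpxml output and prints the target, bus, and driver format per disk. It’s a quick awk pass that assumes the single-quoted attribute style libvirt prints, not a real XML parser, and the function name is mine.

```shell
# Read `virsh dumpxml` output on stdin and print "<target> <bus> <format>"
# for each disk, so a raw/qcow2 or bus mismatch is easy to spot.
disk_summary() {
    awk -F"'" '
        /<driver / { for (i = 1; i < NF; i++)
                         if ($i ~ /type=$/) fmt = $(i + 1) }
        /<target / { for (i = 1; i < NF; i++) {
                         if ($i ~ /dev=$/) dev = $(i + 1)
                         if ($i ~ /bus=$/) bus = $(i + 1) } }
        /<\/disk>/ { print dev, bus, fmt; dev = bus = fmt = "" }
    '
}

# Usage:
#   virsh dumpxml lnmt1cuomjs1 | disk_summary
```

A qcow2 file showing up with a format of raw, or a disk on a bus the guest has no driver for, would be the first things I’d look at.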

Posted in Qemu, Virtualization