Overview
This article describes the process of migrating a virtual machine from one physical host to another.
Background
There are two methods by which the virtual machines were built on the current hosts. The old way is to create an LVM slice on the disk and lay a base image over the top of it using dd. The second, more common, process creates the images and stores them as files on the host.
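If you're not sure which method a particular guest was built with, one quick check (a sketch only, using an example guest name from later in this article) is to list its block devices with virsh; an LVM-built guest shows a /dev device as the disk source, while an image-built guest shows a .qcow2 file path:
virsh domblklist tato0cuomifnag02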
Guest Shutdown
For any of the non-OpenShift (OCP) systems, you have a couple of methods of shutting down the guests. You can log in to the server and shut it down:
ssh tato0cuomifnag02
sudo su -
shutdown -t 0 now -h
Or use virsh console from the underlying host to log in and shut it down (reminder: the _domain and _pxe suffixes are assignments created by the new automation process):
virsh console tato0cuomifnag02
login: root
password:
shutdown -t 0 now -h
Openshift/Kubernetes
An interesting difference between a Kubernetes Control Node and an OCP Control Node is the extra pods used to manage the OCP cluster: the oauth pod, registry pods, console pods, and others. This means that while a drain isn't necessary on a Kubernetes Control Node, you should drain an OCP Control Node so that any control pod such as oauth continues to be available to the cluster.
This is a concern because if a control node fails for whatever reason, the cluster may be unavailable until replacement pods are created. OCP should notice the loss of an important pod like oauth and start it up on a different master; I suspect that would happen eventually.
In any case, drain the control or worker node before migrating it.
$ oc adm drain bldr0cuomocpwrk02.dev.internal.pri --delete-emptydir-data --ignore-daemonsets --force
node/bldr0cuomocpwrk02.dev.internal.pri evicted
WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-marketplace/redhat-operators-8kqpc; ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-w5r84, openshift-dns/dns-default-th2ql, openshift-dns/node-resolver-vbw7d, openshift-image-registry/node-ca-j2nrk, openshift-ingress-canary/ingress-canary-d6l42, openshift-machine-config-operator/machine-config-daemon-z5hzf, openshift-monitoring/node-exporter-rqj52, openshift-multus/multus-additional-cni-plugins-h8vcd, openshift-multus/multus-mqg5z, openshift-multus/network-metrics-daemon-npcjh, openshift-network-diagnostics/network-check-target-lflxb, openshift-sdn/sdn-zgqrt
evicting pod openshift-monitoring/thanos-querier-7c8bb4cdbd-n97pv
evicting pod default/llamas-6-p84z2
evicting pod default/inventory-4-szhvw
evicting pod default/photo-manager-4-cqqbc
evicting pod openshift-marketplace/redhat-operators-8kqpc
evicting pod openshift-monitoring/alertmanager-main-1
evicting pod openshift-monitoring/prometheus-adapter-66ff97555b-x92r2
pod/redhat-operators-8kqpc evicted
pod/inventory-4-szhvw evicted
pod/alertmanager-main-1 evicted
pod/llamas-6-p84z2 evicted
pod/photo-manager-4-cqqbc evicted
pod/thanos-querier-7c8bb4cdbd-n97pv evicted
pod/prometheus-adapter-66ff97555b-x92r2 evicted
node/bldr0cuomocpwrk02.dev.internal.pri evicted
$ oc get nodes
NAME STATUS ROLES AGE VERSION
bldr0cuomocpctl01.dev.internal.pri Ready master 13d v1.22.3+e790d7f
bldr0cuomocpctl02.dev.internal.pri Ready master 13d v1.22.3+e790d7f
bldr0cuomocpctl03.dev.internal.pri Ready master 13d v1.22.3+e790d7f
bldr0cuomocpwrk01.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk02.dev.internal.pri Ready,SchedulingDisabled worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk03.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk04.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk05.dev.internal.pri Ready,SchedulingDisabled worker 13d v1.22.3+e790d7f
The --delete-emptydir-data option is needed when a pod uses the emptyDir storage method. Evicting a pod that uses this method deletes any data in that emptyDir location.
The --ignore-daemonsets option is needed because DaemonSet-managed pods run on every node and can't be evicted. You're acknowledging that, yes, you know there are DaemonSet pods and it's fine for them to remain while the node is cordoned.
The --force option is needed when there are pods that aren't managed by a controller (ReplicationController, ReplicaSet, Job, DaemonSet, or StatefulSet) and therefore can't be evicted normally; it deletes them instead.
Once evicted, you’ll log into each OCP/K8S server that you will be migrating and shut it down.
ssh tato0cuomocpbld01
sudo su -
cd /home/ocp4
ssh -i id_rsa core@tato0cuomocpctl01
sudo su -
shutdown -t 0 now -h
Migrate LVM Guests
This section details the process of migrating an LVM-built guest.
First, identify the guests on the host so you know which ones to migrate (for example, ahead of an upcoming event where the physical hosts are being moved to a different data center).
# virsh list --all
Id Name State
----------------------------------------------------
2 tato0cuomifnag01 running
4 tato0cuomifnag02 running
For this example, we'll migrate tato0cuomifnag02. You'll need to know the device path in order to get the LVM information.
# ls -la /dev/pool2
total 0
drwxr-xr-x. 2 root root 200 Feb 7 01:40 .
drwxr-xr-x. 23 root root 4180 Feb 7 02:07 ..
lrwxrwxrwx. 1 root root 8 Feb 7 01:40 tato0cuomifnag01 -> ../dm-44
lrwxrwxrwx. 1 root root 8 Feb 7 01:06 tato0cuomifnag02 -> ../dm-45
Now you can run lvdisplay to get the size of the image. The value you want is the Current LE value.
# lvdisplay /dev/pool2/tato0cuomifnag02
--- Logical volume ---
LV Path /dev/pool2/tato0cuomifnag02
LV Name tato0cuomifnag02
VG Name pool2
LV UUID MFBxt1-8yFR-EOd4-TVZD-nQlh-RUIu-GweC8c
LV Write Access read/write
LV Creation host, time tato0cuomifnag02, 2018-01-30 15:02:24 -0600
LV Status available
# open 1
LV Size 20.00 GiB
Current LE 5120
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:46
Create a new logical volume of the same size on the destination server.
lvcreate -l5120 -ntato0cuomifnag02 vg00
Run the following command to migrate the image. Obviously you need to be able to ssh to root on the destination server.
dd if=/dev/pool2/tato0cuomifnag02 | pv | ssh -C root@destination dd of=/dev/vg00/tato0cuomifnag02
The nice thing is that -C compresses the data stream, and because it goes over ssh, the copy is encrypted.
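If you want to sanity-check the transfer (a quick sketch, and it assumes both volume groups use the same extent size so the two volumes are exactly the same size), compare checksums of the two block devices. On the source host:
md5sum /dev/pool2/tato0cuomifnag02
And on the destination host:
md5sum /dev/vg00/tato0cuomifnag02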
Migrate Images
This section describes the process of migrating a file-based image and starting it up on the other host.
When you shut down the guest, libvirt considers the guest stopped, but you'll also need to stop the storage pool.
virsh pool-destroy tato0cuomifnag01_pool
Now that both the guest and the storage pool have been stopped, copy the images from the /opt/libvirt_images/tato0cuomifnag01_pool directory to the destination server. Use the /opt/libvirt_images directory as the target, as it has sufficient space for larger images such as the katello server.
scp commoninit.iso [yourusername]@nikodemus:/opt/libvirt_images/
scp tato0cuomifnag01_amd64.qcow2 [yourusername]@nikodemus:/opt/libvirt_images/
On the destination server, create the pool directory and move the images into the /opt/libvirt_images/tato0cuomifnag01_pool/ directory. You'll need to set ownership and permissions as well.
mkdir /opt/libvirt_images/tato0cuomifnag01_pool
cd /opt/libvirt_images
mv commoninit.iso tato0cuomifnag01_pool/
mv tato0cuomifnag01_amd64.qcow2 tato0cuomifnag01_pool/
chown -R root:root tato0cuomifnag01_pool/
find tato0cuomifnag01_pool/ -type f -exec chmod 644 {} \;
Extract Definitions
Once the images have been copied to the destination host, you'll need to extract the domain definition and, for the guests that are images, the storage pool definition.
Extract the guest definition.
virsh dumpxml tato0cuomifnag01_domain > tato0cuomifnag01_domain.xml
For the guests that are images (the new automation process), extract the storage pool definition.
virsh pool-dumpxml tato0cuomifnag01_pool > ~/tato0cuomifnag01_pool.xml
Copy Definitions
Once you have the definitions, copy the xml files to the destination server.
scp tato0cuomifnag01_domain.xml [yourusername]@nikodemus:/var/tmp
scp tato0cuomifnag01_pool.xml [yourusername]@nikodemus:/var/tmp
Import Definitions
Log into the destination server and import the domain definition. An LVM-based guest may require editing the xml file if the source LVM slice path differs from the destination LVM slice path.
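If you do need to edit it, the piece to change is the disk source element in the domain XML. A minimal sketch of what that element looks like (the driver and target values here are illustrative and will vary per guest); update the source dev path to point at the destination volume, e.g. /dev/vg00/tato0cuomifnag02:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/pool2/tato0cuomifnag02'/>
  <target dev='vda' bus='virtio'/>
</disk>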
virsh define /var/tmp/tato0cuomifnag01_domain.xml
For the image based guests, import the storage pool definition as well.
virsh pool-define /var/tmp/tato0cuomifnag01_pool.xml
Activate Guests
For the image based guests, activate the storage pool first. The guest won’t start if the storage pool hasn’t been started. Also configure it to automatically start when the underlying host boots.
virsh pool-start tato0cuomifnag01_pool
virsh pool-autostart tato0cuomifnag01_pool
Then start the guest.
virsh start tato0cuomifnag01_domain
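Once started, you can confirm the guest is up and reach its console the same way as during shutdown:
virsh list --all
virsh console tato0cuomifnag01_domain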
Openshift/Kubernetes
Rejoin the migrated node to the cluster.
$ oc adm uncordon bldr0cuomocpwrk02.dev.internal.pri
node/bldr0cuomocpwrk02.dev.internal.pri uncordoned
Then check the cluster status to see that the migrated node is up and Ready.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
bldr0cuomocpctl01.dev.internal.pri Ready master 13d v1.22.3+e790d7f
bldr0cuomocpctl02.dev.internal.pri Ready master 13d v1.22.3+e790d7f
bldr0cuomocpctl03.dev.internal.pri Ready master 13d v1.22.3+e790d7f
bldr0cuomocpwrk01.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk02.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk03.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk04.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
bldr0cuomocpwrk05.dev.internal.pri Ready worker 13d v1.22.3+e790d7f
Cleanup
Finally remove the xml files.
rm /var/tmp/tato0cuomifnag01_domain.xml
rm /var/tmp/tato0cuomifnag01_pool.xml
Recovery
The Recovery process is very similar. In the event the physical host was replaced, we’ll need to migrate all the guests back over to the replacement host.
In order to determine what guests belong on the replaced host, check the installation repositories. Both the terraform and pxeboot repositories are complete installs on all physical hosts for the site. The directory structure is based on the hostname of the physical host. Simply log in to the current hosts, navigate to the repo’s site/hostname directory for the replaced host, and determine which guests need to be migrated back to the replaced host.
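For example (a sketch only; the actual checkout location of the terraform and pxeboot repositories will vary), listing the replaced host's directory in the pxeboot repository shows the guests that were defined on it:
cd /path/to/pxeboot/[site]/[replaced host]
ls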
Once that’s determined, follow the above process to migrate the guests back to the replaced host.
Removal
After all the guests have been migrated back to the replaced host, you’ll need to remove the guests from the holding physical hosts.
virsh undefine [guest]_domain
virsh pool-destroy [guest]_pool
virsh pool-undefine [guest]_pool
rm -rf /opt/libvirt_images/[guest]_pool
For LVM-based guests, you'll need to use the lvremove command.
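For example, using the volume group and guest name from the LVM migration steps above:
lvremove /dev/vg00/tato0cuomifnag02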
Troubleshooting
Some information that’s helpful during the work.
If you accidentally pool-destroy (stop) the wrong pool, the guest doesn't stop working. Remember, the command simply marks the storage pool as inactive; it doesn't actually shut down storage, and as long as the guest is running, the pool remains available to it. However, if you stop the guest and try to start it again while the storage pool is inactive, the guest will not start. To restart the storage pool, run pool-start for that pool and it's active again.
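For example, to see which pools are inactive and restart one:
virsh pool-list --all
virsh pool-start tato0cuomifnag01_pool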
References
- virt-backup.pl – Alternative Perl script for migrating LVM images.
- https://docs.openshift.com/container-platform/4.9/nodes/nodes/nodes-nodes-working.html