On our Kubernetes cluster, one of the etcd members had a falling out and is reporting that its data is stale. While troubleshooting, we came up with several ideas, including simply rebuilding the cluster. That's not all that hard overall, but it still causes some angst because everyone gets new tokens and applications have to be redeployed.
Removing the bad member from the cluster, on the other hand, is simple enough:
etcdctl member list
etcdctl member remove [member hex code]
Since etcd is running with TLS, you have to pass the certificate information on the command line. In addition, if you don't have a current etcdctl binary installed, you may have to go into the etcd pod and use its etcdctl instead.
The command is the same either way, whether you're in the pod itself (easy to do from a central console) or running it on one of the masters where the etcd certificates are also installed.
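To avoid retyping the certificate flags every time, they can be wrapped in a pair of shell functions. This is a minimal sketch rather than part of the original procedure: the function names etcdctl_tls and etcdctl_pod are hypothetical, the certificate paths are the ones used in this post, and etcdctl_pod assumes a kubeadm-style static pod named etcd-<node name> in the kube-system namespace with the same certificate directory mounted inside it.

```shell
#!/usr/bin/env bash
# Certificate directory used throughout this post; adjust if yours differs.
# If your local etcdctl is older than 3.4, also export ETCDCTL_API=3, since
# older releases default to the v2 API, which uses different TLS flags.
ETCD_PKI=/etc/kubernetes/pki/etcd

# etcdctl_tls ARGS...
# Runs the local etcdctl on a master with the TLS flags appended.
etcdctl_tls() {
  etcdctl "$@" \
    --cacert="${ETCD_PKI}/ca.crt" \
    --cert="${ETCD_PKI}/server.crt" \
    --key="${ETCD_PKI}/server.key"
}

# etcdctl_pod NODE ARGS...
# Runs etcdctl inside the etcd pod for NODE via kubectl exec.
# ASSUMPTION: the pod is named etcd-NODE in kube-system (kubeadm convention).
etcdctl_pod() {
  local node="$1"
  shift
  kubectl -n kube-system exec "etcd-${node}" -- etcdctl "$@" \
    --cacert="${ETCD_PKI}/ca.crt" \
    --cert="${ETCD_PKI}/server.crt" \
    --key="${ETCD_PKI}/server.key"
}
```

With these loaded, etcdctl_tls member list runs the same listing command shown next, and etcdctl_pod <node name> member list runs it from inside the pod.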
etcdctl member list --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key
59721c313837f64a, started, bldr0cuomkube3.internal.pri, https://192.168.101.71:2380, https://192.168.101.71:2379, false
cd0ea44e64569de6, started, bldr0cuomkube2.internal.pri, https://192.168.101.73:2380, https://192.168.101.73:2379, false
e588b22b4be790ad, started, bldr0cuomkube1.internal.pri, https://192.168.101.72:2380, https://192.168.101.72:2379, false
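Rather than copying the member ID by eye, you can pull it out of the listing by hostname. A minimal sketch, assuming the comma-separated output format shown above (ID in the first field, name in the third); the function name member_id_for is hypothetical.

```shell
#!/usr/bin/env bash
# member_id_for NAME
# Reads `etcdctl member list` output on stdin and prints the member ID
# (first field) of every line whose name (third field) equals NAME.
member_id_for() {
  # Fields in the default output are separated by ", ".
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}
```

Piping the listing above through member_id_for bldr0cuomkube1.internal.pri prints e588b22b4be790ad, which is the ID to hand to member remove.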
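Then run member remove with the ID of the failed member (e588b22b4be790ad, bldr0cuomkube1, in this case), again passing the certificate flags.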
etcdctl member remove e588b22b4be790ad --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key
And the etcd member has been removed.
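It is worth running member list once more to confirm the member is gone. A small check, as a sketch assuming the same comma-separated output format (member_absent is a hypothetical helper name), that returns 0 only when the given ID no longer appears in the first field of the listing:

```shell
#!/usr/bin/env bash
# member_absent ID
# Reads `etcdctl member list` output on stdin. Returns 0 if no line has
# ID in its first field, and 1 if the member is still listed.
member_absent() {
  awk -F', ' -v id="$1" '$1 == id { found = 1 } END { exit (found ? 1 : 0) }'
}
```

For example, pipe the output of etcdctl member list (with the same --cacert, --cert and --key flags) into member_absent e588b22b4be790ad && echo removed.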