Optimize Kubernetes HA failover time

2. Feb 2026

In this short article I want to show how you can reduce the failover time of an RKE2 Kubernetes HA cluster. In the default configuration of an RKE2 Kubernetes High Availability cluster, workloads are moved off a failed node only after 5 minutes. To cut this delay to about 30 seconds after the node has been marked NotReady, we need to edit the config.yaml file as described below.

Repeat the following steps on ALL nodes in your RKE2 Kubernetes HA cluster!

(If the file or folder does not exist, please create it.)

sudo mkdir -p /etc/rancher/rke2/
sudo touch /etc/rancher/rke2/config.yaml
sudo nano /etc/rancher/rke2/config.yaml

APPEND the following code to the config.yaml file and save it:

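kube-apiserver-arg:
  # Default tolerations added to newly created pods: evict after 30s instead of 300s
  - '--default-not-ready-toleration-seconds=30'
  - '--default-unreachable-toleration-seconds=30'
kube-controller-manager-arg:
  # Check node status every 2s; mark a node NotReady after 16s without a heartbeat
  - '--node-monitor-period=2s'
  - '--node-monitor-grace-period=16s'
  # This flag was removed from kube-controller-manager in Kubernetes 1.27; omit this line on newer versions
  - '--pod-eviction-timeout=30s'
kubelet-arg:
  # The kubelet reports its node status every 4s
  - '--node-status-update-frequency=4s'
  # Unrelated to failover: raises the per-node pod limit (default is 110)
  - '--max-pods=200'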
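
To see how these values add up, here is a rough back-of-the-envelope sketch in shell. It assumes that a pod is evicted once the node has missed heartbeats for the whole node-monitor-grace-period and the toleration time has then expired as well. The helper name estimate_failover_seconds is made up for this example, and the result does not include the time the replacement pod needs to get scheduled and started.

```shell
#!/bin/sh
# Rough worst-case estimate of when pods are evicted from a dead node.
# Assumption: eviction delay ~= node-monitor-grace-period + toleration seconds.
# estimate_failover_seconds is a hypothetical helper, not part of RKE2 or kubectl.
estimate_failover_seconds() {
  # $1 = node-monitor-grace-period in seconds
  # $2 = default-unreachable-toleration-seconds
  echo $(( $1 + $2 ))
}

# Values from the config.yaml in this article: 16s grace period + 30s toleration
estimate_failover_seconds 16 30   # prints 46
```

So with the settings from this article you should expect pods to be evicted somewhere between 30 and roughly 46 seconds after the node stops responding.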

Please reboot the nodes one at a time to apply the new settings, and wait until each node is Ready again before you reboot the next one.
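
If you want to check that the new defaults are active, one option is to look at the tolerations of a pod that was created after the reboot. The following is only a sketch: it assumes kubectl is already configured for your cluster, and show_pod_tolerations is a helper name invented for this example. Keep in mind that pods created before the change keep their old 300 second tolerations until they are recreated.

```shell
#!/bin/sh
# show_pod_tolerations is a hypothetical helper, not a kubectl subcommand.
# It prints "key=tolerationSeconds" for every toleration of the given pod.
show_pod_tolerations() {
  pod="$1"                  # pod name
  namespace="${2:-default}" # namespace, falls back to "default"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .spec.tolerations[*]}{.key}{"="}{.tolerationSeconds}{"\n"}{end}'
}
```

Call it for example with show_pod_tolerations my-pod my-namespace (both names are placeholders). For a freshly created pod the entries for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable should now show 30.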
