In this guide, I will explain how to set up a high-availability NFS cluster. In total we need to set up two Ubuntu servers.
Components to make NFS highly available: DRBD, Pacemaker (with Corosync and pcs) and the NFS kernel server.
Good documentation can be found on the LINBIT blog: https://linbit.com/blog/highly-available-nfs-targets-with-drbd-pacemaker
First, the hostnames and IP addresses of the two machines are entered in the /etc/hosts file on both machines, so that each node can resolve the other.
The following commands are run on both systems (nfsserver1 and nfsserver2):
echo "192.168.0.200 nfsserver1" | sudo tee -a /etc/hosts
echo "192.168.0.201 nfsserver2" | sudo tee -a /etc/hosts
Install DRBD and related utilities
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo apt install -y drbd-utils pacemaker resource-agents-extra iptables
Install NFS on Ubuntu
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo apt install -y nfs-kernel-server nfs-common
Stop and disable the NFS server and its related services in systemd, because Pacemaker will control these.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo systemctl disable --now nfs-kernel-server.service
Enable pcsd.service, pacemaker.service and corosync.service
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo systemctl enable pcsd --now
sudo systemctl enable pacemaker --now
sudo systemctl enable corosync --now
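OPTIONAL: Check that the three services are running on each machine:
sudo systemctl is-active pcsd pacemaker corosync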
Change the password for the hacluster user
The following commands are run on both systems (nfsserver1 and nfsserver2):
echo 'hacluster:secretpassword' | sudo chpasswd
Create the pacemaker cluster with the following commands:
!!! Only on machine nfsserver1 !!!:
sudo pcs host auth -u hacluster -p secretpassword nfsserver1 addr=192.168.0.200 nfsserver2 addr=192.168.0.201
sudo pcs cluster setup --force linbit-cluster nfsserver1 addr=192.168.0.200 nfsserver2 addr=192.168.0.201
sudo pcs cluster start --all
OPTIONAL: Verifying the pacemaker cluster services and state
sudo pcs status
Now we create the LVM physical volume, volume group and directories for the NFS share. We use the second hard drive on each system (50 GB), in most cases /dev/sdb.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo pvcreate /dev/sdb
sudo vgcreate nfs_vg /dev/sdb
Now we create two logical volumes.
The first logical volume will be for storing NFS stateful connection information and the “tickle” directory used by the portblock OCF (Open Cluster Framework) resource agent. If the NFS stateful connection is not highly available or otherwise synchronized between cluster nodes, then in some failover cases, it might take a long time for NFS exports to become available. This volume will not hold much data and 20M can be a sufficient size.
The second volume will store data that we will share by using NFS.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo lvcreate -L 20M -n ha_nfs_internal_lv nfs_vg
sudo lvcreate -l 100%FREE -n ha_nfs_exports_lv nfs_vg
sudo mkdir -p /srv/drbd-nfs/exports/HA
sudo mkdir -p /srv/drbd-nfs/internal
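OPTIONAL: Verify the LVM layout before continuing:
sudo pvs
sudo vgs nfs_vg
sudo lvs nfs_vg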
Now we create a DRBD resource configuration file on each system and then initialize the DRBD resource.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo bash -c 'cat <<EOF > /etc/drbd.d/ha_nfs.res
resource "ha_nfs" {
volume 0 {
device "/dev/drbd1000";
disk "/dev/nfs_vg/ha_nfs_internal_lv";
meta-disk internal;
}
volume 1 {
device "/dev/drbd1001";
disk "/dev/nfs_vg/ha_nfs_exports_lv";
meta-disk internal;
}
on "nfsserver1" {
address 192.168.0.200:7788;
}
on "nfsserver2" {
address 192.168.0.201:7788;
}
}
EOF'
sudo drbdadm create-md ha_nfs
sudo drbdadm up ha_nfs
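OPTIONAL: Check the state of the DRBD resource. Before the initial sync, both nodes typically report Secondary with Inconsistent data:
sudo drbdadm status ha_nfs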
Next we create file systems on the two DRBD devices and promote one of the machines to primary.
!!! Only on machine nfsserver1 !!!:
sudo drbdadm primary --force ha_nfs
# Watch the sync process with the following command and wait for the sync to complete
watch -n 1 cat /proc/drbd
sudo mkfs.ext4 /dev/drbd1000
sudo mkfs.ext4 /dev/drbd1001
sudo mount /dev/drbd1000 /srv/drbd-nfs/internal
sudo mkdir /srv/drbd-nfs/internal/nfs_info_dir
sudo mkdir /srv/drbd-nfs/internal/portblock_tickle_dir
sudo umount /dev/drbd1000
First we save the cluster configuration (CIB) to a file named drbdconf. Then we create the first Pacemaker resource: if one machine in the Pacemaker cluster crashes, Pacemaker promotes the remaining operational machine to be the primary DRBD node. The last command pushes the updated drbdconf CIB file back into the cluster.
!!! Only on machine nfsserver1 !!!:
sudo pcs cluster cib drbdconf
sudo pcs -f drbdconf resource create p_drbd_ha_nfs ocf:linbit:drbd \
drbd_resource=ha_nfs \
op start interval=0s timeout=40s \
stop interval=0s timeout=100s \
monitor interval=31s timeout=20s role=Unpromoted \
monitor interval=29s timeout=20s role=Promoted
sudo pcs -f drbdconf resource promotable p_drbd_ha_nfs \
promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true
sudo pcs cluster cib-push drbdconf
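OPTIONAL: Verify that Pacemaker now manages the DRBD resource and has promoted it on one of the nodes:
sudo pcs status
sudo drbdadm status ha_nfs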
Now we need to configure file system primitives in Pacemaker so that the file systems backing the NFS share are only mounted on the node that is in the primary role for the backing DRBD resource. The file system primitives are based on the Filesystem OCF resource agent, and colocation and order constraints ensure this behaviour.
Because the DRBD resource has two volumes, we need to configure two file system primitives in Pacemaker.
Enter the following commands to configure the file system primitives in Pacemaker, one for the NFS "stateful information directory" and one for the "data share directory":
!!! Only on machine nfsserver1 !!!:
#### policies for the nfs "stateful information directory"
sudo pcs -f drbdconf resource create p_fs_nfs_internal_info_HA ocf:heartbeat:Filesystem \
device=/dev/drbd1000 \
directory="/srv/drbd-nfs/internal" \
fstype=ext4 \
run_fsck=no \
op start interval=0s timeout=60s \
stop interval=0s timeout=60s \
monitor OCF_CHECK_LEVEL=0 interval=15s timeout=40s
sudo pcs -f drbdconf constraint order \
promote p_drbd_ha_nfs-clone then start p_fs_nfs_internal_info_HA
sudo pcs -f drbdconf constraint colocation \
add p_fs_nfs_internal_info_HA with p_drbd_ha_nfs-clone INFINITY with-rsc-role=Promoted
#### policies for the nfs "data share directory"
sudo pcs -f drbdconf resource create p_fs_nfsshare_exports_HA \
ocf:heartbeat:Filesystem \
device=/dev/drbd1001 \
directory="/srv/drbd-nfs/exports/HA" \
fstype=ext4 \
run_fsck=no \
op start interval=0s timeout=60s \
stop interval=0s timeout=60s \
monitor OCF_CHECK_LEVEL=0 interval=15s timeout=40s
sudo pcs -f drbdconf constraint order \
promote p_drbd_ha_nfs-clone then start p_fs_nfsshare_exports_HA
sudo pcs -f drbdconf constraint colocation \
add p_fs_nfsshare_exports_HA with p_drbd_ha_nfs-clone INFINITY with-rsc-role=Promoted
sudo pcs cluster cib-push drbdconf
OPTIONAL: We can run df -h on the cluster machine where the Filesystem resources are started, to verify that Pacemaker has mounted the two file systems backed by the DRBD devices.
df -h
Example Output:
Filesystem Size Used Avail Use% Mounted on
[...]
/dev/drbd1000 18M 60K 16M 1% /srv/drbd-nfs/internal
/dev/drbd1001 3.9G 8.0K 3.7G 1% /srv/drbd-nfs/exports/HA
[...]
After configuring the Pacemaker resource primitives that back the HA NFS share, the DRBD volumes and the file systems mounted on them, we can configure the Pacemaker primitives that serve the NFS share in the cluster. We use the IP address 192.168.0.199 as our NFS cluster IP address.
For the second rule set (exportfs) we define the network range 192.168.0.0/24, so all clients in that network (192.168.0.1 - 192.168.0.254) can access the NFS share.
After that we set no-quorum-policy=ignore and stonith-enabled=false because we are operating the cluster with only two machines.
!!! Only on machine nfsserver1 !!!:
#### policies for the nfs-server
sudo pcs -f drbdconf resource create p_nfsserver ocf:heartbeat:nfsserver \
nfs_shared_infodir=/srv/drbd-nfs/internal/nfs_info_dir \
nfs_ip=192.168.0.199 \
op start interval=0s timeout=40s \
stop interval=0s timeout=20s \
monitor interval=10s timeout=20s
sudo pcs -f drbdconf constraint colocation \
add p_nfsserver with p_fs_nfs_internal_info_HA INFINITY
sudo pcs -f drbdconf constraint order \
p_fs_nfs_internal_info_HA then p_nfsserver
#### policies for the nfs exportfs (which client IP address range can access the NFS share)
sudo pcs -f drbdconf resource create p_exportfs_HA ocf:heartbeat:exportfs \
clientspec=192.168.0.0/24 \
directory=/srv/drbd-nfs/exports/HA fsid=1 \
unlock_on_stop=1 options=rw,sync,no_root_squash \
op start interval=0s timeout=40s \
stop interval=0s timeout=120s \
monitor interval=10s timeout=20s
sudo pcs -f drbdconf constraint order \
p_nfsserver then p_exportfs_HA
sudo pcs -f drbdconf constraint colocation \
add p_exportfs_HA with p_nfsserver INFINITY
sudo pcs cluster cib-push drbdconf
sudo pcs property set no-quorum-policy=ignore
sudo pcs property set stonith-enabled=false
#### OPTIONAL: Check if the configured resources are "started"
sudo pcs status
The next Pacemaker resource primitive to configure in the cluster will create and manage the virtual IP address (192.168.0.199) for the NFS server. Using a virtual IP address makes the NFS server available within the network from a single, unchanging IP address, regardless of which node in the cluster is currently hosting the service. To add the virtual IP address to the Pacemaker-managed resources, enter the following commands:
!!! Only on machine nfsserver1 !!!:
sudo pcs -f drbdconf resource create p_virtip_HA ocf:heartbeat:IPaddr2 \
ip=192.168.0.199 cidr_netmask=24 \
op monitor interval=20s timeout=20s \
start interval=0s timeout=20s \
stop interval=0s timeout=20s
sudo pcs -f drbdconf constraint order \
p_exportfs_HA then p_virtip_HA
sudo pcs -f drbdconf constraint colocation \
add p_virtip_HA with p_exportfs_HA INFINITY
sudo pcs cluster cib-push drbdconf
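OPTIONAL: On the node that currently runs the resources, verify that the virtual IP address has been added to a network interface:
ip -brief address show | grep 192.168.0.199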
The final Pacemaker resource primitive that you need to configure and add to your setup uses the portblock OCF resource agent. Configuring this allows for faster TCP reconnections for clients on failover. The rule uses the virtual IP address (192.168.0.199) of the NFS cluster.
!!! Only on machine nfsserver1 !!!:
sudo pcs -f drbdconf resource create p_pb_unblock ocf:heartbeat:portblock \
action=unblock \
ip=192.168.0.199 \
portno=2049 \
tickle_dir="/srv/drbd-nfs/internal/portblock_tickle_dir" \
reset_local_on_unblock_stop=1 \
protocol=tcp \
op monitor interval=10s timeout=20s
sudo pcs -f drbdconf constraint order \
start p_virtip_HA then p_pb_unblock
sudo pcs -f drbdconf constraint colocation \
add p_pb_unblock with p_virtip_HA INFINITY
sudo pcs cluster cib-push drbdconf
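OPTIONAL: From one of the cluster nodes (or any machine with nfs-common installed), check that the export is visible via the cluster IP address:
showmount -e 192.168.0.199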
On a client machine we install the NFS client utilities and mount the NFS share via the virtual cluster IP address of the NFS cluster (192.168.0.199).
The following commands are run on a client system 🙂:
sudo apt install -y nfs-common
sudo mkdir -p /mnt/HA
echo '192.168.0.199:/srv/drbd-nfs/exports/HA /mnt/HA nfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab
sudo mount -a
# Verify that the share is mounted
mount | grep /mnt/HA
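OPTIONAL: To test a failover, put the node that currently hosts the resources into standby and check that the client can still access the share. The commands below assume pcs 0.10 or newer; older releases use pcs cluster standby instead.
# On the cluster node that currently runs the resources (here assumed to be nfsserver1)
sudo pcs node standby nfsserver1
# On the client, the mount should respond again after a short failover delay
ls /mnt/HA
# Bring the node back into the cluster afterwards
sudo pcs node unstandby nfsserver1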