In this guide, I will explain how to set up a high-availability NFS cluster. In total we need to set up two Ubuntu servers.
Components to make NFS highly available: DRBD, Pacemaker (with Corosync and pcs) and the NFS kernel server.
Good documentation can be found on the LINBIT blog: https://linbit.com/blog/highly-available-nfs-targets-with-drbd-pacemaker
First, the hostnames and IP addresses of the two machines are entered in the /etc/hosts file on both machines, so that each node can resolve the other.
The following commands are run on both systems (nfsserver1 and nfsserver2):
echo "192.168.0.200 nfsserver1" | sudo tee -a /etc/hosts
echo "192.168.0.201 nfsserver2" | sudo tee -a /etc/hosts
Install DRBD and related utilities
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo apt install -y drbd-utils pacemaker resource-agents-extra iptables
Install NFS on Ubuntu
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo apt install -y nfs-kernel-server nfs-common
Stop and disable the NFS server and its related services in systemd, because Pacemaker will control these.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo systemctl disable --now nfs-kernel-server.service
Enable pcsd.service, pacemaker.service and corosync.service
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo systemctl enable pcsd --now
sudo systemctl enable pacemaker --now
sudo systemctl enable corosync --now
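OPTIONAL: Check that the three services are running on each machine:
sudo systemctl is-active pcsd pacemaker corosync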
Change the password for the hacluster user
The following commands are run on both systems (nfsserver1 and nfsserver2):
echo 'hacluster:secretpassword' | sudo chpasswd
Create the pacemaker cluster with the following commands:
!!! Only on machine nfsserver1 !!!:
sudo pcs host auth -u hacluster -p secretpassword nfsserver1 addr=192.168.0.200 nfsserver2 addr=192.168.0.201
sudo pcs cluster setup --force linbit-cluster nfsserver1 addr=192.168.0.200 nfsserver2 addr=192.168.0.201
sudo pcs cluster start --all
OPTIONAL: Verifying the pacemaker cluster services and state
sudo pcs status
Now we create the LVM physical volume, volume group and directories for the NFS share. We use the second hard drive on each system (50 GB), in most cases /dev/sdb.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo pvcreate /dev/sdb
sudo vgcreate nfs_vg /dev/sdb
Now we create two logical volumes.
The first logical volume will be for storing NFS stateful connection information and the “tickle” directory used by the portblock OCF (Open Cluster Framework) resource agent. If the NFS stateful connection is not highly available or otherwise synchronized between cluster nodes, then in some failover cases, it might take a long time for NFS exports to become available. This volume will not hold much data and 20M can be a sufficient size.
The second volume will store data that we will share by using NFS.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo lvcreate -L 20M -n ha_nfs_internal_lv nfs_vg
sudo lvcreate -l 100%FREE -n ha_nfs_exports_lv nfs_vg
sudo mkdir -p /srv/drbd-nfs/exports/HA
sudo mkdir -p /srv/drbd-nfs/internal
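OPTIONAL: Verify the LVM layout before continuing:
sudo pvs
sudo vgs nfs_vg
sudo lvs nfs_vg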
Now we create a DRBD resource configuration file on each system and then initialize the DRBD resource.
The following commands are run on both systems (nfsserver1 and nfsserver2):
sudo bash -c 'cat <<EOF > /etc/drbd.d/ha_nfs.res
resource "ha_nfs" {
volume 0 {
device "/dev/drbd1000";
disk "/dev/nfs_vg/ha_nfs_internal_lv";
meta-disk internal;
}
volume 1 {
device "/dev/drbd1001";
disk "/dev/nfs_vg/ha_nfs_exports_lv";
meta-disk internal;
}
on "nfsserver1" {
address 192.168.0.200:7788;
}
on "nfsserver2" {
address 192.168.0.201:7788;
}
}
EOF'
sudo drbdadm create-md ha_nfs
sudo drbdadm up ha_nfs
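OPTIONAL: Check the state of the DRBD resource. Before the initial sync, both nodes typically report Secondary with Inconsistent data:
sudo drbdadm status ha_nfs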
Next we create file systems on the two DRBD devices and promote one of the machines to primary.
!!! Only on machine nfsserver1 !!!:
sudo drbdadm primary --force ha_nfs
# Watch the sync process with the following command and wait for the sync to complete
watch -n 1 cat /proc/drbd
sudo mkfs.ext4 /dev/drbd1000
sudo mkfs.ext4 /dev/drbd1001
sudo mount /dev/drbd1000 /srv/drbd-nfs/internal
sudo mkdir /srv/drbd-nfs/internal/nfs_info_dir
sudo mkdir /srv/drbd-nfs/internal/portblock_tickle_dir
sudo umount /dev/drbd1000
First we save the cluster configuration (CIB) to a file named drbdconf. Then we create the first Pacemaker resource: if one machine in the Pacemaker cluster crashes, Pacemaker promotes the remaining operational machine to be the primary DRBD node. The last command pushes the updated drbdconf CIB file back into the cluster.
!!! Only on machine nfsserver1 !!!:
sudo pcs cluster cib drbdconf
sudo pcs -f drbdconf resource create p_drbd_ha_nfs ocf:linbit:drbd \
drbd_resource=ha_nfs \
op start interval=0s timeout=40s \
stop interval=0s timeout=100s \
monitor interval=31s timeout=20s role=Unpromoted \
monitor interval=29s timeout=20s role=Promoted
sudo pcs -f drbdconf resource promotable p_drbd_ha_nfs \
promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true
sudo pcs cluster cib-push drbdconf
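OPTIONAL: Verify that Pacemaker now manages the DRBD resource and has promoted it on one of the nodes:
sudo pcs status
sudo drbdadm status ha_nfs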
Now we need to configure file system primitives in Pacemaker so that the file systems backing the NFS share are only mounted on the node that is in the primary role for the backing DRBD resource. The file system primitives are based on the Filesystem OCF resource agent, and colocation and order constraints ensure this behaviour.
Because the DRBD resource has two volumes, we need to configure two file system primitives in Pacemaker.
Enter the following commands to configure the file system primitives in Pacemaker, one for the NFS "stateful information directory" and one for the "data share directory":
!!! Only on machine nfsserver1 !!!:
#### policies for the nfs "stateful information directory"
sudo pcs -f drbdconf resource create p_fs_nfs_internal_info_HA ocf:heartbeat:Filesystem \
device=/dev/drbd1000 \
directory="/srv/drbd-nfs/internal" \
fstype=ext4 \
run_fsck=no \
op start interval=0s timeout=60s \
stop interval=0s timeout=60s \
monitor OCF_CHECK_LEVEL=0 interval=15s timeout=40s
sudo pcs -f drbdconf constraint order \
promote p_drbd_ha_nfs-clone then start p_fs_nfs_internal_info_HA
sudo pcs -f drbdconf constraint colocation \
add p_fs_nfs_internal_info_HA with p_drbd_ha_nfs-clone INFINITY with-rsc-role=Promoted
#### policies for the nfs "data share directory"
sudo pcs -f drbdconf resource create p_fs_nfsshare_exports_HA \
ocf:heartbeat:Filesystem \
device=/dev/drbd1001 \
directory="/srv/drbd-nfs/exports/HA" \
fstype=ext4 \
run_fsck=no \
op start interval=0s timeout=60s \
stop interval=0s timeout=60s \
monitor OCF_CHECK_LEVEL=0 interval=15s timeout=40s
sudo pcs -f drbdconf constraint order \
promote p_drbd_ha_nfs-clone then start p_fs_nfsshare_exports_HA
sudo pcs -f drbdconf constraint colocation \
add p_fs_nfsshare_exports_HA with p_drbd_ha_nfs-clone INFINITY with-rsc-role=Promoted
sudo pcs cluster cib-push drbdconf
OPTIONAL: We can run df -h on the cluster machine where the Filesystem resources are started, to verify that Pacemaker has mounted the two file systems backed by the DRBD devices.
df -h
Example Output:
Filesystem Size Used Avail Use% Mounted on
[...]
/dev/drbd1000 18M 60K 16M 1% /srv/drbd-nfs/internal
/dev/drbd1001 3.9G 8.0K 3.7G 1% /srv/drbd-nfs/exports/HA
[...]
After configuring the Pacemaker resource primitives that back the HA NFS share, the DRBD volumes and the file systems mounted on them, we can configure the Pacemaker primitives that serve the NFS share in the cluster. We use the IP address 192.168.0.199 as our NFS cluster IP address.
For the second rule set (exportfs) we define the network range 192.168.0.0/24, so all clients in that network (192.168.0.1 - 192.168.0.254) can access the NFS share.
After that we set no-quorum-policy=ignore and stonith-enabled=false because we are operating the cluster with only two machines.
!!! Only on machine nfsserver1 !!!:
#### policies for the nfs-server
sudo pcs -f drbdconf resource create p_nfsserver ocf:heartbeat:nfsserver \
nfs_shared_infodir=/srv/drbd-nfs/internal/nfs_info_dir \
nfs_ip=192.168.0.199 \
op start interval=0s timeout=40s \
stop interval=0s timeout=20s \
monitor interval=10s timeout=20s
sudo pcs -f drbdconf constraint colocation \
add p_nfsserver with p_fs_nfs_internal_info_HA INFINITY
sudo pcs -f drbdconf constraint order \
p_fs_nfs_internal_info_HA then p_nfsserver
#### policies for the nfs exportfs (which client IP address range can access the NFS share)
sudo pcs -f drbdconf resource create p_exportfs_HA ocf:heartbeat:exportfs \
clientspec=192.168.0.0/24 \
directory=/srv/drbd-nfs/exports/HA fsid=1 \
unlock_on_stop=1 options=rw,sync,no_root_squash \
op start interval=0s timeout=40s \
stop interval=0s timeout=120s \
monitor interval=10s timeout=20s
sudo pcs -f drbdconf constraint order \
p_nfsserver then p_exportfs_HA
sudo pcs -f drbdconf constraint colocation \
add p_exportfs_HA with p_nfsserver INFINITY
sudo pcs cluster cib-push drbdconf
sudo pcs property set no-quorum-policy=ignore
sudo pcs property set stonith-enabled=false
#### OPTIONAL: Check if the configured resources are "started"
sudo pcs status
The next Pacemaker resource primitive to configure in the cluster will create and manage the virtual IP address (192.168.0.199) for the NFS server. Using a virtual IP address makes the NFS server available within the network from a single, unchanging IP address, regardless of which node in the cluster is currently hosting the service. To add the virtual IP address to the Pacemaker-managed resources, enter the following commands:
!!! Only on machine nfsserver1 !!!:
sudo pcs -f drbdconf resource create p_virtip_HA ocf:heartbeat:IPaddr2 \
ip=192.168.0.199 cidr_netmask=24 \
op monitor interval=20s timeout=20s \
start interval=0s timeout=20s \
stop interval=0s timeout=20s
sudo pcs -f drbdconf constraint order \
p_exportfs_HA then p_virtip_HA
sudo pcs -f drbdconf constraint colocation \
add p_virtip_HA with p_exportfs_HA INFINITY
sudo pcs cluster cib-push drbdconf
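OPTIONAL: On the node that currently runs the resources, verify that the virtual IP address has been added to a network interface:
ip -brief address show | grep 192.168.0.199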
The final Pacemaker resource primitive that you need to configure and add to your setup uses the portblock OCF resource agent. Configuring this allows for faster TCP reconnections for clients on failover. The rule uses the virtual IP address (192.168.0.199) of the NFS cluster.
!!! Only on machine nfsserver1 !!!:
sudo pcs -f drbdconf resource create p_pb_unblock ocf:heartbeat:portblock \
action=unblock \
ip=192.168.0.199 \
portno=2049 \
tickle_dir="/srv/drbd-nfs/internal/portblock_tickle_dir" \
reset_local_on_unblock_stop=1 \
protocol=tcp \
op monitor interval=10s timeout=20s
sudo pcs -f drbdconf constraint order \
start p_virtip_HA then p_pb_unblock
sudo pcs -f drbdconf constraint colocation \
add p_pb_unblock with p_virtip_HA INFINITY
sudo pcs cluster cib-push drbdconf
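OPTIONAL: From one of the cluster nodes (or any machine with nfs-common installed), check that the export is visible via the cluster IP address:
showmount -e 192.168.0.199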
On a client machine we install the NFS client utilities and mount the NFS share via the virtual cluster IP address of the NFS cluster (192.168.0.199).
The following commands are run on a client system 🙂:
sudo apt install -y nfs-common
sudo mkdir -p /mnt/HA
echo '192.168.0.199:/srv/drbd-nfs/exports/HA /mnt/HA nfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab
sudo mount -a
# Verify that the share is mounted
mount | grep /mnt/HA
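OPTIONAL: To test a failover, put the node that currently hosts the resources into standby and check that the client can still access the share. The commands below assume pcs 0.10 or newer; older releases use pcs cluster standby instead.
# On the cluster node that currently runs the resources (here assumed to be nfsserver1)
sudo pcs node standby nfsserver1
# On the client, the mount should respond again after a short failover delay
ls /mnt/HA
# Bring the node back into the cluster afterwards
sudo pcs node unstandby nfsserver1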