Posts Tagged ‘ NFS

NFS cluster status and HighlyAvailableNFS

While working on an NFS cluster setup, I stumbled upon these two articles which are maybe helpful for someone:

http://billharlan.com/pub/papers/NFS_for_clusters.html

Saturated network?

$ time dd if=/dev/zero of=testfile bs=4k count=8182
  8182+0 records in
  8182+0 records out
  real    0m8.829s
  user    0m0.000s
  sys     0m0.160s

 

First exercise your disk with your own code or with a simple write operation like writing files should be enough to test network saturation. When profiling reads instead of writes, call umount and mount to flush caches, or the read will seem instantaneous:

$ cd /
$ umount /mnt/test
$ mount /mnt/test
$ cd /mnt/test
$ dd if=testfile of=/dev/null bs=4k count=8192

Check for failures on a client machine with:

  $ nfsstat -c
or
  $ nfsstat -o rpc

If more than 3% of calls are retransmitted, then there are problems with the network or NFS server. Look for NFS failures on a shared disk server with:

  $ nfsstat -s
or
  $ nfsstat -o rpc

It is not unreasonable to expect 0 badcalls. You should have very few “badcalls” out of the total number of “calls.”

Lost packets

NFS must resend packets that are lost by a busy host. Look for permanently lost packets on the disk server with:

$ head -2 /proc/net/snmp | cut -d' ' -f17
  ReasmFails
  2

If you can see this number increasing during nfs activity, then you are losing packets. You can reduce the number of lost packets on the server by increasing the buffer size for fragmented packets:

$ echo 524288 > /proc/sys/net/ipv4/ipfrag_low_thresh
$ echo 524288 > /proc/sys/net/ipv4/ipfrag_high_thresh

This is about double the default.

Server threads

See if your server is receiving too many overlapping requests with:

$ grep th /proc/net/rpc/nfsd
  th 8 594 3733.140 83.850 96.660 0.000 73.510 30.560 16.330 2.380 0.000 2.150

The first number is the number of threads available for servicing requests, and the the second number is the number of times that all threads have been needed. The remaining 10 numbers are a histogram showing how many seconds a certain fraction of the threads have been busy, starting with less than 10% of the threads and ending with more than 90% of the threads. If the last few numbers have accumulated a significant amount of time, then your server probably needs more threads.
Increase the number of threads used by the server to 16 by changing RPCNFSDCOUNT=16 in /etc/rc.d/init.d/nfs

Invisible or stale files

If separate clients are sharing information through NFS disks, then you have special problems. You may delete a file on one client node and cause a different client to get a stale file handle. Different clients may have cached inconsistent versions of the same file. A single client may even create a file or directory and be unable to see it immediately. If these problems sound familiar, then you may want to adjust NFS caching parameters and code multiple attempts in your applications.

 

https://help.ubuntu.com/community/HighlyAvailableNFS

Introduction

 

In this tutorial we will set up a highly available server providing NFS services to clients. Should a server become unavailable, services provided by our cluster will continue to be available to users.

Our highly available system will resemble the following: drbd.jpg