This usually happens on an NFS client in a cluster where rpc.lockd and
rpc.statd (or just lockd and statd on some systems) are not running on both
the NFS server and the NFS client.
Most commercial Unixes install statd and lockd, and since the source for statd
and lockd is widely available, this should not present a problem for most users.
On non-GNU/Linux systems these daemons are required to support reliable NFS file locks.
GNU/Linux does not normally seem to run statd and lockd, which caused a
problem for one user. GNU Queue runs fine on my RedHat boxes, which do seem
to support NFS file locking (it would be most surprising if they didn't!),
so it seems GNU/Linux has moved file-locking support into the kernel.
NFS file locking is required for a large number of commercial and free
applications (SAS, WordPerfect, sendmail, etc.) to run properly, so it's
probably a good idea to be running the daemons anyway.
However, I've suggested a patch that, on systems where fcntl() locking is not
supported, replaces the fcntl() code with lockfile locking (i.e., creating a
file "file.LOCK" in the same directory to indicate that "file" has been
locked). Under NFS, this requires a sleep(4) to ensure synchronization and
safe propagation of the lockfile throughout the cluster, so it is much slower
than using statd and lockd. It is also less reliable: statd and lockd remove
locks when a client reboots, but what's to remove the lockfile if the client
crashes?
The free, popular procmail(1) delivery agent implements lockfile locking over
NFS correctly; anyone wishing to write the patch should consult its source.
Another solution is to put the spooldir on AFS or another high-reliability
network filesystem that supports fcntl() file locking.
A final solution is to eliminate the need for locks and NFS altogether in
Queue and rely only on TCP/IP transmission of job information. This is
planned and in development; please support the developers by offering them
a hand.
werner.krebs@yale.edu