Posts Tagged ‘clusters’

cfengine, can’t stat in copy and reverse dns

Wednesday, July 9th, 2008

I’ve been using cfengine for a number of years now and thought I had paid my dues when I first took on its steep learning curve. Today I had a run-in with cfengine that left me as frustrated as when I was new to the software. It turned out to be a newbie mistake, one I’m sure I learned about years ago and then forgot when adding a cluster of new hosts to the mix: reverse DNS.
The issue came up while I was configuring a new group of servers. On the final one I simply installed cfengine, then scp’ed over cfagent.conf, cfservd.conf, and update.conf from a host I had just set up successfully. But after running “cfagent -v” I ran into the familiar “Can’t stat /var/lib/cfengine… in copy”, which struck me as odd because the same setup had just worked on all the other hosts. I checked the usual suspects: the grant: section in cfservd.conf (to make sure access was explicitly granted on the server side), the hostname and domain name configured on the client, typos, cfkeys, whether cfservd was started, and so on. Nothing helped, and adding the debug option -d only frustrated me more. As a last resort I took a packet capture of the client-to-server traffic for both the failing system and a working one. I didn’t think it would help much, but after crawling through the capture packet by packet I saw the issue in the data field of one of the packets, which looked something like this…

CAUTH IP IP user - non-working host
CAUTH IP hostname user - working host

This is when the little cfl lightbulb went off in my head and I decided to have a look at reverse DNS. Sure enough, every host had reverse DNS configured except this last one.
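
A quick way to catch this before cfagent does is to check each new host’s address for a reverse mapping. The following is a minimal sketch, not part of cfengine: the function names are made up for illustration, and it assumes getent is available (as on most Linux systems); “dig +short -x” or “host” would do as well.

```shell
#!/bin/sh
# Sketch: report which addresses have a reverse mapping.
# reverse_name and check_reverse_dns are hypothetical helper names.

# Print the name the resolver returns for an address, or nothing.
# "getent hosts <ip>" prints "<ip> <name> [aliases]" when a mapping exists.
# Note: getent follows nsswitch.conf, so an /etc/hosts entry also counts.
reverse_name() {
    getent hosts "$1" | awk '{ print $2; exit }'
}

# Print one line per address: "OK <ip> -> <name>" or "MISSING <ip>".
# Returns non-zero if any address has no reverse mapping.
check_reverse_dns() {
    missing=0
    for ip in "$@"; do
        name=$(reverse_name "$ip")
        if [ -n "$name" ]; then
            echo "OK $ip -> $name"
        else
            echo "MISSING $ip"
            missing=1
        fi
    done
    return "$missing"
}
```

It would be run as, for example, “check_reverse_dns 10.9.8.21 10.9.8.22” (placeholder addresses), ideally on the cfservd machine, since that is the side whose view of the client matters.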

Other actions such as directories, files, and editfiles seemed to run fine without reverse DNS, presumably because they do their work locally. The copy action was failing because authentication under cfservd and the grant directive is based on the domain pattern *.domain.com and not the IP… sheesh… It seems the SkipVerify parameter can be applied globally here to work around hosts that lack reverse DNS. I decided not to use it, since we control the reverse DNS and it really should have been configured; I’m not sure why it was not…
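
To see why a name-based grant cannot match this host, it helps to compare what there is to match against in each case. The sketch below uses the shell’s own glob matching as a stand-in; it is not cfservd’s actual matching code, and matches_grant is a hypothetical helper.

```shell
#!/bin/sh
# Illustration only: shell glob matching stands in for the grant check.
# matches_grant is a hypothetical helper; it succeeds only when the
# identity presented by the client falls under *.domain.com.
matches_grant() {
    case "$1" in
        *.domain.com) return 0 ;;
        *)            return 1 ;;
    esac
}
```

With this, “matches_grant node1.domain.com” succeeds while “matches_grant 10.9.8.7” fails: a host that is only known by its IP has nothing for the pattern to match.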

As soon as I added the reverse DNS entry for the host, cfagent ran without a hitch…

rsync bug

Thursday, April 17th, 2008

The rsync folks released rsync 3.0 last month, followed by a bug-fix release and a security release earlier this month. Unfortunately, after upgrading one of my critical systems that feeds a cluster of about 10 machines, I ran into an ugly little bug that prevented my clustered nodes from pulling their data from the central rsync machine, leaving stale files on the nodes. Here’s the error I saw when running rsync manually:

$ rsync -t 10.9.8.7::module/* /dest
rsync: link_stat "/*" (in module) failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(1515) [receiver=3.0.2]

Instead of the wildcard I tried one specific file, and that worked just fine, so I knew something was up with the wildcard. After a little searching I confirmed my suspicions…

https://bugzilla.samba.org/show_bug.cgi?id=5388
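
Part of what makes this kind of bug easy to miss is where the wildcard gets expanded. With a daemon source such as 10.9.8.7::module/*, the local shell finds no local file matching the pattern, so (with default sh or bash settings) it hands the argument to rsync unchanged and the expansion is left to rsync itself. A minimal sketch to confirm the first half of that; show_args is a made-up helper name:

```shell
#!/bin/sh
# Print each argument exactly as the invoked command would receive it.
# show_args is a hypothetical helper used only for this demonstration.
show_args() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# No local path matches this pattern, so a POSIX shell with default
# settings passes it through as-is instead of expanding it.
show_args 10.9.8.7::module/* /dest
```

The first line printed is the pattern itself, asterisk included, which is what rsync receives on its command line.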

Unfortunately this required a manual patch: at the time of this writing the current release remains unpatched, so a fix is not available via package managers such as apt-get, portage, ports, or yum. Luckily the patching and compilation went very smoothly, as I would expect…

# cd /usr/local/src/
# wget http://samba.org/ftp/rsync/src/rsync-3.0.2.tar.gz
# wget http://samba.org/ftp/rsync/src/rsync-patches-3.0.2.tar.gz
# tar -zxvf rsync-3.0.2.tar.gz
# tar -zxvf rsync-patches-3.0.2.tar.gz
# cd rsync-3.0.2
# patch util.c patches/util.c
# ./configure
# make
# make install
# cp /usr/local/bin/rsync /usr/bin/rsync
# /etc/init.d/rsync restart
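
After an install like this it is worth confirming which rsync is actually being picked up. A small sketch, assuming the first line of “rsync --version” has the form “rsync  version 3.0.2  protocol version 30”; parse_rsync_version is a hypothetical helper:

```shell
#!/bin/sh
# Extract the version number from "rsync --version" output read on stdin.
# Assumes the first line looks like:
#   rsync  version 3.0.2  protocol version 30
# parse_rsync_version is a hypothetical helper name.
parse_rsync_version() {
    awk 'NR == 1 && $2 == "version" { print $3 }'
}
```

It would be used as “rsync --version | parse_rsync_version”, and “cmp /usr/local/bin/rsync /usr/bin/rsync” confirms the two copies are identical. The version number alone does not show whether the patch went in, so rerunning the failing wildcard transfer remains the real test.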

And my clustered nodes are happy once again :).
