Archive for the ‘availability’ Category

The Dan Kaminsky Microsoft DNS Patch Side Effect

Wednesday, July 30th, 2008

So it’s been a few weeks since most of us patched our vulnerable DNS servers, but I hadn’t noticed this little bonus until today, and it actually made me laugh. You see, a few years ago I noticed an annoying little behavior in the way Microsoft’s DNS Server handles outgoing client connections for domains that are listed under the Forwarders tab. We use the Forwarders tab to list frequently queried domains for which we host a copy of the zone file in rbldnsd, so that we don’t have to go out to the internet for the answer. This lets us answer those DNS queries much faster and saves us the extra bandwidth. It is highly beneficial to our mail systems, which process on average 100 million messages per month, mostly spam of course.

Back when we implemented the rbldnsd system, we placed Linux Virtual Server in front of rbldnsd to load balance the traffic across 8 or so machines. After pointing the forwarded domains to the LVS VIP, I expected hundreds, even thousands, of connections to get sprayed across the rbldnsd farm, but uh-uh, nope. There were only two connections listed, to two of the backend servers, yet all the queries were getting answered.

me@director:~$ sudo ipvsadm -L
IP Virtual Server version 1.2.1 (size=32768)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
UDP w.x.y.z:domain wrr
-> server1.domain.com:domain Route 2 0 1
-> server2.domain.com:domain Route 2 0 0
-> server3.domain.com:domain Route 2 0 0
-> server4.domain.com:domain Route 2 0 1
-> server5.domain.com:domain Route 2 0 0
-> server6.domain.com:domain Route 2 0 0
-> server7.domain.com:domain Route 2 0 0
-> server8.domain.com:domain Route 2 0 0

This had me scratching my head at first, and then a few packet captures later I realized that Microsoft was opening one socket and pushing all the forwarded queries through it. Gee whiz, Microsoft! Why would you do such a thing? Since opening and closing sockets carries overhead and could potentially exhaust all available UDP ports in a very short amount of time, I can understand why Microsoft would implement it this way. However, this is exactly the insufficient socket entropy that Dan’s advisory describes as flawed, and from my perspective I hated it because I couldn’t load balance the forwarded DNS queries across all the machines that had rbldnsd running on them.

Luckily rbldnsd wasn’t the primary service on the machines we were load balancing, so after spending a few minutes looking for a workaround and then banging my head on my desk out of frustration, I decided to just let it be. Availability was still guaranteed, and rbldnsd, being as fast and memory efficient as it is, was performing fine in this configuration. I had bigger fish to fry at the time. Fast forward a few years to a Microsoft DNS Server with the Dan Kaminsky patch applied, and voilà, this is what I noticed today…

me@director:~$ sudo ipvsadm -L
IP Virtual Server version 1.2.1 (size=32768)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
UDP w.x.y.z:domain wrr
-> server1.domain.com:domain Route 2 0 264
-> server2.domain.com:domain Route 2 0 258
-> server3.domain.com:domain Route 2 0 256
-> server4.domain.com:domain Route 2 0 252
-> server5.domain.com:domain Route 2 0 250
-> server6.domain.com:domain Route 2 0 252
-> server7.domain.com:domain Route 2 0 252
-> server8.domain.com:domain Route 2 0 252

And this is with a modified UDP timeout of 10 seconds…

me@director:~$ sudo ipvsadm -L --timeout
Timeout (tcp tcpfin udp): 60 10 10

Awesome, entropy, security, and load balancing :). Thanks Dan!
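For anyone wondering why the source port matters so much to the load balancer, here is a rough Python sketch of the idea. It is not how LVS is actually implemented, just a toy model under a few assumptions: the balancer remembers each flow by its source port, hands new flows to backends in plain round-robin order (all weights equal), and never expires an entry. The backend names, the fixed port 53124, and the random port range are made up for illustration, and the sketch uses a single fixed port even though the real setup showed two connections.

```python
import itertools
import random
from collections import Counter

# Hypothetical backend names, all with equal weight.
BACKENDS = [f"server{i}" for i in range(1, 9)]


def balance(source_ports):
    """Assign each query to a backend like a connection-tracking balancer.

    A source port seen for the first time gets the next backend in
    round-robin order; later queries from that port stick to it.
    """
    next_backend = itertools.cycle(BACKENDS)
    flows = {}        # source port -> backend chosen for that flow
    hits = Counter()  # backend -> number of queries it received
    for port in source_ports:
        if port not in flows:
            flows[port] = next(next_backend)
        hits[flows[port]] += 1
    return hits


QUERIES = 10_000
rng = random.Random(2008)  # seeded so the run is repeatable

# Pre-patch behaviour as described in the post: every forwarded
# query leaves from the same source port (port number is made up).
fixed = balance([53124] * QUERIES)

# Post-patch behaviour: each query uses a randomly chosen source port
# (the range here is an assumption, not the range Windows uses).
randomized = balance(rng.randrange(49152, 65536) for _ in range(QUERIES))

print("fixed port:     ", len(fixed), "backend(s) used")
print("randomized port:", len(randomized), "backend(s) used")
```

With a fixed source port every query belongs to the same flow and lands on a single backend, while randomized ports create a steady stream of new flows that the scheduler can spread across all eight machines.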

Ldirectord missing dependency in Debian

Monday, June 23rd, 2008

So I came across this the other day while trying to configure ldirectord to load balance POP3 services.

Can’t locate Mail/POP3Client.pm

It seems there is a missing dependency that I believe is specific to Debian etch. I was a little disappointed, as I’ve had few other issues with LVS and ldirectord, but the fix was easy enough, and I was able to find bug #421415 in Debian’s bug tracking system, so I’m sure I was not the first or the last to run into this. If you run into it, just run apt-get install libmail-pop3client-perl and you should be good to go.

Nagios 3 Released!

Thursday, March 13th, 2008

Nagios 3 was quietly released today! The beta and release candidates have been out for some time with a lot of new features, changes, optimizations, and bug fixes, which is probably why there wasn’t a whole lot of fuss about today’s release.

Now I must say I haven’t spent a whole lot of time with Nagios lately, but hopefully this will give me a reason to get my hands dirty again. A ton of work has gone into this release, so I won’t even begin to list the changes here, but I will point you to the What’s New documentation so you can take a look for yourself. I really can’t wait to get into it.

I also want to send my congrats to the Nagios developers who were involved in this milestone. As a longtime member of the Nagios community, I very much appreciate your work.

Opsview - Nagios at the core…

Saturday, January 26th, 2008

As a die-hard Nagios user ever since the early beta releases, I was excited when I came across some blog postings relating to Nagios and found Opsview late last summer. At first glance I was a bit skeptical, as there have been many offshoot projects of Nagios over the years, as seen in the Nagios Exchange. That’s not to say that those projects aren’t great and/or viable solutions to the shortcomings of Nagios; I’ve got a published project or two registered at Nagios Exchange myself. But after going over some of the documentation for Opsview, I must say I was somewhat impressed, as Opsview seemed to be implementing Nagios right. This project looks to take care of a lot of the nitty-gritty details that many of the projects in the Nagios Exchange attempt to solve. However, as a relatively advanced Nagios user, I wasn’t sure if it would provide the flexibility I needed, so I downloaded the VMware Player image of Opsview to take a look. After poking around for a few minutes it seemed to be very flexible indeed, and I was sold. However, finding the time to actually begin testing the product in a production environment and then replace my current Nagios implementation is another story. This week, I was finally able to get Opsview installed and monitoring a subset of my production network for testing purposes.

Overall, the installation went smoothly, as the documentation is straightforward and using aptitude made the task trivial. During the setup of hosts and services, I did run into some minor issues and was a bit frustrated by the configuration through the web-based interface. I guess I’m just used to configuration files instead of all the pointing and clicking. Hopefully this is where the database comes in, although I am still learning the layout and structure, so point and click will have to do for now. I was able to work through my frustrations and issues and did find some help on the opsview-users mailing list. That’s always a good sign, but it’s still a bit early for me to tell whether I will stick with Opsview and migrate my current Nagios implementation over. Only time will tell.