Ten Ways to Solve DNS Problems (or: the web is amazing)

So I wrote about my woes with DNS, bemoaning how our VPS provider GleSYS's DNS servers were not performing well enough. As usual with the web, I was blown away by the feedback; not only did I get over a dozen tips on what to do, GleSYS themselves chimed in to say they've fixed the problem.

Either that's a PR move on their part, or their technicians are very attentive. I'd like to think the latter. So without further ado, here are the ten ways in which to solve the case of the slow DNS look-up:

There are of course pros and cons to every single one of these options above, and I'll just quickly address some obvious questions.

First up, BIND. As much as I love ISC software, BIND feels a little too heavy-duty for a one-off thing like this.

djbdns is, I'm sure, quality software too; here the problem is deployment. For djbdns, "integrating with the OS" means "write your own rc replacement and shove it down people's throats". I refer of course to the bane that is daemontools. I gave it a shot with qmail, never ever again.

As for OpenDNS and Google Public DNS, I'd have to benchmark them over a week or so to know what to think of them. However I'd much prefer to do business with people who will be accountable for downtime.

By far the most interesting of them is Unbound, because of what it says on the box: a lightweight caching DNS server.

For now it looks like GleSYS have fixed things on their end; if this becomes a problem again, it might be better to change VPS provider.


GleSYS, Y U NO DNS?

... or why DNS lookups are a dangerous thing.

At my current employer we specialize in making campaigns, and this particular one is a Facebook Canvas type of thing, meaning we talk to the Facebook API.

It turns out though, one day after launching the campaign, that the local DNS resolver is sometimes unable to resolve the name facebook.com or graph.facebook.com in a timely fashion.

Looking into the matter I wrote a script for benchmarking the performance of socket.gethostbyaddr(), for your convenience as well as future reference:

#!/usr/bin/env python2.6

import sys, time, socket

ts = []
def test_host(h):
    t0 = time.time()
    try:
        socket.gethostbyaddr(h)
    except:
        print "resolve failed", repr(h)
    ts.append(time.time() - t0)

def avg(L): return sum(L)/float(len(L))
def med(L):
    L=list(sorted(L))
    if len(L)&1:
        return L[int(len(L)/2)]
    else:
        return (L[int(len(L)/2)-1]+L[int(len(L)/2)])/2.0

t0 = time.time()
test_host("facebook.com")
test_host("www.facebook.com")
test_host("graph.facebook.com")
test_host("api.facebook.com")
test_host("api-read.facebook.com")
test_host("api-video.facebook.com")
print "started %.2f, completed in %.2f" % (t0, time.time() - t0)
print "slowest %.4f, fastest %.4f" % (max(ts), min(ts))
print "median %.4f, average %.4f" % (med(ts), avg(ts))

We use GleSYS for our VPS needs, which is a common provider in Sweden. Guess what their DNS performance looks like? Sometimes it takes up to 40 seconds for them to resolve facebook.com, when two seconds earlier they could answer the query in under 1ms.

For now I just chucked the relevant hostnames into /etc/hosts, so: I could use a tip on a lightweight recursive DNS server! (Not BIND or djbdns.)


RSS 2.0