So you're a joker?

I read an article by some guy called SM on the subject of jokers. He says the world is full of jokers: people who talk a lot but do little.

I am a fuck-up at my current workplace - I handle sick leaves poorly, I show up for work five minutes late rather than five minutes early; I am a fuck-up at house chores - I rarely do the dishes, laundry is everywhere, cleaning is the last thing I think about; I sometimes fuck up with friends - I miss out on keeping in touch, I borrow money and forget about it, I hit on some poor guy's ex, the list goes on.

I am not a fuck-up in my true nature, in fact I'm probably more of an over-zealous Asperger kid inside. I don't give up before it's too late, and I find a way when I need to. I move heaven and earth, as SM puts it.

At first the logic seems counter-intuitive, but really it's an age-old problem: you have an infinite set of chores and a limited rate of chore churning. How do you balance the workload? What do you do well, what do you half-ass, and what do you not do at all? More often than not, there is a conflict of interest between the various aspects of life. You have to call the shots.

The todo list is the only way to avoid being a joker. You will have to defer tasks. That's just reality. You will sometimes defer tasks up to a point where you realize, "ah man wish I was going to do this but I'm not." That's not being a joker, that's just you being rational.

So while I agree that it's a good thing to go into tunnel vision mode and just churn out a product in no time, it's also not a viable lifestyle. SM makes it seem as if the only way to live is 150% speed all the time and get rich.

Call me complicated, but I want more out of life than that. If what it takes to make piles of money is complete tunnel vision, then I shall have none of it. Let me sit smug-faced in my middle-class bed and enjoy life before it flashes me by.


Ten Ways to Solve DNS Problems (or: the web is amazing)

So I wrote about my woes with DNS, bemoaning how our VPS provider GleSYS's DNS servers were not performing well enough. As usual with the web, I was blown away by the feedback; not only did I get over a dozen tips on what to do, GleSYS themselves chimed in to say they've fixed the problem.

Either that's a PR move on their part, or their technicians are very attentive. I'd like to think the latter. So without further ado, here are the ten ways in which to solve the case of the slow DNS look-up:

There are of course pros and cons to every single one of these options above, and I'll just quickly address some obvious questions.

First up, BIND. As much as I love ISC software, BIND feels a little too heavy-duty for a one-off thing like this.

djbdns is, I'm sure, quality software too; here the problem is deployment. For djbdns, "integrating with the OS" means "write your own rc replacement and shove it down people's throats". I refer of course to the bane that is daemontools. I gave it a shot with qmail, never ever again.

As for OpenDNS and Google Public DNS, I'd have to benchmark them over a week or so to know what to think of them. However I'd much prefer to do business with people who will be accountable for downtime.

By far the most interesting of them is Unbound, because of what it says on the box: a lightweight caching DNS server.

For now it looks like GleSYS have fixed things on their end; if this becomes a problem again, it might be better to change VPS provider.


GleSYS, Y U NO DNS?

... or why DNS lookups are a dangerous thing.

At my current employer we specialize in making campaigns, and this particular one is a Facebook Canvas type of thing, meaning we talk to the Facebook API.

It turns out though, one day after launching the campaign, that the local DNS resolver is sometimes unable to resolve the name facebook.com or graph.facebook.com in a timely fashion.

Looking into the matter I wrote a script for benchmarking the performance of socket.gethostbyaddr(), for your convenience as well as future reference:

#!/usr/bin/env python2.6

import sys, time, socket

ts = []
def test_host(h):
    t0 = time.time()
    try:
        socket.gethostbyaddr(h)
    except socket.error:
        print "resolve failed", repr(h)
    ts.append(time.time() - t0)

def avg(L): return sum(L)/float(len(L))
def med(L):
    L=list(sorted(L))
    if len(L)&1:
        return L[int(len(L)/2)]
    else:
        return (L[int(len(L)/2)-1]+L[int(len(L)/2)])/2.0

t0 = time.time()
test_host("facebook.com")
test_host("www.facebook.com")
test_host("graph.facebook.com")
test_host("api.facebook.com")
test_host("api-read.facebook.com")
test_host("api-video.facebook.com")
print "started %.2f, completed in %.2f" % (t0, time.time() - t0)
print "slowest %.4f, fastest %.4f" % (max(ts), min(ts))
print "median %.4f, average %.4f" % (med(ts), avg(ts))

We use GleSYS for our VPS needs, which is a common provider in Sweden. Guess what their DNS performance looks like? Sometimes it takes up to 40 seconds for them to resolve facebook.com, when two seconds earlier they could answer the query in under 1ms.

For now I just chucked the relevant hostnames into /etc/hosts, so: I could use a tip on a lightweight recursive DNS server! (Not BIND or djbdns.)
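
In the same stop-gap spirit as the /etc/hosts hack, the application itself could remember answers for a while. Here's a minimal sketch in Python 3 (the script above is Python 2.6). The class name CachingResolver, the 300-second lifetime and the injectable resolve/clock arguments are all made up for illustration, and unlike a real caching resolver it ignores the TTL the DNS server hands out:

```python
import socket
import time


class CachingResolver(object):
    """Remember successful lookups for `ttl` seconds (hypothetical helper)."""

    def __init__(self, resolve=socket.gethostbyname, ttl=300.0, clock=time.time):
        self.resolve = resolve
        self.ttl = ttl
        self.clock = clock
        self.cache = {}  # hostname -> (expiry time, address)

    def lookup(self, host):
        now = self.clock()
        hit = self.cache.get(host)
        if hit is not None and hit[0] > now:
            return hit[1]          # still fresh, no DNS round trip
        addr = self.resolve(host)  # may block or raise, just like before
        self.cache[host] = (now + self.ttl, addr)
        return addr
```

The first lookup of a name is still at the mercy of the upstream resolver; only the repeats get cheap. That's the same trade-off a local caching server makes, just without helping any other process on the box.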


Force a Git branch to remain merged with master

At my workplace, we decided we should have two branches that are automatically rolled out on the development and production servers respectively. So I set out to ensure that developers first make sure the master branch works, by rejecting any push to those branches that isn't already in master. I thought the end result would be useful to others, so here it is:

#!/usr/bin/env python2.6

# assert that updating refs/heads/dev or refs/heads/prod is not possible
# without first putting that commit into the ancestry of refs/heads/master.

import sys
import subprocess

master_ref = "refs/heads/master"
checked_refs = ("refs/heads/dev", "refs/heads/prod")

def git_merge_base(a, b):
    "find the best (most recent) common ancestor of a, b"
    args = ["git", "merge-base", a, b]
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=sys.stderr)
    # communicate() drains stdout before waiting, so the pipe cannot fill up
    out, _ = p.communicate()
    if p.returncode != 0:
        sys.stderr.write("git-merge-base exited %d\n" % p.returncode)
        sys.exit(128)
    return out.strip()

def check(cid):
    # cid is in master's ancestry exactly when merge-base(cid, master) == cid
    base = git_merge_base(cid, master_ref)
    if base != cid:
        sys.stderr.write("%s is not an ancestor of %s\n"
                         "%s diverges at %s\n"
                         % (cid, master_ref, cid, base))
        sys.exit(1)
    return True

if __name__ == "__main__":
    for line in sys.stdin:
        (old, new, refname) = line.strip().split(" ", 2)
        if refname in checked_refs:
            check(new)
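
For what it's worth, newer versions of Git (1.8.0 and later, as far as I know) can answer the ancestry question directly with git merge-base --is-ancestor, which exits 0 when the first commit is an ancestor of the second. A rough Python 3 sketch of the same check built on that; the function names and the injectable call argument are my own inventions for illustration, not part of the hook above:

```python
import subprocess

CHECKED_REFS = ("refs/heads/dev", "refs/heads/prod")


def parse_updates(lines):
    """Yield (old, new, refname) from pre-receive style input lines."""
    for line in lines:
        line = line.strip()
        if line:
            old, new, refname = line.split(" ", 2)
            yield old, new, refname


def is_ancestor(commit, ref, call=subprocess.call):
    """True if `commit` is reachable from `ref` (git exits with status 0)."""
    return call(["git", "merge-base", "--is-ancestor", commit, ref]) == 0


def rejected(lines, master="refs/heads/master", call=subprocess.call):
    """Return the updates to checked refs that are not yet in master."""
    return [(new, refname)
            for (old, new, refname) in parse_updates(lines)
            if refname in CHECKED_REFS and not is_ancestor(new, master, call)]
```

A hook would feed sys.stdin to rejected() and exit non-zero if the list comes back non-empty. Note that this sketch treats any non-zero exit status as "not an ancestor", including genuine git errors.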

Warts of Python: Ternary Expressions

The problem with Python's ternary operator is that it breaks up the two contrasting values on two opposite sides of the expression.

To break it up into how I myself work with it, let's look at an example that made me opt for a hack.

form.data["type"] = "business" if request.form.get("business") else "private"

And the hack,

form.data["type"] = ("private", "business")[bool(request.form.get("business"))]

The reason is obvious: the above consolidates the literal data to one side of the expression, making it easier to follow the code. (Apart from the occasional guy who doesn't know about the perverted powers of indexing by boolean values!)

The only time the Python ternary works is with very simple conditions, no strike that -- with very short conditions. Complexity has nothing to do with it.

business = bool(request.form.get("business"))
form.data["type"] = "business" if business else "private"
##
get = request.form.get
form.data["type"] = "business" if get("business") else "private"
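
To convince yourself the spellings really are interchangeable, here's a small self-contained sketch. The function account_type and the plain dict standing in for request.form are stand-ins of mine, and the dict-lookup variant is one more alternative thrown in for comparison:

```python
def account_type(form):
    """Three spellings of the same choice; `form` is any dict-like object."""
    flag = bool(form.get("business"))
    ternary = "business" if flag else "private"
    indexed = ("private", "business")[flag]              # False -> 0, True -> 1
    mapped = {True: "business", False: "private"}[flag]  # no ordering to remember
    assert ternary == indexed == mapped
    return ternary
```

One caveat with the indexing and dict variants: both values are evaluated before the choice is made, whereas the ternary only evaluates the branch it picks. With two string literals that doesn't matter; with function calls it does.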

So I submit that the BDFL made a mistake. The ternary expression in Python sucks.


pylibmc 1.2.1 released

pylibmc 1.2.1 has been released!

Bug fix release, mitigates issues with Python 2.5 support, tests and NUL byte keys.

Some highlights from 1.2.0:

  • Fixed GIL issues
  • CAS support (ketralnis)
  • SASL authentication (Remoun)
  • Added auto_eject, num_replicas, remove_failed behaviors
  • Improved tests
  • Published documentation source

Perhaps you considered Mac OS X 10.7 Lion

While I appreciate a new version as much as the next tech guy, I wouldn't recommend that anybody update yet.

  • it's slow - connecting to 127.0.0.1:8080 took me 62 seconds, mundane tasks like cmd-tabbing lag sometimes, etc. Something's off.
  • exaggerated animations - due to their graphical intensity, they have perceptible rendering times... in a desktop UI. Does not go.
  • extremely cumbersome to use without a trackpad - means you sit with the computer in your lap rather than plug in external keyboards etc.
  • they inverted the scroll direction. What the hell, man. "Natural scrolling"? Sheer gimmick value.
  • Launchpad... I'm not sure why you would prefer a modal interface that only allows dragging as mode of operation. Gimmick again.
  • Mission Control seemed like a good idea but it turns out you can't rearrange spaces, nor rename them - so rather "Mission Happenstance" than Control (and don't get me started on the auto-rearrange feature)
  • you can now scroll past the beginning and end of the content, and very few applications are designed for this leading to UIs looking goofy
  • "Press and hold" disables keyboard repeat... for most keys. (The feature itself only works on Apple's own layouts, and no document has been published detailing how to implement it for your own keyboard layouts.)
  • Browser back/forward with three-finger swipes no longer works, as Apple decided to use them for Spaces - you have to use the keyboard or nav buttons, unless you happen to use Safari, which now uses horizontal two-finger swipes
And on the plus side...
  • refreshing new UI look
  • fullscreen now integrates with Spaces
  • umm... I guess the new UI is pretty neat, yeah

pylibmc 1.2.0

pylibmc 1.2.0 is out!

This release is for the people behind reddit.com, for helping push development forward. Keep doing your thing.

  • semver.org versioning scheme
  • Fixed GIL issues
  • Added CAS support (ketralnis)
  • Added SASL authentication (Remoun)
  • Added more detail to errors (spladug)
  • Added mapping-like behavior for clients
  • Fixed build errors on Mac OS X
  • Moved to nose for testing
  • Added auto_eject behavior
  • Added num_replicas behavior
  • Added remove_failed behavior
  • Removed cache_lookups behavior
  • Improved repr of clients (noah256)
  • Improved IPv6 support (JshWright)
  • Improved pooling behavior so it doesn't cause lock-ups
  • Improved tests and testing foundation
  • Improved documentation and structure
  • Internalized Sphinx documentation
  • Bunch of other stuff

Scaling Web Applications & Tower Defenses

If I were to try to teach a 15-year-old how to scale well on the Internet, I would let him play some Warcraft III tower defense map.

The analogy is pretty striking really: you have a more or less constant stream of visitors (creeps), and your goal is to not only meet the performance needs (deal enough damage), but also one of efficiency: how much gold do you need to deal enough damage to kill all the creeps -- or how many dollars do you need to perform well enough to handle all the traffic.

I'm not going to tell you how, because the TD analogy also lifts an interesting aspect: unless you know how the game works and what the alternatives are, you're going to suck at it. Perhaps less than some, probably more than others, but you're going to suck nonetheless.

Thought I'd let you know.


Qmail woes

I wouldn't recommend that anyone ever use qmail, no matter how beautiful the architecture might strike you. It's confusing, it takes a solid two hours to even find a document to read, and there doesn't seem to be much community support for it at all. Most of the time you're left to your own devices with a--excuse my saying so--retarded mail server.

Maybe you know about Transport Layer Security, or TLS for short. It's a common enough thing; SMTP servers are known for their use of TLS. Well, Qmail probably isn't, because the only TLS support there is is a huge patch whose website is presently down.

As it turns out, one of these days my TLS support had just given up. Qmail doesn't tell you though, oh no -- not at all. You have to dump the I/O it does with Gmail to know anything is going wrong.

If I were you, and I was using Qmail with TLS support (like I do), then I'd check if my SMTP server goes wonky after I send STARTTLS. Mine did, and didn't tell me.
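
If you'd rather not eyeball a packet dump, the check can be scripted with Python's smtplib. A minimal Python 3 sketch, where the function name check_starttls and the smtp_factory argument (there so it can be exercised without a real server) are hypothetical names of mine:

```python
import smtplib


def check_starttls(host, port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Return (ok, detail) describing whether STARTTLS really works."""
    conn = smtp_factory(host, port, timeout=timeout)
    try:
        conn.ehlo()
        if not conn.has_extn("starttls"):
            return (False, "server does not advertise STARTTLS")
        try:
            conn.starttls()  # the TLS upgrade itself
            conn.ehlo()      # the session must survive the upgrade
        except (smtplib.SMTPException, OSError) as e:
            return (False, "advertised but broken: %r" % (e,))
        return (True, "STARTTLS negotiated")
    finally:
        conn.close()
```

Run against your own MX from cron, something like this would at least turn a silently advertised-but-broken STARTTLS into a complaint you get to see.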

Why does this matter? Well, as it turns out, Gmail doesn't deliver messages when TLS fails like this, and I can't really blame them because my server advertised support for TLS.

So in short: stay away from Qmail, don't accept patches from strangers and don't wake up too late.


RTNETLINK complaining about "No buffer space available" (also SIOCSIFADDR)

I'm writing this to save somebody a headache.

My router does IPv4 and IPv6, and has been doing so for a while. I started noticing that when connecting to my own server from my own home, it would take a while. I whittled it down to it trying to connect with IPv6 first, failing and then falling back to IPv4 which worked fine.

Now, why had my IPv6 died? To cut a long story short, here's why:

router # ip link show eth1
4: eth1:  mtu 576 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:bf:21:b2:52 brd ff:ff:ff:ff:ff:ff

Look closely. Closer. The MTU is 576. No buffer space? Well, that just means the MTU is too low for IPv6! The error message couldn't have been more clear! Err...

router # ip link set eth1 mtu 1500

BAM! Now it works again.

So why was the MTU set to 576? dhcpcd set it to that value! A-ha. Apparently this also happens to dhclient, and I guess the two are intertwined in some weird open-source way. My more persistent solution?

router # unlink /lib/dhcpcd/dhcpcd-hooks/10-mtu

BAM! Now it won't break for a while.
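
Since the failure is silent, a tiny watchdog might be worth having. IPv6 requires a link MTU of at least 1280 bytes, and Linux exposes each interface's MTU under /sys/class/net. A Python 3 sketch under those assumptions; the function names are mine and the sysfs_root argument only exists so it can be pointed somewhere else for testing:

```python
import os

IPV6_MIN_MTU = 1280  # minimum link MTU that IPv6 requires


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Read the MTU Linux reports for `iface` via sysfs."""
    with open(os.path.join(sysfs_root, iface, "mtu")) as f:
        return int(f.read().strip())


def ipv6_capable(mtu):
    """True if an interface with this MTU can carry IPv6 at all."""
    return mtu >= IPV6_MIN_MTU
```

On my router, ipv6_capable(read_mtu("eth1")) would have come back False with the MTU at 576, which is a good deal more to the point than "No buffer space available".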


Google App Engine disregards Accept-Encoding

$ curl -H 'Accept-Encoding: gzip' -A 'Random/5.0' http://url |file -
/dev/stdin: ASCII text, with very long lines, with no line terminators
$ curl -H 'Accept-Encoding: gzip' -A 'Random/5.0 gzip' http://url |file -
/dev/stdin: gzip compressed data, max compression

Google, in their infinite wisdom, have decided that the Accept-Encoding header is essentially useless.

Why, I don't know, but if you deploy with Google App Engine and rightfully expect their boundary proxies to compress your data for you, then you have to make sure your clients' User-Agent is either whitelisted or contains the magic four bytes gzip. (So if your client happens to be named "tagzipper" for example, Google will gzip for you.)

In their defense this is documented behavior, but it doesn't make it any less quirky. As it turns out, we hit the billed quota we set because our clients aren't including "gzip" in their UA strings... D'oh.

Might be good to know!
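
If your client is written in Python, the workaround is a one-liner in the headers. A minimal Python 3 sketch using urllib.request; the helper names and the "Random/5.0" agent string (borrowed from the curl example) are placeholders, not anything App Engine prescribes:

```python
import gzip
import urllib.request


def build_request(url, agent="Random/5.0"):
    """Build a request whose User-Agent carries the magic 'gzip' token."""
    return urllib.request.Request(url, headers={
        "Accept-Encoding": "gzip",
        "User-Agent": agent + " gzip",
    })


def decode_body(body, content_encoding):
    """Undo gzip if the response says it was applied."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(body)
    return body
```

You'd pass the request to urllib.request.urlopen() and hand the raw body plus the response's Content-Encoding header to decode_body(), since urllib doesn't decompress for you.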


Graphviz's dot: PNG not recognized

This is a post not only to relieve some frustration, but also to help some other poor fellow with this problem as I think it will inevitably crop up.

I just wanted acyclic graphs in my Sphinx documents. That was it.

So, I installed Graphviz with Brew, expecting it to work... But no:

~ $ dot -Tpng
Format: "png" not recognized. Use one of: canon cmap cmapx cmapx_np
dot eps fig gv imap imap_np ismap plain plain-ext ps ps2 svg svgz tk vml vmlz
xdot

I installed it on my own. I varied the configuration. Everything. I searched mailing lists. Then, I found a discussion saying Graphviz cannot be relocated without running dot -c -- hmm:

~ $ sudo dot -c
Warning: Could not load "/usr/local/Cellar/graphviz/2.26.3/lib/graphviz/libgvplugin_gd.6.dylib"
 - file not found
Warning: Could not load "/usr/local/Cellar/graphviz/2.26.3/lib/graphviz/libgvplugin_gd.6.dylib"
 - file not found
~ $ ls "/usr/local/Cellar/graphviz/2.26.3/lib/graphviz/libgvplugin_gd.6.dylib"
/usr/local/Cellar/graphviz/2.26.3/lib/graphviz/libgvplugin_gd.6.dylib

What? It's right there for f---'s sake!

I even did an otool -l on the file in question, to see that the shared libraries it loaded were in the correct places, and they were!

So, my last effort, a shot in the dark: load it with ctypes, see what it says. After tearing out the hair on my skull, this:

/usr/local/Cellar/graphviz/2.26.3 $ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> ctypes.CDLL('/usr/local/Cellar/graphviz/2.26.3/lib/graphviz/libgvplugin_gd.6.dylib')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/ctypes/__init__.py", line 345, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/usr/local/Cellar/graphviz/2.26.3/lib/graphviz/libgvplugin_gd.6.dylib, 6): Library not loaded: /opt/local/lib/libiconv.2.dylib
  Referenced from: /usr/local/Cellar/gd/2.0.36RC1/lib/libgd.2.dylib
  Reason: Incompatible library version: libgd.2.dylib requires version 8.0.0 or later, but libiconv.2.dylib provides version 7.0.0
>>>

Why the fork didn't it just tell me that instead?

The solution of course is to rebuild GD since it links against a libiconv that not only is outdated, but has also been removed.
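
The ctypes trick generalizes, by the way: when a program swears a plugin is "not found" even though it's sitting right there, ask the dynamic loader directly. A small Python 3 sketch; the function name dlopen_error is my own:

```python
import ctypes


def dlopen_error(path):
    """Return the loader's error message for `path`, or None if it loads."""
    try:
        ctypes.CDLL(path)
    except OSError as e:
        return str(e)
    return None
```

Whatever comes back is the loader's own explanation, which on OS X includes the "Library not loaded" and "Referenced from" lines that dot kept to itself.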


Working with UNIX timestamps in Python

So, like many others, I thought I knew how to work with UNIX timestamps in Python. After all, I pride myself on being pretty well-versed in my one favorite language!

However, I also, like many others, started using MongoDB recently. MongoDB really forces you to think about timezones, so I argued to myself that this is the equivalent of going from str to unicode - more work, little payoff, but a lot more correct and in the end saves multiple headaches.

So MongoDB stores Epoch offsets like many others, as 64-bit signed integers (counting milliseconds). It doesn't store any timezones or anything (note: timestamps without definite timezones are called naïve.)

Why should you care? What's the point, you ask? Where is the culmination? Well as it turns out, there are subtleties. Very subtle subtleties. Here's how we receive messages on a site I develop:

  1. Incoming HTTP/SMSC/whatever call with message data,
  2. timezone is inferred depending on transport (let's say it's CET),
  3. either use the current UTC time, or infer from message data the timestamp and convert to UTC,
  4. store this UTC timestamp.

The important part: this works fine. There is absolutely nothing wrong with this process. What fails, of course, is the part you'd least expect to fail - our pagination, which is done with Epoch offsets:

  1. Pageful of messages is requested,
  2. find the last message in the dataset,
  3. take whatever attribute we sorted by on this particular message,
  4. convert that timestamp to an integer offset from Epoch.

How on earth could this fail? It's such a dead-simple task. Let's pseudocode it!

# Why isn't this part of the standard library?
import time

def dt2unix(dt):
    return time.mktime(dt.timetuple()) + (dt.microsecond / 10.0 ** 6)

def next_page_offset(message_set, sort="created"):
    # The message with the lowest sort key is the last one on this page.
    last = min(message_set, key=lambda m: m[sort])
    return dt2unix(last[sort])

Seems fairly straight-forward, right? Take the lowest value for the key by which the entire dataset was sorted and make use of the fact that the next page's max(m[sort]) is less than this page's min(m[sort]).

As it turns out, no - this will behave in very odd ways on some machines. Yes, some machines - which ones will become evident soon.

The error is best explained by converting the Epoch offsets into datetime objects again:

import datetime

def unix2dt(offset):
    return datetime.datetime.fromtimestamp(float(offset))

We should expect unix2dt(dt2unix(dt)) == dt to hold, and it does! So what's the fuss about? Well...

>>> print dt
1970-01-01 00:00:00+00:00
>>> unix2dt(dt2unix(dt))
datetime.datetime(1970, 1, 1, 0, 0)
>>> dt2unix(dt)
-3600.0

Uh-oh. Shouldn't this give zero..? Well, no. The answer is that datetime.datetime.fromtimestamp and time.mktime both work not with naïve timestamps, but with local time.

So, the time functions in this case are compensating for the local timezone, which is CET! Hardly something you want them to be doing, I'd argue, for something like an offset (since it will lead to the same offset occurring multiple times during DST adjustment) - but hey.
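
You can reproduce this on any POSIX box without touching the system clock settings. A minimal Python 3 sketch, assuming time.tzset() is available (so not Windows); "CET-1" is a POSIX TZ string meaning "call the zone CET, one hour ahead of UTC, no DST":

```python
import calendar
import datetime
import os
import time

# Pretend this process lives in CET, regardless of the machine's setting.
os.environ["TZ"] = "CET-1"
time.tzset()

epoch = datetime.datetime(1970, 1, 1, 0, 0)

# mktime() reads the fields as local time, timegm() reads them as UTC.
local_offset = time.mktime(epoch.timetuple())
utc_offset = calendar.timegm(epoch.timetuple())

print(local_offset, utc_offset)
```

Same time tuple, two different answers an hour apart. Which one you get depends solely on which function you asked and where the server thinks it is.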

We now know what is wrong, but how do we make them stop? One way is to just use UTC as the timezone. Server owners should really do this anyway, but timezones exist and they serve a good reason.

This is perhaps the second issue with Python's standard library and timestamps: nowhere does it mention how to do this without adjustment, except in the online HTML documentation. Personally I like using pydoc as a reference...

Converting from a UNIX timestamp into a non-compensated datetime is fairly easy: just use utcfromtimestamp and there we are (not sure why it says UTC though, this could really be any offset that shouldn't be compensated.)

Converting to a UNIX timestamp without compensating is a bit less obvious, even to seasoned Pythonistas because what you're looking for is actually calendar.timegm.

So we can now rewrite our two converter functions to look like this:

import datetime
from calendar import timegm

def dt2unix_utc(dt):
    return timegm(dt.timetuple()) + (dt.microsecond / 10.0 ** 6)

def unix2dt_utc(offset):
    return datetime.datetime.utcfromtimestamp(offset)

Again with the *_utc thing, should probably be called *_global or something. I elaborated this into using the very excellent pytz library to set the UTC tzinfo.

So, that will be all then! The complexity of this really speaks for why Python should incorporate a set of functions for doing these things. I'd suggest making it obnoxiously clear that a compensation can be made:

  • from_epoch(offset, compensate=False)
  • to_epoch(dt, compensate=False)

If you pass a naïve datetime to these and tell them not to compensate, I think an exception should be raised. The point in compensating, I presume, is to make UNIX timestamps comparable across timezones. Ah well.
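
To make the proposal concrete, here's one way the pair could behave. This is only my reading of the idea, sketched in Python 3 with the standard library's datetime.timezone.utc standing in for pytz. The choices that compensate=True means "use the machine's local timezone" and that from_epoch returns an aware UTC datetime by default are assumptions of the sketch:

```python
import calendar
import datetime
import time

UTC = datetime.timezone.utc


def to_epoch(dt, compensate=False):
    """Seconds since the Epoch for `dt`."""
    if dt.tzinfo is None:
        if not compensate:
            raise ValueError("naive datetime: pass compensate=True "
                             "to interpret it as local time")
        return time.mktime(dt.timetuple()) + dt.microsecond / 1e6
    # Aware datetimes carry their own offset, so no guessing is needed.
    return calendar.timegm(dt.utctimetuple()) + dt.microsecond / 1e6


def from_epoch(offset, compensate=False):
    """Aware UTC datetime by default, naive local time if compensating."""
    if compensate:
        return datetime.datetime.fromtimestamp(offset)
    return datetime.datetime.fromtimestamp(offset, tz=UTC)
```

With this, to_epoch(from_epoch(x)) round-trips on any machine, and the only way to get timezone-dependent behavior is to ask for it by name.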


Werkzeug & Reloading

I got tired of Werkzeug's runserver being so slow to reload (it polls once every second) combined with VMware's "shared folder" thing being so slow to refresh. It sometimes took up to 3-4 seconds to reload. Clearly not doable for a guy with over nine thousand words per minute!!1

So, like any responsible sendapatch.se member, I decided to take matters into my own hands and send a patch.

First, I modified Werkzeug's serving.py so it listens to SIGHUP for reloading itself.

Then, I run this line in a shell on my Mac OS X:

while until_changed **/*.py **/*.html **/*.jst 2>/dev/null; do
  python - <<PY
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto("PING\n", ("VMWARE_IMAGE", 12345))
s.close()
PY
  date
done

until_changed is a utility I wrote that uses Apple's FSEvents APIs from Python. It exits as soon as any of the given arguments is changed.
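
If you're not on a Mac, a poor man's version can be had by polling modification times instead. To be clear, this is a different technique from the FSEvents one my utility uses, and the names and the 0.2 second interval below are arbitrary choices for the sketch (Python 3):

```python
import os
import time


def snapshot(paths):
    """Map each path to its modification time (None if it is missing)."""
    state = {}
    for path in paths:
        try:
            state[path] = os.stat(path).st_mtime
        except OSError:
            state[path] = None
    return state


def until_changed(paths, interval=0.2, sleep=time.sleep):
    """Block until any of `paths` is modified, created or removed."""
    before = snapshot(paths)
    while snapshot(paths) == before:
        sleep(interval)
    return True
```

Wrapped in a script that calls until_changed(sys.argv[1:]) and exits 0, it slots into the same while loop. It just burns a few stat() calls per interval where FSEvents would sit idle.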

Then, on my VMware Linux image, I run this:

socat udp-recv:12345 stdout | \
(while read; do
     pkill -HUP -f 'dev.py runserver'
 done)
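
Should socat not be at hand on the guest, the receiving end is just as short in Python. A Python 3 sketch of the same idea; the function names and the limit argument (which only exists so the loop can be stopped) are mine, and port 12345 is simply the one used above:

```python
import socket


def bind_udp(host="0.0.0.0", port=12345):
    """Open the UDP socket the pings arrive on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def serve_pings(sock, on_ping, limit=None):
    """Call on_ping(data, addr) per datagram; stop after `limit` if given."""
    handled = 0
    while limit is None or handled < limit:
        data, addr = sock.recvfrom(1024)
        on_ping(data, addr)
        handled += 1
    return handled
```

The on_ping callback would do what the shell loop does, for instance running pkill -HUP -f 'dev.py runserver' through subprocess.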

That made it reload very quickly. In fact so quickly that the filesystem on the Linux guest OS didn't have time to refresh before Werkzeug loaded the code again, so it started reloading twice. And still just as slow. Once for my immediate hook, once for detected filesystem changes.

So that was clearly untenable. A non-solution if you will. Scratch the SIGHUP stuff. (I still have the changes if anybody is interested.)

Without NFS, rsync or something to that effect, this would have been an exercise in pointlessness. But fear not. Setting up the OS X side to do rsync is so trivial! See:

rsync -ave ssh ./ rsync://lericson@VMWARE_IMAGE:devel/src/

Slap this into a similar while loop to the ones above, and you've got yourself a simple and effective synchronized directory. (The reason I chose to use rsync's own protocol is that it has very little initial negotiation -> schnappiness.)

Next up: Werkzeug's runserver needs pyinotify support, to reload near-instantaneously as the rsync completes.

Update: So I made inotify support for Werkzeug, and Ronacher is going to merge it any minute now. Totally awesome.



Ludvig
