Spawning 0.8.11 is out!

I wrote some daemonization support for the Spawning project, including writing a pidfile and redirecting the standard pipes. This was mostly for my own sake, as daemonization support is crucial in real deployments.
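For anyone who hasn't written this sort of thing before, here's roughly what those two pieces look like in Python. It's a minimal sketch and not Spawning's actual code: the function names write_pidfile and redirect_stdio, and the default paths, are made up for illustration.

```python
import os
import sys


def write_pidfile(path):
    """Write the current process ID to *path* and return it."""
    pid = os.getpid()
    with open(path, "w") as fp:
        fp.write("%d\n" % pid)
    return pid


def redirect_stdio(stdin="/dev/null", stdout="/dev/null", stderr="/dev/null"):
    """Point file descriptors 0, 1 and 2 at the given files."""
    # Flush Python-level buffers before the descriptors are swapped out.
    sys.stdout.flush()
    sys.stderr.flush()
    targets = (
        (stdin, os.O_RDONLY, 0),
        (stdout, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 1),
        (stderr, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 2),
    )
    for path, flags, target_fd in targets:
        fd = os.open(path, flags, 0o644)
        if fd != target_fd:
            # Replace the standard descriptor, then drop the temporary one.
            os.dup2(fd, target_fd)
            os.close(fd)
```

A daemon would typically fork first and then call both functions in the child, pointing stdout and stderr at a log file rather than /dev/null.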

Thanks to the wonders of open source, though, Donovan Preston released a new version today, Spawning 0.8.11. He also mentioned possibly picking up activity on Spawning again, which would be very nice!

Also, there was an issue with the manifest template, so the OpenRC-like init scripts I wrote up for Spawning aren't in the distribution, but you can still find them in the Mercurial repository.

And here's how one could use these:

# ln -s /etc/init.d/spawn /etc/init.d/spawn.my_wsgi_app
# cp /etc/conf.d/spawn /etc/conf.d/spawn.my_wsgi_app
# "${EDITOR:-vi}" /etc/conf.d/spawn.my_wsgi_app
# /etc/init.d/spawn.my_wsgi_app start
# /etc/init.d/spawn.my_wsgi_app reload
# /etc/init.d/spawn.my_wsgi_app stop

Of course, if you're only using a single instance, you can go ahead and use /etc/conf.d/spawn.


pylibmc 0.8

It's that time again — pylibmc release time! The memcached client library all true Pythonistas must have.

This fine release brings two new and amazingly useful features: pooling and the binary protocol.

The pooling is implemented in pure Python and is described in detail in previous posts. It's immensely useful.

The other feature, the binary protocol, is all from the TangentOrg guys really, I just added an interface to it (that's what I do, so hey). Quite easily:

import pylibmc
mc = pylibmc.Client(["127.0.0.1"], binary=True)
vals = mc.get_multi(("abc", "def", "ghi"))

pylibmc 0.8.2 at PyPI

Yeah, I did it again - missed files in the manifest. But I've worked up a solution, so it won't happen again. :-)

Also, works fine with libmemcached 0.33.


Pooling with pylibmc pt. 2

In my previous post I mentioned having a pooling API in pylibmc, and then I got an idea. I've implemented it, and it's now up on GitHub.

Here's an example using the polymorphic API, and I think it's pretty self-explanatory:

import pylibmc
import threading

class Incrementer(threading.Thread):
    key = "my_key"
    limit = 10 ** 6  # A million

    def __init__(self, pool):
        super(Incrementer, self).__init__()
        self.pool = pool

    def increment(self):
        with self.pool.reserve() as mc:
            self.running = mc.incr(self.key) < self.limit

    def run(self):
        self.running = True
        while self.running:
            self.increment()

n_threads = 12
mc = pylibmc.Client(["127.0.0.1"])
mc.set("my_key", 1)

# This is the classical method, libmemcachedutil-style.
#pool = pylibmc.ClientPool()
#pool.fill(mc, n_threads)

# This is the more flexible style, adapted to threading.
pool = pylibmc.ThreadMappedPool(mc)

ts = [Incrementer(pool) for i in range(n_threads)]
for t in ts:
    t.start()
for t in ts:
    t.join()

print(mc.get(Incrementer.key))
print(pool)

Duly note that the ThreadMappedPool won't be accessible unless your Python has threading enabled. But why would you be doing pooling without threads, I ask you, sillybear.


Pooling with pylibmc

I was discussing how to implement pooling for pylibmc when I realized what libmemcachedutil's pooling is - or rather, what it isn't.

It's not a magical solution for concurrently doing anything at all, it's not anything like that -- it just helps you with thread-safety.

In Python, however, we've got the global interpreter lock, the GIL. This lock must always be held by the thread that is dealing with anything Python. The Python interpreter itself isn't thread-safe, or rather, it is with the GIL.

This means that whenever Python code is running, you'll be sure to have exclusive access to all of Python's memory (unless something is misbehaving). In turn, this means that the use case for using libmemcachedutil in a Python library is rather slim.

Let's have a look at some code for doing the equivalent in pure Python. This is a Werkzeug-based WSGI application which would be run in multiple threads, concurrently:

# Configuration
n_threads = 12
mc_addrs = "10.0.1.1", "10.0.1.2", "10.0.1.3"
mc_pool_size = n_threads

# Application
import pylibmc
from contextlib import contextmanager
from pprint import pformat
from werkzeug import Request, Response
from werkzeug.exceptions import NotFound

class ClientPool(list):
    @contextmanager
    def reserve(self):
        mc = self.pop()
        try:
            yield mc
        finally:
            self.append(mc)

mc = pylibmc.Client(mc_addrs)
mc_pool = ClientPool(mc.clone() for i in range(mc_pool_size))

@Request.application
def my_app(request):
    with mc_pool.reserve() as mc:
        key = request.path[1:].encode("ascii")
        val = mc.get(key)
        if val is None:  # a miss; falsy values like 0 or "" are still hits
            return NotFound(key)
        return Response(pformat(val))

It's a fully functional example of how one could implement pooling with pylibmc, and very much in the same way that people do with libmemcachedutil. To start it, you could use Spawning like so: spawn -s 1 -t 12 my_wsgi_app.my_app
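One caveat with a list-based pool like the one above: if more threads ask for a client than the pool holds, pop() raises IndexError. With mc_pool_size equal to n_threads that can't happen, but here's a minimal sketch of a variant that waits instead. The BlockingClientPool name and the timeout parameter are made up for this example; it's not part of pylibmc.

```python
import contextlib

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2


class BlockingClientPool(object):
    """Hand out clients from a queue, waiting when none are free."""

    def __init__(self, clients):
        self._free = queue.Queue()
        for client in clients:
            self._free.put(client)

    @contextlib.contextmanager
    def reserve(self, timeout=None):
        # Blocks until a client is free; raises queue.Empty if a timeout
        # was given and it expires first.
        mc = self._free.get(timeout=timeout)
        try:
            yield mc
        finally:
            self._free.put(mc)
```

It's used just like the list version: with pool.reserve() as mc: ...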

I don't know if the above methodology is the best one, though. Another possibility is to have a dict with thread names as keys and client objects as values. Each thread would then look up its own client object in the dict on each request, and if none exists, clone a master just like the pooling thing above.
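Here's a minimal sketch of that dict idea. It assumes thread names are unique, and it uses a hypothetical FakeClient in place of pylibmc.Client so it runs without a memcached server; the ThreadMappedClients name is made up too.

```python
import contextlib
import threading


class ThreadMappedClients(dict):
    """Map thread name -> client, cloning *master* on first use."""

    def __init__(self, master):
        super(ThreadMappedClients, self).__init__()
        self.master = master

    @contextlib.contextmanager
    def reserve(self):
        # Each thread only ever reads and writes its own key.
        key = threading.current_thread().name
        mc = self.get(key)
        if mc is None:
            mc = self[key] = self.master.clone()
        yield mc


class FakeClient(object):
    """Stand-in for pylibmc.Client; only clone() is needed here."""

    def clone(self):
        return FakeClient()
```

Unlike the list-based pool, nothing is handed back after use: a thread keeps its client for as long as the dict entry exists.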

It'd be neat if there was a generic Python API for doing any variant of pooling, per-thread or the list-based version, and then you'd be able to switch between them seamlessly. Hm.


pylibmc 0.7.4

I found a critical bug in the behaviors manipulation of pylibmc, so I fixed that and here's the new release.

pylibmc 0.7.4 at PyPI

If you'd like to strike up a conversation with a handsome fellow like me, join #sendapatch on freenode.

Notably, this release does compile with libmemcached 0.32. However, due to a function signature change, you'll get a warning regarding types. It's about unsigned int versus size_t, which should be harmless where the two are the same size, as on typical 32-bit platforms; on most 64-bit platforms size_t is wider, so treat the warning with more care there. I'll change the call signature in the next version; I kept the old one because I think most people will be using the older declaration.
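Whether the two types match in size depends on the platform's C conventions. Here's a quick way to check on your own machine with the standard ctypes module. It only reports the local sizes and says nothing about pylibmc itself.

```python
import ctypes

# Sizes in bytes of the two C types on this platform.
uint_size = ctypes.sizeof(ctypes.c_uint)
size_t_size = ctypes.sizeof(ctypes.c_size_t)

print("unsigned int: %d bytes, size_t: %d bytes" % (uint_size, size_t_size))
print("same size" if uint_size == size_t_size else "different sizes")
```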


Programmer's Day

I'd never heard of it before, but according to some blog entry, Sept. 13th is "programmer's day". This is because it's the 256th day of the year (and so it falls on Sept. 12th in leap years).
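The arithmetic is easy to check with the datetime module. The programmers_day helper is just a name made up for this snippet.

```python
from datetime import date, timedelta


def programmers_day(year):
    """Return the 256th day of *year* (January 1st is day 1)."""
    return date(year, 1, 1) + timedelta(days=255)


print(programmers_day(2009))  # 2009-09-13
print(programmers_day(2008))  # 2008-09-12 (leap year)
```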

I'm thinking programmers around the world should recognize this day, mostly because it's mildly funny, and also because Russia did it.


DISTINCT ON with Django's ORM

I needed to use PostgreSQL's DISTINCT ON feature to avoid sorting all fields of a model, but still keep it distinct, so I wrote this, um, clever way of doing it. It turned out that I couldn't use it anyway, so I'm sharing it with the world, as it shows a powerful way to "retouch" Django's generated SQL. But be warned, you'd best know what you're doing!

from django.db.models.sql.query import Query

class DistinctOnQuery(Query):
    def as_sql(self, *args, **kwds):
        sql, params = super(DistinctOnQuery, self).as_sql(*args, **kwds)
        if self.distinct:
            sql_post = sql[16:]  # "SELECT DISTINCT "
        else:
            sql_post = sql[7:]  # "SELECT "
        distinct_sql = ", ".join(self.distinct_on)
        if distinct_sql:
            sql = "SELECT DISTINCT ON (" + distinct_sql + ") " + sql_post
        return sql, params

    def clone(self, *args, **kwds):
        obj = super(DistinctOnQuery, self).clone(*args, **kwds)
        obj.distinct_on = tuple(self.distinct_on)
        return obj

def distinct_on(qs, *exprs):
    """Set a PostgreSQL "DISTINCT ON" expression list on *qs*."""
    # Commence the hack of the year: generate SQL using Django, override the
    # query object, and have it hack up the string.
    qs = qs.all()
    qs.query = qs.query.clone(DistinctOnQuery)
    qs.query.distinct_on = exprs
    return qs

It ignores any previous .distinct() you might or might not have set. It should probably raise an exception when one is set, since you can't have both.
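Here's a sketch of what that stricter behaviour could look like, pulled out into a plain function so it can be tried without Django. The name retouch_distinct_on is made up, and it only works on the SQL string: it checks the prefix instead of slicing blindly, and refuses to combine DISTINCT with DISTINCT ON.

```python
def retouch_distinct_on(sql, exprs):
    """Rewrite "SELECT ..." into "SELECT DISTINCT ON (...) ..."."""
    if not exprs:
        # Nothing to do, hand back the SQL untouched.
        return sql
    if sql.startswith("SELECT DISTINCT "):
        raise ValueError("can't combine DISTINCT with DISTINCT ON")
    prefix = "SELECT "
    if not sql.startswith(prefix):
        raise ValueError("expected a SELECT statement")
    return "SELECT DISTINCT ON (%s) %s" % (", ".join(exprs), sql[len(prefix):])
```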

Usage is simple: qs = distinct_on(qs, "id").

I should also note that PostgreSQL can only use this feature on fields on which you sort -- the reason for this is sort of complicated, but in short, PostgreSQL requires that the dataset be sorted such that duplicate rows are adjacent, and if you're differentiating rows on something you're not sorting by, you're sure to have "duplicates".


pylibmc 0.7

So, the seventh minor version of my Python wrapper for the libmemcached client has been released. The joy!

amix.dk was nice enough to take the time to both debug a long-standing bug and make speed comparisons to other libraries! Isn't that nice?

Update: I released a point one, because the manifest was old and the release contained code not suited for everyone (TCP_NODELAY set to 1).

Update2: I released a dot two, because the test runner wouldn't really test the correct module.

Update3: I released a spot three, because the C code broke on x86_64.

pylibmc 0.7.3 at PyPI

Also don't forget to stop by in #sendapatch on freenode!


Django, Unicode and Caching

So yeah, I was trying to insert marshalled (from Python's marshal module) data into memcached via Django's cache support, and I got this error:

  230. data = cache.get(key)
File "django/core/cache/backends/memcached.py" in get
  30.  return smart_unicode(val)
File "django/utils/encoding.py" in smart_unicode
  44.  return force_unicode(s, encoding, strings_only, errors)
File "django/utils/encoding.py" in force_unicode
  92.  raise DjangoUnicodeDecodeError(s, *e.args)

In force_unicode. What the hell? I'm retrieving data from memcached, and basically Django tries to protect me from the oh-so-confusing world of encodings, and at the same time assumes nobody will ever store binary data in a cache.
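The underlying problem is that marshalled data is arbitrary bytes, and arbitrary bytes are generally not valid UTF-8. Here's a small illustration, along with one possible workaround: wrapping the bytes in base64 so that whatever comes back is plain ASCII. This is a sketch only, not something Django provides, and the pack and unpack helper names are made up.

```python
import base64
import marshal


def decodes_as_utf8(data):
    """Return True if the byte string *data* is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def pack(obj):
    """Marshal *obj* and wrap it in base64 so the result is plain ASCII."""
    return base64.b64encode(marshal.dumps(obj))


def unpack(data):
    """Reverse pack()."""
    return marshal.loads(base64.b64decode(data))


raw = marshal.dumps({"answer": 255})
print(decodes_as_utf8(raw))                      # raw marshal output
print(decodes_as_utf8(pack({"answer": 255})))    # base64-wrapped
print(unpack(pack({"answer": 255})))
```

The cost is roughly a third more bytes per value, which is the usual base64 overhead.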

