Reconstructing a module from a Python bytecode cache file

So, sometimes you lose a configuration file, but its compiled .pyc version still exists.

>>> import dis
>>> import marshal
>>> f = open("imgapi/conf.pyc", "rb")
>>> f.seek(8)  # skip the 8-byte header (magic number + mtime)
>>> o = marshal.load(f)
>>> o
<code object <module> at 0x7f84ff7f8c60, file ".../imgapi/conf.py", line 1>
>>> dis.dis(o)
  1           0 LOAD_CONST               0 (-1)
              3 LOAD_CONST               1 (('*',))
              6 IMPORT_NAME              0 (imgapi.conf_defaults)
              9 IMPORT_STAR

  2          10 LOAD_CONST               2 ('gevent')
             13 STORE_NAME               1 (url_fetcher)

  3          16 LOAD_CONST               6 (('.yo-toad.se:8000', '.yo-dev.se'))
             19 STORE_NAME               2 (rewrite_host)
             22 LOAD_CONST               5 (None)
             25 RETURN_VALUE
>>>

And from this, one can see what this (rather short) configuration file used to say:

from imgapi.conf_defaults import *

url_fetcher = "gevent"
rewrite_host = (".yo-toad.se:8000", ".yo-dev.se")
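
The same recovery can be sketched for Python 3. This is a minimal sketch, assuming CPython 3.7 or later, where the .pyc header is 16 bytes rather than the 8 used in the session above. The helper names load_code_from_pyc and constant_assignments are made up for this example, and the demo compiles a throwaway conf.py instead of reading a real lost file.

```python
import dis
import marshal
import os
import py_compile
import tempfile


def load_code_from_pyc(path, header_size=16):
    """Return the module code object stored in a .pyc file.

    header_size=16 is an assumption that holds for CPython 3.7+.
    """
    with open(path, "rb") as f:
        f.seek(header_size)
        return marshal.load(f)


def constant_assignments(code):
    """Recover simple `name = constant` statements from module bytecode."""
    found = {}
    previous = None
    for instr in dis.get_instructions(code):
        # A constant load immediately followed by STORE_NAME is an assignment.
        if (instr.opname == "STORE_NAME" and previous is not None
                and previous.opname.startswith("LOAD_CONST")):
            found[instr.argval] = previous.argval
        previous = instr
    return found


# Demo: compile a small stand-in config, then read it back from the .pyc only.
source = (
    'url_fetcher = "gevent"\n'
    'rewrite_host = (".yo-toad.se:8000", ".yo-dev.se")\n'
)
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "conf.py")
    pyc_path = os.path.join(tmp, "conf.pyc")
    with open(src_path, "w") as f:
        f.write(source)
    py_compile.compile(src_path, cfile=pyc_path)
    recovered = constant_assignments(load_code_from_pyc(pyc_path))

for name, value in recovered.items():
    print("%s = %r" % (name, value))
```

Imports and anything more complicated than a constant assignment still need to be read off the dis output by hand.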

Now for some delicious food.


simples3 1.0-alpha

In keeping with my new year's resolution to release one-dot-ohs of my two mature projects, I now give you the 1.0 alpha release of simples3.

In case you missed it, simples3 is a simple S3 interface that has no dependencies other than Python, and has a daughter project named gaes3, which makes it work on Google App Engine (although some people have noted that simples3 should work on its own just fine.)

Without further ado:

simples3 1.0-alpha


Return Statements and Parentheses

This was originally posted on my old blog on the 24th of February, 2008

I've been trying to come up with an argument as to why you shouldn't be using parentheses around your return statements in Python for a few minutes, and I got it. Consider:

>>> def myfunc():
...     return(1)
... 
>>> myfunc()
1

Above is the classic example of people treating the return statement as a function. This can go wrong. Consider:

>>> def myfunc():
...     return()
... 
>>> myfunc()
()

An empty pair of parentheses is the literal for an empty tuple in Python, so return() hands back () where a bare return would give None. I find that a good reason, beyond aesthetics.
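
To make the difference concrete, here is a small sketch comparing a bare return with the parenthesized form. The function names are only for illustration.

```python
def bare_return():
    # A bare return statement yields None.
    return


def parenthesized_return():
    # The parentheses are not a call; they are an empty tuple literal.
    return()


print(repr(bare_return()))
print(repr(parenthesized_return()))
```

Code that checks the result with "is None" will treat the two functions differently, even though both results are falsy.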


GAE and simples3

One of the projects I work on, aptly named simples3, doesn't work on Google's App Engine architecture.

This is, or rather used to be, somewhat of a problem for me, as I'm a heavy-duty S3 user and also one of those guys who just want to avoid this whole issue of deployment altogether. Then for obvious reasons, Google App Engine (GAE henceforth) has been very interesting to me.

However, since Google decided that Python's standard library URL fetching mechanisms were insecure for their purposes, they designed their own one. What this further means is that anything that uses urllib2, urllib or httplib will need to be rewritten.

The above is why I wrote the piece of code I want you to know about, which is also fairly aptly named gaes3. What it does is sit on top of simples3, and overrides certain methods so that the simples3 code actually uses Google's URL fetching APIs.

Check it out: gaes3 on GitHub, or opt for the easy way and go easy_install gaes3.


Debugging Django: Using Flickzeug

It came to my attention that some people haven't realized you can use the excellent Pocoo creation known as Flickzeug with a Django project.

Flickzeug lets you find memory leaks, generate profiling data, look at that same data and is pretty awesome in general.

Step one is to make a new manage.py. Obviously, you won't be able to use that same name in a Django project, as Django already uses it. Choose one of your liking. I myself don't use Django's manage.py and so replace it.

from werkzeug import script

if __name__ == "__main__":
    script.run()

The above is a bare-bones skeleton of making your own management script using the Werkzeug helper system.

At this point, you'll need to run a Django project as a mere mortal WSGI application. Fortunately, this is relatively simple.

from werkzeug import script
from django.core.handlers.wsgi import WSGIHandler

application = WSGIHandler()

if __name__ == "__main__":
    script.run()

An issue with this, however, is that Django requires you to set up your environment before importing anything non-trivial from within Django. This is one of the... "design choices" that convinced me of the backwardness of Django, but I digress.

So, to take this into account, we have to set the Django settings module environment variable first, and also make sure our project is importable.

import sys
import os
from werkzeug import script
from django.core.handlers.wsgi import WSGIHandler

sys.path.insert(0, os.path.dirname(__file__))
os.environ["DJANGO_SETTINGS_MODULE"] = "foo.settings"

application = WSGIHandler()

if __name__ == "__main__":
    script.run()

There, now that we have our application, we can make a simple Werkzeug runserver — which, by the way, is much more light-weight and nicer to work with than Django's.

To do this, we define a variable called action_runserver, as Werkzeug's script module looks for action_* variable names.

import sys
import os
from werkzeug import script
from django.core.handlers.wsgi import WSGIHandler

sys.path.insert(0, os.path.dirname(__file__))
os.environ["DJANGO_SETTINGS_MODULE"] = "foo.settings"

application = WSGIHandler()

action_runserver = script.make_runserver(lambda: application,
    use_reloader=True, use_debugger=True)

if __name__ == "__main__":
    script.run()

There, we've got our first management command. Try it out! python new-manage.py runserver should do the trick. Pass -h for help.

On to the Flickzeug part then. Flickzeug works as a WSGI middleware, that is to say, you wrap a WSGI application in Flickzeug, and the application gains the functionality desired.

import sys
import os
from werkzeug import script
from django.core.handlers.wsgi import WSGIHandler

sys.path.insert(0, os.path.dirname(__file__))
os.environ["DJANGO_SETTINGS_MODULE"] = "foo.settings"

application = WSGIHandler()

def make_leakfinder():
    from flickzeug import LeakFinder
    filter = lambda v: (not v.startswith("foo.")
                    and not v.startswith("bar."))
    return LeakFinder(application, filter=filter)

action_runserver = script.make_runserver(lambda: application,
    use_reloader=True, use_debugger=True)
action_runserver_leakfinder = script.make_runserver(make_leakfinder,
    use_reloader=True, use_debugger=True)

if __name__ == "__main__":
    script.run()

So what we did was pass a different callable to make_runserver, which returns a Flickzeug-wrapped version of our application instead.

The Flickzeug LeakFinder optionally takes a filter argument, which I find very useful indeed to filter out the noise made by Django and any other third-party component you mightn't necessarily be interested in.

The same approach can be used for any other part of Flickzeug, or indeed any other WSGI middleware you might want to use.
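
For readers who haven't written a WSGI middleware before, here is a minimal sketch of the wrapping idea in Python 3, using only the standard library. RequestCounter and hello_app are hypothetical names for this example and have nothing to do with Flickzeug's internals.

```python
from wsgiref.util import setup_testing_defaults


class RequestCounter:
    """Hypothetical middleware: count requests, then delegate to the app."""

    def __init__(self, app):
        self.app = app
        self.count = 0

    def __call__(self, environ, start_response):
        self.count += 1
        # Hand the request to the wrapped application unchanged.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """A trivial WSGI application to wrap."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


wrapped = RequestCounter(hello_app)

# Drive the wrapped application once with a fake request.
environ = {}
setup_testing_defaults(environ)
statuses = []
body = wrapped(environ, lambda status, headers: statuses.append(status))
print(statuses, body, wrapped.count)
```

Because the wrapper is itself a WSGI callable, it can be returned from the function passed to make_runserver in the same way make_leakfinder is.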

Also, now that you have the skeleton for making a Werkzeug-based management script, you should probably look into the werkzeug.script module: http://werkzeug.pocoo.org/documentation/0.5.1/script.html


MacBook Multitouch - in Python

I's gone an' outdone myself! Since the last post got so much attention, I realized more people were interested in reading the touch input than I thought.

So, I wondered how good ctypes is at using private frameworks from Apple. Turns out, pretty darn good.

This is an implementation of a simple multitouch listener which just prints the information:

@MTContactCallbackFunction
def my_callback(device, data_ptr, n_fingers, timestamp, frame):
    print device, timestamp, frame
    for i in xrange(n_fingers):
        data = data_ptr[i]
        d = "x=%.2f, y=%.2f" % (data.normalized.position.x * 100,
                                data.normalized.position.y * 100)
        print "%d: %s" % (i, d)
    return 0

devices = MultitouchSupport.MTDeviceCreateList()
num_devices = CFArrayGetCount(devices)
print "num_devices =", num_devices
for i in xrange(num_devices):
    device = CFArrayGetValueAtIndex(devices, i)
    print "device #%d: %016x" % (i, device)
    MTRegisterContactFrameCallback(device, my_callback)
    MTDeviceStart(device, 0)

# Why sleep instead of join? Ask David Beazley.
while threading.active_count():
    time.sleep(0.125)

See this paste for the whole body, including the ctypes definitions and mappings.

I also wondered how good pygame really is, so I decided to try it out. See this paste for the pygame-based thing. It uses only the SDL bindings, no SDL_* extras. To quit it (it's fullscreen), hold one finger still and swipe four fingers quickly downwards.


Spawning 0.8.11 is out!

I wrote some daemonization support for the Spawning project, including writing a pidfile and redirecting the standard pipes. This was mostly for my own sake, as daemonization support is crucial in real deploy situations.

However, due to the wonders of open-source, Donovan Preston today released a new version, Spawning 0.8.11. He also mentioned possibly picking up activity on Spawning again, which would be very nice!

Also, there was an issue with the manifest template, so the OpenRC-like init scripts I wrote up for Spawning aren't in the distribution, but you can still find them in the Mercurial repository.

And here's how one could use these:

# ln -s /etc/init.d/spawn /etc/init.d/spawn.my_wsgi_app
# cp /etc/conf.d/spawn /etc/conf.d/spawn.my_wsgi_app
# "${EDITOR:-vi}" /etc/conf.d/spawn.my_wsgi_app
# /etc/init.d/spawn.my_wsgi_app start
# /etc/init.d/spawn.my_wsgi_app reload
# /etc/init.d/spawn.my_wsgi_app stop

Of course, if you're only using a single instance, you can go ahead and use /etc/conf.d/spawn.


DISTINCT ON with Django's ORM

I needed to use PostgreSQL's DISTINCT ON feature to avoid sorting all fields of a model, but still keep it distinct, so I wrote this, um, clever way of doing it. It turned out that I couldn't use it anyway, so I'm sharing it with the world, as it shows a powerful way to "retouch" Django's generated SQL. But be warned, you'd best know what you're doing!

class DistinctOnQuery(Query):
    def as_sql(self, *args, **kwds):
        sql, params = super(DistinctOnQuery, self).as_sql(*args, **kwds)
        if self.distinct:
            sql_post = sql[16:]  # "SELECT DISTINCT "
        else:
            sql_post = sql[7:]  # "SELECT "
        distinct_sql = ", ".join(self.distinct_on)
        if distinct_sql:
            sql = "SELECT DISTINCT ON (" + distinct_sql + ") " + sql_post
        return sql, params

    def clone(self, *args, **kwds):
        obj = super(DistinctOnQuery, self).clone(*args, **kwds)
        obj.distinct_on = tuple(self.distinct_on)
        return obj

def distinct_on(qs, *exprs):
    """Set a PostgreSQL "DISTINCT ON" expression list on *qs*."""
    # Commence the hack of the year: generate SQL using Django, override the
    # query object, and have it hack up the string.
    qs = qs.all()
    qs.query = qs.query.clone(DistinctOnQuery)
    qs.query.distinct_on = exprs
    return qs

It ignores any previous .distinct() you might or might not have set. It should probably raise an exception in the case where one is set, since you can't have both.

Usage is simple, qs = distinct_on(qs, "id").

I should also note that PostgreSQL can only use this feature on fields on which you sort -- the reason for this is sort of complicated, but in short, PostgreSQL requires that the dataset be sorted such that duplicate rows are adjacent, and if you're differentiating rows on something you're not sorting by, you're sure to have "duplicates".
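
The string surgery that as_sql performs can be sketched on its own, without Django. add_distinct_on is a hypothetical helper written for this example, and the SQL string is made up. Note that the query sorts by the same expression it is distinct on.

```python
def add_distinct_on(sql, exprs):
    """Rewrite a SELECT statement to use PostgreSQL's DISTINCT ON (...)."""
    if not exprs:
        return sql
    # Check the longer prefix first so "SELECT DISTINCT " is not half-matched.
    for prefix in ("SELECT DISTINCT ", "SELECT "):
        if sql.startswith(prefix):
            rest = sql[len(prefix):]
            return "SELECT DISTINCT ON (%s) %s" % (", ".join(exprs), rest)
    raise ValueError("not a SELECT statement: %r" % sql)


original = 'SELECT "a"."id", "a"."name" FROM "a" ORDER BY "a"."id"'
rewritten = add_distinct_on(original, ['"a"."id"'])
print(rewritten)
```

Matching on the prefix instead of fixed offsets such as 16 and 7 keeps the rewrite from silently mangling a statement that doesn't start the way it is expected to.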


Django, Unicode and Caching

So yeah, I was trying to insert marshalled (from Python's marshal module) data into memcached via Django's cache support, and I got this error:

  230. data = cache.get(key)
File "django/core/cache/backends/memcached.py" in get
  30.  return smart_unicode(val)
File "django/utils/encoding.py" in smart_unicode
  44.  return force_unicode(s, encoding, strings_only, errors)
File "django/utils/encoding.py" in force_unicode
  92.  raise DjangoUnicodeDecodeError(s, *e.args)

In force_unicode. What the hell? I'm retrieving data from memcached, and basically Django tries to protect me from the oh-so-confusing world of encodings, and at the same time assumes nobody will ever store binary data in a cache.
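
One possible workaround, not something from the original setup, is to make the binary payload text-safe before it reaches the cache. The sketch below is Python 3 and assumes the cache stores plain ASCII text untouched; to_cache_text and from_cache_text are hypothetical helper names.

```python
import base64
import marshal


def to_cache_text(value):
    """Marshal a value and wrap it in base64 so it is plain ASCII text."""
    return base64.b64encode(marshal.dumps(value)).decode("ascii")


def from_cache_text(text):
    """Reverse to_cache_text: decode the base64 text and unmarshal it."""
    return marshal.loads(base64.b64decode(text))


payload = {"ids": [1, 2, 3], "name": "binary-safe"}
stored = to_cache_text(payload)      # what would be handed to cache.set()
restored = from_cache_text(stored)   # what would come back from cache.get()
print(stored)
```

The price is roughly a third more bytes in the cache, which may or may not matter for the data in question.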


Thoughts on beanstalkd

Yes, I know beanstalkd is old news. But I'm not telling you it's new either.

I've had a few use cases for it, but often I find that beanstalkd in itself is way too limited to accomplish my goals.

One use case I've had is pooling up a number of jobs in a tube, and then doing them all in one go, because you want to acquire resources sparingly.

The naïve way to solve that would be to simply poll the tube's stats and check how many jobs are in the queue. What I'm suggesting is something like the reserve protocol command, but for tube stats. Something like wait-tube-stats, which would return the stats in YAML when they're updated.

The command should be able to use multiple tubes, like so:

wait-tube-stats <version> [tube1 [tube2 ...]]\r\n

The version argument would be a version returned in the stats. This is best explained with pseudo-code inspired by beanstalkc.py:

stats = None
while run:
    # This might as well have no positional arguments at all.
    stats = bean.wait_tube_stats("footube", "bartube", previous=stats)
    print "Tube %(name)s has %(current-jobs-ready)d jobs" % stats

The previous keyword argument is annotated with a version key, which is incremented each time the tube changes (and could quite possibly wrap around a 32-bit integer storage), so that beanstalkd can return immediately if the client has become out of sync since the last call.

Further, it'd probably be interesting to be able to skip tube specification entirely, so as to monitor all watched tubes (a common case, I would presume.)

Then, finally, the solution to batching N jobs would be:

stats = None
while run:
    stats = bean.wait_tube_stats(previous=stats)
    if stats["current-jobs-ready"] >= num_jobs_batch:
        acquire_resources()
        try:
            while True:
                job = bean.reserve(timeout=0)
                execute_job(job)
        finally:
            release_resources()

Right now, you basically have to reserve a job, get the stats, and release the job again if there weren't enough jobs in the tube, which is a little ugly.

Of course, the client library could fake this behavior, but I feel a solution on the backend would be better.
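
As a rough idea of what faking it on the client side could look like, here is a polling sketch in Python 3. wait_for_batch is a hypothetical helper, and get_ready_count stands in for whatever call returns the tube's current-jobs-ready figure. The demo feeds it canned numbers instead of talking to a real beanstalkd.

```python
import time


def wait_for_batch(get_ready_count, batch_size, poll_interval=0.5,
                   sleep=time.sleep):
    """Poll until at least batch_size jobs are ready, then return the count."""
    while True:
        ready = get_ready_count()
        if ready >= batch_size:
            return ready
        # Not enough work yet; wait before asking again.
        sleep(poll_interval)


# Demo with canned stats instead of a live server.
fake_counts = iter([0, 2, 5])
naps = []
ready = wait_for_batch(lambda: next(fake_counts), batch_size=5,
                       poll_interval=0.5, sleep=naps.append)
print("ready:", ready, "polls slept:", len(naps))
```

This is exactly the polling that a server-side wait-tube-stats would make unnecessary.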

Also this is the first entry I publish solely using a hacked up rst2html.py with Pygments.


A Saner Way to Look at Program Code

I've seen a lot of people discuss how programming should be done, and the signs of bad code.

For example, some bloke says that you're a bad programmer if you've done this and done that.

I feel people forget one key aspect of code: it's not ever in a frozen state. It, much like natural language, never reaches a state where it is "correct" or "finished."

The previously mentioned post talks about "bulldozer code," which "gives the appearance of refactoring by breaking out chunks into subroutines, but that are impossible to reuse in another context." What I'd like to call that is phase one in a generic abstraction process.

Sure, the programmer who started breaking it up could've done a better job, but the level of modularity chosen at the time was apparently enough. Knowing when to stop is a very, very important quality in programmers: saying "no."

I think it's widely accepted that you could abstract and layer your code 'til kingdom come (Cf. Java :-), but that's not what "industrial" programming is about at all. Oh well, I digress.

At any rate, what I feel is oftentimes not pointed out or even reflected over is the fact that code is the result of progressive evolution. You're always looking at a specific snapshot of some functionality, not plain code.

I think what makes a good programmer is awareness of this progressive nature of code - the ability to recognize situations where you can be smart, and more so the situations where you can cut corners.

This, coincidentally, is why I dislike working on other people's code. Finding the "correct" way to evolve a certain piece of code requires that the previous programmer was good enough to cut the job up appropriately, and make the code only as flexible as is realistic.

Sadly I think it's uncommon for programmers to see these things. People generally go with the "I'll do what I'm tasked with and deal with the future when it comes"-take on programming.

It's all a very delicate balance really, and especially one that isn't visible to the product manager. You only ever get a sense of somebody's programming skills once you work with them, or on something they've written.

I've worked on code which aesthetically looks crap, but has a very fine balance in these matters, and I find I value that a lot higher.


Introducing iptraq

My server has a way of going down every now and then due to my ISP being crap as hell.

The problem is that my server is at my home in Stockholm, together with the rest of my family. And I'm ~400 km away from Stockholm.

It then becomes a problem if my server changes IP address, like it always has to do when the Internet connection goes down for an extended period of time - I can't find my own darn server.

Coincidentally, I love Google App Engine for small projects, so I decided to jot down an application which tracks the latest "pings" from a machine, and more precisely its IP address.

Thus, I present to you: iptraq.appspot.com!

Being me, it doesn't really have a user interface at all. No really, it doesn't, at all.

You set it up by following this simple list of instructions:

  1. Send an HTTP POST to the root: curl -d '' iptraq.appspot.com
  2. Copy the character blurb after IPMark:
  3. Send an HTTP POST to /<your blurb>/, like: curl -d '' iptraq.appspot.com/agZpcHRyYXFyDgsSBklQTWFyaxi3mQIM/
  4. Send an HTTP GET to the same place to get a list of updates within the last 24 hours. I suggest you add a bookmark for it.

And that's it. All you need is for your server(s) to ping the URL every half hour or so. I do that using cron, like this:

0,30 * * * * /home/toxik/bin/update_iptraq

And the contents of ~/bin/update_iptraq:

#!/bin/bash
exec /usr/bin/curl -s -d '' 'http://iptraq.appspot.com/<my blurb>/' >/dev/null

I don't silence stderr using 2>&1 because I'd really rather know when it doesn't work (a well-set-up system will mail you whenever cron scripts print anything to stderr.)

I also use the full path to curl because you can't rely on your environment in a cron script -- everything has to be set explicitly. (In theory.)

And that's it, now you too can track the IP address of your machines.

Update: Good news, everyone! iptraq is now on GitHub.


Doing logging wrong: python-markdown

Python Markdown 2.0.1 contains this very interesting snippet of code, observe:

def message(level, text):
    """ A wrapper method for logging debug messages. """
    logger =  logging.getLogger('MARKDOWN')
    if logger.handlers:
        # The logger is configured
        logger.log(level, text)
        if level > WARN:
            sys.exit(0)
    elif level > WARN:
        raise MarkdownException, text
    else:
        warnings.warn(text, MarkdownWarning)

This function is used to implement so-called "logging", and calls to it are spread throughout the distribution's code.

Among other things, the author(s) thought it'd be neat to log each and every text parsed, so if you're debugging - good luck reading any of your logs.

But that's not the real issue here. If you were to be clever and do the following:

import logging

class NullHandler(logging.Handler):
    def emit(self, record): pass

markdown_logger = logging.getLogger("MARKDOWN")
markdown_logger.addHandler(NullHandler())

Then the sys.exit(0) part triggers.

Add a hypothetical situation: your application is a Web application (not an uncommon use for Markdown) and you deploy your code with FastCGI and a WSGI bridge.

When Markdown fails parsing some code or otherwise errors, it'll... Exit the process. Just like that.

Complete and utter disaster. What we're seeing is application code in a library.

How to stop this from happening then? Well, um... Monkey-patch markdown.message is the only real solution to the problem.

The other solution, which at least stops you from being spammed with crap from Markdown, is to do logging.getLogger("MARKDOWN").setLevel(logging.INFO).
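
For contrast, here is a sketch of how a library can log without making decisions on the application's behalf. It is Python 3, and the names mylib, ParseError and parse are hypothetical: the library attaches a NullHandler to its own logger, logs the problem, and raises an exception instead of exiting.

```python
import logging

# Library code: log to a named logger and never touch the process.
logger = logging.getLogger("mylib")
logger.addHandler(logging.NullHandler())  # stay quiet unless the app configures logging


class ParseError(Exception):
    """Raised when the (hypothetical) library cannot parse its input."""


def parse(text):
    if not text:
        logger.error("refusing to parse empty input")
        # Let the caller decide what a failure means; never sys.exit() here.
        raise ParseError("cannot parse empty input")
    logger.debug("parsed %d characters", len(text))
    return text.upper()


# Application code: the failure is an ordinary exception.
try:
    parse("")
except ParseError as exc:
    caught = str(exc)
print(caught)
```

Whether a handler is attached then has no effect on control flow, which is the property the snippet from Markdown lacks.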


Released simples3 0.5

Just released simples3 0.5, which now supports copying files intra-S3 (from and to S3 buckets.)

This also enables you to update the metadata of an object without sending the contents again.

Have fun with it!

simples3 0.5 on PyPI


Updating An Object's Metadata On S3

One complaint I've seen a lot around the internet is that you can't update the metadata of an object on S3.

While this is partly true, there's a way to do it.

What you do is use the copy method, an HTTP PUT with an x-amz-copy-source header: you set the metadata with the x-amz-metadata-directive header and copy to the same object!

Using simples3, you can do this with:

bucket.copy(bucket.name + "/obj.txt", "obj.txt", metadata={"new": "metadata"},
            acl="public-read", mimetype="text/plain")

One thing to note here is that S3 doesn't copy anything but the object data! That means content type, ACL, and previous metadata are not carried over.

You might like to make an info request first, to get these values so you can merge with new values. But note that it won't be atomic!


The Monty Hall Problem

Jeff Atwood mentioned the Monty Hall problem, and that some Eduardo guy had not been convinced about it until a computer program simulated the theory over and over.


I had the same issue, being unconvinced about it, and so I wrote a similar program in Python. (Yeah, I'm a productive guy at work.)


import random

pot_goat = "GOAT"
pot_car = "CAR"
pots = pot_goat, pot_car

door_a = "DOOR A"
door_b = "DOOR B"
door_c = "DOOR C"
doors = door_a, door_b, door_c

player_nonswitch = "PLAYER NS"
player_switch = "PLAYER SW"
player_repick = "PLAYER RP"
players = player_nonswitch, player_switch, player_repick

def make_round():
    door_pots = dict((door, pot_goat) for door in doors)
    door_pots[random.choice(doors)] = pot_car
    return door_pots

stats = dict((player, dict((pot, 0) for pot in pots)) for player in players)

num_rounds = 10 ** 7
for round_num in xrange(num_rounds):
    if (round_num % (10 ** 5)) == 0:
        print "Round #%d (%.2f%%)" % (round_num, float(round_num) / num_rounds * 100)
    door_pots = make_round()
    for player in players:
        # First the contestant gets to choose from all three doors.
        choice = random.choice(doors)
        # Then, the host reveals a door which has a goat behind it which the
        # contestant didn't choose, asking if the contestant wants to remake
        # the door choice.
        goats = [d for (d, p) in door_pots.iteritems()
                 if p == pot_goat and d != choice]
        reveal_door = random.choice(goats)
        remaining_doors = [d for d in doors if d != reveal_door]
        if player == player_repick:
            # Contestant just picks randomly, blindfolded.
            choice = random.choice(remaining_doors)
        elif player == player_switch:
            # Contestant does want to switch, and so chooses again from the
            # remaining doors.
            choice = [d for d in remaining_doors if d != choice][0]
        stats[player][door_pots[choice]] += 1

for player, player_pots in stats.iteritems():
    goats, cars = player_pots[pot_goat], player_pots[pot_car]
    cars_amount = float(cars) / (goats + cars) * 100
    print "%s: %d goats, %d cars - %.2f%% cars" % (player, goats, cars, cars_amount)

The results?


PLAYER NS: 6665086 goats, 3334914 cars - 33.35% cars
PLAYER SW: 3333893 goats, 6666107 cars - 66.66% cars
PLAYER RP: 5001588 goats, 4998412 cars - 49.98% cars

PLAYER NS is the non-switching player, PLAYER SW switches, and PLAYER RP just randomly chooses again.


I can't explain this at all.
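
One way to see where those percentages come from is to enumerate every case exactly instead of sampling. The sketch below is Python 3 and makes the same assumption as the simulation: when the host has two goat doors to choose from, he picks either with equal probability.

```python
from fractions import Fraction

doors = ("A", "B", "C")
win = {"stay": Fraction(0), "switch": Fraction(0), "repick": Fraction(0)}

for car in doors:
    for choice in doors:
        # Each (car position, first choice) pair has probability 1/9.
        p_case = Fraction(1, 9)
        # The host opens a door that is neither the choice nor the car.
        host_options = [d for d in doors if d != choice and d != car]
        for revealed in host_options:
            p = p_case / len(host_options)
            remaining = [d for d in doors if d != revealed]
            switched = [d for d in remaining if d != choice][0]
            if choice == car:
                win["stay"] += p
            if switched == car:
                win["switch"] += p
            # A blind re-pick lands on each remaining door half the time.
            win["repick"] += p * Fraction(remaining.count(car), len(remaining))

for strategy, probability in win.items():
    print(strategy, probability)
```

Staying only wins when the first pick was the car, which happens in three of the nine cases. In the other six the host is forced to reveal the only other goat, so the door left over must hide the car.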


simples3 0.3 released!

Just released simples3 version 0.3.


This release adds support for generating those nice only-valid-until-some-time URLs (expiring URLs.) Thanks to Pavel Repin for that.


Otherwise mostly bugfixes and cosmetic changes (non-API breaking.)


simples3 0.3-r2. on PyPI or just sudo easy_install -U simples3 as usual.


Base64 encoding in URLs

Some time ago I was tasked with making integer IDs in URLs be base64 encoded, so they look less sequential and are more compact (for larger ID numbers, obviously.)


Python has no utility to do this in the standard library, but obviously it does have the tools for it.


First off, you don't want to be encoding the integer's decimal representation. That's not going to save you a lot of bytes at all.


What you do want to encode is the representation with the highest base possible -- and that's, shockingly, the binary representation of integers. I'm not going to go into details on how these work (because honestly, I expect my readers to know this - if not, Google ASAP.)


So that gives us the first key part of it: the struct module. It's a great tool when working with binary data in Python, even if you aren't really decoding C structs in Python.


The format I use is "<I", which means "a single 32-bit integer in little endian". In retrospect I should probably have used big endian, as it's the de facto endianness used in inter-machine communication - but seriously, it matters very little which you choose in this case. Be sure to specify one though, because the default is actually whatever your architecture uses.


But, as you might know, 32-bit integers are... Well, 32-bit integers - always four bytes in length, even when the value stored only uses the least significant byte. For values below 256, only a single byte is needed; for below 65536, two bytes; for below 16777216, three bytes.


To remedy this, I get the binary representation as a Python str, and strip zero-bytes from the right.


Next up is encoding with base64 - but with URL-safe characters. The convention here seems to be to replace the unsafe + and / by - and _. The base64 module actually has two functions for this, urlsafe_b64{en,de}code.


A funny quirk with base64 is that the last 6-bit value might sometimes need to be padded. This is done with equals signs, and they're actually unnecessary. You can infer how the last value should be padded by the length of the string, padding it with = until it's a multiple of four. I do this to save bytes.


"""Base64 adapted for URLs.

>>> urlb64.encode("Hello kitty")
'SGVsbG8ga2l0dHk'
>>> urlb64.encode_int(1337)
'OQU'
>>> urlb64.decode('SGVsbG8ga2l0dHk')
'Hello kitty'
>>> urlb64.decode_int('OQU')
1337
"""

import struct
from base64 import b64encode, b64decode

def encode(value):
    return b64encode(value, "-_").rstrip("=")

def decode(value):
    # Pad back to a multiple of four; adds nothing if already aligned.
    return b64decode(value + '=' * (-len(value) % 4), "-_")

def encode_int(value):
    return encode(struct.pack("<I", value).rstrip("\0"))

def decode_int(value):
    value = decode(value)
    value += "\0" * (4 - len(value))
    return struct.unpack("<I", value)[0]
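
For reference, here is a sketch of the same module for Python 3, where the packed data is bytes and the encoded form is text. It uses the urlsafe_b64encode and urlsafe_b64decode functions mentioned earlier. The function names mirror the ones above; the type split between bytes and str is this sketch's own choice.

```python
import struct
from base64 import urlsafe_b64decode, urlsafe_b64encode


def encode(value: bytes) -> str:
    """URL-safe base64 without the trailing '=' padding."""
    return urlsafe_b64encode(value).rstrip(b"=").decode("ascii")


def decode(value: str) -> bytes:
    """Re-add the padding implied by the length, then decode."""
    return urlsafe_b64decode(value + "=" * (-len(value) % 4))


def encode_int(value: int) -> str:
    # Little-endian, so the insignificant zero bytes end up on the right.
    return encode(struct.pack("<I", value).rstrip(b"\0"))


def decode_int(value: str) -> int:
    raw = decode(value)
    # Put back the zero bytes that encode_int stripped.
    return struct.unpack("<I", raw.ljust(4, b"\0"))[0]


print(encode(b"Hello kitty"))
print(encode_int(1337))
print(decode_int("OQU"))
```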

Default settings in Django

A problem with Django is that you can't have default settings unless you're a core Django developer.

This would be nice to have per-app or even per-funky-module, so I decided I'd implement something to support it!

Here's how I did it: create a base class, with a metaclass. When this base class is subclassed, take the attributes and loop through them. If the attribute is all uppercase and isn't set in the main configuration (yet), copy it there.

This ended up looking like:

class GlobalDefaultSettingsBase(type):
    def __new__(cls, name, bases, attrs):
        from django.conf import settings
        new = super(GlobalDefaultSettingsBase, cls).__new__
        new_cls = new(cls, name, bases, attrs)
        if name == "GlobalDefaultSettings":
            return new_cls
        for (attr, value) in attrs.iteritems():
            if attr == attr.upper() and not hasattr(settings, attr):
                print "SETTING", attr, "TO", repr(value)
                print settings
                print settings._target
                setattr(settings, attr, value)
        return new_cls

class GlobalDefaultSettings(object):
    """A base class for adding default global settings to the Django settings
    object.

    All attributes which are in the upper case will be merged into the existing
    configuration, unless they're already set.

    This is a simple solution for having default settings without `getattr`.
    All you have to ensure is that your subclass is constructed before the
    access of the defaults you provide -- which should be fairly easy.
    """
    __metaclass__ = GlobalDefaultSettingsBase

To use it, simply make a subclass and set some settings!

class MyDefaults(GlobalDefaultSettings):
    MY_SETTING_A = "Hello world!"
    MY_SETTING_B = ["Hey guys"]

And that's all you need. If either of the settings is set in the project's settings.py, that setting will effectively take precedence!
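
The mechanism can be tried without Django at all. The following Python 3 sketch swaps django.conf.settings for a plain SimpleNamespace stand-in; DefaultSettingsMeta and DefaultSettings are hypothetical names for this example, and the pre-set MY_SETTING_A plays the role of a value from the project's settings.py.

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings, with one value the project already set.
settings = SimpleNamespace(MY_SETTING_A="from settings.py")


class DefaultSettingsMeta(type):
    def __new__(mcs, name, bases, attrs):
        new_cls = super().__new__(mcs, name, bases, attrs)
        if bases:  # skip the base class itself, which has no parents
            for attr, value in attrs.items():
                # Only upper-case names count, and existing values win.
                if attr.isupper() and not hasattr(settings, attr):
                    setattr(settings, attr, value)
        return new_cls


class DefaultSettings(metaclass=DefaultSettingsMeta):
    """Subclass this and set upper-case attributes to provide defaults."""


class MyDefaults(DefaultSettings):
    MY_SETTING_A = "Hello world!"
    MY_SETTING_B = ["Hey guys"]


print(settings.MY_SETTING_A)
print(settings.MY_SETTING_B)
```

The defaults are merged the moment the class statement runs, which is why the subclass has to be defined before anything reads the settings.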

Obviously, there's no great place to put this code since it really belongs to core Django.

In fact, Django should itself be using this style - as it is now, all default settings are specified in one huge file without any good correlation.


What I'd like for logging

Python's logging package is great and all, but it's only half of the solution.

I often find myself in a situation where I'd like to debug live processes, and that's pretty hard when they're FastCGI scripts or other daemons.

Sure, you could have a log file which you spam with information on a large scale, but that's not very helpful.

Sure, you could have N log files with individual output, but that's not very manageable.

What I'd like to have is a good log server which receives these masses of information, and couples every log line with arbitrary key-value pairs.

Then, you'd tap into the stream of information, specifying expressions that must hold true. For example, you could specify that "pid must be one of 1324, 1325, or 1326" and that "level must be above 10".

That way, you'd be able to narrow down one process, and only the output you'd like to have. You'd also avoid having to store all data -- if nothing taps the log lines received, they're simply discarded (or stored to disk based on arbitrary conditions).

$ logtap 'pid in (1234, 1235, 1236) and level > 10'
20:26:01,513    INFO  foo.bar: the fox is now red
[etc.] 
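
The filtering half of that idea is easy to sketch in Python 3. Everything here is hypothetical: the records are plain dicts standing in for log lines with key-value pairs, and tap is a made-up function that keeps only the records for which a predicate holds.

```python
def tap(stream, predicate):
    """Yield only the records for which the predicate holds."""
    for record in stream:
        if predicate(record):
            yield record


# Made-up log records, each carrying arbitrary key-value pairs.
records = [
    {"pid": 1234, "level": 20, "name": "foo.bar", "msg": "the fox is now red"},
    {"pid": 9999, "level": 20, "name": "foo.bar", "msg": "other process"},
    {"pid": 1235, "level": 10, "name": "foo.baz", "msg": "too quiet"},
]


def wanted(record):
    # The same condition as the logtap expression above.
    return record["pid"] in (1234, 1235, 1236) and record["level"] > 10


matched = list(tap(records, wanted))
for record in matched:
    print("%(name)s: %(msg)s" % record)
```

Since tap is a generator, records nobody asks for are never kept around, which matches the discard-by-default behaviour described above.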
