Django, Unicode and Caching

So yeah, I was trying to insert marshalled (from Python's marshal module) data into memcached via Django's cache support, and I got this error:

  230. data = cache.get(key)
File "django/core/cache/backends/memcached.py" in get
  30.  return smart_unicode(val)
File "django/utils/encoding.py" in smart_unicode
  44.  return force_unicode(s, encoding, strings_only, errors)
File "django/utils/encoding.py" in force_unicode
  92.  raise DjangoUnicodeDecodeError(s, *e.args)

In force_unicode. What the hell? I'm retrieving data from memcached, and Django tries to protect me from the oh-so-confusing world of encodings by decoding whatever comes back, while at the same time assuming nobody will ever store binary data in a cache.
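To make the failure concrete, here is a minimal sketch that involves no Django at all. It assumes the backend tries to decode the cached value as UTF-8, and it is written for Python 3, where the marshalled payload is a bytes object. The helper name looks_like_text is made up for illustration.

```python
import marshal

# marshal.dumps(255) contains the byte 0xff, which is never valid in UTF-8.
payload = marshal.dumps(255)


def looks_like_text(raw):
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(looks_like_text(b"hello"))   # plain ASCII decodes fine
    print(looks_like_text(payload))    # marshalled data does not
```

Any blanket "decode everything on get" step will trip over a payload like this one.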


Comments
Posted by: James Bennett

Well, this is really a Python problem :)

The issue here is that you have some binary data that you need to treat as a chunk of bytes. But Python 2.x doesn't have a data type for that, and so you end up putting it into a string with dumps() and sending the string to memcached.

Django, meanwhile, has a policy of always using Unicode strings everywhere, since mixing bytestrings and Unicode strings is the path to madness. So when Django sees a string coming back out of the cache, it wants to make sure it's a Unicode string, and doesn't have any way to know that it's really non-string-data-masquerading-as-a-string.

The solution is about ten lines of code and some careful use of cache keys. Write a subclass of the memcached cache backend, and override get(), and decide on a prefix or suffix you'll use for keys that are binary-values-hiding-in-strings. Then have get() skip the Unicode conversion when you're fetching one of those keys.
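A rough sketch of that idea, under loud assumptions: FakeMemcachedCache is a stand-in written for this example, not Django's real backend class, and the "bin:" prefix and the _raw_get helper are invented names. It is written for Python 3; the real subclass would override get() on the actual memcached backend in the same way.

```python
import marshal


class FakeMemcachedCache:
    """Stand-in for a memcached backend that decodes every value (assumption)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def _raw_get(self, key):
        # Hypothetical helper: fetch the stored value without any conversion.
        return self._store.get(key)

    def get(self, key, default=None):
        val = self._raw_get(key)
        if val is None:
            return default
        if isinstance(val, bytes):
            return val.decode("utf-8")  # the step that breaks on binary data
        return val


class BinaryAwareCache(FakeMemcachedCache):
    """Skips the text conversion for keys that carry the binary prefix."""

    BINARY_PREFIX = "bin:"  # made-up convention for this sketch

    def get(self, key, default=None):
        if key.startswith(self.BINARY_PREFIX):
            val = self._raw_get(key)
            return default if val is None else val
        return super().get(key, default)


if __name__ == "__main__":
    cache = BinaryAwareCache()
    cache.set("bin:answer", marshal.dumps(255))
    print(marshal.loads(cache.get("bin:answer")))
```

The cost of this design is that every caller has to remember the key convention; nothing in the cache itself records which values are binary.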

2009-09-06 @ 20:18:34
URL: http://www.b-list.org/
Posted by: Ludvig Ericson

Re: James Bennett,

I know, but IMO a cache like memcached stores byte strings, not text strings.

Why can't Django set a flag in memcached instead, saying "this is a Unicode string", so it knows on get? This is what the popular memcached libraries do for other datatypes, such as integers and longs.
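A minimal sketch of the flag idea, using an in-memory dict in place of a real memcached connection. The FlaggedStore class and the FLAG_TEXT bit are invented for illustration and do not correspond to any particular client library's constants. It is written for Python 3.

```python
FLAG_TEXT = 1 << 0  # hypothetical flag bit meaning "value is UTF-8 text"


class FlaggedStore:
    """Toy key-value store that keeps a flags integer next to each value."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        if isinstance(value, str):
            # Text is encoded on the way in and marked as text.
            self._items[key] = (FLAG_TEXT, value.encode("utf-8"))
        else:
            # Anything else is stored as raw bytes with no flag set.
            self._items[key] = (0, bytes(value))

    def get(self, key, default=None):
        try:
            flags, raw = self._items[key]
        except KeyError:
            return default
        if flags & FLAG_TEXT:
            return raw.decode("utf-8")
        return raw


if __name__ == "__main__":
    store = FlaggedStore()
    store.set("name", "Ludvig")
    store.set("blob", b"\xff\x00\x01")
    print(repr(store.get("name")), repr(store.get("blob")))
```

With the type recorded at set time, get never has to guess, and byte strings come back untouched.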

Also, a solution I once used in another context where a Unicode string was required (a JSON data format) was to simply decode the data with ISO-8859-1 (which never fails), then, on the receiving side, encode it with ISO-8859-1 again to get the original bytes back.
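The ISO-8859-1 trick can be sketched in a few lines of Python 3. The function names here are made up for the example; the point is only that every byte value 0-255 maps to exactly one code point and back.

```python
import marshal


def bytes_to_text(raw):
    """Wrap arbitrary bytes in a text string; ISO-8859-1 decoding cannot fail."""
    return raw.decode("iso-8859-1")


def text_to_bytes(text):
    """Undo bytes_to_text on the receiving side."""
    return text.encode("iso-8859-1")


if __name__ == "__main__":
    raw = marshal.dumps({"n": 255})
    wrapped = bytes_to_text(raw)          # safe to hand to text-only code
    restored = text_to_bytes(wrapped)     # original bytes again
    print(marshal.loads(restored))
```

The wrapped string is not meaningful text, so this only works if the receiving side knows to reverse it.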

2009-09-06 @ 21:39:40
URL: http://lericson.blogg.se/code/
