DISTINCT ON with Django' ORM
I needed to use PostgreSQL's DISTINCT ON feature to avoid sorting all fields of a model, but still keep it distinct, so I wrote this um, clever way of doing it. It turned out that I couldn't use it anyway, so sharing it with the world as it shows a powerful way to "retouch" Django's generated SQL. But be warned, you'd best know what you're doing!
class DistinctOnQuery(Query): def as_sql(self, *args, **kwds): sql, params = super(DistinctOnQuery, self).as_sql(*args, **kwds) if self.distinct: sql_post = sql[16:] # "SELECT DISTINCT " else: sql_post = sql[7:] # "SELECT " distinct_sql = ", ".join(self.distinct_on) if distinct_sql: sql = "SELECT DISTINCT ON (" + distinct_sql + ") " + sql_post return sql, params def clone(self, *args, **kwds): obj = super(DistinctOnQuery, self).clone(*args, **kwds) obj.distinct_on = tuple(self.distinct_on) return obj def distinct_on(qs, *exprs): """Set a PostgreSQL "DISTINCT ON" expression list on *qs*.""" # Commence the hack of the year: generate SQL using Django, override the # query object, and have it hack up the string. qs = qs.all() qs.query = qs.query.clone(DistinctOnQuery) qs.query.distinct_on = exprs return qs
It ignores any previous .distinct() you might or might not have set, should probably raise an exception in the case where one is set, since you can't have both.
Usage is simple, qs = distinct_on(qs, "id").
I should also note that PostgreSQL can only use this feature on fields on which you sort -- the reason for this is sort of complicated, but in short, PostgreSQL requires that the dataset be sorted such that duplicate rows are adjacent, and if you're differentiating rows on something you're not sorting by, you're sure to have "duplicates".
from django.db.models.sql import Query
