Row numbering hacks in postgres

I recently ran across this post on simulating row numbering in postgres (and announcing that 8.4 will finally have a function for this). I was aware of the temporary sequence solution (I believe that’s the stock answer to that question in postgres) , but the two other approaches are new to me (and rather clever). The last especially is… majestic:

SELECT (SELECT COUNT(*) FROM people 
    WHERE 
    (COALESCE(people.last_name,'') || COALESCE(people.first_name,'')) <= 
    (COALESCE(oldtable.last_name,'') 
    || COALESCE(oldtable.first_name,''))) As row_number, 
    oldtable.*
FROM (SELECT * 
    FROM people 
    ORDER BY 
    last_name, first_name) As oldtable;
&#91;/sourcecode&#93;

Conceptually, for each row (in the order we care about), this constructs a key with a well-defined ordering to it, and counts the number of rows that have keys "smaller" than that key. Slick.

I independently came up with another hacktastic approach to this problem last summer, and I'm rather fond of it:
&#91;sourcecode language='sql'&#93;
SELECT
    row_num,
    (SELECT first_name FROM people
    ORDER BY last_name, first_name
    LIMIT 1 OFFSET row_num - 1) as first_name,
    (SELECT last_name FROM people
    ORDER BY last_name, first_name
    LIMIT 1 OFFSET row_num - 1) as last_name,
FROM
    generate_series(1, (SELECT COUNT(*) FROM people)) as row_numbers(row_num)
&#91;/sourcecode&#93;

This approach is (almost certainly) much less efficient than the first, but I think I may win some style points. From Lucifer.

Essentially, this turns the query on its head: instead of getting a bunch of ordered results and trying to number them, we take a bunch of numbers, and for each one, take the nth row from the (unnumbered) result set with the help of handy LIMIT and OFFSET. Postgres allows queries in the target list (as long as they return a scalar), and since we only care about a single column and LIMIT to a single result, this is perfectly legit.

<hr>

As an aside, I lied above: the result of a subquery in the target list does not have to be scalar: it must have a single column, but it can return more than one row. Things get weird when this happens, though--see, e.g., the following:


SELECT generate_series(1,2) as series1, generate_series(1,3) as series2;

How many rows does this produce? Six. Yes, six.

Advertisements