PostgreSQL single vs. multi-column indexing for optimal SELECT performance - 【StackMirror】|sql|performance|postgresql|indexing

I am new to PostgreSQL, especially its performance tuning aspects. Basically we have data that is accessed by querying 3 integer values: segmentSize (ranges 1...10), segmentX (ranges +/- 100,000), segmentY (ranges +/- 100,000).

A forward-looking consideration: when the data volume grows, there's a possibility to segment the data into multiple tables, one per individual segmentSize, and / or contiguous ranges of segmentX and segmentY.

The present choice: I have an architectural choice of either directly using the key (segmentSize, segmentX, segmentY) or - in order to gain performance - outside of PostgreSQL creating a synthetic key that combines segmentX, segmentY into a single integer value that becomes the key (or much less likely, all three (segmentSize, segmentX, segmentY).

The question: assuming we're not too concerned about the cost of this "combined key" derivation from segmentX, segmentY happening outside of Postgress, and given we're not specifically after space saving on the order of bytes per row of data (unless it makes a performance difference), .... will there be any measurable or meaningful performance gain from querying for a single int value of the range segmentX * segmentY, as opposed to querying for a combination of two separate int values of segmentX and segmentY ?

Many, many thanks. Please feel free to include any links that expand on the applicable data and indexing strategies to maximize SELECT / read performance.

2012-04-04 17:53
by SashaK

Use EXPLAIN and EXPLAIN ANALYZE on your queries to see and measure what is going on and what works best - Frank Heikens 2012-04-04 18:00

Thank you, Frank - SashaK 2012-04-04 18:22

First: what is the natural primary key ? Second: what is your typical usage: a range query on either X or Y or on {X,Y} or on {Y,X} ? Third: is the set of keyfields in the query different from those of the "natural" PK? Is it different from the set of keyfields in the insert-operations? fourth: from the set of three keyfields: is any possible pair a candidate key? Fifth: please add a discription of the meaning of the keyfiels. "segment_id" is not very informative to most of us - wildplasser 2012-04-04 18:31

@wildplasser great insight - thank you. Basically we have a grid akin to city blocks in Manhattan NYC where avenues (1st to 11th) and streets (1st to say 160th) are numbered. So you can refer to some restaurant as being "near the corner of 7th Avenue and 34th Street", as people do in real life. Alternatively you can follow the Tokyo scheme where each city block is given a number, so you can refer to some restaurant as being in "block 926". In the former case, we'd have a combined index/key of (7,34) and in the latter case, a single key 926 (hence from a larger set of values) - SashaK 2012-04-05 04:01

The performance benefits of combining the two (or three) columns into a single value for the key are likely to be quite minimal. It could actually hurt performance for some usages; if these values are meaningful in other tables, the need to "navigate" through the synthetic key prevents plans from being considered which might be faster. Using a synthetic key when there is an available natural key tends to fall under the heading of "premature optimization", with all the risks associated with that -- including a high probability that it will actually make things slower.

2012-04-04 18:24
by kgrittn