I am new to PostgreSQL, especially its performance tuning aspects. Basically, we have data that is accessed by querying three integer values: segmentSize (range 1 to 10), segmentX (range +/- 100,000), and segmentY (range +/- 100,000).
A forward-looking consideration: as the data volume grows, we may partition the data into multiple tables, one per individual segmentSize and/or per contiguous range of segmentX and segmentY.
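For concreteness, something like PostgreSQL's declarative partitioning (available in version 10 and later) is what I have in mind; the table and column names here are only illustrative:

    CREATE TABLE segment_data (
        segmentSize integer NOT NULL,
        segmentX    integer NOT NULL,
        segmentY    integer NOT NULL,
        payload     text
    ) PARTITION BY LIST (segmentSize);

    -- one partition per segmentSize value
    CREATE TABLE segment_data_size_1 PARTITION OF segment_data FOR VALUES IN (1);
    CREATE TABLE segment_data_size_2 PARTITION OF segment_data FOR VALUES IN (2);
    -- ...and so on up to segmentSize = 10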
The present choice: architecturally, I can either use (segmentSize, segmentX, segmentY) directly as the key, or, in order to gain performance, create a synthetic key outside of PostgreSQL that combines segmentX and segmentY (or, much less likely, all three of segmentSize, segmentX, segmentY) into a single integer value that becomes the key.
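If I did combine segmentX and segmentY, the derivation would be something along these lines (the arithmetic would happen in the application, but I'm showing it as SQL for clarity, against the illustrative segment_data table above). Note that packing two full +/- 100,000 ranges produces values up to roughly 4 * 10^10, which does not fit in a 4-byte integer, so the combined key would have to be a bigint:

    -- hypothetical packing: shift both values into 0..200000, then combine them
    -- into one number; 200001 * 200001 is about 4e10, so the result needs a bigint
    SELECT (segmentX + 100000)::bigint * 200001
           + (segmentY + 100000)          AS combined_key
    FROM   segment_data;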
The question: assuming we're not too concerned about the cost of deriving this "combined key" from segmentX and segmentY outside of Postgres, and given that we're not specifically after space savings on the order of bytes per row (unless it makes a performance difference), will there be any measurable or meaningful performance gain from querying on a single integer value that combines segmentX and segmentY, as opposed to querying on the combination of the two separate integer values segmentX and segmentY?
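To make the comparison concrete, the two query shapes I have in mind look roughly like this (assuming, for the second variant, that the precomputed bigint were stored in a combined_key column in place of segmentX and segmentY):

    -- variant A: natural key, separate integer columns
    SELECT payload
    FROM   segment_data
    WHERE  segmentSize = 3 AND segmentX = 12345 AND segmentY = -6789;

    -- variant B: synthetic key, one bigint precomputed by the application
    -- 22469205556 = (12345 + 100000) * 200001 + (-6789 + 100000)
    SELECT payload
    FROM   segment_data
    WHERE  segmentSize = 3 AND combined_key = 22469205556;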
Many, many thanks. Please feel free to include any links that expand on the applicable data and indexing strategies to maximize SELECT / read performance.
The performance benefit of combining the two (or three) columns into a single key value is likely to be quite minimal. It could actually hurt performance for some usages: if these values are meaningful in other tables, the need to "navigate" through the synthetic key keeps the planner from considering plans that might be faster. Using a synthetic key when a natural key is available tends to fall under the heading of "premature optimization", with all the risks associated with that, including a high probability that it will actually make things slower.
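For illustration (reusing the column names from the question; the table name, payload column, and sample values are placeholders, and this is a plain non-partitioned version of the table sketched in the question), a natural composite primary key already gives you a single multicolumn btree index, and an equality lookup on all three columns is a single probe of that index, so a packed key has little, if anything, to improve on:

    CREATE TABLE segment_data (
        segmentSize integer NOT NULL,
        segmentX    integer NOT NULL,
        segmentY    integer NOT NULL,
        payload     text,
        PRIMARY KEY (segmentSize, segmentX, segmentY)
    );

    -- on a populated, analyzed table, a point lookup on all three key
    -- columns should plan as a single index scan on the primary key, e.g.:
    --   Index Scan using segment_data_pkey on segment_data
    EXPLAIN
    SELECT payload
    FROM   segment_data
    WHERE  segmentSize = 3 AND segmentX = 12345 AND segmentY = -6789;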