I want to create a large table (about 45 billion rows) that is always accessed by a unique key.
Outside of the DB, the best structure to hold this is a Dictionary or a HashSet, but of course due to the size of data, it's not possible to do this outside of the database.
Does SQL Server provide a structure that's optimized for key-value access? I understand that a clustered key is very fast, but still it's an index and therefore there will be some additional disk reads associated with traversing index pages. What I would like to get from SQL Server is a "native" structure that stores data as key-value pairs and then makes it possible to access values based on keys.
In other words, my question is how to store in SQL Server 45 billion rows and efficiently access them WITHOUT having an index, clustered or non-clustered, because reading the index non-leaf pages may result in substantial IO, and since each value can be accessed by a unique key, it should be possible to have a structure where the hash of a key resolves into a physical location of the value. To get 1 value, we would need to do 1 read (unless there are hash collisions).
(an equivalent in Oracle is Hash Cluster)
Thanks for your help.
No such thing in SQL server. Your only option is an index. If you're going to be requesting all columns for a given key, you should use a clustered index. If you're only going to be requesting a subset, you should use a non-clustered index including only the columns you want like this:
create index IX_MyBigTable on MyBigTable(keyColumn) include (col1, col2, col3youneed);
This will be pretty efficient.
According to my benchmarks, the best approach is to create a hash column for the key. Details.