Daily I have 5 million or so unique keywords with an impression count for each one. I want to be able to look these keywords up by certain words so for instance if I have "ipod nano 4GB" I want to be able to pull that out if I search for "ipod", "nano", or "4GB". mySQL can't seem to handle that much data for what I want, I've tried Berkeley but that seems to crash with too many rows and it's slower. Ideas?
I'm quite happy with the Xapian search engine library. Although it sounds like it might be overkill for your scenario, maybe you just want to chuck your data into a big hashtable, like perhaps memcached?
you can try free text on mssql. http://msdn.microsoft.com/en-us/library/ms177652.aspx
Example query:
SELECT TOP 10 * FROM searchtable
INNER JOIN FREETEXTTABLE(searchtable, [SEARCH_TEXT], 'query string') AS KEY_TBL
ON searchtable.SEARCH_ID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC
Josh
A Lucene index might work. Ive used it for pretty big datasets before. It's developed in java but there is also a .NET version.
Have you tried fulltext search in MySQL ? Because if you tried it with LIKE comparison, I see why it was slow :).
That workload and search pattern is trivial for PostgreSQL with its integrated full text search functionality (integrated as of 8.4 which is now in RC status. It's a contrib module prior to that.)