Fastest way to lookup keywords. Any language, any system - 【StackMirror】|database|lookup|keyword|performance

Every day I have roughly 5 million unique keywords, each with an impression count. I want to be able to look these keywords up by any of their words: for instance, if I have "ipod nano 4GB", I want to pull it out when I search for "ipod", "nano", or "4GB". MySQL can't seem to handle that much data for what I need, and I've tried Berkeley DB, but it seems to crash with too many rows and it's slower. Any ideas?
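The lookup described here is essentially an inverted index: each word points to the phrases that contain it, so a search for "nano" becomes one indexed equality lookup instead of a scan over every phrase. A minimal sketch in fairly portable SQL, assuming a hypothetical two-table schema (`keywords`, `keyword_words`) and that the application splits each phrase on whitespace and lowercases the words before inserting them:

```sql
-- Hypothetical schema: one row per keyword phrase with its impression count.
CREATE TABLE keywords (
    id          INTEGER PRIMARY KEY,
    phrase      VARCHAR(255) NOT NULL,
    impressions INTEGER NOT NULL
);

-- Inverted index: one row per (word, phrase) pair.
-- The composite primary key doubles as the index used for word lookups.
CREATE TABLE keyword_words (
    word       VARCHAR(64) NOT NULL,
    keyword_id INTEGER NOT NULL REFERENCES keywords (id),
    PRIMARY KEY (word, keyword_id)
);

INSERT INTO keywords VALUES (1, 'ipod nano 4GB', 1200);

-- Words are assumed to be split and lowercased by the application.
INSERT INTO keyword_words VALUES ('ipod', 1), ('nano', 1), ('4gb', 1);

-- Every phrase containing "nano", most impressions first.
SELECT k.phrase, k.impressions
FROM keyword_words AS w
JOIN keywords AS k ON k.id = w.keyword_id
WHERE w.word = 'nano'
ORDER BY k.impressions DESC;
```

Because `word` is the leading column of the primary key, `WHERE w.word = 'nano'` can be answered with an index range scan; the search term has to be lowercased the same way the stored words were.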

2009-06-16 19:58
by Ryan Detzel

I'm quite happy with the Xapian search engine library, although it sounds like it might be overkill for your scenario. Maybe you just want to chuck your data into a big hashtable, such as memcached?

2009-06-16 20:18
by asjo

You can try full-text search in MS SQL Server: http://msdn.microsoft.com/en-us/library/ms177652.aspx

Example query:

-- Top 10 rows by full-text rank; requires a full-text index on searchtable.SEARCH_TEXT.
SELECT TOP 10 * FROM searchtable
INNER JOIN FREETEXTTABLE(searchtable, [SEARCH_TEXT], 'query string') AS KEY_TBL
ON searchtable.SEARCH_ID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC
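That query only works once `searchtable` has a full-text index. A sketch of the one-time setup for SQL Server 2005 or later, assuming `searchtable` already has a unique, single-column, non-nullable index; the names `PK_searchtable` and `SearchCatalog` are hypothetical. It also shows `CONTAINSTABLE`, which matches the literal word, whereas `FREETEXTTABLE` matches on meaning (stemming and thesaurus expansion):

```sql
-- One-time setup; SearchCatalog and PK_searchtable are hypothetical names.
CREATE FULLTEXT CATALOG SearchCatalog AS DEFAULT;

-- KEY INDEX must name a unique, single-column, non-nullable index on searchtable.
-- With AUTO change tracking, the index is repopulated in the background,
-- so newly inserted rows become searchable shortly after, not instantly.
CREATE FULLTEXT INDEX ON searchtable (SEARCH_TEXT)
    KEY INDEX PK_searchtable
    ON SearchCatalog
    WITH CHANGE_TRACKING AUTO;

-- Exact-word variant of the ranked query: rows whose SEARCH_TEXT contains "nano".
SELECT TOP 10 s.*
FROM searchtable AS s
INNER JOIN CONTAINSTABLE(searchtable, SEARCH_TEXT, '"nano"') AS KEY_TBL
    ON s.SEARCH_ID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;
```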

Josh

2009-06-16 20:02
by Josh

A Lucene index might work. I've used it for pretty big datasets before. It's developed in Java, but there is also a .NET version.

2009-06-16 20:04
by Jack Ryan

I guess that currently the best way is Lucene. My company uses it for large databases with simultaneous requests (approx. 300 req/s). - Zanoni 2009-06-16 20:09

Have you tried full-text search in MySQL? If you tried it with a LIKE comparison, I can see why it was slow :).
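To illustrate the difference, a sketch assuming a hypothetical `keywords` table: a `LIKE '%nano%'` pattern with a leading wildcard cannot use an ordinary index and reads every row, while `MATCH ... AGAINST` uses a `FULLTEXT` index. Two caveats from MySQL's documented defaults: before MySQL 5.6, `FULLTEXT` indexes exist only for MyISAM tables, and MyISAM's `ft_min_word_len` defaults to 4, so a three-character word such as "4GB" is not indexed unless that setting is lowered and the index rebuilt.

```sql
-- Hypothetical table; MyISAM because FULLTEXT on InnoDB requires MySQL 5.6+.
CREATE TABLE keywords (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    phrase      VARCHAR(255) NOT NULL,
    impressions INT UNSIGNED NOT NULL,
    FULLTEXT INDEX ft_phrase (phrase)
) ENGINE = MyISAM;

INSERT INTO keywords (phrase, impressions) VALUES ('ipod nano 4GB', 1200);

-- Slow at scale: the leading % rules out an index, so every row is scanned.
SELECT phrase, impressions
FROM keywords
WHERE phrase LIKE '%nano%';

-- Index-backed lookup; BOOLEAN MODE also avoids natural-language mode's
-- rule that ignores words present in more than half of the rows.
SELECT phrase, impressions
FROM keywords
WHERE MATCH (phrase) AGAINST ('+nano' IN BOOLEAN MODE)
ORDER BY impressions DESC;
```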

2009-06-16 20:23
by instanceof me

That workload and search pattern is trivial for PostgreSQL with its integrated full-text search functionality (integrated into the core as of 8.3; before that it was the tsearch2 contrib module).
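A sketch of that built-in full-text search, assuming a hypothetical `keywords` table: the expression GIN index lets PostgreSQL answer word lookups without scanning the table, as long as the query repeats the indexed expression exactly. The `'simple'` configuration lowercases words but neither stems them nor drops stopwords; choosing it for product keywords like these is an assumption, not something the answer specifies.

```sql
-- Hypothetical table holding one row per keyword phrase.
CREATE TABLE keywords (
    id          serial PRIMARY KEY,
    phrase      text NOT NULL,
    impressions integer NOT NULL
);

-- Expression GIN index over the tokenized, lowercased phrase.
CREATE INDEX keywords_phrase_fts ON keywords
    USING gin (to_tsvector('simple', phrase));

INSERT INTO keywords (phrase, impressions) VALUES ('ipod nano 4GB', 1200);

-- The WHERE clause uses the same to_tsvector('simple', ...) expression,
-- so the planner can use keywords_phrase_fts for this lookup.
SELECT phrase, impressions
FROM keywords
WHERE to_tsvector('simple', phrase) @@ plainto_tsquery('simple', 'nano')
ORDER BY impressions DESC;
```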

2009-06-18 14:06
by NoName