Partition Lucene Index by ID across multiple indexes



I am trying to put together my Lucene search solution, and I'm having trouble figuring out how to start.

  • On my site, I want one search to span 5 different types of objects in my model.
  • I want my results to come back as one list, ordered by best match first, with a way to tell the types apart so I can display each result appropriately.
  • Our system is split into what we call sites. I want to index the 5 different model objects by site. Searching will always be done within a single site.

I'm not sure where to begin indexing this system for optimal performance, and I'm also not sure how best to implement search on top of that setup. Any advice, articles, or examples are greatly appreciated.

EDIT:

Since commenters said this is too broad, here is a concrete example.

Let's say I have 3 sites: Site 1, Site 2, and Site 3.

Let's say I am indexing Dogs, Cats, and Hamsters. Each record of these types is linked to a site.

So, for instance, my data might be (Type, Name, SiteId):

Dog, "Fido", 1
Cat, "Sprinkles", 2
Hamster, "Sprinkles", 2
Cat, "Mr. Pretty", 3
Cat, "Mr. Pretty 2", 3

So, when I search for "Mr. Pretty", I want to target a specific site ID. If I search against site ID 1, I'll get 0 results. If I search against site ID 3, I'll get:

Mr. Pretty
Mr. Pretty 2

And if I search for "Sprinkles" on Site 2, I will know that one result is a cat and the other result is a hamster.

What is the best way I can go about achieving this sort of search index?

2012-04-03 21:05
by Josh
This is too broad to answer here. You might want to look into Solr, or SolrCloud, or ElasticSearch, or Sensei - bmargulies 2012-04-03 21:08
Those are nice, but I cannot use them. The powers that be want me to use Lucene.net only - Josh 2012-04-03 21:34
Wouldn't you simply add a SiteID field to each document and always make it part of your query? - goalie7960 2012-04-03 22:00



As goalie7960 suggested, you can add a "siteid" field to each document and add a required clause such as +siteid:3 to your query, so that only documents from that site are returned. You can also improve performance by creating a Filter for each site once, storing it, and applying it to every query for that site.
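The site-field-plus-cached-filter idea can be illustrated with a toy, stdlib-only Python model built from the question's sample data. This is not Lucene.net code: `ToyIndex`, `site_filter`, and the scoring rule are hypothetical stand-ins. A `frozenset` of document IDs plays the role of a cached filter's bitset, and the "fraction of the name that matched" score is only a crude substitute for Lucene's real relevance scoring. The key point it shows is that the site ID is indexed as one exact term and used only to restrict hits, never to score them.

```python
import re
from collections import defaultdict

# Sample records from the question: (type, name, siteid).
RECORDS = [
    ("Dog", "Fido", 1),
    ("Cat", "Sprinkles", 2),
    ("Hamster", "Sprinkles", 2),
    ("Cat", "Mr. Pretty", 3),
    ("Cat", "Mr. Pretty 2", 3),
]


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class ToyIndex:
    """Hypothetical in-memory inverted index; not the Lucene API."""

    def __init__(self, records):
        self.docs = []
        self.postings = defaultdict(set)  # "field:term" -> set of doc ids
        for doc_id, (kind, name, siteid) in enumerate(records):
            terms = tokenize(name)
            self.docs.append({"type": kind, "name": name, "terms": terms})
            for term in terms:
                self.postings["name:" + term].add(doc_id)
            # The site id is indexed as a single, untokenized term.
            self.postings["siteid:%d" % siteid].add(doc_id)
        self._site_filters = {}  # cache: siteid -> frozenset of doc ids

    def site_filter(self, siteid):
        """Build the per-site filter once, then reuse it for later queries."""
        if siteid not in self._site_filters:
            key = "siteid:%d" % siteid
            self._site_filters[siteid] = frozenset(self.postings.get(key, ()))
        return self._site_filters[siteid]

    def search(self, text, siteid):
        query_terms = tokenize(text)
        hits = set()
        for term in query_terms:
            hits |= self.postings.get("name:" + term, set())
        hits &= self.site_filter(siteid)  # restricts results, does not affect scores
        scored = []
        for doc_id in hits:
            doc = self.docs[doc_id]
            matched = sum(1 for t in query_terms if t in doc["terms"])
            # Crude relevance: the share of the name covered by query terms.
            scored.append((matched / len(doc["terms"]), doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.docs[d]["type"], self.docs[d]["name"]) for _, d in scored]
```

With the sample data, `ToyIndex(RECORDS).search("Mr. Pretty", 3)` returns both cats with "Mr. Pretty" ranked first, and the same query against site 1 returns an empty list, matching the behavior asked for in the question.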

The same strategy works for keeping different types in one index. Give each document a "type" field holding its type (maybe just an ID), so every hit tells you whether it is, say, a cat or a hamster. Elasticsearch uses this approach to keep distinguishable types in a single index. Again, you can use Filters on the type field to speed up queries (Elasticsearch does the same).
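A second toy, stdlib-only Python sketch shows a "type" field living next to "siteid" in the same index, with cached keyword filters that can be intersected. Again this is not Lucene.net code: `keyword_filter` and `search` are hypothetical names, `functools.lru_cache` stands in for storing filters, and relevance ranking is left out (hits come back in indexing order) to keep the focus on the type field.

```python
import re
from collections import defaultdict
from functools import lru_cache

# Sample records from the question: (type, name, siteid).
RECORDS = [
    ("Dog", "Fido", 1),
    ("Cat", "Sprinkles", 2),
    ("Hamster", "Sprinkles", 2),
    ("Cat", "Mr. Pretty", 3),
    ("Cat", "Mr. Pretty 2", 3),
]


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# All types share one index; "type" and "siteid" are exact-match keyword fields.
DOCS = [{"type": k, "name": n, "siteid": s} for k, n, s in RECORDS]
POSTINGS = defaultdict(set)  # (field, value) -> set of doc ids
for doc_id, doc in enumerate(DOCS):
    for term in tokenize(doc["name"]):
        POSTINGS[("name", term)].add(doc_id)
    POSTINGS[("siteid", doc["siteid"])].add(doc_id)
    POSTINGS[("type", doc["type"])].add(doc_id)


@lru_cache(maxsize=None)
def keyword_filter(field, value):
    """Cached set of doc ids whose keyword field equals value."""
    return frozenset(POSTINGS.get((field, value), ()))


def search(text, siteid, types=None):
    """Match on name within one site, optionally restricted to some types."""
    allowed = keyword_filter("siteid", siteid)
    if types is not None:
        # Union of the per-type filters, intersected with the site filter.
        allowed = allowed & frozenset().union(*(keyword_filter("type", t) for t in types))
    hits = set()
    for term in tokenize(text):
        hits |= POSTINGS.get(("name", term), set())
    # Each hit carries its type so the caller can render it appropriately.
    return [(DOCS[d]["type"], DOCS[d]["name"]) for d in sorted(hits & allowed)]
```

For the question's second example, `search("Sprinkles", 2)` returns one cat and one hamster, each labeled with its type, while `search("Sprinkles", 2, types=["Cat"])` uses the type filter to keep only the cat.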

2012-04-04 19:36
by Felipe Hummel