Using Solr for indexing different types of data

I'm considering using Apache Solr to index data in a new project. The data consists of several different, independent types, for example

  • botanicals
  • animals
  • cars
  • computers

to index. Should I use a separate index for each type, or does it make more sense to use a single one? How does using many indexes affect performance? Or is there another way to achieve this?


2009-06-16 07:45
by Markus Lux


Both are legitimate approaches, but there are tradeoffs. First, how big is your dataset? If it is large enough that you may want to partition it across multiple servers, it probably makes sense to have different indexes.

Second, how important is performance? Indexing it all together will likely result in worse performance, but the degree depends on how much data there is and how complex the queries can get.

Third, do you need to query multiple data types in the same search? If so, indexing everything together is a convenient way to allow this. Technically it could also be achieved with separate indexes, but getting the most relevant results for the query could be a challenge (not that it isn't one already).
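One common way to do the "everything in one index" approach is to give each document a `type` field and restrict searches with a filter query (`fq`) when only one type is wanted. Here is a minimal sketch using only the Python standard library to build the request URL; the core name `catalog` and the field name `type` are illustrative assumptions, not part of the original question.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL for illustration.
SOLR_SELECT = "http://localhost:8983/solr/catalog/select"

def build_search_url(text, doc_type=None):
    """Build a Solr select URL that searches all types by default,
    or restricts results to one type via a filter query (fq)."""
    params = [("q", text), ("wt", "json")]
    if doc_type is not None:
        # fq filters the result set without affecting relevance scoring
        params.append(("fq", "type:%s" % doc_type))
    return SOLR_SELECT + "?" + urlencode(params)

# Search across all types, then restrict to one type:
all_url = build_search_url("leopard")
animal_url = build_search_url("leopard", doc_type="animals")
```

Because `fq` is applied as a filter rather than part of the main query, Solr can cache it and reuse it across searches, which is one reason this pattern is popular for type-restricted queries on a shared index.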

Fourth, a single index with a single schema and configuration can simplify the life of whoever will be deploying and maintaining the system.

One other thing to consider is IDs: do all of the different objects have an identifier that is unique across all types? If not, you will probably need to generate one if you want to index them together.
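A simple way to get a globally unique key when each type only has per-type IDs is to prefix the local ID with the type name. This is just a sketch of the idea; the separator and naming are arbitrary choices, not anything the question specifies.

```python
def global_id(doc_type, local_id):
    """Combine the type name with the per-type ID so that, e.g.,
    car #42 and animal #42 no longer collide in a shared index."""
    return "%s-%s" % (doc_type, local_id)

# car #42 and animal #42 now map to distinct index keys:
car_key = global_id("cars", 42)       # "cars-42"
animal_key = global_id("animals", 42) # "animals-42"
```

The composite key would go into the schema's `uniqueKey` field, and it has the side benefit that the original type and ID can be recovered from the key itself.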

2009-06-16 11:42
by KenE
Thanks for your answer. I guess I really have to stick with multiple indexes, since generating unique identifiers in one index would be a mess in my case. I played around with Solr index distribution and shards, but they were apparently made for speeding up queries on huge datasets; I don't think five or more cores is how they are meant to be used. So my current thoughts are going towards just using Lucene without Solr - Markus Lux 2009-06-16 15:05
I have a question. We have close to 10 apps (approx. 10,000 rows of data per app with 10 columns; one or two columns will be big text fields), and we also want to index all of our documents from shared drives (maybe 5,000 Word/PDF docs). We want to create a global search where you can search for anything you want and the results can be categorized by facets (apps), a modified-date range filter, etc. We will also use this search in each of these individual applications, where the user can search by text and other fields like modified date, modified user, etc. Which of the two approaches is better? - Yogesh Jindal 2011-12-13 22:07
From the research I have done, it looks like people have a lot more than 10 cores and are managing them (I don't know how well). Here is a link - Yogesh Jindal 2011-12-15 17:48