I'm considering using Apache Solr to index data in a new project. The data consists of different, independent types, which means there are for example
to index. Should I use a separate index for each type, or does it make more sense to use a single index? How does using many indexes affect performance? Or is there another way to achieve this?
Both are legitimate approaches, but there are tradeoffs. First, how big is your dataset? If it is large enough that you may want to partition it across multiple servers, it probably makes sense to have different indexes.
Second, how important is performance? Indexing everything together will likely result in worse performance, but how much worse depends on how much data there is and how complex the queries can get.
Third, do you need to query for multiple data types in the same search? If so, indexing everything together can be a convenient way to allow this. Technically this could also be achieved with separate indexes, but getting the most relevant results for the query could be a challenge (not that it isn't already).
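To make the single-index case concrete, here is a rough sketch of how one search could be restricted to some types or left open to all of them. It assumes every document carries a `type` field and that the core is called `items`; both names are made up for illustration and are not from your setup. `q` and `fq` are standard Solr request parameters.

```python
from urllib.parse import urlencode

def build_select_url(text, types=None):
    """Build a Solr select URL, optionally filtered to certain document types.

    Assumes a hypothetical core named 'items' and a hypothetical 'type' field.
    """
    params = [("q", text)]
    if types:
        # A filter query narrows the result set without affecting relevance scoring.
        params.append(("fq", "type:(" + " OR ".join(types) + ")"))
    return "/solr/items/select?" + urlencode(params)

# Search across every type in the index:
all_types_url = build_select_url("dune")

# Search only two of the types:
some_types_url = build_select_url("dune", ["book", "author"])
```

With separate indexes you would instead send one request per index and merge the result lists yourself, which is where the relevance problem comes from.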
Fourth, a single index with a single schema and configuration can simplify the life of whoever will be deploying and maintaining the system.
One other thing to consider is IDs: do all of the different objects have an identifier that is unique across all types? If not, you will probably need to generate one if you want to index them together.
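One simple way to generate such an ID is to prefix each object's own ID with its type before indexing. The sketch below is only an illustration: the field names `id` and `type`, the underscore separator, and the helper `to_index_doc` are all assumptions, not something Solr requires.

```python
def to_index_doc(doc_type, local_id, fields):
    """Return a copy of `fields` with a cross-type unique id and a type marker.

    `doc_type` and `local_id` together are assumed to identify one object.
    """
    doc = dict(fields)
    # Two objects of different types may share local_id; the prefix keeps them apart.
    doc["id"] = f"{doc_type}_{local_id}"
    doc["type"] = doc_type
    return doc

book = to_index_doc("book", 42, {"title": "Dune"})
author = to_index_doc("author", 42, {"name": "Frank Herbert"})
```

Here both objects have the local ID 42, but they end up with different IDs in the shared index, so one does not overwrite the other.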