SOLR - Match range query only if all dates in range are matched

Go To StackoverFlow.com

1

I am using SOLR and storing an array of dates a salesperson is available to visit clients (trips can last anywhere from a day upwards, depending on the client request). For each salesperson I have a list of dates that they are available for the salesperson for a given month. There are other fields, including salesperson data, geolocation information, etc.

I am familiar with range queries but it seems that SOLRs range searches on arrays work differently than I would like - as long as any item in the array is a match then the range is a match). I would like to send SOLR a query with a range and only return a match if all dates in that range are found in the array. For example:

<arr name="available_dates">
    <date>2012-04-30T00:00:00Z</date>
    <date>2012-05-01T00:00:00Z</date>
    <date>2012-05-02T00:00:00Z</date>
</arr>

-- should match --
available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]

-- should not match as 2012-04-29 is not contained in available_dates --
available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]

Is this possible or am I going about this all wrong?

2012-04-05 01:54
by JP.
this thread will probably help you - Noam 2012-04-08 12:57


0

Instead of using a range query you should use multiple clauses, one for each date.

So instead of available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]

You should use available_dates:"2012-04-29T00:00:00.000Z" AND available_dates:"2012-04-30T00:00:00Z" AND available_dates:"2012-05-01T00:00:00.000Z" AND available_dates:"2012-05-02T00:00:00.000Z"

Hope that answers your question!

2012-04-07 09:39
by cberner
This isn't a tractable solution for a large date range. Given all that's being checked is the evaluation of a range query for each element in the array, structuring the query in that manner would work more efficiently. Good thought, though - MrGomez 2012-04-09 00:22


0

You have the right idea, but your initial query is a search instead of a match. Intuitively, your search within available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z] should contain all of the elements of available_dates for it to have matched successfully.

You have two options to implement this logic efficiently and successfully. You can either manually or dynamically perform the range query for each element in your array, or you can set up an ancillary that attempts to perform the match after your search has been performed. For example:

available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z](available_dates)

Which is saying, in left to right order: evaluate the range search, then check that all of the results from available_dates are contained in this evaluation (by way of a default AND query). If they are, return the element. If not, don't.

Syntactically, the above is untested and probably does not work. But procedurally, you should be able to draft the right query around this to fit your needs.

(Additional resource discussing the default AND behavior of composite search queries)

2012-04-09 01:00
by MrGomez


0

Assuming you're importing this data from database.

In your database or in your search index, create a new column that stores the max of your sales person's date (as in latest date), as well as a min. Also, calculate and store the difference between the max & min date.

Three criterias must be matched for a matching query (so use AND in the query)

  1. the differnce between the query's max & min can't be bigger than the difference as stored in the index

  2. you'd make sure {!frange l=0 u=difn_bet_query_max_and_min}sub(field_min,query_min)

  3. formulate the same thing for your max values

For a reference on function ranges http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/

2012-04-10 17:44
by Joyce
Fascinating suggestion. If I read this properly, the idea is to take temporary tables and perform high pass and low pass filtering, then see if the resulting data tables are comparable to what you were originally given. This is workable, though it seems inelegant in terms of memory and computational performance in both storage and the number of passes. Thoughts - MrGomez 2012-04-12 00:00
Still assuming you are importing from db. You don't necessarily have to store the min & max of a sale's person's date in db, you can store it in lucene/solr's index also. Like so by creating another entity under the salesperson entity, and run a query against the salesperson currently being processed to find out that person's min & max avail date. Also store the difference between the salesperson's max & min date. Same requirement on memory as before, but even more queries are performed, hence it's slower. But now you don't need to add columns to db - Joyce 2012-04-12 15:51
I've asked a question here similar to this one http://stackoverflow.com/questions/9892716/choose-solr-documents-where-one-field-is-great-than-another but that person`s solution actually asked that greater than or less than be indexed as booleans into the index :( Rather than try to calculate w/ frang - Joyce 2012-04-12 15:58
I concur. This is an entirely workable strategy, so don't be discouraged in the least! I'm genuinely interested if it's the most efficient and most expressive solution in SOLR, so we'll see what the OP manages. : - MrGomez 2012-04-12 20:38
Ads