
I would like to use the Jensen-Shannon divergence as a histogram distance function. I'm implementing a simple image similarity search, and the histograms are normalized RGB color distributions.

I have a question about the Kullback-Leibler divergence formula (on which JS is based): what should I return when Pi or Qi is zero?

Here is the implementation in F#:

```
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then ? // ?
        elif qi = 0. then ? // ?
        else pi * log (pi / qi)) p q
    |> Array.sum
```
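To make the problem concrete, here is a rough sketch (in Python rather than F#, and with a hypothetical `kl_term` helper) of how the naive summand misbehaves on the two zero cases. Python raises exceptions where F#'s `log` and float division would produce `-infinity`/`nan`, but the underlying issue is the same:

```python
import math

def kl_term(pi, qi):
    # Naive KL summand with no special-casing of zeros.
    return pi * math.log(pi / qi)

# pi = 0: log(0) is a domain error, even though the limit 0 log 0 is 0.
try:
    kl_term(0.0, 0.25)
except ValueError as e:
    print("pi = 0:", e)

# qi = 0: division by zero.
try:
    kl_term(0.25, 0.0)
except ZeroDivisionError as e:
    print("qi = 0:", e)
```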

and the Jensen-Shannon distance that uses it:

```
let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.
```

Wikipedia says that it should return 0 when pi=0 and qi>0, and that it is undefined when qi=0, but for a histogram distance that does not make much sense. What values would make sense in this case?

Here's the corrected version as per Whatang's answer, for future reference:

```
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then 0. // 0 log 0 = 0; when called from dJS, qi = 0. implies pi = 0.
        else pi * log (pi / qi)) p q
    |> Array.sum
```

(Note the guard is on `pi = 0.` alone: checking `pi = 0. && qi = 0.` would still evaluate `0. * log 0.`, which is NaN, whenever pi is zero but qi is not.)
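For anyone who wants to check the behaviour end-to-end, here is a rough transliteration into Python (the names `d_kl`/`d_js` are mine). It guards only the `pi = 0` case, taking `0 log 0 = 0`, which is enough inside the Jensen-Shannon computation:

```python
import math

def d_kl(p, q):
    # 0 log 0 is taken as its limit, 0; inside d_js, q[i] = 0 can only
    # happen when p[i] = 0, so no other guard is needed.
    return sum(0.0 if pi == 0.0 else pi * math.log(pi / qi)
               for pi, qi in zip(p, q))

def d_js(p, q):
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return d_kl(p, m) / 2.0 + d_kl(q, m) / 2.0

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(d_js(p, p))  # 0.0
print(d_js(p, q))  # finite despite the zero bins
print(d_js(q, p))  # same value: dJS is symmetric
```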

I'm curious, I've been taking some stats night classes (for reference: we're studying MLE/MVUE/Sufficiency/etc at this point), but I don't see how you can shoehorn this distribution distance into one about relative frequency. Keep in mind my limited knowledge before you call me silly - Ritch Melton 2012-04-03 22:39

There isn't really a good alternative from what I am reading: `pi=0 -> 0` is just to avoid `0 * log 0`, which is undefined, and `qi=0 -> undefined` is because otherwise you have division by zero - Guvante 2012-04-03 23:19
There's a question similar to yours with a good answer on the Stats StackExchange: http://stats.stackexchange.com/a/1413 - Jack P. 2012-04-04 12:46

@Guvante the issue is what value is meaningful in those cases. When qi is 0 and pi is 0 there's no problem, because 1) the values are equal, so the distance is obviously 0, and 2) it's common to take 0 log 0 as 0. On the other hand, the problem is when only qi is 0; but as Whatang showed, this can never happen in this particular case - Francesco De Vittori 2012-04-04 14:13

@RitchMelton I'm not an expert, but the idea is that a relative frequency distribution is pretty much the same thing as a probability distribution, so Jensen-Shannon, Kullback-Leibler, chi-squared & co. are OK. The practical implementation I'm testing confirms this: JS works very well (slightly better than chi-squared) - Francesco De Vittori 2012-04-04 14:19

@FrancescoDeVittori - Yea, that makes sense - Ritch Melton 2012-04-04 14:25


Since you're using this to build the Jensen-Shannon divergence, the only way you can have `qi` equal to zero in the calculation of the Kullback-Leibler divergence is if the `pi` value is also zero. This is because you're really calculating the average of `dKL(p,m)` and `dKL(q,m)`, where `m=(p+q)/2`. So `mi=0` implies both `pi=0` and `qi=0`.
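This is easy to sanity-check numerically: each `mi` is the average of two non-negative values, so it can only vanish when both do. A quick sketch (hypothetical example histograms, in Python for brevity):

```python
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.5, 0.5, 0.0]
m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

for pi, qi, mi in zip(p, q, m):
    # mi == 0 exactly when both pi and qi are 0, so the qi = 0 branch
    # of dKL is never reached with a nonzero pi.
    assert (mi == 0.0) == (pi == 0.0 and qi == 0.0)

print(m)  # [0.25, 0.5, 0.25, 0.0]
```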

Expand the definition of `dKL` to be `p log p - p log m`, use the convention/limit that `0 log 0 = 0`, and you'll see that there's no problem: `m` can only be zero when `p` also is.

To make a long story short, when you call `dKL` from `dJS` the second clause `elif qi = 0` will never be executed: put whatever you like in there (it's probably a good idea to make it zero, unless you're going to call `dKL` from somewhere else).

Correct, I hadn't thought of that. With that correction the algorithm works pretty well - Francesco De Vittori 2012-04-04 14:08