I would like to use the Jensen-Shannon divergence as a histogram distance function. I'm implementing a simple image similarity search, and the histograms are normalized RGB color distributions.
I have a question about the Kullback-Leibler divergence formula (on which JS is based): what should I return when Pi or Qi is zero?
Here is the implementation in F#:
let dKL p q =
    Array.map2 (fun pi qi -> if pi = 0. then ?   // ?
                             elif qi = 0. then ? // ?
                             else pi * log (pi / qi)) p q
    |> Array.sum
and the Jensen-Shannon distance that uses it:
let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.
Wikipedia says that it should return 0 when pi = 0 and qi > 0, and that it is undefined when qi = 0 and pi > 0, but for a histogram distance that does not make much sense. What values would make sense in this case?
Here's the correct version as per Whatang's answer, for future reference. Note that the guard has to fire whenever pi = 0, not only when both values are zero: with a `pi = 0. && qi = 0.` test, a bin with pi = 0 and qi > 0 would evaluate `0. * log (0. / qi)`, which is `0. * -infinity` = NaN in floating point:

let dKL p q =
    Array.map2 (fun pi qi -> if pi = 0. then 0. // covers 0 * log 0; qi = 0 implies pi = 0 when called from dJS
                             else pi * log (pi / qi)) p q
    |> Array.sum
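To sanity-check the fix, here is a minimal Python port of the two functions above (the names `d_kl` and `d_js` mirror the F# `dKL`/`dJS`; Python is used purely for illustration):

```python
import math

def d_kl(p, q):
    # Kullback-Leibler divergence with the convention 0 * log 0 = 0.
    return sum(0.0 if pi == 0 else pi * math.log(pi / qi)
               for pi, qi in zip(p, q))

def d_js(p, q):
    # Jensen-Shannon divergence: average KL divergence to the midpoint m.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return d_kl(p, m) / 2 + d_kl(q, m) / 2

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(d_js(p, p))  # identical histograms -> 0
print(d_js(p, q))  # symmetric, finite despite the zero bins
```

Despite both histograms containing zero bins, no division by zero or NaN occurs, and the result stays within the expected [0, log 2] range.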
"pi=0 -> 0" is just to avoid 0 * log 0, which is undefined, and "qi=0 -> undefined" is because otherwise you have division by zero. – Guvante 2012-04-03 23:19
Since you're using this to build the Jensen-Shannon divergence, the only way you can have qi equal to zero in the calculation of the Kullback-Leibler divergence is if the pi value is also zero. This is because really you're calculating the average of dKL(p,m) and dKL(q,m), where m = (p+q)/2. So mi = 0 implies both pi = 0 and qi = 0.
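This claim can be checked mechanically. In the hypothetical Python sketch below (example histograms chosen to include zero bins), any bin where m is zero must have both p and q zero, because probabilities are non-negative and (pi + qi)/2 = 0 forces pi = qi = 0:

```python
p = [0.5, 0.0, 0.5, 0.0]
q = [0.25, 0.0, 0.5, 0.25]
m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

# every zero bin of m coincides with a zero bin of both p and q
zero_bins = [i for i, mi in enumerate(m) if mi == 0]
print(zero_bins)  # only bin 1, where p and q are both zero
print(all(p[i] == 0 and q[i] == 0 for i in zero_bins))
```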
Expand the definition of dKL to be p log p - p log m, and use the convention/limit that 0 log 0 = 0, and you'll see that there's no problem: m can only be zero when p also is.
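The 0 log 0 = 0 convention is justified by the limit of x log x as x approaches 0 from above, which a quick Python check illustrates:

```python
import math

# x * log(x) tends to 0 as x -> 0+, motivating the convention 0 * log 0 = 0
for x in (1e-2, 1e-4, 1e-8, 1e-16):
    print(x, x * math.log(x))
```

Each successive value is closer to zero, so extending the integrand continuously with 0 at x = 0 is the natural choice.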
To make a long story short, when you call dKL from dJS the second clause elif qi = 0 will never be executed: put whatever you like in there (probably a good idea to make it zero, unless you're going to call dKL from somewhere else).