Cross correlation of medium sized arrays

Go To StackoverFlow.com

3

I have 16 1D arrays with approximately 10-11 million double-precision elements each. I need to execute a cross-correlation across them, i.e., 1 with 2, 1 with 3, ..., 1 with 16, 2 with 3, 2 with 4, ..., 2 with 16, and so on. This cannot be done efficiently on my MacBook Intel Core 2 duo 2.4 GHz, with 4GB of RAM. My question is, what is the typical approach, if not brute force (faster processor, more RAM) that people use to overcome this problem, or problems like it? Thanks!

2012-04-04 22:25
by Phillip Cloud


2

If you calculate the Fourier transform of each of your arrays, you should be able to use the transformed arrays to efficiently calculate the cross-correlation between each pair of the original input arrays. See the "Properties" section of the Wikipedia article I linked to for the identity to use.

2012-04-04 22:43
by Jim Lewis
I know about this identity. I'm using NumPy, which directly calculates the cross-correlation. Would calculating the cross-correlation using this identity noticeably speed up the computation, i.e., enough so that I wouldn't have to run it overnight, say - Phillip Cloud 2012-04-04 22:48
@cpcloud: I'm not a Python expert, but I googled around a bit and found an answer elsewhere on SO that claimed NumPy wasn't using the FFT technique. (So it would probably be a little faster if one array was much smaller than the other, but slow for the case you're concerned with.) Calculating the FFT of a 10 million point array shouldn't take too long, so I'd expect to see a significant speedup if you implemented a cross-correlation routine using the Fourier identity - Jim Lewis 2012-04-04 23:08
Thanks! Could you post a link to where you found this information? Much appreciated - Phillip Cloud 2012-04-04 23:18
@cpcloud - FWIW, numpy.correlate (deliberately) doesn't use fft, as it explains in the docstring for the function, but scipy.signal.fft.fftconvolve does, if you want a one-linear to do it - Joe Kington 2012-04-04 23:30


1

The cross-correlation function in numpy is ridiculously slow. The openCV library has a numpy friendly cross-correlation function available. Even if you try to implement a frequency domain aproach, you won't beat the openCV library, as there are more tricks available to speed up a cross-correlation calculation. I posted about this before:

Computing cross-correlation function?

I believe the code is based on the tricks detailed in the following paper:

J. P. Lewis, “Fast template matching,” in Vision Interface, 1995, vol. 95, pp. 120–123.

2012-04-10 20:51
by ncRubert
Ads