Menu

#3 Tune performance of cross_correlate()

open
3
2002-06-18
2002-06-18
No

I've been benchmarking the new cross_correlate()
function against code written in pure Jython, as well as
against CPython. Here's what I've learned:

On average, JNumeric runs about 15 times slower than
CPython, at least for this function, under my particular
setup (I've read somewhere that others find Jython runs
only 1.7 times slower than CPython - your milage may
vary).

Part of the reason for this is the difference in algorithms
used to compute the cross-correlation. Jnumeric uses
an FFT-based algorithm, which is Nlog(N). Cnumeric
uses a simple multiply-and-add algorithm, with some
clever usage of the mode information.

Here are some example data for cross-correlating two
vectors of 1024 elements, 1000 times each. The column
headings are:
CNumeric: use the cross_correlate() function
CPy: write a function that implements an FFT-based
cross-correlation, but use only the fft() function and write
it in pure Python

Mode CNumeric CPy JNumeric Ratio (C/J)
0 0.0103 3.75 52.3 5076
1 3.4422 3.8 57 16.6
2 4.62 4.2 62 13.4

That's great, but what happens when you go up to 8192
elements:

Mode CNumeric CPy
0 .043 63
1 231 80
2 308 73

(For a vector 8 times as long, CPy takes about 20 times
longer, while in the worst case, CNumeric takes over 64
times longer. This is as you'd expect for those
algorithms.)

It's clear from this that the CNumeric algorithm is not to
be recommended for long vector lengths (but we knew
that from reading Numerical Recipes). And yet, for short
lengths, it's really much faster (5000x!) than the FFT-
based algorithm. Just goes to show that 1000*N*logN
will always be much larger than N^2 for some small
enough N!

To get decent performance for short and long vectors,
we need to go with a dual algorithm, using ideas from
CNumeric's implementation, but keeping the FFT-based
algorithm as implemented so far.

Incidentally, comparing the build-in cross_correlate()
function with the same functionality implemented in
Jython using fft() shows that the performance is only 1-
5x faster. What's up with that?

-Frank

Discussion


Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.