A correction to my first response: when I wrote it, I was under the impression that we were using the PandaBoard. It turns out we are using the PandaBoard ES, which has an OMAP4460 rather than the OMAP4430 found on the DuoVero.
Also, because people seem to be curious, I'll elaborate on the performance differences. We do pupil detection with a 9-step process that includes converting the image to grayscale and using a histogram to crop the image around the pupil. We then use Haar feature detection and the Starburst algorithm to find where the edge of the ellipse might be, followed by an ellipse-fitting algorithm. All of this is built on the TBB, Boost, and OpenCV libraries. We have compiled both OpenCV and the algorithm with gcc 4.7.1 using -mfpu=neon, -mfpu=vfpv3, -mfpu=vfpv3-fp16, and no additional flags at all.
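For anyone unfamiliar with the histogram cropping step, here is a minimal sketch of the general idea in plain C++, with no OpenCV dependency. This is not the code we run, just an illustration. It assumes the pupil is the darkest region of the frame, and the `Gray` and `Box` structs, the function names, and the fraction and margin parameters are all hypothetical. The histogram gives a threshold below which the darkest pixels fall, and the bounding box of those pixels (plus a margin) becomes the crop.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image, row-major (a stand-in for cv::Mat).
struct Gray {
    int width;
    int height;
    std::vector<std::uint8_t> px;
};

// Hypothetical crop rectangle; all four bounds are inclusive.
struct Box {
    int x0, y0, x1, y1;
};

// Integer approximation of the common 0.299 R + 0.587 G + 0.114 B luma weights.
inline std::uint8_t toGray(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
}

// Smallest intensity t such that at least `fraction` of the pixels are <= t.
inline int darkThreshold(const Gray& img, double fraction) {
    assert(!img.px.empty() && fraction > 0.0 && fraction <= 1.0);
    std::array<std::size_t, 256> hist{};
    for (std::uint8_t v : img.px) ++hist[v];
    const double target = fraction * static_cast<double>(img.px.size());
    std::size_t seen = 0;
    for (int t = 0; t < 256; ++t) {
        seen += hist[t];
        if (static_cast<double>(seen) >= target) return t;
    }
    return 255;
}

// Bounding box of all pixels at or below `threshold`, grown by `margin`
// pixels on every side and clamped to the image.
inline Box cropAroundDark(const Gray& img, int threshold, int margin) {
    assert(img.px.size() == static_cast<std::size_t>(img.width) * img.height);
    Box b{img.width, img.height, -1, -1};
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            if (img.px[static_cast<std::size_t>(y) * img.width + x] <= threshold) {
                b.x0 = std::min(b.x0, x);
                b.y0 = std::min(b.y0, y);
                b.x1 = std::max(b.x1, x);
                b.y1 = std::max(b.y1, y);
            }
        }
    }
    // No dark pixels found: fall back to the full frame.
    if (b.x1 < 0) return Box{0, 0, img.width - 1, img.height - 1};
    b.x0 = std::max(0, b.x0 - margin);
    b.y0 = std::max(0, b.y0 - margin);
    b.x1 = std::min(img.width - 1, b.x1 + margin);
    b.y1 = std::min(img.height - 1, b.y1 + margin);
    return b;
}
```

A caller would convert each frame with `toGray`, pick a threshold with something like `darkThreshold(img, 0.05)`, and pass the result to `cropAroundDark`. A real implementation would also need to deal with eyelashes, shadows, and glints, which this sketch ignores.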
With ~95% accuracy, this algorithm ran at ~1-2 fps on the DuoVero and ~6 fps on the PandaBoard ES. With ~60% accuracy, we were able to run two instances of the algorithm at ~8 fps each on the DuoVero and ~17 fps each on the PandaBoard ES. We are currently looking into using the armcc compiler to improve this even more. (Does anyone know how to do this?)
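If anyone wants to compare their own builds or boards, one simple way to get fps numbers like these is to time a fixed number of frames with a monotonic clock. The sketch below is generic rather than our exact harness: `measureFps` is a hypothetical helper, and the callable passed to it stands in for whatever per-frame work you want to time.

```cpp
#include <cassert>
#include <chrono>

// Runs `processFrame` `frames` times and returns the average frames per second.
// Returns 0.0 if the clock could not resolve any elapsed time.
template <typename Fn>
double measureFps(Fn&& processFrame, int frames) {
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes
    const auto start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        processFrame();
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    if (elapsed.count() <= 0.0) return 0.0;
    return frames / elapsed.count();
}
```

Averaging over a few hundred frames, and discarding the first few while caches warm up, tends to give steadier numbers than timing a single frame.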
Now, I don't know the reason for the difference, but I suspect it may be because the PandaBoard includes drivers for the GPU, which means OpenGL support, and that may let the system use OpenGL calls to accelerate matrix operations.
PS: If anyone has any ideas/suggestions/criticisms for things we are doing, please let me know.