I spent the last few days mostly investigating Android options for libtheoraplayer and have narrowed the choices down to these three:
1) libstagefright - this is Android's private API for video processing, apparently powerful but very poorly documented. I was unable to find a single tutorial online!
I managed to hack together some code that compiles, links against libstagefright.so and runs a few commands, but I got nothing past that.
2) ffmpeg with stagefright support. I tried everything but just couldn't get ffmpeg to compile with stagefright support for Android. I did manage to compile a non-stagefright build and it works quite well. I committed the source code for an ffmpeg backend that I wrote while playing around with ffmpeg (a rough sketch of the kind of decode loop it revolves around is included further below). It's far from complete, but I'll leave it in for future development; it could be useful.
3) Good ol' libtheora. At first I was reluctant to use theora on ARM CPUs because it's dreadfully slow, but thanks to the TheorARM project there are now some nifty ARM assembly optimizations in libtheora. I didn't expect them to make a significant difference, but I compiled libtheoraplayer with the OC_ARM_ASM flag and it runs much faster!
So, for now, ARM-optimized libtheora seems to be a good solution for Android.
I'm still interested in libstagefright support, but I have yet to find any proper documentation.
If anyone has a working example, please let me know. I studied ffmpeg's stagefright implementation; that's the closest thing to a tutorial I've found.
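For reference, this is roughly the kind of decode loop the ffmpeg backend mentioned above is built around - a minimal sketch using the libavformat/libavcodec API as it looked at the time. The function names are the real ffmpeg ones, but the structure is simplified and the actual backend in the repository differs:

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}

bool decodeAllFrames(const char* filename)
{
    av_register_all();

    AVFormatContext* format = NULL;
    if (avformat_open_input(&format, filename, NULL, NULL) != 0)
        return false;
    if (avformat_find_stream_info(format, NULL) < 0)
        return false;

    // find the first video stream in the container
    int videoStream = -1;
    for (unsigned int i = 0; i < format->nb_streams; ++i)
    {
        if (format->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
        {
            videoStream = (int) i;
            break;
        }
    }
    if (videoStream < 0)
        return false;

    AVCodecContext* codecCtx = format->streams[videoStream]->codec;
    AVCodec* codec = avcodec_find_decoder(codecCtx->codec_id);
    if (!codec || avcodec_open2(codecCtx, codec, NULL) < 0)
        return false;

    AVFrame* frame = avcodec_alloc_frame();
    AVPacket packet;
    int gotFrame = 0;
    while (av_read_frame(format, &packet) >= 0)
    {
        if (packet.stream_index == videoStream)
        {
            avcodec_decode_video2(codecCtx, frame, &gotFrame, &packet);
            if (gotFrame)
            {
                // frame->data[0..2] now hold the YUV planes,
                // ready for colorspace conversion and texture upload
            }
        }
        av_free_packet(&packet);
    }
    av_free(frame);
    avcodec_close(codecCtx);
    avformat_close_input(&format);
    return true;
}
```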
Next, besides actual decoding, the biggest performance bottleneck in this library is the YUV->RGB colorspace conversion. The current implementation is a hand-optimized piece of code written in C.
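For context, here's what the unoptimized version of that conversion looks like - a naive reference sketch of planar YUV420 to RGBA using BT.601 integer math. Function and parameter names here are made up; the hand-optimized code in the library is structured differently:

```cpp
#include <cstdint>
#include <algorithm>

static inline uint8_t clamp255(int x)
{
    return (uint8_t) std::min(std::max(x, 0), 255);
}

// Converts planar YUV420 (one full-res Y plane, half-res U and V planes)
// to interleaved RGBA using integer BT.601 coefficients.
void yuv420ToRgba(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                  int width, int height, int yStride, int uvStride,
                  uint8_t* rgba)
{
    for (int j = 0; j < height; ++j)
    {
        const uint8_t* yRow = y + j * yStride;
        const uint8_t* uRow = u + (j / 2) * uvStride;
        const uint8_t* vRow = v + (j / 2) * uvStride;
        uint8_t* out = rgba + j * width * 4;
        for (int i = 0; i < width; ++i)
        {
            int c = (yRow[i] - 16) * 298;
            int d = uRow[i / 2] - 128;
            int e = vRow[i / 2] - 128;
            out[i * 4 + 0] = clamp255((c + 409 * e + 128) >> 8);           // R
            out[i * 4 + 1] = clamp255((c - 100 * d - 208 * e + 128) >> 8); // G
            out[i * 4 + 2] = clamp255((c + 516 * d + 128) >> 8);           // B
            out[i * 4 + 3] = 255;                                          // A
        }
    }
}
```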
Obviously an assembly-level implementation would do better, and I've researched a bit in that direction. I've tried libyuv and libcsc4arm; both provide some improvement and are definitely worth looking into.
Of course, if you're able to, the best colorspace conversion method is doing it on the GPU via shaders, but unfortunately that's not possible in every application.
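The shader route usually means uploading the Y, U and V planes as three separate luminance textures and letting a fragment shader do the per-pixel conversion. Here's a rough OpenGL ES 2.0 sketch of such a shader, stored as a C string; the texture and varying names are made up and the coefficients assume full-range YUV:

```cpp
// Fragment shader: samples the three YUV planes and outputs RGBA.
static const char* yuvFragmentShader =
    "precision mediump float;\n"
    "uniform sampler2D texY;\n"
    "uniform sampler2D texU;\n"
    "uniform sampler2D texV;\n"
    "varying vec2 texCoord;\n"
    "void main()\n"
    "{\n"
    "    float y = texture2D(texY, texCoord).r;\n"
    "    float u = texture2D(texU, texCoord).r - 0.5;\n"
    "    float v = texture2D(texV, texCoord).r - 0.5;\n"
    "    gl_FragColor = vec4(y + 1.402 * v,\n"
    "                        y - 0.344 * u - 0.714 * v,\n"
    "                        y + 1.772 * u,\n"
    "                        1.0);\n"
    "}\n";
```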