I have finished implementing all the image and frame operations (motion compensation, wavelet lifting, frame arithmetic) of the Dirac codec (the Schroedinger implementation) in NVIDIA CUDA. My goal was to be able to decode HDTV Dirac video in real time, and even this first (correct) version gives extremely nice speedups.
Times for decoding a 1440x1080 Dirac stream (with inter frames) of 3880 frames:
- CPU-only implementation: 2671126.326 ms = about 1.5 fps
- GPU-accelerated implementation: 188548.063 ms = about 21 fps
That is a speedup of about 14x for the entire process, and the GPU-accelerated implementation comes close to real time (24 fps). There is still room for more optimization, but it already looks very promising.
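The frame rates and the speedup follow directly from the two wall-clock times. A minimal sketch of that arithmetic, in case anyone wants to compare their own runs; the function names `average_fps` and `speedup` are made up for illustration and are not part of Schroedinger or GStreamer:

```c
#include <assert.h>

/* Average frames per second for `frames` frames decoded in `elapsed_ms`
 * milliseconds of wall-clock time. (Hypothetical helper.) */
double average_fps(long frames, double elapsed_ms)
{
    assert(frames > 0 && elapsed_ms > 0.0);
    return (double)frames / (elapsed_ms / 1000.0);
}

/* Speedup of an accelerated run over a baseline run of the same stream,
 * as the ratio of the two elapsed times. (Hypothetical helper.) */
double speedup(double baseline_ms, double accelerated_ms)
{
    assert(baseline_ms > 0.0 && accelerated_ms > 0.0);
    return baseline_ms / accelerated_ms;
}
```

With the numbers above, `average_fps(3880, 2671126.326)` is roughly 1.45, `average_fps(3880, 188548.063)` is roughly 20.6, and `speedup(2671126.326, 188548.063)` is roughly 14.2.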
These timings were measured with GStreamer on this machine:
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
GPU: GeForce 8800 GTX
I will release the code later.
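For readers unfamiliar with the wavelet lifting mentioned above, here is a rough CPU-side sketch of one level of a 1-D integer lifting transform, using the reversible LeGall 5/3 filter as the example. This is an illustration only, not the unreleased CUDA code. The function names, the in-place interleaved layout, and the mirrored edge handling are all assumptions, and the stream timed above may well use a different filter. On a GPU each loop iteration within a step would typically map to one thread, since samples within a step do not depend on each other.

```c
#include <assert.h>

/* Forward LeGall 5/3 lifting on x[0..n-1], in place, n even.
 * Afterwards even indices hold low-pass samples and odd indices hold
 * high-pass samples. Edges are mirrored (an assumption of this sketch).
 * Note: >> on negative ints is assumed to be an arithmetic shift, which
 * holds on common compilers; the inverse uses the same expressions, so
 * reconstruction is exact either way. */
void lift53_forward(int *x, int n)
{
    assert(n >= 2 && n % 2 == 0);
    /* Predict step: each odd sample minus the average of its even neighbours. */
    for (int i = 1; i < n; i += 2) {
        int right = (i + 1 < n) ? x[i + 1] : x[i - 1];
        x[i] -= (x[i - 1] + right + 1) >> 1;
    }
    /* Update step: each even sample plus a quarter of its odd neighbours. */
    for (int i = 0; i < n; i += 2) {
        int left = (i > 0) ? x[i - 1] : x[i + 1];
        x[i] += (left + x[i + 1] + 2) >> 2;
    }
}

/* Inverse transform: undo the update step, then undo the predict step. */
void lift53_inverse(int *x, int n)
{
    assert(n >= 2 && n % 2 == 0);
    for (int i = 0; i < n; i += 2) {
        int left = (i > 0) ? x[i - 1] : x[i + 1];
        x[i] -= (left + x[i + 1] + 2) >> 2;
    }
    for (int i = 1; i < n; i += 2) {
        int right = (i + 1 < n) ? x[i + 1] : x[i - 1];
        x[i] += (x[i - 1] + right + 1) >> 1;
    }
}
```

Running `lift53_forward` followed by `lift53_inverse` on the same buffer restores the original samples exactly, which is the property that makes lifting attractive for lossless integer transforms. A 2-D transform would apply the same steps along rows and then along columns.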
That's great :-)
I really look forward to seeing the code. We'll incorporate it if we can do so in a flexible way.
Why just decoding? It would be great to have a generic library to speed up *all* encoders.
BTW, I just found this: http://www.cse.cuhk.edu.hk/~ttwong/software/dwtgpu/dwtgpu.html
Hope that helps!