Re: [Libjpeg-turbo-users] Fastest decompression of a JPEG image, many questions
SIMD-accelerated libjpeg-compatible JPEG codec library
From: Eric B. <eri...@gm...> - 2012-06-25 15:25:14
Hi,

I am coming back to you with feedback on my research.

I have used the libjpeg API with JDCT_IFAST, and this clearly speeds up libjpeg-turbo by 5-10%. I also use YUV decompression instead of RGB; I noticed that on my Windows system the decompression time is divided by two. I was thinking that YUV decompression was the bigger part, but it seems to be the same. Is that normal?

I have analyzed VLC, which uses another library called avcodec to decode frames. My tests show that on my Linux system, libjpeg-turbo with the libjpeg API is around 5-10% faster than avcodec. Nevertheless, on my Windows system inside VirtualBox, libjpeg-turbo is around 70% slower than avcodec.

I use the same code on Linux and Windows. Any ideas about why I see such a difference? Is it a bug in libjpeg-turbo with SIMD acceleration, or in the Windows build? Or am I doing something wrong?

My test results are below.

Regards,
Eric

JPEG=Using libjpeg8 from IJG
JPEGTurbo=Using libjpeg-turbo with the libjpeg API
JPEGTurboTJPEG=Using libjpeg-turbo with the TurboJPEG API
AVCodec=Using libavcodec
D=decompression time

***** Result on Linux: *****

[Main] Starting process: Using 9 frame for 30 test, codec=JPEG, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 96(117) ms => D=96 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 92(116) ms => D=92 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 92(116) ms => D=92 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 96(117) ms => D=96 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 92(115) ms => D=92 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 90(114) ms => D=90 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 92(115) ms => D=92 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 94(116) ms => D=94 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 926 ms

[Main] Starting process: Using 9 frame for 30 test, codec=JPEGTurbo, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 55(62) ms => D=55 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 58(61) ms => D=58 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 56(61) ms => D=56 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 54(60) ms => D=54 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 58(61) ms => D=58 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 57(61) ms => D=57 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 55(61) ms => D=55 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 56(62) ms => D=56 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 489 ms

[Main] Starting process: Using 9 frame for 30 test, codec=JPEGTurboTJPEG, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 60(74) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 60(73) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 60(73) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 60(73) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 60(73) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 60(73) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 60(74) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 60(73) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 586 ms

[Main] Starting process: Using 9 frame for 30 test, codec=AVCodec, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 61(82) ms => D=61 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 60(81) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 60(81) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 60(81) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 61(83) ms => D=61 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 60(80) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 60(81) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 60(81) ms => D=60 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 650 ms

***** Result on Windows: *****

[Main] Starting process: Using 9 frame for 30 test, codec=JPEG, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 160(160) ms => D=160 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 181(181) ms => D=161 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 220(220) ms => D=210 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 250(250) ms => D=250 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 240(240) ms => D=240 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 251(251) ms => D=251 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 240(240) ms => D=240 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 250(250) ms => D=250 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 1792 ms

[Main] Starting process: Using 9 frame for 30 test, codec=JPEGTurbo, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 271(271) ms => D=271 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 240(240) ms => D=240 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 250(250) ms => D=250 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 260(260) ms => D=250 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 180(180) ms => D=180 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 241(241) ms => D=231 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 240(240) ms => D=230 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 240(240) ms => D=230 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 1922 ms

[Main] Starting process: Using 9 frame for 30 test, codec=JPEGTurboTJPEG, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 261(261) ms => D=241 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 260(260) ms => D=250 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 260(260) ms => D=260 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 270(270) ms => D=270 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 261(261) ms => D=261 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 260(260) ms => D=260 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 260(260) ms => D=250 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 270(270) ms => D=260 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 2102 ms

[Main] Starting process: Using 9 frame for 30 test, codec=AVCodec, gui=(none)
[Main] Test 1 (1280x800:420) => (1280x800:I420) => (160x100:RGB32)) taken 90(90) ms => D=90 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 2 (1280x800:420) => (1280x800:I420) => (320x200:RGB32)) taken 80(80) ms => D=80 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 3 (1280x800:420) => (1280x800:I420) => (640x400:RGB32)) taken 90(90) ms => D=90 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 4 (1280x800:420) => (1280x800:I420) => (1280x800:RGB32)) taken 90(90) ms => D=90 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 5 (1280x800:420) => (1280x800:I420) => (320x240:RGB32)) taken 90(90) ms => D=90 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 6 (1280x800:420) => (1280x800:I420) => (420x360:RGB32)) taken 90(90) ms => D=90 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 7 (1280x800:420) => (1280x800:I420) => (640x480:RGB32)) taken 80(80) ms => D=80 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Test 8 (1280x800:420) => (1280x800:I420) => (1024x780:RGB32)) taken 80(80) ms => D=80 ms, L=0 ms, S=0 ms, X=0 ms
[Main] Total redering time 690 ms

On Thu, Jun 7, 2012 at 12:18 AM, DRC <dco...@us...> wrote:

> No problem. Thanks very much for the donation!
>
>
> On 6/6/12 4:41 PM, Eric Beuque wrote:
> > Hi,
> >
> > Sorry, I don't want you to think I'm complaining (maybe my English is
> > not very good); this is just a suggestion. I'm also an open-source
> > developer, and I know that users always want lots of things quickly for
> > free. I am just saying that I can help if you need it, depending on my
> > knowledge of course...
> >
> > Thank you very much for your time.
> >
> > I will make a small gift to this project, which I appreciate. :)
> >
> >
> > On Wed, Jun 6, 2012 at 10:46 PM, DRC <dco...@us...> wrote:
> >
> >     Decompressing to YUV: "better"? It depends.
> >     You have to bear in mind that libjpeg-turbo is only capable of planar
> >     output, so if your JPEG image used 4:2:0 subsampling, then you can
> >     directly output a format that X Video will accept (YV12), but if the
> >     JPEG file uses 4:2:2 or 4:4:4 subsampling, then you'll have to do
> >     some conversion between the format that libjpeg-turbo outputs and the
> >     format that X Video expects (generally, X Video expects packed pixels
> >     when using 4:2:2, and I've never seen an implementation that could
> >     accept 4:4:4 directly, so you'll have to do your own subsampling
> >     there.) In the aggregate, this may be faster than decompressing to
> >     RGB and drawing RGB, but it depends on how fast your planar-to-packed
> >     conversion is.
> >
> >     Documentation: libjpeg-turbo is a drop-in replacement for libjpeg.
> >     Most people use it exactly the same way they use libjpeg, so the
> >     documentation for the libjpeg API and tools (cjpeg, djpeg, etc.) is
> >     sourced almost verbatim from the upstream project. If you find it
> >     lacking, complain to them.
> >
> >     For those who prefer the TurboJPEG API, that is thoroughly documented
> >     using Doxygen and Javadoc. The README-turbo.txt file describes the
> >     trade-offs between those two APIs. All of the above are linked on our
> >     web page:
> >
> >     http://www.libjpeg-turbo.org/Documentation/Documentation
> >
> >     Please bear in mind that I am an independent developer, so I only get
> >     paid for working on this stuff whenever someone sponsors the work
> >     financially. If a company or organization wants to step forward and
> >     pay for however many hundreds of hours it takes to write a book about
> >     libjpeg-turbo, I am more than happy to oblige, but you're the first
> >     person who's complained about it.
> >
> >
> >     On 6/6/12 7:21 AM, Eric Beuque wrote:
> > > Hi! Thank you very much for your fast and really complete answer!
> > >
> > > You're right, I will try to use libjpeg-turbo with the libjpeg API to
> > > compare the same things.
> > >
> > > I've done some research and I have some questions. VLC seems to
> > > decompress images into YUV format with libav or SDL_image, and to
> > > render/scale them using video acceleration (from XVideo, DirectX or
> > > OpenGL...). So I think I have to do the same, using libjpeg-turbo for
> > > the decompression.
> > >
> > > Can you just confirm that I understand correctly: decompressing to YUV
> > > and rendering YUV (if possible) should theoretically be better than
> > > decompressing to RGB(X-A) and rendering RGB(X-A)?
> > >
> > > I think you should add more documentation to your project home page,
> > > because it's really difficult to understand how the library works when
> > > we are beginners in image processing, like me, and don't know how to
> > > use it the smart way. All the information you gave me should be useful
> > > for everyone who wants to use libjpeg-turbo. If you want, I can try to
> > > recap what I know and what you told me, and you would only have to
> > > review it.
> > >
> > > Thank you very much for your time and this very nice project.
> > >
> > >
> > > On Mon, Jun 4, 2012 at 10:43 PM, DRC <dco...@us...> wrote:
> > >
> > >     Wow, those are a lot of questions. I will attempt to answer them
> > >     all here rather than in line, because trying to inline such a long
> > >     e-mail is just going to mushroom into a series of 10-page replies,
> > >     each with 25 different threads. If you need more in-depth analysis
> > >     and profiling than I am able to provide off the top of my head,
> > >     then I am certainly more than capable of providing those services
> > >     (I have more than 16 years of experience in performance analysis
> > >     and optimization and a lot of specific experience with streaming
> > >     video), but I can't do that work for free.
> > >
> > >     I really can't say why VLC would be faster, because I don't know
> > >     what they're doing under the hood. I can say that libjpeg-turbo
> > >     should be 2-4x as fast as libjpeg. It doesn't appear that you're
> > >     achieving that, which makes me suspicious that something is amiss.
> > >     The first thing I notice is that you're using libjpeg-turbo through
> > >     the TurboJPEG API. As a sanity check, have you tried using it
> > >     through the libjpeg API instead, to provide a pure apples-to-apples
> > >     comparison with your code that is using libjpeg? If there is a
> > >     difference between the code that uses the TurboJPEG API and the
> > >     code that uses the libjpeg API, then that will tell you that the
> > >     issue is, at least partly, in your use of the library and not the
> > >     library itself. There should not be a significant difference
> > >     between those two APIs in terms of speed, as long as you are using
> > >     them similarly (for instance, configuring fast upsampling in each),
> > >     but the libjpeg API is more powerful, allowing custom
> > >     source/destination managers to support buffering, etc. TurboJPEG
> > >     just puts the entire input/output image into memory.
> > >
> > >     If speed is paramount, you probably want to set dct_method to
> > >     JDCT_FASTEST. I think the default is to use the slow integer DCT
> > >     when decompressing. Using the fast integer DCT may buy you an
> > >     additional 10%, but this doesn't explain the lack of speedup
> > >     between libjpeg and libjpeg-turbo (since libjpeg uses the same
> > >     default, unless you have explicitly overridden it.)
> > >
> > >     Regarding the colorspace extensions, bear in mind that, in all
> > >     cases, libjpeg-turbo still has to do YUV-to-RGB conversion, so the
> > >     R, G, and B components are always computed and stored individually.
> > >     The only difference is where in the output word they're stored, and
> > >     actually, the 4-byte pixel formats can often be faster, because the
> > >     load/stores are aligned.
> > >
> > >     Rendering time is really irrelevant to this discussion, as the
> > >     rendering time is the responsibility of your app, not
> > >     libjpeg-turbo.
> > >
> > >     You're better off letting libjpeg-turbo do YUV-to-RGB conversion,
> > >     because it is SIMD-accelerated. The only reason why you would want
> > >     to decompress to YUV is if you are using X Video or some other type
> > >     of mechanism that can directly offload the YUV pixels to the
> > >     hardware. Even then, unless your source image was encoded using
> > >     4:2:0 subsampling, you'll probably have to convert from one YUV
> > >     format to another once you decompress the image, since most
> > >     hardware that deals with 4:2:2 images expects the pixels to be
> > >     packed (and libjpeg-turbo always outputs planar, since that's what
> > >     the pixels are stored as internally.)
> > >
> > >     libjpeg and libjpeg-turbo scale images in the process of doing the
> > >     inverse DCT, so only certain scaling factors are supported (N/8, 1
> > >     <= N <= 16) [NOTE: the currently released version of libjpeg-turbo
> > >     doesn't yet support any factors other than 1/2, 1/4, and 1/8.
> > >     Support for other factors is in the SVN trunk.] The usefulness of
> > >     scaling within the JPEG library is a matter for debate. Some apps
> > >     (image viewers, etc.) use it as a fast way to get an image that
> > >     more or less fits within the boundaries of the screen, but in the
> > >     case of libjpeg-turbo, only scaling by 1/2 and 1/4 are
> > >     SIMD-accelerated, so there is no real performance advantage to
> > >     using in-library scaling as opposed to decompressing the full image
> > >     and scaling using another library. The main advantage to scaling
> > >     within libjpeg/libjpeg-turbo is that it saves memory, but you can
> > >     eliminate that advantage as well by using buffered I/O (that is,
> > >     decompressing the file in chunks and scaling each chunk before
> > >     moving on to the next.)
> > >
> > >     VirtualBox? No clue. That question is better posed to a VirtualBox
> > >     mailing list.
> > >
> > >     I will say that, in general, there is not any low-hanging fruit for
> > >     improving the performance of libjpeg-turbo. There are some
> > >     difficult things that could be done, such as figuring out a way to
> > >     SIMD accelerate the Huffman coding routines, but in general, we've
> > >     extracted all of the "easy" performance out of it that we're
> > >     capable of extracting. Many people and projects are using it quite
> > >     successfully and have confirmed the same 2-4x speedup that I claim,
> > >     but of course Amdahl's Law applies.
> > >
> > >
> > >     On 6/4/12 8:49 AM, Eric Beuque wrote:
> > >> Hi,
> > >>
> > >> I'm using libjpeg-turbo to display an MJPEG stream from an IP camera
> > >> in Qt.
> > >>
> > >> I have tested IJG jpeg, libjpeg-turbo and Intel IPP to compare the
> > >> decompression time.
> > >>
> > >> However, I have also compared with VLC and, I don't know how, VLC
> > >> seems much better at decoding the stream. In fact, playing the stream
> > >> in VLC costs only around 8-10% of CPU, but with my application using
> > >> Qt + libjpeg-turbo it is close to 25%. I did some tests and found that
> > >> 9% of CPU is the time libjpeg-turbo needs to perform the JPEG
> > >> decompression.
> > >> This would be acceptable if I only had to use one IP camera at a
> > >> time, but actually I would like to use many cameras at the same time.
> > >> I know I have to perform some optimization in Qt for the rendering
> > >> part, but I was wondering if it is possible to improve the
> > >> decompression time, hoping it costs me at most 5% of my CPU time.
> > >> Maybe I'm doing something wrong with the decoder.
> > >>
> > >> Moreover, I have also tried on Windows inside a VM with VirtualBox.
> > >> My application takes 90-100% of the CPU time, whereas VLC takes only
> > >> 35-40%.
> > >>
> > >> So I wrote some test programs. They render 30 JPEG images, whose
> > >> original size is 1280x800, to a specified size. The image is
> > >> pre-scaled at decompression by selecting the best scale ratio.
> > >> D=Decompression time (libjpeg-turbo part)
> > >> L=Pixmap loading time (pixmap to GUI image format)
> > >> S=Scaling time
> > >> X=Rendering time
> > >>
> > >> 1) Using an Intel Core i5 on Linux 3.2 64 bits (LMDE):
> > >> 1.1) Test with libjpeg-turbo (decompression is TJPF_BGRA, with TJFLAG_FASTUPSAMPLE flags)
> > >> [Qt +TurboJPEG ] Test 1: (30 images 1280x800 => 160x100 (160x100)) is 72(85) ms => D=30 ms, L=0 ms, S=0 ms, X=38 ms
> > >> [Qt +TurboJPEG ] Test 2: (30 images 1280x800 => 320x200 (320x200)) is 138(151) ms => D=43 ms, L=0 ms, S=0 ms, X=86 ms
> > >> [Qt +TurboJPEG ] Test 3: (30 images 1280x800 => 640x400 (640x400)) is 197(209) ms => D=60 ms, L=0 ms, S=0 ms, X=113 ms
> > >> [Qt +TurboJPEG ] Test 4: (30 images 1280x800 => 1280x800 (1280x800)) is 358(370) ms => D=91 ms, L=0 ms, S=0 ms, X=246 ms
> > >> [Qt +TurboJPEG ] Test 5: (30 images 1280x800 => 320x240 (320x200)) is 90(100) ms => D=54 ms, L=0 ms, S=0 ms, X=30 ms
> > >> [Qt +TurboJPEG ] Test 6: (30 images 1280x800 => 420x360 (640x400)) is 120(129) ms => D=60 ms, L=0 ms, S=0 ms, X=30 ms
> > >> [Qt +TurboJPEG ] Test 7: (30 images 1280x800 => 640x480 (640x400)) is 135(146) ms => D=60 ms, L=0 ms, S=0 ms, X=36 ms
> > >> [Qt +TurboJPEG ] Test 8: (30 images 1280x800 => 1024x780 (1280x800)) is 252(266) ms => D=90 ms, L=0 ms, S=30 ms, X=110 ms
> > >> [Qt +TurboJPEG ] Total redering time 1456 ms
> > >> 1.2) Test with libjpeg-turbo (decompression is TJPF_RGB, with TJFLAG_FASTUPSAMPLE flags)
> > >> [Qt +TurboJPEG ] Test 1: (30 images 1280x800 => 160x100 (160x100)) is 48(62) ms => D=30 ms, L=0 ms, S=0 ms, X=13 ms
> > >> [Qt +TurboJPEG ] Test 2: (30 images 1280x800 => 320x200 (320x200)) is 154(166) ms => D=43 ms, L=0 ms, S=0 ms, X=97 ms
> > >> [Qt +TurboJPEG ] Test 3: (30 images 1280x800 => 640x400 (640x400)) is 223(239) ms => D=62 ms, L=0 ms, S=0 ms, X=148 ms
> > >> [Qt +TurboJPEG ] Test 4: (30 images 1280x800 => 1280x800 (1280x800)) is 400(414) ms => D=104 ms, L=0 ms, S=0 ms, X=281 ms
> > >> [Qt +TurboJPEG ] Test 5: (30 images 1280x800 => 320x240 (320x200)) is 97(118) ms => D=55 ms, L=0 ms, S=0 ms, X=33 ms
> > >> [Qt +TurboJPEG ] Test 6: (30 images 1280x800 => 420x360 (640x400)) is 138(150) ms => D=60 ms, L=0 ms, S=22 ms, X=22 ms
> > >> [Qt +TurboJPEG ] Test 7: (30 images 1280x800 => 640x480 (640x400)) is 197(214) ms => D=60 ms, L=0 ms, S=51 ms, X=67 ms
> > >> [Qt +TurboJPEG ] Test 8: (30 images 1280x800 => 1024x780 (1280x800)) is 370(383) ms => D=92 ms, L=0 ms, S=126 ms, X=118 ms
> > >> 1.3) Test with IJG libjpeg (decompression is JCS_RGB)
> > >> [Qt +JPEG ] Test 1: (30 images 1280x800 => 160x100 (160x100)) is 73(90) ms => D=30 ms, L=0 ms, S=0 ms, X=38 ms
> > >> [Qt +JPEG ] Test 2: (30 images 1280x800 => 320x200 (320x200)) is 163(176) ms => D=45 ms, L=0 ms, S=0 ms, X=102 ms
> > >> [Qt +JPEG ] Test 3: (30 images 1280x800 => 640x400 (640x400)) is 227(242) ms => D=96 ms, L=0 ms, S=0 ms, X=117 ms
> > >> [Qt +JPEG ] Test 4: (30 images 1280x800 => 1280x800 (1280x800)) is 571(587) ms => D=329 ms, L=0 ms, S=0 ms, X=228 ms
> > >> [Qt +JPEG ] Test 5: (30 images 1280x800 => 320x240 (320x200)) is 94(113) ms => D=55 ms, L=0 ms, S=0 ms, X=26 ms
> > >> [Qt +JPEG ] Test 6: (30 images 1280x800 => 420x360 (320x200)) is 119(136) ms => D=55 ms, L=0 ms, S=21 ms, X=21 ms
> > >> [Qt +JPEG ] Test 7: (30 images 1280x800 => 640x480 (640x400)) is 207(218) ms => D=94 ms, L=0 ms, S=38 ms, X=48 ms
> > >> [Qt +JPEG ] Test 8: (30 images 1280x800 => 1024x780 (1280x800)) is 571(587) ms => D=310 ms, L=0 ms, S=122 ms, X=120 ms
> > >> [Qt +JPEG ] Total redering time 2149 ms
> > >>
> > >> 2) Using Windows 7 32 bits inside VirtualBox on the same machine:
> > >> 2.1) Test with libjpeg-turbo (decompression is TJPF_BGRA, with TJFLAG_FASTUPSAMPLE flags)
> > >> [Qt +TurboJPEG ] Test 1: (30 images 1280x800 => 160x100 (160x100)) is 70(70) ms => D=50 ms, L=20 ms, S=0 ms, X=0 ms
> > >> [Qt +TurboJPEG ] Test 2: (30 images 1280x800 => 320x200 (320x200)) is 101(101) ms => D=91 ms, L=0 ms, S=0 ms, X=10 ms
> > >> [Qt +TurboJPEG ] Test 3: (30 images 1280x800 => 640x400 (640x400)) is 170(170) ms => D=140 ms, L=0 ms, S=0 ms, X=30 ms
> > >> [Qt +TurboJPEG ] Test 4: (30 images 1280x800 => 1280x800 (1280x800)) is 660(660) ms => D=570 ms, L=0 ms, S=0 ms, X=90 ms
> > >> [Qt +TurboJPEG ] Test 5: (30 images 1280x800 => 320x240 (320x200)) is 131(131) ms => D=101 ms, L=0 ms, S=30 ms, X=0 ms
> > >> [Qt +TurboJPEG ] Test 6: (30 images 1280x800 => 420x360 (640x400)) is 290(290) ms => D=200 ms, L=0 ms, S=80 ms, X=10 ms
> > >> [Qt +TurboJPEG ] Test 7: (30 images 1280x800 => 640x480 (640x400)) is 390(390) ms => D=210 ms, L=0 ms, S=160 ms, X=20 ms
> > >> [Qt +TurboJPEG ] Test 8: (30 images 1280x800 => 1024x780 (1280x800)) is 1031(1031) ms => D=601 ms, L=0 ms, S=400 ms, X=30 ms
> > >> [Qt +TurboJPEG ] Total redering time 2843 ms
> > >> 2.2) Test with libjpeg-turbo (decompression is TJPF_RGB, with TJFLAG_FASTUPSAMPLE flags)
> > >> [Qt +TurboJPEG ] Test 1: (30 images 1280x800 => 160x100 (160x100)) is 70(70) ms => D=40 ms, L=0 ms, S=0 ms, X=30 ms
> > >> [Qt +TurboJPEG ] Test 2: (30 images 1280x800 => 320x200 (320x200)) is 120(120) ms => D=110 ms, L=0 ms, S=0 ms, X=10 ms
> > >> [Qt +TurboJPEG ] Test 3: (30 images 1280x800 => 640x400 (640x400)) is 250(250) ms => D=150 ms, L=0 ms, S=0 ms, X=100 ms
> > >> [Qt +TurboJPEG ] Test 4: (30 images 1280x800 => 1280x800 (1280x800)) is 1071(1071) ms => D=460 ms, L=0 ms, S=0 ms, X=611 ms
> > >> [Qt +TurboJPEG ] Test 5: (30 images 1280x800 => 320x240 (320x200)) is 210(210) ms => D=60 ms, L=0 ms, S=60 ms, X=90 ms
> > >> [Qt +TurboJPEG ] Test 6: (30 images 1280x800 => 420x360 (640x400)) is 411(411) ms => D=180 ms, L=0 ms, S=121 ms, X=110 ms
> > >> [Qt +TurboJPEG ] Test 7: (30 images 1280x800 => 640x480 (640x400)) is 650(650) ms => D=220 ms, L=0 ms, S=240 ms, X=190 ms
> > >> [Qt +TurboJPEG ] Test 8: (30 images 1280x800 => 1024x780 (1280x800)) is 1572(1573) ms => D=451 ms, L=0 ms, S=641 ms, X=480 ms
> > >> [Qt +TurboJPEG ] Total redering time 4355 ms
> > >> 2.3) Test with IJG libjpeg (decompression is JCS_RGB)
> > >> [Qt +JPEG ] Test 1: (30 images 1280x800 => 160x100 (160x100)) is 80(80) ms => D=40 ms, L=0 ms, S=0 ms, X=40 ms
> > >> [Qt +JPEG ] Test 2: (30 images 1280x800 => 320x200 (320x200)) is 110(110) ms => D=40 ms, L=0 ms, S=0 ms, X=70 ms
> > >> [Qt +JPEG ] Test 3: (30 images 1280x800 => 640x400 (640x400)) is 261(261) ms => D=111 ms, L=0 ms, S=0 ms, X=150 ms
> > >> [Qt +JPEG ] Test 4: (30 images 1280x800 => 1280x800 (1280x800)) is 1091(1091) ms => D=430 ms, L=0 ms, S=0 ms, X=661 ms
> > >> [Qt +JPEG ] Test 5: (30 images 1280x800 => 320x240 (320x200)) is 180(180) ms => D=100 ms, L=0 ms, S=50 ms, X=30 ms
> > >> [Qt +JPEG ] Test 6: (30 images 1280x800 => 420x360 (320x200)) is 260(260) ms => D=50 ms, L=0 ms, S=120 ms, X=90 ms
> > >> [Qt +JPEG ] Test 7: (30 images 1280x800 => 640x480 (640x400)) is 631(631) ms => D=110 ms, L=0 ms, S=271 ms, X=250 ms
> > >> [Qt +JPEG ] Test 8: (30 images 1280x800 => 1024x780 (1280x800)) is 1591(1592) ms => D=530 ms, L=0 ms, S=610 ms, X=451 ms
> > >> [Qt +JPEG ] Total redering time 4205 ms
> > >>
> > >> To increase performance, Qt's QPainter prefers
> > >> QImage::Format_ARGB32_Premultiplied, QImage::Format_RGB32 or
> > >> QImage::Format_RGB16. So the only direct conversion I found is
> > >> between QImage::Format_RGB32 and TJPF_BGRA, but it increases the
> > >> decompression time.
> > >> (http://qt-project.org/doc/qt-4.8/QPainter.html#performance)
> > >>
> > >> So at this point I have some questions about libjpeg-turbo:
> > >>
> > >> 1) Which is the fastest output format (not considering GRAY, of
> > >> course) in the following list?
> > >> TJPF_RGB, TJPF_BGR, TJPF_RGBX, TJPF_BGRX, TJPF_XBGR, TJPF_XRGB,
> > >> TJPF_GRAY, TJPF_RGBA, TJPF_BGRA, TJPF_ABGR, TJPF_ARGB
> > >> I guess 3-byte output formats are faster than 4-byte output formats,
> > >> right?
> > >>
> > >> 2) I'm using the TurboJPEG interface; should I prefer the IJG JPEG
> > >> interface? I saw that some parameters are not present in the
> > >> TurboJPEG interface. Is this interface optimized for quality or
> > >> speed? Actually, my focus is on speed rather than quality.
> > >>
> > >> 3) Do you think I should prefer YUV decompression? I don't really
> > >> know what the consequences of using RGB or YUV decompression are. But
> > >> I think Qt is not able to render YUV directly, and I think the best
> > >> approach is to minimize the conversion steps.
> > >>
> > >> 4) Is there a way to directly get the image scaled to the expected
> > >> dimensions? Currently I perform a scaling in Qt on the pre-scaled
> > >> image. Does libjpeg-turbo have a fast algorithm (like nearest
> > >> neighbour) to rescale images?
> > >>
> > >> 5) Why is it so slow on VirtualBox? Is there a way to improve this?
> > >>
> > >> Thank you in advance if you can enlighten me.
> > >>
> > >> Find my source code below:
> > >>
> > >> static bool
> > >> Module_Decompress (const ImageData* pImage, Size& imageSize,
> > >>                    const Size& desiredSize, Size& outputSize,
> > >>                    PIXMAP_TYPE outputFormat, PixmapData** ppPixmap)
> > >> {
> > >>     int res;
> > >>
> > >>     if(!Module_IsSupportedImageFormat(pImage->type)){
> > >>         log(LOG_ERROR, "Invalid image type");
> > >>         return false;
> > >>     }
> > >>
> > >>     if(!Module_IsSupportedPixmapFormat(outputFormat)){
> > >>         log(LOG_ERROR, "Invalid pixmap type");
> > >>         return false;
> > >>     }
> > >>
> > >>     // Init decompressor (created once, then reused for every frame)
> > >>     if(!g_instance.tjhnd) {
> > >>         g_instance.tjhnd = tjInitDecompress();
> > >>         if(!g_instance.tjhnd) {
> > >>             log(LOG_ERROR, "TurboJPEG init decompress error: %s", tjGetErrorStr());
> > >>             return false;
> > >>         }
> > >>     }
> > >>
> > >>     // Read headers
> > >>     int iImageJpegSubsamp;
> > >>     if(imageSize.width == 0 || imageSize.height == 0){
> > >>         // If we don't know the size, we read it from the JPEG header
> > >>         res = tjDecompressHeader2(g_instance.tjhnd, pImage->data, pImage->size,
> > >>                                   &imageSize.width, &imageSize.height, &iImageJpegSubsamp);
> > >>         //std::cout << "Size:" << iDesiredWidth << "x" << iDesiredHeight << ":" << iJpegSubsamp << "=" << tjPixelSize[iJpegSubsamp] << std::endl;
> > >>         if(res != 0){
> > >>             log(LOG_ERROR, "TurboJPEG decompress header error: %s", tjGetErrorStr());
> > >>             return false;
> > >>         }
> > >>     }
> > >>
> > >>     // Select the best scaling factor depending on the desired size
> > >>     tjscalingfactor* pBestFactor = NULL;
> > >>     if(pBestFactor == NULL){
> > >>         double fDesiredFactor = ((double)desiredSize.width / (double)imageSize.width);
> > >>         fDesiredFactor = MAX(fDesiredFactor, ((double)desiredSize.height / (double)imageSize.height));
> > >>
> > >>         int iNumScalingFactors;
> > >>         tjscalingfactor* pListFactors = tjGetScalingFactors(&iNumScalingFactors);
> > >>         double fBestFactor = 1.0;
> > >>         double fFactor;
> > >>         //std::cout << "iNumScalingFactors: " << iNumScalingFactors << std::endl;
> > >>         // pListFactors is NULL if tjGetScalingFactors() failed
> > >>         for(int i=0; pListFactors && i<iNumScalingFactors; i++){
> > >>             fFactor = ((double)pListFactors[i].num / (double)pListFactors[i].denom);
> > >>             //std::cout << "iDesiredFactor: " << fDesiredFactor << " vs " << fFactor;
> > >>             if(fFactor == fDesiredFactor){
> > >>                 // We found an exact match
> > >>                 pBestFactor = &pListFactors[i];
> > >>                 fBestFactor = fFactor;
> > >>                 break;
> > >>             }
> > >>
> > >>             if(!pBestFactor){
> > >>                 pBestFactor = &pListFactors[i];
> > >>                 fBestFactor = fFactor;
> > >>             }else{
> > >>                 //std::cout << "res: " << fabs((double)(fFactor - fDesiredFactor)) << " vs " << fabs((double)(fBestFactor - fDesiredFactor));
> > >>                 // Keep the factor closest to the desired one
> > >>                 if(fabs(fFactor - fDesiredFactor) < fabs(fBestFactor - fDesiredFactor)){
> > >>                     pBestFactor = &pListFactors[i];
> > >>                     fBestFactor = fFactor;
> > >>                 }
> > >>             }
> > >>         }
> > >>     }
> > >>
> > >>     // Without a scaling factor, TJSCALED() below would dereference NULL
> > >>     if(!pBestFactor){
> > >>         log(LOG_ERROR, "TurboJPEG scaling factors error: %s", tjGetErrorStr());
> > >>         return false;
> > >>     }
> > >>
> > >>     // Perform JPEG decompression
> > >>     unsigned long bufSize = 0;
> > >>     unsigned char* buffer = NULL;
> > >>     if(true){
> > >>         PixmapData* pPixmap = new PixmapData;
> > >>         int tjFormat;
> > >>         switch(outputFormat){
> > >>         case PIXMAP_BGR:
> > >>             tjFormat = TJPF_BGR;
> > >>             pPixmap->type = PIXMAP_BGR;
> > >>             break;
> > >>         case PIXMAP_RGB32:
> > >>             tjFormat = TJPF_BGRA;
> > >>             pPixmap->type = PIXMAP_RGB32;
> > >>             break;
> > >>         default:
> > >>             tjFormat = TJPF_RGB;
> > >>             pPixmap->type = PIXMAP_RGB;
> > >>             break;
> > >>         }
> > >>
> > >>         outputSize.width = TJSCALED(imageSize.width, (*pBestFactor));
> > >>         outputSize.height = TJSCALED(imageSize.height, (*pBestFactor));
> > >>         int iScaledPitch = tjPixelSize[tjFormat]*outputSize.width;
> > >>         //std::cout << "iScale: " << pBestFactor->num << "/" << pBestFactor->denom << " => " << iScaledWidth << "x" << iScaledHeight;
> > >>
> > >>         bufSize = iScaledPitch * outputSize.height;
> > >>         //unsigned long bufSize = iScaledPitch * iScaledHeight;
> > >>         buffer = tjAlloc(bufSize);
> > >>         //std::cout << "bufSize:" << bufSize;
> > >>         if(!buffer){
> > >>             log(LOG_ERROR, "TurboJPEG buffer allocation failed");
> > >>             delete pPixmap;
> > >>             return false;
> > >>         }
> > >>         res = tjDecompress2(g_instance.tjhnd, pImage->data, pImage->size, buffer,
> > >>                             outputSize.width, iScaledPitch, outputSize.height,
> > >>                             tjFormat, TJFLAG_FASTUPSAMPLE);
> > >>         if(res != 0){
> > >>             log(LOG_ERROR, "TurboJPEG decompress error: %s", tjGetErrorStr());
> > >>             tjFree(buffer);   // do not leak the output buffer on failure
> > >>             delete pPixmap;
> > >>             pPixmap = NULL;
> > >>             return false;
> > >>         }
> > >>
> > >>         pPixmap->size = bufSize;
> > >>         pPixmap->data = buffer;
> > >>         if(ppPixmap){
> > >>             *ppPixmap = pPixmap;
> > >>         }else{
> > >>             // Nobody takes ownership of the result, so release it
> > >>             tjFree(buffer);
> > >>             delete pPixmap;
> > >>         }
> > >>
> > >>     }
> > >>
> > >>     return true;
> > >> }
> > >>
> > >>
> > >>
> > >> ------------------------------------------------------------------------------
> > >> Live Security Virtual Conference
> > >> Exclusive live event will cover all the ways today's security and
> > >> threat landscape has changed and how IT managers can respond. Discussions
> > >> will include endpoint security, mobile security and the latest in malware
> > >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> > >>
> > >>
> > >> _______________________________________________
> > >> Libjpeg-turbo-users mailing list
> > >> Lib...@li...
> > >> https://lists.sourceforge.net/lists/listinfo/libjpeg-turbo-users
|
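For reference, the scaling-factor selection in the quoted Module_Decompress() can be tried on its own, without linking against libjpeg-turbo. What follows is only a minimal sketch under stated assumptions: ScalingFactor is a hypothetical stand-in for tjscalingfactor, scaledDimension() reproduces the round-up arithmetic of the TJSCALED() macro, and closestFactorIndex() is an illustrative helper name, not part of the TurboJPEG API. The real list of factors comes from tjGetScalingFactors() and depends on the library version.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for libjpeg-turbo's tjscalingfactor (a num/denom pair).
struct ScalingFactor {
    int num;
    int denom;
};

// Same round-up arithmetic as the TJSCALED() macro:
// ceil(dimension * num / denom), computed with integers only.
inline int scaledDimension(int dimension, const ScalingFactor& f) {
    assert(f.denom > 0);
    return (dimension * f.num + f.denom - 1) / f.denom;
}

// Illustrative helper (not a TurboJPEG function): returns the index of the
// factor whose ratio is closest to `desired`, or -1 if the list is empty.
// On a tie, the earlier entry wins, as in the quoted loop.
inline int closestFactorIndex(const std::vector<ScalingFactor>& factors,
                              double desired) {
    int best = -1;
    double bestDistance = 0.0;
    for (std::size_t i = 0; i < factors.size(); ++i) {
        const double ratio =
            static_cast<double>(factors[i].num) / factors[i].denom;
        const double distance = std::fabs(ratio - desired);
        if (best < 0 || distance < bestDistance) {
            best = static_cast<int>(i);
            bestDistance = distance;
        }
    }
    return best;
}
```

With an assumed factor list of 2/1, 1/1, 1/2, 1/4 and 1/8, a 1280x800 frame and a 320x200 target give a desired ratio of 0.25, so the 1/4 entry is picked and scaledDimension(1280, {1, 4}) yields 320. Note that "closest" can select a factor smaller than the target, in which case the caller still has to upscale afterwards.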