From: Scott F. J. <sc...@fl...> - 2000-04-28 19:16:21
Which is faster: unrolling the loops and growing past 12K, or leaving the loops in and keeping it under?

Switching to a block-based ycrcb_to_rgb gave me about a 5% speed improvement over full-frame conversion. This and other changes are with Erik for review. (Got rid of place.c, broke PAL decoding, repackaged closer to library form, added dv2ppm.c, ...)

I propose we stay away from kernel hacks for as long as possible. Ideally we should keep maintaining C versions of each routine, to assist in cross-platform development. I'm sure the LinuxPPC folks will want this code, and if we keep it populated with too much ia32, they may revolt!

We may also want to offer speed vs. quality options for our users. One improvement is to skip the third-pass AC decoding and just return from dv_parse_video_segment() without calling dv_parse_ac_coeffs(seg). In playback, the additional error is barely noticeable. (I had to use dv2ppm to grab frames and compare the results.)

For some uses, like DV editing, where speed is more important than quality, I'd even be willing to forgo *ALL* the AC decoding. Just give me 8x8 blocks of DC, which my tests show runs more than 3x faster (those ducks look awfully blocky, though!).

There may be other intermediate "exit points" in the decoder that we'll want to maintain as options. (Y_ONLY is another example: great for video editing when detail is needed but color isn't.)

Erik Walthinsen wrote:
>
> On Thu, 27 Apr 2000, James Bowman wrote:
>
> > I took a look at module sizes and found that we're at about 16k of text:
> > we're blowing the code cache, and when code moves around (like when you
> > remove a module) different functions are caching against each other and
> > changing performance in surprising ways.
> Whee! ;-)
>
> > The code should get smaller as we optimize it, though, so this effect
> > will go away. We should be safe if the decode loop fits in 12k.
> Yeah, we can do that pretty easily.
> Eventually I expect that a sufficient
> percentage of this will be written in ASM to keep it well below that.
> Then of course we have to worry about blowing the data cache. That means
> all sorts of tricks, most of which aren't set up yet (such as using
> non-cacheable pages, which means a kernel hack).
>
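To make the speed vs. quality exit points discussed above concrete, here is a minimal C sketch of the two cheapest ones: a per-pass switch for the AC decoding, and DC-only reconstruction of an 8x8 block. The names decode_quality_t, run_ac_pass and idct_dc_only are hypothetical, not the decoder's actual API. The sketch assumes an orthonormal 8x8 DCT, for which a block holding only its DC term inverts to the constant DC/8 at every sample, plus a 128 level shift; the real DV dequantization and offsets may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quality levels, for illustration only. */
typedef enum {
    DECODE_DC_ONLY,   /* skip all AC coefficients: flat 8x8 blocks   */
    DECODE_AC_2PASS,  /* skip the third AC pass over the segment     */
    DECODE_FULL       /* decode every AC coefficient                 */
} decode_quality_t;

/* Return nonzero if AC pass `pass` (1, 2 or 3) should run at quality q. */
static int run_ac_pass(decode_quality_t q, int pass)
{
    switch (q) {
    case DECODE_DC_ONLY:  return 0;
    case DECODE_AC_2PASS: return pass <= 2;
    case DECODE_FULL:     return 1;
    }
    return 1;
}

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clamp_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Reconstruct one 8x8 block from its DC coefficient alone.
 * Assumption: orthonormal DCT, so every output sample is DC / 8
 * (rounded to nearest here), then shifted up by 128 and clamped.
 * No IDCT is needed because the block is constant. */
static void idct_dc_only(int16_t dc, uint8_t out[64])
{
    assert(out != NULL);
    int rounded = (dc >= 0) ? (dc + 4) / 8 : (dc - 4) / 8;
    uint8_t value = clamp_u8(rounded + 128);
    for (int i = 0; i < 64; i++)
        out[i] = value;
}
```

In a real decoder the pass check would sit where dv_parse_video_segment() decides whether to call dv_parse_ac_coeffs(seg). Because a DC-only block is constant, that path can skip the IDCT entirely, not just the AC parsing.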
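A block-based color conversion like the ycrcb_to_rgb change described above converts each 8x8 block right after it is decoded, while the samples are likely still in the data cache, instead of making a second sweep over the whole frame. The sketch below is hypothetical, not the actual ycrcb_to_rgb: it assumes chroma has already been upsampled to one Cb/Cr sample per luma sample and uses full-range, JFIF-style BT.601 coefficients in 16.16 fixed point. DV's 4:1:1 (NTSC) and 4:2:0 (PAL) chroma sampling and studio-range levels would need extra handling. The Y_ONLY path simply copies luma into all three channels.

```c
#include <stdint.h>

/* 16.16 fixed-point BT.601 coefficients (full-range, JFIF-style). */
#define FIX_1_402    91881   /* 1.402    * 65536 */
#define FIX_0_344    22554   /* 0.344136 * 65536 */
#define FIX_0_714    46802   /* 0.714136 * 65536 */
#define FIX_1_772   116130   /* 1.772    * 65536 */

/* Adding 256 << 16 keeps every sum non-negative before the shift (no
 * term reaches 256 in magnitude), so >> 16 never shifts a negative
 * value; the extra 32768 rounds to nearest. */
#define FIX_BIAS ((256 << 16) + 32768)

static int descale(int sum)
{
    return ((sum + FIX_BIAS) >> 16) - 256;
}

static uint8_t clamp_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Convert one 8x8 block into a packed RGB frame buffer. `rgb` points at
 * the block's top-left pixel; `stride` is the frame's bytes per row.
 * Hypothetical helper: chroma is assumed to be full resolution already. */
static void ycbcr_block_to_rgb(const uint8_t y[64], const uint8_t cb[64],
                               const uint8_t cr[64], uint8_t *rgb, int stride)
{
    for (int row = 0; row < 8; row++) {
        uint8_t *out = rgb + row * stride;
        for (int col = 0; col < 8; col++) {
            int i  = row * 8 + col;
            int yy = y[i];
            int db = cb[i] - 128;
            int dr = cr[i] - 128;
            out[3 * col + 0] = clamp_u8(yy + descale(FIX_1_402 * dr));
            out[3 * col + 1] = clamp_u8(yy + descale(-FIX_0_344 * db - FIX_0_714 * dr));
            out[3 * col + 2] = clamp_u8(yy + descale(FIX_1_772 * db));
        }
    }
}

/* Y_ONLY path: luma copied into R, G and B, with no chroma work at all. */
static void y_block_to_gray(const uint8_t y[64], uint8_t *rgb, int stride)
{
    for (int row = 0; row < 8; row++) {
        uint8_t *out = rgb + row * stride;
        for (int col = 0; col < 8; col++) {
            uint8_t v = y[row * 8 + col];
            out[3 * col + 0] = v;
            out[3 * col + 1] = v;
            out[3 * col + 2] = v;
        }
    }
}
```

Writing straight into the frame buffer at `stride` avoids a separate full-frame pass, which is one plausible source of the small speedup measured above.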