From: Petri H. <phi...@us...> - 2007-04-23 11:07:28
On Sat, 2007-04-21 at 13:28 +0100, Darren Salt wrote:
> Given either a suitably-commented patch (comments based on that description),
> or a patch which just adds comments on top of the existing patch, I'm happy
> to commit this.

New version attached.

> Even so, it's still a saving, and worthwhile in often-executed code.

A few more ticks could be saved by pre-multiplying alphas:
4 x multiplication / blended pixel --> 2 x multiplication / rle element
(or even palette entry)

  memset_word(&(*blend_yuv_data)[ 1 ][ (y + y_odd) & 1 ][ x + x_odd ],
              my_clut[ clr ].cr * my_trans[ clr ], rle_this_bite);
  ...
- *dst_cr = ((*dst_cr * t4 + cr00 * o00 + cr01 * o01 + cr10 * o10 + cr11 * o11) * (0x1111+1)) >> 18;
+ *dst_cr = ((*dst_cr * t4 + cro00 + cro01 + cro10 + cro11) * (0x1111+1)) >> 18;

This would double cached cr/cb memory usage, but that shouldn't be a problem.
However, if there are no semi-transparent pixels in the overlay, blending
will be slower than now (1 additional multiplication / blended pixel).

There's one thing I'd like to change before further optimizations:

Currently the highlight area is handled in every blending function. It should
be possible to preprocess the highlight area in the overlay manager so that
the actual blending functions could be implemented as if there were no
highlight areas at all. Highlight areas are used only in DVD menus, and
change quite rarely (compared to video fps). And most overlays (subtitles
etc.) have no clipping areas.

This probably won't have a measurable effect on execution speed, but it would
simplify the actual blending functions, making the code easier to understand
and maintain. With simpler blending functions it would make sense to have
separate, differently accelerated versions (like mmx/...).

I was thinking of something like:

1) Combine the separate palettes into one:

 struct vo_overlay_s {
   ...
-  uint32_t color[OVL_PALETTE_SIZE]; /* color lookup table */
-  uint8_t trans[OVL_PALETTE_SIZE]; /* mixer key table */
   ...
-  uint32_t hili_color[OVL_PALETTE_SIZE];
-  uint8_t hili_trans[OVL_PALETTE_SIZE];
+  uint32_t color[OVL_PALETTE_SIZE*2];
+  uint8_t trans[OVL_PALETTE_SIZE*2];

Even just re-arranging the palettes to be contiguous is enough, and it
preserves API compatibility:

+union {
+  struct {
+    uint32_t color[OVL_PALETTE_SIZE]; /* color lookup table */
+    uint32_t hili_color[OVL_PALETTE_SIZE];
+    uint8_t trans[OVL_PALETTE_SIZE]; /* mixer key table */
+    uint8_t hili_trans[OVL_PALETTE_SIZE];
+  };
+  struct {
+    uint32_t color_all[OVL_PALETTE_SIZE*2];
+    uint8_t trans_all[OVL_PALETTE_SIZE*2];
+  };
+};

2) When the overlay or clipping area changes, the overlay manager creates a
   copy of the RLE data and, while copying it,
   - breaks RLE elements at clipping area boundaries
   - for elements inside the clipping area, sets palette index += OVL_PALETTE_SIZE
   - performs any other validations that are currently duplicated in each
     blending function

--> blending functions do not need to know anything about highlight areas.

If there are no objections, I can prepare a patch with those changes.
It should not change the ABI unless the palette change is made directly to
vo_overlay_s and is visible to overlay manager users. But public structures
have been re-arranged for 1.2 anyway ...

- Petri
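
To make step 2 above a bit more concrete, here is a minimal sketch of the copy-and-split pass. It is not the proposed patch: the names sketch_rle_t, sketch_rect_t and sketch_split_rle are hypothetical, OVL_PALETTE_SIZE is given an assumed value of 256 for the example, and the highlight rectangle is assumed to be half-open (x0 <= x < x1, y0 <= y < y1). Runs are also split at row ends, which keeps the loop simple.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value, for this sketch only. */
#define OVL_PALETTE_SIZE 256

/* Hypothetical stand-in for an RLE element: run length plus palette index. */
typedef struct {
  uint16_t len;
  uint16_t color;
} sketch_rle_t;

/* Hypothetical highlight rectangle, half-open: x0 <= x < x1, y0 <= y < y1. */
typedef struct {
  int x0, y0, x1, y1;
} sketch_rect_t;

static int sketch_min(int a, int b) { return a < b ? a : b; }

/*
 * Copy src to dst, breaking runs at the highlight boundaries (and at row
 * ends) and adding OVL_PALETTE_SIZE to the palette index of every piece
 * that lies inside the highlight rectangle.
 * Returns the number of elements written, or 0 if dst is too small or
 * width is not positive.
 */
size_t sketch_split_rle(const sketch_rle_t *src, size_t n_src, int width,
                        sketch_rect_t hili,
                        sketch_rle_t *dst, size_t dst_cap)
{
  size_t n_dst = 0;
  int x = 0, y = 0;

  if (width <= 0)
    return 0;

  for (size_t i = 0; i < n_src; i++) {
    int left = src[i].len;

    while (left > 0) {
      int in_rows = (y >= hili.y0 && y < hili.y1);
      int inside = 0;
      int boundary = width;           /* default: run may extend to row end */

      if (in_rows && x < hili.x0) {
        boundary = sketch_min(hili.x0, width);   /* stop at left edge */
      } else if (in_rows && x < hili.x1) {
        boundary = sketch_min(hili.x1, width);   /* stop at right edge */
        inside = 1;
      }

      int piece = sketch_min(left, boundary - x);

      if (n_dst == dst_cap)
        return 0;
      dst[n_dst].len = (uint16_t)piece;
      dst[n_dst].color =
          (uint16_t)(src[i].color + (inside ? OVL_PALETTE_SIZE : 0));
      n_dst++;

      left -= piece;
      x += piece;
      if (x == width) {               /* wrap to the next row */
        x = 0;
        y++;
      }
    }
  }
  return n_dst;
}
```

With a pass like this, a blending function would only ever index the combined palette with the element's color value and would not need to compare coordinates against the highlight area at all. Merging adjacent pieces of the same color and the other validations mentioned in step 2 are left out of the sketch.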