Re: [Dri-devel] 16bpp span functions

ra...@ra... wrote:
> 
> just thought i'd mention - you'll bypass the cmp's and jne/jeq branches
> and get some better performance with:
> 
> #define READ_RGBA( rgba, _x, _y )                       \
> do {                                                    \
> GLushort p = *(GLushort *)(read_buf + (_x)*2 + (_y)*pitch); \
> rgba[0] = ((p >> 8) & 0xf8) | ((p >> 13) & 0x7);        \
> rgba[1] = ((p >> 3) & 0xfc) | ((p >> 9)  & 0x3);        \
> rgba[2] = ((p << 3) & 0xf8) | ((p >> 2)  & 0x7);        \
> rgba[3] = 0xff;                                         \
> } while (0)
> 
> now even smarter is do 2 pixels at once and just do alignment/single
> pixel cleanups either end of the span if it's not a multiple of 2 or not
> aligned to 2 pixel boundaries...
> #define READ_2_RGBA( rgba, _x, _y )                             \
> do {                                                            \
> /* one aligned 32-bit load = two 565 pixels (little-endian) */  \
> GLuint p = *(GLuint *)(read_buf + (_x)*2 + (_y)*pitch);         \
> GLuint r, g, b;                                                 \
> r = ((p >> 8) & 0x00f800f8) | ((p >> 13) & 0x00070007);         \
> g = ((p >> 3) & 0x00fc00fc) | ((p >> 9)  & 0x00030003);         \
> b = ((p << 3) & 0x00f800f8) | ((p >> 2)  & 0x00070007);         \
> rgba[0] = r & 0xff;                                             \
> rgba[1] = g & 0xff;                                             \
> rgba[2] = b & 0xff;                                             \
> rgba[3] = 0xff;                                                 \
> rgba[4] = (r >> 16) & 0xff;                                     \
> rgba[5] = (g >> 16) & 0xff;                                     \
> rgba[6] = (b >> 16) & 0xff;                                     \
> rgba[7] = 0xff;                                                 \
> } while (0)

Software fallbacks are slow for so many other reasons that this just
isn't worth it (well, maybe the first option).  Hell, we could do the
whole thing in assembly and see a 0% speedup...

The whole point of that macro is that it reads individual pixels.  Thus,
you can't just go ahead and read two pixels.
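
To make that constraint concrete, here is a minimal sketch. It is not
Mesa's actual code: the name read_rgba_pixels, its signature, and the
mask convention are assumptions for illustration only. Each pixel has
its own coordinates and its own mask entry, so two consecutive array
entries need not be neighbours in the framebuffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand one RGB565 pixel to 8-bit RGBA, replicating the high bits
 * into the low bits so that a full-scale channel maps to 0xff. */
static void rgb565_to_rgba(uint16_t p, uint8_t rgba[4])
{
    rgba[0] = (uint8_t)(((p >> 8) & 0xf8) | ((p >> 13) & 0x07));
    rgba[1] = (uint8_t)(((p >> 3) & 0xfc) | ((p >> 9)  & 0x03));
    rgba[2] = (uint8_t)(((p << 3) & 0xf8) | ((p >> 2)  & 0x07));
    rgba[3] = 0xff;
}

/* Hypothetical scattered-pixel read: every pixel i has its own
 * (x[i], y[i]) and an optional mask[i], so pairs of pixels cannot be
 * assumed to be adjacent in memory. */
static void read_rgba_pixels(const uint8_t *read_buf, size_t pitch,
                             size_t n, const int x[], const int y[],
                             const uint8_t mask[], uint8_t rgba[][4])
{
    for (size_t i = 0; i < n; i++) {
        if (mask && !mask[i])
            continue;                 /* masked-out pixels stay untouched */
        const uint8_t *src = read_buf + (size_t)y[i] * pitch
                                      + (size_t)x[i] * 2;
        uint16_t p;
        memcpy(&p, src, sizeof p);    /* host byte order, no alignment needs */
        rgb565_to_rgba(p, rgba[i]);
    }
}
```

A two-pixel load only helps when the caller guarantees a contiguous,
unmasked run, which this kind of interface does not.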

The software fallback mechanism in Mesa could be made a lot faster, but
the core contributors certainly have better things to do.  It would
require significantly more work than changing a pixel reading macro.  If
your hardware can do it, blitting scanlines or even rectangles would be
much faster, but it's arguable whether this is worth it, as the main
point of software fallbacks is correctness, not performance.
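
As a rough sketch of the scanline idea (an illustration under stated
assumptions, not an existing Mesa or driver interface): copy the whole
span into host memory in one go, then convert it in a tight loop. Here
memcpy merely stands in for whatever blit the hardware offers, and the
names read_rgba_span, convert_row_565 and MAX_SPAN_PIXELS are made up
for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SPAN_PIXELS 2048   /* assumed upper bound on span width */

/* Convert n RGB565 pixels, already in host memory, to 8-bit RGBA. */
static void convert_row_565(const uint16_t *row, size_t n,
                            uint8_t rgba[][4])
{
    for (size_t i = 0; i < n; i++) {
        uint16_t p = row[i];
        rgba[i][0] = (uint8_t)(((p >> 8) & 0xf8) | ((p >> 13) & 0x07));
        rgba[i][1] = (uint8_t)(((p >> 3) & 0xfc) | ((p >> 9)  & 0x03));
        rgba[i][2] = (uint8_t)(((p << 3) & 0xf8) | ((p >> 2)  & 0x07));
        rgba[i][3] = 0xff;
    }
}

/* Hypothetical span read: one bulk copy of the scanline segment
 * (memcpy as a stand-in for a hardware blit), then one conversion
 * pass. Returns 0 on success, -1 if the span is too wide. */
static int read_rgba_span(const uint8_t *read_buf, size_t pitch,
                          size_t x, size_t y, size_t n,
                          uint8_t rgba[][4])
{
    uint16_t row[MAX_SPAN_PIXELS];

    if (n > MAX_SPAN_PIXELS)
        return -1;
    memcpy(row, read_buf + y * pitch + x * 2, n * sizeof row[0]);
    convert_row_565(row, n, rgba);
    return 0;
}
```

Whether this beats per-pixel reads depends on how expensive framebuffer
access is on the hardware in question; the sketch only shows the shape
of the approach.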

-- Gareth