From: Gareth H. <ga...@va...> - 2000-12-30 03:06:17
ra...@ra... wrote:
>
> just thought i'd mention - you'll bypass the cmp's and jne/jeq branches
> and get some better performance with:
>
> #define READ_RGBA( rgba, _x, _y ) \
> do { \
>     GLushort p = *(GLushort *)(read_buf + _x*2 + _y*pitch); \
>     rgba[0] = ((p >> 8) & 0xf8) | ((p >> 13) & 0x7); \
>     rgba[1] = ((p >> 3) & 0xfc) | ((p >> 9) & 0x3); \
>     rgba[2] = ((p << 3) & 0xf8) | ((p >> 2) & 0x7); \
>     rgba[3] = 0xff; \
> } while (0)
>
> now even smarter is to do 2 pixels at once and just do alignment/single
> pixel cleanups either end of the span if it's not a multiple of 2 or not
> aligned to 2 pixel boundaries...
> #define READ_2_RGBA( rgba, _x, _y ) \
> do { \
>     GLuint p = *(GLushort *)(read_buf + _x*2 + _y*pitch); \
>     GLuint r, g, b; \
>     r = ((p >> 8) & 0xf8) | ((p >> 13) & 0x7); \
>     g = ((p >> 3) & 0xfc) | ((p >> 9) & 0x3); \
>     b = ((p << 3) & 0xf8) | ((p >> 2) & 0x7); \
>     rgba[0] = r & 0xff; \
>     rgba[1] = g & 0xff; \
>     rgba[3] = b & 0xff; \
>     rgba[4] = 0xff; \
>     rgba[5] = (r >> 16) & 0xff; \
>     rgba[6] = (g >> 16) & 0xff; \
>     rgba[7] = (b >> 16) & 0xff; \
>     rgba[8] = 0xff; \
> } while (0)

Software fallbacks are slow for so many other reasons that this just
isn't worth it (well, maybe the first option). Hell, we could do the
whole thing in assembly and see a 0% speedup...

The whole point of that macro is that it reads individual pixels. Thus,
you can't just go ahead and read two pixels.

The software fallback mechanism in Mesa could be made a lot faster, but
the core contributors certainly have better things to do. It would
require significantly more work than changing a pixel-reading macro.

If your hardware can do it, blitting scanlines or even rectangles would
be much faster, but it's arguable whether this is worth it, as the main
point of software fallbacks is correctness, not performance.

-- Gareth
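
For reference, the per-pixel expansion discussed above can be sketched as a plain C function instead of a macro. This is only an illustration, not Mesa's code: the names `rgb565_to_rgba` and `read_rgba_span` are hypothetical, and the sketch assumes a 16-bit RGB565 buffer stored in native byte order, with `pitch` given in bytes as in the quoted macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand one RGB565 pixel to 8-bit RGBA. The top bits of each channel are
 * replicated into the low bits so that a full-scale 5- or 6-bit value maps
 * to 0xff rather than 0xf8 or 0xfc. No compares or branches are needed. */
static void rgb565_to_rgba(uint16_t p, uint8_t rgba[4])
{
    rgba[0] = (uint8_t)(((p >> 8) & 0xf8) | ((p >> 13) & 0x07));
    rgba[1] = (uint8_t)(((p >> 3) & 0xfc) | ((p >> 9) & 0x03));
    rgba[2] = (uint8_t)(((p << 3) & 0xf8) | ((p >> 2) & 0x07));
    rgba[3] = 0xff;
}

/* Read n consecutive pixels of scanline y, starting at column x.
 * Hypothetical helper: one pixel per iteration, so it has no alignment
 * or even-count requirements. */
static void read_rgba_span(const uint8_t *read_buf, size_t pitch,
                           size_t x, size_t y, size_t n, uint8_t (*rgba)[4])
{
    assert(read_buf != NULL && rgba != NULL);
    const uint8_t *src = read_buf + y * pitch + x * 2;
    for (size_t i = 0; i < n; i++) {
        uint16_t p;
        memcpy(&p, src + i * 2, sizeof p);  /* alignment-safe 16-bit load */
        rgb565_to_rgba(p, rgba[i]);
    }
}
```

Reading one pixel at a time keeps the helper usable for arbitrary spans, which is the property the single-pixel macro relies on. A two-pixel variant would need a 32-bit load, widened masks, and cleanup code for unaligned or odd-length spans.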