|
From: Antonino D. <ad...@po...> - 2002-08-05 22:00:41
Attachments:
fb-pixmap.diff
|
With fbcon-accel and the new drawing functions in linux-2.5, console
performance degraded compared to the linux-2.4 implementation. This is
because putcs() has to do 1 fb_imageblit() per character to be drawn.

This can be optimized by letting putcs() initially construct the row of
text to be drawn into an offscreen buffer, then do a single
fb_imageblit() in the end. Performance will increase for several
reasons:

1. Drawing can be done in "bursts" instead of "trickles".

2. For drivers that support accelerated drawing functions, the offscreen
buffer can be optionally placed in graphics (or AGP) memory, which is
better suited for most hardware that can only do blits from video
memory to video memory.

3. Some level of asynchronicity can be achieved, i.e., the hardware can
be blitting while fbcon-accel is constructing bitmaps. This would
require "walking" the offscreen buffer, and support for hardware
graphics syncing on demand.

I have included a patch for 2.5.27 that implements it in fbcon-accel.
It's preliminary, but I have tested it with cfb_imageblit and with
hardware imageblit, with buffers in System or Video memory. The code is
also present for hardware syncing on demand, though unimplemented.

For drivers that use cfb_imageblit or similar, code such as the one
below can be inserted during initialization:

    info->pixmap.addr = (unsigned long) kmalloc(BUFFER_SIZE, GFP_KERNEL);
    info->pixmap.size = BUFFER_SIZE;
    info->pixmap.offset = 0;
    info->pixmap.buf_align = 1;
    info->pixmap.scan_align = 1;

Some benchmarks:

time cat /usr/src/linux/MAINTAINERS (40K text file)
mode: 1024x768@8bpp, y-panning disabled.

cfb_imageblit - no offscreen buffer (default)
real 0m13.586s
user 0m0.001s
sys 0m13.585s

cfb_imageblit - with offscreen buffer in system memory
real 0m10.708s
user 0m0.001s
sys 0m10.707s

hardware imageblit - no offscreen buffer
real 0m6.036s
user 0m0.001s
sys 0m6.035s

hardware imageblit - with offscreen buffer in graphics memory
real 0m3.160s
user 0m0.001s
sys 0m3.160s

hardware imageblit - graphics offscreen buffer + hardware sync on demand
real 0m1.843s
user 0m0.000s
sys 0m1.843s

Tony
|
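The batching idea in the message above (build the whole row of text offscreen, then blit once) can be sketched in user-space C. This is only an illustration under stated assumptions, not the code from fb-pixmap.diff: fake_imageblit(), glyph_for(), putcs_per_char() and putcs_batched() are hypothetical names, the font is a made-up 8x2 monochrome font, and the "blit" merely counts how often it is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FONT_W 8   /* glyph width in pixels: one byte per scanline at 1bpp */
#define FONT_H 2   /* tiny glyph height, chosen only to keep the demo short */

static int blit_calls;   /* number of simulated imageblit invocations */

/* Hypothetical stand-in for a driver's imageblit hook; it only counts calls. */
static void fake_imageblit(const uint8_t *data, int width, int height)
{
    (void)data; (void)width; (void)height;
    blit_calls++;
}

/* Hypothetical font lookup: scanline y of glyph c is the byte (c + y). */
static void glyph_for(char c, uint8_t out[FONT_H])
{
    for (int y = 0; y < FONT_H; y++)
        out[y] = (uint8_t)(c + y);
}

/* Per-character path: one blit for every character drawn. */
static void putcs_per_char(const char *s, int n)
{
    uint8_t g[FONT_H];

    for (int i = 0; i < n; i++) {
        glyph_for(s[i], g);
        fake_imageblit(g, FONT_W, FONT_H);
    }
}

/*
 * Batched path: copy every glyph into one offscreen 1bpp pixmap that is
 * n * FONT_W pixels wide, then issue a single blit for the whole row.
 * Returns 0 on success, -1 if the pixmap is too small for the row.
 */
static int putcs_batched(const char *s, int n, uint8_t *pixmap, size_t size)
{
    size_t pitch = (size_t)n;   /* bytes per scanline, since FONT_W == 8 */
    uint8_t g[FONT_H];

    assert(pixmap != NULL);
    if (pitch * FONT_H > size)
        return -1;

    for (int i = 0; i < n; i++) {
        glyph_for(s[i], g);
        for (int y = 0; y < FONT_H; y++)
            pixmap[(size_t)y * pitch + (size_t)i] = g[y];
    }
    fake_imageblit(pixmap, n * FONT_W, FONT_H);
    return 0;
}
```

Drawing a 4-character string costs four simulated blits through putcs_per_char() and one through putcs_batched(). The sketch does not model the alignment fields (buf_align, scan_align) or hardware syncing.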
|
From: Geert U. <ge...@li...> - 2002-08-06 20:09:18
|
On 6 Aug 2002, Antonino Daplas wrote:
> With fbcon-accel and the new drawing functions in linux-2.5, console
> performance degraded compared to the linux-2.4 implementation. This is
> because putcs() has to do 1 fb_imageblit() per character to be
> drawn.
Yes, this will show up badly once I have ported amifb to the new
framework, since chip RAM accesses are very slow and we use bitplanes...
> This can be optimized by letting putcs() initially construct the row of
> text to be drawn into an offscreen buffer, then do a single
> fb_imageblit() in the end. Performance will increase for several
> reasons:
Yes, this is very nice! I was thinking about passing an array of images to an
fb_imageblit_multiple() or so, but yours may be better.
> For drivers that use cfb_imageblit or similar, code such as the one
> below can be inserted during initialization:
>
> info->pixmap.addr = (unsigned long) kmalloc(BUFFER_SIZE, GFP_KERNEL);
> info->pixmap.size = BUFFER_SIZE;
> info->pixmap.offset = 0;
> info->pixmap.buf_align = 1;
> info->pixmap.scan_align = 1;
>
> Some benchmarks:
>
> time cat /usr/src/linux/MAINTAINERS (40K text file)
> mode: 1024x768@8bpp, y-panning disabled.
[...]
Just for reference, did you run this benchmark on 2.4.x as well?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li...
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
|
|
From: Antonino D. <ad...@po...> - 2002-08-07 00:13:27
|
On Wed, 2002-08-07 at 04:08, Geert Uytterhoeven wrote:
>
> Just for reference, did you run this benchmark on 2.4.x as well?
>
> Gr{oetje,eeting}s,
>
Sort of. The functions in fbcon-cfb*.c are already very fast, because
fbcon and character drawing are tightly integrated together, and
fbcon_cfb8_putcs() is very, very efficient, processing 4 bits per
iteration, instead of 1. I'm getting numbers like this:
real 0m2.098s
user 0m0.000s
sys 0m2.070s
which was faster(!) than my hardware implementation of putcs, and 5x
faster than 2.5. Since I'm using an i810 with Video in System RAM,
direct framebuffer access does not carry much overhead. I just have to
beat fbcon-cfb8, so I thought of placing text data in offscreen graphics
memory to take full advantage of hardware blitting.
At high bit depths (32 bpp), 2.5 with an offscreen buffer is as fast as
2.4.
Tony
|
|
From: Antonino D. <ad...@po...> - 2002-08-07 05:21:51
Attachments:
cfbimgblt.c
|
One of the reasons 2.4 console performance is good, especially at low
bit depths, is its ability to process more than 1 pixel per iteration
and its use of mask arrays. I tried to generalize the above in
cfbimgblt.c by incorporating the idea from fbcon-cfb*.c. It's
significantly faster but still not as fast as the 2.4 API.

time cat /usr/src/linux/MAINTAINERS (40K text file)
1024x768-8bpp, y-panning disabled

2.5 old (with offscreen buffers)
real 0m10.708s
user 0m0.001s
sys 0m10.707s

2.5 new
real 0m4.378s
user 0m0.002s
sys 0m4.375s

2.4
real 0m2.098s
user 0m0.000s
sys 0m2.070s

I've only tested the implementation at 8, 16, 24, and 32 bpp. 24bpp is
slightly slower than 32 bpp :(

Tony
|
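The "more than 1 pixel per iteration" and "mask arrays" remarks above can be illustrated with a small sketch for 8 bpp. It is not the attached cfbimgblt.c and not the 2.4 fbcon-cfb8 source. The names nibble_mask, build_nibble_masks() and expand_row_8bpp() are hypothetical. The idea shown: a 16-entry table maps four font bits to a 32-bit mask, so each table lookup produces four 8-bit pixels at once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mask table: entry n covers four pixels, one byte each. */
static uint32_t nibble_mask[16];

/*
 * Fill the table so that, in memory order, byte k of entry n is 0xff when
 * the k-th bit from the left of the 4-bit value n is set, and 0x00
 * otherwise. Building it through a byte array keeps the sketch independent
 * of host endianness.
 */
static void build_nibble_masks(void)
{
    for (unsigned n = 0; n < 16; n++) {
        uint8_t bytes[4];

        for (unsigned k = 0; k < 4; k++)
            bytes[k] = (n & (8u >> k)) ? 0xff : 0x00;
        memcpy(&nibble_mask[n], bytes, sizeof bytes);
    }
}

/*
 * Expand one 8-pixel-wide monochrome font scanline into eight 8bpp pixels.
 * Each lookup handles 4 font bits: the mask selects fg or bg per byte via
 * (mask & (fg ^ bg)) ^ bg, so the inner work is two iterations, not eight.
 */
static void expand_row_8bpp(uint8_t bits, uint8_t fg, uint8_t bg,
                            uint8_t dst[8])
{
    uint32_t fgx = fg * 0x01010101u;   /* fg colour replicated in 4 bytes */
    uint32_t bgx = bg * 0x01010101u;   /* bg colour replicated in 4 bytes */
    uint32_t eorx = fgx ^ bgx;
    uint32_t hi, lo;

    assert(dst != NULL);
    hi = (nibble_mask[bits >> 4] & eorx) ^ bgx;     /* left 4 pixels */
    lo = (nibble_mask[bits & 0x0f] & eorx) ^ bgx;   /* right 4 pixels */
    memcpy(dst, &hi, 4);
    memcpy(dst + 4, &lo, 4);
}
```

After calling build_nibble_masks() once, expand_row_8bpp(0xA5, 7, 1, dst) fills dst with 7,1,7,1,1,7,1,7. Higher bit depths would need wider or differently shaped tables, which this sketch does not attempt.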