From: James S. <jsi...@tr...> - 2001-10-02 22:50:49
|
> > Well the reason the framebuffer suck is because the current api sucks for
> > them. It draws pixel by pixel. Slow slow slow!!! I have developed a new
> > api that takes advantage of the accel engine of graphics hardware. It is
>
> Great. VESAfb doesn't have one. Lots of older machines don't have one.

True. Of course VESAfb exists because we lack so many fbdev drivers. In time
that problem should go away. Also many embedded devices, which I work on for
a living, lack hardware acceleration. Well, okay, a lot of modern PDAs are
starting to have accel engines. This doesn't mean you can't write really
good optimized software code for devices that lack hardware acceleration.

The software accel functions needed by the console layer (copyarea,
fillrect, and drawimage) have already been written. Okay, the drawimage one
needs a lot of work. I haven't benchmarked the new code versus the current
code but you can see the difference. One of the big changes I have
made is to write data to the framebuffer word aligned and a long at
a time. For 8bpp you have 4 pixels written at a time. This makes for a
much tighter loop. On ix86 you can see a huge difference in performance due
to the word alignment. I know because at first I had a bug that wasn't
doing it right. After I fixed that bug you could see the difference.

We use the same techniques at work for embedded devices where the CPUs
don't have the horsepower of desktops. Every single line of code counts.
I haven't ported the assembly versions for different platforms yet but I
plan to. I know from experience that writing proper assembly on ARM or
using MMX will increase performance manyfold.
|
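As a rough illustration of the "word aligned and a long at a time" idea for 8bpp, here is a minimal sketch in C. It is not the code from the Ruby tree: `fill8_aligned` is a hypothetical helper, the buffer is ordinary memory rather than mapped video memory, and a 32-bit "long" is assumed. Real framebuffer code would go through an MMIO accessor such as `fb_writel` instead of `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): fill `len` 8bpp pixels starting
 * at `dst` with `color`, using aligned 32-bit stores for the bulk. */
static void fill8_aligned(uint8_t *dst, size_t len, uint8_t color)
{
    /* Replicate the 8-bit colour into all four bytes of a 32-bit word. */
    uint32_t pattern = (uint32_t)color * 0x01010101u;

    /* Head: single bytes until dst sits on a 4-byte boundary. */
    while (len > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = color;
        len--;
    }

    /* Body: each iteration writes four pixels with one aligned word.
     * memcpy of a fixed 4-byte size is normally compiled to a single
     * 32-bit store and avoids aliasing problems in portable C. */
    while (len >= 4) {
        memcpy(dst, &pattern, sizeof pattern);
        dst += 4;
        len -= 4;
    }

    /* Tail: at most three leftover pixels. */
    while (len > 0) {
        *dst++ = color;
        len--;
    }
}
```

The head and tail loops are what keep the bulk of the stores aligned when a rectangle starts or ends at an odd pixel offset.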
From: Alan C. <al...@lx...> - 2001-10-02 22:54:00
|
> The software accel functions needed by the console layer (copyarea,
> fillrect, and drawimage) have already been written. Okay, the drawimage one
> needs a lot of work. I haven't benchmarked the new code versus the current

On x86 they'll probably make no difference at all, unless the old code
is really really crap. Your bottleneck is the PCI bus. All you can do is
avoid reads.

Alan
|
From: James S. <jsi...@tr...> - 2001-10-02 23:14:01
|
> On x86 they'll probably make no difference at all, unless the old code
> is really really crap. Your bottleneck is the PCI bus. All you can do is
> avoid reads.

True. We have discussed the idea of placing the fonts into video memory
instead of system memory if the graphics card has room. At first I didn't
like the idea since handling scrolling would become more difficult. It can
be done, though, with enough "tricks". I think it should be up to the driver
writer where he/she places the font image. In this case drawimage becomes
copyarea, except that you are grabbing off-screen data. I have to do some
thinking about how to handle that.
|
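To make the "drawimage becomes copyarea" remark concrete, here is a small sketch in C under invented assumptions: an 8bpp surface whose rows past the visible area hold pre-rendered 8x16 glyphs. The `struct surface`, `copy_area`, and `put_glyph` names are hypothetical and not taken from any driver. The `memcpy` loop only models the result; on real hardware the copy would have to be done by the blitter, since a CPU copy would read video memory across the bus.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout (illustration only): the first `yres` rows are
 * visible, the rows after that hold the font glyphs. */
struct surface {
    uint8_t *base;
    unsigned stride;   /* bytes per row */
    unsigned yres;     /* number of visible rows */
};

enum { GLYPH_W = 8, GLYPH_H = 16 };

/* Stand-in for a driver's copyarea hook. Handles only non-overlapping
 * source and destination rectangles. */
static void copy_area(struct surface *s, unsigned sx, unsigned sy,
                      unsigned dx, unsigned dy, unsigned w, unsigned h)
{
    for (unsigned row = 0; row < h; row++)
        memcpy(s->base + (size_t)(dy + row) * s->stride + dx,
               s->base + (size_t)(sy + row) * s->stride + sx, w);
}

/* Drawing character `c` at text cell (col, row) is just a copy whose
 * source rectangle lies in the off-screen glyph area. */
static void put_glyph(struct surface *s, unsigned char c,
                      unsigned col, unsigned row)
{
    unsigned per_row = s->stride / GLYPH_W;  /* glyphs per off-screen row */
    unsigned sx = (c % per_row) * GLYPH_W;
    unsigned sy = s->yres + (c / per_row) * GLYPH_H;

    copy_area(s, sx, sy, col * GLYPH_W, row * GLYPH_H, GLYPH_W, GLYPH_H);
}
```

In this model the only thing that distinguishes a glyph draw from an ordinary copyarea is that the source y coordinate is at or beyond `yres`.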
From: Benjamin H. <be...@ke...> - 2001-10-03 10:20:05
|
>> The software accel functions needed by the console layer (copyarea,
>> fillrect, and drawimage) have already been written. Okay, the drawimage one
>> needs a lot of work. I haven't benchmarked the new code versus the current
>
>On x86 they'll probably make no difference at all, unless the old code
>is really really crap. Your bottleneck is the PCI bus. All you can do is
>avoid reads.

Well, there are indeed a few improvements to get with machine-specific
optimisations on an unaccelerated framebuffer. One example is, on PPC, the
use of a floating point register to do the blits 64 bits at a time. This
allows the PCI host controller to generate bursts of two 32-bit
transactions (for machines with controllers unable to write combine).

Of course, having such optimisations in the kernel is tricky because of the
lazy FPU switching (well, at least on PPC), but the point is that
improvement _is_ possible.

Regards,
Ben.
|
From: James S. <jsi...@tr...> - 2001-10-03 16:58:53
|
> Well, there are indeed a few improvements to get with machine-specific
> optimisations on an unaccelerated framebuffer.
[snip]...

Neat trick. Please note also that no read operations to the framebuffer
are done by the fbcon layer. Such reads should go to the shadow buffers
(vc_screenbuffer) instead. Reading the framebuffer is a userland operation,
and as such you really only need tricks for reading in userland.
|
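The shadow-buffer point can be sketched with a toy console in C. Everything here is hypothetical: `struct mini_console` and `scroll_up` are invented names, the screen is 4x3 cells, and each cell is a single byte rather than a rendered glyph. The `shadow` array plays the role the thread assigns to vc_screenbuffer, and `fb` stands in for video memory.

```c
#include <stdint.h>
#include <string.h>

enum { COLS = 4, ROWS = 3 };

/* Hypothetical miniature console (illustration only). */
struct mini_console {
    uint8_t shadow[ROWS][COLS]; /* system-RAM copy of the screen contents */
    volatile uint8_t *fb;       /* stand-in for video memory; only written */
};

/* Scroll up by one row. Every read hits the shadow copy in system RAM;
 * the framebuffer only ever sees stores. */
static void scroll_up(struct mini_console *con, uint8_t blank)
{
    /* Update the shadow copy first. */
    memmove(con->shadow[0], con->shadow[1], (ROWS - 1) * COLS);
    memset(con->shadow[ROWS - 1], blank, COLS);

    /* Then repaint the framebuffer from the shadow copy, write-only. */
    for (unsigned r = 0; r < ROWS; r++)
        for (unsigned c = 0; c < COLS; c++)
            con->fb[r * COLS + c] = con->shadow[r][c];
}
```

Because the old framebuffer contents are never consulted, the result is the same whatever was in video memory before the scroll.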
From: Paul M. <pm...@mv...> - 2001-10-03 17:27:17
|
On Wed, Oct 03, 2001 at 09:58:30AM -0700, James Simmons wrote:
> > Well, there are indeed a few improvements to get with machine-specific
> > optimisations on an unaccelerated framebuffer.
> [snip]...
>
> Neat trick. Please note also that no read operations to the framebuffer
> are done by the fbcon layer. Such reads should go to the shadow buffers
> (vc_screenbuffer) instead. Reading the framebuffer is a userland operation,
> and as such you really only need tricks for reading in userland.
>

And while we're on the subject of architecture-specific optimizations for
unaccelerated framebuffers (or framebuffers in general, for that matter),
on SH4 you can remap the video memory area through a store queue and
perform all writes through the remapped store queue area (there are two
store queues, each 32 bytes, and each is flushed to the memory it was
mapped to on a prefetch instruction). This allows for very high speed
writes to external memory, which is what the store queues were designed for.

Regards,

--
Paul Mundt <pm...@mv...>
MontaVista Software, Inc.
|
From: Geert U. <ge...@li...> - 2001-10-04 07:43:43
|
On Tue, 2 Oct 2001, James Simmons wrote:
> > > Well the reason the framebuffer suck is because the current api sucks for
> > > them. It draws pixel by pixel. Slow slow slow!!! I have developed a new
            ^^^^^^^^^^^^^^^^^^^^^^^
Where does it draw pixel by pixel?

> > > api that takes advantage of the accel engine of graphics hardware. It is
> >
> > Great. VESAfb doesn't have one. Lots of older machines don't have one.
>
> True. Of course VESAfb exists because we lack so many fbdev drivers. In time

Yep. Vesafb started as a nice gimmick to show that it's possible, and turned
out to be a solution for yet another we-don't-release-specs-to-OS/FS-people
company.

> The software accel functions needed by the console layer (copyarea,
> fillrect, and drawimage) have already been written. Okay, the drawimage one
> needs a lot of work. I haven't benchmarked the new code versus the current
> code but you can see the difference. One of the big changes I have
> made is to write data to the framebuffer word aligned and a long at
> a time. For 8bpp you have 4 pixels written at a time. This makes for a
> much tighter loop. On ix86 you can see a huge difference in performance due
> to the word alignment. I know because at first I had a bug that wasn't
> doing it right. After I fixed that bug you could see the difference.

Euh, most fbcon-* drivers already do this. Grep for fb_write in e.g.
drivers/video/fbcon-cfb8.c and count the byte accesses (=> 0).

Gr{oetje,eeting}s,

                        Geert

P.S. Not to criticize the development in the Ruby tree of the linux-console
     project, but I don't like facts that aren't true.

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li...

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
|
From: James S. <jsi...@tr...> - 2001-10-04 16:50:07
|
> > > > them. It draws pixel by pixel. Slow slow slow!!! I have developed a new
>             ^^^^^^^^^^^^^^^^^^^^^^^
> Where does it draw pixel by pixel?

Okay. Let me put it this way: most drivers don't take advantage of the
graphics hardware to perform console operations. Instead they just draw
directly to the framebuffer, which can be slow.

> Yep. Vesafb started as a nice gimmick to show that it's possible, and turned
> out to be a solution for yet another
> we-don't-release-specs-to-OS/FS-people company.

I know. Same with OFfb.

> Euh, most fbcon-* drivers already do this. Grep for fb_write in e.g.
> drivers/video/fbcon-cfb8.c and count the byte accesses (=> 0).

Yep. The new code I developed came out of the merging of all the fbcon-cfb*
drivers.
|