From: Sottek, M. J <mat...@in...> - 2001-10-15 17:24:10
Michael,

I need a refresher: what are you trying to do again? From this information it seems that you are just trying to get the bt848 data to the Overlay without using up too much CPU. If that is the case, then you should be using the video4linux interface. Xawtv does this, and I watch TV on my i815 without using up any CPU. There is no memcpy in the path between the bt848 and the Overlay when using v4l.

I think this developed out of a video capture discussion, which requires that an application get the data out of the bt848, write it to disk, and then get the data into the overlay... that indeed uses a memcpy, since you are using Xv directly.

If you want to, as has been discussed on this list, get rid of that memcpy and replace it with some DMA equivalent, something like this has to happen:

1. Reserve a set of GTT (agpgart) pages for general magic mapping.
2. When XvShmPutImage gets into the X server, get each page from the Shm region and map it into those magic GTT pages such that they appear linear. They can be left as cacheable.
3. Turn off ring buffer arbitration.
4. Flush the pipeline.
5. Blit from the magic region to the Overlay.
6. Flush again.
7. Turn arbitration back on.
8. Unmap the Shm region from the magic GTT pages.
9. Flip the overlay.

The arb-off, flush, blit, flush, arb-on steps are not immediate actions; they are commands which need to be placed in the ring buffer, and they will happen asynchronously. This causes an additional problem in that you can't issue the overlay flip until the copy is finished (no problem, use the overlay flip instruction instead of the register).

You still have one more issue to resolve. When you enter the PutImage function you need to set up the overlay parameters, and you can't touch those registers unless the last flip has finished. I think the current bit checking will still work for that.

OK, now that I wrote it down it doesn't look that bad. The hard part is in the agpgart code. I don't even see a drm dependency.
You just need an agpgart function that takes a user address and size, maps the pages into the gart, and returns the gart address. Does such an animal exist?

-Matt

-----Original Message-----
From: Michael Zayats [mailto:zmi...@po...]
Sent: Saturday, October 13, 2001 2:17 PM
To: xpert@XFree86.Org
Cc: dri...@li...; Sottek, Matthew J
Subject: Re: [Xpert]XVideo (memcoy) consuiming to much CPU (i810)

well, back to our cows...

I get frames from the bt848 at 25 fps, F_CIF (710x576 YUV420, i.e. 12 bits per pixel): 0% CPU usage if I just discard them.

Since I get them in an mmap'ed driver area and not in shared memory, I use a single memcpy to copy them to one previously allocated shared memory segment: 25% CPU time.

Now XvShmPutImage: 50% CPU. Pretty predictable, since it also does a memcpy of the same buffer; no compression goes on in the middle.

BTW, this fits very well with the observation that 250 loops of memcpy(...); usleep(30000) take exactly 10 seconds, meaning that each memcpy takes 10 milliseconds (40 ms per loop minus the 30 ms sleep). Multiplying by 25 gives 250 ms per second -> 25%.

Putting in DMA might save about 25%...

Another 2 questions:

1) Maybe I should just use some optimized version of memcpy? Does someone know of MMX or SSE use in glibc? I have very well-defined hardware to run on...

2) Off-topic: does somebody know how to access shared memory from kernel space (maybe I will fix the bttv driver to write directly to shared memory; this will save me another 25%...)?

Any help?

----- Original Message -----
From: Sottek, Matthew J <mat...@in...>
To: 'Michael Zayats' <zmi...@po...>
Cc: <xpert@XFree86.Org>
Sent: Tuesday, October 09, 2001 5:39 PM
Subject: RE: [Xpert]XVideo consuiming to much CPU

> Michael,
> If you are only able to get 25fps then there is something wrong
> in your application. I know of Xv based mpeg decoders that can
> do full DVD sized frames at 30fps without issue, and the vast
> majority of the cpu is taken up with the mpeg decode not the
> transfer.
> I myself have done Xv tests that can peg the framerate
> at 99fps when the vertical retrace is 100 (this was with a smaller
> 320x200 mpeg1 stream) This was with a modest PIII cpu.
> The bottom line is this, doing a blit from system memory to the
> framebuffer or some other DMA transfer could offload a little bit
> of cpu usage, but it isn't going to make anything "faster" the
> overlay can only flip buffers on vertical retrace and even a
> slow cpu should be able to keep up. Using the blit/DMA you will
> then either have to wait for the transfer to complete or have
> something else poll to find out when the transfer completes and
> then flips the overlay. That makes a mess of a pretty simple
> problem, all to save a little cpu.
> Keep in mind that the memcpy isn't really that bad on i810
> since it is sharing memory bandwidth with the system instead
> of actually being behind a pci bus.
>
> -Matt
>
> -----Original Message-----
> From: Michael Zayats [mailto:zmi...@po...]
> Sent: Tuesday, October 09, 2001 3:05 AM
> To: Mark Vojkovich
> Cc: xpert@XFree86.Org
> Subject: Re: [Xpert]XVideo consuiming to much CPU
>
> > The i810 driver will not display video faster than the vertical
> > retrace. If you send frames faster than that, it will busy wait
> > until the next retrace. What you are seeing is the expected behavior
> > on i810.
>
> I send 25fps and as Peter already mentioned (and I checked it) it's
> because of memcpy use instead of DMA in XVideo i810 driver
>
> > Mark.
> >
> > _______________________________________________
> > Xpert mailing list
> > Xpert@XFree86.Org
> > http://XFree86.Org/mailman/listinfo/xpert
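
Michael's memcpy arithmetic from the October 13 message can be checked with a short sketch. It is only an illustration, written in Python instead of the C the thread is about. The function names (`yuv420_frame_bytes`, `copy_seconds_from_loop`, `cpu_share`, `time_copy`) are hypothetical helpers and not part of any library. It assumes that `usleep(30000)` sleeps for exactly 30 ms, and that the application and XvShmPutImage each make one full-frame copy, as the message describes.

```python
import time


def yuv420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one YUV 4:2:0 frame at 12 bits per pixel."""
    return width * height * 12 // 8


def copy_seconds_from_loop(total_seconds: float, loops: int, sleep_seconds: float) -> float:
    """Per-copy cost implied by a loop of copy + sleep.

    Assumes the sleep lasts exactly sleep_seconds; if the real sleep
    overshoots, the true copy cost is lower than this estimate.
    """
    return total_seconds / loops - sleep_seconds


def cpu_share(copy_seconds: float, fps: float, copies_per_frame: int = 1) -> float:
    """Fraction of one CPU spent copying frames."""
    return copy_seconds * copies_per_frame * fps


def time_copy(nbytes: int, repeats: int = 50) -> float:
    """Average seconds for one bulk buffer-to-buffer copy of nbytes.

    Slice assignment between equal-sized bytearrays stands in for memcpy.
    """
    src = bytearray(nbytes)
    dst = bytearray(nbytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    frame = yuv420_frame_bytes(710, 576)  # 613440 bytes per frame
    per_copy = copy_seconds_from_loop(10.0, 250, 0.030)  # about 10 ms
    print(f"frame size: {frame} bytes, {frame * 25 / 1e6:.1f} MB/s at 25 fps")
    print(f"one copy per frame:   {cpu_share(per_copy, 25):.0%} CPU")
    print(f"two copies per frame: {cpu_share(per_copy, 25, 2):.0%} CPU")
    print(f"measured copy on this machine: {time_copy(frame) * 1e3:.3f} ms")
```

The first three helpers reproduce the 25% and 50% figures from the message. The last line only times a copy on whatever machine runs the script, so it says nothing about the i810 hardware discussed in the thread.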