From: Carsten Haitzler (The Rasterman) <ra...@ra...> - 2007-06-04 22:53:32
On Wed, 30 May 2007 11:54:22 -0400 Jason Tackaberry <ta...@ur...> babbled:

> On 2007-05-30 01:55, Carsten Haitzler (The Rasterman) wrote:
> > that's interesting. i swear i handled this properly. when videos are of
> > ODD width this would happen - mplayer and xine both had the bug, but i
> > made that work in emotion/evas - the code literally checks to see if the
> > plane pointers all line up contiguously and if so shortcuts the texture
> > upload to a single put call - if not it needs to do 1 put per line. let
> > me check when i get home.
>
> Yeah, I was looking through this code yesterday (to do some profiling)
> and I noticed some code commented out that looks like it should do
> this. And at the top:
>
> /* FIXME: should use subimage */

i think that fixme is old... the code does use subimage - glTexSubImage2D to
be exact. note that there is commented-out code for uploading the texture
line by line - it uses gl's own stride walking (glPixelStorei). this of
course assumes that the delta between the first 2 rows of the y plane stays
constant through the image - and that the delta between the first 2 rows of
the u plane is the same for both u and v.

> However the calls to glPixelStorei(GL_UNPACK_ROW_LENGTH, ...) look like
> they should handle the case where stride != width, at least assuming I'm
> correctly interpreting what they do. So I'm not sure why I saw what I
> saw, but I'll dig into it a bit further and see if I can find the cause
> and a fix.

well, other than the above "assumptions", this should work for handling
stride != width.

> > it should be ^ given any fragment-shader glsl capable card. i've found
> > an nv 6xxx and on is needed for this,
>
> Do you have a sense how Intel GMA should behave?

no idea - but given intel's history, worse than nvidia. i.e. - slower.

> > but it's enough to play full hd video on a
> > fairly mediocre machine. software also works - it's just going to
> > consume more cpu.
> > it does make evas viable for a full video ui including the video itself,
> > any osd elements on top of the video or under it, etc. - i.e. - rage. :)
>
> I'm currently struggling with this. For Freevo, we're not using
> emotion, but rolling our own approach that is not only player agnostic
> (supports mplayer, xine, gstreamer -- although I see now emotion
> supports gstreamer as well), but, much more importantly, has the video
> run out of process and delivers the frame to shared memory. This is
> because video players have a tendency of crashing, and we don't like
> the idea that playing a corrupt video may crash the UI. :)

yeah. i noticed that with libxine - it likes to crash once you've played a
few feature-length movies and started, stopped, fast-forwarded, jumped
around, etc. i'd consider putting the xine module into a slave process with
shmem too. gstreamer too, maybe. technically emotion is player agnostic -
just put in a module for whatever decoder/video provider you like. not
dissimilar to what you have done - just all hidden behind a wrapping api.

> Given a 1080p50 video, the evas process (which just grabs the frames
> from shared memory [upon notification by fifo] and passes the buffer to
> evas_image_data_set) consumes about 15-20% CPU (on my E6600).

this will mostly be texture uploads, i imagine. in my tests evas actually
consumes LESS cpu than mplayer and xine - if you use the opengl output in
xine/mplayer (tested with 1920x1080 video - Serenity). xine's gl output is
buggy though.

rage x11:    47.2%us, 18.8%sy, 0.0%ni, 33.8%id
rage gl:     27.8%us,  0.8%sy, 0.0%ni, 71.0%id
mplayer x11: 28.9%us,  0.0%sy, 0.0%ni, 70.8%id
mplayer gl:  36.9%us,  0.3%sy, 0.0%ni, 62.4%id
mplayer xv:  19.0%us,  0.2%sy, 0.0%ni, 80.3%id
xine x11:    35.6%us,  0.5%sy, 0.0%ni, 63.9%id
xine gl:     46.8%us,  9.3%sy, 0.0%ni, 43.3%id (BUGGY!!!)
xine xv:     17.9%us,  0.7%sy, 0.0%ni, 81.1%id

with rage + gl we get 30% more cpu idle time than mplayer or xine + gl.
xvideo is still better though.
i suspect the way gl uploads textures vs. the way xvideo reads the shm yuv
buffer is different. (xv uses dma, maybe?) in principle they should be about
the same, cpu-side. on the video card side, gl has the EXTRA work of writing
to a gl back-buffer, THEN copying the back-buffer to the screen. xv (with my
nv card, where i have forced it to not use the overlay but write to the
framebuffer) can skip the gl back-buffer and write directly to the front
buffer. that might explain the difference here - but that's the price we pay
for the rest of the canvas pipeline. i guess in theory it'd be possible to
skip the back-buffer in special circumstances (video is 100% solid and not
overlaid with any visible objects).

> Furthermore, while the evas process is working, the xine process
> actually consumes an additional 10-15% cpu, which is particularly
> confusing as the vo thread sleeps while waiting for evas to finish
> render().
>
> The only thing that makes sense to me is the process of uploading the
> data to the video card (via glTexSubImage2D calls) pollutes the L2
> cache, resulting in an increase in cache misses inside the xine engine,
> which is either caused or exacerbated by its multithreadedness. (This
> 10-15% increase does not happen with MPlayer.) When I suspend the evas
> renderer process, the xine process returns to normal usage, even though
> it's doing the same amount of work. So increased cache misses is the
> best hypothesis I have.

interesting.

> But 15-20% of just evas overhead is a bit too high for comfort,
> especially considering when playing the video via Xv, the overhead for
> the VO is barely measurable. 15-20% CPU for basically uploading
> textures means that much less CPU is available for decoding and
> postprocessing, and when you're dealing with 1080p video, you want as
> much free CPU as possible. :)

what is the xserver using? remember, with xv the xserver is doing the
upload. with gl the client (evas) is doing it (x isn't doing any heavy
lifting).
what matters is overall performance - as per the chart above. :) and the
above is also my suspicion as to why more cpu is used.

> For 1080p, the call to evas_gl_common_ycbcr601pl_texture_update (on my
> nvidia 7100, Intel E6600) takes on average around 3200 usec. At 50 or
> 60fps (deinterlaced PAL or NTSC), this adds up fast. Unfortunately
> pretty much everything in this function is GL calls, which means there's
> not much room for tweaking. Unless a new approach entirely is used
> (maybe pbuffer)?

unlikely. as best i know, pbuffers don't let you use existing memory
directly. you still need to copy TO them as they are a video-card-managed
resource, thus i can't just zero-copy create a yuv texture set DIRECTLY from
the yuv video data. also, pbuffers incur speed hits. as i said before, i
suspect the problem is the extra back-buffer copy.

> The second problem is that with xine's tvtime plugin, frames are
> delivered in YUY2, for which there is currently no hardware accelerated
> colorspace conversion. :)

yuy2 is going to be much harder to deal with. the yuv fragment shader is
nice and simple with 3 textures. yuy2 is going to be much more evil. i'll
need a software converter too. for now it's just easier to convert in
software from yuy2 to planar yuv. :/

> > unknown issue. i'll take a look. what is the FULL color you are
> > setting it to though?
>
> Ok, that was the question to ask. The API I was using wrapped color_set
> and color_get to premultiply (and unpremul) the color values. This used
> to be necessary, but apparently it isn't anymore. Oops. :)
>
> Cheers,
> Jason.
>
> _______________________________________________
> enlightenment-devel mailing list
> enl...@li...
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ra...@ra...
裸好多    Tokyo, Japan (東京 日本)