From: Reinhard N. <rn...@gm...> - 2006-01-29 21:44:44
Attachments:
xine-lib-startcode.patch
Hi,

the attached patch is very noticeable on less powerful CPUs like VIA's
C3 processor.

Bye.
--
Dipl.-Inform. (FH) Reinhard Nissl
mailto:rn...@gm...
From: Thibaut M. <thi...@gm...> - 2006-01-30 11:02:42
Hi Reinhard,

On 1/29/06, Reinhard Nissl <rn...@gm...> wrote:
> Hi,
>
> the attached patch is very noticeable on less powerful CPUs like VIA's
> C3 processor.

Try to replace memcpy by xine_fast_memcpy, that should be even better.

Notice that the memcpy can be completely skipped if "current" points
to the first byte after a startcode and if find_start_code finds a
start code, in this case current source buffer contains a full frame.

> Bye.
> --
> Dipl.-Inform. (FH) Reinhard Nissl
> mailto:rn...@gm...

Thibaut
From: Reinhard N. <rn...@gm...> - 2006-01-30 23:11:37
Attachments:
xine-lib-startcode2.patch
Hi,

Thibaut Mattern wrote:

>> the attached patch is very noticeable on less powerful CPUs like VIA's
>> C3 processor.
>
> Try to replace memcpy by xine_fast_memcpy, that should be even better.
>
> Notice that the memcpy can be completely skipped if "current" points
> to the first byte after a startcode and if find_start_code finds a
> start code, in this case current source buffer contains a full frame.

See updated attachment.

Bye.
--
Dipl.-Inform. (FH) Reinhard Nissl
mailto:rn...@gm...
From: Thibaut M. <thi...@gm...> - 2006-01-31 11:22:50
On 1/31/06, Reinhard Nissl <rn...@gm...> wrote:
> Hi,
>
> Thibaut Mattern wrote:
>
> >> the attached patch is very noticeable on less powerful CPUs like VIA's
> >> C3 processor.
> >
> > Try to replace memcpy by xine_fast_memcpy, that should be even better.
> >
> > Notice that the memcpy can be completely skipped if "current" points
> > to the first byte after a startcode and if find_start_code finds a
> > start code, in this case current source buffer contains a full frame.
>
> See updated attachment.

you changed that:
+          memcpy(mpeg2dec->chunk_ptr, data, bite);
+          mpeg2dec->chunk_ptr += bite;

into that:
+        int bite = current - data;
+        if (bite) {
+          memcpy(mpeg2dec->chunk_ptr, data, bite);
+          mpeg2dec->chunk_ptr += bite;
+        }

Which is not what i meant, but i was not really clear.
My idea is that if the decoder receives a complete frame in a
contiguous buffer, then the decoder doesn't need to accumulate the
frame data into a temporary buffer before decoding. I hope it's a bit
clearer.

In short, if mpeg2dec->chunk_ptr points to the beginning of the temp
buffer and if you found a start code, then you have a full frame
between "data" and "current", no ?
In that case, why not setting mpeg2dec->chunk_ptr to data instead of memcpy ?

Maybe i'm wrong, i've not tested, or maybe you never get a full frame
into a contiguous buffer, but if i'm right, avoiding the memcpy might
have an impact.

> Bye.
> --
> Dipl.-Inform. (FH) Reinhard Nissl
> mailto:rn...@gm...

cheers,
Thibaut
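The zero-copy idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from xine-lib or libmpeg2: the names chunk_acc_t, chunk_acc_init, find_prefix and feed_chunk are hypothetical, the 4096-byte buffer size is arbitrary, and start codes that straddle two input buffers are deliberately ignored to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accumulator; NOT the real mpeg2dec_t layout. */
typedef struct {
    uint8_t  chunk_buffer[4096]; /* temporary buffer for chunks split across inputs */
    uint8_t *chunk_ptr;          /* next free byte in chunk_buffer */
} chunk_acc_t;

static void chunk_acc_init(chunk_acc_t *acc)
{
    acc->chunk_ptr = acc->chunk_buffer;
}

/* Return a pointer to the next 00 00 01 prefix in [p, end), or NULL. */
static const uint8_t *find_prefix(const uint8_t *p, const uint8_t *end)
{
    for (; end - p >= 3; p++)
        if (p[0] == 0 && p[1] == 0 && p[2] == 1)
            return p;
    return NULL;
}

/*
 * Feed one input buffer that starts right after a start code.
 * Returns 1 when *chunk / *len describe a complete chunk, 0 when more
 * input is needed, -1 if the temporary buffer would overflow.
 * *used receives the number of input bytes consumed.
 */
static int feed_chunk(chunk_acc_t *acc, const uint8_t *data, size_t size,
                      const uint8_t **chunk, size_t *len, size_t *used)
{
    const uint8_t *sc = find_prefix(data, data + size);
    size_t bite = sc ? (size_t)(sc - data) : size;
    size_t pending = (size_t)(acc->chunk_ptr - acc->chunk_buffer);

    assert(pending <= sizeof acc->chunk_buffer);

    if (sc && pending == 0) {
        /* Fast path: the whole chunk is contiguous in the input, no memcpy. */
        *chunk = data;
        *len = bite;
        *used = bite + 3; /* also skip the 00 00 01 prefix */
        return 1;
    }

    if (bite > sizeof acc->chunk_buffer - pending)
        return -1;
    if (bite) {
        memcpy(acc->chunk_ptr, data, bite);
        acc->chunk_ptr += bite;
    }
    if (!sc) {
        *used = size;
        return 0; /* chunk continues in the next input buffer */
    }

    *chunk = acc->chunk_buffer;
    *len = pending + bite;
    *used = bite + 3;
    acc->chunk_ptr = acc->chunk_buffer; /* ready for the next chunk */
    return 1;
}
```

In the fast path the returned pointer aliases the caller's input buffer, so the chunk must be consumed before that buffer is reused. Any existing code that assumes the chunk always lives in the temporary buffer would have to be adapted for this to work.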
From: Reinhard N. <rn...@gm...> - 2006-01-31 22:15:21
Hi,

Thibaut Mattern wrote:

>>>> the attached patch is very noticeable on less powerful CPUs like VIA's
>>>> C3 processor.
>>> Try to replace memcpy by xine_fast_memcpy, that should be even better.
>>>
>>> Notice that the memcpy can be completely skipped if "current" points
>>> to the first byte after a startcode and if find_start_code finds a
>>> start code, in this case current source buffer contains a full frame.
>>
>> See updated attachment.
>
> you changed that:
> +          memcpy(mpeg2dec->chunk_ptr, data, bite);
> +          mpeg2dec->chunk_ptr += bite;
>
> into that:
> +        int bite = current - data;
> +        if (bite) {
> +          memcpy(mpeg2dec->chunk_ptr, data, bite);
> +          mpeg2dec->chunk_ptr += bite;
> +        }
>
> Which is not what i meant, but i was not really clear.
> My idea is that if the decoder receives a complete frame in a
> contiguous buffer, then the decoder doesn't need to accumulate the
> frame data into a temporary buffer before decoding. I hope it's a bit
> clearer.

Now it is clearer.

> In short, if mpeg2dec->chunk_ptr points to the beginning of the temp
> buffer and if you found a start code, then you have a full frame
> between "data" and "current", no ?

Yes.

> In that case, why not setting mpeg2dec->chunk_ptr to data instead of memcpy ?

Because it breaks all other code in decode.c. It's a reasonable idea but
requires a thorough rework of all other code.

> Maybe i'm wrong, i've not tested, or maybe you never get a full frame
> into a contiguous buffer, but if i'm right, avoiding the memcpy might
> have an impact.

Well, it's not a full frame which is considered here, but just the data
between two start codes, e.g. a sequence header, group header, etc.
These short pieces of data might benefit from not being copied, as there
is surely some amount of data for which xine_fast_memcpy performs badly
due to alignment issues. Larger data may be found between consecutive
slice start codes, but those might not fit into a single 2039 byte block
and are then copied anyway.

Although I agree that it is preferable to avoid memcpys, the current
gain comes from optimizing the process of searching for start codes.

Bye.
--
Dipl.-Inform. (FH) Reinhard Nissl
mailto:rn...@gm...
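The attached patches are not reproduced in this thread, so the following is only a generic sketch of one common way to speed up a start code search, and it may differ from what xine-lib-startcode.patch actually does. It contrasts a byte-at-a-time shift-register scan with a variant that lets memchr jump to candidate 0x01 bytes. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Baseline: examine every byte, keeping the last three bytes in the upper
 * 24 bits of a shift register. Returns a pointer to the 00 00 01 prefix,
 * or NULL if none is found in [p, end).
 */
static const uint8_t *find_start_code_naive(const uint8_t *p, const uint8_t *end)
{
    uint32_t shift = 0xffffff00;

    while (p < end) {
        shift = (shift | *p++) << 8;
        if (shift == 0x00000100)
            return p - 3;
    }
    return NULL;
}

/*
 * Variant: let memchr locate bytes equal to 0x01 and only then check the
 * two preceding bytes. Typical C library memchr implementations are
 * optimized, so long runs without a 0x01 byte are skipped quickly.
 */
static const uint8_t *find_start_code_memchr(const uint8_t *p, const uint8_t *end)
{
    const uint8_t *q;

    if (end - p < 3)
        return NULL;

    q = p + 2; /* earliest position the 0x01 byte can occupy */
    while (q < end && (q = memchr(q, 0x01, (size_t)(end - q))) != NULL) {
        if (q[-1] == 0 && q[-2] == 0)
            return q - 2;
        q++;
    }
    return NULL;
}
```

Unlike the shift register, the memchr variant keeps no state between calls, so a caller would still need separate handling for a prefix that is split across two input buffers.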