Erik Walthinsen wrote:
>
> On Wed, 26 Apr 2000, James Bowman wrote:
>
> > Sorry, this code will not run as is, it requires changes to vlc.h and
> > vlc.c in order to work. I can check it all in together right now... the
> > performance gain is modest - benchmark goes from 40.3s to 37.3s.
>
> How did you incorporate the asm version? I put it in vlc_asm.S and added
> an ifdef around the C version, as well as turned off the INLINE flag for
> the VLC stuff.
Yes, I put it in vlc_x86.S, removed the inline definitions from vlc.h,
and ifdefed out the C version with USE_ASM_FOR_VLC.
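Roughly, the header ends up shaped like the sketch below. Only USE_ASM_FOR_VLC and vlc_x86.S come from this thread. The function name dv_vlc_decode_example and its toy body are hypothetical stand-ins for the real VLC routine, whose name and signature may differ. Because the assembly routine can't be inlined, both branches of the #ifdef expose a plain extern function instead of a static inline one:

```c
/* Hypothetical VLC entry point; the real libdv name and signature may differ. */
#ifdef USE_ASM_FOR_VLC
/* Assembly version, assumed to be assembled from vlc_x86.S and linked in. */
extern int dv_vlc_decode_example(unsigned int bits);
#else
/* Portable C fallback. This toy body returns the number of leading zero
   bits in a 16-bit window, standing in for a real VLC table lookup. */
int dv_vlc_decode_example(unsigned int bits)
{
    int n = 0;
    unsigned int mask = 0x8000u;

    while (n < 16 && !(bits & mask)) {
        n++;
        mask >>= 1;
    }
    return n;
}
#endif
```

Built without -DUSE_ASM_FOR_VLC, dv_vlc_decode_example(0x1000u) returns 3 from the C path. Built with it, the same call resolves to the assembly symbol at link time.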
> Yeah, go ahead and check it in.
Cool.
I looked at the profile, and it seems the gains have been modest because
more time is now spent in the parser. Possibly the inlined C version let
the compiler unroll the block decode loop to some extent and schedule
instructions across it. The decode process is very serial right now,
with branch mispredicts in lots of places.
So I've been thinking about what the ideal implementation of
dv_parse_ac_coeffs_pass0() would be. This function currently takes
about 29% of execution time.
What's your plan for the "MMX getbits" task? From reading an527, that
approach looks very branchy.
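For comparison, here is a minimal scalar sketch (plain C, not MMX, and not the an527 method) of a getbits with no refill branch. It assumes the input buffer is padded with at least 4 spare bytes so the 32-bit load never runs off the end. The names bitreader_t, br_peek, br_skip and br_get are hypothetical. Each peek loads the 4 bytes starting at the current byte, shifts out the bits already consumed, and keeps the top n bits. That caps n at 25, since up to 7 bits may be shifted out of a 32-bit word:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical bit reader over a big-endian byte stream.
   Assumes buf is padded with >= 4 extra bytes past the last valid one. */
typedef struct {
    const uint8_t *buf;
    size_t pos;               /* current position, in bits */
} bitreader_t;

/* Return the next n bits (1 <= n <= 25) without consuming them.
   Straight-line code: no branch to refill a bit cache. */
static uint32_t br_peek(const bitreader_t *br, unsigned n)
{
    const uint8_t *p = br->buf + (br->pos >> 3);
    uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    w <<= (br->pos & 7);      /* drop bits already consumed in p[0] */
    return w >> (32 - n);     /* keep the top n bits */
}

/* Consume n bits. */
static void br_skip(bitreader_t *br, unsigned n)
{
    br->pos += n;
}

/* Peek and consume n bits in one call. */
static uint32_t br_get(bitreader_t *br, unsigned n)
{
    uint32_t v = br_peek(br, n);
    br_skip(br, n);
    return v;
}
```

A VLC decoder built on this would br_peek a fixed-width window, index a table to get the code length and value, then br_skip by that length. Any mispredicts left would come from the code structure itself rather than from bit-buffer management.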
--
James Bowman
ja...@ex...