From: James B. <ja...@ex...> - 2000-04-27 01:39:32
Erik Walthinsen wrote:
>
> On Wed, 26 Apr 2000, James Bowman wrote:
>
> > Sorry, this code will not run as is, it requires changes to vlc.h and
> > vlc.c in order to work. I can check it all in together right now... the
> > performance gain is modest - benchmark goes from 40.3s to 37.3s.
>
> How did you incorporate the asm version? I put it in vlc_asm.S and added
> an ifdef around the C version, as well as turned off the INLINE flag for
> the VLC stuff.

Yes, I put it in vlc_x86.S, removed the inline definitions from vlc.h,
and ifdefed out the C version with USE_ASM_FOR_VLC.

> Yeah, go ahead and check it in.

Cool.

I looked at the profile and it seems that the gains have been modest
because of more time in the parser. Possibly it was unrolling the block
decode loop to some extent and scheduling instructions. The decode
process is very serial right now, with branch mispredicts in lots of
places.

So I've been thinking about what would be the ideal implementation for
dv_parse_ac_coeffs_pass0(). This function seems to take about 29% of
execution time currently.

What's your plan for the "MMX getbits" task? Reading an527, it looks
very branchy.

--
James Bowman
ja...@ex...
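
For readers following the thread, the guard arrangement described above (assembly in a separate .S file, C version compiled only when USE_ASM_FOR_VLC is not defined) might look roughly like the sketch below. This is only an illustration under stated assumptions: demo_unary_len() is a hypothetical stand-in, not the project's actual VLC routine, and its body (counting leading 1 bits) is a toy chosen to keep the example short.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: demo_unary_len() is NOT a function from the
 * project. It stands in for the VLC decode routine so that the
 * USE_ASM_FOR_VLC guard pattern is visible.
 */
#ifdef USE_ASM_FOR_VLC
/* Assembly build: the body would live in a .S file such as vlc_x86.S. */
extern int demo_unary_len(uint32_t bits);
#else
/* C fallback: count the leading 1 bits of a left-aligned 32-bit window. */
static int demo_unary_len(uint32_t bits)
{
    int len = 0;
    while (len < 32 && (bits & 0x80000000u)) {
        bits <<= 1;
        len++;
    }
    return len;
}
#endif
```

Because the function is no longer an inline definition in the header, every call becomes a real call in both builds, which may be part of why the measured gain stays modest.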