It would be really nice to get the scalar assembly working in libdv, since that algorithm is less destructive than the MMX IDCT. DV is mainly used to import footage into the computer, not for persistent storage or intermediate stages, so the speed of the DV decoder is virtually irrelevant. What matters is the speed of the uncompressed intermediates.
You want the highest quality at the import stage, and getting that quality usually requires scalar methods. libdv doesn't seem to weight the coefficients properly in its scalar routines.