The file I just checked in, testbitstream.c, is a simple benchmark
that extracts bits from a 10k buffer. Comparing the current
implementation with the simpler new one, I get the following:
old: 29.0s
new: 20.7s
This isn't really surprising - the new code doesn't have any branches,
and doesn't do any swab()s on the input stream. It's true that the new
version does more shifts than the old version, but on the Pentium II
shifts are 1 uop each, so their cost is more than offset by the benefit
of zero branch mispredicts.
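
To make the "no branches, no swab()" point concrete, here is a minimal
sketch of a branch-free bit reader. It is not the code that was checked
in: the names bitreader_t, load_be64, bits_show, bits_flush and bits_get
are made up for illustration, and it assumes MSB-first bit order, reads
of 1 to 32 bits, and a buffer padded so that 8 bytes are always readable
from the current byte position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state, for illustration only. */
typedef struct {
    const uint8_t *buf;  /* input; 8 bytes must be readable from any byte we touch */
    size_t bitpos;       /* absolute index of the next unread bit */
} bitreader_t;

/* Build a big-endian 64-bit word from 8 bytes using only shifts and ORs,
 * so no byte swapping of the input stream is needed. */
static uint64_t load_be64(const uint8_t *p)
{
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48) |
           ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32) |
           ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16) |
           ((uint64_t)p[6] <<  8) |  (uint64_t)p[7];
}

/* Return the next n bits (1 <= n <= 32), MSB first, without consuming them. */
static uint32_t bits_show(const bitreader_t *br, unsigned n)
{
    uint64_t w;

    assert(n >= 1 && n <= 32);       /* debug-only check; gone with -DNDEBUG */
    w = load_be64(br->buf + (br->bitpos >> 3));
    w <<= (br->bitpos & 7);          /* drop bits already consumed in this byte */
    return (uint32_t)(w >> (64 - n)); /* keep the top n bits */
}

/* Consume n bits. */
static void bits_flush(bitreader_t *br, unsigned n)
{
    br->bitpos += n;
}

/* Read and consume n bits. */
static uint32_t bits_get(bitreader_t *br, unsigned n)
{
    uint32_t v = bits_show(br, n);
    bits_flush(br, n);
    return v;
}
```

Every read costs the same handful of loads, shifts and ORs no matter
where it falls relative to a word boundary, which is the trade described
above: more shifts in exchange for no refill test to mispredict.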
Running the two versions in the playdv benchmark mode (with
dv_parse_ac_coeffs disabled because of the unget issue), I get:
old: 33.2
new: 32.2
--
James Bowman
ja...@ex...