From: James B. <ja...@3d...> - 2000-04-28 18:49:04
The file I just checked in, testbitstream.c, is a simple benchmark that extracts bits from a 10k buffer. Comparing the current implementation with the simpler new one, I get the following:

old: 29.0s
new: 20.7s

This isn't really surprising: the new code doesn't have any branches, and doesn't do any swab()s on the input stream. It's true that the new version does more shifts than the old version, but on the Pentium II shifts are 1 uop each, so the cost is more than offset by the benefit of zero branch mispredicts.

Running the two versions in the playdv benchmark mode (with dv_parse_ac_coeffs disabled because of the unget issue), I get:

old: 33.2
new: 32.2

--
James Bowman
ja...@ex...
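
For readers who want to see what a branch-free extractor of this general kind can look like, here is a minimal sketch. It is not the code that was checked in: the names (sketch_bitstream, sketch_show_bits, sketch_get_bits) are hypothetical, and it assumes MSB-first bit order, a request of 1 to 25 bits per call, and at least 3 readable padding bytes after the last byte of real data. The four bytes are assembled with shifts, so no swab() pass over the input is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: a byte buffer plus an absolute bit offset. */
typedef struct {
    const uint8_t *buf;   /* needs 3 readable bytes past the last data byte */
    size_t bitpos;        /* number of bits consumed so far */
} sketch_bitstream;

/* Peek at the next n bits (1 <= n <= 25) without consuming them.
 * There are no conditionals: only loads, shifts and ORs. */
static inline uint32_t sketch_show_bits(const sketch_bitstream *bs, unsigned n)
{
    const uint8_t *p = bs->buf + (bs->bitpos >> 3);

    /* Assemble a big-endian 32-bit window byte by byte. */
    uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    word <<= (bs->bitpos & 7);   /* drop bits already consumed in this byte */
    return word >> (32 - n);     /* keep the top n bits */
}

/* Consume n bits. */
static inline void sketch_flush_bits(sketch_bitstream *bs, unsigned n)
{
    bs->bitpos += n;
}

/* Peek and consume in one call. */
static inline uint32_t sketch_get_bits(sketch_bitstream *bs, unsigned n)
{
    uint32_t v = sketch_show_bits(bs, n);
    sketch_flush_bits(bs, n);
    return v;
}
```

The 25-bit limit comes from the window size: up to 7 bits can be discarded by the left shift, which leaves 25 valid bits in a 32-bit word. Every call pays for four loads and several shifts, which is the trade-off described above: more shifts in exchange for no branches to mispredict.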