linuxcompressed-devel Mailing List for Linux Compressed Cache (Page 6)
From: John M. <joh...@gm...> - 2006-06-21 00:00:13
|
John Moser wrote:
>
> A quick test on the first 4096 bytes of each of /dev/urandom and
> /lib/libc.so.6 gives:
> A bunch of numbers for libc that are only a few digits away.

I've written a more optimized version that processes two longs at a time, cutting the cycle count by roughly a factor of 8 on 32-bit and 16 on 64-bit. Output below:

bluefox@icebox:/tmp/x$ ./mcarlo_optimal
hits = 401, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 1.138932e+02  Nulls: 22
hits = 391, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 1.150680e+01  Nulls: 14
hits = 384, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 7.062513e+00  Nulls: 20
hits = 413, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 1.176888e+01  Nulls: 21
hits = 394, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 1.575606e+01  Nulls: 11
hits = 392, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 1.264340e+01  Nulls: 18
hits = 416, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 9.224467e+00  Nulls: 11
hits = 407, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 2.625027e+01  Nulls: 31
hits = 397, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 2.498117e+01  Nulls: 13
hits = 396, shots = 512  Final Calculations for /dev/urandom:  Q.Idx: 2.090185e+01  Nulls: 18
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2341
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 3448
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2646
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2194
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2022
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2032
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2019
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2029
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2025
hits = 512, shots = 512  Final Calculations for /lib/libc.so.6:  Q.Idx: 1.164948e+00  Nulls: 2008

Notice that /dev/urandom still scores over 7, and usually over 10, 20, or even over 100, for a Quality Index, while libc6 still never touches 2. Instead of comparing two bytes, this code compares longs, which (in this example) groups 4 bytes together and analyzes them to determine a hit or miss. As a result, the slight variations in libc6 smooth out and we basically hit all the time.

In case you're wondering, what we're 'hitting' is a circle. The algorithm mathematically inscribes a circle in a square, cuts the square down to its top-right quadrant, and tests where random values fall:

  ________
 | ..--.. |  <-- square
 |/      \|
 |    X   |      X = Inside Circle
 |        |
 |\      /|
 |_--__--_|

  ____
 |-.. |     <-- Top-right quadrant of square
 | X \|
 |____|

If a point falls a distance of 1.0 or less from the bottom-left corner of this quadrant, it's inside the circle; otherwise it's outside. The ratio of hits inside the circle to total shots approximates pi/4, so calculating hits/shots for the quadrant and multiplying by 4 approximates pi. It's a simple Monte Carlo algorithm. The optimized version processes 8 or 16 bytes at a time using CPU-native words, and uses one pass of AND masks and bit shifts to count how many NULL bytes it has processed.
There are a lot of pointer dereferences; the compiler will cache these for us, so they won't cost extra cycles. Note that in practice we can probably discard the entire Monte Carlo calculation, since seeing a lot of NULs is probably enough on its own to justify choosing a Wilson-Kaplan family algorithm over Lempel-Ziv. The Monte Carlo quality index is also not useful as a general compressibility index: already-compressed data gets a high Q.Idx despite not being compressible any further. |
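For anyone who wants to play with the idea, here is a minimal userspace sketch in the spirit of the check described above. It is not the mcarlo source; the buffer size, the thresholds, and the output format are all illustrative only.

/*
 * Minimal sketch of the NUL-count / Monte Carlo check discussed above.
 * NOT the original mcarlo code; thresholds and names are made up.
 */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define PAGE_BYTES 4096
static const double PI_OVER_4 = 0.785398163;

/* Count zero bytes: WK-family compressors treat 0x00 as a special case. */
static size_t count_nulls(const uint8_t *buf, size_t len)
{
	size_t i, nulls = 0;

	for (i = 0; i < len; i++)
		if (buf[i] == 0)
			nulls++;
	return nulls;
}

/*
 * Treat consecutive byte pairs as (x, y) points in the unit square and
 * count how many land inside the inscribed quarter circle.  Random data
 * gives hits/shots close to pi/4; structured data drifts away from it.
 */
static double quarter_circle_ratio(const uint8_t *buf, size_t len)
{
	size_t shots = len / 2, hits = 0, i;

	for (i = 0; i < shots; i++) {
		double x = buf[2 * i] / 255.0;
		double y = buf[2 * i + 1] / 255.0;

		if (x * x + y * y <= 1.0)
			hits++;
	}
	return (double)hits / (double)shots;
}

int main(void)
{
	uint8_t page[PAGE_BYTES];
	size_t got = fread(page, 1, sizeof(page), stdin);

	if (got < 2)
		return 1;

	size_t nulls = count_nulls(page, got);
	double ratio = quarter_circle_ratio(page, got);

	printf("nulls = %zu, hits/shots = %f (pi/4 = %f)\n",
	       nulls, ratio, PI_OVER_4);

	/* Illustrative decision: many NULs -> prefer a WK-family algorithm;
	 * a ratio near pi/4 -> data looks random, expect poor compression. */
	if (nulls > got / 8)
		puts("looks WK-friendly");
	else if (fabs(ratio - PI_OVER_4) < 0.02)
		puts("looks close to random");
	return 0;
}

Build with -lm and feed it one page at a time, e.g. head -c 4096 /dev/urandom | ./nulcheck versus head -c 4096 /lib/libc.so.6 | ./nulcheck.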
From: Mauricio L. <mau...@gm...> - 2006-06-20 23:23:08
|
Hi Gupta,

Regarding the master chunk list (chunks) in the struct chunk, does it store the chunks of one page cache or several page caches? Will there be one master chunk list per page cache radix tree?

I also wonder where the chunk list header will be stored. For instance, the list header of task_struct is stored in the tasks field of init_task, so when an insertion happens in the list of task_struct, something like this is used:

list_add_tail(&p->tasks, &init_task.tasks);

where the first argument is the element to be inserted and the second argument corresponds to the list header of task_struct. So where will the list header of chunks be stored? Any idea?

BR,

Mauricio Lin.

On 6/19/06, Nitin Gupta <nit...@gm...> wrote:
> Hi Mauricio,
>
> Mauricio Lin wrote:
> > Hi all,
> >
> > After checking the chunk list in
> > http://linux-mm.org/CompressedCaching, I have some questions as
> > follows.
> >
> > Did anyone start to implement the chunk list operations? Is there any
> > progress on it?
> >
> No, I haven't yet started the compression structure implementation.
>
> > Regarding the "struct chunk_head", the field "chunk_list" points to
> > the related chunks, right?
> > I mean if a page A in the page cache is replaced by the corresponding
> > chunk head, the field chunk_list in such chunk_head points to the list
> > of related chunks of page A, even if some chunks are located in other
> > physical pages, right?
> >
> Yes. A page is compressed and stored in chunks (which can be present in
> different physical pages). These chunks are linked together using chunk->next.
>
> > Where can I find more info about the purpose of the lru field in the struct
> > chunk_head?
> >
> chunk_head->lru has basically the same purpose as struct page->lru.
>
> When the compressed cache becomes full, we have to free some pages from ccache to the
> swap disk. The chunk_head->lru field is to keep these chunk_heads in LRU
> order. So, when freeing pages from ccache, take a chunk_head from the head of the LRU
> list and free the chunk_head and corres. chunk(s).
>
> These freed chunks are then merged with other physically adjacent free chunks
> (found using chunk->chunks->{prev, next}). A chunk is identified as free by a flag
> (say, CH_free) stored in the 4 MSBs of chunk->size (a chunk has a max size of
> PAGE_SIZE, so 4 bits are free).
>
> Also, as mentioned on the site: LRU lists will be maintained separately for clean
> page cache pages, dirty page cache pages and swap cache pages. This will allow
> pages of these individual types to be separately tracked, and when compressed
> cache size reaches its limit, specific types of pages can be freed in priority to
> others (like put more pressure on freeing clean page cache pages than dirty page
> cache pages, and least on swap cache pages).
>
> Cheers,
> Nitin Gupta
>
|
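The design notes in the thread do not say where the master list head lives; one way it could look, following the init_task.tasks analogy from the question above, is sketched below. The global name, the locking, and the reduced struct chunk are hypothetical, not from any ccache patch.

/*
 * Hypothetical sketch of a global master chunk list head and the
 * corresponding insertion, in the style of init_task.tasks.
 */
#include <linux/list.h>
#include <linux/spinlock.h>

struct chunk {
	struct list_head chunks;	/* linkage into the master chunk list */
	/* other fields (next, size, ...) omitted here */
};

/* A single global list head, analogous to init_task.tasks. */
static LIST_HEAD(ccache_master_chunks);
static DEFINE_SPINLOCK(ccache_chunks_lock);

static void ccache_add_chunk(struct chunk *c)
{
	spin_lock(&ccache_chunks_lock);
	list_add_tail(&c->chunks, &ccache_master_chunks);
	spin_unlock(&ccache_chunks_lock);
}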
From: John M. <joh...@gm...> - 2006-06-20 18:42:13
|
Nitin Gupta wrote: > John Moser wrote: >> Nitin Gupta wrote: >>> John Moser wrote: >>>> It may even be possible to find an algorithm that does strict analysis >>>> but not compression (like a hash) that spits out a value correlating to >>>> the redundancy and encodability of the stream; but that would take a >>>> lot >>>> of research to come up with. A really fast, really light compressor >>>> would be easier. >>> your hash idea is a real nice topic for MS...but I've not yet planned >>> for that :) >>> (btw...I searched for something like this on citeseer but no luck) >>> >>> WKdm is lightest I know...something lighter is maybe dear ol' >>> memcpy() ;) >>> ..and still WKdm can't give you an idea of how well others(LZO) will >>> compress it. >>> >> >> In general there is a correlation. Remember compression is a >> characteristic of the algorithm and the data. It is computationally >> impossible to compress truly random data beyond a specific average >> ratio; however, small isolated blocks may encode well. >> >> In general we say one algorithm performs better than another because of >> this. 7zip gets smaller output than RAR or ZIP, consistantly, on >> various input datas; this is not a coincidence, the LZMA algorithm is >> actually better at redundancy elimination. The fact that an algorithm >> compresses a set of data more than the next shows that algorithm is >> better at eliminating redundancy; while the fact that a given algorithm >> compresses one piece of data better than the next shows that that data >> is more redundant. >> >> Think of it in terms of a punett square with the crossed values giving >> the output data size of compressing a 32K block of data: >> >> Algorithms: Data sets: >> X = weak compression A = minimal redundancy >> Y = strong compression B = much redundancy >> >> A B >> X 30 | 25 >> ----+---- >> Y 25 | 15 >> >> A weak algorithm with non-redundant data will produce a block only >> slightly smaller (if not bigger) than the input; while the same >> algorithm on more redundant data will produce a block that's a little >> smaller than that. In contrast, a strong algorithm on non-redundant >> data will produce a block that's somewhat smaller, and on redundant data >> will produce a block that's much smaller. >> >> You are correct in assuming that it's not a mathematical fact that any >> given algorithm can be used to predict how other algorithms will perform >> with the data; however, compression algorithms are basically analysis of >> data, and they do happen to as a side effect of their nature expose a >> property of the data. We can use this to conjecture that in most cases >> a compression algorithm will effectively describe how well other >> algorithms will perform, just not in a 1:1 manner; it will produce an >> approximate general trend. > > This seems very logical and we can confirm this at least for particular > cases of WKdm, WK4x4 and LZO. 
> > If you have some time, it would be awesome if you take 'compress-test' > kernel module: > > http://linux-mm.org/CompressedCaching/Code?action=AttachFile&do=get&target=compress-test.tar.gz > bluefox@iceslab:~/compress-test$ sudo ./testscript.sh ==== LIBRARY FILE TEST ==== Block 1 of /lib/libc.so.6 (4096 in) Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000241 seconds, 17.0 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 1000 (0 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000443 seconds, 9.2 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 952 (0 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000432 seconds, 9.5 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 1593 (1 kB) Block 2 of /lib/libc.so.6 (8192 in) Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 6.1e-05 seconds, 67.1 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 1676 (1 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000197 seconds, 20.8 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 1592 (1 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000177 seconds, 23.1 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2620 (2 kB) Block 3 of /lib/libc.so.6 (12288 in) Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 9.6e-05 seconds, 42.7 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2512 (2 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000162 seconds, 25.3 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2328 (2 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000241 seconds, 17.0 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2877 (2 kB) Block 4 of /lib/libc.so.6 (16384 in) Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000104 seconds, 39.4 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2692 (2 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000158 seconds, 25.9 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2528 (2 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000223 seconds, 18.4 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2854 (2 kB) Block 5 of /lib/libc.so.6 (20480 in) Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.0001 seconds, 41 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2764 (2 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000257 seconds, 15.9 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2580 (2 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00028 seconds, 14.6 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 2854 (2 kB) ==== RANDOM DATA TEST ==== Block 1 of /dev/urandom Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001142 seconds, 3.6 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4368 (4 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 
records out 4096 bytes (4.1 kB) copied, 0.001151 seconds, 3.6 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4352 (4 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001132 seconds, 3.6 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4116 (4 kB) Block 2 of /dev/urandom Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001079 seconds, 3.8 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4368 (4 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00115 seconds, 3.6 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4352 (4 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001254 seconds, 3.3 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4116 (4 kB) Block 3 of /dev/urandom Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001086 seconds, 3.8 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4368 (4 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001187 seconds, 3.5 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4352 (4 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001262 seconds, 3.2 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4116 (4 kB) Block 4 of /dev/urandom Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001074 seconds, 3.8 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4368 (4 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00117 seconds, 3.5 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4352 (4 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001276 seconds, 3.2 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4116 (4 kB) Block 5 of /dev/urandom Algorithm index: 0 = WKdm 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.000997 seconds, 4.1 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4368 (4 kB) Algorithm index: 1 = WK4x4 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.00101 seconds, 4.1 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4352 (4 kB) Algorithm index: 2 = LZO 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.001118 seconds, 3.7 MB/s Data size given: 4096 (4 kB) -- padded to 4kB Compressed size: 4116 (4 kB) (interesting results with random data) It looks like on libc, WKdm and WK4x4 do well but LZO fails to perform; while more random data squeezes down tighter with LZO. It's interesting that the random data sizes are consistent; I used /dev/urandom to calculate pi once, it's VERY good at producing random data, I am assuming thus that we are probably looking at the theoretical limits of compression on 4096 bytes of data (i.e. they get bigger). It does appear that there is a correlation here. For libc, WKdm and WK4x4 compress better than LZO; but in general, if a piece of data A compresses smaller than a piece of data B under one algorithm, then it also compresses smaller under the other algorithms. Side by side: /lib/libc.so.6 Block WKdm WK4x4 LZO 1 1000 952 1593 2 1676 1592 2620 3 2512 2328 2877 Notice as the output size of WKdm grows, so does the output size of WK4x4 and LZO. 
I have made note that for library data it seems in terms of output size LZO > WKdm > WK4x4; whereas for random data it seems to progress WKdm > WK4x4 > LZO. It is probably possible to write a small period analysis program to determine which to use; a simple monte-carlo could detect more random data, but counting the 0 bytes is also significant (WKdm and WK4x4 treat 0 as a special case, due to C memory alignment and unused bytes of memory pages tending to be filled with 0's). A quick test on the first 4096 bytes of each of /dev/urandom and /lib/libc.so.6 gives: bluefox@icebox:/tmp/x$ ./mcarlo hits = 1607, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 3.423745e+02 <--- Very random (high quality entropy) Nulls: 15 hits = 1595, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 3.793874e+01 Nulls: 13 hits = 1593, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 3.304198e+01 Nulls: 16 hits = 1592, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 3.103888e+01 Nulls: 17 hits = 1606, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 2.051743e+02 Nulls: 18 hits = 1570, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 1.330028e+01 Nulls: 19 hits = 1612, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 1.460953e+02 Nulls: 9 hits = 1578, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 1.678940e+01 Nulls: 13 hits = 1600, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 6.026764e+01 Nulls: 14 hits = 1621, shots = 2048 Final Calculations for /dev/urandom: Q.Idx: 4.094506e+01 Nulls: 19 hits = 2038, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.192071e+00 <--- Very not random (low quality) Nulls: 2341 hits = 2046, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.170274e+00 Nulls: 3448 <--- Lots of '\0' (WK-friendly!) hits = 2047, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.167605e+00 Nulls: 2646 hits = 2011, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.272035e+00 Nulls: 2194 hits = 1994, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.328130e+00 Nulls: 2022 hits = 1989, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.345582e+00 Nulls: 2032 hits = 1990, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.342055e+00 Nulls: 2019 hits = 1990, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.342055e+00 Nulls: 2029 hits = 1980, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.378180e+00 Nulls: 2025 hits = 1988, shots = 2048 Final Calculations for /lib/libc.so.6: Q.Idx: 1.349127e+00 Nulls: 2008 > > and feed all three algos (included with this module), data from /lib > (binary files), normal text files and other kind of data you can think > of, and compare how effectively each compresses the data. This thing can > be easily automated and lots of data on this will allow us to see if we > can really conjecture how well performance of WKdm (lightest) describes > performance of other two. > > If you are interested in doing this, I can send you all the details for > this module. In fact, all you need to do is feed some data (in single > pages) to /proc/compress_test/compress and then read this same file to > see compressed size. You can dynamically switch algos by writing 0, 1, > or 2 to /proc/compress_test/algo_idx. 
> > --------------------------- > [couldn't now think of a reply for rest of your mail] > --------------------------- > > Also, I have some doubts on page grouping you mentioned earlier: > > AFAIK, Rodrigo's patch never compressed pages in groups - all pages were > compressed individually. The page grouping you were mentioning about was > the size of 'cells'. So, what is the meaning of grouping of pages w.r.t > 'chunked' structure? +Double Page Size +CONFIG_COMP_DOUBLE_PAGE + + Select this option to make compressed cache use double sized pages + (two pages contiguos) instead of single memory pages. That increases + the effect of the compressed cache use even when the compression + ratio isn't very good. On i386, compressed cache pages will have + 8KiB instead of the regular 4KiB. + + The size of the compressed cache page is listed in the output of + /proc/comp_cache_stat. + + If unsure, say Y here. Not sure. I can't tell at a glance how the logic works (I have difficulty figuring out the mm/ code), you may be right. > > Though we can compression more than one page together and then store > them in chunks but this was not you were mentioning about...right? > Yeah I was thinking crunch together multiple pages at once... > > Cheers, > Nitin Gupta > |
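As a rough illustration of the "crunch together multiple pages at once" idea, grouping amounts to concatenating the pages and handing the whole buffer to a single compressor call, so redundancy across pages becomes visible to the compressor. The helper below is a toy sketch with a hypothetical compressor callback; buffer handling in a real kernel patch would look quite different.

/*
 * Toy sketch of group compression.  The compress_fn callback is
 * hypothetical -- any block compressor (WKdm, WK4x4, LZO, ...) with
 * this shape would do.  The on-stack buffer is for brevity only.
 */
#include <string.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096
#define GROUP_PAGES     8		/* a 32 KiB group, per the thread */

typedef size_t (*compress_fn)(const void *src, size_t src_len,
			      void *dst, size_t dst_len);

static size_t compress_group(void *pages[GROUP_PAGES], void *out,
			     size_t out_len, compress_fn compress)
{
	unsigned char buf[GROUP_PAGES * PAGE_SIZE_BYTES];
	int i;

	/* Concatenate the group so cross-page redundancy is visible. */
	for (i = 0; i < GROUP_PAGES; i++)
		memcpy(buf + i * PAGE_SIZE_BYTES, pages[i], PAGE_SIZE_BYTES);

	return compress(buf, sizeof(buf), out, out_len);
}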
From: Nitin G. <nit...@gm...> - 2006-06-20 16:16:47
|
John Moser wrote: > Nitin Gupta wrote: >> John Moser wrote: >>> It may even be possible to find an algorithm that does strict analysis >>> but not compression (like a hash) that spits out a value correlating to >>> the redundancy and encodability of the stream; but that would take a lot >>> of research to come up with. A really fast, really light compressor >>> would be easier. >> your hash idea is a real nice topic for MS...but I've not yet planned >> for that :) >> (btw...I searched for something like this on citeseer but no luck) >> >> WKdm is lightest I know...something lighter is maybe dear ol' memcpy() ;) >> ..and still WKdm can't give you an idea of how well others(LZO) will >> compress it. >> > > In general there is a correlation. Remember compression is a > characteristic of the algorithm and the data. It is computationally > impossible to compress truly random data beyond a specific average > ratio; however, small isolated blocks may encode well. > > In general we say one algorithm performs better than another because of > this. 7zip gets smaller output than RAR or ZIP, consistantly, on > various input datas; this is not a coincidence, the LZMA algorithm is > actually better at redundancy elimination. The fact that an algorithm > compresses a set of data more than the next shows that algorithm is > better at eliminating redundancy; while the fact that a given algorithm > compresses one piece of data better than the next shows that that data > is more redundant. > > Think of it in terms of a punett square with the crossed values giving > the output data size of compressing a 32K block of data: > > Algorithms: Data sets: > X = weak compression A = minimal redundancy > Y = strong compression B = much redundancy > > A B > X 30 | 25 > ----+---- > Y 25 | 15 > > A weak algorithm with non-redundant data will produce a block only > slightly smaller (if not bigger) than the input; while the same > algorithm on more redundant data will produce a block that's a little > smaller than that. In contrast, a strong algorithm on non-redundant > data will produce a block that's somewhat smaller, and on redundant data > will produce a block that's much smaller. > > You are correct in assuming that it's not a mathematical fact that any > given algorithm can be used to predict how other algorithms will perform > with the data; however, compression algorithms are basically analysis of > data, and they do happen to as a side effect of their nature expose a > property of the data. We can use this to conjecture that in most cases > a compression algorithm will effectively describe how well other > algorithms will perform, just not in a 1:1 manner; it will produce an > approximate general trend. This seems very logical and we can confirm this at least for particular cases of WKdm, WK4x4 and LZO. If you have some time, it would be awesome if you take 'compress-test' kernel module: http://linux-mm.org/CompressedCaching/Code?action=AttachFile&do=get&target=compress-test.tar.gz and feed all three algos (included with this module), data from /lib (binary files), normal text files and other kind of data you can think of, and compare how effectively each compresses the data. This thing can be easily automated and lots of data on this will allow us to see if we can really conjecture how well performance of WKdm (lightest) describes performance of other two. If you are interested in doing this, I can send you all the details for this module. 
In fact, all you need to do is feed some data (in single pages) to /proc/compress_test/compress and then read this same file to see the compressed size. You can dynamically switch algos by writing 0, 1, or 2 to /proc/compress_test/algo_idx.

---------------------------
[couldn't think of a reply for the rest of your mail right now]
---------------------------

Also, I have some doubts about the page grouping you mentioned earlier:

AFAIK, Rodrigo's patch never compressed pages in groups - all pages were compressed individually. The page grouping you were mentioning was the size of 'cells'. So, what is the meaning of grouping of pages w.r.t. the 'chunked' structure?

Though we can compress more than one page together and then store the result in chunks, that was not what you were mentioning... right?

Cheers,
Nitin Gupta
|
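A rough driver for that procedure could look like the sketch below. The /proc paths come from the description above, but the exact read-back format of /proc/compress_test/compress is assumed, so treat this as an untested sketch rather than the module's real test script.

/*
 * Sketch: feed one page at a time to /proc/compress_test/compress and
 * read the same file back, cycling through the three algorithms.
 */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

static void set_algo(int idx)		/* 0 = WKdm, 1 = WK4x4, 2 = LZO */
{
	FILE *f = fopen("/proc/compress_test/algo_idx", "w");

	if (!f) { perror("algo_idx"); exit(1); }
	fprintf(f, "%d\n", idx);
	fclose(f);
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/lib/libc.so.6";
	unsigned char page[PAGE_BYTES];
	char result[256];
	FILE *in = fopen(path, "rb");
	int algo, block;

	if (!in) { perror(path); return 1; }

	for (algo = 0; algo <= 2; algo++) {
		set_algo(algo);
		rewind(in);
		for (block = 0; block < 5; block++) {
			FILE *proc;

			if (fread(page, 1, PAGE_BYTES, in) != PAGE_BYTES)
				break;

			/* Feed one page... */
			proc = fopen("/proc/compress_test/compress", "w");
			if (!proc) { perror("compress"); return 1; }
			fwrite(page, 1, PAGE_BYTES, proc);
			fclose(proc);

			/* ...then read back whatever the module reports. */
			proc = fopen("/proc/compress_test/compress", "r");
			if (!proc) { perror("compress"); return 1; }
			if (fgets(result, sizeof(result), proc))
				printf("algo %d, block %d: %s", algo, block, result);
			fclose(proc);
		}
	}
	fclose(in);
	return 0;
}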
From: John M. <joh...@gm...> - 2006-06-20 00:01:47
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nitin Gupta wrote: > John Moser wrote: >>> I don't think its good to kick out least compressible pages first. What >>> ultimately matters is to have working set of running programs to stay in >>> memory -- compressed or uncompressed. But then, its hard to guess >>> working set of each running program...Its a big work in itself to come >>> up with a good heuristic for which (and how many) pages to kick out from >>> ccache when memory gets real low. >>> >> >> Eh, that's what letting the existing swap logic figure it out is for. >> Can't make it any LESS efficient than it already is right? :) Plus the >> swap cache is looking out for us. >> > > Ah..there is no swap logic for which pages to discard when swap goes > (almost) full (there's one: OOM killer :) ). After all we are swapping > to just another swap device which we call virtual swap. > I mean for which pages to emit from ccache to disk. > So, it is our work to develop this logic to handle case of ccache > reaching its size limit (or when we want to shrink it for other reasons). > >> Some light LRU heuristics may be useful. Possibly time based ("If this >> page has been used in the last 40,000 jiffies, try to kill something >> else") >> > > So, you are saying to maintain jiffies counter for every page in ccache? > Or if you store jiffies val then how will you lookup page that has > jiffies val in some particular(40,000) window? This 'LRU' scheme won't > work. > could just throw a jiffies count (number of jiffies since boot time should be available) into the thing when it's requested from ccache; and before sending it to disk, check to see page->jiffies < JIFFIES_WE_THINK_MEAN_THIS_WAS_RECENTLY_WANTED I don't really know, or feel like thinking it out too far. This kind of thing can be deferred until ccache is really actually working. >> >> It may even be possible to find an algorithm that does strict analysis >> but not compression (like a hash) that spits out a value correlating to >> the redundancy and encodability of the stream; but that would take a lot >> of research to come up with. A really fast, really light compressor >> would be easier. > > your hash idea is a real nice topic for MS...but I've not yet planned > for that :) > (btw...I searched for something like this on citeseer but no luck) > > WKdm is lightest I know...something lighter is maybe dear ol' memcpy() ;) > ..and still WKdm can't give you an idea of how well others(LZO) will > compress it. > In general there is a correlation. Remember compression is a characteristic of the algorithm and the data. It is computationally impossible to compress truly random data beyond a specific average ratio; however, small isolated blocks may encode well. In general we say one algorithm performs better than another because of this. 7zip gets smaller output than RAR or ZIP, consistantly, on various input datas; this is not a coincidence, the LZMA algorithm is actually better at redundancy elimination. The fact that an algorithm compresses a set of data more than the next shows that algorithm is better at eliminating redundancy; while the fact that a given algorithm compresses one piece of data better than the next shows that that data is more redundant. 
Think of it in terms of a punett square with the crossed values giving the output data size of compressing a 32K block of data: Algorithms: Data sets: X = weak compression A = minimal redundancy Y = strong compression B = much redundancy A B X 30 | 25 ----+---- Y 25 | 15 A weak algorithm with non-redundant data will produce a block only slightly smaller (if not bigger) than the input; while the same algorithm on more redundant data will produce a block that's a little smaller than that. In contrast, a strong algorithm on non-redundant data will produce a block that's somewhat smaller, and on redundant data will produce a block that's much smaller. You are correct in assuming that it's not a mathematical fact that any given algorithm can be used to predict how other algorithms will perform with the data; however, compression algorithms are basically analysis of data, and they do happen to as a side effect of their nature expose a property of the data. We can use this to conjecture that in most cases a compression algorithm will effectively describe how well other algorithms will perform, just not in a 1:1 manner; it will produce an approximate general trend. > >> >> The algorithm CI reference would be computed when the algorithm is first >> loaded, before it was put into use. This is a one-time cost for each >> algorithm, and can be ignored; it likely operates under a millisecond >> but it wouldn't hurt anything if it ran for 100-200mS really. This >> value would become persistent for the life of the algorithm (i.e. until >> someone rmmod's it), so it wouldn't have to be recomputed again. >> > > CI only makes sense when it tells how well _that_ algo compresses _this_ > page. If you just want a sane initial CI value for every algo, just > hard-code a value you can derive from many previous papers that compare > these algos (typically WKdm, WK4x4 and LZO). > See above :P >> Note that given these times there could be a negative impact on >> performance; on faster systems, of course, the impact becomes quickly >> negligible. It is still significant for embedded systems. >> >> It is also notable that we can amortize the costs by conjecturing that >> under normal circumstances, the system won't be swapping excessively, >> and thus wouldn't be using compressed cache excessively; thus the CI >> calculation can be largely ignored if it is sufficiently small. > > ccache is not even invoked in 'normal' conditions. It drops in when > system is choking for air...err..RAM :) > Under normal conditions, the system bumps the top of RAM and swaps a handful of pages out. Under abnormal conditions, the system has a working set 10% bigger than physical RAM and begins swapping data in and out every few milliseconds because it can't keep EVERYTHING it needs NOW in memory. Boot with 'mem=32M' and start KDE to see what I'm talking about. >> >> We can further amortize the costs by conjecturing that when ccache >> thrashing occurs (much paging in and out of ccache), pages being sent >> back and forth between memory and ccache are usually not going to be >> dirty. In this case, we may be able to store the CI of the page and >> only re-calculate it if the page is dirty (i.e. altered). Pages that >> were decompressed, read, and recompressed would be unchanged and have >> the same CI, thus we could skip recalculating the CI. 
>> > > Yes, ccache (as I expect) will be loaded with clean page cache pages (if > you choose to compress such pages) but then where will you store > per-page CI values and maintain them even after a clean page is > decompressed and freed from ccache? > maybe a field somewhere...but this doesn't seems interesting.. > *shrug* Don't worry about it for now anyway. It's just an optimization algorithm, worry about it in the future when ccache actually works first. >>>> This would allow for the following useful elements: >>>> >>>> - Groups may be given a CI == sum of the CI's of each page in the >>>> group >>>> - Pages each have a CI that doesn't change based on algorithm used >>>> - Algorithms each have a CI to describe their efficiency >>>> >>>> Which can be used for: >>>> >>>> - It is possible to quickly take a short incremental step to try to >>>> get >>>> the harder-to-compress pages into the same groups (i.e. find the first >>>> group from the hardest-to-compress end that is 'wrong', find all groups >>>> containing the next N hardest-to-compress pages, break those groups >>>> apart, reorder them, the group is now 'proper') >>>> - This allows for idle time to be used to sort pages and groups in >>>> such a way that our first candidates for writing to disk are pages that >>>> are 'hard to compress' (taking more memory) >>>> >>> Hard to compress factor should be combined with factor of having working >>> set of apps within memory (and maybe other too - I don't know). >> >> Yes.. perhaps time based? ("This page was used some time within the >> last 4000 jiffies, it's probably part of a working set"). > > I don't think this is the way to determine app working sets. > This paper: I. Chihaia and T. Gross. Adaptive Main Memory Compression > (http://www.lst.inf.ethz.ch/research/publications/publications/USENIX_2005/USENIX_2005.pdf) > > mentions of a adaptive resizing scheme that keeps working sets of app in > memory by a user space process that 'monitors' this app. But the worst > thing is: > -- Where is that code? Where is description of method used? googled that > for hours...YMMV > -- They seem to assume that only a single heavyweight app is running. > While systems like OLPC are generic ones (albeit slower ones). > It's still the best way I can think of. An app that's got 4 segments of memory set up: [main()][hash_md5()][DATA:db_index][user_add()] main() is going to call some other stuff and probably never get returned into until the program exits; user_add() will happen when someone is administrating the program. Whenever the database is accessed, some of DATA:db_index is read through (among other chunks of the database); and hash_md5() is used to check for database corruption. In the normal course of the program, you'll probably find that, when the program is active, user_add() is rarely used; main() is pretty much NEVER used; hash_md5() is used every time the program does ANYTHING; and DATA:db_index is read through every time someone needs to find something in the database. In these cases, we can say that the program's current working set probably excludes main() and user_add() purely because they haven't been used recently; and probably includes hash_md5() and DATA:db_index because they've been used very recently. If the program is asleep, and rarely has any activity, we can probably page it out without anyone caring much in favor of some other program that's currently hammering the CPU. 
>> >>>> - It is possible to use a "Light" algorithm (such as WKdm) initially, >>>> and move to a "Heavy" algorithm by decompressing the block and >>>> recompressing it during idle CPU time >>>> - Maximize efficiency without adding unnecessary overhead while the >>>> system is busy >>>> >>> -- compress a page with WKdm >>> [CPU is idle] >>> -- extract these pages compressed with WKdm from ccache >>> [CPU is now busy] >>> -- compress them with LZO >>> -- put them again in ccache >>> >>> There can be lot of potential problems with this thing. We need to look >>> at it carefully in detail. >>> >> >> Yeah, like systems that are busy running a large number of programs >> (i.e. building a kernel) where the CPU is working on and off in quick >> spurts. You can get 80-150mS idle periods, if you try to do 200mS of >> work there you just slowed the system down. >> >>> I was just wondering how swap prefetching patch senses CPU as 'idle' - >>> what are the things it considers?, how it handles problem of 'spike' >>> loads?. Will look at all this later. >>> >> >> When the CPU is idle, linux activates the "Idle Loop." The Idle Loop >> pretty much uses HLT to halt CPU use for a little bit when the system >> has nothing to do. Otherwise the CPU would be going nuts all the time ;) >> >> At any rate, there's a mechanism in there somewhere for it. > > Andreas Mohr provided these links (to know when CPU is 'idle') > -- http://lkml.org/lkml/2006/6/18/39 > -- http://lkml.org/lkml/2006/6/18/40 > -- http://threebit.net/mail-archive/linux-kernel/msg05303.html > cool. CPU 'idles' btw by entering a loop that does absolutely nothing; however the kernel uses the HLT instruction to actually halt the CPU so it actually processes nothing for a time. > > > Cheers, > Nitin Gupta > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQIVAwUBRJc5Wws1xW0HCTEFAQLD2g/+LdhieH0T6OPCveThzBfPtZOI/1ZuL9Hm zqGLdMoZUk83OWOGVMkTmpn4pZEa/QzSAoZd97SkgFrBjfdLBUxZi0tp3i++2x58 bW33wuS/NH0OyCu9Up+lGBQ5LPfVdMk7/9CiuGdOFSvX2t1Xt5K7S1hd/QRpJE50 6jlZpcvnStkEHHwMSi8nGDw2ntUIQwnXAptjXUq/fFEbPJaWNGDB+p3I9qfwg2gY PfMLgkdobhWcbtp6pcHdpyUcFkSYA4FFIggWpcDQw+TpWpMuGVbrPH9sMHjf7jim XN+uUM7tqPtCg1GzDcWthoEF9u+DBxD39maOT/3PD4VMFMIX5bYjyKbohuuf0VIv +5AvfFfNAjOwlpbVZsKbsCt2NXOi2Z+3+3JobRjds7PZ2XJPAKiA4SpOY99vCu5p aZw7ERfp/elFGI8DRMeRSXGVVf0ZUnQCPDqNH+Nze8VBRecdK+jd8sp068dtXw5z 6YOwZARlEfVxj2Hx+4+iefEOWtzpztPBD/NhxpiHOpSWmsIEhj3XrX5YLtYWEDHl UUgcNUZ7BFV9n3nnjW8mYU94VvJJLRbSP2YM08/HGPV5J7CPWJKBISd+k7umUV3h kkqmTeHFnjNB8CoGWmEHatkUpVKTKGDttEK7MVAChAy4jRypHX/tIqdfohalZzxr ImTpgaFc5vw= =+mxu -----END PGP SIGNATURE----- |
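A sketch of the jiffies-based "was this used recently?" check floated in this exchange is below. The last_used field, the threshold, and the helper names are hypothetical; nothing like this exists in the current ccache code, and the message itself defers the idea until ccache actually works.

/*
 * Hypothetical recency check for ccache eviction, using jiffies.
 */
#include <linux/jiffies.h>

#define CCACHE_RECENT_WINDOW	40000UL		/* jiffies, per the example */

struct chunk_head {
	/* ... fields from the chunked design ... */
	unsigned long last_used;	/* jiffies when last (de)compressed */
};

/* Called whenever a page is decompressed out of ccache. */
static inline void ccache_touch(struct chunk_head *ch)
{
	ch->last_used = jiffies;
}

/* Prefer evicting chunk_heads that have not been touched recently. */
static inline int ccache_recently_used(const struct chunk_head *ch)
{
	return time_after(ch->last_used + CCACHE_RECENT_WINDOW, jiffies);
}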
From: Nitin G. <nit...@gm...> - 2006-06-19 22:55:00
|
Hi Mauricio,

Mauricio Lin wrote:
> Hi all,
>
> After checking the chunk list in
> http://linux-mm.org/CompressedCaching, I have some questions as
> follows.
>
> Did anyone start to implement the chunk list operations? Is there any
> progress on it?
>
No, I haven't yet started the compression structure implementation.

> Regarding the "struct chunk_head", the field "chunk_list" points to
> the related chunks, right?
> I mean if a page A in the page cache is replaced by the corresponding
> chunk head, the field chunk_list in such chunk_head points to the list
> of related chunks of page A, even if some chunks are located in other
> physical pages, right?
>
Yes. A page is compressed and stored in chunks (which can be present in different physical pages). These chunks are linked together using chunk->next.

> Where can I find more info about the purpose of the lru field in the struct
> chunk_head?
>
chunk_head->lru has basically the same purpose as struct page->lru.

When the compressed cache becomes full, we have to free some pages from ccache to the swap disk. The chunk_head->lru field is to keep these chunk_heads in LRU order. So, when freeing pages from ccache, take a chunk_head from the head of the LRU list and free the chunk_head and corres. chunk(s).

These freed chunks are then merged with other physically adjacent free chunks (found using chunk->chunks->{prev, next}). A chunk is identified as free by a flag (say, CH_free) stored in the 4 MSBs of chunk->size (a chunk has a max size of PAGE_SIZE, so 4 bits are free).

Also, as mentioned on the site: LRU lists will be maintained separately for clean page cache pages, dirty page cache pages and swap cache pages. This will allow pages of these individual types to be separately tracked, and when compressed cache size reaches its limit, specific types of pages can be freed in priority to others (like put more pressure on freeing clean page cache pages than dirty page cache pages, and least on swap cache pages).

Cheers,
Nitin Gupta
|
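Read literally, the structures described above could be sketched as follows. Only the fields named in the mail (chunk_list, lru, next, chunks, size) come from the description; the flag encoding details and the list names are guesses, not code from the actual patch.

/*
 * Sketch of the chunked compression structure as described in this
 * thread -- field layout and flag encoding are approximate.
 */
#include <linux/list.h>

#define CH_free		(1U << 15)	/* chunk is free -- one of the 4 flag MSBs */
#define CH_SIZE_MASK	0x0fffU		/* low bits hold the size (exact encoding is a guess) */

struct chunk {
	struct chunk *next;		/* next chunk of the same compressed page */
	struct list_head chunks;	/* physical neighbours, used when merging free chunks */
	unsigned short size;		/* chunk size, with flags in the high bits */
};

struct chunk_head {
	struct chunk *chunk_list;	/* first chunk of this compressed page */
	struct list_head lru;		/* LRU position, like page->lru */
};

/* Separate LRU lists, as described: clean page cache, dirty page cache
 * and swap cache pages are tracked and shrunk with different priorities. */
static LIST_HEAD(ccache_lru_clean);
static LIST_HEAD(ccache_lru_dirty);
static LIST_HEAD(ccache_lru_swap);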
From: Mauricio L. <mau...@gm...> - 2006-06-19 22:25:59
|
Hi all,

After checking the chunk list in http://linux-mm.org/CompressedCaching, I have some questions as follows.

Did anyone start to implement the chunk list operations? Is there any progress on it?

Regarding the "struct chunk_head", the field "chunk_list" points to the related chunks, right? I mean if a page A in the page cache is replaced by the corresponding chunk head, the field chunk_list in such chunk_head points to the list of related chunks of page A, even if some chunks are located in other physical pages, right?

Where can I find more info about the purpose of the lru field in the struct chunk_head?

BR,

Mauricio Lin.
|
From: Nitin G. <nit...@gm...> - 2006-06-19 20:34:52
|
John Moser wrote: >> I don't think its good to kick out least compressible pages first. What >> ultimately matters is to have working set of running programs to stay in >> memory -- compressed or uncompressed. But then, its hard to guess >> working set of each running program...Its a big work in itself to come >> up with a good heuristic for which (and how many) pages to kick out from >> ccache when memory gets real low. >> > > Eh, that's what letting the existing swap logic figure it out is for. > Can't make it any LESS efficient than it already is right? :) Plus the > swap cache is looking out for us. > Ah..there is no swap logic for which pages to discard when swap goes (almost) full (there's one: OOM killer :) ). After all we are swapping to just another swap device which we call virtual swap. So, it is our work to develop this logic to handle case of ccache reaching its size limit (or when we want to shrink it for other reasons). > Some light LRU heuristics may be useful. Possibly time based ("If this > page has been used in the last 40,000 jiffies, try to kill something else") > So, you are saying to maintain jiffies counter for every page in ccache? Or if you store jiffies val then how will you lookup page that has jiffies val in some particular(40,000) window? This 'LRU' scheme won't work. > > It may even be possible to find an algorithm that does strict analysis > but not compression (like a hash) that spits out a value correlating to > the redundancy and encodability of the stream; but that would take a lot > of research to come up with. A really fast, really light compressor > would be easier. your hash idea is a real nice topic for MS...but I've not yet planned for that :) (btw...I searched for something like this on citeseer but no luck) WKdm is lightest I know...something lighter is maybe dear ol' memcpy() ;) ..and still WKdm can't give you an idea of how well others(LZO) will compress it. > > The algorithm CI reference would be computed when the algorithm is first > loaded, before it was put into use. This is a one-time cost for each > algorithm, and can be ignored; it likely operates under a millisecond > but it wouldn't hurt anything if it ran for 100-200mS really. This > value would become persistent for the life of the algorithm (i.e. until > someone rmmod's it), so it wouldn't have to be recomputed again. > CI only makes sense when it tells how well _that_ algo compresses _this_ page. If you just want a sane initial CI value for every algo, just hard-code a value you can derive from many previous papers that compare these algos (typically WKdm, WK4x4 and LZO). > Note that given these times there could be a negative impact on > performance; on faster systems, of course, the impact becomes quickly > negligible. It is still significant for embedded systems. > > It is also notable that we can amortize the costs by conjecturing that > under normal circumstances, the system won't be swapping excessively, > and thus wouldn't be using compressed cache excessively; thus the CI > calculation can be largely ignored if it is sufficiently small. ccache is not even invoked in 'normal' conditions. It drops in when system is choking for air...err..RAM :) > > We can further amortize the costs by conjecturing that when ccache > thrashing occurs (much paging in and out of ccache), pages being sent > back and forth between memory and ccache are usually not going to be > dirty. In this case, we may be able to store the CI of the page and > only re-calculate it if the page is dirty (i.e. 
altered). Pages that > were decompressed, read, and recompressed would be unchanged and have > the same CI, thus we could skip recalculating the CI. > Yes, ccache (as I expect) will be loaded with clean page cache pages (if you choose to compress such pages) but then where will you store per-page CI values and maintain them even after a clean page is decompressed and freed from ccache? maybe a field somewhere...but this doesn't seems interesting.. >>> This would allow for the following useful elements: >>> >>> - Groups may be given a CI == sum of the CI's of each page in the group >>> - Pages each have a CI that doesn't change based on algorithm used >>> - Algorithms each have a CI to describe their efficiency >>> >>> Which can be used for: >>> >>> - It is possible to quickly take a short incremental step to try to get >>> the harder-to-compress pages into the same groups (i.e. find the first >>> group from the hardest-to-compress end that is 'wrong', find all groups >>> containing the next N hardest-to-compress pages, break those groups >>> apart, reorder them, the group is now 'proper') >>> - This allows for idle time to be used to sort pages and groups in >>> such a way that our first candidates for writing to disk are pages that >>> are 'hard to compress' (taking more memory) >>> >> Hard to compress factor should be combined with factor of having working >> set of apps within memory (and maybe other too - I don't know). > > Yes.. perhaps time based? ("This page was used some time within the > last 4000 jiffies, it's probably part of a working set"). I don't think this is the way to determine app working sets. This paper: I. Chihaia and T. Gross. Adaptive Main Memory Compression (http://www.lst.inf.ethz.ch/research/publications/publications/USENIX_2005/USENIX_2005.pdf) mentions of a adaptive resizing scheme that keeps working sets of app in memory by a user space process that 'monitors' this app. But the worst thing is: -- Where is that code? Where is description of method used? googled that for hours...YMMV -- They seem to assume that only a single heavyweight app is running. While systems like OLPC are generic ones (albeit slower ones). > >>> - It is possible to use a "Light" algorithm (such as WKdm) initially, >>> and move to a "Heavy" algorithm by decompressing the block and >>> recompressing it during idle CPU time >>> - Maximize efficiency without adding unnecessary overhead while the >>> system is busy >>> >> -- compress a page with WKdm >> [CPU is idle] >> -- extract these pages compressed with WKdm from ccache >> [CPU is now busy] >> -- compress them with LZO >> -- put them again in ccache >> >> There can be lot of potential problems with this thing. We need to look >> at it carefully in detail. >> > > Yeah, like systems that are busy running a large number of programs > (i.e. building a kernel) where the CPU is working on and off in quick > spurts. You can get 80-150mS idle periods, if you try to do 200mS of > work there you just slowed the system down. > >> I was just wondering how swap prefetching patch senses CPU as 'idle' - >> what are the things it considers?, how it handles problem of 'spike' >> loads?. Will look at all this later. >> > > When the CPU is idle, linux activates the "Idle Loop." The Idle Loop > pretty much uses HLT to halt CPU use for a little bit when the system > has nothing to do. Otherwise the CPU would be going nuts all the time ;) > > At any rate, there's a mechanism in there somewhere for it. 
Andreas Mohr provided these links (to know when CPU is 'idle') -- http://lkml.org/lkml/2006/6/18/39 -- http://lkml.org/lkml/2006/6/18/40 -- http://threebit.net/mail-archive/linux-kernel/msg05303.html Cheers, Nitin Gupta |
From: Nitin G. <nit...@gm...> - 2006-06-19 09:21:41
|
Hi,

>
> On Sun, Jun 18, 2006 at 08:58:54PM +0530, Nitin Gupta wrote:
>> I was just wondering how the swap prefetching patch senses the CPU as 'idle' - what are
>> the things it considers? How does it handle the problem of 'spike' loads? Will look at
>> all this later.
>
> I think that would be this one:
> http://lkml.org/lkml/2006/6/18/39
> used by
> http://lkml.org/lkml/2006/6/18/40
>

I have a strange problem. I can't access any page on lkml.org (and haven't been able to for a long time)! (No firewalls.) I checked Google's cached page for lkml.org and it seems the site is having some trouble... but that looks recent, while I've had this problem for months...

-- Nitin
|
From: Andreas M. <an...@us...> - 2006-06-19 08:45:28
|
Hi,

On Sun, Jun 18, 2006 at 08:58:54PM +0530, Nitin Gupta wrote:
> I was just wondering how the swap prefetching patch senses the CPU as 'idle' - what are
> the things it considers? How does it handle the problem of 'spike' loads? Will look at
> all this later.

I think that would be this one:
http://lkml.org/lkml/2006/6/18/39
used by
http://lkml.org/lkml/2006/6/18/40

Andreas Mohr
|
From: John M. <joh...@gm...> - 2006-06-18 17:07:20
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nitin Gupta wrote: > John Moser wrote: >> Nitin Gupta wrote: >>> John Moser wrote: >>>> Nitin Gupta wrote: >>>>> On 6/17/06, John Moser <joh...@gm...> wrote: >>>>> * Storing pages: >>>>> -- First, it creates a 'virtual swap' which is a swap with its own >>>>> swap_map. This swap is given highest priority. So, when pages are to >>>>> be swapped out, existing code path naturally/transparently uses this >>>>> swap device (until it's full) for swapping out. (prelim patches for >>>>> this to be posted very soon on lc-devel). >>>> (See I don't understand the swap system that well :) >>>> >>>> I was thinking eventually I'd have to convince the kernel to *try* to >>>> swap even if there was no swap device.. >>>> >>>> Once this swap device is full it needs to start pushing its pages to a >>>> REAL swap device, i.e. on disk. >>>> >>> This is exactly what this virtual swap is meant for :) >>> Pages assigned to this virtual swap will eventually go to ccache and >>> when it's full, pages are sent to real swap disk -- all this with >>> minimal code intrusion. This approach works even if there are no real >>> swap disks -- only virtual swap will be used in that case. >>> >> >> Awesome. And the swap cache and existing LRU logic is all used, right? >> (These are important to get memory usage patterns right; we should >> theoretically be able to send pages physically to disk in any order we >> want without being less efficient than general Linux...) >> > > Yes! > >>>>> -- in shrink_page_list() it calls should_add_to_ccache() to determine >>>>> if page can really be added to ccache. It can consider things like no. >>>>> of pages already in ccache, CPU load, type of page (anon or fs-backed) >>>>> and some heuristics that tell if page is even worth mem space it will >>>>> take (no such heurustic yet worked out but it is really important >>>>> thing). >>>>> >>>> I was thinking putting that code in a module. The page would be >>>> compressed with a fast reference algorithm to be used to get a >>>> Compression Index CI = PAGE_SIZE - COMPRESSED_PAGE_SIZE. Squeezing >>>> down >>>> a single page is almost instant anyway. >>>> >>>> As for "type of page," certain FS-backed pages could be freed, others >>>> would have to be persistent. I was thinking more on letting the >>>> calling >>>> code mark pages that could be freed as "transient" and just freeing >>>> them >>>> if the compressed memory area got too big, rather than swapping. >>>> >>> Again, this is exactly what's happening. Pages guessed to be swapped out >>> by should_add_to_ccache() are marked by PG_will_compress flag. But, this >>> does not guarantee that they will really be compressed. If finally, >>> compression fails (lack of mem, CPU load high, poor compressibility >>> etc.) this flag is cleared and everything else goes as usual. If it >>> really gets compressed, its marked with PG_compressed. Thus it goes with >>> your two step 'transient' approach...I think. >>> >> >> Transient pages == come from the disk cache system. If the disk cache >> decides it wants some page back and we don't have it, it can go read it >> back from disk (i.e. /lib/*, executables, stuff in /usr...) >> >>>>> -- then pages filtered by should_add_to_ccache() go to pageout() as >>>>> usual where they are sent to add_to_ccache() where they are compressed >>>>> and added to compression structure. If add_to_ccache() fails for some >>>>> reason (e.g. insuff. RAM), the page falls back to normal swap disks >>>>> (rest of pageout() code). 
>>>> I was thinking having the Swap Zone code supply the code to write to >>>> physical disk, and just having stuff pass through it: >>>> >>>> RAM -> Swap Zone -> Compress -> NOPE DOESN'T SHRINK -> Swap File >>>> >>>> This was also mainly for when the compressed area got too big and >>>> needed >>>> shrinkage (by, of course, real swapping). Of course, compression >>>> itself >>>> was going to be a module, so normal operation was: >>>> >>>> RAM -> Swap Zone -> Swap File >>>> >>>> until such time as the compress module was loaded. :) >>>> >>> As I understand you are really after an approach that can easily (and >>> with minimum code intrusion in kernel) fall back to real swap disks when >>> compression can't be done. This is being done well in current approach. >>> >> >> Yeah. I was looking at a different use of the same code later though >> but :) So far your design looks technically about the same as mine. >> > > hmm...somewhat different: > -- I am not looking to have multi-stage compression i.e. first compress > with quickest then later compress them with heavyweights instead when > CPU is idle. > -- Not (yet) considering grouping of pages for compression. > -- And I've not yet started work on a heuristic for adaptive ccache size. > >>>>> * Restoring pages: >>>>> Whenever swap-cache/page-cache is looked-up for a page >>>>> (find_get_page(), find_get_pages() etc.), we check PG_compressed flag. >>>>> If its set we decompress() page and return it. >>>>> >>>> I skipped pages compressed in place because compressing individual >>>> pages >>>> is non-interesting. I noticed 15-20% compression with 4K pages on >>>> Castro's patch; pushing this up to 8KiB, 16KiB, 32KiB, and 64KiB, I >>>> noticed gains up to 50-60% when we hit 32KiB and nothing more beyond. >>>> >>>> I'm pretty sure that there's an asymptotic limit at 20, 24, 28, or >>>> 32KiB >>>> (the lower the better); I want to compress pages in groups of whatever >>>> that limit falls on for maximum performance. >>>> >>>> This makes the approach I had in mind a lot more complex. Pages would >>>> be grouped using a grouping algorithm; idle time would be used to >>>> disrupt groups and reconstruct more optimal grouping; and in the end we >>>> would also attempt to swap out more difficult to compress pages first. >>>> >>>> (In fact this is why I want to find the smallest group for the >>>> asymptotic limit-- 32KiB, 8 4KiB pages, my regrouping algorithm has to >>>> decompress, rearrange, and recompress up to 9 groups, 288KiB! If it >>>> were i.e. 20, that'd be 6 x 5 = 120KiB) >>>> >>> This is a great point! I really didn't even consider grouping pages >>> together for compression -- because I was unaware of its effect on >>> compression ratios and due to additional complexity it involves. This >>> grouping of pages is definitely worth a genuine try. But first, to keep >>> it simple, I will implement for single page compression. With this thing >>> stable we can do some algo tweaking to have this grouping. >>> >> >> Yeah, the page grouping and balancing code would be rather complex. >> Just the idea is complex. > > And thats why I will look at it later or maybe just make it flexible > enough for now. > >> >>>>> De/Compression codes are dumb. They just compress what they have in >>>>> input buffer and output a compressed buffer. >>>>> >>>>> Also, these algos are pluggable as separate kernel modules. They can >>>>> de/register themselves with ccahe (as is done by JFFS2). 
But, for now >>>>> I will just place them within kernel itself but overall approach makes >>>>> it easy to get de/compression algos as pluggable modules :) >>>>> >>>> :) And the algorithm to use them? I was considering using a reference >>>> block to compress to get a CI for the algorithm itself. >>>> >>> I was thinking of an alternate (simpler) approach. This is because we >>> have only 3 principal de/compression algos -- WKdm, WK4x4 and LZO. Of >>> these WKdm and WK4x4 are designed to compress typical anonymous memory >>> pages (see >>> http://www.cs.amherst.edu/~sfkaplan/papers/sfkaplan-dissertation.ps.gz >>> for details of WKdm and WK4x4) while LZO is much better at typical >>> filesystem (textual) data. Also speeds (in general): WKdm > WK4x4 > LZO >>> >>> So, considering this, I think a simple algo selection scheme like >>> WKdm/WK4x4 if page is anonymous and LZO if fs-data. >>> Or, depending on kind of file, WKdm/WK4x4 can give comparable >>> compression with much better speeds. To handle cases, we can record >>> 'last compression ratio' for each mapping (recorded in corres. >>> address_space struct itself). This can be used to have separate algo >>> selection per open file. >> >> Using a reference algorithm would be simpler if you wanted to do >> multi-layer compression. Think about the following events: >> >> - Page 0x01 compressed (WKdm) >> - Page 0x02 compressed (WKdm) >> - Page 0x03 compressed (WKdm) >> - CPU is idle >> - Page 0x01 recompressed (WK4x4) >> - Page 0x02 recompressed (WK4x4) >> - Page 0x04 compressed (WKdm) >> >> Current state: >> >> 0x01 WK4x4 >> 0x02 WK4x4 >> 0x03 WKdm >> 0x04 WKdm >> >> - CPU is idle >> - Rearrange pages based on compressibility >> >> PROBLEM: How do you index compressibility? >> >> - Memory pressure is mounting >> - Get pages out of memory -- Least compressible? > > I don't think its good to kick out least compressible pages first. What > ultimately matters is to have working set of running programs to stay in > memory -- compressed or uncompressed. But then, its hard to guess > working set of each running program...Its a big work in itself to come > up with a good heuristic for which (and how many) pages to kick out from > ccache when memory gets real low. > Eh, that's what letting the existing swap logic figure it out is for. Can't make it any LESS efficient than it already is right? :) Plus the swap cache is looking out for us. Some light LRU heuristics may be useful. Possibly time based ("If this page has been used in the last 40,000 jiffies, try to kill something else") >> >> PROBLEM: Need to solve previous problem.. >> >> This is why I was considering multiple, pluggable algorithms and >> reference instrumentation. The references would be: >> >> - Each new PAGE is compressed with a fast, extremely light algorithm >> that doesn't produce useful ratios but runs almost straight through the >> code (i.e. a streaming dictionary compressor rather than a windowed one, >> which immediately is passed to a streaming huffman encoder) >> - PAGE_SIZE - compressed_page_size == Compression Index (CI) >> >> - Each new algorithm is given a reference block to compress to test its >> compression efficiency >> - block_size - compressed_block_size == CI for the algorithm >> - We could have 'anonymous memory' and 'file data' blocks >> - Could produce an anonmem_CI and filemem_CI >> > > This is just too much work to be done before _every_ compression. It can > affect performance badly. 
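The per-mapping algorithm selection sketched a little earlier in this mail (WKdm/WK4x4 for anonymous pages, LZO for file-backed data, guided by a recorded 'last compression ratio') could start out as simply as this; the struct, the field name and the 60% cut-off are invented for illustration, not taken from any patch:

/* Illustration of a per-mapping algorithm pick. last_ratio would live in
 * (or shadow) the address_space and record how well WKdm did last time. */
enum cc_algo { CC_WKDM, CC_WK4X4, CC_LZO };

struct mapping_stats {
	int last_ratio;   /* compressed size as % of PAGE_SIZE, 0 = unknown */
};

static enum cc_algo pick_algo(int page_is_anon, const struct mapping_stats *st)
{
	if (page_is_anon)
		return CC_WKDM;   /* fast, tuned for anonymous memory */

	/* For file-backed data, only stay with WKdm if it did reasonably
	 * well on this mapping last time; otherwise use LZO. */
	if (st && st->last_ratio && st->last_ratio < 60)
		return CC_WKDM;

	return CC_LZO;
}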
The page compression indexing would happen when a page was sent to the ccache, and occurs using a very light algorithm on a 4KiB page. This is not likely to affect much of anything, maybe cost 50,000 cycles (WKdm). Estimated times:

  CPU CLK    SEC        MSEC      EXAMPLE
  1MHz       1/20       50        8-bit Nintendo (6502)
  200MHz     1/4000     0.25      iPaq 3600 PDA (206MHz)
  400MHz     1/8000     0.125     iPaq 5400 PDA (Xscale)
  800MHz     1/16000    0.0625    Old Dell
  1.6GHz     1/32000    0.03125   Centrino Laptop
  2.0GHz     1/40000    0.025     Desktop

It may even be possible to find an algorithm that does strict analysis but not compression (like a hash) that spits out a value correlating to the redundancy and encodability of the stream; but that would take a lot of research to come up with. A really fast, really light compressor would be easier.

The algorithm CI reference would be computed when the algorithm is first loaded, before it was put into use. This is a one-time cost for each algorithm, and can be ignored; it likely operates under a millisecond but it wouldn't hurt anything if it ran for 100-200mS really. This value would become persistent for the life of the algorithm (i.e. until someone rmmod's it), so it wouldn't have to be recomputed again.

Note that given these times there could be a negative impact on performance; on faster systems, of course, the impact becomes quickly negligible. It is still significant for embedded systems.

It is also notable that we can amortize the costs by conjecturing that under normal circumstances, the system won't be swapping excessively, and thus wouldn't be using compressed cache excessively; thus the CI calculation can be largely ignored if it is sufficiently small.

We can further amortize the costs by conjecturing that when ccache thrashing occurs (much paging in and out of ccache), pages being sent back and forth between memory and ccache are usually not going to be dirty. In this case, we may be able to store the CI of the page and only re-calculate it if the page is dirty (i.e. altered). Pages that were decompressed, read, and recompressed would be unchanged and have the same CI, thus we could skip recalculating the CI.

>
>> This would allow for the following useful elements:
>>
>> - Groups may be given a CI == sum of the CI's of each page in the group
>> - Pages each have a CI that doesn't change based on algorithm used
>> - Algorithms each have a CI to describe their efficiency
>>
>> Which can be used for:
>>
>> - It is possible to quickly take a short incremental step to try to get
>> the harder-to-compress pages into the same groups (i.e. find the first
>> group from the hardest-to-compress end that is 'wrong', find all groups
>> containing the next N hardest-to-compress pages, break those groups
>> apart, reorder them, the group is now 'proper')
>> - This allows for idle time to be used to sort pages and groups in
>> such a way that our first candidates for writing to disk are pages that
>> are 'hard to compress' (taking more memory)
>>
>
> Hard to compress factor should be combined with factor of having working
> set of apps within memory (and maybe other too - I don't know).

Yes.. perhaps time based? ("This page was used some time within the last 4000 jiffies, it's probably part of a working set").
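The bookkeeping described above -- compute a CI with a light reference compressor, cache it, and only recompute when the page has been dirtied -- can be sketched in a few lines; light_compress() and the metadata struct are hypothetical stand-ins, not real interfaces:

#include <stddef.h>

#define CC_PAGE_SIZE 4096

/* Hypothetical per-page metadata; the real ccache metadata will differ. */
struct cc_page_info {
	int ci;        /* CC_PAGE_SIZE - compressed size, cached */
	int ci_valid;  /* 0 until first computed */
	int dirty;     /* set whenever the page contents change */
};

/* Stand-in for the "very light" reference compressor; returns compressed size. */
extern size_t light_compress(const void *src, size_t len, void *dst);

static int page_ci(struct cc_page_info *info, const void *page_data, void *scratch)
{
	if (!info->ci_valid || info->dirty) {
		size_t clen = light_compress(page_data, CC_PAGE_SIZE, scratch);

		info->ci = (int)(CC_PAGE_SIZE - clen);
		info->ci_valid = 1;
		info->dirty = 0;
	}
	return info->ci;  /* higher CI == easier to compress */
}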
> >> - It is possible to use a "Light" algorithm (such as WKdm) initially, >> and move to a "Heavy" algorithm by decompressing the block and >> recompressing it during idle CPU time >> - Maximize efficiency without adding unnecessary overhead while the >> system is busy >> > > -- compress a page with WKdm > [CPU is idle] > -- extract these pages compressed with WKdm from ccache > [CPU is now busy] > -- compress them with LZO > -- put them again in ccache > > There can be lot of potential problems with this thing. We need to look > at it carefully in detail. > Yeah, like systems that are busy running a large number of programs (i.e. building a kernel) where the CPU is working on and off in quick spurts. You can get 80-150mS idle periods, if you try to do 200mS of work there you just slowed the system down. > I was just wondering how swap prefetching patch senses CPU as 'idle' - > what are the things it considers?, how it handles problem of 'spike' > loads?. Will look at all this later. > When the CPU is idle, linux activates the "Idle Loop." The Idle Loop pretty much uses HLT to halt CPU use for a little bit when the system has nothing to do. Otherwise the CPU would be going nuts all the time ;) At any rate, there's a mechanism in there somewhere for it. > >> Possibly other things too. >> >> Anyway the point is I was more thinking on how to get it so when you >> initially compress pages, you use very light algorithms; and when you >> have idle CPU time, you burn it to order and condense the compressed >> cache. This avoids CPU usage during initial entry into the cache. >> >>>>> Many of the setting are also tunable via sysctl interface. (e.g. >>>>> /proc/sys/vm/max_anon_cc_size). >>>>> >>>> I was considering being conservative with the maximum cache size in >>>> general; but more specifically, allowing the cache to grow until it >>>> notices a lot decompression is happening per time space (compression we >>>> do NOT care about; decompression means the system wants something >>>> back). >>>> If we suddenly see 300 decompressions a second, we're not being very >>>> efficient; time to start shrinking the compressed cache. (Under real >>>> memory pressure, this shrinkage will involve swapping; otherwise, just >>>> expanding requested pages back into memory.) >>>> >>>> Increasing the size of compressed cache during idle time can be done >>>> conservatively, but I currently can't think of a way to figure out how >>>> big to make it when not under load. Increasing passively (i.e. >>>> swapping >>>> LRU data into compressed cache) would get some memory into (cheap) >>>> compressed cache before the system came under load (decompression is >>>> cheaper than compression). >>> There are _many_ things to consider when developing heuristic for >>> adaptive ccache resizing scheme. I have touched this area for now. I >>> first aim to have a static sized ccache working first. >>> >>> This said, it's always good to keep thinking over this...it's >>> interesting and is going to be useful :) >>> >>>>> Apart from some kernel hooks (as in lookup functions, >>>>> shrink_page_list(), pageout()), all ccache code is pretty much in >>>>> separate .[ch] files or as separte kernel modules. So, it's also very >>>>> less intrusive. >>>> Very nice. >>>> >>>>>> Another advantage is that no matter what I do, I can't swap pages to >>>>>> disk in any worse order than the kernel normally would, since I'm >>>>>> being >>>>>> handed pages based on existing LRU logic. 
To this end I've >>>>>> decided to >>>>>> take the "Maximize Compression" approach and try to swap out pages >>>>>> that >>>>>> are "hard to compress" first. >>>>>> >>>>>> Actual compression will be done by a module; although actual >>>>>> compression >>>>>> algorithms will be loaded as slave modules to this module. :) >>>>>> >>>>>> I've also made note that due to the nature of this design, large >>>>>> amounts >>>>>> of flexibility are presented. We can talk to the swap device in any >>>>>> way >>>>>> we want; the Swap Zone would supply a mechanism for modules to access >>>>>> the swap device to swap in and out anything, but would always ask the >>>>>> modules themselves to get pages for it. In theory, we can compress >>>>>> whatever we like and swap it out; >>>>> Approach I described above is flexible enough to handle all kinds of >>>>> pages: anon pages, clean and dirty page cache pages. >>>>> >>>> Compressed pages on swap? Other uses of the code (i.e. implementing >>>> something besides compression)? >>>> >>> maybe I didn't understand you here. I meant flexibility in terms of >>> what kind of pages we can keep in compression structure, a compression >>> structure that should give good performance for almost all operations >>> (lookup, expanding, shrinking, deleting a page, minimize fragmentation >>> etc.) >>> >> >> Mmm.. I was looking at toying with the infra later to 1) Compress data >> that's in swap (and keep in memory a ccache swp_entry_t to real >> swp_entry_t mapping to find it later); and 2) Implement a toy >> distributed, encrypted storage backing (i.e. pages can be encrypted and >> swapped over network to 2-3 servers for redundant storage off-device, >> think wireless PDAs in a corporate network...) >> > > Interesting. > > Many ideas keep flying in my mind but when I look at my code...lol..not > even static cccache is working yet...LOL Yeah, just ignore that one. :) I'm always thinking really, really far ahead and in different areas. > >>>>>> of course, we need to relate swap file >>>>>> indexes to real page indexes, and deal with suspend-to-disk et al. >>>>> With approach described, we automatically use exisiting mech. of >>>>> kernel for our use. This virtual swap is just another swap disk to >>>>> other kernel parts. >>>> What happens when you suspend to disk? Does the virtual swap get >>>> force-flushed to real swap? >>> It has to -- after all RAM can retain stuff in suspend mode. >>> >>>> Can you compress pages in swap and have the >>>> kernel not wonder wtf you're smoking when it tries to come back up? >>>> >>> Yes, it should be easy to swap out pages as compressed form itself. This >>> was done in Castro's patch too -- this is not to save swap disk space >>> (rest of space was padded) but to reduce unnecessary decompression until >>> they are _really_ required. >>> >> >> If you work in multi-page blocks and get better than 1:((N-1)/N) >> compression anywhere, you'll squeeze out a page. In groups of 5 pages >> it's entirely feasible to get better than 20% compression, squeezing out >> a whole page of swap usage. >> >> Of course, this in no way interferes with the original goal of reducing >> unnecessary decompression; it's a happy side-effect. :) >> > > This grouping of pages will also reduce metadata required for ccache. > Each group will have a single chunk_head and coressponding chunks i.e. > no. of cunk_heads will be reduced. Interesting. 
That's another thing about getting a compression index for a page-- if it's too low, you can immediately decide the page is too hard to compress because the metadata to control it is going to be bigger than the space savings. > >>>> (this would all just require extra infrastructure, mainly APIs to say >>>> 'Don't compress these pages' and 'stop storing shit in memory until I >>>> say so', nothing majorly intrusive) >>>> >>>>> ----------------------- >>>>> I have a Wiki setup for this project linux-mm.org/CompressedCaching >>>>> Here, I have detailed the approach I'm taking (at least for page-cache >>>>> pages). It's slightly out-dated now --- It doesn't detail approach for >>>>> anon pages in detail.....I will update it soon. >>>>> Feel free to edit it. >>>>> >>>>> Since I'm doing this project as part of Google SoC, I have many >>>>> 'deadlines' to consider. So, now bringing major changes to approach >>>>> will overwhelm me with work. It will be really nice if you can provide >>>>> feedback on my approach :) >>>> Your approach sounds at least to be minimally intrusive, which is good >>>> because it lets you keep the code separate and do interesting things. >>>> You also use the swapping mechanism, so you should be able to take >>>> advantage of page cache implicitly (a goal of my own design). >>>> >>>> You need to consider page grouping; Experiment with different sized >>>> groups, especially around 16K, 20K, 24K, 28K, and 32K. Try to make the >>>> code flexible enough to handle variable sized page groups too; it would >>>> be interesting to have multiple group sizes at run, but possibly more >>>> complex than needed. I found page grouping to present gains of 10-15% >>>> when moving between 4K-8K pages, and then about 5% between 16K-32K, and >>>> then 0.1% between 32K-64K on Rodrigo's patch; this tells me there's >>>> nothing interesting about any group bigger than 32K, and maybe you can >>>> get away with something smaller and still maximize gains. >>>> >>> I will definitely lookup into how flexible my current approach is w.r.t >>> grouping pages and will try to get some flexibility here if it shows up >>> rigid. This grouping of pages seems really important. >>> >> >> Run some simple tests. Feed the modules a group of test data; have them >> compress it in chunks of 4K, 8K, 12K, 16K, 20K, 24K, 28K, 32K, 36K; and >> then output the average compression ratio for each group size. The >> output would look similar to: >> >> >> 4K 16% >> 8K 23% >> 12K 30% >> 16K 35% >> 20K 39% >> 24K 42% <-- This would be a good place to stop... >> 28K 43% <-- Not worth the gains >> 32K 43% >> 36K 44% >> >> (Numbers faked) >> >> And you'd basically be looking for where the numbers stopped going up by >> useful degrees (in this example, 42% -> 43% we'll say 1% isn't useful >> for the added overhead). If you keep going and going you'll find a >> point where no matter how much bigger the block gets you can't gain any >> more compression; it's likely good to stop a bit before that, knocking >> off 1% or 2%. >> > > Your analysis of grouping pages interests me most :) I will definitely > look into having this grouping supported by my design.....again, a bell > rings -- nitin! get that static ccache with simplest of things you can > do first!! ;) > Yeah, get it working first. Just don't box yourself in a corner ;) You can try writing a test for various page sizes once the cache is actually working. 
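A quick userspace version of that test is easy to throw together. The sketch below uses zlib's compress2() purely as a stand-in for WKdm/WK4x4/LZO, and sample.dat is whatever dump of memory or file data you want to feed it (build with -lz):

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

/* Chunk-size sweep: compress a sample file in chunks of 4K..36K and print
 * the average space saving per chunk size. zlib is only a stand-in here. */
int main(int argc, char **argv)
{
    FILE *f = fopen(argc > 1 ? argv[1] : "sample.dat", "rb");
    long size, chunk, off;
    unsigned char *data;

    if (!f) { perror("fopen"); return 1; }
    fseek(f, 0, SEEK_END);
    size = ftell(f);
    fseek(f, 0, SEEK_SET);

    data = malloc(size);
    if (!data || fread(data, 1, size, f) != (size_t)size) { fclose(f); return 1; }
    fclose(f);

    for (chunk = 4096; chunk <= 36 * 1024; chunk += 4096) {
        double saved = 0.0;
        long nchunks = 0;

        for (off = 0; off + chunk <= size; off += chunk) {
            uLongf clen = compressBound(chunk);
            unsigned char *cbuf = malloc(clen);

            if (cbuf && compress2(cbuf, &clen, data + off, chunk,
                                  Z_DEFAULT_COMPRESSION) == Z_OK)
                saved += 1.0 - (double)clen / chunk;
            free(cbuf);
            nchunks++;
        }
        if (nchunks)
            printf("%2ldK  %5.1f%%\n", chunk / 1024, 100.0 * saved / nchunks);
    }
    free(data);
    return 0;
}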
>>>> One part of my design was going to be to let the kernel's existing swap >>>> code make ALL the LRU decisions and base my decisions entirely on >>>> compressibility of the pages. Pages would be regrouped during idle >>>> time >>>> and groups would be kept in order of compressibility. The idea here >>>> was >>>> to be able to throw hard-to-compress pages to swap first during memory >>>> pressure. This would help maximize the in-memory compression ratio. >>>> >>>> As for multiple algorithms, I would say to always compress with your >>>> lightest, fastest algorithm FIRST (this would be WKdm). During idle >>>> time you can go back and decompress/recompress pages with something >>>> heavier (WK4x4). There is no shame in abusing the CPU when it's >>>> idle ;) >>>> >>> For desktop systems, this makes sense. But what about OLPC systems? You >>> really can't abuse idle CPU, you got to conserve power (I'm not sure of >>> its effect on power requirements but it should be like more we run more >>> we eat ;) ) >>> >> >> True :) >> > > > Cheers, > Nitin Gupta > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQIVAwUBRJWGugs1xW0HCTEFAQLpKBAAjbnj4W/mLvcudtAcH2DLeXpnL6/n1EHt gJ2c8bb9kdwRdDj6O6drCBPn4F2i2c0qiAI+6AIEtspFsTHDw3tHQi+9QxEChMf8 gMKLQ5fZI9SL/rXUbG6o6pF+pCvPtWjdfs+Qy1CDa9MIinh5DSFDQ1DGV1agBHtC Ow22apMS3p4xtNvnbRZ9l5d5GkS35nFJTnGDNVA2N6eKRk+UM2wlEqjcjFhMQyH7 XWQVTu4GiC9tDGDPUplGIfKIOHFfC7UqqLrciSQzgbr3gbIKfs1Qa/v5qNtWJLzr VeGKPCypK/74blA2+PjISH29aJ+vC6jJHQIFSy3fi9abt4esZ3JgxwyqNO+uIo1+ RKdvHMLKf71DNBf2QVV3BzBP7ZBYelRZFNCaSdfrGd5NFgIQR06JW5Nf14adpi+D hXK44YpkD1TVFuHoitVURODNBmwdrNF+Bqay/fY9hi31RSadylGT91U2OpFrSKyC 3cjfXB36HwDHNWtRxb2WzlETJvA4oMLWNvLbZAn6MqXZnBq6E4VwJNP+z3y22PNl dLOadtu+3qWbt0UyCHQp4sbpHgCgiG0wwDRMAJCNHhnXH7Tww93R84RkgtF8GkRC n1FtIH0mv3zWWSbFuei88KSqk8unH9xq9tKGKboHmJiDF3/AyZd/De8F+icTCIn9 0IfPKD+e6qQ= =v+xT -----END PGP SIGNATURE----- |
From: Nitin G. <nit...@gm...> - 2006-06-18 15:28:58
|
John Moser wrote: > Nitin Gupta wrote: >> John Moser wrote: >>> Nitin Gupta wrote: >>>> On 6/17/06, John Moser <joh...@gm...> wrote: >>>> * Storing pages: >>>> -- First, it creates a 'virtual swap' which is a swap with its own >>>> swap_map. This swap is given highest priority. So, when pages are to >>>> be swapped out, existing code path naturally/transparently uses this >>>> swap device (until it's full) for swapping out. (prelim patches for >>>> this to be posted very soon on lc-devel). >>> (See I don't understand the swap system that well :) >>> >>> I was thinking eventually I'd have to convince the kernel to *try* to >>> swap even if there was no swap device.. >>> >>> Once this swap device is full it needs to start pushing its pages to a >>> REAL swap device, i.e. on disk. >>> >> This is exactly what this virtual swap is meant for :) >> Pages assigned to this virtual swap will eventually go to ccache and >> when it's full, pages are sent to real swap disk -- all this with >> minimal code intrusion. This approach works even if there are no real >> swap disks -- only virtual swap will be used in that case. >> > > Awesome. And the swap cache and existing LRU logic is all used, right? > (These are important to get memory usage patterns right; we should > theoretically be able to send pages physically to disk in any order we > want without being less efficient than general Linux...) > Yes! >>>> -- in shrink_page_list() it calls should_add_to_ccache() to determine >>>> if page can really be added to ccache. It can consider things like no. >>>> of pages already in ccache, CPU load, type of page (anon or fs-backed) >>>> and some heuristics that tell if page is even worth mem space it will >>>> take (no such heurustic yet worked out but it is really important >>>> thing). >>>> >>> I was thinking putting that code in a module. The page would be >>> compressed with a fast reference algorithm to be used to get a >>> Compression Index CI = PAGE_SIZE - COMPRESSED_PAGE_SIZE. Squeezing down >>> a single page is almost instant anyway. >>> >>> As for "type of page," certain FS-backed pages could be freed, others >>> would have to be persistent. I was thinking more on letting the calling >>> code mark pages that could be freed as "transient" and just freeing them >>> if the compressed memory area got too big, rather than swapping. >>> >> Again, this is exactly what's happening. Pages guessed to be swapped out >> by should_add_to_ccache() are marked by PG_will_compress flag. But, this >> does not guarantee that they will really be compressed. If finally, >> compression fails (lack of mem, CPU load high, poor compressibility >> etc.) this flag is cleared and everything else goes as usual. If it >> really gets compressed, its marked with PG_compressed. Thus it goes with >> your two step 'transient' approach...I think. >> > > Transient pages == come from the disk cache system. If the disk cache > decides it wants some page back and we don't have it, it can go read it > back from disk (i.e. /lib/*, executables, stuff in /usr...) > >>>> -- then pages filtered by should_add_to_ccache() go to pageout() as >>>> usual where they are sent to add_to_ccache() where they are compressed >>>> and added to compression structure. If add_to_ccache() fails for some >>>> reason (e.g. insuff. RAM), the page falls back to normal swap disks >>>> (rest of pageout() code). 
>>> I was thinking having the Swap Zone code supply the code to write to >>> physical disk, and just having stuff pass through it: >>> >>> RAM -> Swap Zone -> Compress -> NOPE DOESN'T SHRINK -> Swap File >>> >>> This was also mainly for when the compressed area got too big and needed >>> shrinkage (by, of course, real swapping). Of course, compression itself >>> was going to be a module, so normal operation was: >>> >>> RAM -> Swap Zone -> Swap File >>> >>> until such time as the compress module was loaded. :) >>> >> As I understand you are really after an approach that can easily (and >> with minimum code intrusion in kernel) fall back to real swap disks when >> compression can't be done. This is being done well in current approach. >> > > Yeah. I was looking at a different use of the same code later though > but :) So far your design looks technically about the same as mine. > hmm...somewhat different: -- I am not looking to have multi-stage compression i.e. first compress with quickest then later compress them with heavyweights instead when CPU is idle. -- Not (yet) considering grouping of pages for compression. -- And I've not yet started work on a heuristic for adaptive ccache size. >>>> * Restoring pages: >>>> Whenever swap-cache/page-cache is looked-up for a page >>>> (find_get_page(), find_get_pages() etc.), we check PG_compressed flag. >>>> If its set we decompress() page and return it. >>>> >>> I skipped pages compressed in place because compressing individual pages >>> is non-interesting. I noticed 15-20% compression with 4K pages on >>> Castro's patch; pushing this up to 8KiB, 16KiB, 32KiB, and 64KiB, I >>> noticed gains up to 50-60% when we hit 32KiB and nothing more beyond. >>> >>> I'm pretty sure that there's an asymptotic limit at 20, 24, 28, or 32KiB >>> (the lower the better); I want to compress pages in groups of whatever >>> that limit falls on for maximum performance. >>> >>> This makes the approach I had in mind a lot more complex. Pages would >>> be grouped using a grouping algorithm; idle time would be used to >>> disrupt groups and reconstruct more optimal grouping; and in the end we >>> would also attempt to swap out more difficult to compress pages first. >>> >>> (In fact this is why I want to find the smallest group for the >>> asymptotic limit-- 32KiB, 8 4KiB pages, my regrouping algorithm has to >>> decompress, rearrange, and recompress up to 9 groups, 288KiB! If it >>> were i.e. 20, that'd be 6 x 5 = 120KiB) >>> >> This is a great point! I really didn't even consider grouping pages >> together for compression -- because I was unaware of its effect on >> compression ratios and due to additional complexity it involves. This >> grouping of pages is definitely worth a genuine try. But first, to keep >> it simple, I will implement for single page compression. With this thing >> stable we can do some algo tweaking to have this grouping. >> > > Yeah, the page grouping and balancing code would be rather complex. > Just the idea is complex. And thats why I will look at it later or maybe just make it flexible enough for now. > >>>> De/Compression codes are dumb. They just compress what they have in >>>> input buffer and output a compressed buffer. >>>> >>>> Also, these algos are pluggable as separate kernel modules. They can >>>> de/register themselves with ccahe (as is done by JFFS2). 
But, for now >>>> I will just place them within kernel itself but overall approach makes >>>> it easy to get de/compression algos as pluggable modules :) >>>> >>> :) And the algorithm to use them? I was considering using a reference >>> block to compress to get a CI for the algorithm itself. >>> >> I was thinking of an alternate (simpler) approach. This is because we >> have only 3 principal de/compression algos -- WKdm, WK4x4 and LZO. Of >> these WKdm and WK4x4 are designed to compress typical anonymous memory >> pages (see >> http://www.cs.amherst.edu/~sfkaplan/papers/sfkaplan-dissertation.ps.gz >> for details of WKdm and WK4x4) while LZO is much better at typical >> filesystem (textual) data. Also speeds (in general): WKdm > WK4x4 > LZO >> >> So, considering this, I think a simple algo selection scheme like >> WKdm/WK4x4 if page is anonymous and LZO if fs-data. >> Or, depending on kind of file, WKdm/WK4x4 can give comparable >> compression with much better speeds. To handle cases, we can record >> 'last compression ratio' for each mapping (recorded in corres. >> address_space struct itself). This can be used to have separate algo >> selection per open file. > > Using a reference algorithm would be simpler if you wanted to do > multi-layer compression. Think about the following events: > > - Page 0x01 compressed (WKdm) > - Page 0x02 compressed (WKdm) > - Page 0x03 compressed (WKdm) > - CPU is idle > - Page 0x01 recompressed (WK4x4) > - Page 0x02 recompressed (WK4x4) > - Page 0x04 compressed (WKdm) > > Current state: > > 0x01 WK4x4 > 0x02 WK4x4 > 0x03 WKdm > 0x04 WKdm > > - CPU is idle > - Rearrange pages based on compressibility > > PROBLEM: How do you index compressibility? > > - Memory pressure is mounting > - Get pages out of memory -- Least compressible? I don't think its good to kick out least compressible pages first. What ultimately matters is to have working set of running programs to stay in memory -- compressed or uncompressed. But then, its hard to guess working set of each running program...Its a big work in itself to come up with a good heuristic for which (and how many) pages to kick out from ccache when memory gets real low. > > PROBLEM: Need to solve previous problem.. > > This is why I was considering multiple, pluggable algorithms and > reference instrumentation. The references would be: > > - Each new PAGE is compressed with a fast, extremely light algorithm > that doesn't produce useful ratios but runs almost straight through the > code (i.e. a streaming dictionary compressor rather than a windowed one, > which immediately is passed to a streaming huffman encoder) > - PAGE_SIZE - compressed_page_size == Compression Index (CI) > > - Each new algorithm is given a reference block to compress to test its > compression efficiency > - block_size - compressed_block_size == CI for the algorithm > - We could have 'anonymous memory' and 'file data' blocks > - Could produce an anonmem_CI and filemem_CI > This is just too much work to be done before _every_ compression. It can affect performance badly. > This would allow for the following useful elements: > > - Groups may be given a CI == sum of the CI's of each page in the group > - Pages each have a CI that doesn't change based on algorithm used > - Algorithms each have a CI to describe their efficiency > > Which can be used for: > > - It is possible to quickly take a short incremental step to try to get > the harder-to-compress pages into the same groups (i.e. 
find the first > group from the hardest-to-compress end that is 'wrong', find all groups > containing the next N hardest-to-compress pages, break those groups > apart, reorder them, the group is now 'proper') > - This allows for idle time to be used to sort pages and groups in > such a way that our first candidates for writing to disk are pages that > are 'hard to compress' (taking more memory) > Hard to compress factor should be combined with factor of having working set of apps within memory (and maybe other too - I don't know). > - It is possible to use a "Light" algorithm (such as WKdm) initially, > and move to a "Heavy" algorithm by decompressing the block and > recompressing it during idle CPU time > - Maximize efficiency without adding unnecessary overhead while the > system is busy > -- compress a page with WKdm [CPU is idle] -- extract these pages compressed with WKdm from ccache [CPU is now busy] -- compress them with LZO -- put them again in ccache There can be lot of potential problems with this thing. We need to look at it carefully in detail. I was just wondering how swap prefetching patch senses CPU as 'idle' - what are the things it considers?, how it handles problem of 'spike' loads?. Will look at all this later. > Possibly other things too. > > Anyway the point is I was more thinking on how to get it so when you > initially compress pages, you use very light algorithms; and when you > have idle CPU time, you burn it to order and condense the compressed > cache. This avoids CPU usage during initial entry into the cache. > >>>> Many of the setting are also tunable via sysctl interface. (e.g. >>>> /proc/sys/vm/max_anon_cc_size). >>>> >>> I was considering being conservative with the maximum cache size in >>> general; but more specifically, allowing the cache to grow until it >>> notices a lot decompression is happening per time space (compression we >>> do NOT care about; decompression means the system wants something back). >>> If we suddenly see 300 decompressions a second, we're not being very >>> efficient; time to start shrinking the compressed cache. (Under real >>> memory pressure, this shrinkage will involve swapping; otherwise, just >>> expanding requested pages back into memory.) >>> >>> Increasing the size of compressed cache during idle time can be done >>> conservatively, but I currently can't think of a way to figure out how >>> big to make it when not under load. Increasing passively (i.e. swapping >>> LRU data into compressed cache) would get some memory into (cheap) >>> compressed cache before the system came under load (decompression is >>> cheaper than compression). >> There are _many_ things to consider when developing heuristic for >> adaptive ccache resizing scheme. I have touched this area for now. I >> first aim to have a static sized ccache working first. >> >> This said, it's always good to keep thinking over this...it's >> interesting and is going to be useful :) >> >>>> Apart from some kernel hooks (as in lookup functions, >>>> shrink_page_list(), pageout()), all ccache code is pretty much in >>>> separate .[ch] files or as separte kernel modules. So, it's also very >>>> less intrusive. >>> Very nice. >>> >>>>> Another advantage is that no matter what I do, I can't swap pages to >>>>> disk in any worse order than the kernel normally would, since I'm being >>>>> handed pages based on existing LRU logic. To this end I've decided to >>>>> take the "Maximize Compression" approach and try to swap out pages that >>>>> are "hard to compress" first. 
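Given per-group compression indexes, the "hard to compress first" ordering mentioned here can start out as nothing more than a scan for the lowest total CI; the struct below is invented for illustration and looks nothing like the real ccache metadata:

/* Pick the swap-out candidate whose group has the lowest compression
 * index, i.e. the group costing the most memory per page it holds. */
struct cc_group {
	int ci_total;    /* sum of the CIs of the pages in this group */
	int nr_pages;
};

static int pick_swapout_group(const struct cc_group *groups, int count)
{
	int best = -1, best_ci = 0, i;

	for (i = 0; i < count; i++) {
		if (groups[i].nr_pages == 0)
			continue;
		if (best < 0 || groups[i].ci_total < best_ci) {
			best = i;
			best_ci = groups[i].ci_total;
		}
	}
	return best;   /* index of hardest-to-compress group, or -1 if none */
}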
>>>>> >>>>> Actual compression will be done by a module; although actual >>>>> compression >>>>> algorithms will be loaded as slave modules to this module. :) >>>>> >>>>> I've also made note that due to the nature of this design, large >>>>> amounts >>>>> of flexibility are presented. We can talk to the swap device in any >>>>> way >>>>> we want; the Swap Zone would supply a mechanism for modules to access >>>>> the swap device to swap in and out anything, but would always ask the >>>>> modules themselves to get pages for it. In theory, we can compress >>>>> whatever we like and swap it out; >>>> Approach I described above is flexible enough to handle all kinds of >>>> pages: anon pages, clean and dirty page cache pages. >>>> >>> Compressed pages on swap? Other uses of the code (i.e. implementing >>> something besides compression)? >>> >> maybe I didn't understand you here. I meant flexibility in terms of >> what kind of pages we can keep in compression structure, a compression >> structure that should give good performance for almost all operations >> (lookup, expanding, shrinking, deleting a page, minimize fragmentation >> etc.) >> > > Mmm.. I was looking at toying with the infra later to 1) Compress data > that's in swap (and keep in memory a ccache swp_entry_t to real > swp_entry_t mapping to find it later); and 2) Implement a toy > distributed, encrypted storage backing (i.e. pages can be encrypted and > swapped over network to 2-3 servers for redundant storage off-device, > think wireless PDAs in a corporate network...) > Interesting. Many ideas keep flying in my mind but when I look at my code...lol..not even static cccache is working yet...LOL >>>>> of course, we need to relate swap file >>>>> indexes to real page indexes, and deal with suspend-to-disk et al. >>>> With approach described, we automatically use exisiting mech. of >>>> kernel for our use. This virtual swap is just another swap disk to >>>> other kernel parts. >>> What happens when you suspend to disk? Does the virtual swap get >>> force-flushed to real swap? >> It has to -- after all RAM can retain stuff in suspend mode. >> >>> Can you compress pages in swap and have the >>> kernel not wonder wtf you're smoking when it tries to come back up? >>> >> Yes, it should be easy to swap out pages as compressed form itself. This >> was done in Castro's patch too -- this is not to save swap disk space >> (rest of space was padded) but to reduce unnecessary decompression until >> they are _really_ required. >> > > If you work in multi-page blocks and get better than 1:((N-1)/N) > compression anywhere, you'll squeeze out a page. In groups of 5 pages > it's entirely feasible to get better than 20% compression, squeezing out > a whole page of swap usage. > > Of course, this in no way interferes with the original goal of reducing > unnecessary decompression; it's a happy side-effect. :) > This grouping of pages will also reduce metadata required for ccache. Each group will have a single chunk_head and coressponding chunks i.e. no. of cunk_heads will be reduced. >>> (this would all just require extra infrastructure, mainly APIs to say >>> 'Don't compress these pages' and 'stop storing shit in memory until I >>> say so', nothing majorly intrusive) >>> >>>> ----------------------- >>>> I have a Wiki setup for this project linux-mm.org/CompressedCaching >>>> Here, I have detailed the approach I'm taking (at least for page-cache >>>> pages). 
It's slightly out-dated now --- It doesn't detail approach for >>>> anon pages in detail.....I will update it soon. >>>> Feel free to edit it. >>>> >>>> Since I'm doing this project as part of Google SoC, I have many >>>> 'deadlines' to consider. So, now bringing major changes to approach >>>> will overwhelm me with work. It will be really nice if you can provide >>>> feedback on my approach :) >>> Your approach sounds at least to be minimally intrusive, which is good >>> because it lets you keep the code separate and do interesting things. >>> You also use the swapping mechanism, so you should be able to take >>> advantage of page cache implicitly (a goal of my own design). >>> >>> You need to consider page grouping; Experiment with different sized >>> groups, especially around 16K, 20K, 24K, 28K, and 32K. Try to make the >>> code flexible enough to handle variable sized page groups too; it would >>> be interesting to have multiple group sizes at run, but possibly more >>> complex than needed. I found page grouping to present gains of 10-15% >>> when moving between 4K-8K pages, and then about 5% between 16K-32K, and >>> then 0.1% between 32K-64K on Rodrigo's patch; this tells me there's >>> nothing interesting about any group bigger than 32K, and maybe you can >>> get away with something smaller and still maximize gains. >>> >> I will definitely lookup into how flexible my current approach is w.r.t >> grouping pages and will try to get some flexibility here if it shows up >> rigid. This grouping of pages seems really important. >> > > Run some simple tests. Feed the modules a group of test data; have them > compress it in chunks of 4K, 8K, 12K, 16K, 20K, 24K, 28K, 32K, 36K; and > then output the average compression ratio for each group size. The > output would look similar to: > > > 4K 16% > 8K 23% > 12K 30% > 16K 35% > 20K 39% > 24K 42% <-- This would be a good place to stop... > 28K 43% <-- Not worth the gains > 32K 43% > 36K 44% > > (Numbers faked) > > And you'd basically be looking for where the numbers stopped going up by > useful degrees (in this example, 42% -> 43% we'll say 1% isn't useful > for the added overhead). If you keep going and going you'll find a > point where no matter how much bigger the block gets you can't gain any > more compression; it's likely good to stop a bit before that, knocking > off 1% or 2%. > Your analysis of grouping pages interests me most :) I will definitely look into having this grouping supported by my design.....again, a bell rings -- nitin! get that static ccache with simplest of things you can do first!! ;) >>> One part of my design was going to be to let the kernel's existing swap >>> code make ALL the LRU decisions and base my decisions entirely on >>> compressibility of the pages. Pages would be regrouped during idle time >>> and groups would be kept in order of compressibility. The idea here was >>> to be able to throw hard-to-compress pages to swap first during memory >>> pressure. This would help maximize the in-memory compression ratio. >>> >>> As for multiple algorithms, I would say to always compress with your >>> lightest, fastest algorithm FIRST (this would be WKdm). During idle >>> time you can go back and decompress/recompress pages with something >>> heavier (WK4x4). There is no shame in abusing the CPU when it's idle ;) >>> >> For desktop systems, this makes sense. But what about OLPC systems? 
You >> really can't abuse idle CPU, you got to conserve power (I'm not sure of >> its effect on power requirements but it should be like more we run more >> we eat ;) ) >> > > True :) > Cheers, Nitin Gupta |
From: Anderson B. <bri...@gm...> - 2006-06-09 19:17:42
|
On 6/9/06, Nitin Gupta <nit...@gm...> wrote: > On 6/9/06, Anderson Briglia <bri...@gm...> wrote: > > > > I have been trying to compress a page using that WKdm code from > > > > project's CVS. But, something is wrong because the system hangs after > > > > some compression/decompression pages. I'm using this interface: > > > > > > > > to compress "original_page": > > > > > > > > int len; > > > > len = WKdm_compress(page_address(original_page), page_address(comp_page), 1024); > > > > memcpy(page_address(original_page), page_address(comp_page), len); > > > > > > > > to decompress: > > > > > > > > WKdm_decompress(page_address(original_page), page_address(decomp_page), 1024); > > > > memcpy(page_address(original_page), page_address(decomp_page), PAGE_SIZE); > > > > > > > > Is needed to do some padding? I saw at compress_test module that the > > > > comp_data is filled with 0's after the compression. Is it necessary? > > > > > > Yes, WKdm only works when input is exactly 1024 words. > > > > So, I guess I must do a loop if I want to compress one page-full of > > data, right? A loop of four iteractions, of course. > > > > No, I meant 1024 words==PAGE_SIZE (4K bytes). WKdm needs input of 4K > bytes to work so you can compress a page with a single call. Oh, ok! I'm not very familiarized with compression algorithms. :) > > > > Are you compressing pages with less than page-full of data and still without > > > padding? > > > > I'm compressing pages came from the shrink_list( ). I believe those > > pages have PAGE_SIZE of data. > > > > Are you just compressing and discarding pages as they come in > shrink_list() or have you developed any compression structure to store > pages and then later restore them when lookup is performed? No, I don't have a compression structure, yet. I guess that idea of chunks should work, but if I design some other structure, you will know. Actually I'm testing the pages' compression: after the compression, the page is sent to the swap area and when the page is recovered from swap, it's decompressed. Of course we will not have any benefit doing it, but it's a good exercise to face the "scary MM code", :). I'll use your latest compress_test code. Cheers, Anderson Briglia |
From: Nitin G. <nit...@gm...> - 2006-06-09 17:06:26
|
Hi All, Now, all three algos are ported to kernel mode - WKdm, WK4x4, LZO. You can take them from CVS (linux26/lib) or else linux-mm.org/CompressCaching/Code has 'compress_test' (updated) module which allows you to easily test all three of them (all 3 algos are included with this module). Back to that scary MM code... :) Cheers, Nitin |
From: Nitin G. <nit...@gm...> - 2006-06-09 15:23:00
|
On 6/9/06, Anderson Briglia <bri...@gm...> wrote: > > > I have been trying to compress a page using that WKdm code from > > > project's CVS. But, something is wrong because the system hangs after > > > some compression/decompression pages. I'm using this interface: > > > > > > to compress "original_page": > > > > > > int len; > > > len = WKdm_compress(page_address(original_page), page_address(comp_page), 1024); > > > memcpy(page_address(original_page), page_address(comp_page), len); > > > > > > to decompress: > > > > > > WKdm_decompress(page_address(original_page), page_address(decomp_page), 1024); > > > memcpy(page_address(original_page), page_address(decomp_page), PAGE_SIZE); > > > > > > Is needed to do some padding? I saw at compress_test module that the > > > comp_data is filled with 0's after the compression. Is it necessary? > > > > Yes, WKdm only works when input is exactly 1024 words. > > So, I guess I must do a loop if I want to compress one page-full of > data, right? A loop of four iteractions, of course. > No, I meant 1024 words==PAGE_SIZE (4K bytes). WKdm needs input of 4K bytes to work so you can compress a page with a single call. > > Are you compressing pages with less than page-full of data and still without > > padding? > > I'm compressing pages came from the shrink_list( ). I believe those > pages have PAGE_SIZE of data. > Are you just compressing and discarding pages as they come in shrink_list() or have you developed any compression structure to store pages and then later restore them when lookup is performed? Cheers, Nitin Gupta |
From: Anderson B. <bri...@gm...> - 2006-06-09 13:30:37
|
> > I have been trying to compress a page using that WKdm code from > > project's CVS. But, something is wrong because the system hangs after > > some compression/decompression pages. I'm using this interface: > > > > to compress "original_page": > > > > int len; > > len = WKdm_compress(page_address(original_page), page_address(comp_page), 1024); > > memcpy(page_address(original_page), page_address(comp_page), len); > > > > to decompress: > > > > WKdm_decompress(page_address(original_page), page_address(decomp_page), 1024); > > memcpy(page_address(original_page), page_address(decomp_page), PAGE_SIZE); > > > > Is needed to do some padding? I saw at compress_test module that the > > comp_data is filled with 0's after the compression. Is it necessary? > > Yes, WKdm only works when input is exactly 1024 words. So, I guess I must do a loop if I want to compress one page-full of data, right? A loop of four iterations, of course. > Are you compressing pages with less than page-full of data and still without > padding? I'm compressing pages that came from shrink_list(). I believe those pages have PAGE_SIZE of data. Cheers, Anderson Briglia |
From: Nitin G. <nit...@gm...> - 2006-06-09 08:35:02
|
Hi Briglia, > I have been trying to compress a page using that WKdm code from > project's CVS. But, something is wrong because the system hangs after > some compression/decompression pages. I'm using this interface: > > to compress "original_page": > > int len; > len = WKdm_compress(page_address(original_page), page_address(comp_page), 1024); > memcpy(page_address(original_page), page_address(comp_page), len); > > to decompress: > > WKdm_decompress(page_address(original_page), page_address(decomp_page), 1024); > memcpy(page_address(original_page), page_address(decomp_page), PAGE_SIZE); > > Is needed to do some padding? I saw at compress_test module that the > comp_data is filled with 0's after the compression. Is it necessary? Yes, WKdm only works when input is exactly 1024 words. Are you compressing pages with less than page-full of data and still without padding? Cheers, Nitin |
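Turning that answer into code, a wrapper that always hands WKdm a full 1024-word (4K) input could look like the sketch below. The prototypes are guessed from the calls quoted in this thread, and the assumptions that the return value is a byte count and that the destination needs roughly 2x headroom for incompressible data are exactly that -- assumptions to check against the sources in linux26/lib/WKdm:

#include <string.h>

#define CC_PAGE_SIZE  4096
#define CC_PAGE_WORDS (CC_PAGE_SIZE / 4)   /* 1024 32-bit words */

/* Prototypes guessed from the calls above; the real header is authoritative. */
unsigned int WKdm_compress(void *src, void *dst, unsigned int words);
void WKdm_decompress(void *src, void *dst, unsigned int words);

/* Compress up to one page of data, zero-padding short input so that WKdm
 * always sees exactly 1024 words. dst is assumed to be >= 2 * CC_PAGE_SIZE. */
static unsigned int compress_page(const void *data, unsigned int len,
				  unsigned char *dst)
{
	unsigned char buf[CC_PAGE_SIZE];

	memset(buf, 0, CC_PAGE_SIZE);
	memcpy(buf, data, len < CC_PAGE_SIZE ? len : CC_PAGE_SIZE);

	return WKdm_compress(buf, dst, CC_PAGE_WORDS);
}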
From: Nitin G. <nit...@gm...> - 2006-06-09 03:02:07
|
Hi All,

Just finished porting LZO (ver 2.02) to kernel mode. The attached file is a kernel module (compress-test) that I created to test these kernel ports of de/compression algorithms. This module has been much refined since it was last posted. Now, two algos are ported - LZO and WKdm (WK4x4 also soon). You can test both of them using this module. It creates 3 /proc entries: /proc/compress-test/{compress, decompress, algo_idx}.

---------------------
* In short:

Write to /proc/compress-test entries:
1. compress: compress data written to it and store in internal buffer.
2. algo_idx: write index of algo you want to test (0: WKdm, 1: LZO)

Read from /proc/compress-test entries:
1. compress: show original and compressed size (TODO: add other stats like time taken too)
2. decompress: decompress compressed data stored in internal buffer.
3. algo_idx: shows list of algos supported with their index.
----------------------

* committed these two algos to CVS in linux26/lib/{WKdm, LZO}
* linux-mm.org/CompressedCaching/Code page has been updated -- this module (with both these algos) is available there.
* TODO Next: get virt swap actually working.

Cheers, Nitin Gupta |
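From userspace the module can be driven roughly like this; the paths and read/write semantics are taken from the description above, while the exact format expected when writing algo_idx is an assumption:

#include <stdio.h>

/* Walk through the compress-test /proc interface: select LZO, feed it some
 * data, then read back the size report and the decompressed data. */
int main(void)
{
    char buf[4096];
    FILE *f;

    f = fopen("/proc/compress-test/algo_idx", "w");      /* 1 == LZO */
    if (!f) { perror("algo_idx"); return 1; }
    fputs("1", f);
    fclose(f);

    f = fopen("/proc/compress-test/compress", "w");      /* data to compress */
    if (!f) { perror("compress"); return 1; }
    fputs("hello hello hello hello hello hello hello", f);
    fclose(f);

    f = fopen("/proc/compress-test/compress", "r");      /* size report */
    if (f) {
        while (fgets(buf, sizeof(buf), f))
            fputs(buf, stdout);
        fclose(f);
    }

    f = fopen("/proc/compress-test/decompress", "r");    /* data comes back */
    if (f) {
        while (fgets(buf, sizeof(buf), f))
            fputs(buf, stdout);
        fclose(f);
    }
    return 0;
}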
From: Anderson B. <bri...@gm...> - 2006-06-08 21:10:22
|
Hi all, I have been trying to compress a page using that WKdm code from project's CVS. But, something is wrong because the system hangs after some compression/decompression pages. I'm using this interface: to compress "original_page": int len; len = WKdm_compress(page_address(original_page), page_address(comp_page), 1024); memcpy(page_address(original_page), page_address(comp_page), len); to decompress: WKdm_decompress(page_address(original_page), page_address(decomp_page), 1024); memcpy(page_address(original_page), page_address(decomp_page), PAGE_SIZE); Is needed to do some padding? I saw at compress_test module that the comp_data is filled with 0's after the compression. Is it necessary? Best regards, Anderson Briglia |
From: Nitin G. <nit...@gm...> - 2006-06-08 06:36:41
|
Hi Briglia, > > I'm testing the patch and the module but I got a 'Segmentation Fault' > problem when reading /proc/swaps entry after creating the virtual swap > area. > Is this virtual swap usable? Because when a page will be swapped-out > to the virtual swap area I'm getting a Kernel Null pointer reference. > This virtual swap patch is just the beginning point and these seg faults are 'expected' behavior -- nothing is usable now. To make it work, it is required to integrate the work from previous patches (as on CompressedCaching/Code) and this is what is my next TODO.... Currently, just for a bit change of work, I am working on porting LZO (ver 2.02). It should be completed today. Then I will continue to make this virtual swap actually work. Thanks for looking into this work :) Best Regards, Nitin Gupta |
From: Anderson B. <bri...@gm...> - 2006-06-08 00:45:05
|
Hi Nitin, > The approach I've taken is this: > > * General > It creates a virtual swap device (another swap_info_struct in swap_info[]). Its > size is set though a /proc entry created by 'cc_control' module (attached). This > swap is set to highest priority and is identified in other code paths with > SWP_COMPRESSED flag. So, when adding page to swap cache, it is given 'type' no. > as for this virt swap until it reaches its max capacity (as set through /proc > interface). > > * Storing page to ccache > When swap to disk occurs in pageout(), same things will occur as is done for > page-cache pages -- it doesn't need to know if it's anon or fs-backed page. > These things are - replace corresponding swapper radix tree node to a > 'chunk_head' which then points to the location of 'chunks' in ccache storage > structure. > > * Restoring an anon page from ccache > When page fault occurs, handle_pte_fault() first checks swap cache if it already > has required page which then calls find_get_page() over swapper space radix > tree. Here, a compressed page is handled exactly as for page cache page i.e. > checked if looked-up page has PG_compressed set, decompress it and make radix > node point back to this uncompressed page. > > * Notes > -- This approach deals easily with case where no real swap devices exist. This > virt swap is a separate entity with all separate data structures like swap_map[] > as for a normal swap device. > -- This only requires working with arch independent swp_entry_t instead of arch > dependent representation of it in PTEs. > -- More code sharing for handling page-cache and swap-cache pages - much of code > wouldn't know if its working with anon or fs-backed page like > add_to_swap_cache(), find_get_page() (and friends). > > * About files attached > -- cc_control module: This creates two proc entries: > /proc/cc_control/{max_anon_cc_size, max_fs_backed_cc_size}. Write to > max_anon_cc_size, the size of ccache for anon pages in units of no. of pages. > Writing this value calls set_anon_cc_size() in mm/swapfile.c which creates this > virtual swap device with size as passed and highest priority (hard-coded to > 100). (max_fs_backed_cc_size is dummy for now). > -- patch for 2.6.17-rc5 that has this set_anon_cc_size() with some small misc > handling. I'm testing the patch and the module but I got a 'Segmentation Fault' problem when reading /proc/swaps entry after creating the virtual swap area. Is this virtual swap usable? Because when a page will be swapped-out to the virtual swap area I'm getting a Kernel Null pointer reference. Best regards, Anderson Briglia |
From: Anderson B. <bri...@gm...> - 2006-06-07 20:16:39
|
[RESENDING] Hi Nitin, I'm testing the patch and the module but I got a 'Segmentation Fault' problem when reading /proc/swaps entry after creating the virtual swap area. Is this virtual swap usable? Because when a page will be swapped-out to the virtual swap area I'm getting a Kernel Null pointer reference. Best regards, Anderson Briglia PS: I sent this message yesterday to the linux-compressed mailing list, but it didn't come for some reason. On 6/4/06, Nitin Gupta <nit...@gm...> wrote: > Hi, > > This is kind of weekly work summary for ccache work... :) > > I originally started this project with focus on "general" desktop systems > (OpenOffice + KDE/Gnome on 128MB RAM?). But now I'll instead focus it on much > more exciting case -- The OLPC systems. This should also bring its applicability > to a wide range of Linux based embedded devices. This does not put original goal > out of way but it just results in some change of priority - much of things are > same for both. > > The most important things I learnt for OLPC systems w.r.t ccaching are: > > -- They DO NOT want on-disk compressed swapping (problems: slow speed for > writing to flash space, wear leveling). > -- For in-memory compression, they have much higher priority for compressing > anonymous memory pages instead of filesystem backed (page-cache) pages. > > I originally started out to handle page-cache pages. Since I'm now focusing on > OLPC systems, I'll change gears to anon pages now. This has slowed development > for time being as I still have only half-baked roadmap for anon pages, but now > design work has progressed nicely and have started its implementation work :) > -------------------------------------------------------------------- > > The approach I've taken is this: > > * General > It creates a virtual swap device (another swap_info_struct in swap_info[]). Its > size is set though a /proc entry created by 'cc_control' module (attached). This > swap is set to highest priority and is identified in other code paths with > SWP_COMPRESSED flag. So, when adding page to swap cache, it is given 'type' no. > as for this virt swap until it reaches its max capacity (as set through /proc > interface). > > * Storing page to ccache > When swap to disk occurs in pageout(), same things will occur as is done for > page-cache pages -- it doesn't need to know if it's anon or fs-backed page. > These things are - replace corresponding swapper radix tree node to a > 'chunk_head' which then points to the location of 'chunks' in ccache storage > structure. > > * Restoring an anon page from ccache > When page fault occurs, handle_pte_fault() first checks swap cache if it already > has required page which then calls find_get_page() over swapper space radix > tree. Here, a compressed page is handled exactly as for page cache page i.e. > checked if looked-up page has PG_compressed set, decompress it and make radix > node point back to this uncompressed page. > > * Notes > -- This approach deals easily with case where no real swap devices exist. This > virt swap is a separate entity with all separate data structures like swap_map[] > as for a normal swap device. > -- This only requires working with arch independent swp_entry_t instead of arch > dependent representation of it in PTEs. > -- More code sharing for handling page-cache and swap-cache pages - much of code > wouldn't know if its working with anon or fs-backed page like > add_to_swap_cache(), find_get_page() (and friends). |
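For readers without the wiki at hand, a very rough guess at the shape of the chunk_head/chunk metadata mentioned above -- the actual structures described on CompressedCaching almost certainly differ in layout and naming:

#include <linux/list.h>
#include <linux/types.h>

/* Guesswork for illustration only. One chunk_head replaces the radix-tree
 * slot of a compressed page and points at the chunks holding its data. */
struct cc_chunk {
	struct list_head list;    /* chunks making up one compressed page */
	void *data;               /* piece of the compressed image */
	unsigned short size;      /* bytes used in this chunk */
};

struct chunk_head {
	struct list_head chunks;       /* list of struct cc_chunk */
	unsigned int compressed_size;  /* total compressed bytes */
	unsigned short algo;           /* which algorithm produced it */
	unsigned long flags;
};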
From: Nitin G. <nit...@gm...> - 2006-06-06 02:01:24
|
Hi, This is kind of weekly work summary for ccache work... :) I originally started this project with focus on "general" desktop systems (OpenOffice + KDE/Gnome on 128MB RAM?). But now I'll instead focus it on much more exciting case -- The OLPC systems. This should also bring its applicability to a wide range of Linux based embedded devices. This does not put original goal out of way but it just results in some change of priority - much of things are same for both. The most important things I learnt for OLPC systems w.r.t ccaching are: -- They DO NOT want on-disk compressed swapping (problems: slow speed for writing to flash space, wear leveling). -- For in-memory compression, they have much higher priority for compressing anonymous memory pages instead of filesystem backed (page-cache) pages. I originally started out to handle page-cache pages. Since I'm now focusing on OLPC systems, I'll change gears to anon pages now. This has slowed development for time being as I still have only half-baked roadmap for anon pages, but now design work has progressed nicely and have started its implementation work :) -------------------------------------------------------------------- The approach I've taken is this: * General It creates a virtual swap device (another swap_info_struct in swap_info[]). Its size is set though a /proc entry created by 'cc_control' module (attached). This swap is set to highest priority and is identified in other code paths with SWP_COMPRESSED flag. So, when adding page to swap cache, it is given 'type' no. as for this virt swap until it reaches its max capacity (as set through /proc interface). * Storing page to ccache When swap to disk occurs in pageout(), same things will occur as is done for page-cache pages -- it doesn't need to know if it's anon or fs-backed page. These things are - replace corresponding swapper radix tree node to a 'chunk_head' which then points to the location of 'chunks' in ccache storage structure. * Restoring an anon page from ccache When page fault occurs, handle_pte_fault() first checks swap cache if it already has required page which then calls find_get_page() over swapper space radix tree. Here, a compressed page is handled exactly as for page cache page i.e. checked if looked-up page has PG_compressed set, decompress it and make radix node point back to this uncompressed page. * Notes -- This approach deals easily with case where no real swap devices exist. This virt swap is a separate entity with all separate data structures like swap_map[] as for a normal swap device. -- This only requires working with arch independent swp_entry_t instead of arch dependent representation of it in PTEs. -- More code sharing for handling page-cache and swap-cache pages - much of code wouldn't know if its working with anon or fs-backed page like add_to_swap_cache(), find_get_page() (and friends). * About files attached -- cc_control module: This creates two proc entries: /proc/cc_control/{max_anon_cc_size, max_fs_backed_cc_size}. Write to max_anon_cc_size, the size of ccache for anon pages in units of no. of pages. Writing this value calls set_anon_cc_size() in mm/swapfile.c which creates this virtual swap device with size as passed and highest priority (hard-coded to 100). (max_fs_backed_cc_size is dummy for now). -- patch for 2.6.17-rc5 that has this set_anon_cc_size() with some small misc handling. TODO Next: merge work from prev patches (as on CompressedCaching/Code) with this one to get on with anon page ccache. 
(I'll try to update the CompressedCaching page soon for anon page handling.)

PS: Marcelo, I'm unable to send mail to marcelo AT kvack DOT org (I just get a mail delivery failure notice), so I'm sending to mtosatti AT redhat DOT com instead. Anyway, I've added your kvack address too, to see if it gets through this time... :)

Cheers,
Nitin Gupta |
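Reading the mail above, set_anon_cc_size() would presumably carve out a slot in swap_info[] much like sys_swapon() does, only without a backing device. A minimal sketch follows; the swap_info_struct field usage mirrors 2.6-era mm/swapfile.c as far as I recall it, insertion into the priority-ordered swap list and the lowest_bit/highest_bit bookkeeping are omitted, and only SWP_COMPRESSED is taken from the mail -- everything else is guesswork about the real patch:

    /* mm/swapfile.c (sketch only) -- register a device-less "virtual" swap */
    #include <linux/swap.h>
    #include <linux/vmalloc.h>

    int set_anon_cc_size(unsigned long nr_pages)
    {
            struct swap_info_struct *p;
            unsigned short *map;
            int type;

            /* allocate the usage map before taking the lock */
            map = vmalloc(nr_pages * sizeof(unsigned short));
            if (!map)
                    return -ENOMEM;
            memset(map, 0, nr_pages * sizeof(unsigned short));

            spin_lock(&swap_lock);
            for (type = 0; type < MAX_SWAPFILES; type++)    /* find a free slot */
                    if (!(swap_info[type].flags & SWP_USED))
                            break;
            if (type == MAX_SWAPFILES) {
                    spin_unlock(&swap_lock);
                    vfree(map);
                    return -EPERM;
            }

            p = &swap_info[type];
            p->swap_map = map;
            p->flags = SWP_USED | SWP_WRITEOK | SWP_COMPRESSED;
            p->prio  = 100;                 /* highest priority, as in the mail */
            p->max   = nr_pages;
            p->pages = nr_pages;
            nr_swap_pages += nr_pages;
            total_swap_pages += nr_pages;
            spin_unlock(&swap_lock);
            return 0;
    }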
From: Rik v. R. <ri...@re...> - 2006-06-06 01:50:25
|
On Mon, 5 Jun 2006, Nitin Gupta wrote:

> This is a kind of weekly work summary for the ccache work... :)

Awesome. Let me know if you need any advice, etc...

> It creates a virtual swap device (another swap_info_struct in
> swap_info[]). Its size is set through a /proc entry created by the
> 'cc_control' module (attached).

Nice. I wonder if this should/could be handled through flags to swapon, so we can have this device in /etc/fstab? Not an immediate concern though - that would be a cleanup for when things actually work ;)

> * About the attached files
> -- cc_control module: This creates two proc entries:
> /proc/cc_control/{max_anon_cc_size, max_fs_backed_cc_size}.

This should probably be in /proc/sys/vm, which also means you can use the sysctl code and easily read/write ASCII numbers into these files.

--
All Rights Reversed |
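To make Rik's suggestion concrete, the knob could be wired into the existing vm_table[] in kernel/sysctl.c so it shows up as /proc/sys/vm/max_anon_cc_size. A sketch, assuming the 2.6-era ctl_table layout and proc_dointvec() signature; the ctl_name value and the handler name are hypothetical, and newer kernels drop ctl_name and changed the proc_handler prototype:

    static int sysctl_max_anon_cc_size;          /* in pages */

    /* after the usual integer parsing, forward the new value on writes */
    static int max_anon_cc_size_handler(ctl_table *table, int write,
                                        struct file *filp, void __user *buffer,
                                        size_t *lenp, loff_t *ppos)
    {
            int ret = proc_dointvec(table, write, filp, buffer, lenp, ppos);

            if (!ret && write)
                    ret = set_anon_cc_size(sysctl_max_anon_cc_size);
            return ret;
    }

    /* entry to add to vm_table[] in kernel/sysctl.c */
    {
            .ctl_name     = VM_MAX_ANON_CC_SIZE,   /* new, hypothetical id */
            .procname     = "max_anon_cc_size",
            .data         = &sysctl_max_anon_cc_size,
            .maxlen       = sizeof(int),
            .mode         = 0644,
            .proc_handler = &max_anon_cc_size_handler,
    },

With something like this in place, "echo 16384 > /proc/sys/vm/max_anon_cc_size" would do what the current cc_control module's /proc write does.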
From: Anderson B. <bri...@gm...> - 2006-05-29 21:24:36
|
On 5/29/06, Nitin Gupta <nit...@gm...> wrote:
> Hi,
>
> >> > About CC, I and Briglia (and...@gm...
> >> > <mailto:and...@gm...>) are making some tests with the
> >> > Rodrigo Castro version (kernel 2.4) for the ARM platform. We are
> >> > studying a possible port to 2.6 and would like to cooperate with
> >> > the project.
> >>
> >> Great!
> >>
> >> My exams (and other misc things) just ended today... now I can really
> >> jump into compressed caching :)
> >> Are you doing ccaching for "general" desktop systems or some kind of
> >> embedded device?
> >
> > Some kind of embedded device. :) Actually we tested and did some
> > measurements using Rodrigo's patches for kernel 2.4.18. We tried to
> > implement swap compression on the 2.6 kernel, but we stopped when
> > writing to disk presented itself as a problem.
>
> (I assume you are using flash disks for storage.)
> So, have you dropped the idea of swapping to flash disks? (Compressed)
> swapping to flash disks has lots of issues, as far as I have read about it.

We were using an MMC card as the swap device. I believe its behavior is similar to that of an IDE block device, for instance. The problem was related to the bio structure: how do you send more than one page if the bio includes only one page per operation?

Best regards,

Anderson Briglia

> >> Just in case you didn't notice it earlier -- I have made a page,
> >> linux-mm.org/CompressedCaching, giving all the details collected so far
> >> for the ccaching implementation on 2.6.x kernels (CompressedCaching/Code
> >> has the bare-bones implementation created so far). Also, this is where
> >> I'll keep track of the project.
> >
> > Yes, we have been visiting your project's site. Will you use just the
> > site, or should we communicate via this list as well?
>
> The site is just to keep the design details updated. We can always
> communicate through this list (and #linux-mm-cc).
>
> >> I intend to now focus on ccaching w.r.t. OLPC systems. Thus, I'll now
> >> work on handling anon pages first instead of page-cache pages.
> >>
> >> The biggest thing is the compression storage structure -- the heart of
> >> all this work. I've actually received zero feedback on it so far. Since
> >> it involves my semi-random guesswork too, it really needs your help :)
> >> All other work revolves around this thing. In particular, please see
> >> the 'Problems' listed at the end of that page.
> >
> > Ok. We intend to help you and the others design this crucial part.
>
> Thanks!
>
> > How about that chunks approach? We thought you had already decided to
> > use it.
>
> Yes, I've decided to use the chunk approach as described on the
> CompressedCaching page. But still, if you can give your feedback, we can
> always improve it.
>
> >> PS: I've zero idea about ARM systems!
> >
> > Don't worry about this. The kernel core is similar. Of course we have
> > other issues to care about, but the design of ccaching can be used on
> > both systems.
>
> Best Regards,
> Nitin Gupta
|
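On the bio question: as far as I understand the 2.6 block layer, a bio is not limited to a single page -- bio_alloc() takes the number of io vecs, and bio_add_page() can be called repeatedly until the device's limits are hit. A rough sketch; the field names follow the 2.6-era block layer (recent kernels use bio_set_dev(), bi_iter.bi_sector and a one-argument submit_bio()), and the function name, the completion callback and all end_io/error handling are left as assumptions:

    #include <linux/bio.h>

    /* build a single bio carrying several pages and submit it as one write */
    static int swap_out_cluster(struct block_device *bdev, sector_t sector,
                                struct page **pages, int nr_pages,
                                bio_end_io_t *end_io)
    {
            struct bio *bio = bio_alloc(GFP_NOIO, nr_pages);
            int i;

            if (!bio)
                    return -ENOMEM;

            bio->bi_bdev   = bdev;
            bio->bi_sector = sector;
            bio->bi_end_io = end_io;

            for (i = 0; i < nr_pages; i++)
                    /* bio_add_page() returns 0 once the queue's segment
                     * limits are reached, so stop adding pages there */
                    if (bio_add_page(bio, pages[i], PAGE_SIZE, 0) < PAGE_SIZE)
                            break;

            submit_bio(WRITE, bio);
            return i;               /* number of pages actually queued */
    }

The end_io callback would then unlock/put the pages and call bio_put(); anything not queued here would go into the next bio.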