From: Francesc A. <fa...@py...> - 2004-11-10 10:16:21
Hi again,

I've been looking deeper into the problem, and it seems I have a solution. The problem was a mistake I made while implementing indexing: the parameters for the EArray chunk size computation were left over from my early tests on optimizing chunk sizes just for indexes. I later moved the computation of optimum index chunk sizes out of the EArray module, but I forgot to restore the correct values for general EArrays :-/

Check with the next patch (against the original 0.9 sources):

--- /home/falted/PyTables/exports/pytables-0.9/tables/EArray.py	2004-10-05 14:30:31.000000000 +0200
+++ EArray.py	2004-11-10 11:08:22.000000000 +0100
@@ -224,7 +224,7 @@
         #bufmultfactor = int(1000 * 2) # Is a good choice too,
         # specially for very large tables and large available memory
         #bufmultfactor = int(1000 * 1) # Optimum for sorted object
-        bufmultfactor = int(1000 * 1) # Optimum for sorted object
+        bufmultfactor = int(1000 * 100) # Optimum for sorted object
         rowsizeinfile = rowsize
         expectedfsizeinKb = (expectedrows * rowsizeinfile) / 1024

That should get the 0.8.1 compression ratios back. You can get still better ratios by increasing bufmultfactor further, but I'm afraid that would make access to small portions of the EArray slower (much more data would have to be read than the desired range).

Please tell me about your findings and I'll fix this in CVS afterwards.

Cheers,

--
Francesc Altet
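P.S. In case it helps to see the tradeoff in isolation, here is a minimal sketch of the kind of buffer-size heuristic the patch touches. Only bufmultfactor, rowsize, rowsizeinfile, expectedrows and expectedfsizeinKb come from the lines quoted in the patch; the calc_buffer_size function name and the size tiers below are illustrative assumptions, not the real EArray.py code.

# Sketch of a tiered buffer-size heuristic; tiers and function name
# are hypothetical, only the named variables appear in the patch.
def calc_buffer_size(rowsize, expectedrows, bufmultfactor=1000 * 100):
    """Estimate a buffer size (in bytes) for an EArray.

    A larger bufmultfactor means bigger chunks, hence better
    compression ratios but slower reads of small slices, since
    more data must be decompressed per access.
    """
    rowsizeinfile = rowsize
    expectedfsizeinKb = (expectedrows * rowsizeinfile) / 1024
    # Hypothetical tiering: grow the buffer with the expected file size.
    if expectedfsizeinKb <= 100:        # small files
        buffersize = 5 * bufmultfactor
    elif expectedfsizeinKb <= 1000:     # medium files
        buffersize = 20 * bufmultfactor
    else:                               # large files
        buffersize = 70 * bufmultfactor
    return buffersize

# 100000 rows of 8 bytes -> ~781 KB expected file, medium tier:
print(calc_buffer_size(rowsize=8, expectedrows=100000))  # 2000000

Under these assumptions, the patched multiplier (1000 * 100) gives the medium tier a ~2 MB buffer, where the broken 1000 * 1 value produced only ~20 KB; such tiny chunks give the compressor almost nothing to work with, which would explain the poor 0.9 ratios.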