From: Francesc A. <fa...@py...> - 2004-11-10 10:16:21
Hi again,

I've been looking deeper into the problem, and it seems I have a solution. The problem was a mistake I made while implementing indexing: the parameters for the EArray chunk size computation were left over from my early tests on optimizing chunk sizes just for indexes. I later moved the computation of optimum index chunk sizes out of the EArray module, but I forgot to restore the correct values for general EArrays :-/

Check with the next patch (against the original 0.9 sources):

--- /home/falted/PyTables/exports/pytables-0.9/tables/EArray.py	2004-10-05 14:30:31.000000000 +0200
+++ EArray.py	2004-11-10 11:08:22.000000000 +0100
@@ -224,7 +224,7 @@
         #bufmultfactor = int(1000 * 2) # Is a good choice too,
         # specially for very large tables and large available memory
         #bufmultfactor = int(1000 * 1) # Optimum for sorted object
-        bufmultfactor = int(1000 * 1) # Optimum for sorted object
+        bufmultfactor = int(1000 * 100) # Optimum for sorted object
         rowsizeinfile = rowsize
         expectedfsizeinKb = (expectedrows * rowsizeinfile) / 1024

That should get the 0.8.1 compression ratios back. You can get still better ratios by increasing bufmultfactor further, but I'm afraid that would make access to small portions of the EArray slower (much more data would have to be read than the desired range).

Please tell me about your findings and I'll fix this in CVS afterwards.

Cheers,

--
Francesc Altet
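P.S. In case it helps to see the tradeoff in isolation, here is a minimal sketch of the kind of buffer-size heuristic the patch touches. Only bufmultfactor, rowsize, rowsizeinfile, expectedrows and expectedfsizeinKb come from the lines quoted in the patch; the calc_buffer_size function name and the size tiers below are illustrative assumptions, not the real EArray.py code.

# Sketch of a tiered buffer-size heuristic; tiers and function name
# are hypothetical, only the named variables appear in the patch.
def calc_buffer_size(rowsize, expectedrows, bufmultfactor=1000 * 100):
    """Estimate a buffer size (in bytes) for an EArray.

    A larger bufmultfactor means bigger chunks, hence better
    compression ratios but slower reads of small slices, since
    more data must be decompressed per access.
    """
    rowsizeinfile = rowsize
    expectedfsizeinKb = (expectedrows * rowsizeinfile) / 1024
    # Hypothetical tiering: grow the buffer with the expected file size.
    if expectedfsizeinKb <= 100:        # small files
        buffersize = 5 * bufmultfactor
    elif expectedfsizeinKb <= 1000:     # medium files
        buffersize = 20 * bufmultfactor
    else:                               # large files
        buffersize = 70 * bufmultfactor
    return buffersize

# 100000 rows of 8 bytes -> ~781 KB expected file, medium tier:
print(calc_buffer_size(rowsize=8, expectedrows=100000))  # 2000000

Under these assumptions, the patched multiplier (1000 * 100) gives the medium tier a ~2 MB buffer, where the broken 1000 * 1 value produced only ~20 KB; such tiny chunks give the compressor almost nothing to work with, which would explain the poor 0.9 ratios.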