From: Doug B. <dou...@gm...> - 2011-12-31 14:13:25
On Sat, Dec 31, 2011 at 8:23 AM, jerome <rom...@ya...> wrote:
> Hi,
>
> At the end of this year I found something interesting while backporting an improvement Doug made some months ago on trunk.
>
> http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/export/ExportXml.py?view=patch&r1=18238&r2=18272&pathrev=18272
>
> I was cleaning my data, so I made backups and patched the 'non-idempotent XML handling' in the XML export for the current stable release. It was for my own use, and for testing data migration or diffs between backups.
>
> After using the patched version, I thought that I had made a mistake...
>
> unpatched data.gramps : 870Kio
> patched data.gramps : 751Kio
>
> Fortunately, it is not related to the content (my data) but to the compression ratio! I do not know what the unit is, but:
>
> unpatched data.gz : 5.79
> patched data.gz : 6.70
>
> Conclusion: sorting handles also seems to yield a gain in file size!
> :)

I just did the same experiment, and I get the opposite result: the sorted handles version is smaller by 20K. Importantly, the uncompressed versions are exactly the same size.

Conclusion: the ZIP algorithm is sensitive to the ordering of data. Sometimes you can get better compression, sometimes worse, by just rearranging the data.

And Happy New Year to you too!

-Doug

> Happy new year!
>
> Jérôme
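
For anyone who wants to reproduce the ordering effect without a real database, here is a minimal sketch. It is not the Gramps exporter: the record layout, the helper names (make_records, export, compare) and the random handles are all made up for illustration, and it uses zlib's DEFLATE (the same algorithm gzip uses) directly. It writes the same records once in creation order and once sorted by handle, then compares raw and compressed sizes.

```python
import random
import zlib


def make_records(n, seed=0):
    # Hypothetical data: random 16-digit hex "handles" paired with sequential ids.
    rng = random.Random(seed)
    return [("%016x" % rng.getrandbits(64), "I%04d" % i) for i in range(n)]


def export(records):
    # Simplified stand-in for an XML export: one element per record,
    # written in the order the records are given.
    body = "\n".join(
        '<person handle="_%s" id="%s"/>' % (handle, gid) for handle, gid in records
    )
    return ("<people>\n" + body + "\n</people>\n").encode("utf-8")


def compare(n=2000, seed=0):
    records = make_records(n, seed)
    unsorted_xml = export(records)        # creation order
    sorted_xml = export(sorted(records))  # sorted by handle
    return {
        "raw_unsorted": len(unsorted_xml),
        "raw_sorted": len(sorted_xml),
        "gz_unsorted": len(zlib.compress(unsorted_xml, 9)),
        "gz_sorted": len(zlib.compress(sorted_xml, 9)),
    }


if __name__ == "__main__":
    for name, size in compare().items():
        print("%-13s %d bytes" % (name, size))
```

The two raw sizes are always identical, since only the order of the lines changes. The compressed sizes need not match, and which ordering wins depends on the data, so trying a few values of n and seed is more informative than a single run.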