From: jerome <rom...@ya...> - 2011-12-31 13:23:17
|
Hi,

At the end of this year I found something interesting while backporting an improvement that Doug made on trunk some months ago:

http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/export/ExportXml.py?view=patch&r1=18238&r2=18272&pathrev=18272

I was cleaning my data, made backups, and applied the 'non-idempotent XML handling' patch to the XML export of the current stable release. It was for my own use, to test data migration and diffs between backups.

After using the patched version, I thought I had made a mistake:

unpatched data.gramps: 870 KiB
patched data.gramps: 751 KiB

Fortunately, the difference is not in the content (my data) but in the compression ratio! I do not know what the unit is, but:

unpatched data.gz: 5.79
patched data.gz: 6.70

Conclusion: sorting handles also seems to give a gain on file size! :)

Happy new year!

Jérôme
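The effect Jérôme describes can be reproduced in miniature with Python's gzip module. This is a minimal sketch under stated assumptions: the handles come from a hypothetical generator (a slowly increasing hex prefix plus random hex digits, which only imitates the look of a handle such as _baee4b3d0e20f7b3413 and is not Gramps's own code), and the "export" is a toy one-line-per-record format, not real Gramps XML. It shows that the uncompressed size is identical in both orders while the compressed size can differ.

```python
import gzip
import random


def make_handles(n, seed=0):
    """Hypothetical handle generator: an increasing 8-digit hex prefix
    (records created close together in time) plus 11 random hex digits.
    Not Gramps's real handle algorithm."""
    rng = random.Random(seed)
    t = 0xBAEE4B3D
    handles = []
    for _ in range(n):
        t += rng.randint(1, 50)
        handles.append("_%08x%011x" % (t, rng.getrandbits(44)))
    return handles


def export(handles):
    """Render one minimal XML-like line per handle, in the given order."""
    lines = ['<person handle="%s" id=""/>' % h for h in handles]
    return ("\n".join(lines) + "\n").encode("utf-8")


def compare(n=5000, seed=0):
    handles = make_handles(n, seed)
    shuffled = handles[:]
    random.Random(seed + 1).shuffle(shuffled)  # stands in for arbitrary DB order
    raw_unsorted = export(shuffled)
    raw_sorted = export(sorted(shuffled))
    return {
        "raw_unsorted": len(raw_unsorted),
        "raw_sorted": len(raw_sorted),
        "gz_unsorted": len(gzip.compress(raw_unsorted, compresslevel=9)),
        "gz_sorted": len(gzip.compress(raw_sorted, compresslevel=9)),
    }


if __name__ == "__main__":
    print(compare())
```

Depending on the data, the sorted version may come out smaller or larger; only the compressed size moves, which matches the mixed results reported later in this thread.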
From: jerome <rom...@ya...> - 2011-12-31 13:50:20
|
> I do not know what the unit is, but:
>
> unpatched data.gz: 5.79
> patched data.gz: 6.70

'$ gzip -l' gives the ratio as a percentage:

unpatched: 82.7%
patched: 85.1%
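A rough equivalent of what '$ gzip -l' reports can be computed directly from the file. This is a sketch, assuming a single-member gzip stream (as a .gramps export normally is): per RFC 1952 the last 4 bytes hold ISIZE, the uncompressed length modulo 2**32. GNU gzip may exclude header and trailer bytes from the compressed size, so its percentage can differ slightly in the last digit.

```python
import gzip
import struct


def gzip_stats_from_bytes(blob):
    """Return (compressed_size, uncompressed_size, ratio_percent) for a
    single-member gzip stream, roughly as `gzip -l` prints them.

    ISIZE is stored little-endian modulo 2**32, so this is wrong for
    inputs of 4 GiB or more.
    """
    compressed = len(blob)
    (uncompressed,) = struct.unpack("<I", blob[-4:])
    if uncompressed == 0:
        return compressed, 0, 0.0  # gzip -l shows 0.0% when unknown
    ratio_pct = 100.0 * (1 - compressed / uncompressed)
    return compressed, uncompressed, ratio_pct


def gzip_stats(path):
    """Same as above, reading a .gz or .gramps file from disk."""
    with open(path, "rb") as f:
        return gzip_stats_from_bytes(f.read())


if __name__ == "__main__":
    print(gzip_stats_from_bytes(gzip.compress(b"<person/>\n" * 1000)))
```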
From: Doug B. <dou...@gm...> - 2011-12-31 14:13:25
|
On Sat, Dec 31, 2011 at 8:23 AM, jerome <rom...@ya...> wrote:
> Conclusion: to sort handles seems to also generate a gain on file size!
> :)

I just did the same experiment, and I get the opposite result: the sorted-handles version is smaller by 20K. Importantly, the uncompressed versions are exactly the same size.

Conclusion: the ZIP algorithm is sensitive to the ordering of data. Sometimes you can get better compression, sometimes worse, just by rearranging the data.

And Happy New Year to you too!

-Doug
From: jerome <rom...@ya...> - 2011-12-31 14:24:00
|
> I just did the same experiment, and I get the opposite result: the
> sorted handles version is smaller by 20K.

Oh, but this is the same result!

'751 * 6.70' is (nearly) the same as '870 * 5.79': both give about 5,030 KiB uncompressed.

I got 6.70 and 5.79 from the archive properties. There is no unit; it is a multiplier (uncompressed size divided by compressed size). A higher multiplier means better compression.

The compression gain is around 2 percentage points (the file is smaller after patching). :)

Jérôme
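The two notations in this thread can be reconciled with a little arithmetic. A sketch, assuming the archive-properties figure is uncompressed size divided by compressed size: multiplying it by the compressed size estimates the original size, and 1 - 1/multiplier gives the 'gzip -l' percentage.

```python
def multiplier_to_gzip_percent(m):
    """Convert an 'N times smaller' multiplier to a gzip -l style saving (%)."""
    return 100.0 * (1 - 1 / m)


def estimated_uncompressed_kib(compressed_kib, m):
    """Estimate the original size from the compressed size and multiplier."""
    return compressed_kib * m


if __name__ == "__main__":
    for label, size_kib, m in [("unpatched", 870, 5.79), ("patched", 751, 6.70)]:
        print(label, round(estimated_uncompressed_kib(size_kib, m)), "KiB raw,",
              "%.1f%%" % multiplier_to_gzip_percent(m))
    # unpatched 5037 KiB raw, 82.7%
    # patched 5032 KiB raw, 85.1%
```

Both files therefore hold roughly the same amount of XML, and the converted percentages match the 82.7% and 85.1% reported by 'gzip -l'.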
From: jerome <rom...@ya...> - 2011-12-31 14:35:39
|
Aaah, a translation mistake... In French, 'gain' also means 'to win': something positive, good news (http://fr.wiktionary.org/wiki/gain), but also a benefit, something useful. I just saw that in electronics it is rather the factor by which a signal is multiplied... My bad!

Thanks!

Jérôme
From: Doug B. <dou...@gm...> - 2011-12-31 15:25:56
|
On Sat, Dec 31, 2011 at 9:23 AM, jerome <rom...@ya...> wrote:
>> I just did the same experiment, and I get the opposite
>> result: the sorted handles version is smaller by 20K.
>
> Oh, but this is the same result!

Interesting! I wonder if that is consistent for all Gramps XML exports with ordered handles. I'll be teaching the PKZIP algorithm next semester, and I'll look into this in more detail. I could imagine that certain sets of handles would not necessarily compress better, but I can see that if the handles were in order and had many characters in common, it would have some benefit. For me, it was about 3 bytes per record.

-Doug
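Doug's intuition about handles "in order" having "many chars in common" can be measured directly. This is a sketch, assuming hypothetical handles made of "_" plus 18 random hex digits (not Gramps's real generator): it compares how many leading characters each handle shares with the one written just before it, in arbitrary order versus sorted order.

```python
import os.path
import random


def avg_shared_prefix(handles):
    """Mean length of the common prefix between each handle and the one
    written immediately before it. DEFLATE's LZ77 stage can encode such
    repeated runs as back-references to recent output."""
    pairs = list(zip(handles, handles[1:]))
    if not pairs:
        return 0.0
    return sum(len(os.path.commonprefix([a, b])) for a, b in pairs) / len(pairs)


def demo(n=1000, seed=0):
    rng = random.Random(seed)
    # Hypothetical handles in arbitrary (unsorted) order.
    handles = ["_%018x" % rng.getrandbits(72) for _ in range(n)]
    return avg_shared_prefix(handles), avg_shared_prefix(sorted(handles))


if __name__ == "__main__":
    unsorted_avg, sorted_avg = demo()
    print("unsorted: %.2f chars, sorted: %.2f chars" % (unsorted_avg, sorted_avg))
```

A shared prefix is only part of the story: DEFLATE's minimum match length is 3 bytes, so in practice the gain comes from longer runs formed by the prefix together with the surrounding XML text.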
From: jerome <rom...@ya...> - 2012-01-01 10:39:22
|
In fact, not everything is positive with this backport...

I just noticed that sorting handles might ignore records without an ID value! It is a very specific case that cannot happen in normal use, but it may also be a bug on current trunk. It is related to the fix for idempotent XML import/export. Nothing is really wrong, apart from a reference (pointer) issue. The current stable version keeps something close to a 'fake' record:

<person handle="_baee4b3d0e20f7b3413" change="1325353567" id="">
  <gender>U</gender>
  <name type="Birth Name">
  </name>
</person>

This type of record cannot be sorted by ID and seems to be ignored by the 'idempotent' version (trunk or patched); this causes my TypeError after an XML import...

http://www.gramps-project.org/bugs/view.php?id=5466

PS: I made two backports (3.2.6 and 3.3.2SVN). This problem appeared between 3.2.6 and 3.3.2SVN, but it should be fixed on trunk!!!

Jérôme
From: Doug B. <dou...@gm...> - 2012-01-01 14:27:01
|
On Sun, Jan 1, 2012 at 5:39 AM, jerome <rom...@ya...> wrote:
> In fact, not everything is positive with this backport...
> I just noticed that sorting handles might ignore records without an ID value!

The changes for this fix only sort records on their handles, so they shouldn't have any side effects, and they have nothing to do with Gramps IDs. I've reopened 4365 and put a patch there to make gramps33 idempotent:

http://www.gramps-project.org/bugs/view.php?id=4365

> This type of record cannot be sorted by ID and seems to be ignored by the 'idempotent' version (trunk or patched); this causes my TypeError after an XML import...
>
> http://www.gramps-project.org/bugs/view.php?id=5466

I'll try the sample database there, but I suspect there is a problem with your local changes. Try the above patch and see if you still have a problem. I'm not sure it is worth back-porting to Gramps 3.2, but we can if you think so. You can attach your patch for 3.2 there.

-Doug
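Doug's point that the fix sorts on handles only, never on IDs, can be illustrated with the standard library's ElementTree. This is a sketch under stated assumptions: the first record is Jérôme's sample, the second person and its handle are hypothetical, and the wrapping <people> element is just a minimal container, not a complete Gramps XML file. Because the sort key is the handle attribute, an empty id="" plays no part in the ordering and cannot cause a record to be dropped.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<people>
  <person handle="_baee4b3d0e20f7b3413" change="1325353567" id="">
    <gender>U</gender>
    <name type="Birth Name">
    </name>
  </person>
  <person handle="_a0000000000000000001" change="1325353000" id="I0001">
    <gender>M</gender>
  </person>
</people>"""


def sorted_by_handle(xml_text):
    """Return person handles in export order when sorting on the handle
    attribute only; the id attribute is never consulted."""
    root = ET.fromstring(xml_text)
    people = root.findall("person")
    return [p.get("handle") for p in sorted(people, key=lambda p: p.get("handle"))]


if __name__ == "__main__":
    print(sorted_by_handle(SAMPLE))
    # ['_a0000000000000000001', '_baee4b3d0e20f7b3413']
```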
From: Nick H. <nic...@ho...> - 2012-01-01 15:27:04
|
Doug,

I have done a couple of tests, with a small database and a medium-sized database.

In both cases the export takes longer than before.

With the small database the file size actually increased. With the medium-sized database the size decreased.

The additional export time is no problem for me with a small database. Perhaps someone with a very large database should also do some testing?

Nick.
From: Doug B. <dou...@gm...> - 2012-01-01 16:42:43
|
On Sun, Jan 1, 2012 at 10:26 AM, Nick Hall <nic...@ho...> wrote:
> In both cases the export takes longer than before.

I've done some timing tests on sorting handles, and it should add between 0.000029 and 0.000039 seconds per handle to the XML export, testing between 1,000 and 1,000,000 handles. I didn't test gzip compression time.

> With the small database the file size actually increased. With the
> medium sized database the size decreased.

I suspect that any rearrangement of ZIPped data will have some impact on the size, but it shouldn't vary by more than a few bytes per record up or down.

> Perhaps someone with a very large database should also do some testing?

Anyone running trunk for the last couple of months has been testing this. It shouldn't use any additional memory beyond what sorted() uses. It only sorts lists of handles. By contrast, it would have had a large memory impact if it were sorting cursors (e.g., turning enumerators into lists), but that is not the case.

It will be good to test this thoroughly in trunk to better understand any impact. But this should have no connection to the issues Jerome has mentioned about missing XML data.

-Doug
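A per-handle timing like Doug's can be approximated with timeit. This sketch assumes hypothetical random hex handles and measures only the sorted() call on an in-memory list, not a real Gramps export; absolute numbers depend entirely on the machine and Python version, and since sorting is O(n log n) the per-handle cost grows slowly with n.

```python
import random
import timeit


def per_handle_sort_cost(n, repeat=3, seed=0):
    """Best-of-`repeat` seconds per handle to sort n random handle strings.
    The handles are hypothetical stand-ins for Gramps handles."""
    rng = random.Random(seed)
    handles = ["_%018x" % rng.getrandbits(72) for _ in range(n)]
    best = min(timeit.repeat(lambda: sorted(handles), number=1, repeat=repeat))
    return best / n


if __name__ == "__main__":
    for n in (1000, 100000):
        print(n, "handles: %.2e s per handle" % per_handle_sort_cost(n))
```

Only the list of handle strings is sorted here, which mirrors Doug's remark that the fix sorts handle lists rather than materializing cursors.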
From: Nick H. <nic...@ho...> - 2012-01-01 17:05:10
|
Doug,

I don't have any problem with the small additional export time or the different file size. Jerome made an interesting observation, and I thought it was worth doing some tests of my own. I quite often import into trunk, but don't export much.

As you say, you are only sorting lists of handles, so the patch is safe. The extra convenience of a sorted XML file will be useful.

Nick.
From: jerome <rom...@ya...> - 2012-01-01 17:24:09
|
> But this should have no connection to issues that Jerome has
> mentioned about missing XML data.

+1!!! I can reproduce it on trunk by importing a Gramps XML file that was exported without sorted handles. Sorting just helped me track down this issue. :)

It is much closer to:

* 5047: When attempting backup old 3.2.0 database from 3.3.0 received UE
  http://www.gramps-project.org/bugs/view.php?id=5047
* 4661: Check and repair the database after import
  http://www.gramps-project.org/bugs/view.php?id=4661

than to sorted handles!

About performance, I wonder how it is possible to get better compression than with the '$ gzip -9' flag on the command line, where -9 is the 'compress better' option: the XML file generated by Gramps is more compressed than the one produced by gzip with this flag! My data is around 5 MB uncompressed, and the difference is already noticeable.

--- On Sun 1.1.12, Doug Blank <dou...@gm...> wrote:

> From: Doug Blank <dou...@gm...>
> Subject: Re: [Gramps-devel] Compression rate on compressed .gramps ...
> To: "Nick Hall" <nic...@ho...>
> Cc: gra...@li...
> Date: Sunday 1 January 2012, 17:42
> On Sun, Jan 1, 2012 at 10:26 AM, Nick Hall <nic...@ho...> wrote:
> > Doug,
> >
> > I have done a couple of tests, with a small database and a medium-sized
> > database.
> >
> > In both cases the export takes longer than before.
>
> I've done some timing tests on sorting handles, and it should add
> between 0.000029 and 0.000039 seconds per handle to the XML export
> process, testing between 1,000 and 1,000,000 handles. I didn't test
> gzip compression time.
>
> > With the small database the file size actually increased. With the
> > medium-sized database the size decreased.
>
> I suspect that any rearrangement of zipped data will have some impact
> on the size, but it shouldn't vary by more than a few bytes per record,
> up or down.
>
> > The additional time for the export is no problem for me with a small
> > database. Perhaps someone with a very large database should also do
> > some testing?
>
> Anyone running trunk for the last couple of months has been testing
> this. It shouldn't use any additional memory, other than what sorted()
> uses. It only sorts lists of handles. For example, it would have had a
> large memory impact if it was sorting cursors (e.g., turning enumerators
> into lists), but that is not the case.
>
> It will be good to test this thoroughly in trunk and better know any
> impact. But this should have no connection to issues that Jerome has
> mentioned about missing XML data.
>
> -Doug
|
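Doug's per-handle cost can be checked roughly with a micro-benchmark like the one below. It is a minimal sketch under stated assumptions: the handles are synthetic strings shaped like the `_baee4b3d0e20f7b3413` handle quoted in this thread ("_" plus 19 hex characters), `make_handles` and `seconds_per_handle` are hypothetical helpers, and only Python's built-in `sorted()` is timed, not the real Gramps XML exporter or the gzip step.

```python
import random
import time

HEX = "0123456789abcdef"


def make_handles(n, seed=0):
    """Return n hypothetical handles shaped like '_baee4b3d0e20f7b3413'.

    The '_' + 19 hex chars format is an assumption for illustration only.
    """
    rng = random.Random(seed)
    return ["_" + "".join(rng.choice(HEX) for _ in range(19)) for _ in range(n)]


def seconds_per_handle(n, repeats=5):
    """Best-of-`repeats` wall time of sorted() on n handles, divided by n."""
    handles = make_handles(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        sorted(handles)  # new list of references; the strings are not copied
        best = min(best, time.perf_counter() - start)
    return best / n


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>9,} handles: {seconds_per_handle(n):.2e} s per handle")
```

Results depend on the machine and Python version, so they will not reproduce Doug's figures exactly. Since `sorted()` only builds a new list of references to the existing handle strings, its extra memory is one list of pointers, which fits Doug's remark that only lists of handles are sorted.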
From: jerome <rom...@ya...> - 2012-01-01 15:27:06
|
Doug,

Maybe I did something wrong while patching (though patching should have reported an error), but I have several family trees[1] and this occurs only with my primary one, which is the most complex.

> The changes for this fix only sort records on their handles, so it
> shouldn't have any side effects, and doesn't have anything to do with
> gramps IDs.

Then this might be related to a merge done under Gramps 3.2.6, i.e. a handle in a person reference. I wonder how I got this broken record/relation, and how it was cleaned up under Gramps 3.3.x.

I thought that without an ID value the sort was incomplete and the export skipped this record. After an import, sorting by ID (and maybe by other column headers too) crashed because Gramps could not find this record/handle, so it returned an error. What was also strange is that Gramps seems to silently work around this problem (once I get the error), and running the 'Check and Repair' tool fixes it.

> You can attach your patch for 3.2 there.

I used one very close to the 3.3 patch (without tags). As merging of person objects was improved in 3.3, I suppose this issue with the person reference record was caused by the previous method (3.2.x)?

I had some 'special records', stored as keys in attribute objects (with no value), in my primary family tree under 3.2.x[1]. Nick sent me a tool which moved these keys (attributes) to the new tag objects (3.3.x). //If needed, I updated the tool with the new 'DB transaction' commit.//

Now I am looking at my new and modified records since last year's migration (3.2.x -> 3.3.x). Ignoring markup/tags, timestamps, last names/surnames, etc., I only get a crash when importing an XML file generated by my 'patched' Gramps 3.2.6 (whose patch is the same as your attached one, except for line numbers). It was easy to clean this up and to make a quick diff between the standard file and the one generated by the patched version. Note that 'Check and Repair' also fixed this broken reference.

Maybe there was a missing back-reference check when persons were merged in Gramps 3.2.x, or maybe I removed the associated person (personref handle) in Gramps 3.2.6?

Finally, I came to the conclusion that I can lose the 'broken' association value by exporting idempotent XML (sorted handles) under Gramps 3.2.6. This could also happen with sourceref, eventref, etc. I will try to reproduce it from 3.3.x to trunk, but it might already be fixed.

[1] http://gramps-project.org/wiki/index.php?title=User:Romjerome

Thanks!

Jérôme

--- On Sun 1.1.12, Doug Blank <dou...@gm...> wrote:

> From: Doug Blank <dou...@gm...>
> Subject: Re: [Gramps-devel] Compression rate on compressed .gramps ...
> To: "jerome" <rom...@ya...>
> Cc: gra...@li...
> Date: Sunday 1 January 2012, 15:26
> On Sun, Jan 1, 2012 at 5:39 AM, jerome <rom...@ya...> wrote:
> > In fact, not everything is positive with this backport...
> > I just noticed that sorting handles might skip records without an id value!
>
> The changes for this fix only sort records on their handles, so it
> shouldn't have any side effects, and doesn't have anything to do with
> gramps IDs. I've reopened 4365, and put a patch there for gramps33 to
> be idempotent:
>
> http://www.gramps-project.org/bugs/view.php?id=4365
>
> > It is very specific and cannot happen with normal use, but it should be
> > a bug on current trunk! It is related to the fix for idempotent XML
> > import/export. Nothing is really wrong, it is just a pointer issue. The
> > current stable version keeps something close to a 'fake' record:
> >
> > <person handle="_baee4b3d0e20f7b3413" change="1325353567" id="">
> >   <gender>U</gender>
> >   <name type="Birth Name">
> >   </name>
> > </person>
> >
> > This type of record cannot be sorted by id and seems to be ignored by
> > the 'idempotent' version (trunk or patched); this generates my
> > TypeError after an XML import...
> >
> > http://www.gramps-project.org/bugs/view.php?id=5466
>
> I'll try the sample database there, but I suspect that there is a
> problem with your local changes. Try the above patch, and see if you
> have a problem. I'm not sure it is worth back-porting to Gramps 3.2,
> but we can if you think so. You can attach your patch for 3.2 there.
>
> -Doug
>
> >
> > PS: I made two backports (3.2.6 and 3.3.2SVN).
> > This problem was between 3.2.6 and 3.3.2SVN but should be fixed on trunk!!!
> >
> > Jérôme
> >
> > --- On Sat 31.12.11, Doug Blank <dou...@gm...> wrote:
> >
> >> From: Doug Blank <dou...@gm...>
> >> Subject: Re: [Gramps-devel] Compression rate on compressed .gramps ...
> >> To: "jerome" <rom...@ya...>
> >> Cc: gra...@li...
> >> Date: Saturday 31 December 2011, 16:25
> >> On Sat, Dec 31, 2011 at 9:23 AM, jerome <rom...@ya...> wrote:
> >> >> I just did the same experiment, and I get the opposite result: the
> >> >> sorted handles version is smaller by 20K.
> >> >
> >> > Oh, but this is the same result!
> >>
> >> Interesting! I wonder if that is consistent for all Gramps XML
> >> ordered-handle compressions. I'll be teaching the PKZIP algorithm next
> >> semester, and I'll look into this in more detail. I could imagine that
> >> certain sets of handles would not necessarily be better, but I see
> >> that if the handles were in order, and had many chars in common,
> >> it would have some benefit. For me, it was about 3 bytes per record.
> >>
> >> -Doug
|
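Two claims in this exchange can be illustrated with a toy model. First, sorting only on handles makes the output independent of storage order, and a record with `id=""` is still written. Second, ordered handles that share leading characters tend to compress better. The sketch below is hypothetical throughout: `make_people`, `export_xml` and `gz_size` are not Gramps's ExportXml.py code, and the one-line `<person>` records and the "_" + 19 hex character handles are assumptions modeled on the record quoted above. The data is synthetic random hex, so the size difference need not carry over to real family trees. Nick, for example, saw a small database grow.

```python
import gzip
import random

HEX = "0123456789abcdef"


def make_people(n, seed=0):
    """Build n hypothetical person records as {handle: gramps_id}.

    Handle shape ('_' + 19 hex chars) is an assumption for illustration.
    """
    rng = random.Random(seed)
    people = {}
    while len(people) < n:
        handle = "_" + "".join(rng.choice(HEX) for _ in range(19))
        people[handle] = "I%04d" % len(people)
    return people


def export_xml(people, sort_handles=True):
    """Serialize a toy <people> section (hypothetical, not ExportXml.py).

    With sort_handles=True, records are written in handle order, so the
    output does not depend on the order in which records are stored.
    Only handles are sorted; an empty id="" plays no part in the order.
    """
    handles = sorted(people) if sort_handles else list(people)
    body = "".join(
        '  <person handle="%s" id="%s"/>\n' % (h, people[h]) for h in handles
    )
    return ("<people>\n" + body + "</people>\n").encode("utf-8")


def gz_size(data):
    """Size in bytes after gzip at level 9 (comparable to `gzip -9`)."""
    return len(gzip.compress(data, compresslevel=9))


if __name__ == "__main__":
    people = make_people(5000)
    people["_baee4b3d0e20f7b3413"] = ""  # a record with an empty id

    # Same records, stored in a different order.
    items = list(people.items())
    random.Random(1).shuffle(items)
    shuffled = dict(items)

    print("identical output:", export_xml(people) == export_xml(shuffled))
    print("sorted   gz bytes:", gz_size(export_xml(shuffled)))
    print("unsorted gz bytes:", gz_size(export_xml(shuffled, sort_handles=False)))
```

With handles in order, each record's deflate back-reference to the previous record can extend a few characters into the handle, because neighbouring handles share a prefix. This is the mechanism Doug describes. Real exports carry much more data per record, so the relative gain there should be smaller.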