From: Benny M. <ben...@gm...> - 2011-01-15 13:44:12
We should _never_ sort on export. We should only access things via an index in the database. Sorting would mean a huge time penalty on export for those with very large family trees. Even exporting along a bsddb index would be much slower, since we would then jump from database page to database page. Just looping over the data and exporting means the hard disk is read the least (it goes from one database page to the next).

In other words:

1/ The default should be just a cursor over the database table, so order cannot be maintained.
2/ Ordered output could be optional.

If we add ordered output, it should run along an index page of the database, so that no in-memory sorting has to occur before the export can be done. I think ID has a sorted index over it. Handle normally does too, as it is the primary key and will hence be in some sort of B-tree. You must be sure to use the sorted index when looping, however.

Benny

2011/1/15 Jérôme <rom...@ya...>

> > if the round-trip through gramps was idempotent, then the diff would be
> > empty.
>
> Expected result was: minor changes in the generation date (if generated on
> another day) and maybe in media objects (media paths).
>
> I do not expect full idempotence after a round-trip, but currently we
> cannot easily get the differences. I just wanted to test the complete XML
> migration before the major release.
>
>
> Jérôme
>
>
> Doug Blank wrote:
> > On Fri, Jan 14, 2011 at 4:31 PM, jerome <rom...@ya...> wrote:
> >>>> gramps ids could be exotic!
> >>> Do you mean unique? Anyway it is a good sort-key candidate
> >> ids = [I000001, IAYUTRE235, zharb, /empty/ , etc ...]
> >>
> >> In 'handle' I trust! ;)
> >>
> >>>> Every time I import a Gramps XML, Gramps rebuilds (write, DB commit)
> >>>> some objects! Change time is not the same with a simple import then
> >>>> export.
> >>> Well, they all need new handles, right? Possibility of collisions.
> >>> Also with gramps ids.
> >> In fact, I want to keep handles: they should be the keys control.
> >>
> >> My problem could be illustrated by something like:
> >>
> >> $ gramps -i import.gramps -e export.gramps
> >> $ gunzip < import.gramps > import.xml
> >> $ gunzip < export.gramps > export.xml
> >> $ diff -u import.xml export.xml > diff.txt
> >>
> >> where import.gramps is our "Scientific control".
> >>
> >> What should be the content of diff.txt ?
> >>
> >> For me, it should be a few lines...
> >> Unfortunately there is some change (order, change time on family
> >> objects): that's strange!
> >
> > Yes, it would be handy to do this. This might be called "idempotent"
> > by a mathematician: if the round-trip through gramps was idempotent,
> > then the diff would be empty.
> >
> > What we need is:
> >
> > 1. something smarter than diff for this usage
> > 2. sort on something that doesn't change (like the handle), just for
> >    this purpose
> > 3. make it so that the order is preserved
> >
> > I would lean towards #3. I've "fixed" some other places where the
> > order was lost. If you let me know which orders are lost, I'll
> > address.
> >
> > -Doug
> >
> >> Jérôme
> >>
> >>
> >> --- On Fri 14.1.11, Gerald Britton <ger...@gm...> wrote:
> >>
> >>> From: Gerald Britton <ger...@gm...>
> >>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
> >>> To: "jerome" <rom...@ya...>
> >>> Cc: gra...@li...
> >>> Date: Friday, January 14, 2011, 22:10
> >>> On Fri, Jan 14, 2011 at 3:59 PM, jerome <rom...@ya...> wrote:
> >>>>>> I am not certain to understand ...
> >>>>>> Keys should be handles, no ?
> >>>>> Well, that's the question! I can see a case for gramps ids, or
> >>>>> surnames, or event dates, etc. etc.
> >>>> But handle is the easiest way and safe key for ordering our data.
> >>>
> >>> Only if that's the order you want
> >>>
> >>>> gramps ids could be exotic!
> >>> Do you mean unique?
> >>> Anyway it is a good sort-key candidate
> >>>
> >>>> surnames is not a good key :(
> >>> I can see that some would like it...makes the XML easier to read by
> >>> a human
> >>>
> >>>> date => date_object => year, then month, then day, then rank,
> >>>> etc ... = horrible index
> >>>
> >>> Probably, but it's just one possibility
> >>>
> >>>> My problem is on plugins/export/ExportXML.py
> >>>>
> >>>> I saw a sortByID function not used, then sometimes the use of list
> >>>> (get_...), then iteration (only family handles).
> >>>> I thought of using lists sorted by handle to have an ordering rule.
> >>>> I do not want to group handles, handles will be grouped into the
> >>>> Gramps XML, so it was not planned to parse one flat XML file or
> >>>> something like that!
> >>>> But it is not my main problem ...
> >>>> I thought that to sort handles means objects lists will be
> >>>> consistent (Persons, Families, Events, etc ...)
> >>>> Every time I import a Gramps XML, Gramps rebuilds (write, DB commit)
> >>>> some objects! Change time is not the same with a simple import then
> >>>> export.
> >>>
> >>> Well, they all need new handles, right? Possibility of collisions.
> >>> Also with gramps ids.
> >>>
> >>>> I can understand the random order used by bsddb, but this should not
> >>>> be done on some objects (like family) and not on the others.
> >>>> In my mind, an import without DB change is like a "read-only": it is
> >>>> not the case. OK, you are saying that it is the way used by bsddb.
> >>>> XML files should be able to use 'diff' or revision control tools.
> >>>> With current Gramps XML import/export, these tools are limited. :(
> >>>
> >>> Yep. You're probably looking for something like a UUID for each
> >>> record. Not a bad idea but not implemented at the moment.
> >>>
> >>>>
> >>>> Jérôme
> >>>>
> >>>>
> >>>> --- On Fri 14.1.11, Gerald Britton <ger...@gm...> wrote:
> >>>>> From: Gerald Britton <ger...@gm...>
> >>>>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
> >>>>> To: "jerome" <rom...@ya...>
> >>>>> Cc: gra...@li...
> >>>>> Date: Friday, January 14, 2011, 21:21
> >>>>> On Fri, Jan 14, 2011 at 3:11 PM, jerome <rom...@ya...> wrote:
> >>>>>> I am not certain to understand ...
> >>>>>> Keys should be handles, no ?
> >>>>> Well, that's the question! I can see a case for gramps ids, or
> >>>>> surnames, or event dates, etc. etc.
> >>>>>
> >>>>>> 'self.db.get_{object}_handles(sort_handles=True)' is allowed,
> >>>>>> not 'self.db.iter_{object}_handles(sort_handles=True)'!
> >>>>>> There are two questions:
> >>>>>>
> >>>>>> 1. Why does Gramps only use self.db.iter_family_handles(), else
> >>>>>> self.get_{object}_handles(), where {object} is person or event or
> >>>>>> source or place or repository or note or media object.
> >>>>>
> >>>>> the get_...handles methods return a list, which can be expensive in
> >>>>> memory and must read all objects in one pass. The iter... methods
> >>>>> just return one at a time, so are cheaper in memory. So, the iter...
> >>>>> methods are preferable. OTOH, they cannot do sorting, since by
> >>>>> definition you need to read all records before you can sort them.
> >>>>>
> >>>>>> 2. Why is the 'sort_handles=True' argument allowed on all primary
> >>>>>> objects except the family object ?
> >>>>>
> >>>>> I suppose that there has been no requirement so far so no one coded
> >>>>> it up.
> >>>>>
> >>>>>>> The data is not ordered since it comes from bsddb in random order.
> >>>>>> This could explain why I will not be able to keep order on XML
> >>>>>> import (to bsddb).
> >>>>>> :(
> >>>>>>
> >>>>>> Thanks.
> >>>>>> Jérôme
> >>>>>>
> >>>>>> --- On Fri 14.1.11, Gerald Britton <ger...@gm...> wrote:
> >>>>>>> From: Gerald Britton <ger...@gm...>
> >>>>>>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
> >>>>>>> To: "jerome" <rom...@ya...>
> >>>>>>> Cc: gra...@li...
> >>>>>>> Date: Friday, January 14, 2011, 19:53
> >>>>>>> The data is not ordered since it comes from bsddb in random
> >>>>>>> order. If we ordered it, we would have to sort it by some key.
> >>>>>>> So, if we did, what keys would you use for:
> >>>>>>>
> >>>>>>> person
> >>>>>>> family
> >>>>>>> event
> >>>>>>> source
> >>>>>>> place
> >>>>>>> repository
> >>>>>>> note
> >>>>>>> media object
> >>>>>>>
> >>>>>>> On Fri, Jan 14, 2011 at 1:36 PM, jerome <rom...@ya...> wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I am trying to get an answer to a question about the code: why
> >>>>>>>> can we not keep the order of objects after a Gramps XML file
> >>>>>>>> import followed by an export ?
> >>>>>>>> Nick pointed out that objects are not ordered on export[1].
> >>>>>>>> Why ? I suppose backup scripts or revision control tools will
> >>>>>>>> work better with ordered objects! Anyway, to use
> >>>>>>>> 'sort_handles=True' works on export, except for family handles.
> >>>>>>>> Any reason for that ? A typo somewhere ? On my side ?
> >>>>>>>>
> >>>>>>>> [1] http://www.gramps-project.org/bugs/view.php?id=4365
> >>>>>>>>
> >>>>>>>> regards,
> >>>>>>>> Jérôme
> >>>>>>>>
> >>>>>>>> ------------------------------------------------------------------------------
> >>>>>>>> Protect Your Site and Customers from Malware Attacks
> >>>>>>>> Learn about various malware tactics and how to avoid them.
> >>>>>>>> Understand malware threats, the impact they can have on your
> >>>>>>>> business, and how you can protect your company and customers by
> >>>>>>>> using code signing.
> >>>>>>>> http://p.sf.net/sfu/oracle-sfdevnl
> >>>>>>>> _______________________________________________
> >>>>>>>> Gramps-devel mailing list
> >>>>>>>> Gra...@li...
> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
> >>>>>>>
> >>>>>>> --
> >>>>>>> Gerald Britton
> >>>>>
> >>>>> --
> >>>>> Gerald Britton
> >>>
> >>> --
> >>> Gerald Britton
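
To make the trade-off discussed in this thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `ToyTable`, `iter_cursor`, `iter_by_index` and `export` are hypothetical names invented for this example, not the Gramps or bsddb API. A plain dict stands in for the table, and a list kept sorted at write time stands in for a B-tree index. The default path streams records in storage order; the optional ordered path follows the existing index, so no sort over the whole data set is needed before the export starts.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


class ToyTable:
    """Hypothetical stand-in for one database table (not the Gramps API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}  # handle -> record, in storage order
        self._index: List[str] = []      # handles kept sorted at write time

    def add(self, handle: str, record: str) -> None:
        if handle not in self._rows:
            bisect.insort(self._index, handle)  # maintain the sorted index
        self._rows[handle] = record

    def iter_cursor(self) -> Iterator[Tuple[str, str]]:
        """Default path: yield one record at a time, in storage order."""
        yield from self._rows.items()

    def iter_by_index(self) -> Iterator[Tuple[str, str]]:
        """Optional ordered path: follow the index, no sort step at export."""
        for handle in self._index:
            yield handle, self._rows[handle]


def export(rows: Iterator[Tuple[str, str]]) -> List[str]:
    """Toy exporter; collects lines only so the result is easy to inspect."""
    return [f"{handle}: {record}" for handle, record in rows]


table = ToyTable()
for handle, name in [("c3", "Carol"), ("a1", "Alice"), ("b2", "Bob")]:
    table.add(handle, name)

unordered = export(table.iter_cursor())   # storage order
ordered = export(table.iter_by_index())   # handle order, via the index
```

In this toy the cost of ordering is paid once per write (`bisect.insort`) rather than on every export, which is the point made above about looping along an existing index. A real exporter would write each line out as it is produced instead of collecting a list.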