From: Gerald B. <ger...@gm...> - 2011-01-15 15:03:40
Agreed. If the export is in handle order we should be fine.
Re-importing, though, can generate new handles, can it not? If so, we
lose idempotency, which is jerome's issue, I think.

On 1/15/11, Benny Malengier <ben...@gm...> wrote:
> We should _never_ order on export.
> We should only access things via an index in the database.
>
> Ordering would mean a huge time penalty on exporting for those with very
> large family trees.
> Even exporting along a bsddb index would be much slower, as now we go from
> database page to database page.
>
> Just looping over the data and exporting means the harddisk is the least
> read (it goes from database page to database page).
>
> In other words:
> 1/ default should be just a cursor of the database table, so order cannot be
> maintained
> 2/ ordered output could be optional. If we add an ordered output, it should
> be along an index page of the database, so no in memory sorting must occur
> before export can be done. I think ID has a sorted index over it. Handle
> normally also, as it is the primary key, and will hence be in some sort of
> B-tree. You must be sure to use the sort index on looping however.
>
> Benny
>
> 2011/1/15 Jérôme <rom...@ya...>
>
>> > if the round-trip through gramps was idempotent, then the diff would be
>> > empty.
>>
>> Expected result was: minor change on date generation (if generated on
>> another day) and maybe media objects (media paths).
>>
>> I do not expect full idempotence after a round-trip, but currently we
>> cannot easily get the differences. I just wanted to test a complete XML
>> migration before the major release.
>>
>>
>> Jérôme
>>
>>
>> Doug Blank wrote:
>> > On Fri, Jan 14, 2011 at 4:31 PM, jerome <rom...@ya...> wrote:
>> >>>> gramps ids could be exotic!
>> >>> Do you mean unique? Anyway it is a good sort-key candidate
>> >> ids = [I000001, IAYUTRE235, zharb, /empty/ , etc ...]
>> >>
>> >> In 'handle' I trust! ;)
>> >>
>> >>>> Every time I import a Gramps XML, Gramps rebuilds (write, DB commit)
>> >>>> some objects! Change time is not the same with a simple import then
>> >>>> export.
>> >>> Well, they all need new handles, right? Possibility of collisions.
>> >>> Also with gramps ids.
>> >> In fact, I want to keep handles: they should be the keys control.
>> >>
>> >> My problem could be illustrated by something like:
>> >>
>> >> $ gramps -i import.gramps -e export.gramps
>> >> $ gunzip < import.gramps > import.xml
>> >> $ gunzip < export.gramps > export.xml
>> >> $ diff -u import.xml export.xml > diff.txt
>> >>
>> >> where import.gramps is our "Scientific control".
>> >>
>> >> What should be the content of diff.txt ?
>> >>
>> >> For me, it should be few lines...
>> >>
>> >> Unfortunately there is some change (order, change time on family
>> >> objects): that's strange!
>> >
>> > Yes, it would be handy to do this. This might be called "idempotent"
>> > by a mathematician: if the round-trip through gramps was idempotent,
>> > then the diff would be empty.
>> >
>> > What we need is:
>> >
>> > 1. something smarter than diff for this usage
>> > 2. sort on something that doesn't change (like the handle), just for
>> > this purpose
>> > 3. make it so that the order is preserved
>> >
>> > I would lean towards #3. I've "fixed" some other places where the
>> > order was lost. If you let me know which orders are lost, I'll
>> > address.
>> >
>> > -Doug
>> >
>> >> Jérôme
>> >>
>> >>
>> >> --- On Fri 14.1.11, Gerald Britton <ger...@gm...> wrote:
>> >>
>> >>> From: Gerald Britton <ger...@gm...>
>> >>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
>> >>> To: "jerome" <rom...@ya...>
>> >>> Cc: gra...@li...
>> >>> Date: Friday 14 January 2011, 22:10
>> >>> On Fri, Jan 14, 2011 at 3:59 PM, jerome <rom...@ya...> wrote:
>> >>>>>> I am not certain to understand ...
>> >>>>>> Keys should be handles, no ?
>> >>>>> Well, that's the question! I can see a case for gramps ids, or
>> >>>>> surnames, or event dates, etc. etc.
>> >>>> But handle is the easiest way and safe key for ordering our data.
>> >>>
>> >>> Only if that's the order you want
>> >>>
>> >>>> gramps ids could be exotic!
>> >>> Do you mean unique? Anyway it is a good sort-key candidate
>> >>>
>> >>>> surnames is not a good key :(
>> >>> I can see that some would like it...makes the XML easier to
>> >>> read by a human
>> >>>
>> >>>> date => date_object => year, then month, then day, then rank,
>> >>>> etc ... = horrible index
>> >>>
>> >>> Probably, but it's just one possibility
>> >>>
>> >>>> My problem is on plugins/export/ExportXML.py
>> >>>>
>> >>>> I saw a sortByID function not used, then sometimes the use of list
>> >>>> (get_...), then iteration (only family handles).
>> >>>> I thought of using lists sorted by handle to have an ordering rule.
>> >>>> I do not want to group handles, handles will be grouped into the
>> >>>> Gramps XML, so it was not planned to parse one flat XML file or
>> >>>> something like that!
>> >>>> But it is not my main problem ...
>> >>>> I thought that to sort handles means objects lists will be
>> >>>> consistent (Persons, Families, Events, etc ...)
>> >>>> Every time I import a Gramps XML, Gramps rebuilds (write, DB commit)
>> >>>> some objects! Change time is not the same with a simple import then
>> >>>> export.
>> >>>
>> >>> Well, they all need new handles, right? Possibility of collisions.
>> >>> Also with gramps ids.
>> >>>
>> >>>> I can understand the random order used by bsddb, but this should not
>> >>>> be done on some objects (like family) and not on the others.
>> >>>> In my mind, an import without DB change is like a "read-only": it is
>> >>>> not the case. OK, you are saying that it is the way used by bsddb.
>> >>>> XML files should be able to use 'diff' or revision control tools.
>> >>>> With current Gramps XML import/export, these tools are limited. :(
>> >>>
>> >>> Yep. You're probably looking for something like a UUID for each
>> >>> record. Not a bad idea but not implemented at the moment.
>> >>>
>> >>>>
>> >>>> Jérôme
>> >>>>
>> >>>>
>> >>>> --- On Fri 14.1.11, Gerald Britton <ger...@gm...> wrote:
>> >>>>> From: Gerald Britton <ger...@gm...>
>> >>>>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
>> >>>>> To: "jerome" <rom...@ya...>
>> >>>>> Cc: gra...@li...
>> >>>>> Date: Friday 14 January 2011, 21:21
>> >>>>> On Fri, Jan 14, 2011 at 3:11 PM, jerome <rom...@ya...> wrote:
>> >>>>>> I am not certain to understand ...
>> >>>>>> Keys should be handles, no ?
>> >>>>> Well, that's the question! I can see a case for gramps ids, or
>> >>>>> surnames, or event dates, etc. etc.
>> >>>>>
>> >>>>>> 'self.db.get_{object}_handles(sort_handles=True)' is allowed,
>> >>>>>> not 'self.db.iter_{object}_handles(sort_handles=True)'!
>> >>>>>> There are two questions:
>> >>>>>>
>> >>>>>> 1. Why does Gramps use self.db.iter_family_handles() only for
>> >>>>>> families, and self.get_{object}_handles() otherwise, where {object}
>> >>>>>> is person or event or source or place or repository or note or
>> >>>>>> media object?
>> >>>>>
>> >>>>> The get_...handles methods return a list, which can be expensive in
>> >>>>> memory and must read all objects in one pass. The iter... methods
>> >>>>> just return one at a time, so are cheaper in memory. So, the iter...
>> >>>>> methods are preferable. OTOH, they cannot do sorting, since by
>> >>>>> definition you need to read all records before you can sort them.
>> >>>>>
>> >>>>>> 2. Why is the 'sort_handles=True' argument allowed on all primary
>> >>>>>> objects except the family object ?
>> >>>>>
>> >>>>> I suppose that there has been no requirement so far so no one coded
>> >>>>> it up.
>> >>>>>
>> >>>>>>> The data is not ordered since it
>> >>>>>>> comes from bsddb in random order.
>> >>>>>> This could explain why I will not be able to keep order on XML
>> >>>>>> import (to bsddb). :(
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>> Jérôme
>> >>>>>>
>> >>>>>> --- On Fri 14.1.11, Gerald Britton <ger...@gm...> wrote:
>> >>>>>>> From: Gerald Britton <ger...@gm...>
>> >>>>>>> Subject: Re: [Gramps-devel] self.db.iter_object_handles(sort_handles=True)
>> >>>>>>> To: "jerome" <rom...@ya...>
>> >>>>>>> Cc: gra...@li...
>> >>>>>>> Date: Friday 14 January 2011, 19:53
>> >>>>>>> The data is not ordered since it comes from bsddb in random order.
>> >>>>>>> If we ordered it, we would have to sort it by some key. So, if we
>> >>>>>>> did, what keys would you use for:
>> >>>>>>>
>> >>>>>>> person
>> >>>>>>> family
>> >>>>>>> event
>> >>>>>>> source
>> >>>>>>> place
>> >>>>>>> repository
>> >>>>>>> note
>> >>>>>>> media object
>> >>>>>>>
>> >>>>>>> On Fri, Jan 14, 2011 at 1:36 PM, jerome <rom...@ya...> wrote:
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> I am trying to get an answer to a question about the code: why
>> >>>>>>>> can we not keep the order of objects between a Gramps XML file
>> >>>>>>>> import and export ?
>> >>>>>>>> Nick pointed out that objects are not ordered on export[1].
>> >>>>>>>> Why ? I suppose backup scripts or revision control tools will work
>> >>>>>>>> better with ordered objects! Anyway, using 'sort_handles=True'
>> >>>>>>>> works on export, except for family handles. Any reason for that ?
>> >>>>>>>> A typo somewhere ? On my side ?
>> >>>>>>>>
>> >>>>>>>> [1] http://www.gramps-project.org/bugs/view.php?id=4365
>> >>>>>>>>
>> >>>>>>>> regards,
>> >>>>>>>> Jérôme
>> >>>>>>>>
>> >>>>>>>> _______________________________________________
>> >>>>>>>> Gramps-devel mailing list
>> >>>>>>>> Gra...@li...
>> >>>>>>>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Gerald Britton
>> >>>>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Gerald Britton
>> >>>>>
>> >>>
>> >>> --
>> >>> Gerald Britton
>> >>>
>> >

--
Sent from my mobile device

Gerald Britton
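
To make the trade-off discussed in this thread concrete, here is a minimal sketch in Python. It is not Gramps code: ToyStore, export_lines and import_lines are hypothetical names invented for illustration, and a plain dict stands in for the bsddb table. It only shows why an iterator cannot sort while a list can, and why a handle-ordered export survives a round trip when the import keeps the handles.

```python
from typing import Dict, Iterator, List


class ToyStore:
    """Hypothetical handle -> record table; a stand-in, not the Gramps DB API."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = dict(records)

    def iter_handles(self) -> Iterator[str]:
        # One handle at a time, in storage order: cheap in memory, no ordering.
        yield from self._records

    def get_handles(self, sort_handles: bool = False) -> List[str]:
        # The whole list is built first, which is what makes sorting possible.
        handles = list(self._records)
        return sorted(handles) if sort_handles else handles

    def record(self, handle: str) -> str:
        return self._records[handle]


def export_lines(store: ToyStore, ordered: bool) -> List[str]:
    """Write one 'handle=record' line per object."""
    handles = store.get_handles(sort_handles=True) if ordered else store.iter_handles()
    return [f"{handle}={store.record(handle)}" for handle in handles]


def import_lines(lines: List[str]) -> ToyStore:
    """Rebuild a store, keeping the handles found in the file (an assumption)."""
    return ToyStore(dict(line.split("=", 1) for line in lines))


# Two stores with the same content but a different storage order.
first = ToyStore({"c3": "Family C", "a1": "Family A", "b2": "Family B"})
second = ToyStore({"b2": "Family B", "c3": "Family C", "a1": "Family A"})

exported = export_lines(first, ordered=True)
round_trip = export_lines(import_lines(exported), ordered=True)

# The ordered export is unchanged by the round trip.
print(exported == round_trip)
# The cursor-style export depends on storage order, so a diff would be noisy.
print(export_lines(first, ordered=False) == export_lines(second, ordered=False))
```

If the import generated new handles instead of keeping them, the ordered export would no longer match, which is the idempotency concern raised at the top of this message. Sorting in memory is also the cost Benny objects to for very large trees; walking an existing sorted index would avoid it, but that is not modelled here.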