From: doug <do...@o2...> - 2011-06-08 16:44:22
Can some attention be given to the Find Possible Duplicate People tool as part of the way forward?

Doug

On 08/06/11 16:30, Michiel Nauta wrote:
> I have added a patch that keeps the handles of the data that is imported
> when the family tree is empty (revision 17715/6). I think this will do
> for the 3.3 release. I agree, the way forward is to promote the handle
> to UID and get the URLs from that. I'll leave that for version 3.4.
>
> Michiel
>
> On 06/08/2011 12:47 PM, Doug Blank wrote:
>> On Jun 8, 2011, at 6:41 AM, Doug Blank <dou...@gm...> wrote:
>>
>>> If NarrativeWeb is the only issue, let's go ahead and move the handles to a UID attribute and get the URL from there. This is trunk; gramps33 should not change handles unless that is required to prevent collisions.
>>>
>>> On import, if an object does not set UID, then we can use the handle, if given. Else, use the newly created handle. That way old XML files will get correct URLs in NarrativeWeb.
>>>
>>
>> Rereading Benny's response, there is an issue here: old XML files will get UIDs from handles, but newer ones will get real UIDs. I think that is OK.
>>
>> -Doug
>>
>>> There will possibly be more than one UID, so one will need to be primary.
>>>
>>> How does that sound?
>>>
>>> -Doug
>>>
>>> Sent from my iPad
>>>
>>> On Jun 8, 2011, at 3:01 AM, Michiel Nauta <m.d...@he...> wrote:
>>>
>>>> Keeping handles when importing in an empty database was also the direction I was thinking of. I'll look into this.
>>>>
>>>> That said, using the keys of your database for anything else than internal database management is plain stupid!
>>>>
>>>> Michiel
>>>>
>>>> On 06/08/2011 07:49 AM, Benny Malengier wrote:
>>>>> 2011/6/7 Doug Blank <dou...@gm...>
>>>>>
>>>>>> On Tue, Jun 7, 2011 at 3:26 PM, Michiel Nauta <m.d...@he...> wrote:
>>>>>>> Hi Doug,
>>>>>>>
>>>>>>> On 06/06/2011 03:27 PM, Doug Blank wrote:
>>>>>>>>
>>>>>>>> Michiel,
>>>>>>>>
>>>>>>>> Thanks for the work on making it so that we don't corrupt data on
>>>>>>>> re-import of XML. (That is what your work does, right?)
>>>>>>>
>>>>>>> Yes, that is right.
>>>>>>>
>>>>>>>>
>>>>>>>> Once 3.3 is out, and we begin thinking about 3.4, this would be a good
>>>>>>>> point to think about two possible next steps:
>>>>>>>>
>>>>>>>> 1) the infrastructure for keeping track of UID sets (handles)
>>>>>>>> 2) a two-pass importer
>>>>>>>> 2a) ability to automatically/semi-automatically merge imported data
>>>>>>>> with existing data on import
>>>>>>>>
>>>>>>>> Currently we are in the position that if we have a duplicate handle,
>>>>>>>> we choose to throw it away in favor of a new handle, so as not to
>>>>>>>> corrupt data (correct?). Then we are left with trying to merge using
>>>>>>>> the old value-based methods (which, as an aside, also need to be
>>>>>>>> improved).
>>>>>>>
>>>>>>> Perhaps this is a detail, but at present we throw away all imported handles,
>>>>>>> not just the duplicate ones.
>>>>>>
>>>>>> This might not be good. I think some people have come to expect that
>>>>>> the NarrativeWeb pages will keep the same URLs over time. The URLs are
>>>>>> based on the objects' handles, if I understand it correctly.
>>>>>>
>>>>>> Perhaps we should keep them the same, if not duplicate?
>>>>>>
>>>>>
>>>>> Ai,
>>>>>
>>>>> I did not take this into account. Not too many people will do export and
>>>>> then import, but it will happen from time to time, and then the website
>>>>> addresses will be completely different...
>>>>>
>>>>> So, should we redesign this part?
>>>>> E.g., import in an empty family tree keeps
>>>>> the handles, and otherwise it is a real import and we discard the old
>>>>> handles? Michiel, is this easy to achieve? I would suppose so, as the -d flag
>>>>> already keeps handles.
>>>>>
>>>>> Benny
>>>>>
>>>>>
>>>>>>
>>>>>>> Everything imported gets a new handle. As Benny
>>>>>>> pointed out in a mail more than a month ago, Gramps will act strangely when
>>>>>>> two objects of different types (e.g. Person and Event) have the same handle.
>>>>>>
>>>>>> Yes, but this will be a very rare occurrence, correct? Chances are
>>>>>> there will be no collisions.
>>>>>>
>>>>>>> So in order to verify that an imported handle is not in use by the database
>>>>>>> already, you have to check the handle for all primary object types. I
>>>>>>> thought that would be too time-consuming, so that is why I stick a new
>>>>>>> handle on anything that gets imported, indeed to prevent data corruption.
>>>>>>
>>>>>> Well, this is going to get slower anyway, if we go to a two-pass system.
>>>>>>
>>>>>>>> The first step might be to not throw the import handle away, but
>>>>>>>> somehow save it for later use. The connection between an old handle
>>>>>>>> and a new handle is a relationship (a set) that associates these
>>>>>>>> handles with the objects that they map to. This might be a temporary
>>>>>>>> state right after an import.
>>>>>>>>
>>>>>>>> The second step might be a (semi-)automatic merge. The idea would be to
>>>>>>>> take the old object and new object and (based on the modified dates)
>>>>>>>> figure out which is an edit, delete, or insert. However, if the
>>>>>>>> objects are not immediately merged, then we may want to save the
>>>>>>>> handle set for later use.
>>>>>>>>
>>>>>>>> Do you have ideas on next steps?
>>>>>>>>
>>>>>>>> -Doug
>>>>>>>>
>>>>>>>
>>>>>>> I haven't thought much about what to do after the 3.3 release. The merge on
>>>>>>> import problem seems pretty scary.
>>>>>>> I think it would be good if it could be
>>>>>>> split into smaller problems. Your observation that users might not want to
>>>>>>> merge immediately after import, but perhaps somehow want to preserve the
>>>>>>> relation between two objects for later use, is new to me. What springs to
>>>>>>> mind is the data model the "BetterGedcom people" seem to be working on,
>>>>>>> where instead of recording conclusions as we do now in Gramps, one tries to
>>>>>>> record the actual research process, resulting in different representations
>>>>>>> of the same person that somehow "make up" that person. I imagine that with
>>>>>>> such a data model it would be easier to postpone a possible merge till
>>>>>>> later. Trying to include this in the merge on import problem for the next
>>>>>>> release just seems too big a project to me.
>>>>>>>
>>>>>>> Isn't the next logical step to present the list of possible matching ids, so
>>>>>>> what is now shown in the "Import statistics" window, in some user interface
>>>>>>> that makes it possible to execute the actual merge? If that works, extend it
>>>>>>> with a merge on matching ID values instead of handles, and finally a match
>>>>>>> based on actual data (name+birth+family relations).
>>>>>>
>>>>>> Yes, that does sound like a good step, if necessary. If the data are
>>>>>> exactly the same, there is no need for the dialog.
>>>>>>
>>>>>> -Doug
>>>>>>
>>>>>>> Michiel
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> EditLive Enterprise is the world's most technically advanced content
>>>>>> authoring tool. Experience the power of Track Changes, Inline Image
>>>>>> Editing and ensure content is compliant with Accessibility Checking.
>>>>>> http://p.sf.net/sfu/ephox-dev2dev
>>>>>> _______________________________________________
>>>>>> Gramps-devel mailing list
>>>>>> Gra...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/gramps-devel
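
For illustration only, the import rule proposed in the thread (use the UID if the file sets one, else the handle if given, else the newly created handle) and the collision check across all primary object types could be sketched roughly as below. This is not Gramps code: ImportedObject, choose_uid, handle_in_use and the shape of the tables argument are hypothetical names made up for this sketch, and plain dictionaries stand in for the database tables.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ImportedObject:
    """Hypothetical stand-in for an object read from an XML file."""
    uid: Optional[str] = None      # explicit UID stored in the file, if any
    handle: Optional[str] = None   # handle stored in the file, if any


def choose_uid(obj: ImportedObject, new_handle: str) -> str:
    """Pick a UID in the order of preference proposed in the thread:
    the UID set in the file, then the handle given in the file,
    then the handle newly created during import."""
    if obj.uid:
        return obj.uid
    if obj.handle:
        return obj.handle
    return new_handle


def handle_in_use(handle: str, tables: Mapping[str, Mapping[str, object]]) -> bool:
    """Return True if any primary object type already uses this handle.

    `tables` maps a type name (e.g. "person", "event") to a mapping keyed
    by handle; this layout is an assumption of the sketch.
    """
    return any(handle in table for table in tables.values())
```

With this sketch, an old XML file without UIDs falls back to its stored handle, so URLs derived from the UID would stay the same across export and re-import. Checking every table for each imported handle is the cost Michiel describes as potentially too time-consuming.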