From: Michiel N. <m.d...@he...> - 2011-06-07 19:26:31
Hi Doug,

On 06/06/2011 03:27 PM, Doug Blank wrote:
> Michiel,
>
> Thanks for the work on making it so that we don't corrupt data on
> re-import of XML. (That is what your work does, right?)

Yes, that is right.

> Once 3.3 is out, and we begin thinking about 3.4, this would be a good
> point to think about two possible next steps:
>
> 1) the infrastructure for keeping track of UID sets (handles)
> 2) a two-pass importer
> 2a) ability to automatically/semi-automatically merge imported data
> with existing data on import
>
> Currently we are in the position that if we have a duplicate handle,
> we choose to throw it away in favor of a new handle, so as not to
> corrupt data (correct?). Then we are left with trying to merge using
> the old value-based methods (which, as an aside, also needs to be
> improved).

Perhaps this is a detail, but at present we throw away all imported
handles, not just the duplicate ones. Everything imported gets a new
handle. As Benny pointed out in a mail more than a month ago, Gramps
will act strangely when two objects of different types (e.g. Person and
Event) have the same handle. So in order to verify that an imported
handle is not already in use by the database, you have to check the
handle against all primary object types. I thought that would be too
time-consuming, which is why I stick a new handle on anything that gets
imported, indeed to prevent data corruption.

> The first step might be to not throw the import handle away, but
> somehow save it for later use. The connection between an old handle
> and a new handle is a relationship (a set) that associates these
> handles with the objects that they map to. This might be a temporary
> state right after an import.
>
> The second step might be a semi/automatic merge. The idea would be to
> take the old object and new object and (based on the modified dates)
> figure out which is an edit, delete, or insert. However, if the
> objects are not immediately merged, then we may want to save the
> handle set for later use.
>
> You have ideas on next steps?
>
> -Doug
>

I haven't thought much about what to do after the 3.3 release. The
merge-on-import problem seems pretty scary. I think it would be good if
it could be split into smaller problems.

Your observation that users might not want to merge immediately after
import, but perhaps somehow want to preserve the relation between two
objects for later use, is new to me. What springs to mind is the data
model the "BetterGedcom people" seem to be working on, where instead of
recording conclusions as we do now in Gramps, one tries to record the
actual research process, resulting in different representations of the
same person that somehow "make up" that person. I imagine that with such
a data model it would be easier to postpone a possible merge till later.
Trying to include this in the merge-on-import problem for the next
release just seems too big a project to me.

Isn't the next logical step to present the list of possible matching
ids, i.e. what is now shown in the "Import statistics" window, in some
user interface that makes it possible to execute the actual merge? If
that works, extend it with a merge on matching ID values instead of
handles, and finally a match based on actual data (name + birth + family
relations).

Michiel
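
To make the handle discussion above concrete, here is a minimal Python
sketch. It is not Gramps code: the dict-of-dicts "database", the
PRIMARY_TYPES list and the function names are all hypothetical, and it
assumes old handles are unique within one import file. It shows the
check across all primary object types, the "fresh handle for everything"
approach, and keeping the old-to-new map instead of throwing it away.

```python
import uuid

# Hypothetical stand-in for the database: one dict of handle -> data per
# primary object type. The list of types is illustrative only.
PRIMARY_TYPES = ("person", "family", "event", "place",
                 "source", "media", "repository", "note")


def handle_in_use(db, handle):
    """Return True if any primary object type already uses this handle."""
    return any(handle in db[obj_type] for obj_type in PRIMARY_TYPES)


def import_objects(db, imported):
    """Store every imported object under a fresh handle.

    `imported` is a list of (obj_type, old_handle, data) tuples.
    Returns the old-handle -> new-handle map so that it can be kept for
    a later merge step.
    """
    handle_map = {}
    for obj_type, old_handle, data in imported:
        new_handle = uuid.uuid4().hex
        # A random collision is very unlikely, but the check is cheap here.
        while handle_in_use(db, new_handle):
            new_handle = uuid.uuid4().hex
        db[obj_type][new_handle] = data
        handle_map[old_handle] = new_handle
    return handle_map


def possible_matches(db, handle_map):
    """Return (old, new) pairs whose old handle already exists in the db."""
    return [(old, new) for old, new in handle_map.items()
            if handle_in_use(db, old)]
```

The list returned by possible_matches() is the kind of input a merge
dialog could start from; references between imported objects would also
have to be rewritten through handle_map, which this sketch leaves out.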