From: Gerald B. <ger...@gm...> - 2007-10-30 13:02:22
I wasn't referring to the _UID attribute, but rather to the person, event,
place, source and family IDs. Personally, I find the _UID attribute less
than useful, and I run GEDCOMs through an awk program to remove them.

So my idea is: add an option to the GRAMPS GEDCOM import, "Unique IDs," and
have the import look up each and every person, family, place, event and
source ID coming in from the GEDCOM in the database to which the records are
to be added. If a duplicate is found, ignore it! This would require that
the import program accept the incoming IDs as unique and consistent, which
is why it should be an option.

Also, because of the linked structure of GEDCOM, the import might get a
family first, with references to spouses and children, while the people come
in later. This could be resolved through a two-pass approach or by creating
placeholders for out-of-order data.

So, with this approach, you would import your GEDCOMs, with the new option
set, into a new database, then open your master database and import the new
database with the option off.

On Oct 30, 2007 7:19 AM, Julio Sánchez <jul...@gm...> wrote:
> Hi,
>
> AFAIK, FamilySearch does not guarantee unique, eternal identifiers.
> However, the GEDCOM ID they use on the downloaded GEDCOMs has been
> permanent for years. It is always the same for each person in the record.
>
> I have a small Perl script that copies that value into some form of the
> _UID nonstandard attribute. For instance:
>
> 1 _UID IGI::I500077973070
>
> I.e., I qualify the number with a numbering authority code ('IGI'). This
> way I can tell records I already downloaded and merge them together.
>
> Unconditionally doing this, however, is dangerous because there is no
> guarantee that the FamilySearch IDs will not change in the future, so this
> should only be done under very controlled circumstances.
>
> I do the same for the Vital Records Index:
>
> 1 _UID VRI-2000-ES::I4611604-1
>
> In this case, it is much safer because CDs are immutable and the
> concatenation of year and region codes makes the code unique. A new CD
> edition would change the IDs but, at least, no unwanted merges would
> happen.
>
> In every case where I repeatedly reuse data from a computer database, I
> have constructed one specific algorithm to create unique IDs. One other
> example, from the Guipuzcoa online church records:
>
> 1 _UID DEAH:111500101-0001-0-e7289c-2
>
> It contains the source reference, page number, record number and a partial
> hash of the name (this was derived by experimentation after several failed
> tries; you don't wanna know the pathological cases that appear). Every
> source that does not assign unique IDs needs ad-hoc handling.
>
> Ideally every source should generate a permanent unique ID for records it
> originates. Records received from other sources are not given new IDs, but
> records merged from different sources keep all the received IDs.
>
> Regards,
>
> Julio
>
> 2007/10/30, Gerald Britton <ger...@gm...>:
> >
> > If the FamilySearch GEDCOMs have unique IDs for people, events etc.,
> > then the GRAMPS import could discard the duplicates as an option. So you
> > could start a new db, import the GEDCOMs (removing dups), then open your
> > good db and import the new db into it.
> >
> > On 10/30/07, Benny Malengier <ben...@gm...> wrote:
> > > 2007/10/30, Douglas S. Blank <db...@cs... >:
> > > >
> > > > [Moving to developers list.]
> > > >
> > > > The controversial aspect of this is the automatic merge (which is,
> > > > of course, the whole point). We can do this as a "fork" of the
> > > > current GEDCOM import, but that wouldn't be useful for anyone else,
> > > > and would quickly suffer "bitrot" as the original GEDCOM import
> > > > continued to change. Perhaps there is a way that we can work within
> > > > the GRAMPS GEDCOM import?
> > > >
> > >
> > > Well, in 3.0 you could make an automatic revision, then work on the
> > > data, and roll back to that revision if there are problems.
> > >
> > > > I always planned on working on a better (perhaps two-stage,
> > > > interactive) merge-er. But an easier option would be to have some
> > > > type of option on the Import Dialog that either: a) kept all
> > > > duplicates, or b) attempted automatic merges. This would be less
> > > > controversial (I think) if there was an "Undo Import" that was quick
> > > > and painless, and readily available. Is there?
> > >
> > > I see possibilities with automatic merge, but it should be on a unique
> > > identifier. I just have too many people with the same name to be able
> > > to have the name determine the merging or not (that is, all first sons
> > > of all children have the name of their grandfather, and are born in
> > > the same decade or so).
> > >
> > > > Would developers allow a GEDCOM import to do automatic merges in the
> > > > same manner that ImportCSV works?
> > >
> > > I would rather see GEDCOM -> XML, and allow automatic merge of XML.
> > > This way the typical problem of importing GEDCOM is separated from the
> > > merging, we allow merging of GRAMPS family trees without the need to
> > > pass over GEDCOM, and we have power over our XML format, so we can add
> > > things there: the export to XML could write the extra data needed for
> > > possible merging.
> > > Note that an overwrite in XML is already as easy as replacing the
> > > handle with the handle of an existing object.
> > >
> > > Benny
> >
> > _______________________________________________
> > Gramps-devel mailing list
> > Gra...@li...
> > https://lists.sourceforge.net/lists/listinfo/gramps-devel
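
For illustration only, here is a rough Python sketch of the "Unique IDs" import option proposed at the top of this message, using the placeholder approach for out-of-order data. It is not GRAMPS code and does not use the GRAMPS API: Record, import_unique and the dict standing in for the database are all hypothetical names, and it assumes the incoming IDs really are unique and consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str                  # "person", "family", "event", "place" or "source"
    rec_id: str                # ID as written in the GEDCOM, e.g. "I1" (hypothetical)
    data: str = ""             # stand-in for the real record contents
    refs: list = field(default_factory=list)  # (kind, rec_id) pairs this record points at
    placeholder: bool = False  # True while only a reference to the record has been seen


def import_unique(db, incoming):
    """Add incoming records to db, a dict keyed by (kind, rec_id).

    Returns (skipped, unresolved): keys ignored as duplicates, and keys
    that were referenced but never supplied by the import.
    """
    skipped = []
    for rec in incoming:
        key = (rec.kind, rec.rec_id)
        existing = db.get(key)
        if existing is not None and not existing.placeholder:
            skipped.append(key)   # duplicate ID: ignore the incoming copy
            continue
        db[key] = rec             # new record, or fills in an earlier placeholder
        for ref in rec.refs:
            if ref not in db:     # referenced before it was defined
                db[ref] = Record(ref[0], ref[1], placeholder=True)
    unresolved = [k for k, r in db.items() if r.placeholder]
    return skipped, unresolved


# Hypothetical data: the family arrives before the people it refers to.
db = {("person", "I1"): Record("person", "I1", "already in the database")}
incoming = [
    Record("family", "F1", "Smith family",
           refs=[("person", "I1"), ("person", "I2")]),
    Record("person", "I1", "duplicate from the GEDCOM"),
    Record("person", "I2", "new person"),
]
skipped, unresolved = import_unique(db, incoming)
print(skipped, unresolved)
```

In this run the incoming copy of person I1 is ignored, and the placeholder created for I2 when the family arrived is filled in once the person record shows up, so nothing is left unresolved. A two-pass import would reach the same result by loading all records first and resolving references second.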