Gerald,

I agree this would be useful, but I think it has very limited applicability.  It requires matching on a single-valued attribute.  Note that FamilySearch contains many duplicates.  For instance, there may be a christening record for someone, a marriage record, a burial record and then the christenings and maybe marriages of every child.  The parents will appear several times with different IDs, and when you merge them you will choose one and only one of those IDs as the ID for the person in GRAMPS.  From then on, if you import similar records, an ID match will only find the duplicate for the one record whose ID you kept.  For all the others, you are back to string matching.

If you go beyond FamilySearch, things become more complicated.  The fact is that you can only do this safely using multivalued unique identifiers.  Many genealogy programs, PAF included, do this with _UID.  It is nonstandard, but the alternatives are either only meaningful in an LDS context or have been abused so much that they have become unusable for this.

W.r.t. the ugliness of _UID, what GRAMPS should do is hide these values from sight, i.e. map them internally to "Unique Identifier" or some such and not show them except, maybe, to expert users (I don't hide them in GRAMPS, but I do hide them in my version of phpGedView, except on GEDCOM extracts).  The identifiers may be read on import or derived through a variety of means, _UID being just the most common option.  But the core concept is keeping multivalued unique identifiers.
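
To make the multivalued idea concrete, here is a minimal Python sketch.  It is not GRAMPS code: the Person class, merge() and find_match() are hypothetical names, and the IDs are made up.  It only shows that a merge keeps every identifier, so a later import can match on any of them:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    # Every identifier ever attached to this person (hypothetical model).
    uids: set = field(default_factory=set)


def merge(primary, duplicate):
    """Merge duplicate into primary, keeping the identifiers of both."""
    primary.uids |= duplicate.uids
    return primary


def find_match(incoming, people):
    """Return the first person sharing at least one identifier, or None."""
    for person in people:
        if person.uids & incoming.uids:
            return person
    return None


# The same father, seen once in a christening and once in a marriage record.
father = merge(Person("Juan Perez", {"IGI::I100"}),
               Person("Juan Perez", {"IGI::I200"}))

# A later import of either source record still matches, whatever the spelling.
again = Person("Juan Peres", {"IGI::I200"})
matched = find_match(again, [father])
```

With a single-valued ID, the second lookup would only succeed if "IGI::I200" happened to be the ID kept at merge time.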

I chose not to modify GRAMPS to match on GEDCOM IDs because they are meaningful only in some particular context and likely to clash elsewhere.  I did not find a way to make it universal and yet safe, which is why I settled for preprocessing.  Of course, I'd love to have a way to make that preprocessing invisible, but I never thought it could be done safely.  GEDCOMs from IGI are tagged with IGI as the producer:

0 HEAD
1 SOUR IGI
2 VERS 5.0
2 NAME International Genealogical Index (R)

So it *might* be possible for a producer tag to be derived automatically, but other producers fail to put anything useful there:

0 HEAD
1 SOUR FTW
2 VERS 6.00
2 NAME Family Tree Maker for Windows

[I.e. nothing that can be used as a unique-making stem]
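
As a rough illustration of what an automatic version of the preprocessing might look like, here is a Python sketch.  Everything in it is an assumption: the TRUSTED_PRODUCERS whitelist, the function names, and the idea that individuals appear as "0 @ID@ INDI" lines.  It reads the HEAD/SOUR value and, only if the producer is whitelisted, adds a qualified _UID line after each individual:

```python
import re

# Producers whose IDs we are willing to treat as a numbering authority.
# This whitelist is an assumption for illustration only.
TRUSTED_PRODUCERS = {"IGI"}

# Assumed shape of an individual record's first line: 0 @I123@ INDI
INDI = re.compile(r"^0 @([^@]+)@ INDI\s*$")


def producer_stem(lines):
    """Return the HEAD/SOUR value if it is a trusted producer, else None."""
    in_head = False
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_head:
                break  # header ended without a SOUR line
            in_head = tag == "HEAD"
        elif in_head and level == "1" and tag == "SOUR":
            value = parts[2].strip() if len(parts) > 2 else ""
            return value if value in TRUSTED_PRODUCERS else None
    return None


def tag_individuals(lines):
    """Return a copy of lines (a list) with a '1 _UID STEM::ID' line added
    after each individual, or an unchanged copy if no stem is available."""
    stem = producer_stem(lines)
    if stem is None:
        return list(lines)
    out = []
    for line in lines:
        out.append(line)
        match = INDI.match(line)
        if match:
            out.append("1 _UID %s::%s" % (stem, match.group(1)))
    return out
```

A file whose header says "1 SOUR FTW" comes back untouched, which is the safe default when nothing usable as a stem is present.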

Anyway, I still don't know that the IGI IDs are guaranteed not to be recycled.


2007/10/30, Gerald Britton <gerald.britton@gmail.com>:
I wasn't referring to the _UID attribute, but rather the person,
event, place, source and family ids.  Personally, I find the _UID
attribute less than useful, and run GEDCOMs through an awk program to
remove them.

So my idea is:

Add an option to the GRAMPS gedcom import, "Unique IDs," and have the
import look up each and every person, family, place, event and source
id coming in from the GEDCOM in the database to which the records are
to be added.  If a duplicate is found, ignore it!  This would require
that the import program accept the incoming ids as unique and
consistent, which is why it should be an option.  Also, because of the
linked structure of GEDCOM, the import might get a family first, with
references to spouses and children, while the people come in later.
This could be resolved through a two-pass approach or by creating
placeholders for out-of-order data.

So, with this approach, you would import your gedcoms, with the new
option set, into a new database, then open your master database, and
import the new database with the option off.
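
A minimal Python sketch of the placeholder variant of the option described above.  It is not the GRAMPS importer: the database is modelled as a plain dict keyed by ID, and import_unique() and the record tuples are hypothetical.

```python
def import_unique(db, records):
    """Add records to db, ignoring any whose ID is already present.

    db maps ID -> data; records yields (record_id, data, referenced_ids).
    A reference to an ID not seen yet creates a placeholder (None) that is
    filled in when the real record arrives later in the file.
    Returns the list of IDs that were ignored as duplicates.
    """
    skipped = []
    for record_id, data, refs in records:
        if db.get(record_id) is not None:
            skipped.append(record_id)  # duplicate ID: ignore incoming copy
            continue
        db[record_id] = data  # new record, or a placeholder being filled
        for ref in refs:
            db.setdefault(ref, None)  # placeholder for out-of-order data
    return skipped


db = {}
records = [
    ("F1", {"type": "family"}, ["I1", "I2"]),  # family before its members
    ("I1", {"name": "A"}, []),
    ("I1", {"name": "A (duplicate)"}, []),     # same ID seen again
    ("I2", {"name": "B"}, []),
]
skipped = import_unique(db, records)
```

This only works if the incoming IDs really are unique and consistent, which is the assumption the option asks the user to accept.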

On Oct 30, 2007 7:19 AM, Julio Sánchez <julio.sanchez@gmail.com> wrote:
> Hi,
>
> AFAIK, FamilySearch does not guarantee unique, eternal, identifiers.
> However, the GEDCOM ID they use on the downloaded GEDCOMs has been permanent
> for years.  It is always the same for each person in the record.
>
> I have a small Perl script that copies that value into some form of the _UID
> nonstandard attribute.  For instance:
>
> 1 _UID IGI::I500077973070
>
> I.e., I qualify the number with a numbering authority code ('IGI').  This
> way I can tell records I already downloaded and merge them together.
>
> Unconditionally doing this, however, is dangerous because there is no
> guarantee that the FamilySearch IDs will not change in the future, so this
> should only be done under very controlled circumstances.
>
> I do the same for the Vital Records Index:
>
> 1 _UID VRI-2000-ES::I4611604-1
>
> In this case, it is much safer because CDs are immutable, the concatenation
> of year and region codes makes the code unique.  A new CD edition would
> change the IDs but, at least, no unwanted merges would happen.
>
> In every case where I repeatedly reuse data from a computer database, I have
> constructed a specific algorithm to create unique IDs. One other example,
> from the Guipuzcoa online church records:
>
> 1 _UID DEAH:111500101-0001-0-e7289c-2
>
> It contains the source reference, page number, record number and a partial
> hash of the name (this was derived by experimentation after several failed
> tries, you don't wanna know the pathological cases that appear).  Every
> source that does not assign unique IDs needs ad-hoc handling.
>
> Ideally every source should generate a permanent unique ID for records it
> originates.  Records received from other sources are not given new IDs, but
> records merged from different sources keep all the received IDs.
>
> Regards,
>
> Julio
>
> 2007/10/30, Gerald Britton <gerald.britton@gmail.com>:
> >
> >
> >
> > If the FamilySearch GEDCOMs have unique IDs for people, events etc.,
> > then the GRAMPS import could discard the duplicates as an option. So you
> > could start a new db, import the GEDCOMs (removing dups), then open your
> > good db and import the new db into it.
> >
> >
> >
> > On 10/30/07, Benny Malengier <benny.malengier@gmail.com> wrote:
> > > 2007/10/30, Douglas S. Blank <dblank@cs.brynmawr.edu>:
> > > >
> > > > [Moving to developers list.]
> > > >
> > > > The controversial aspect of this is the automatic merge (which is, of
> > > > course, the whole point). We can do this as a "fork" of the current
> > > > GEDCOM import, but that wouldn't be useful for anyone else, and would
> > > > quickly suffer "bitrot" as the original GEDCOM import continued to
> > > > change. Perhaps there is a way that we can work within the GRAMPS
> > > > GEDCOM import?
> > >
> > >
> > > Well, in 3.0 you could make an automatic revision, then work on the
> > > data, and roll back to that revision if there are problems.
> > >
> > > > I always planned on working on a better (perhaps two-stage, interactive)
> > > > merge-er. But an easier option would be to have some type of option on
> > > > the Import Dialog that either:  a) kept all duplicates, or b) attempted
> > > > automatic merges. This would be less controversial (I think) if there
> > > > was an "Undo Import" that was quick and painless, and readily
> > > > available. Is there?
> > >
> > >
> > > I see possibilities with automatic merge, but it should be on a unique
> > > identifier.  I just have too many people with the same name to be able
> > > to have the name determine the merging or not (that is, all first sons
> > > of all children have the name of their grandfather, and are born in the
> > > same decade or so).
> > >
> > > > Would developers allow a GEDCOM import to do automatic merges in the
> > > > same manner that ImportCSV works?
> > >
> > >
> > > I would rather see GEDCOM -> XML, and allow automatic merge of XML.
> > > That way the typical problems of importing GEDCOM are separated from
> > > the merging, we allow merging of GRAMPS family trees without the need
> > > to pass through GEDCOM, and we have control over our XML format, so we
> > > could have the XML export write out whatever extra data is needed for
> > > possible merging.
> > > Note that an overwrite in XML is already as easy as replacing the handle
> > > with the handle of an existing object.
> > >
> > > Benny
> > >
> >
> > _______________________________________________
> > Gramps-devel mailing list
> > Gramps-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gramps-devel
> >
>
>