From: Douglas S. B. <db...@cs...> - 2007-10-30 00:05:26
|
[Moving to developers list.]

Jeremy, I have seen this limitation where you only get to download one
family at a time from a few sites. GRAMPS was not designed to handle this
type of importing of small, overlapping sets. The current merge is meant to
be very safe and doesn't assume anything. However, I'd be glad to work with
you to create a way to make this painless in GRAMPS.

As Benny mentioned, I created a comma-separated-value (CSV) import (which
*is* a part of GRAMPS) to deal with this problem. I didn't know much about
GEDCOM (nor GRAMPS) at the time, and wrote ImportCSV to automatically merge
without duplication. It can be dangerous in the wrong hands, but it can save
hours in the right ones. You can read about it here:

http://emergent.brynmawr.edu/emergent/GrampsCSVImport

But we can use the same merge techniques without having to go through a
spreadsheet, and I (and others) could use this too.

The controversial aspect of this is the automatic merge (which is, of
course, the whole point). We can do this as a "fork" of the current GEDCOM
import, but that wouldn't be useful for anyone else, and it would quickly
suffer "bitrot" as the original GEDCOM import continued to change. Perhaps
there is a way that we can work within the GRAMPS GEDCOM import?

I always planned on working on a better (perhaps two-stage, interactive)
merger. But an easier option would be to have some type of option on the
Import Dialog that either: a) kept all duplicates, or b) attempted automatic
merges. This would be less controversial (I think) if there were an "Undo
Import" that was quick, painless, and readily available. Is there?

Would developers allow a GEDCOM import to do automatic merges in the same
manner that ImportCSV works?

(On the other hand, a GRAMPS-oriented crawler would be quite handy too, and
I'd be interested in exploring that approach as well. Can you point me to
that?)

-Doug

On Mon, October 29, 2007 6:59 pm, Benny Malengier wrote:
> The merging after import is the best GRAMPS offers, I think. But it is
> indeed slow, you can overlook things, and for families, the family is
> only merged when both parents are merged. Perhaps some bugs in the code
> too...
>
> If you really don't want to enter the information, and don't want to
> merge too much, you need to know if there is some kind of unique
> identifier in the GEDCOM to quickly retrieve identical info.
>
> You could then merge manually with XML files: import every GEDCOM that is
> a family into a separate grdb, export to XML, and open the XML in a text
> editor after unzipping. The XML files of the family should be small.
> Copying them manually into the XML file of your large database might be
> feasible. That is, in the family XML you change the handle to the handle
> of the persons already in the database via the replace command, then copy
> all data not yet in the larger XML. You could keep a spreadsheet with the
> mapping between GRAMPS handle and the unique identifier (if present in
> the GEDCOM). Error prone, but if you know how GRAMPS XML works, quite
> doable. All depends on how large the files are that you download.
>
> Another option is to first import all small GEDCOMs into a new empty
> database, clean that as much as possible, then import into your real
> database when you are satisfied. There is also a CSV import plugin
> written by Doug but not part of GRAMPS (see our website) which might help
> somewhat (wouldn't know, didn't try).
>
> Import with merge is something many people have already broken their
> heads on. It is very difficult to do. Some programs offer a unique
> identifier (UID) for this task, but even then, it is hard to know what to
> do with differing data. GRAMPS has chosen not to do things automatically,
> but to depend on the user to start the merging and check the data.
>
> Benny
>
> 2007/10/29, Jeremy C. Reed <re...@re...>:
>> So I found some of my wife's family at familysearch.org. But they only
>> allow downloading one family at a time. So now I have a mess of
>> triplicate records. Let me explain:
>> 1) Person A and family lists parents (B).
>> 2) Person B and family lists child A and lists B's parents (C) and
>> spouse.
>> 3) Person C and family lists child B and lists C's parents (D) and
>> spouse.
>> So 3 GEDCOM downloads. And now I have person A twice, person B three
>> times, and person C twice. Then I have spouses, and I keep following
>> parents. So I keep merging as I grab new records. Very time consuming.
>>
>> Then I start seeing, in Gramps, triplicate identical birth events and
>> other identical events. And even worse, I see a husband with the same
>> wife three times and with three sets of children (of course with
>> identical names and identical data, who are all merged together). And
>> when I look at a person's siblings, I see they have duplicated brothers
>> and sisters.
>>
>> So now I have hundreds of names and thousands of events. I downloaded
>> probably about 50 GEDCOM files (50 families). And it is an unusable
>> mess. I am going to start over from scratch. I'd prefer not to manually
>> enter the data.
>>
>> I asked their support and was told that they don't support multiple
>> families in one download, and it was suggested I try the proprietary
>> commercial PAF Insight.
>>
>> Do you have any suggestions on how I can put that data (one family GED
>> file at a time) into my gramps database cleanly (no duplicated
>> details)? I did see a perl-based crawler for grabbing data from
>> familysearch.org but haven't tried it yet. Or do you have any
>> suggestions on how I can merge this triplicate information without the
>> merged records containing triplicate events, duplicated siblings, etc.?
>>
>> I am using gramps as provided by Ubuntu -- version 2.2.6-1ubuntu1.
>>
>> Jeremy C. Reed
>>
>> p.s. Has anyone else seen this?
|
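The ImportCSV-style automatic merge Doug describes can be sketched roughly
like this (a simplified illustration of the idea, not the actual
GRAMPS/ImportCSV code; the dict-based record layout and the "never
overwrite existing data" rule are assumptions made for the sketch):

```python
# Sketch of an ID-keyed automatic merge, in the spirit of ImportCSV.
# A record whose ID already exists is merged field-by-field into the
# existing record instead of being added as a duplicate.

def auto_merge(database, incoming):
    """Merge incoming records into database, keyed on a unique ID.

    database and incoming are dicts mapping ID -> dict of fields.
    Existing non-empty fields are never overwritten (the "safe" rule).
    """
    for rec_id, fields in incoming.items():
        if rec_id in database:
            existing = database[rec_id]
            for key, value in fields.items():
                # Only fill in fields the existing record lacks.
                existing.setdefault(key, value)
        else:
            database[rec_id] = dict(fields)
    return database

db = {"I0001": {"name": "Person A", "birth": "1850"}}
new = {"I0001": {"name": "Person A", "death": "1910"},
       "I0002": {"name": "Person B"}}
auto_merge(db, new)
# db["I0001"] now carries both birth and death; I0002 was added once.
```

The key design point is that merging is decided purely by the ID, never by
name matching, which is exactly why it is both powerful and dangerous when
the IDs are not trustworthy.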
From: Benny M. <ben...@gm...> - 2007-10-30 08:26:52
|
2007/10/30, Douglas S. Blank <db...@cs...>:
> [Moving to developers list.]
>
> The controversial aspect of this is the automatic merge (which is, of
> course, the whole point). We can do this as a "fork" of the current
> GEDCOM import, but that wouldn't be useful for anyone else, and would
> quickly suffer "bitrot" as the original GEDCOM import continued to
> change. Perhaps there is a way that we can work within the GRAMPS GEDCOM
> import?

Well, in 3.0 you could make an automatic revision, then work on the data,
and roll back to the revision if there are problems.

> I always planned on working on a better (perhaps two-stage, interactive)
> merger. But an easier option would be to have some type of option on the
> Import Dialog that either: a) kept all duplicates, or b) attempted
> automatic merges. This would be less controversial (I think) if there
> was an "Undo Import" that was quick and painless, and readily available.
> Is there?

I see possibilities with automatic merge, but it should be based on a
unique identifier. I just have too many people with the same name to be
able to have the name determine the merging or not (that is, all first
sons of all children have the name of their grandfather, and are born in
the same decade or so).

> Would developers allow a GEDCOM import to do automatic merges in the
> same manner that ImportCSV works?

I would rather see GEDCOM -> XML, and allow automatic merge of XML. Like
this, the typical problem of importing GEDCOM is separated from the
merging, we allow merging of GRAMPS family trees without the need to pass
over GEDCOM, and we have power over our XML format, so we can add things
there: we could make the export to XML write the extra data needed for
possible merging. Note that an overwrite in XML is already as easy as
replacing the handle with the handle of an existing object.

Benny
|
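Benny's handle-overwrite trick can be illustrated with a small sketch. The
XML fragment below is simplified for the example, not the exact GRAMPS XML
schema, but the principle is the one he states: definition and references
share the same handle string, so one global replacement redirects all of
them.

```python
# Sketch of the handle-replacement trick: to make a person in a small
# exported XML refer to an object already in the large database, replace
# its handle throughout the file.

def replace_handle(xml_text, old_handle, new_handle):
    # Every reference (hlink) and the definition itself use the same
    # handle string, so a global replace redirects them all at once.
    return xml_text.replace(old_handle, new_handle)

fragment = (
    '<person handle="_abc123" id="I0007">'
    '<name>Person B</name></person>'
    '<childref hlink="_abc123"/>'
)
merged = replace_handle(fragment, "_abc123", "_existing42")
# Both the definition and the reference now point at _existing42.
```

This is exactly the "replace command in a text editor" workflow from the
earlier message, just automated.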
From: Douglas S. B. <db...@cs...> - 2007-10-30 10:51:13
|
On Tue, October 30, 2007 4:26 am, Benny Malengier wrote:
> Well, in 3.0 you could make an automatic revision, then work on the
> data, and roll back to the revision if there are problems.

That sounds perfect for allowing such a dangerous import. That could even
be a page in the import wizard.

> I see possibilities with automatic merge, but it should be based on a
> unique identifier. I just have too many people with the same name to be
> able to have the name determine the merging or not (that is, all first
> sons of all children have the name of their grandfather, and are born in
> the same decade or so).

Yes, ImportCSV works on unique IDs. In fact, it has syntax for marking the
ID of a person as either a GRAMPS GID, or something else. The "something
else" does not get merged.

>> Would developers allow a GEDCOM import to do automatic merges in the
>> same manner that ImportCSV works?
>
> I would rather see GEDCOM -> XML, and allow automatic merge of XML.
> [...]
> Note that an overwrite in XML is already as easy as replacing the handle
> with the handle of an existing object.

Well, the main problem is that all of these sites only offer GEDCOM
download. Are you suggesting a stand-alone GEDCOM -> XML program? Why make
users do the extra step of converting to XML? What happens now in GRAMPS
if you import someone with a handle that already exists in GRAMPS? We
could allow automatic merge on all imports.

-Doug

> Benny
|
From: Gerald B. <ger...@gm...> - 2007-10-30 10:47:32
|
If the familysearch GEDCOMs have unique IDs for people, events, etc., then
the GRAMPS import could discard the duplicates as an option. So you could
start a new db, import the GEDs (removing dups), then open your good db
and import the new db into it.

On 10/30/07, Benny Malengier <ben...@gm...> wrote:
> I see possibilities with automatic merge, but it should be based on a
> unique identifier. I just have too many people with the same name to be
> able to have the name determine the merging or not (that is, all first
> sons of all children have the name of their grandfather, and are born in
> the same decade or so).
> [...]
|
From: <jul...@gm...> - 2007-10-30 11:19:40
|
Hi,

AFAIK, FamilySearch does not guarantee unique, eternal identifiers.
However, the GEDCOM ID they use on the downloaded GEDCOMs has been
permanent for years. It is always the same for each person in the record.

I have a small Perl script that copies that value into some form of the
_UID nonstandard attribute. For instance:

1 _UID IGI::I500077973070

I.e., I qualify the number with a numbering authority code ('IGI'). This
way I can tell which records I already downloaded and merge them together.

Unconditionally doing this, however, is dangerous because there is no
guarantee that the FamilySearch IDs will not change in the future, so this
should only be done under very controlled circumstances.

I do the same for the Vital Records Index:

1 _UID VRI-2000-ES::I4611604-1

In this case, it is much safer because CDs are immutable, and the
concatenation of year and region codes makes the code unique. A new CD
edition would change the IDs but, at least, no unwanted merges would
happen.

For every case where I repeatedly reuse data from a computer database, I
have constructed a specific algorithm to create unique IDs. One other
example, from the Guipuzcoa online church records:

1 _UID DEAH:111500101-0001-0-e7289c-2

It contains the source reference, page number, record number, and a
partial hash of the name (this was derived by experimentation after
several failed tries; you don't wanna know the pathological cases that
appear). Every source that does not assign unique IDs needs ad-hoc
handling.

Ideally, every source should generate a permanent unique ID for records it
originates. Records received from other sources are not given new IDs, but
records merged from different sources keep all the received IDs.

Regards,

Julio

2007/10/30, Gerald Britton <ger...@gm...>:
> If the familysearch GEDCOMs have unique IDs for people, events, etc.,
> then the GRAMPS import could discard the duplicates as an option. So you
> could start a new db, import the GEDs (removing dups), then open your
> good db and import the new db into it.
|
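Julio's preprocessing step is a Perl script; a rough equivalent in Python
might look like this. The `add_uid_tags` helper and its exact line
handling are illustrative assumptions, but the output line follows the
`1 _UID IGI::<id>` form from his examples:

```python
# Sketch of copying each GEDCOM individual's record ID into a qualified
# _UID attribute, e.g.
#   0 @I500077973070@ INDI  ->  adds  1 _UID IGI::I500077973070
import re

def add_uid_tags(gedcom_lines, authority="IGI"):
    out = []
    indi = re.compile(r"^0 @(?P<id>[^@]+)@ INDI\s*$")
    for line in gedcom_lines:
        out.append(line)
        m = indi.match(line)
        if m:
            # Tag the individual with a source-qualified identifier.
            out.append("1 _UID %s::%s" % (authority, m.group("id")))
    return out

lines = ["0 HEAD", "0 @I500077973070@ INDI", "1 NAME John /Doe/"]
tagged = add_uid_tags(lines)
```

As the message stresses, this is only safe when the source's IDs really
are stable; the authority prefix is what keeps IDs from different sources
from colliding.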
From: Gerald B. <ger...@gm...> - 2007-10-30 13:02:22
|
I wasn't referring to the _UID attribute, but rather the person, event,
place, source, and family IDs. Personally, I find the _UID attribute less
than useful, and run GEDCOMs through an awk program to remove them.

So my idea is:

Add an option to the GRAMPS GEDCOM import, "Unique IDs," and have the
import look up each and every person, family, place, event, and source ID
coming in from the GEDCOM in the database to which the records are to be
added. If a duplicate is found, ignore it! This would require that the
import program accept the incoming IDs as unique and consistent, which is
why it should be an option. Also, because of the linked structure of GED,
the import might get a family first, with references to spouses and
children, while the people come in later. This could be resolved through a
two-pass approach or by creating placeholders for out-of-order data.

So, with this approach, you would import your GEDCOMs, with the new option
set, into a new database, then open your master database, and import the
new database with the option off.

On Oct 30, 2007 7:19 AM, Julio Sánchez <jul...@gm...> wrote:
> AFAIK, FamilySearch does not guarantee unique, eternal identifiers.
> However, the GEDCOM ID they use on the downloaded GEDCOMs has been
> permanent for years. It is always the same for each person in the
> record.
> [...]
|
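Gerald's "Unique IDs" option, with the two-pass handling of out-of-order
records, could be modeled along these lines (a simplified sketch, not real
GRAMPS import code; the `(record_id, data)` representation is an
assumption):

```python
# Sketch of a duplicate-skipping import: first pass collects incoming
# record IDs (which also lets forward references, e.g. a family seen
# before its members, resolve), second pass adds only records whose ID
# is not already present in the target database.

def import_skip_duplicates(database, records):
    """records: list of (record_id, data) from the parsed GEDCOM."""
    # Pass 1: index everything the GEDCOM contains.
    incoming = {rec_id: data for rec_id, data in records}
    # Pass 2: copy over only the IDs the database does not have yet.
    skipped = 0
    for rec_id, data in incoming.items():
        if rec_id in database:
            skipped += 1          # duplicate: ignore it
        else:
            database[rec_id] = data
    return skipped

db = {"I0001": "existing record"}
records = [("I0001", "duplicate copy"), ("I0002", "new record")]
skipped = import_skip_duplicates(db, records)
```

The behavior matches the proposal: the existing record wins outright, so
the option is only safe when the incoming IDs are truly unique and
consistent.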
From: <jul...@gm...> - 2007-10-30 13:45:54
|
Gerald,

I agree this would be useful, but I think it has very limited
applicability. It requires matching on a single-valued attribute. Notice
that on FamilySearch there are many duplicates. For instance, there may be
a christening record for someone, a marriage record, a burial record, and
then the christenings and maybe marriages of every child. The parents will
be present several times with different IDs, and when you merge them you
will choose one and only one of the IDs as the ID for the person in
GRAMPS. From then on, if you try to import similar records, the duplicates
will not be found with an ID match except for the one in the record you
chose as more important. You are back to string matching.

If you go beyond the field of FamilySearch, then things become more
complicated. The fact is that you can only do this safely using
multivalued unique identifiers. And many genealogy programs, PAF included,
do this with _UID. It is nonstandard, but the others are either only
meaningful in an LDS context or have been abused so much that they have
become unusable for this.

W.r.t. the ugliness of _UID, what GRAMPS should do is hide them from
sight, i.e. map them internally into "Unique Identifier" or some such, and
not show them except to, maybe, expert users (I don't hide them in GRAMPS,
but I hide them in my version of phpGedView, except on GEDCOM extracts).
They may be read on import or derived through a variety of means, _UID
being just the most common option. But the core concept is keeping
multivalued unique identifiers.

I chose not to modify GRAMPS to match on GEDCOM IDs because they are
meaningful only in some particular context and likely to clash elsewhere.
I did not find a way to make it universal and yet safe; that's why I
settled for preprocessing. Of course, I'd love to have a way to make that
preprocessing invisible, but I never thought it could be done safely.

GEDCOMs from the IGI are tagged with IGI as the producer:

0 HEAD
1 SOUR IGI
2 VERS 5.0
2 NAME International Genealogical Index (R)

So it *might* be possible for a producer tag to be derived automatically,
but other producers fail to put anything useful there:

0 HEAD
1 SOUR FTW
2 VERS 6.00
2 NAME Family Tree Maker for Windows

[I.e., nothing that can be used as a unique-making stem.]

Anyway, I still don't know that the IGI IDs are guaranteed not to be
recycled.

2007/10/30, Gerald Britton <ger...@gm...>:
> I wasn't referring to the _UID attribute, but rather the person, event,
> place, source, and family IDs. [...]
> Add an option to the GRAMPS GEDCOM import, "Unique IDs," and have the
> import look up each and every person, family, place, event, and source
> ID coming in from the GEDCOM in the database to which the records are to
> be added. If a duplicate is found, ignore it!
> [...]
|
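Deriving a numbering-authority prefix from the GEDCOM header, as Julio
says *might* be possible, could look like this sketch. The
`KNOWN_AUTHORITIES` whitelist is an assumption; it captures his point that
only producers like the IGI put something usable in `1 SOUR`, while a
software name like "FTW" yields nothing:

```python
# Sketch of reading the producer tag from a GEDCOM header and mapping
# it to a numbering-authority prefix for _UID qualification. Returns
# None when the producer is not a known record-originating authority.

KNOWN_AUTHORITIES = {"IGI"}   # assumption: a curated whitelist

def producer_prefix(header_lines):
    for line in header_lines:
        parts = line.strip().split(None, 2)
        if len(parts) >= 3 and parts[0] == "1" and parts[1] == "SOUR":
            sour = parts[2]
            return sour if sour in KNOWN_AUTHORITIES else None
    return None

igi = ["0 HEAD", "1 SOUR IGI", "2 VERS 5.0"]
ftw = ["0 HEAD", "1 SOUR FTW", "2 VERS 6.00"]
```

A whitelist keeps the automation conservative: an unknown producer falls
back to no prefix rather than risking unwanted merges.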
From: <jul...@gm...> - 2007-10-30 14:00:24
|
One further note, now that I have read more of the message tails. The problem reported by Jeremy I know well. I run a very modified version of 2.0 where most of these event (and source reference, etc.) duplications don't happen. I.e. the merge has to be done by hand, but duplicate events are detected and merged while merging individuals. This, combined to the fact that I preprocess the GEDCOM files and then keep the _UID tags, lets m= e remove the duplicates now and, more importantly, in the future too. Work done now does not go down the drain next time you search records. I needed this because I merge a lot. Unfortunately, I never found time to do the forward port of this. And, anyway, I'd love to have automatic merge while importing as well, it would be tremendously helpful. Regards, Julio 2007/10/30, Julio S=E1nchez <jul...@gm...>: > > Gerald, > > I agree this would be useful, but I think it has a very limited > applicability. It requires matching on a single-valued attribute. Notic= e > that on FamilySearch there are many duplicates. For instance, there may = be > a christening record for someone, a marriage record, a burial record and > then the christenings and maybe marriages of every child. The parents wi= ll > be present several times with different IDs and when you merge them you w= ill > choose one and only one of the IDs as the ID for the person in GRAMPS. F= rom > then on, if you try to import similar records the duplicates will not be > found with an ID match except for that in the one record you chose as mor= e > important. You are back to string matching. > > If you go beyond the field of FamilySearch, then things become more > complicated. The fact is that you can only do this safely using multival= ued > unique identifiers. And many genealogy programs, PAF included, do this w= ith > _UID. It is nonstandard, but the others are either only meaninful in a L= DS > context or have been abused so much that they have become unusable for th= is. 
> > > W.r.t. the ugliness of _UID, what GRAMPS should do is hide them from > sight, i.e. map it internally into "Unique Identifier" or some such and > not show it except to, maybe, expert users (I don't hide them in GRAMPS, but > I hide them in my version of phpGedView, except on GEDCOM extracts). It may > be read on import or derived through a variety of means, _UID being just the > most common option. But the core concept is keeping multivalued unique > identifiers. > > I chose not to modify GRAMPS to match on GEDCOM IDs because they are > meaningful only in some particular context and likely to clash elsewhere. I > did not find a way to make it universal and yet safe; that's why I settled > for preprocessing. Of course, I'd love to have a way to make that > preprocessing invisible, but I never thought it could be done safely. > GEDCOMs from IGI are tagged with IGI as the producer: > > 0 HEAD > 1 SOUR IGI > 2 VERS 5.0 > 2 NAME International Genealogical Index (R) > > So it *might* be possible for a producer tag to be derived automatically, > but other producers fail to put there anything useful: > > 0 HEAD > 1 SOUR FTW > 2 VERS 6.00 > 2 NAME Family Tree Maker for Windows > > [I.e. nothing that can be used as a unique-making stem] > > Anyway, I still don't know that the IGI IDs are guaranteed not to be > recycled. > > > 2007/10/30, Gerald Britton <ger...@gm...>: > > > > I wasn't referring to the _UID attribute, but rather the person, > > event, place, source and family ids. Personally, I find the _UID > > attribute less than useful, and run GEDCOMS through an awk program to > > remove them. > > > > So my idea is: > > > > Add an option to the GRAMPS gedcom import, "Unique IDs," and have the > > import look up each and every person, family, place, event and source > > id coming in from the GEDCOM in the database to which the records are > > to be added. If a duplicate is found, ignore it! 
This would require > > that the import program accept the incoming ids as unique and > > consistent, which is why it should be an option. Also, because of the > > linked structure of GED, the import might get a family first, with > > references to spouses and children, while the people come in later. > > This could be resolved through a two-pass approach or by creating > > placeholders for out-of-order data. > > > > So, with this approach, you would import your gedcoms, with the new > > option set, into a new database, then open your master database, and > > import the new database with the option off. > > > > On Oct 30, 2007 7:19 AM, Julio Sánchez <jul...@gm...> wrote: > > > Hi, > > > > > > AFAIK, FamilySearch does not guarantee unique, eternal identifiers. > > > However, the GEDCOM ID they use on the downloaded GEDCOMs has been > > permanent > > > for years. It is always the same for each person in the record. > > > > > > I have a small Perl script that copies that value into some form of > > the _UID > > > nonstandard attribute. For instance: > > > > > > 1 _UID IGI::I500077973070 > > > > > > I.e., I qualify the number with a numbering authority code > > ('IGI'). This > > > way I can tell records I already downloaded and merge them together. > > > > > > Unconditionally doing this, however, is dangerous because there is no > > > guarantee that the FamilySearch IDs will not change in the future, so > > this > > > should only be done under very controlled circumstances. > > > > > > I do the same for the Vital Records Index: > > > > > > 1 _UID VRI-2000-ES::I4611604-1 > > > > > > In this case, it is much safer because CDs are immutable, and the > > concatenation > > > of year and region codes makes the code unique. A new CD edition > > would > > > change the IDs but, at least, no unwanted merges would happen. > > > > > > For every computer database I repeatedly reuse data from, I have > > > constructed one specific algorithm to create unique IDs. 
One other > > example, > > > from the Guipuzcoa online church records: > > > > > > 1 _UID DEAH:111500101-0001-0-e7289c-2 > > > > > > It contains the source reference, page number, record number and a > > partial > > > hash of the name (this was derived by experimentation after several > > failed > > > tries; you don't wanna know the pathological cases that > > appear). Every > > > source that does not assign unique IDs needs ad-hoc handling. > > > > > > Ideally every source should generate a permanent unique ID for records > > it > > > originates. Records received from other sources are not given new > > IDs, but > > > records merged from different sources keep all the received IDs. > > > > > > Regards, > > > > > > Julio > > > > > > 2007/10/30, Gerald Britton <ger...@gm...>: > > > > > > > > > > > > > > > > If the familysearch gedcoms have unique ids for people, events etc > > > > then gramps import could discard the duplicates as an option. So you > > > > could start a new db, import the geds (removing dups), then open your > > > > good db and import the new db into it. > > > > > > > > > > > > > > > > On 10/30/07, Benny Malengier <ben...@gm...> wrote: > > > > > 2007/10/30, Douglas S. Blank < db...@cs... >: > > > > > > > > > > > > [Moving to developers list.] > > > > > > > > > > > > The controversial aspect of this is the automatic merge (which > > is, of > > > > > > course, the whole point). We can do this as a "fork" of the > > current > > > GEDCOM > > > > > > import, but that wouldn't be useful for anyone else, and would > > quickly > > > > > > suffer "bitrot" as the original GEDCOM import continued to > > change. > > > Perhaps > > > > > > there is a way that we can work within the GRAMPS GEDCOM import? > > > > > > > > > > > > > > > Well, in 3.0 you could make an automatic revision, then work on the > > > data, > > > > > and do rollback to revision if problems. 
> > > > > > > > > > I always planned on working on a better (perhaps two-stage, > > interactive) > > > > > > merge-er. But an easier option would be to have some type of > > option on > > > the > > > > > > Import Dialog that either: a) kept all duplicates, or b) > > attempted > > > > > > automatic merges. This would be less controversial (I think) if > > there > > > was > > > > > > an "Undo Import" that was quick and painless, and readily > > available. > > > Is > > > > > > there? > > > > > > > > > > > > > > > I see possibilities with automatic merge, but it should be on a > > unique > > > > > identifier. I just have too many people with the same name to be > > able > > > to > > > > > have the name determine the merging or not (that is, all first > > sons of > > > all > > > > > children have the name of their grandfather, and are born in the > > same > > > decade > > > > > or so). > > > > > > > > > > Would developers allow a GEDCOM import to do automatic merges in > > the > > > same > > > > > > manner that ImportCSV works? > > > > > > > > > > > > > > > I would rather see GEDCOM -> XML, and allow automatic merge of > > XML. > > > > > Like this the typical problem of importing GEDCOM is separated > > from the > > > > > merging, we allow merging of gramps family trees without the need > > to > > > pass > > > > > over GEDCOM, and we have power over our XML format, so we can add > > things > > > > > there, so we could make export to XML and have extra data written > > needed > > > for > > > > > possible merging. > > > > > Note that an overwrite in XML is already as easy as replacing the > > handle > > > > > with the handle of an existing object. > > > > > > > > > > Benny > > > > > > > > > > ------------------------------------------------------------------------- > > > > > > > > This SF.net email is sponsored by: Splunk Inc. > > > > Still grepping through log files to find problems? Stop. 
> > > > Now Search log events and configuration files using AJAX and a > > browser. > > > > Download your FREE copy of Splunk now >> http://get.splunk.com/ > > > > _______________________________________________ > > > > Gramps-devel mailing list > > > > Gra...@li... > > > > https://lists.sourceforge.net/lists/listinfo/gramps-devel > > > > > > > > > > > > > > |
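Julio's Perl preprocessing described above (copying a source's permanent GEDCOM xref ID into a `_UID` qualified with a numbering-authority code) can be sketched in Python. This is a hypothetical re-implementation, since the actual script is not shown in the thread; it assumes the plain `0 @XREF@ INDI` record layout from the examples:

```python
import re

def qualify_uids(gedcom_text, authority="IGI"):
    """After each individual record header, insert a nonstandard _UID
    line built from the record's own xref ID qualified with an
    authority code, e.g.:  1 _UID IGI::I500077973070
    (Illustrative only; the Perl original is not shown in the thread.)"""
    out = []
    for line in gedcom_text.splitlines():
        out.append(line)
        m = re.match(r"0 @(?P<xref>[^@]+)@ INDI", line)
        if m:
            out.append("1 _UID %s::%s" % (authority, m.group("xref")))
    return "\n".join(out)
```

As Julio notes, doing this unconditionally is dangerous; the authority prefix only helps if the source really keeps its IDs permanent.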
From: Douglas S. B. <db...@cs...> - 2007-10-30 14:01:06
|
On Tue, October 30, 2007 9:02 am, Gerald Britton wrote: > I wasn't referring to the _UID attribute, but rather the person, > event, place, source and family ids. Personally, I find the _UID > attribute less than useful, and run GEDCOMS through an awk program to > remove them. Yes, I wasn't talking about a _UID either. It is beyond the GEDCOM standard, correct? Supporting it would be a bit controversial, but perhaps an option "Respect _UID on import"? > So my idea is: > > Add an option to the GRAMPS gedcom import, "Unique IDs," and have the > import look up each and every person, family, place, event and source > id coming in from the GEDCOM in the database to which the records are > to be added. If a duplicate is found, ignore it! This would require > that the import program accept the incoming ids as unique and > consistent, which is why it should be an option. Also, because of the > linked structure of GED, the import might get a family first, with > references to spouses and children, while the people come in later. > This could be resolved through a two-pass approach or by creating > placeholders for out-of-order data. This is basically what the ImportCSV does. However, it assumes that the importing data is the most up-to-date and will change the current data if it differs. Importing data twice won't do anything, regardless. (ImportCSV only deals with a subset of genealogy data... just the basics, but most/all of what you get from sites like FamilySearch.) > So, with this approach, you would import your gedcoms, with the new > option set, into a new database, then open your master database, and > import the new database with the option off. This could happen behind the scenes on import if auto-merge is selected: 1. create a new temp db 2. select the file(s) to import and merge 3. import and merge them into temp 4. import them into current I hope we can do it without this temp DB, however. But it would require some way of identifying same individuals. 
In ImportCSV, I do this manually by putting the person's GRAMPS ID in the data---time consuming but safe. So, there are actually two issues: merging people using their common IDs from the source GEDCOMs, and then merging those into GRAMPS. It would be nice to support both kinds of merge. -Doug > On Oct 30, 2007 7:19 AM, Julio Sánchez <jul...@gm...> wrote: >> Hi, >> >> AFAIK, FamilySearch does not guarantee unique, eternal, identifiers. >> However, the GEDCOM ID they use on the downloaded GEDCOMs has been >> permanent >> for years. It is always the same for each person in the record. >> >> I have a small Perl script that copies that value into some form of the >> _UID >> nonstandard attribute. For instance: >> >> 1 _UID IGI::I500077973070 >> >> I.e., I qualify the number with a numbering authority code ('IGI'). >> This >> way I can tell records I already downloaded and merge them together. >> >> Unconditionally doing this, however, is dangerous because there is no >> guarantee that the FamilySearch IDs will not change in the future, so >> this >> should only be done under very controlled circumstances. >> >> I do the same for the Vital Records Index: >> >> 1 _UID VRI-2000-ES::I4611604-1 >> >> In this case, it is much safer because CDs are immutable, the >> concatenation >> of year and region codes makes the code unique. A new CD edition would >> change the IDs but, at least, no unwanted merges would happen. >> >> In every case I repeatedly reuse data from a computer database I have >> constructed one specific algorithm to create unique IDs. One other >> example, >> from the Guipuzcoa online church records: >> >> 1 _UID DEAH:111500101-0001-0-e7289c-2 >> >> It contains the source reference, page number, record number and a >> partial >> hash of the name (this was derived by experimentation after several >> failed >> tries, you don't wanna know the pathological cases that appear). Every >> source that does not assign unique IDs needs ad-hoc handling. 
>> >> Ideally every source should generate a permanent unique ID for records >> it >> originates. Records received from other sources are not given new IDs, >> but >> records merged from different sources keep all the received IDs. >> >> Regards, >> >> Julio >> >> 2007/10/30, Gerald Britton <ger...@gm...>: >> > >> > >> > >> > If the familysearch gedcoms have unique ids for people, events etc >> > then gramps import could discard the duplicates as an option. So you >> > could start a new db, import the geds (removing dups), then open your >> > good db and import the new db into it. >> > >> > >> > >> > On 10/30/07, Benny Malengier <ben...@gm...> wrote: >> > > 2007/10/30, Douglas S. Blank <db...@cs... >: >> > > > >> > > > [Moving to developers list.] >> > > > >> > > > The controversial aspect of this is the automatic merge (which is, >> of >> > > > course, the whole point). We can do this as a "fork" of the >> current >> GEDCOM >> > > > import, but that wouldn't be useful for anyone else, and would >> quickly >> > > > suffer "bitrot" as the original GEDCOM import continued to change. >> Perhaps >> > > > there is a way that we can work within the GRAMPS GEDCOM import? >> > > >> > > >> > > Well, in 3.0 you could make an automatic revision, then work on the >> data, >> > > and do rollback to revision if problems. >> > > >> > > I always planned on working on a better (perhaps two-stage, >> interactive) >> > > > merge-er. But an easier option would be to have some type of >> option on >> the >> > > > Import Dialog that either: a) kept all duplicates, or b) >> attempted >> > > > automatic merges. This would be less controversial (I think) if >> there >> was >> > > > an "Undo Import" that was quick and painless, and readily >> available. >> Is >> > > > there? >> > > >> > > >> > > I see possibilities with automatic merge, but it should be on a >> unique >> > > identifier. 
I just have too many people with the same name to be >> able >> to >> > > have the name determine the merging or not (that is, all first sons >> of >> all >> > > children have the name of their grandfather, and are born in the >> same >> decade >> > > or so). >> > > >> > > Would developers allow a GEDCOM import to do automatic merges in the >> same >> > > > manner that ImportCSV works? >> > > >> > > >> > > I would rather see GEDCOM -> XML, and allow automatic merge of XML. >> > > Like this the typical problem of importing GEDCOM is separated from >> the >> > > merging, we allow merging of gramps family trees without the need to >> pass >> > > over GEDCOM, and we have power over our XML format, so we can add >> things >> > > there, so we could make export to XML and have extra data written >> needed >> for >> > > possible merging. >> > > Note that an overwrite in XML is already as easy as replacing the >> handle >> > > with the handle of an existing object. >> > > >> > > Benny >> > > >> > 
-- Douglas S. Blank Associate Professor, Bryn Mawr College http://cs.brynmawr.edu/~dblank/ Office: 610 526 6501 |
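Gerald's "Unique IDs" option quoted in the message above can be sketched as follows. The dict-of-records database is purely illustrative (it is not the GRAMPS database API), and a second verification pass stands in for the placeholder approach he mentions for out-of-order references:

```python
def import_with_unique_ids(db, incoming):
    """Treat incoming GEDCOM IDs as unique and consistent: skip any
    record whose ID already exists in the database, then report any
    references that still do not resolve after the import.
    db / incoming: dicts mapping record ID -> record, where a record
    is a dict that may carry a 'refs' list of other record IDs."""
    added = []
    # Pass 1: add only records whose IDs are not yet in the database.
    for rec_id, record in incoming.items():
        if rec_id not in db:
            db[rec_id] = record
            added.append(rec_id)
    # Pass 2: verify references (a family may arrive before its people).
    unresolved = [(rec_id, ref)
                  for rec_id, record in incoming.items()
                  for ref in record.get("refs", [])
                  if ref not in db]
    return added, unresolved
```

Note the skip semantics match Gerald's proposal, not ImportCSV's: an already-present record is left untouched rather than updated from the incoming data.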
From: Jeremy C. R. <re...@re...> - 2007-10-30 15:26:13
|
gramps-devel rejected me, so I just subscribed. ... since I composed that the check and repair has been running for maybe 30 minutes. The gramps window and the Checking database windows are just all grey. ... ---------- Forwarded message ---------- Date: Tue, 30 Oct 2007 10:10:57 -0500 (CDT) From: Jeremy C. Reed <re...@re...> To: Douglas S. Blank <db...@cs...> Cc: Benny Malengier <ben...@gm...>, gra...@li... Subject: Re: [Gramps-devs] merge mess from individual familysearch.org records On Mon, 29 Oct 2007, Douglas S. Blank wrote: > [Moving to developers list.] I am not on that list. So continue to CC me. > (On the other hand, a GRAMPS-oriented crawler would be quite handy too, > and I'd be interested in exploring that approach too. Can you point me to > that?) http://family.geckotribe.com/gedcom-crawler/ And I patched gedcom-crawler.pl: -$this = pack($sockaddr, &AF_INET, 0, $thisaddr); +$this = pack($sockaddr, &AF_INET, 0, "\0\0\0\0"); I used it last night: <snip> 3024 families done, 0 in queue ................................................................................ 3025 families done, 0 in queue .................................................................... Crawl completed in 34 minutes 14 seconds 2054.38s real 10.66s user 1.72s system So now I have a huge .ged file. 3024 families. That is great. Downloading one at a time and merging one at a time took me six hours for about 50 families. And my database had duplicated events, etc. So I created a new database. And imported it. Something is wrong. Going to the pedigree chart for the main person (the main familyid) that I started with the crawler doesn't show any parents. But my new database has thousands of names. 
I don't know if it is related, but when I clicked the left arrow for children of someone who didn't have children, it caused this error report: 1042428: ERROR: gramps.py: line 147: Unhandled exception Traceback (most recent call last): File "/usr/pkg/share/gramps/DataViews/_PedigreeView.py", line 1208, in on_show_child_menu cname = escape(NameDisplay.displayer.display(child)) File "/usr/pkg/share/gramps/NameDisplay.py", line 503, in display name = person.get_primary_name() AttributeError: 'NoneType' object has no attribute 'get_primary_name' When I looked on my console I also had: 357785: WARNING: _ReadGedcom.py: line 753: Line 120755 was not understood, so it was ignored. 363207: WARNING: _ReadGedcom.py: line 753: Line 121109 was not understood, so it was ignored. 387285: WARNING: _ReadGedcom.py: line 753: Line 129150 was not understood, so it was ignored. 438460: WARNING: _ReadGedcom.py: line 753: Line 151450 was not understood, so it was ignored. I restarted gramps. Opened the new database. It said there was some problem (I don't remember) and said to run the check and repair. So I did that. Python is at 98% cpu and the check and repair has been going on for at least 15 minutes. Jeremy C. Reed |
From: Jeremy C. R. <re...@re...> - 2007-10-30 19:29:09
|
On Tue, 30 Oct 2007, Jeremy C. Reed wrote: > > (On the other hand, a GRAMPS-oriented crawler would be quite handy too, > > and I'd be interested in exploring that approach too. Can you point me to > > that?) > > http://family.geckotribe.com/gedcom-crawler/ > > And I patched gedcom-crawler.pl: > > -$this = pack($sockaddr, &AF_INET, 0, $thisaddr); > +$this = pack($sockaddr, &AF_INET, 0, "\0\0\0\0"); > > I used it last night: > > <snip> > 3024 families done, 0 in queue > ................................................................................ > 3025 families done, 0 in queue > .................................................................... > > Crawl completed in 34 minutes 14 seconds > > 2054.38s real 10.66s user 1.72s system > > So now I have a huge .ged file. 3024 families. That is great. Downloading > one at a time and merging one at a time took me six hours for about 50 > families. And my database had duplicated events, etc. > > So I created a new database. And imported it. > > Something is wrong. Going to the pedigree chart for the main person (the main > familyid) that I started with the crawler doesn't show any parents. > > But my new database has thousands of names. > > I don't know if it is related, but when I clicked the left arrow for > children of someone who didn't have children, it caused this error report: > > 1042428: ERROR: gramps.py: line 147: Unhandled exception > Traceback (most recent call last): > File "/usr/pkg/share/gramps/DataViews/_PedigreeView.py", line 1208, in > on_show_child_menu > cname = escape(NameDisplay.displayer.display(child)) > File "/usr/pkg/share/gramps/NameDisplay.py", line 503, in display > name = person.get_primary_name() > AttributeError: 'NoneType' object has no attribute 'get_primary_name' > > > When I looked on my console I also had: > 357785: WARNING: _ReadGedcom.py: line 753: Line 120755 was not > understood, so it was ignored. > 363207: WARNING: _ReadGedcom.py: line 753: Line 121109 was not understood, > so it was ignored. 
> 387285: WARNING: _ReadGedcom.py: line 753: Line 129150 was not understood, > so it was ignored. > 438460: WARNING: _ReadGedcom.py: line 753: Line 151450 was not understood, > so it was ignored. > > I restarted gramps. Opened the new database. It said there was some > problem (I don't remember) and said to run the check and repair. So I did > that. Python is at 98% cpu and the check and repair has been going on > for at least 15 minutes. Following up. I killed it after about an hour. On another system I imported the GED file. And ran the check and repair tool. And it hung and I killed it after an hour. Also, the perl script referenced above produced a few bogus lines that gramps reported, which I manually fixed, such as: -0 @F1853125 @ FAM +0 @F1853125@ FAM Also looking closer at my new database -- no duplicates. That is great. The bad news is there is no association between families. So the database is useless. All it has is child and parents. But no grandparents. So I have thousands of names but not linked together. The perl script downloaded them all by following the parents' links to families, so the original data is definitely together. I can provide a bzipped copy of the GED file if anyone wants to look at it. It might be useful for others' tests. I don't know a lot about the GED format, so I can't tell whether the problem (families not linked together) is in the GED file or due to the gramps import. Jeremy C. Reed |
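The bogus lines Jeremy fixed by hand (a stray space before the closing '@' in a level-0 cross-reference) could also be repaired mechanically. A minimal sketch, assuming the breakage always has the `0 @XREF @ TAG` shape shown in his diff:

```python
import re

def fix_xref_lines(text):
    """Rejoin malformed level-0 cross-reference lines, e.g.
    '0 @F1853125 @ FAM' -> '0 @F1853125@ FAM', which GRAMPS
    otherwise reports as 'not understood, so it was ignored'."""
    return re.sub(r"^(0 @[^@\s]+) @ ", r"\1@ ", text, flags=re.MULTILINE)
```

Well-formed lines such as `0 @I1@ INDI` do not match the pattern and pass through unchanged.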
From: <jul...@gm...> - 2007-10-30 15:27:19
|
2007/10/30, Douglas S. Blank <db...@cs...>: > > On Tue, October 30, 2007 9:02 am, Gerald Britton wrote: > > I wasn't referring to the _UID attribute, but rather the person, > > event, place, source and family ids. Personally, I find the _UID > > attribute less than useful, and run GEDCOMS through an awk program to > > remove them. > > Yes, I wasn't talking about a _UID either. It is beyond the GEDCOM > standard, correct? Supporting it would be a bit controversial, but perhaps > an option "Respect _UID on import"? That's good. About standard alternatives, REFN could fit the bill: A user-defined number or text that the submitter uses to identify this record. For instance, it may be a record number within the submitter's automated or manual system, or it may be a page and position number on a pedigree chart. It is multivalued and takes a TYPE subtag that can make it unique. Unfortunately, no one uses it like that, preferring the non-standard _UID instead. A problem is that no definition of _UID exists that I could find. Apparently, it was introduced in PAF 5.1 in 2002, taken from an earlier draft that I cannot find. So it is unclear if it supports my understanding of it. But, anyway, let's separate the case for GRAMPS storing multivalued unique identifiers and using them for matching purposes, from the specific mappings on import and export for different formats. The former I think is a must; for the latter I grant there is a lot of room for personal taste. Regards, Julio |
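For reference, the REFN alternative described above would look something like this in a GEDCOM record (REFN with a subordinate TYPE is defined in the GEDCOM 5.5 standard; the values here reuse Julio's IGI example and are illustrative only):

```
0 @I1@ INDI
1 NAME John /Smith/
1 REFN I500077973070
2 TYPE IGI
```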
From: Benny M. <ben...@gm...> - 2007-10-30 16:30:55
|
Ok, to recap: 1/we need multiple valued attributes. For 3.0 there is the idea to introduce tags on notes instead of one type. This looks somewhat identical. One would have the field unique identifiers, and then a list of unique identifiers, one of them the handle as used in GRAMPS 2/on import, the unique identifier is searched in present database, if present, a merge is initiated, skipping double data. Is that what you are suggesting? Benny |
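Benny's two points can be sketched together: each object carries a list of unique identifiers, and an import merges on any shared identifier, skipping data already present. The dict-based person records below are illustrative only, not the GRAMPS object model:

```python
def import_person(db, person):
    """On import, search the incoming person's unique identifiers in
    the present database; if any identifier matches, merge into the
    existing record (union the identifier lists, skip duplicate
    events) instead of adding a double. People are dicts with a
    'uids' list and an 'events' set (illustrative structure only)."""
    incoming_uids = set(person["uids"])
    for existing in db:
        if incoming_uids & set(existing["uids"]):
            existing["uids"] = sorted(set(existing["uids"]) | incoming_uids)
            existing["events"] |= person["events"]
            return existing
    db.append(person)  # no identifier matched: a genuinely new person
    return person
```

The linear scan here is where the index Benny asks about later would matter: in a real database the lookup should go through an index on the identifier values rather than a walk over every person.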
From: <jul...@gm...> - 2007-10-30 16:51:56
|
In a nutshell, yes. But aren't all GRAMPS attributes multivalued, I mean, they can be repeated, right? If not, then yes, some special handling is needed, but in 2.0 it works as is. Or do you mean one instance, with several values? It is immaterial, I think, modulo parsing issues. Regards, Julio 2007/10/30, Benny Malengier <ben...@gm...>: > > Ok, > > to recap: > > 1/we need multiple valued attributes. For 3.0 there is the idea to > introduce tags on notes instead of one type. This looks somewhat identical. > One would have the field unique identifiers, and then a list of unique > identifiers, one of them the handle as used in GRAMPS > > 2/on import, the unique identifier is searched in present database, if > present, a merge is initiated, skipping double data. > > Is that what you are suggesting? > > Benny > |
From: Benny M. <ben...@gm...> - 2007-10-30 17:06:04
|
Ok, I was thinking about

key, (value1, value2, value3)

whereas you say it is ok to have

key, value1
key, value2
key, value3

Yeah, the same thing. One would need an index though on the value to be able to quickly find it in a large database. 2007/10/30, Julio Sánchez <jul...@gm...>: > > In a nutshell, yes. But aren't all GRAMPS attributes multivalued, I mean, > they can be repeated, right? If not, then yes, some special handling is > needed, but in 2.0 works as is. Or you mean one instance, with several > values? It is immaterial, I think, modulo parsing issues. > > Regards, > > Julio > > 2007/10/30, Benny Malengier <ben...@gm...>: > > > > Ok, > > > > to recap: > > > > 1/we need multiple valued attributes. For 3.0 there is the idea to > > introduce tags on notes instead of one type. This looks somewhat identical. > > One would have the field unique identifiers, and then a list of unique > > identifiers, one of them the handle as used in GRAMPS > > > > 2/on import, the unique identifier is searched in present database, if > > present, a merge is initiated, skipping double data. > > > > Is that what you are suggesting? > > > > Benny > > |
From: Douglas S. B. <db...@cs...> - 2007-10-30 17:21:40
|
Benny Malengier wrote: > 2/on import, the unique identifier is searched in present > database, if present, a merge is initiated, skipping double data. The only variation I can think of is whether we want to skip, or update from (if different from original) when importing. In any event, I think a summary of what happened would be quite useful, such as: The following people were just added: Smith, Ed [I001223] Smith, Edwin, Jr. [I01232] (That would be handy in general.) I just noticed the new little "info icon" in the status bar after I did an import yesterday and had a warning. Very cool! Is that new? Perhaps an info on what just occurred on import would be useful. -Doug > Is that what you are suggesting? > > Benny |
From: <jul...@gm...> - 2007-10-30 17:45:21
|
Douglas, By the way, another thing that is a matter of taste, is that I prefer the already present information on the database to take precedence over that being imported. In general, the information in the database I have already vetted. Any contradicting information is interesting, but not as a primary. If, for instance, a death event comes that contradicts what I have, I want it to be kept as alternative rather than supersede my information. Because if the new date is right, I will make it the real date when I decide so. But if my choice is superseded, every time I import from a source that is mistaken I have to go back and fix it. I may even add a note to the "wrong" information explaining why it is wrong or what's the deal with it. This happens all the time with death/burials; many church databases list as death date what is actually burial. Or marriage records on Catholic churches may have several different places or dates depending on exactly what rite was done, so conflicting data is common. So once I have made a decision on what is the "right" information, I don't want it relegated, let alone deleted. But this is a matter of preference, it very much depends on whether you consider your database a good source or not. I do for my main database, I have invested a lot on it, it is priceless. For other auxiliary or temporary databases, I am not so sure. Regards, Julio 2007/10/30, Douglas S. Blank <db...@cs...>: > > Benny Malengier wrote: > > 2/on import, the unique identifier is searched in present > > database, if present, a merge is initiated, skipping double > data. > > The only variation I can think of is whether we want to skip, or update > from (if different from original) when importing. > > In any event, I think a summary of what happened would be quite useful, > such as: > > The following people were just added: > > Smith, Ed [I001223] > Smith, Edwin, Jr. [I01232] > > (That would be handy in general.) 
> > I just noticed the new little "info icon" in the status bar after I did > an import yesterday and had a warning. Very cool! Is that new? Perhaps > an info on what just occurred on import would be useful. > > -Doug > > > Is that what you are suggesting? > > > > Benny > > |
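Julio's precedence rule above can be sketched like this; the event structure (a primary 'date' plus an 'alternatives' list) is hypothetical, chosen only to show vetted data staying primary while contradicting imports are kept as alternatives:

```python
def merge_incoming_date(existing, incoming_date):
    """Already-vetted information takes precedence: a contradicting
    incoming value is recorded as an alternative, never superseding
    the primary date. 'existing' is a dict with 'date' and
    'alternatives' keys (illustrative structure only)."""
    if existing["date"] is None:
        # Nothing vetted yet: accept the incoming value as primary.
        existing["date"] = incoming_date
    elif (incoming_date != existing["date"]
          and incoming_date not in existing["alternatives"]):
        # Conflict: keep it, but only as an alternative.
        existing["alternatives"].append(incoming_date)
    return existing
```

Re-importing the same mistaken source is then harmless: the alternative is already recorded, and the vetted primary is never relegated.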
From: Gerald B. <ger...@gm...> - 2007-10-30 18:09:01
|
Agree wholeheartedly. Additionally, we could use an exception report from the merge operation that lists anomalies to allow us to fix them. On Oct 30, 2007 1:45 PM, Julio Sánchez <jul...@gm...> wrote: > Douglas, > > By the way, another thing that is a matter of taste, is that I prefer the > already present information on the database to take precedence over that > being imported. In general, the information in the database I have already > vetted. Any contradicting information is interesting, but not as a primary. > If, for instance, a death event comes that contradicts what I have, I want > it to be kept as alternative rather than supersede my information. Because > if the new date is right, I will make it the real date when I decide so. > But if my choice is superseded, every time I import from a source that is > mistaken I have to go back and fix it. I may even add a note to the "wrong" > information explaining why it is wrong or what's the deal with it. This > happens all the time with death/burials, many church databases list as death > date what is actually burial. Or marriage records on Catholic churches may > have several different places or dates depending on exactly what rite was > done, so conflicting data is common. So once I have made a decision on what > is the "right" information, I don't want it relegated, let alone deleted. > > But this is a matter of preference, it very much depends on whether you > consider your database a good source or not. I do for my main database, I > have invested a lot on it, it is priceless. For other auxiliary or > temporary databases, I am not so sure. > > Regards, > > Julio > > > > 2007/10/30, Douglas S. Blank <db...@cs...>: > > > Benny Malengier wrote: > > > 2/on import, the unique identifier is searched in present > > > database, if present, a merge is initiated, skipping double > data. 
> > > > The only variation I can think of is whether we want to skip, or update > > from (if different from original) when importing. > > > > In any event, I think a summary of what happened would be quite useful, > > such as: > > > > The following people were just added: > > > > Smith, Ed [I001223] > > Smith, Edwin, Jr. [I01232] > > > > (That would be handy in general.) > > > > I just noticed the new little "info icon" in the status bar after I did > > an import yesterday and had a warning. Very cool! Is that new? Perhaps > > an info on what just occurred on import would be useful. > > > > -Doug > > > > > Is that what you are suggesting? > > > > > > Benny > > > > > > |