From: Douglas S. B. <db...@cs...> - 2007-10-30 00:05:26
|
[Moving to developers list.]

Jeremy, I have seen this limitation where you only get to download one
family at a time from a few sites. GRAMPS was not designed to handle this
type of importing of small, overlapping sets. The current merge is meant to
be very safe and doesn't assume anything. However, I'd be glad to work with
you to create a way to make this painless in GRAMPS.

As Benny mentioned, I created a comma-separated-value (CSV) import (which
*is* a part of GRAMPS) to deal with this problem. I didn't know much about
GEDCOM (nor GRAMPS) at the time, and wrote ImportCSV to automatically merge
without duplication. It can be dangerous in the wrong hands, but it can save
hours in the right ones. You can read about it here:

http://emergent.brynmawr.edu/emergent/GrampsCSVImport

But we can use the same merge techniques without having to go through a
spreadsheet, and I (and others) could use this too.

The controversial aspect of this is the automatic merge (which is, of
course, the whole point). We can do this as a "fork" of the current GEDCOM
import, but that wouldn't be useful for anyone else, and it would quickly
suffer "bitrot" as the original GEDCOM import continued to change. Perhaps
there is a way that we can work within the GRAMPS GEDCOM import?

I always planned on working on a better (perhaps two-stage, interactive)
merger. But an easier option would be to have some type of option on the
Import Dialog that either: a) kept all duplicates, or b) attempted automatic
merges. This would be less controversial (I think) if there were an "Undo
Import" that was quick, painless, and readily available. Is there?

Would developers allow a GEDCOM import to do automatic merges in the same
manner that ImportCSV works?

(On the other hand, a GRAMPS-oriented crawler would be quite handy too, and
I'd be interested in exploring that approach as well. Can you point me to
that?)

-Doug

On Mon, October 29, 2007 6:59 pm, Benny Malengier wrote:
> The merging after import is the best GRAMPS offers, I think. But it is
> indeed slow, you can overlook things, and for families, the family is
> only merged when both parents are merged. Perhaps some bugs in the code
> too...
>
> If you really don't want to enter the information, and don't want to
> merge too much, you need to know if there is some kind of unique
> identifier in the GEDCOM to quickly retrieve identical info.
>
> You could then merge manually with XML files: import every GEDCOM that is
> a family into a separate grdb, export to XML, and open the XML in a text
> editor after unzipping. The XML files of the family should be small.
> Copying them manually into the XML file of your large database might be
> feasible. That is, in the family XML you change the handle to the handle
> of the persons already in the database via the replace command, then copy
> all data not yet in the larger XML. You could keep a spreadsheet with the
> mapping between GRAMPS handle and the unique identifier (if present in
> the GEDCOM). Error prone, but if you know how GRAMPS XML works, quite
> doable. All depends on how large the files are that you download.
>
> Another option is to first import all small GEDCOMs into a new empty
> database, clean that as much as possible, then import into your real
> database when you are satisfied. There is also a CSV import plugin
> written by Doug but not part of GRAMPS (see our website) which might help
> somewhat (wouldn't know, didn't try).
>
> Import with merge is something many people have already broken their
> heads on. It is very difficult to do. Some programs offer a unique
> identifier (UID) for this task, but even then, it is hard to know what to
> do with differing data. GRAMPS has chosen not to do things automatically,
> but to depend on the user to start the merging and check the data.
>
> Benny
>
> 2007/10/29, Jeremy C. Reed <re...@re...>:
>> So I found some of my wife's family at familysearch.org. But they only
>> allow downloading one family at a time. So now I have a mess of
>> triplicate records. Let me explain:
>> 1) Person A and family lists parents (B).
>> 2) Person B and family lists child A and lists B's parents (C) and
>> spouse.
>> 3) Person C and family lists child B and lists C's parents (D) and
>> spouse.
>> So 3 GEDCOM downloads. And now I have person A twice, person B three
>> times, and person C twice. Then I have spouses, and I keep following
>> parents. So I keep merging as I grab new records. Very time consuming.
>>
>> Then I start seeing, in Gramps, triplicate identical birth events and
>> other identical events. And even worse, I see a husband with the same
>> wife three times and with three sets of children (of course with
>> identical names and identical data, who are all merged together). And
>> when I look at a person's siblings, I see they have duplicated brothers
>> and sisters.
>>
>> So now I have hundreds of names and thousands of events. I downloaded
>> probably about 50 GEDCOM files (50 families). And it is an unusable
>> mess. I am going to start over from scratch. I'd prefer not to manually
>> enter the data.
>>
>> I asked their support and was told that they don't support multiple
>> families in one download, and it was suggested I try the proprietary
>> commercial PAF Insight.
>>
>> Do you have any suggestions on how I can put that data (one family GED
>> file at a time) into my gramps database cleanly (no duplicated
>> details)? I did see a perl-based crawler for grabbing data from
>> familysearch.org but haven't tried it yet. Or do you have any
>> suggestions on how I can merge this triplicate information without the
>> merged records containing triplicate events, duplicated siblings, etc.?
>>
>> I am using gramps as provided by Ubuntu -- version 2.2.6-1ubuntu1.
>>
>> Jeremy C. Reed
>>
>> p.s. Has anyone else seen this?
|
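The ImportCSV-style automatic merge Doug describes can be sketched roughly
like this (a simplified illustration of the idea, not the actual
GRAMPS/ImportCSV code; the dict-based record layout and the "never
overwrite existing data" rule are assumptions made for the sketch):

```python
# Sketch of an ID-keyed automatic merge, in the spirit of ImportCSV.
# A record whose ID already exists is merged field-by-field into the
# existing record instead of being added as a duplicate.

def auto_merge(database, incoming):
    """Merge incoming records into database, keyed on a unique ID.

    database and incoming are dicts mapping ID -> dict of fields.
    Existing non-empty fields are never overwritten (the "safe" rule).
    """
    for rec_id, fields in incoming.items():
        if rec_id in database:
            existing = database[rec_id]
            for key, value in fields.items():
                # Only fill in fields the existing record lacks.
                existing.setdefault(key, value)
        else:
            database[rec_id] = dict(fields)
    return database

db = {"I0001": {"name": "Person A", "birth": "1850"}}
new = {"I0001": {"name": "Person A", "death": "1910"},
       "I0002": {"name": "Person B"}}
auto_merge(db, new)
# db["I0001"] now carries both birth and death; I0002 was added once.
```

The key design point is that merging is decided purely by the ID, never by
name matching, which is exactly why it is both powerful and dangerous when
the IDs are not trustworthy.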
From: Benny M. <ben...@gm...> - 2007-10-30 08:26:52
|
2007/10/30, Douglas S. Blank <db...@cs...>:
> [Moving to developers list.]
>
> The controversial aspect of this is the automatic merge (which is, of
> course, the whole point). We can do this as a "fork" of the current
> GEDCOM import, but that wouldn't be useful for anyone else, and would
> quickly suffer "bitrot" as the original GEDCOM import continued to
> change. Perhaps there is a way that we can work within the GRAMPS GEDCOM
> import?

Well, in 3.0 you could make an automatic revision, then work on the data,
and roll back to the revision if there are problems.

> I always planned on working on a better (perhaps two-stage, interactive)
> merger. But an easier option would be to have some type of option on the
> Import Dialog that either: a) kept all duplicates, or b) attempted
> automatic merges. This would be less controversial (I think) if there
> was an "Undo Import" that was quick and painless, and readily available.
> Is there?

I see possibilities with automatic merge, but it should be based on a
unique identifier. I just have too many people with the same name to be
able to have the name determine the merging or not (that is, all first
sons of all children have the name of their grandfather, and are born in
the same decade or so).

> Would developers allow a GEDCOM import to do automatic merges in the
> same manner that ImportCSV works?

I would rather see GEDCOM -> XML, and allow automatic merge of XML. Like
this, the typical problem of importing GEDCOM is separated from the
merging, we allow merging of GRAMPS family trees without the need to pass
over GEDCOM, and we have power over our XML format, so we can add things
there: we could make the export to XML write the extra data needed for
possible merging. Note that an overwrite in XML is already as easy as
replacing the handle with the handle of an existing object.

Benny
|
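Benny's handle-overwrite trick can be illustrated with a small sketch. The
XML fragment below is simplified for the example, not the exact GRAMPS XML
schema, but the principle is the one he states: definition and references
share the same handle string, so one global replacement redirects all of
them.

```python
# Sketch of the handle-replacement trick: to make a person in a small
# exported XML refer to an object already in the large database, replace
# its handle throughout the file.

def replace_handle(xml_text, old_handle, new_handle):
    # Every reference (hlink) and the definition itself use the same
    # handle string, so a global replace redirects them all at once.
    return xml_text.replace(old_handle, new_handle)

fragment = (
    '<person handle="_abc123" id="I0007">'
    '<name>Person B</name></person>'
    '<childref hlink="_abc123"/>'
)
merged = replace_handle(fragment, "_abc123", "_existing42")
# Both the definition and the reference now point at _existing42.
```

This is exactly the "replace command in a text editor" workflow from the
earlier message, just automated.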
From: Douglas S. B. <db...@cs...> - 2007-10-30 10:51:13
|
On Tue, October 30, 2007 4:26 am, Benny Malengier wrote:
> Well, in 3.0 you could make an automatic revision, then work on the
> data, and roll back to the revision if there are problems.

That sounds perfect for allowing such a dangerous import. That could even
be a page in the import wizard.

> I see possibilities with automatic merge, but it should be based on a
> unique identifier. I just have too many people with the same name to be
> able to have the name determine the merging or not (that is, all first
> sons of all children have the name of their grandfather, and are born in
> the same decade or so).

Yes, ImportCSV works on unique IDs. In fact, it has syntax for marking the
ID of a person as either a GRAMPS GID, or something else. The "something
else" does not get merged.

>> Would developers allow a GEDCOM import to do automatic merges in the
>> same manner that ImportCSV works?
>
> I would rather see GEDCOM -> XML, and allow automatic merge of XML.
> [...]
> Note that an overwrite in XML is already as easy as replacing the handle
> with the handle of an existing object.

Well, the main problem is that all of these sites only offer GEDCOM
download. Are you suggesting a stand-alone GEDCOM -> XML program? Why make
users do the extra step of converting to XML? What happens now in GRAMPS
if you import someone with a handle that already exists in GRAMPS? We
could allow automatic merge on all imports.

-Doug

> Benny
|
From: Gerald B. <ger...@gm...> - 2007-10-30 10:47:32
|
If the familysearch GEDCOMs have unique IDs for people, events, etc., then
the GRAMPS import could discard the duplicates as an option. So you could
start a new db, import the GEDs (removing dups), then open your good db
and import the new db into it.

On 10/30/07, Benny Malengier <ben...@gm...> wrote:
> I see possibilities with automatic merge, but it should be based on a
> unique identifier. I just have too many people with the same name to be
> able to have the name determine the merging or not (that is, all first
> sons of all children have the name of their grandfather, and are born in
> the same decade or so).
> [...]
|
From: <jul...@gm...> - 2007-10-30 11:19:40
|
Hi,

AFAIK, FamilySearch does not guarantee unique, eternal identifiers.
However, the GEDCOM ID they use on the downloaded GEDCOMs has been
permanent for years. It is always the same for each person in the record.

I have a small Perl script that copies that value into some form of the
_UID nonstandard attribute. For instance:

1 _UID IGI::I500077973070

I.e., I qualify the number with a numbering authority code ('IGI'). This
way I can tell which records I already downloaded and merge them together.

Unconditionally doing this, however, is dangerous because there is no
guarantee that the FamilySearch IDs will not change in the future, so this
should only be done under very controlled circumstances.

I do the same for the Vital Records Index:

1 _UID VRI-2000-ES::I4611604-1

In this case, it is much safer because CDs are immutable, and the
concatenation of year and region codes makes the code unique. A new CD
edition would change the IDs but, at least, no unwanted merges would
happen.

For every case where I repeatedly reuse data from a computer database, I
have constructed a specific algorithm to create unique IDs. One other
example, from the Guipuzcoa online church records:

1 _UID DEAH:111500101-0001-0-e7289c-2

It contains the source reference, page number, record number, and a
partial hash of the name (this was derived by experimentation after
several failed tries; you don't wanna know the pathological cases that
appear). Every source that does not assign unique IDs needs ad-hoc
handling.

Ideally, every source should generate a permanent unique ID for records it
originates. Records received from other sources are not given new IDs, but
records merged from different sources keep all the received IDs.

Regards,

Julio

2007/10/30, Gerald Britton <ger...@gm...>:
> If the familysearch GEDCOMs have unique IDs for people, events, etc.,
> then the GRAMPS import could discard the duplicates as an option. So you
> could start a new db, import the GEDs (removing dups), then open your
> good db and import the new db into it.
|
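Julio's preprocessing step is a Perl script; a rough equivalent in Python
might look like this. The `add_uid_tags` helper and its exact line
handling are illustrative assumptions, but the output line follows the
`1 _UID IGI::<id>` form from his examples:

```python
# Sketch of copying each GEDCOM individual's record ID into a qualified
# _UID attribute, e.g.
#   0 @I500077973070@ INDI  ->  adds  1 _UID IGI::I500077973070
import re

def add_uid_tags(gedcom_lines, authority="IGI"):
    out = []
    indi = re.compile(r"^0 @(?P<id>[^@]+)@ INDI\s*$")
    for line in gedcom_lines:
        out.append(line)
        m = indi.match(line)
        if m:
            # Tag the individual with a source-qualified identifier.
            out.append("1 _UID %s::%s" % (authority, m.group("id")))
    return out

lines = ["0 HEAD", "0 @I500077973070@ INDI", "1 NAME John /Doe/"]
tagged = add_uid_tags(lines)
```

As the message stresses, this is only safe when the source's IDs really
are stable; the authority prefix is what keeps IDs from different sources
from colliding.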
From: Gerald B. <ger...@gm...> - 2007-10-30 13:02:22
|
I wasn't referring to the _UID attribute, but rather the person, event,
place, source, and family IDs. Personally, I find the _UID attribute less
than useful, and run GEDCOMs through an awk program to remove them.

So my idea is:

Add an option to the GRAMPS GEDCOM import, "Unique IDs," and have the
import look up each and every person, family, place, event, and source ID
coming in from the GEDCOM in the database to which the records are to be
added. If a duplicate is found, ignore it! This would require that the
import program accept the incoming IDs as unique and consistent, which is
why it should be an option. Also, because of the linked structure of GED,
the import might get a family first, with references to spouses and
children, while the people come in later. This could be resolved through a
two-pass approach or by creating placeholders for out-of-order data.

So, with this approach, you would import your GEDCOMs, with the new option
set, into a new database, then open your master database, and import the
new database with the option off.

On Oct 30, 2007 7:19 AM, Julio Sánchez <jul...@gm...> wrote:
> AFAIK, FamilySearch does not guarantee unique, eternal identifiers.
> However, the GEDCOM ID they use on the downloaded GEDCOMs has been
> permanent for years. It is always the same for each person in the
> record.
> [...]
|
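Gerald's "Unique IDs" option, with the two-pass handling of out-of-order
records, could be modeled along these lines (a simplified sketch, not real
GRAMPS import code; the `(record_id, data)` representation is an
assumption):

```python
# Sketch of a duplicate-skipping import: first pass collects incoming
# record IDs (which also lets forward references, e.g. a family seen
# before its members, resolve), second pass adds only records whose ID
# is not already present in the target database.

def import_skip_duplicates(database, records):
    """records: list of (record_id, data) from the parsed GEDCOM."""
    # Pass 1: index everything the GEDCOM contains.
    incoming = {rec_id: data for rec_id, data in records}
    # Pass 2: copy over only the IDs the database does not have yet.
    skipped = 0
    for rec_id, data in incoming.items():
        if rec_id in database:
            skipped += 1          # duplicate: ignore it
        else:
            database[rec_id] = data
    return skipped

db = {"I0001": "existing record"}
records = [("I0001", "duplicate copy"), ("I0002", "new record")]
skipped = import_skip_duplicates(db, records)
```

The behavior matches the proposal: the existing record wins outright, so
the option is only safe when the incoming IDs are truly unique and
consistent.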
From: <jul...@gm...> - 2007-10-30 13:45:54
|
Gerald,

I agree this would be useful, but I think it has very limited
applicability. It requires matching on a single-valued attribute. Notice
that on FamilySearch there are many duplicates. For instance, there may be
a christening record for someone, a marriage record, a burial record, and
then the christenings and maybe marriages of every child. The parents will
be present several times with different IDs, and when you merge them you
will choose one and only one of the IDs as the ID for the person in
GRAMPS. From then on, if you try to import similar records, the duplicates
will not be found with an ID match except for the one in the record you
chose as more important. You are back to string matching.

If you go beyond the field of FamilySearch, then things become more
complicated. The fact is that you can only do this safely using
multivalued unique identifiers. And many genealogy programs, PAF included,
do this with _UID. It is nonstandard, but the others are either only
meaningful in an LDS context or have been abused so much that they have
become unusable for this.

W.r.t. the ugliness of _UID, what GRAMPS should do is hide them from
sight, i.e. map them internally into "Unique Identifier" or some such, and
not show them except to, maybe, expert users (I don't hide them in GRAMPS,
but I hide them in my version of phpGedView, except on GEDCOM extracts).
They may be read on import or derived through a variety of means, _UID
being just the most common option. But the core concept is keeping
multivalued unique identifiers.

I chose not to modify GRAMPS to match on GEDCOM IDs because they are
meaningful only in some particular context and likely to clash elsewhere.
I did not find a way to make it universal and yet safe; that's why I
settled for preprocessing. Of course, I'd love to have a way to make that
preprocessing invisible, but I never thought it could be done safely.

GEDCOMs from the IGI are tagged with IGI as the producer:

0 HEAD
1 SOUR IGI
2 VERS 5.0
2 NAME International Genealogical Index (R)

So it *might* be possible for a producer tag to be derived automatically,
but other producers fail to put anything useful there:

0 HEAD
1 SOUR FTW
2 VERS 6.00
2 NAME Family Tree Maker for Windows

[I.e., nothing that can be used as a unique-making stem.]

Anyway, I still don't know that the IGI IDs are guaranteed not to be
recycled.

2007/10/30, Gerald Britton <ger...@gm...>:
> I wasn't referring to the _UID attribute, but rather the person, event,
> place, source, and family IDs. [...]
> Add an option to the GRAMPS GEDCOM import, "Unique IDs," and have the
> import look up each and every person, family, place, event, and source
> ID coming in from the GEDCOM in the database to which the records are to
> be added. If a duplicate is found, ignore it!
> [...]
|
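Deriving a numbering-authority prefix from the GEDCOM header, as Julio
says *might* be possible, could look like this sketch. The
`KNOWN_AUTHORITIES` whitelist is an assumption; it captures his point that
only producers like the IGI put something usable in `1 SOUR`, while a
software name like "FTW" yields nothing:

```python
# Sketch of reading the producer tag from a GEDCOM header and mapping
# it to a numbering-authority prefix for _UID qualification. Returns
# None when the producer is not a known record-originating authority.

KNOWN_AUTHORITIES = {"IGI"}   # assumption: a curated whitelist

def producer_prefix(header_lines):
    for line in header_lines:
        parts = line.strip().split(None, 2)
        if len(parts) >= 3 and parts[0] == "1" and parts[1] == "SOUR":
            sour = parts[2]
            return sour if sour in KNOWN_AUTHORITIES else None
    return None

igi = ["0 HEAD", "1 SOUR IGI", "2 VERS 5.0"]
ftw = ["0 HEAD", "1 SOUR FTW", "2 VERS 6.00"]
```

A whitelist keeps the automation conservative: an unknown producer falls
back to no prefix rather than risking unwanted merges.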
From: <jul...@gm...> - 2007-10-30 14:00:24
|
One further note, now that I have read more of the message tails. The problem reported by Jeremy I know well. I run a very modified version of 2.0 where most of these event (and source reference, etc.) duplications don't happen. I.e. the merge has to be done by hand, but duplicate events are detected and merged while merging individuals. This, combined to the fact that I preprocess the GEDCOM files and then keep the _UID tags, lets m= e remove the duplicates now and, more importantly, in the future too. Work done now does not go down the drain next time you search records. I needed this because I merge a lot. Unfortunately, I never found time to do the forward port of this. And, anyway, I'd love to have automatic merge while importing as well, it would be tremendously helpful. Regards, Julio 2007/10/30, Julio S=E1nchez <jul...@gm...>: > > Gerald, > > I agree this would be useful, but I think it has a very limited > applicability. It requires matching on a single-valued attribute. Notic= e > that on FamilySearch there are many duplicates. For instance, there may = be > a christening record for someone, a marriage record, a burial record and > then the christenings and maybe marriages of every child. The parents wi= ll > be present several times with different IDs and when you merge them you w= ill > choose one and only one of the IDs as the ID for the person in GRAMPS. F= rom > then on, if you try to import similar records the duplicates will not be > found with an ID match except for that in the one record you chose as mor= e > important. You are back to string matching. > > If you go beyond the field of FamilySearch, then things become more > complicated. The fact is that you can only do this safely using multival= ued > unique identifiers. And many genealogy programs, PAF included, do this w= ith > _UID. It is nonstandard, but the others are either only meaninful in a L= DS > context or have been abused so much that they have become unusable for th= is. 
> > > W.r.t. the ugliness of _UID, what GRAMPS should do is hide them from > sight, i.e. map it internally into "Unique Identifier" or some such and > not show it except to, maybe, expert users (I don't hide them in GRAMPS, but > I hide them in my version of phpGedView, except on GEDCOM extracts). It may > be read on import or derived through a variety of means, _UID being just the > most common option. But the core concept is keeping multivalued unique > identifiers. > > I chose not to modify GRAMPS to match on GEDCOM IDs because they are > meaningful only in some particular context and likely to clash elsewhere. I > did not find a way to make it universal and yet safe; that's why I settled > for preprocessing. Of course, I'd love to have a way to make that > preprocessing invisible, but I never thought it could be done safely. > GEDCOMs from IGI are tagged with IGI as the producer: > > 0 HEAD > 1 SOUR IGI > 2 VERS 5.0 > 2 NAME International Genealogical Index (R) > > So it *might* be possible for a producer tag to be derived automatically, > but other producers fail to put there anything useful: > > 0 HEAD > 1 SOUR FTW > 2 VERS 6.00 > 2 NAME Family Tree Maker for Windows > > [I.e. nothing that can be used as a unique-making stem] > > Anyway, I still don't know that the IGI IDs are guaranteed not to be > recycled. > > > 2007/10/30, Gerald Britton <ger...@gm...>: > > > > I wasn't referring to the _UID attribute, but rather the person, > > event, place, source and family ids. Personally, I find the _UID > > attribute less than useful, and run GEDCOMS through an awk program to > > remove them. > > > > So my idea is: > > > > Add an option to the GRAMPS gedcom import, "Unique IDs," and have the > > import look up each and every person, family, place, event and source > > id coming in from the GEDCOM in the database to which the records are > > to be added. If a duplicate is found, ignore it! 
This would require > > that the import program accept the incoming ids as unique and > > consistent, which is why it should be an option. Also, because of the > > linked structure of GED, the import might get a family first, with > > references to spouses and children, while the people come in later. > > This could be resolved through a two-pass approach or by creating > > placeholders for out-of-order data. > > > > So, with this approach, you would import your gedcoms, with the new > > option set, into a new database, then open your master database, and > > import the new database with the option off. > > > > On Oct 30, 2007 7:19 AM, Julio Sánchez <jul...@gm...> wrote: > > > Hi, > > > > > > AFAIK, FamilySearch does not guarantee unique, eternal identifiers. > > > However, the GEDCOM ID they use on the downloaded GEDCOMs has been > > permanent > > > for years. It is always the same for each person in the record. > > > > > > I have a small Perl script that copies that value into some form of > > the _UID > > > nonstandard attribute. For instance: > > > > > > 1 _UID IGI::I500077973070 > > > > > > I.e., I qualify the number with a numbering authority code > > ('IGI'). This > > > way I can tell records I already downloaded and merge them together. > > > > > > Unconditionally doing this, however, is dangerous because there is no > > > guarantee that the FamilySearch IDs will not change in the future, so > > this > > > should only be done under very controlled circumstances. > > > > > > I do the same for the Vital Records Index: > > > > > > 1 _UID VRI-2000-ES::I4611604-1 > > > > > > In this case, it is much safer because CDs are immutable, and the > > concatenation > > > of year and region codes makes the code unique. A new CD edition > > would > > > change the IDs but, at least, no unwanted merges would happen. > > > > > > For every computer database I repeatedly reuse data from, I have > > > constructed one specific algorithm to create unique IDs. 
One other > > example, > > > from the Guipuzcoa online church records: > > > > > > 1 _UID DEAH:111500101-0001-0-e7289c-2 > > > > > > It contains the source reference, page number, record number and a > > partial > > > hash of the name (this was derived by experimentation after several > > failed > > > tries; you don't wanna know the pathological cases that > > appear). Every > > > source that does not assign unique IDs needs ad-hoc handling. > > > > > > Ideally every source should generate a permanent unique ID for records > > it > > > originates. Records received from other sources are not given new > > IDs, but > > > records merged from different sources keep all the received IDs. > > > > > > Regards, > > > > > > Julio > > > > > > 2007/10/30, Gerald Britton <ger...@gm...>: > > > > > > > > > > > > > > > > If the familysearch gedcoms have unique ids for people, events etc > > > > then gramps import could discard the duplicates as an option. So you > > > > could start a new db, import the geds (removing dups), then open your > > > > good db and import the new db into it. > > > > > > > > > > > > > > > > On 10/30/07, Benny Malengier <ben...@gm...> wrote: > > > > > 2007/10/30, Douglas S. Blank < db...@cs... >: > > > > > > > > > > > > [Moving to developers list.] > > > > > > > > > > > > The controversial aspect of this is the automatic merge (which > > is, of > > > > > > course, the whole point). We can do this as a "fork" of the > > current > > > GEDCOM > > > > > > import, but that wouldn't be useful for anyone else, and would > > quickly > > > > > > suffer "bitrot" as the original GEDCOM import continued to > > change. > > > Perhaps > > > > > > there is a way that we can work within the GRAMPS GEDCOM import? > > > > > > > > > > > > > > > Well, in 3.0 you could make an automatic revision, then work on the > > > data, > > > > > and do rollback to revision if problems. 
> > > > > > > > > > I always planned on working on a better (perhaps two-stage, > > interactive) > > > > > > merge-er. But an easier option would be to have some type of > > option on > > > the > > > > > > Import Dialog that either: a) kept all duplicates, or b) > > attempted > > > > > > automatic merges. This would be less controversial (I think) if > > there > > > was > > > > > > an "Undo Import" that was quick and painless, and readily > > available. > > > Is > > > > > > there? > > > > > > > > > > > > > > > I see possibilities with automatic merge, but it should be on a > > unique > > > > > identifier. I just have too many people with the same name to be > > able > > > to > > > > > have the name determine the merging or not (that is, all first > > sons of > > > all > > > > > children have the name of their grandfather, and are born in the > > same > > > decade > > > > > or so). > > > > > > > > > > Would developers allow a GEDCOM import to do automatic merges in > > the > > > same > > > > > > manner that ImportCSV works? > > > > > > > > > > > > > > > I would rather see GEDCOM -> XML, and allow automatic merge of > > XML. > > > > > Like this the typical problem of importing GEDCOM is separated > > from the > > > > > merging, we allow merging of gramps family trees without the need > > to > > > pass > > > > > over GEDCOM, and we have power over our XML format, so we can add > > things > > > > > there, so we could make export to XML and have extra data written > > needed > > > for > > > > > possible merging. > > > > > Note that an overwrite in XML is already as easy as replacing the > > handle > > > > > with the handle of an existing object. > > > > > > > > > > Benny > > > > > > > > > > ------------------------------------------------------------------------- > > > > > > > > This SF.net email is sponsored by: Splunk Inc. > > > > Still grepping through log files to find problems? Stop. 
> > > > Now Search log events and configuration files using AJAX and a > > browser. > > > > Download your FREE copy of Splunk now >> http://get.splunk.com/ > > > > _______________________________________________ > > > > Gramps-devel mailing list > > > > Gra...@li... > > > > https://lists.sourceforge.net/lists/listinfo/gramps-devel > > > > > > > > > > > > > > |
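Julio's Perl preprocessing described above (copying a source's permanent GEDCOM xref ID into a `_UID` qualified with a numbering-authority code) can be sketched in Python. This is a hypothetical re-implementation, since the actual script is not shown in the thread; it assumes the plain `0 @XREF@ INDI` record layout from the examples:

```python
import re

def qualify_uids(gedcom_text, authority="IGI"):
    """After each individual record header, insert a nonstandard _UID
    line built from the record's own xref ID qualified with an
    authority code, e.g.:  1 _UID IGI::I500077973070
    (Illustrative only; the Perl original is not shown in the thread.)"""
    out = []
    for line in gedcom_text.splitlines():
        out.append(line)
        m = re.match(r"0 @(?P<xref>[^@]+)@ INDI", line)
        if m:
            out.append("1 _UID %s::%s" % (authority, m.group("xref")))
    return "\n".join(out)
```

As Julio notes, doing this unconditionally is dangerous; the authority prefix only helps if the source really keeps its IDs permanent.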
From: Douglas S. B. <db...@cs...> - 2007-10-30 14:01:06
|
On Tue, October 30, 2007 9:02 am, Gerald Britton wrote: > I wasn't referring to the _UID attribute, but rather the person, > event, place, source and family ids. Personally, I find the _UID > attribute less than useful, and run GEDCOMS through an awk program to > remove them. Yes, I wasn't talking about a _UID either. It is beyond the GEDCOM standard, correct? Supporting it would be a bit controversial, but perhaps an option "Respect _UID on import"? > So my idea is: > > Add an option to the GRAMPS gedcom import, "Unique IDs," and have the > import look up each and every person, family, place, event and source > id coming in from the GEDCOM in the database to which the records are > to be added. If a duplicate is found, ignore it! This would require > that the import program accept the incoming ids as unique and > consistent, which is why it should be an option. Also, because of the > linked structure of GED, the import might get a family first, with > references to spouses and children, while the people come in later. > This could be resolved through a two-pass approach or by creating > placeholders for out-of-order data. This is basically what the ImportCSV does. However, it assumes that the importing data is the most up-to-date and will change the current data if it differs. Importing data twice won't do anything, regardless. (ImportCSV only deals with a subset of genealogy data... just the basics, but most/all of what you get from sites like FamilySearch.) > So, with this approach, you would import your gedcoms, with the new > option set, into a new database, then open your master database, and > import the new database with the option off. This could happen behind the scenes on import if auto-merge is selected: 1. create a new temp db 2. select the file(s) to import and merge 3. import and merge them into temp 4. import them into current I hope we can do it without this temp DB, however. But it would require some way of identifying same individuals. 
In ImportCSV, I do this manually by putting the person's GRAMPS ID in the data---time consuming but safe. So, there are actually two issues: merging people using their common IDs from the source GEDCOMs, and then merging those into GRAMPS. It would be nice to support both kinds of merge. -Doug > On Oct 30, 2007 7:19 AM, Julio Sánchez <jul...@gm...> wrote: >> Hi, >> >> AFAIK, FamilySearch does not guarantee unique, eternal, identifiers. >> However, the GEDCOM ID they use on the downloaded GEDCOMs has been >> permanent >> for years. It is always the same for each person in the record. >> >> I have a small Perl script that copies that value into some form of the >> _UID >> nonstandard attribute. For instance: >> >> 1 _UID IGI::I500077973070 >> >> I.e., I qualify the number with a numbering authority code ('IGI'). >> This >> way I can tell records I already downloaded and merge them together. >> >> Unconditionally doing this, however, is dangerous because there is no >> guarantee that the FamilySearch IDs will not change in the future, so >> this >> should only be done under very controlled circumstances. >> >> I do the same for the Vital Records Index: >> >> 1 _UID VRI-2000-ES::I4611604-1 >> >> In this case, it is much safer because CDs are immutable, the >> concatenation >> of year and region codes makes the code unique. A new CD edition would >> change the IDs but, at least, no unwanted merges would happen. >> >> In every case I repeatedly reuse data from a computer database I have >> constructed one specific algorithm to create unique IDs. One other >> example, >> from the Guipuzcoa online church records: >> >> 1 _UID DEAH:111500101-0001-0-e7289c-2 >> >> It contains the source reference, page number, record number and a >> partial >> hash of the name (this was derived by experimentation after several >> failed >> tries, you don't wanna know the pathological cases that appear). Every >> source that does not assign unique IDs needs ad-hoc handling. 
>> >> Ideally every source should generate a permanent unique ID for records >> it >> originates. Records received from other sources are not given new IDs, >> but >> records merged from different sources keep all the received IDs. >> >> Regards, >> >> Julio >> >> 2007/10/30, Gerald Britton <ger...@gm...>: >> > >> > >> > >> > If the familysearch gedcoms have unique ids for people, events etc >> > then gramps import could discard the duplicates as an option. So you >> > could start a new db, import the geds (removing dups), then open your >> > good db and import the new db into it. >> > >> > >> > >> > On 10/30/07, Benny Malengier <ben...@gm...> wrote: >> > > 2007/10/30, Douglas S. Blank <db...@cs... >: >> > > > >> > > > [Moving to developers list.] >> > > > >> > > > The controversial aspect of this is the automatic merge (which is, >> of >> > > > course, the whole point). We can do this as a "fork" of the >> current >> GEDCOM >> > > > import, but that wouldn't be useful for anyone else, and would >> quickly >> > > > suffer "bitrot" as the original GEDCOM import continued to change. >> Perhaps >> > > > there is a way that we can work within the GRAMPS GEDCOM import? >> > > >> > > >> > > Well, in 3.0 you could make an automatic revision, then work on the >> data, >> > > and do rollback to revision if problems. >> > > >> > > I always planned on working on a better (perhaps two-stage, >> interactive) >> > > > merge-er. But an easier option would be to have some type of >> option on >> the >> > > > Import Dialog that either: a) kept all duplicates, or b) >> attempted >> > > > automatic merges. This would be less controversial (I think) if >> there >> was >> > > > an "Undo Import" that was quick and painless, and readily >> available. >> Is >> > > > there? >> > > >> > > >> > > I see possibilities with automatic merge, but it should be on a >> unique >> > > identifier. 
I just have too many people with the same name to be >> able >> to >> > > have the name determine the merging or not (that is, all first sons >> of >> all >> > > children have the name of their grandfather, and are born in the >> same >> decade >> > > or so). >> > > >> > > Would developers allow a GEDCOM import to do automatic merges in the >> same >> > > > manner that ImportCSV works? >> > > >> > > >> > > I would rather see GEDCOM -> XML, and allow automatic merge of XML. >> > > Like this the typical problem of importing GEDCOM is separated from >> the >> > > merging, we allow merging of gramps family trees without the need to >> pass >> > > over GEDCOM, and we have power over our XML format, so we can add >> things >> > > there, so we could make export to XML and have extra data written >> needed >> for >> > > possible merging. >> > > Note that an overwrite in XML is already as easy as replacing the >> handle >> > > with the handle of an existing object. >> > > >> > > Benny >> > > >> > 
-- Douglas S. Blank Associate Professor, Bryn Mawr College http://cs.brynmawr.edu/~dblank/ Office: 610 526 6501 |
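Gerald's "Unique IDs" option quoted in the message above can be sketched as follows. The dict-of-records database is purely illustrative (it is not the GRAMPS database API), and a second verification pass stands in for the placeholder approach he mentions for out-of-order references:

```python
def import_with_unique_ids(db, incoming):
    """Treat incoming GEDCOM IDs as unique and consistent: skip any
    record whose ID already exists in the database, then report any
    references that still do not resolve after the import.
    db / incoming: dicts mapping record ID -> record, where a record
    is a dict that may carry a 'refs' list of other record IDs."""
    added = []
    # Pass 1: add only records whose IDs are not yet in the database.
    for rec_id, record in incoming.items():
        if rec_id not in db:
            db[rec_id] = record
            added.append(rec_id)
    # Pass 2: verify references (a family may arrive before its people).
    unresolved = [(rec_id, ref)
                  for rec_id, record in incoming.items()
                  for ref in record.get("refs", [])
                  if ref not in db]
    return added, unresolved
```

Note the skip semantics match Gerald's proposal, not ImportCSV's: an already-present record is left untouched rather than updated from the incoming data.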
From: Jeremy C. R. <re...@re...> - 2007-10-30 15:26:13
|
gramps-devel rejected me, so I just subscribed. ... since I composed that the check and repair has been running for maybe 30 minutes. The gramps window and the Checking database windows are just all grey. ... ---------- Forwarded message ---------- Date: Tue, 30 Oct 2007 10:10:57 -0500 (CDT) From: Jeremy C. Reed <re...@re...> To: Douglas S. Blank <db...@cs...> Cc: Benny Malengier <ben...@gm...>, gra...@li... Subject: Re: [Gramps-devs] merge mess from individual familysearch.org records On Mon, 29 Oct 2007, Douglas S. Blank wrote: > [Moving to developers list.] I am not on that list. So continue to CC me. > (On the other hand, a GRAMPS-oriented crawler would be quite handy too, > and I'd be interested in exploring that approach too. Can you point me to > that?) http://family.geckotribe.com/gedcom-crawler/ And I patched gedcom-crawler.pl: -$this = pack($sockaddr, &AF_INET, 0, $thisaddr); +$this = pack($sockaddr, &AF_INET, 0, "\0\0\0\0"); I used it last night: <snip> 3024 families done, 0 in queue ................................................................................ 3025 families done, 0 in queue .................................................................... Crawl completed in 34 minutes 14 seconds 2054.38s real 10.66s user 1.72s system So now I have a huge .ged file. 3024 families. That is great. Downloading one at a time and merging one at a time took me six hours for about 50 families. And my database had duplicated events, etc. So I created a new database. And imported it. Something is wrong. Going to the pedigree chart for the main person (the main familyid) that I started with the crawler doesn't show any parents. But my new database has thousands of names. 
I don't know if it is related, but when I clicked the left arrow for children of someone who didn't have children, it caused this error report: 1042428: ERROR: gramps.py: line 147: Unhandled exception Traceback (most recent call last): File "/usr/pkg/share/gramps/DataViews/_PedigreeView.py", line 1208, in on_show_child_menu cname = escape(NameDisplay.displayer.display(child)) File "/usr/pkg/share/gramps/NameDisplay.py", line 503, in display name = person.get_primary_name() AttributeError: 'NoneType' object has no attribute 'get_primary_name' When I looked on my console I also had: 357785: WARNING: _ReadGedcom.py: line 753: Line 120755 was not understood, so it was ignored. 363207: WARNING: _ReadGedcom.py: line 753: Line 121109 was not understood, so it was ignored. 387285: WARNING: _ReadGedcom.py: line 753: Line 129150 was not understood, so it was ignored. 438460: WARNING: _ReadGedcom.py: line 753: Line 151450 was not understood, so it was ignored. I restarted gramps. Opened the new database. It said there was some problem (I don't remember) and said to run the check and repair. So I did that. Python is at 98% cpu and the check and repair has been going on for at least 15 minutes. Jeremy C. Reed |
From: Jeremy C. R. <re...@re...> - 2007-10-30 19:29:09
|
On Tue, 30 Oct 2007, Jeremy C. Reed wrote: > > (On the other hand, a GRAMPS-oriented crawler would be quite handy too, > > and I'd be interested in exploring that approach too. Can you point me to > > that?) > > http://family.geckotribe.com/gedcom-crawler/ > > And I patched gedcom-crawler.pl: > > -$this = pack($sockaddr, &AF_INET, 0, $thisaddr); > +$this = pack($sockaddr, &AF_INET, 0, "\0\0\0\0"); > > I used it last night: > > <snip> > 3024 families done, 0 in queue > ................................................................................ > 3025 families done, 0 in queue > .................................................................... > > Crawl completed in 34 minutes 14 seconds > > 2054.38s real 10.66s user 1.72s system > > So now I have a huge .ged file. 3024 families. That is great. Downloading > one at a time and merging one at a time took me six hours for about 50 > families. And my database had duplicated events, etc. > > So I created a new database. And imported it. > > Something is wrong. Going to the pedigree chart for the main person (the main > familyid) that I started with the crawler doesn't show any parents. > > But my new database has thousands of names. > > I don't know if it is related, but when I clicked the left arrow for > children of someone who didn't have children, it caused this error report: > > 1042428: ERROR: gramps.py: line 147: Unhandled exception > Traceback (most recent call last): > File "/usr/pkg/share/gramps/DataViews/_PedigreeView.py", line 1208, in > on_show_child_menu > cname = escape(NameDisplay.displayer.display(child)) > File "/usr/pkg/share/gramps/NameDisplay.py", line 503, in display > name = person.get_primary_name() > AttributeError: 'NoneType' object has no attribute 'get_primary_name' > > > When I looked on my console I also had: > 357785: WARNING: _ReadGedcom.py: line 753: Line 120755 was not > understood, so it was ignored. > 363207: WARNING: _ReadGedcom.py: line 753: Line 121109 was not understood, > so it was ignored. 
> 387285: WARNING: _ReadGedcom.py: line 753: Line 129150 was not understood, > so it was ignored. > 438460: WARNING: _ReadGedcom.py: line 753: Line 151450 was not understood, > so it was ignored. > > I restarted gramps. Opened the new database. It said there was some > problem (I don't remember) and said to run the check and repair. So I did > that. Python is at 98% cpu and the check and repair has been going on > for at least 15 minutes. Following up. I killed it after about an hour. On another system I imported the GED file. And ran the check and repair tool. And it hung and I killed it after an hour. Also, the perl script referenced above produced a few bogus lines that gramps reported, which I manually fixed, such as: -0 @F1853125 @ FAM +0 @F1853125@ FAM Also looking closer at my new database -- no duplicates. That is great. The bad news is there is no association between families. So the database is useless. All it has is child and parents. But no grandparents. So I have thousands of names but not linked together. The perl script downloaded them all by following the parents' links to families, so the original data is definitely together. I can provide a bzipped copy of the GED file if anyone wants to look at it. It might be useful for others' tests. I don't know a lot about the GED format, so I can't tell whether the problem (families not linked together) is in the GED file or due to the gramps import. Jeremy C. Reed |
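The bogus lines Jeremy fixed by hand (a stray space before the closing '@' in a level-0 cross-reference) could also be repaired mechanically. A minimal sketch, assuming the breakage always has the `0 @XREF @ TAG` shape shown in his diff:

```python
import re

def fix_xref_lines(text):
    """Rejoin malformed level-0 cross-reference lines, e.g.
    '0 @F1853125 @ FAM' -> '0 @F1853125@ FAM', which GRAMPS
    otherwise reports as 'not understood, so it was ignored'."""
    return re.sub(r"^(0 @[^@\s]+) @ ", r"\1@ ", text, flags=re.MULTILINE)
```

Well-formed lines such as `0 @I1@ INDI` do not match the pattern and pass through unchanged.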
From: <jul...@gm...> - 2007-10-30 15:27:19
|
2007/10/30, Douglas S. Blank <db...@cs...>: > > On Tue, October 30, 2007 9:02 am, Gerald Britton wrote: > > I wasn't referring to the _UID attribute, but rather the person, > > event, place, source and family ids. Personally, I find the _UID > > attribute less than useful, and run GEDCOMS through an awk program to > > remove them. > > Yes, I wasn't talking about a _UID either. It is beyond the GEDCOM > standard, correct? Supporting it would be a bit controversial, but perhaps > an option "Respect _UID on import"? That's good. About standard alternatives, REFN could fit the bill: A user-defined number or text that the submitter uses to identify this record. For instance, it may be a record number within the submitter's automated or manual system, or it may be a page and position number on a pedigree chart. It is multivalued and takes a TYPE subtag that can make it unique. Unfortunately, no one uses it like that, preferring the non-standard _UID instead. A problem is that no definition of _UID exists that I could find. Apparently, it was introduced in PAF 5.1 in 2002, taken from an earlier draft that I cannot find. So it is unclear if it supports my understanding of it. But, anyway, let's separate the case for GRAMPS storing multivalued unique identifiers and using them for matching purposes, from the specific mappings on import and export for different formats. The former I think is a must; for the latter I grant there is a lot of room for personal taste. Regards, Julio |
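For reference, the REFN alternative described above would look something like this in a GEDCOM record (REFN with a subordinate TYPE is defined in the GEDCOM 5.5 standard; the values here reuse Julio's IGI example and are illustrative only):

```
0 @I1@ INDI
1 NAME John /Smith/
1 REFN I500077973070
2 TYPE IGI
```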
From: Benny M. <ben...@gm...> - 2007-10-30 16:30:55
|
Ok, to recap: 1/we need multiple valued attributes. For 3.0 there is the idea to introduce tags on notes instead of one type. This looks somewhat identical. One would have the field unique identifiers, and then a list of unique identifiers, one of them the handle as used in GRAMPS 2/on import, the unique identifier is searched in present database, if present, a merge is initiated, skipping double data. Is that what you are suggesting? Benny |
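Benny's two points can be sketched together: each object carries a list of unique identifiers, and an import merges on any shared identifier, skipping data already present. The dict-based person records below are illustrative only, not the GRAMPS object model:

```python
def import_person(db, person):
    """On import, search the incoming person's unique identifiers in
    the present database; if any identifier matches, merge into the
    existing record (union the identifier lists, skip duplicate
    events) instead of adding a double. People are dicts with a
    'uids' list and an 'events' set (illustrative structure only)."""
    incoming_uids = set(person["uids"])
    for existing in db:
        if incoming_uids & set(existing["uids"]):
            existing["uids"] = sorted(set(existing["uids"]) | incoming_uids)
            existing["events"] |= person["events"]
            return existing
    db.append(person)  # no identifier matched: a genuinely new person
    return person
```

The linear scan here is where the index Benny asks about later would matter: in a real database the lookup should go through an index on the identifier values rather than a walk over every person.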
From: <jul...@gm...> - 2007-10-30 16:51:56
|
In a nutshell, yes. But aren't all GRAMPS attributes multivalued, I mean, they can be repeated, right? If not, then yes, some special handling is needed, but in 2.0 it works as is. Or do you mean one instance, with several values? It is immaterial, I think, modulo parsing issues. Regards, Julio 2007/10/30, Benny Malengier <ben...@gm...>: > > Ok, > > to recap: > > 1/we need multiple valued attributes. For 3.0 there is the idea to > introduce tags on notes instead of one type. This looks somewhat identical. > One would have the field unique identifiers, and then a list of unique > identifiers, one of them the handle as used in GRAMPS > > 2/on import, the unique identifier is searched in present database, if > present, a merge is initiated, skipping double data. > > Is that what you are suggesting? > > Benny > |
From: Benny M. <ben...@gm...> - 2007-10-30 17:06:04
|
Ok, I was thinking about

key, (value1, value2, value3)

whereas you say it is ok to have

key, value1
key, value2
key, value3

Yeah, the same thing. One would need an index though on the value to be able to quickly find it in a large database. 2007/10/30, Julio Sánchez <jul...@gm...>: > > In a nutshell, yes. But aren't all GRAMPS attributes multivalued, I mean, > they can be repeated, right? If not, then yes, some special handling is > needed, but in 2.0 works as is. Or you mean one instance, with several > values? It is immaterial, I think, modulo parsing issues. > > Regards, > > Julio > > 2007/10/30, Benny Malengier <ben...@gm...>: > > > > Ok, > > > > to recap: > > > > 1/we need multiple valued attributes. For 3.0 there is the idea to > > introduce tags on notes instead of one type. This looks somewhat identical. > > One would have the field unique identifiers, and then a list of unique > > identifiers, one of them the handle as used in GRAMPS > > > > 2/on import, the unique identifier is searched in present database, if > > present, a merge is initiated, skipping double data. > > > > Is that what you are suggesting? > > > > Benny > > |
From: Douglas S. B. <db...@cs...> - 2007-10-30 17:21:40
|
Benny Malengier wrote: > 2/on import, the unique identifier is searched in present > database, if present, a merge is initiated, skipping double data. The only variation I can think of is whether we want to skip, or update from (if different from original) when importing. In any event, I think a summary of what happened would be quite useful, such as: The following people were just added: Smith, Ed [I001223] Smith, Edwin, Jr. [I01232] (That would be handy in general.) I just noticed the new little "info icon" in the status bar after I did an import yesterday and had a warning. Very cool! Is that new? Perhaps an info on what just occurred on import would be useful. -Doug > Is that what you are suggesting? > > Benny |
From: <jul...@gm...> - 2007-10-30 17:45:21
|
Douglas, By the way, another thing that is a matter of taste, is that I prefer the already present information on the database to take precedence over that being imported. In general, the information in the database I have already vetted. Any contradicting information is interesting, but not as a primary. If, for instance, a death event comes that contradicts what I have, I want it to be kept as alternative rather than supersede my information. Because if the new date is right, I will make it the real date when I decide so. But if my choice is superseded, every time I import from a source that is mistaken I have to go back and fix it. I may even add a note to the "wrong" information explaining why it is wrong or what's the deal with it. This happens all the time with death/burials; many church databases list as death date what is actually burial. Or marriage records on Catholic churches may have several different places or dates depending on exactly what rite was done, so conflicting data is common. So once I have made a decision on what is the "right" information, I don't want it relegated, let alone deleted. But this is a matter of preference, it very much depends on whether you consider your database a good source or not. I do for my main database, I have invested a lot on it, it is priceless. For other auxiliary or temporary databases, I am not so sure. Regards, Julio 2007/10/30, Douglas S. Blank <db...@cs...>: > > Benny Malengier wrote: > > 2/on import, the unique identifier is searched in present > > database, if present, a merge is initiated, skipping double > data. > > The only variation I can think of is whether we want to skip, or update > from (if different from original) when importing. > > In any event, I think a summary of what happened would be quite useful, > such as: > > The following people were just added: > > Smith, Ed [I001223] > Smith, Edwin, Jr. [I01232] > > (That would be handy in general.) 
> > I just noticed the new little "info icon" in the status bar after I did > an import yesterday and had a warning. Very cool! Is that new? Perhaps > an info on what just occurred on import would be useful. > > -Doug > > > Is that what you are suggesting? > > > > Benny > > |
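Julio's precedence rule above can be sketched like this; the event structure (a primary 'date' plus an 'alternatives' list) is hypothetical, chosen only to show vetted data staying primary while contradicting imports are kept as alternatives:

```python
def merge_incoming_date(existing, incoming_date):
    """Already-vetted information takes precedence: a contradicting
    incoming value is recorded as an alternative, never superseding
    the primary date. 'existing' is a dict with 'date' and
    'alternatives' keys (illustrative structure only)."""
    if existing["date"] is None:
        # Nothing vetted yet: accept the incoming value as primary.
        existing["date"] = incoming_date
    elif (incoming_date != existing["date"]
          and incoming_date not in existing["alternatives"]):
        # Conflict: keep it, but only as an alternative.
        existing["alternatives"].append(incoming_date)
    return existing
```

Re-importing the same mistaken source is then harmless: the alternative is already recorded, and the vetted primary is never relegated.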
From: Gerald B. <ger...@gm...> - 2007-10-30 18:09:01
|
Agree wholeheartedly. Additionally, we could use an exception report from the merge operation that lists anomalies to allow us to fix them. On Oct 30, 2007 1:45 PM, Julio Sánchez <jul...@gm...> wrote: > Douglas, > > By the way, another thing that is a matter of taste, is that I prefer the > already present information on the database to take precedence over that > being imported. In general, the information in the database I have already > vetted. Any contradicting information is interesting, but not as a primary. > If, for instance, a death event comes that contradicts what I have, I want > it to be kept as alternative rather than supersede my information. Because > if the new date is right, I will make it the real date when I decide so. > But if my choice is superseded, every time I import from a source that is > mistaken I have to go back and fix it. I may even add a note to the "wrong" > information explaining why it is wrong or what's the deal with it. This > happens all the time with death/burials, many church databases list as death > date what is actually burial. Or marriage records on Catholic churches may > have several different places or dates depending on exactly what rite was > done, so conflicting data is common. So once I have made a decision on what > is the "right" information, I don't want it relegated, let alone deleted. > > But this is a matter of preference, it very much depends on whether you > consider your database a good source or not. I do for my main database, I > have invested a lot on it, it is priceless. For other auxiliary or > temporary databases, I am not so sure. > > Regards, > > Julio > > > > 2007/10/30, Douglas S. Blank <db...@cs...>: > > > Benny Malengier wrote: > > > 2/on import, the unique identifier is searched in present > > > database, if present, a merge is initiated, skipping double > data. 
> > > > The only variation I can think of is whether we want to skip, or update > > from (if different from original) when importing. > > > > In any event, I think a summary of what happened would be quite useful, > > such as: > > > > The following people were just added: > > > > Smith, Ed [I001223] > > Smith, Edwin, Jr. [I01232] > > > > (That would be handy in general.) > > > > I just noticed the new little "info icon" in the status bar after I did > > an import yesterday and had a warning. Very cool! Is that new? Perhaps > > an info on what just occurred on import would be useful. > > > > -Doug > > > > > Is that what you are suggesting? > > > > > > Benny > > > > > > |