On 26/09/11 10:21, Doug Blank wrote:
On Sun, Sep 25, 2011 at 4:27 PM, Ken <ken.mymail@gmail.com> wrote:
Hello All,

    Thank you Heinz, great work here.
    This looks like a great start towards something that is desperately
needed in Gramps.
    For those of you lucky enough to be able to write the code, I do have a
problem with using the IDs as a method of matching.
    I share data with a relative whom I've also got to use Gramps. Recently I
sent them a database; they added people, events and notes and returned a
Gramps file to me. After importing their database into a backup of my database,
a lot of people, events and notes had become duplicate entries. On
investigating, I found they had run the [Tools > Family Tree Processing >
Reorder Gramps IDs] tool. This had allocated new IDs to people, events,
notes etc.
    May I suggest that the comparison be between a person's name, date of
birth, date of death, events, notes, sources etc. The same would also be needed
when comparing events, sources and notes etc.
    I do understand that this would be a long and probably slow process, so
maybe this type of comparison could be an option.
    It would give a much better merge, and save the end user a lot of time
fixing the database. As I see it, that is exactly what such a tool is for:
saving work and time.
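
A minimal sketch of what such a content-based comparison could look like, in
Python. The record layout (plain dicts with "name", "birth" and "death" keys)
and the helper names match_key and find_matches are assumptions made for
illustration; they are not taken from Gramps or from Heinz's script.

```python
from collections import defaultdict


def match_key(person):
    """Build a comparison key from the data itself, ignoring the Gramps ID."""
    # Normalise case and whitespace so trivial differences do not block a match.
    name = " ".join(person.get("name", "").lower().split())
    return (name, person.get("birth"), person.get("death"))


def find_matches(people_a, people_b):
    """Pair records from two databases whose keys agree.

    Returns (matched, ambiguous, unmatched):
    matched   - (a, b) pairs with exactly one candidate in people_b
    ambiguous - (a, candidates) where several records share the key
    unmatched - records of people_a with no candidate in people_b
    """
    index = defaultdict(list)
    for person in people_b:
        index[match_key(person)].append(person)

    matched, ambiguous, unmatched = [], [], []
    for person in people_a:
        candidates = index.get(match_key(person), [])
        if len(candidates) == 1:
            matched.append((person, candidates[0]))
        elif candidates:
            ambiguous.append((person, candidates))
        else:
            unmatched.append(person)
    return matched, ambiguous, unmatched
```

Because the key is built only from the data, a renumbering of the Gramps IDs
in one of the files would not affect the result. Exact equality is a
simplification; a real tool would probably need fuzzier rules for dates and
name variants.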
Agreed that sometimes this might take some time, but that is ok.

In your example, the issue could almost be avoided completely with a
UID as outlined in [1]. Thus, even if a user changed some important
information (like name or ID), the UID would still be retained and
could be used to match. A nice UI could allow one to ignore those
minor/irrelevant differences (or include them), as found by a tool
like Heinz's.

I think this is the time to keep a set of UIDs for each person.

-Doug

[1] http://www.gramps-project.org/wiki/index.php?title=GEPS_009:_Import_Export_Merge
Thanks Doug.
I've read your link. It would be good if using a UID avoided the problem.
I have tried your method given earlier in this thread, using a copy of the original file I sent and the file returned to me.
I get the following error:
Mark all entries related to changed person
Traceback (most recent call last):
  File "GrampsCompare.py", line 349, in <module>
    getRelatedNodes(db2, comparePerson, ns2l, compMainNodes, compRelNodes)
  File "GrampsCompare.py", line 93, in getRelatedNodes
    for subnode in node:    # Recursive call for all subnodes
TypeError: 'NoneType' object is not iterable
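
The traceback suggests that getRelatedNodes received None where it expected an
XML node, for example because a reference pointed to a handle that does not
exist in the file. The script's source is not shown in this thread, so the
following is only a sketch of a defensive traversal; build_handle_index and
collect_related are hypothetical names, not functions from GrampsCompare.py.

```python
import xml.etree.ElementTree as ET


def build_handle_index(root):
    """Map every 'handle' attribute in the tree to its element."""
    return {el.get("handle"): el for el in root.iter() if el.get("handle")}


def collect_related(start, index):
    """Collect handles reachable from start via 'hlink' attributes.

    Returns (seen, missing). A dangling hlink (no element with that handle)
    is recorded in missing instead of causing a TypeError when the lookup
    result None is iterated.
    """
    seen, missing = set(), set()
    stack = [start]  # explicit stack avoids Python's recursion limit
    while stack:
        node = stack.pop()
        if node is None:  # guard: nothing to iterate over
            continue
        for subnode in node:
            stack.append(subnode)
            hlink = subnode.get("hlink")
            if hlink is None or hlink in seen or hlink in missing:
                continue
            target = index.get(hlink)
            if target is None:
                missing.add(hlink)
            else:
                seen.add(hlink)
                stack.append(target)
    return seen, missing
```

Printing the contents of missing would show which references could not be
resolved, which may help to locate the entry that triggers the error.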

Ken.

Kind Regards,
Ken Benseman.
New Zealand.

On 22/09/11 21:47, derHeinzi wrote:

Hello Developers,
this is quite a long message about a difficult matter, so bear with me.

Please find attached a Python script for comparing data in two Gramps XML
files.
http://gramps.1791082.n4.nabble.com/file/n3832887/GrampsCompare.py
GrampsCompare.py
The comparison is done in both databases starting with the "same" person,
which you have to specify.
To test:
- Create Gramps XML files by unzipping two Gramps archives.
- Find the IDs of the same "key" person in both files.
- Start the script with the parameters firstFile, firstID, secondFile, secondID.
- Output is written to the screen. You might want to redirect it to a file.
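
The preparation steps above can be sketched in Python as follows. This is not
code from GrampsCompare.py; load_gramps_xml, local_name and find_person are
hypothetical helpers, and the sketch assumes that the input is either plain
XML or a gzip-compressed XML file, and that person elements carry an "id"
attribute.

```python
import gzip
import xml.etree.ElementTree as ET


def load_gramps_xml(path):
    """Parse a Gramps XML file, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as fh:
        return ET.parse(fh).getroot()


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_person(root, gramps_id):
    """Return the person element with the given id, or None if absent."""
    for el in root.iter():
        if local_name(el.tag) == "person" and el.get("id") == gramps_id:
            return el
    return None
```

Comparing tag names without their namespace keeps the lookup independent of
the XML version a particular Gramps release writes.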
It is not (yet) a tool to compare entries in 2 different databases but you
can already find the changes that have been done to a database you shared
with some other person or to a backup you did some time ago.

I'm currently working with Gramps 3.2.5-1 on WinXP and used xml-files from
this version for program development and test. But as long as the attributes
"id" "handle" and "hlink" are in the xml it should work for other versions
as well.

My intention was to find a possible route to a database compare and merge in
Gramps. I have followed the devs and users mailing lists for quite a while now.
This matter came up from time to time, but I found no proposed solution
discussed and no hint that someone is working on this right now.
GrampsConnect was mentioned in some posts, but I did not check if there is a
concept or solution there yet.

After hacking a quick database compare script which took several hours to
complete a run on my database with about 3000 people, I tweaked the script
to now finish in less than a minute. This should be an amount of time that a
user could accept for a complicated function to finish?
Since I'm fairly new to Python and using this script as a way to learn the
language there might be even better ways to do things. But hey, I'm proud of
what I achieved in these few evenings! :-)
I added tons of comments to the code to make you guys understand what the
script is doing! So have fun with it. I don't claim any (copy)rights.

Now, what could a database compare and merge look like in the GUI, and what
is still to be done?

First to the GUI of a compare and merge. If you look at the compare and
merge window for a person in Gramps you see the person and related info side
by side.
This could be changed to a display as shown in the attached cmpwin.png:
http://gramps.1791082.n4.nabble.com/file/n3832887/cmpwin.png
The changes are:
- For every subnode type (tag) in the database there is a "section" in the
window. (For a person, e.g. gender, names, ...)
- Only subnodes without handles are displayed for comparison.
- All subnodes referring to other nodes with handles are shown in two lists.
(If a compare by script was performed beforehand, the list entries might show
different colors for identical, changed and missing references.) The first
list contains the items that "match" items in the other database, the second
one shows the items that do not match or could match more than one item in
the other database.
- There is a means to "link" nodes from both databases. (The button with the
"=" between the lower lists. The broken or unbroken chain symbol on a button
would be more appropriate.) If you see the 2 marriage entries in this
example on the right side, you have to decide which of these is already in
the left database. So you select first the left and then one of the right
marriage events to see the data in the quickview. If you find them to refer
to the same marriage, you select both and press the "=" button to link these
2 entries. They will be moved to the lists above (matched information).
- If you double-click (there could be a button for this) an entry in the
matched information list, the content of the window is replaced by the
content of the selected node (e.g. the family from the childof reference).
There could be a "back" or even a "history" button for navigation like in
browsers.
- With the "+" button you add information from the 2nd database, with "<"
you replace information.
- The window looks and works the same for all comparisons, no matter if
events, persons, families, ... are compared.
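
To make the list handling described above more concrete, here is a small
Python sketch of the state behind one section of such a window. The class
ReferenceLists and its attribute names are made up for this illustration; it
models only the "=" (link) action, not the "+" and "<" buttons or any actual
GUI code.

```python
class ReferenceLists:
    """State behind one section of the proposed compare window."""

    def __init__(self, left_unmatched, right_unmatched):
        self.matched = []                             # upper lists: linked pairs
        self.left_unmatched = list(left_unmatched)    # lower left list
        self.right_unmatched = list(right_unmatched)  # lower right list

    def link(self, left, right):
        """Model the "=" button: declare two entries to be the same item."""
        if left not in self.left_unmatched or right not in self.right_unmatched:
            raise ValueError("both entries must be in the unmatched lists")
        # Linked entries leave the lower lists and appear as a matched pair.
        self.left_unmatched.remove(left)
        self.right_unmatched.remove(right)
        self.matched.append((left, right))
```

In the marriage example, linking the left entry with one of the two right
entries would leave the other right entry in the lower list, where it could
then be added with "+".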

What do you think of this GUI concept (not the design; that's far from nice)?
Do you think this could be a way to handle data from 2 databases?

Now to the question what still has to be done.
- The standalone script has first to be improved and in the end to be
integrated into Gramps.
- The comparison of the data nodes currently does not have a "closer look"
at the data. The data itself has to be taken into account. E.g. currently
the only check done is attributesDB1 == attributesDB2. This check should
ignore the 'change' attribute.
- In the end this function must not rely on same IDs or handles but has to
check the data itself. This might get a bit complicated, but I think with
the approach that you only have to check the data referred to by already
matched nodes it can be solved and handled. Even for comparison with a
Gramps xml from an imported GEDCOM.
- It might be useful to generate an output file with the found differences
for further evaluation in Gramps or other software?
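
As an illustration of the attribute check mentioned in the list above, the
following Python sketch compares the attributes of two XML nodes while leaving
out 'change'. The function names and the IGNORED_ATTRIBUTES constant are
assumptions for this example and do not come from the script.

```python
import xml.etree.ElementTree as ET

# Attributes that differ between files without the data having changed.
IGNORED_ATTRIBUTES = frozenset({"change"})


def comparable_attributes(element, ignored=IGNORED_ATTRIBUTES):
    """Return the element's attributes without the ignored ones."""
    return {k: v for k, v in element.attrib.items() if k not in ignored}


def same_attributes(node1, node2, ignored=IGNORED_ATTRIBUTES):
    """True if both nodes agree on every attribute that is not ignored."""
    return comparable_attributes(node1, ignored) == comparable_attributes(
        node2, ignored
    )
```

Passing a larger set for ignored, for instance one that also contains "id"
and "handle", might be a first step towards a comparison that does not rely
on identical IDs or handles.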

My apologies for this long post. I hope it was worth reading and clear enough
for you to understand what I am trying to say. English is not my mother
tongue, which makes it a little difficult to explain complicated things.

Kind regards and have fun
Heinz


--
View this message in context:
http://gramps.1791082.n4.nabble.com/Database-compare-and-merge-tp3832887p3832887.html
Sent from the GRAMPS - Dev mailing list archive at Nabble.com.

_______________________________________________
Gramps-devel mailing list
Gramps-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gramps-devel

