Re: [Gramps-users] problème gedcom


On Fri, 12 Sep 2008 08:22:39 am Gary Burton wrote:
> Hello Al,
> 
> Things are not as bad as they look with gedcom import/export in version 3. I
> have added some commentary to your bug report.
> 
> I'm off to bed - it's late!
> 
> Bye
> 
> Gary

Gary,

Thank you for the suggestion. I followed the advice in your commentary on the
bug #0002370 report. Here is a detailed record of what I did, with the file
names used:

 1) Create a new database. Wherever possible I accepted the defaults, so this
new empty database is called "Family Tree 1".

 2) Import the XML data in example/gramps/data.gramps. This populates
"Family Tree 1".

 3) Export the whole database to a gedcom file. The exported file is named
Untitled_1.ged.

 4) Create another database and name it "Family Tree 2".

 5) Import the gedcom file just created. This populates "Family Tree 2".

 6) Export the whole database again to a second gedcom file, named
Untitled_2.ged.

 7) Compare the two gedcom files. Yes, there are significant differences, but
even more telling is the preponderance of the same "1 CHAN" tag sequence
created by the export function:

     1 CHAN
     2 DATE 1 JAN 1970
     3 TIME 10:00:00
I do not know what the tag CHAN means. Could it be "CHAnge Name", perchance?
Or is it the date and time the item was entered into the gedcom file, with an
arbitrary placeholder date assigned temporarily? (In that case it should
perhaps be ignored at this stage.)
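One way to test the placeholder theory is to tally every DATE/TIME pair found
under a "1 CHAN" line: if all of them are 1 JAN 1970 10:00:00, the export is
stamping a fixed value rather than a recorded one. The sketch below is a
minimal example, not part of GRAMPS; the helper name chan_dates is
hypothetical, and it assumes the layout shown above (1 CHAN / 2 DATE / 3 TIME,
with each new level-0 or level-1 line ending the block).

```python
from collections import Counter


def chan_dates(lines):
    """Tally (DATE, TIME) pairs found inside "1 CHAN" blocks.

    Hypothetical helper. Assumes the layout seen in the export above:
        1 CHAN
        2 DATE <date>
        3 TIME <time>
    """
    counts = Counter()
    in_chan = False
    date = time = None
    for raw in lines:
        parts = raw.strip().split(' ', 2)   # level, tag, optional value
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) == 3 else ''
        if level in ('0', '1'):
            # Any level-0 or level-1 line ends the current CHAN block.
            if in_chan:
                counts[(date, time)] += 1
            in_chan = (level == '1' and tag == 'CHAN')
            date = time = None
        elif in_chan and level == '2' and tag == 'DATE':
            date = value
        elif in_chan and level == '3' and tag == 'TIME':
            time = value
    if in_chan:                      # file ended inside a CHAN block
        counts[(date, time)] += 1
    return counts
```

Called as chan_dates(open('Untitled_1.ged', encoding='utf-8')).most_common(),
a single ('1 JAN 1970', '10:00:00') entry would mean every CHAN block carries
the same fixed value.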

Let me illustrate the preponderance with numbers:
---------------------------------------------------------------------------
apk@amd64:~/Documents/python/gramps> ./demoB.py ~/gramps-work/example/gramps/Untitled_1.ged

Processed file =  /home/apk/gramps-work/example/gramps/Untitled_1.ged
No of lines    =  32648
No of lines, starting with tag "1 CHAN",followed by
wrong date and time =  2826
---------------------------------------------------------------------------
apk@amd64:~/Documents/python/gramps> ./demoB.py ~/gramps-work/example/gramps/Untitled_2.ged

Processed file =  /home/apk/gramps-work/example/gramps/Untitled_2.ged
No of lines    =  32604
No of lines, starting with tag "1 CHAN",followed by
wrong date and time =  2826
apk@amd64:~/Documents/python/gramps>
---------------------------------------------------------------------------

So there are 2826 copies of this spurious sequence, created by the export
function. The data bears no resemblance to anything in the database. The
Untitled_2.ged file has 44 fewer lines than Untitled_1.ged (32604 vs. 32648),
and there are other differences too: not huge, but significant. One is
amusing: the copyright notice lost the name of the author...

Incidentally, none of the "CHAN" sequences show up in the diff: they are
identical in both files.
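To put numbers on those other differences, the two exports can be compared
line by line with Python's standard difflib module. A minimal sketch, assuming
both files fit in memory; diff_summary is a hypothetical helper name. Because
matched blocks are equal by construction, removed minus added always equals
the difference in line counts, so for these two files it should come out 44
higher on the removed side.

```python
import difflib


def diff_summary(old_lines, new_lines):
    """Return (removed, added) line counts needed to turn old into new."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('delete', 'replace'):
            removed += i2 - i1          # lines only in the old file
        if tag in ('insert', 'replace'):
            added += j2 - j1            # lines only in the new file
    return removed, added
```

For the actual text of the changes, difflib.unified_diff(old_lines, new_lines)
yields the same information as a conventional diff.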

Perhaps I am a purist (I don't think so, but you might get that impression).
IMHO, exporting and re-importing in the same format should pass a sanity
check: no loss or change of information (apart from losses during the
initial export due to the inadequacy of the format).
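A crude first version of such a sanity check could compare how many records
of each type (INDI, FAM, SOUR, ...) the two files contain. A minimal sketch;
record_counts and compare_records are hypothetical names. Matching counts
would not prove that no information changed, only that no whole record was
lost or duplicated in the round trip.

```python
from collections import Counter


def record_counts(lines):
    """Count level-0 GEDCOM records by type (HEAD, INDI, FAM, SOUR, ...)."""
    counts = Counter()
    for raw in lines:
        parts = raw.strip().split()
        if not parts or parts[0] != '0':
            continue
        # "0 @I1@ INDI" carries a cross-reference; "0 HEAD" does not.
        if len(parts) >= 3 and parts[1].startswith('@'):
            counts[parts[2]] += 1
        elif len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


def compare_records(old_lines, new_lines):
    """Return {type: (old_count, new_count)} for record types that differ."""
    old, new = record_counts(old_lines), record_counts(new_lines)
    return {tag: (old[tag], new[tag])
            for tag in sorted(set(old) | set(new))
            if old[tag] != new[tag]}
```

An empty result from compare_records on Untitled_1.ged and Untitled_2.ged
would at least rule out whole records going missing.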

I will happily provide detailed information; in fact I would like to do so.
However, there is not much point if, after reading the above, you think that
"Things are not as bad as they look with gedcom import/export". I think it
boils down to the question of whether gedcom import/export is important or
not.

There are some remarkably good results: the Slavic names in "Kirilica"
(Cyrillic) script and the (probably) Japanese names in Kanji (Chinese) script
are transmitted successfully. The miracle of UTF-8: impressive!

Kind regards and thank you for your time,

Al.

PS: It's 00:48 here and I shall go back to bed. I did take a sleeping pill,
too...
Here is the listing of a little Python script to chase up the "CHAN"
sequences:
#!/usr/bin/env python3
# demoB.py - program to count repeated occurrences of
#   1 CHAN
#   2 DATE 1 JAN 1970
#   3 TIME 10:00:00
# in a GRAMPS-generated .ged file
# Usage: ./demoB.py <fileName>

import sys

# The suspect three-line sequence, without line endings.
PATTERN = ['1 CHAN', '2 DATE 1 JAN 1970', '3 TIME 10:00:00']

def prolog():
    """Open the file named on the command line; return (file, fileName)."""
    try:
        fileName = sys.argv[1]
    except IndexError:
        print('File name and path should be passed as a parameter.')
        sys.exit(1)
    try:
        f = open(fileName, 'r', encoding='utf-8', errors='replace')
    except OSError:
        print(fileName, ' = File could not be opened.')
        sys.exit(1)
    return f, fileName

def push(stack, line, n=3):
    """Append line to stack, keeping only the last n lines."""
    stack.append(line)
    if len(stack) > n:
        stack = stack[-n:]
    return stack

def epilog(f, fileName, countLines, countBugs):
    f.close()
    print()
    print('Processed file = ', fileName)
    print('No of lines    = ', countLines)
    print('No of lines, starting with tag "1 CHAN",followed by')
    print('wrong date and time = ', countBugs)

if __name__ == '__main__':

    f, fileName = prolog()
    countLines = 0
    countBugs = 0
    stack = []
    for line in f:
        countLines += 1
        # Drop the newline (and any trailing spaces) before comparing.
        stack = push(stack, line.rstrip(), len(PATTERN))
        if stack == PATTERN:
            countBugs += 1
    epilog(f, fileName, countLines, countBugs)

A.
-- 
Algis Kabaila,
http://akabaila.pcug.org.au/StructuralAnalysis
