From: Al K. <aka...@pc...> - 2008-09-13 14:55:51
On Fri, 12 Sep 2008 08:22:39 am Gary Burton wrote:
> Hello Al,
>
> Things are not as bad as they look with gedcom import/export in version 3.
> I have added some commentary to your bug report.
>
> I'm off to bed - it's late!
>
> Bye
>
> Gary

Gary,

Thank you for the suggestion. I followed your advice in the commentary on the bug #0002370 report. Here is a detailed record of what I did and of the file names involved:

1) Create a new database. Whenever possible, I followed the defaults, so this new empty database is called "Family Tree 1".
2) Import the XML data in example/gramps/data.gramps. This populated "Family Tree 1".
3) Export the whole database to a gedcom file. The exported file was named Untitled_1.ged.
4) Create another database, named "Family Tree 2".
5) Import the gedcom file just created. This populates "Family Tree 2".
6) Export the whole database again to a second gedcom file, called Untitled_2.ged.
7) Compare the two gedcom files.

Yes, there are significant differences, but even more telling is the preponderance of identical "1 CHAN" tag sequences, created by the export function:

1 CHAN
2 DATE 1 JAN 1970
3 TIME 10:00:00

I do not know what the tag CHAN means - would it be "CHAnge Name", perchance? Or is it the date and time of entry of the item into the gedcom, temporarily assigned an arbitrary value? (In that case, it should possibly be ignored at this stage.)
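
For reference, in the GEDCOM 5.5 standard CHAN is the change tag: together with its DATE and TIME lines it records when a record was last modified. One possible reading of the sequence above, offered purely as an assumption and not confirmed anywhere in this thread, is that no change time was stored (a Unix timestamp of 0) and that the exporter formatted it on a machine set to a UTC+10 time zone. A minimal sketch of that arithmetic follows; gedcom_change_lines and MONTHS are hypothetical names for illustration, not part of GRAMPS:

```python
from datetime import datetime, timedelta, timezone

# GEDCOM writes month names as three upper-case letters.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Start of the Unix epoch, expressed in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def gedcom_change_lines(timestamp, utc_offset_hours):
    """Format a Unix timestamp as a CHAN block in a fixed-offset zone.

    Hypothetical helper: it only illustrates the arithmetic and says
    nothing about how the GRAMPS exporter is actually written.
    """
    zone = timezone(timedelta(hours=utc_offset_hours))
    moment = (EPOCH + timedelta(seconds=timestamp)).astimezone(zone)
    return [
        "1 CHAN",
        "2 DATE %d %s %d" % (moment.day, MONTHS[moment.month - 1], moment.year),
        "3 TIME %02d:%02d:%02d" % (moment.hour, moment.minute, moment.second),
    ]


if __name__ == "__main__":
    # Assumption: timestamp 0 (no change time recorded), zone UTC+10.
    for line in gedcom_change_lines(0, 10):
        print(line)
```

Under those two assumptions the sketch reproduces the three lines quoted above, which would make them a default value and not stored data.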

When I talk about preponderance, let me illustrate with numbers:

---------------------------------------------------------------------------
apk@amd64:~/Documents/python/gramps> ./demoB.py ~/gramps-work/example/gramps/Untitled_1.ged

Processed file = /home/apk/gramps-work/example/gramps/Untitled_1.ged
No of lines = 32648
No of lines, starting with tag "1 CHAN",followed by
wrong date and time = 2826
---------------------------------------------------------------------------
apk@amd64:~/Documents/python/gramps> ./demoB.py ~/gramps-work/example/gramps/Untitled_2.ged

Processed file = /home/apk/gramps-work/example/gramps/Untitled_2.ged
No of lines = 32604
No of lines, starting with tag "1 CHAN",followed by
wrong date and time = 2826
apk@amd64:~/Documents/python/gramps>
---------------------------------------------------------------------------

So we have 2826 sets of the spurious data sequences, created by the export function. This data bears no resemblance to anything in the database.

In the Untitled_2.ged file there are 44 fewer lines than in Untitled_1.ged. There are also other significant differences. Not huge, but significant. One is amusing - the copyright notice lost the name of the author... Actually, none of the "CHAN" sequences are reflected in the diff files - they remain unchanged.

Perhaps I am a purist (I don't think so, but you might get that impression). IMHO import and export of the same format should satisfy a sanity check - no loss or change of information (except the losses during the initial export due to the inadequacy of the format).

I will happily provide detailed information - in fact I would like to do that. However, there is not much point in doing so if after reading the above you think that "Things are not as bad as they look with gedcom import/export". I think it boils down to the question whether the gedcom import/export is important or not.
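
The sanity check described above can be sketched as a direct comparison of the two exports. This is only an illustration using Python's standard difflib module; roundtrip_changes is a hypothetical helper name, and the sample records are invented for the example, not taken from data.gramps:

```python
import difflib


def roundtrip_changes(first, second):
    """Return (lost, gained): lines only in first, lines only in second.

    Hypothetical helper. A lossless import/export round trip would
    return two empty lists.
    """
    matcher = difflib.SequenceMatcher(None, first, second, autojunk=False)
    lost, gained = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            lost.extend(first[i1:i2])     # lines dropped or altered
            gained.extend(second[j1:j2])  # lines added or altered
    return lost, gained


if __name__ == "__main__":
    # Invented sample data standing in for Untitled_1.ged / Untitled_2.ged.
    first = [
        "0 HEAD",
        "1 COPR Copyright (c) 2008 Example Author",
        "0 @I1@ INDI",
        "1 NAME Anna /Example/",
        "2 NOTE hypothetical note",
        "0 TRLR",
    ]
    second = [
        "0 HEAD",
        "1 COPR Copyright (c) 2008",
        "0 @I1@ INDI",
        "1 NAME Anna /Example/",
        "0 TRLR",
    ]
    lost, gained = roundtrip_changes(first, second)
    print("Lines lost or altered   = %d" % len(lost))
    print("Lines gained or altered = %d" % len(gained))
    print("Net change in length    = %d" % (len(second) - len(first)))
```

For real files the two lists would be read with readlines(); identical blocks, such as the unchanged CHAN sequences, never appear in either list.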

There are some remarkably good results - the Slavic names in "Kirilica" script and the (probably) Japanese names in Kanji (Chinese) script are transmitted successfully. The miracle of utf-8 - impressive!

Kind regards and thank you for your time,

Al.

PS: It's 00:48 hrs here and I shall return to bed. I did take a sleeping, too...

Here is the listing of a little Python script to chase up the "CHAN" sequences:

#!/usr/bin/env python
# demoB.py - program to count repeated occurrences of
#     1 CHAN
#     2 DATE 1 JAN 1970
#     3 TIME 10:00:00
# in a GRAMPS generated .ged file
# Usage: ./demoB.py <fileName>

import sys

def prolog():
    # Open the file named on the command line; exit on any error.
    try:
        ## fileName = '/home/apk/gramps-work/example/gramps/Untitled_1.ged'
        fileName = sys.argv[1]
    except IndexError:
        print('File Name and path should be passed as parameter.')
        sys.exit(1)
    try:
        f = open(fileName, 'r')
    except IOError:
        print('%s = File was not found.' % fileName)
        sys.exit(1)
    return f, fileName

def push(stack, line, n=3):
    # Append the line and keep only the last n lines read.
    stack.append(line)
    if len(stack) > n:
        stack = stack[-n:]
    return stack

def epilog(f, fileName, countLines, countBugs):
    # Close the file and print the summary.
    f.close()
    print('')
    print('Processed file = %s' % fileName)
    print('No of lines = %d' % countLines)
    print('No of lines, starting with tag "1 CHAN",followed by')
    print('wrong date and time = %d' % countBugs)

if __name__ == '__main__':
    f, fileName = prolog()
    countLines = 0
    countBugs = 0
    stack = []
    for line in f:
        countLines += 1
        stack = push(stack, line)
        # Count each run of three consecutive lines forming the sequence.
        if stack == ['1 CHAN\n', '2 DATE 1 JAN 1970\n', '3 TIME 10:00:00\n']:
            countBugs += 1
    epilog(f, fileName, countLines, countBugs)

A.
--
Algis Kabaila, http://akabaila.pcug.org.au/StructuralAnalysis