From: Kees B. <kee...@xs...> - 2007-02-24 09:24:35
On Saturday 24 February 2007 01:43, Don Allingham wrote:
> This is a crash message, not a syntax error. The problem here is that
> I'm working on improving the character set recognition, and I don't have
> everything set correctly yet. The GEDCOM parser is a two-pass parser,
> and it determines the character set on the first pass, and passes the
> information to the second pass so that the second pass can read the data
> correctly.
>
> Right now, the first pass is not passing the character set to the second
> pass yet. So any non-UTF8 characters that are found before the
> "CHAR" (character set) token in the GEDCOM file will cause a translation
> error.

Ah, I see. But you (sort of) asked for testers of the GEDCOM import.
That's why I tried.

>
> This will be corrected in the near future. Right now, I'm having issues
> properly detecting UTF16, so the character sets are not fully working
> yet.

Linus Torvalds once said: "Sadly, when MS makes a whopper of a mistake
(and they do it all too often), we're left having to work with the
resulting breakage." UTF-16 is one of those mistakes. Anyway, support
for reading GEDCOM in UTF-16 is needed to enable more people to switch
to Gramps, right? But how much effort is that worth?

>
> However, if you don't have non-UTF8 characters in the first few lines of
> the file (which most files don't have), you should be okay. Typically
> this is a problem for GEDCOM files created by programs that have
> non-ASCII characters in their names or address (these lines appear
> before the CHAR line, so the character set is not set correctly).

That is probably it. My GEDCOM is created by ProGen, and the first lines
are:

0 HEAD
1 SOUR PRO-GEN
2 VERS 3.0b-p12
2 CORP PRO-GEN Genealogie áa la Carte
3 WWW www.pro-gen.nl
1 DEST PRO-GEN
1 DATE 23 FEB 2007
1 SUBM @S1@
1 FILE BAKKER.GED
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
0 @S1@ SUBM
1 NAME A.C. Bakker

The CORP line should in fact be "PRO-GEN Genealogie à la Carte". I'll
strip it for the time being, and see how far it goes.

>
> Personally, I find this to be a major flaw in GEDCOM.
>
> Don

--
Kees
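
The two-pass approach Don describes can be sketched roughly as follows. This is a minimal Python illustration, not Gramps' actual code: the names `CODECS`, `detect_charset` and `read_gedcom` are made up for this example, and the mapping from CHAR values to codecs is an assumption. Python's standard library has no ANSEL codec, so here any character set missing from the mapping falls back to Latin-1; that keeps decoding from failing, but accented characters may come out wrong.

```python
import codecs

# Assumed mapping from GEDCOM CHAR values to Python codec names.
# ANSEL is deliberately absent: the standard library has no codec for it.
CODECS = {
    "UTF-8": "utf-8-sig",   # tolerates an optional byte-order mark
    "ASCII": "ascii",
    "ANSI": "cp1252",       # assumption: "ANSI" means Windows-1252
    "UNICODE": "utf-16",
}


def detect_charset(raw: bytes) -> str:
    """First pass: find the CHAR value without decoding any other line."""
    # A UTF-16 byte-order mark settles the question immediately.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UNICODE"
    for line in raw.splitlines():
        # Split the raw bytes into level, tag and value; nothing is decoded yet,
        # so non-ASCII bytes in earlier lines (e.g. CORP) cannot cause an error.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == b"CHAR":
            return parts[2].strip().decode("ascii").upper()
    return "ASCII"  # assumed default when no CHAR line is present


def read_gedcom(raw: bytes) -> list:
    """Second pass: decode the whole file with the charset from the first pass."""
    name = detect_charset(raw)
    # Fallback for unsupported sets such as ANSEL: Latin-1 never raises,
    # but the resulting characters are not guaranteed to be correct.
    codec = CODECS.get(name, "latin-1")
    return raw.decode(codec).splitlines()
```

With a header like the one above, the CORP line that precedes CHAR no longer triggers a translation error, because no line is decoded until the character set is known.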