I'm using GEDCOM.Net to provide more complete GEDCOM parsing in my project, Family Lines (see http://familylines.codeplex.com/). So far, GEDCOM.Net has proven to be quite a bit faster and more complete than the original parsing code.
I've run into a few problems parsing some GED files from the web. The issues, and my fixes, are added as replies.
In AnselEncoding.cs, I ran into a length/off-by-one problem. I have a hack fix in place but don't yet understand the "real" issue.
In AnselEncoding.cs, GetChars(), at line 420, change these lines:
chars = (char)ucs;
combine = true;
to:
if (charIndex + 1 < chars.Length)
{
chars = (char) ucs;
combine = true;
}
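For context, ANSEL stores a combining diacritic before its base character, while Unicode expects it after, so the decoder has to write the mark one slot ahead of the base. Here is a minimal sketch of that reordering with the guard applied. It is not the GEDCOM.Net code: `AnselCombineSketch`, `Reorder`, and the `isCombining` callback are hypothetical, and writing the mark to `chars[charIndex + 1]` is an assumption inferred from the guard condition.

```csharp
using System;

public static class AnselCombineSketch
{
    // Hypothetical helper: copies decoded code points into 'chars', moving a
    // combining mark behind the base character that follows it in the input.
    // This sketch handles at most one mark per base character.
    public static int Reorder(int[] codePoints, Func<int, bool> isCombining, char[] chars)
    {
        int charIndex = 0;
        bool combine = false;
        foreach (int ucs in codePoints)
        {
            if (isCombining(ucs))
            {
                // The guard from the fix: only park the mark one slot ahead
                // when that slot exists in the output buffer.
                if (charIndex + 1 < chars.Length)
                {
                    chars[charIndex + 1] = (char)ucs; // assumed index
                    combine = true;
                }
            }
            else
            {
                if (charIndex >= chars.Length) break;
                chars[charIndex++] = (char)ucs;
                // Step over the mark that was parked after this base character.
                if (combine) { charIndex++; combine = false; }
            }
        }
        return charIndex; // number of chars written
    }
}
```

With a two-element buffer, the input U+0301 followed by 'e' comes out as 'e' followed by U+0301. With a one-element buffer the mark is dropped instead of causing an out-of-range write, which matches the "hack" nature of the fix.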
In GedcomRecordReader.cs, I encountered some GED files which had empty lines for the TEXT tag associated with SOURCE tags. This caused the parser to drop the TEXT. In ReadSourceRecord(), at line 3099, change this line:
if (_lineValueType == GedcomLineValueType.DataType)
to:
if (_lineValueType != GedcomLineValueType.PointerType)
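To show why the relaxed test matters, here is a small sketch comparing the two conditions. The enum `LineValueType` is a hypothetical stand-in for GedcomLineValueType, and it is an assumption that an empty TEXT line is classified as something other than a data value (called `NoType` here).

```csharp
// Hypothetical stand-in for GedcomLineValueType; the member names are assumed.
public enum LineValueType { NoType, PointerType, DataType }

public static class SourceTextSketch
{
    // Original condition: only lines carrying a data value are accepted.
    public static bool OriginalCheck(LineValueType t) => t == LineValueType.DataType;

    // Relaxed condition: anything that is not a pointer is accepted,
    // so an empty TEXT line no longer causes the TEXT to be dropped.
    public static bool RelaxedCheck(LineValueType t) => t != LineValueType.PointerType;
}
```

Under that assumption, an empty line fails the original check but passes the relaxed one, while pointer lines are still rejected by both.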
I encountered a number of GED files that declared their encoding as "ASCII". They in fact contained 8-bit characters, but StreamReader with the "ASCII" encoding treats the data as 7-bit, so bytes above 0x7F are not decoded correctly.
My "fix" for this in GedcomReader.cs was to treat "ASCII" the same as "ANSI". In ReadHeaderRecord(), at line 1063, disable the existing "ASCII" case and add it together with "ANSI" at line 1050, as follows:
case "ANSI":
case "ASCII":
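The effect is easy to reproduce outside the parser. The sketch below decodes the same bytes with Encoding.ASCII and with an 8-bit encoding. ISO-8859-1 is used here only as a stand-in, since which code page the "ANSI" branch actually selects is not stated above; the class and method names are hypothetical.

```csharp
using System.Text;

public static class AsciiVersusEightBit
{
    // Encoding.ASCII is strictly 7-bit: any byte above 0x7F becomes '?'.
    public static string DecodeAscii(byte[] bytes) => Encoding.ASCII.GetString(bytes);

    // ISO-8859-1 maps every byte 0x00-0xFF to the code point of the same value.
    // Assumption: used as a stand-in for the 8-bit "ANSI" code page.
    public static string DecodeLatin1(byte[] bytes) =>
        Encoding.GetEncoding("iso-8859-1").GetString(bytes);
}
```

For the bytes 0x4A 0x6F 0x73 0xE9, DecodeAscii returns "Jos?" and DecodeLatin1 returns "José", which is the kind of loss seen in names from these files.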
GedcomDate.cs cannot be used to parse more than one date string using a single instance. Namely, once the DatePeriod is set for a parse, all subsequent parses will be "stuck" with that DatePeriod value. In GedcomDate.cs, ParseDateString, before line 538, I added:
DatePeriod = GedcomDatePeriod.Exact;
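The underlying pattern is a parser that sets a field only when it sees a qualifier, so a later unqualified date inherits the old value. Here is a minimal sketch of that behavior and of the reset. `StickyDateParser` and `Period` are hypothetical; this is not the GedcomDate implementation, and the assumption is that the period is assigned only when a prefix such as AFT or BEF is present.

```csharp
using System;

// Hypothetical stand-in for GedcomDatePeriod.
public enum Period { Exact, After, Before }

public class StickyDateParser
{
    public Period DatePeriod { get; private set; } = Period.Exact;

    // resetFirst = true mirrors the added line: start every parse from Exact.
    public void Parse(string text, bool resetFirst)
    {
        if (resetFirst) DatePeriod = Period.Exact;

        string t = text.Trim().ToUpperInvariant();
        if (t.StartsWith("AFT ", StringComparison.Ordinal)) DatePeriod = Period.After;
        else if (t.StartsWith("BEF ", StringComparison.Ordinal)) DatePeriod = Period.Before;
        // A plain date sets nothing, so without the reset the previous
        // DatePeriod value carries over to this parse.
    }
}
```

Parsing "AFT 1900" and then "12 JAN 1901" on the same instance leaves DatePeriod at After when resetFirst is false, and at Exact when it is true.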
In GedcomDate.cs, I found it helpful for CompareByDate() to accept null parameters. Change lines 413-414 to:
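The replacement lines themselves are not shown here. As a rough sketch of what null-tolerant comparison can look like in general, the helper below orders null before any value. `NullSafeCompare` and `CompareAllowingNull` are hypothetical names, and the choice to sort nulls first is an assumption, not something stated in this post.

```csharp
using System;

public static class NullSafeCompare
{
    // Hypothetical helper: compares two references, treating null as
    // "less than" any non-null value and equal to another null.
    public static int CompareAllowingNull<T>(T a, T b) where T : class, IComparable<T>
    {
        if (ReferenceEquals(a, b)) return 0; // same object, or both null
        if (a is null) return -1;
        if (b is null) return 1;
        return a.CompareTo(b);
    }
}
```

A comparison method that starts with these three checks can then fall through to its existing date logic knowing both arguments are non-null.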
An editorial comment, as I don't yet have a fix. GedcomDate does not handle B.C. dates:
- a "B.C." trailer is parsed, but is thrown away
- the use of DateTime precludes support for B.C., as DateTime.MinValue is January 1 of year 1 A.D.
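The DateTime limit can be confirmed directly. The sketch below only probes the framework type; `BcDateLimit` and its methods are hypothetical helper names and nothing here touches GedcomDate.

```csharp
using System;

public static class BcDateLimit
{
    // The earliest year DateTime can hold.
    public static int EarliestYear() => DateTime.MinValue.Year;

    // Returns false when DateTime rejects the year, which it does for
    // zero and for any negative (B.C.) year.
    public static bool CanRepresentYear(int year)
    {
        try
        {
            _ = new DateTime(year, 1, 1);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }
    }
}
```

EarliestYear() returns 1, and CanRepresentYear(-44) returns false, so supporting B.C. dates would need the year stored separately from DateTime, for example as a signed integer alongside an era flag.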