gedcom-parse-devel Mailing List for The Gedcom parser library (Page 2)
Status: Beta
Brought to you by:
verthezp
From: Peter V. <Pet...@ad...> - 2001-12-28 21:03:11

I've added date parsing code to the gedcom parser (I've just checked it in).

At first I thought to put it in the same parser as the rest of the gedcom file, but that appeared to be hard. I would have needed to separate the date tokens in the gedcom lexer, making the lexer and its use in the parser much bigger. Anyway, it is better to keep the two levels (overall syntax and syntax of the line values) separate.

As a consequence, the date parser can be called separately, via gedcom_parse_date, declared in include/gedcom.h (Perry, this might be useful for you in LifeLines - the date struct uses some of the ideas that you wrote in one of your earlier mails to me).

Note that I haven't tested it too much ('make check' works...)

I think I'll write some docs now...

Cheers,
Peter.

--
===================================================================
Peter Verthez                          Software engineer
Email at work: mailto:Pet...@al...
      at home: mailto:Pet...@ad...
WWW: http://gallery.uunet.be/Peter.Verthez
===================================================================
Smith & Wesson The original point and click interface...

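The split between the two levels described above (overall line syntax first, line-value syntax second) can be illustrated with a minimal sketch. This is not the gedcom-parse implementation and does not call gedcom_parse_date; the names `parse_line`, `parse_date`, `GedcomLine` and `SimpleDate` are hypothetical, and only plain Gregorian "day month year", "month year" and "year" values are handled.

```python
import re
from typing import NamedTuple, Optional

# Gregorian month abbreviations used in GEDCOM date values.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


class GedcomLine(NamedTuple):   # hypothetical type, for illustration only
    level: int
    xref: Optional[str]
    tag: str
    value: str


class SimpleDate(NamedTuple):   # hypothetical type, for illustration only
    day: Optional[int]
    month: Optional[int]
    year: int


# level, optional cross-reference id, tag, optional line value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse_line(text: str) -> GedcomLine:
    """Level 1: overall line syntax. The value stays an opaque string."""
    m = LINE_RE.match(text)
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = m.groups()
    return GedcomLine(int(level), xref, tag, value or "")


def parse_date(value: str) -> SimpleDate:
    """Level 2: syntax of a date line value. Callable on its own."""
    parts = value.split()
    if len(parts) == 3 and parts[1] in MONTHS:
        return SimpleDate(int(parts[0]), MONTHS.index(parts[1]) + 1, int(parts[2]))
    if len(parts) == 2 and parts[0] in MONTHS:
        return SimpleDate(None, MONTHS.index(parts[0]) + 1, int(parts[1]))
    if len(parts) == 1:
        return SimpleDate(None, None, int(parts[0]))
    raise ValueError(f"unsupported date value: {value!r}")


line = parse_line("2 DATE 1 SEP 1993")
if line.tag == "DATE":
    print(parse_date(line.value))
```

Because `parse_date` only sees the line value, another program could call it on a date string without going through the line-level parser at all, which is the property the message above points out.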
From: Peter V. <Pet...@ad...> - 2001-12-28 09:32:27

Hi Perry,

I don't know the definition of an LR parser very well, but I think that the issues you give here are no problem for yacc/bison...

prapp wrote:
>
> I think that GEDCOM dates are not LR for two reasons:
>
> 1) date_phrase matches, for example, "1 SEP 1993 or 1994"
> but you don't know that is a date_phrase til you get past the 1993
> (and find out it is not a valid date)

Not true. According to the Gedcom spec, date phrases have to be encapsulated in parentheses. Again, the spec is not very consistent in its notation, because it says:

  DATE_VALUE := ... | (<DATE_PHRASE>)
  DATE_PHRASE := (<TEXT>)

and so you'd end up with double parentheses, but still, the DATE_PHRASE explanation says "The date phrase is enclosed in matching parentheses", so the parentheses (single then) are for real.

So, the date "1 SEP 1993 or 1994" cannot even be a date phrase, it is simply not valid according to the spec.

>
> 2) "25" looks like the start of a day month year, but is actually
> just a year

That should be no problem, I think. It would be a bigger problem if a day could stand on its own as a date, because then you couldn't distinguish between the single day and the single year. LR parsers use a single token as lookahead, and on this basis, this date can be parsed perfectly.

>
> If I am correct, it will be more difficult to parse dates with
> a bison grammar, yes ?

I don't see any problems at the moment, but maybe I will further today :-)

>
> In fact, I decided to do a custom date parse, because I don't know enough
> to handle the phrase backtracking.
> (Also because I'm revising an existing date parser in LifeLines which is
> custom, and is a freeform, non-LR parser).

I agree that it will be difficult to have the date parsing of gedcom-parse integrated into LifeLines. I was thinking of having a separate bison parser for the dates, but in a first try I won't do that, because I want to concentrate on the parsing itself (having a separate parser means it needs again its own lexer, which gets too messy to handle now).

>
> (I think I'll use some context-sensitivity in calendars eventually -- if I
> add support for the Islam calendar eventually, I can recognize it by the
> month name,
> and then can expect AH or BH as an optional trailer instead of AD or BC).

From your experience with Gedcom files, what do you think the Gedcom spec means in the description of YEAR_GREG ? Is the optional suffix "(B.C.)" or "B.C."? To my interpretation, it should be the first one, but what have you seen in actual Gedcom files?

>
> I'm not planning to worry about a BC-equivalent for the Hebrew or Roman
> calendars :)
> (both go back pretty far, and probably have no standard trailer for such)

I agree with that. Also because the Roman calendar is not defined at all in the spec... Although frankly neither is the Islam calendar. But I agree that this would be a useful addition to the Gedcom standard...

Best regards,
Peter.

--
===================================================================
Peter Verthez                          Software engineer
Email at work: mailto:Pet...@al...
      at home: mailto:Pet...@ad...
WWW: http://gallery.uunet.be/Peter.Verthez
===================================================================
Don't believe anything you read, hear or think.

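The two points made in the reply above (a date phrase must be parenthesized, and a lone number is settled by looking one token ahead) can be sketched as follows. This is a hand-written illustration, not a bison grammar and not the gedcom-parse code; `tokenize` and `parse_date_value` are hypothetical names, and the grammar is cut down to a bare year, a "day month year" date, and a parenthesized phrase.

```python
MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def tokenize(text):
    """Split on whitespace and classify each piece as NUMBER, MONTH or WORD."""
    tokens = []
    for piece in text.split():
        if piece.isdigit():
            tokens.append(("NUMBER", int(piece)))
        elif piece.upper() in MONTHS:
            tokens.append(("MONTH", piece.upper()))
        else:
            tokens.append(("WORD", piece))
    return tokens


def parse_date_value(text):
    """Parse a reduced date value; raise ValueError if it is not valid."""
    text = text.strip()
    # A date phrase is only recognized when enclosed in parentheses.
    if text.startswith("("):
        if len(text) < 2 or not text.endswith(")"):
            raise ValueError(f"unterminated date phrase: {text!r}")
        return ("phrase", text[1:-1])

    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens] + ["END"]
    if kinds[0] != "NUMBER":
        raise ValueError(f"not a valid date: {text!r}")

    first = tokens[0][1]
    lookahead = kinds[1]          # the single token of lookahead
    if lookahead == "END":
        # Nothing follows the number, so it can only be a year.
        return ("date", None, None, first)
    if lookahead == "MONTH" and kinds[2:4] == ["NUMBER", "END"]:
        # A month follows, so the first number was a day.
        return ("date", first, tokens[1][1], tokens[2][1])
    raise ValueError(f"not a valid date: {text!r}")


for value in ("25", "25 SEP 1993", "(1 SEP 1993 or 1994)", "1 SEP 1993 or 1994"):
    try:
        print(value, "->", parse_date_value(value))
    except ValueError as err:
        print(value, "->", "error:", err)
```

In this sketch no backtracking is needed: the unparenthesized "1 SEP 1993 or 1994" is rejected as an invalid date instead of being retried as a phrase, which matches the reading of the spec given above.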
From: Peter V. <Pet...@ad...> - 2001-12-28 07:21:25

Hi guys,

I've created a mailing list for discussions on gedcom-parse development. The mail address is: ged...@li...

Best regards,
Peter.

--
===================================================================
Peter Verthez                          Software engineer
Email at work: mailto:Pet...@al...
      at home: mailto:Pet...@ad...
WWW: http://gallery.uunet.be/Peter.Verthez
===================================================================
Things are more like they are now than they ever were before.
- Dwight D. Eisenhower

From: prapp <pr...@er...> - 2001-12-28 03:28:35

I think that GEDCOM dates are not LR for two reasons:

1) date_phrase matches, for example, "1 SEP 1993 or 1994"
   but you don't know that is a date_phrase til you get past the 1993
   (and find out it is not a valid date)

2) "25" looks like the start of a day month year, but is actually
   just a year

If I am correct, it will be more difficult to parse dates with a bison grammar, yes ?

In fact, I decided to do a custom date parse, because I don't know enough to handle the phrase backtracking. (Also because I'm revising an existing date parser in LifeLines which is custom, and is a freeform, non-LR parser).

(I think I'll use some context-sensitivity in calendars eventually -- if I add support for the Islam calendar eventually, I can recognize it by the month name, and then can expect AH or BH as an optional trailer instead of AD or BC).

I'm not planning to worry about a BC-equivalent for the Hebrew or Roman calendars :) (both go back pretty far, and probably have no standard trailer for such)