Re: [Gedcom-parse-devel] DATE is not LR (I think)
From: Peter V. <Pet...@ad...> - 2001-12-28 09:32:27
Hi Perry,

I don't know the definition of an LR parser very well, but I think that the issues you give here are no problem for yacc/bison...

prapp wrote:
>
> I think that GEDCOM dates are not LR for two reasons:
>
> 1) date_phrase matches, for example, "1 SEP 1993 or 1994"
> but you don't know that is a date_phrase til you get past the 1993
> (and find out it is not a valid date)

Not true. According to the Gedcom spec, date phrases have to be encapsulated in parentheses. Again, the spec is not very consistent in its notation, because it says:

    DATE_VALUE := ... | (<DATE_PHRASE>)
    DATE_PHRASE := (<TEXT>)

and so you'd end up with double parentheses, but still, the DATE_PHRASE explanation says "The date phrase is enclosed in matching parentheses", so the parentheses (single then) are for real.

So, the date "1 SEP 1993 or 1994" cannot even be a date phrase, it is simply not valid according to the spec.

>
> 2) "25" looks like the start of a day month year, but is actually
> just a year

That should be no problem, I think. It would be a bigger problem if a day could stand on its own as a date, because then you couldn't distinguish between the single day and the single year. LR parsers use a single token as lookahead, and on this basis, this date can be parsed perfectly.

>
> If I am correct, it will be more difficult to parse dates with
> a bison grammar, yes ?

I don't see any problems at the moment, but maybe I will further today :-)

>
> In fact, I decided to do a custom date parse, because I don't know enough
> to handle the phrase backtracking.
> (Also because I'm revising an existing date parser in LifeLines which is
> custom, and is a freeform, non-LR parser).

I agree that it will be difficult to have the date parsing of gedcom-parse integrated into LifeLines.
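To illustrate the lookahead argument about "25" made above, here is a minimal sketch in Python. It is not the gedcom-parse code and not a bison grammar: it is a hand-written parser for a deliberately simplified, assumed grammar (`[day] [month] year`, or a phrase in parentheses), and the names `tokenize` and `parse_date` are hypothetical. It only shows that peeking at one token after a leading number is enough to tell a day from a year, which is the same information a one-token-lookahead LR parser has available.

```python
MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def tokenize(text):
    """Split a date string into (kind, value) tokens, ending with an END marker."""
    tokens = []
    for word in text.split():
        if word.isdigit():
            tokens.append(("NUMBER", int(word)))
        elif word.upper() in MONTHS:
            tokens.append(("MONTH", word.upper()))
        else:
            tokens.append(("OTHER", word))
    tokens.append(("END", None))
    return tokens


def parse_date(text):
    """Parse the simplified grammar; raise ValueError for anything else."""
    text = text.strip()
    # Assumption taken from the discussion: a date phrase is only accepted
    # when it is enclosed in matching parentheses.
    if text.startswith("(") and text.endswith(")"):
        return {"phrase": text[1:-1]}

    tokens = tokenize(text)
    pos = 0
    date = {}

    # A leading number is a day only if the NEXT token is a month name.
    # This is the single token of lookahead; no backtracking is needed.
    # tokens[pos + 1] always exists here because END is never a NUMBER.
    if tokens[pos][0] == "NUMBER" and tokens[pos + 1][0] == "MONTH":
        date["day"] = tokens[pos][1]
        pos += 1

    if tokens[pos][0] == "MONTH":
        date["month"] = tokens[pos][1]
        pos += 1

    if tokens[pos][0] != "NUMBER":
        raise ValueError("expected a year in %r" % text)
    date["year"] = tokens[pos][1]
    pos += 1

    if tokens[pos][0] != "END":
        raise ValueError("unexpected text after the year in %r" % text)
    return date


print(parse_date("25"))                    # {'year': 25}
print(parse_date("25 DEC 1993"))           # {'day': 25, 'month': 'DEC', 'year': 1993}
print(parse_date("(1 SEP 1993 or 1994)"))  # {'phrase': '1 SEP 1993 or 1994'}
```

With this sketch, the unparenthesized string "1 SEP 1993 or 1994" raises `ValueError`, matching the reading that it is neither a valid date nor a valid date phrase. The real GEDCOM date grammar (calendar escapes, ranges, approximations) is larger and is not covered here.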
I was thinking of having a separate bison parser for the dates, but in a first try I won't do that, because I want to concentrate on the parsing itself (having a separate parser means it needs again its own lexer, which gets too messy to handle now).

>
> (I think I'll use some context-sensitivity in calendars eventually -- if I
> add support for the Islam calendar eventually, I can recognize it by the
> month name,
> and then can expect AH or BH as an optional trailer instead of AD or BC).

From your experience with Gedcom files, what do you think the Gedcom spec means in the description of YEAR_GREG ? Is the optional suffix "(B.C.)" or "B.C."? To my interpretation, it should be the first one, but what have you seen in actual Gedcom files?

>
> I'm not planning to worry about a BC-equivalent for the Hebrew or Roman
> calendars :)
> (both go back pretty far, and probably have no standard trailer for such)

I agree with that. Also because the Roman calendar is not defined at all in the spec... Although frankly neither is the Islam calendar. But I agree that this would be a useful addition to the Gedcom standard...

Best regards,
Peter.

--
===================================================================
Peter Verthez                         Software engineer
Email at work: mailto:Pet...@al...
      at home: mailto:Pet...@ad...
WWW:           http://gallery.uunet.be/Peter.Verthez
===================================================================
Don't believe anything you read, hear or think.