From: Alex R. <sh...@gr...> - 2006-04-20 19:32:18
On Thu, 2006-04-20 at 22:25 +0300, Eero Tamminen wrote:
> > Should we go from the longest to the shortest match in bce list?
>
> If there's this kind of a problem, I think all of these (qual, cal, bce)
> lists should be sorted from longest to shortest. It's probably enough
> just to specify that all date handlers should list these from longest to
> shortest, than force that in code.

Not necessarily all of them from longest to shortest, but the entries
that match the same thing need to be sorted that way, e.g. BCE and BC.
But BCE and "before common era" could be ordered either way.

I don't quite see your distinction between specifying and forcing.
The code should have it right. What did you mean on this one?

> > Is there any reason you don't use the following?
> > self._bce_re = re.compile("(.*)\s+(B[.]?C[.]?(E[.]?)?)( ?.*)")
> > That will cover the problem with greediness and cases where
> > someone isn't consistent in putting in a '.'.
>
> It will also allow things like B.CE and BC.E, so maybe something
> like following would be better:
> (BCE?|B[.]C([.]E[.]?)?)

See, the problem with these is that in the localized handler,
self._bce_re must have exactly the same groups defined. If we just
compose _bce_re from the list of 'B\.C\.E\.', 'B C E', etc., then
localized handlers can just redefine the bce list and be done.
If the number of groups varies, then every handler needs to rewrite
a lot of the parser, or suffer the breakage.

It seems that the solution I propose will work for all handlers:
the bce list is defined so that entries matching the same strings are
ordered longest to shortest.

Alex

-- 
Alexander Roitman http://www.gramps-project.org
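
To illustrate the proposal above, here is a minimal sketch of composing
the regex from a plain list. It is not the actual GRAMPS parser code:
the names BCE, build_bce_re and BCE_RE are hypothetical, and sorting
inside the helper is just one way to guarantee the ordering (the list
could equally be written longest-first by hand). The point is that the
group count stays fixed no matter what the list contains, and that a
longer entry is tried before any shorter entry it starts with.

```python
import re

# Hypothetical default list; a localized handler would override only this.
BCE = ["BC", "B.C.", "BCE", "B.C.E.", "B C E", "before common era"]


def build_bce_re(bce_list):
    """Compose the BCE regex from a list of literal markers."""
    # Longest first, so "BCE" is tried before its prefix "BC".
    ordered = sorted(bce_list, key=len, reverse=True)
    # re.escape keeps the dots literal, so entries add no extra groups.
    alternatives = "|".join(re.escape(s) for s in ordered)
    # Always exactly three groups: text before, the marker, text after.
    return re.compile(r"(.*)\s+(%s)( ?.*)" % alternatives)


BCE_RE = build_bce_re(BCE)

# With the shorter entry listed first, "BC" wins and "E" leaks into
# the trailing group.
UNSORTED_RE = re.compile(r"(.*)\s+(BC|BCE)( ?.*)")

print(BCE_RE.match("100 BCE").groups())       # ('100', 'BCE', '')
print(UNSORTED_RE.match("100 BCE").groups())  # ('100', 'BC', 'E')
```

Because every entry is escaped and joined inside a single group, a
handler that redefines the list gets a regex with the same three groups,
so the code that reads the match does not need to change.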