From: Eero T. <ee...@us...> - 2006-04-22 10:54:58
Hi,

On Friday 21 April 2006 21:46, Martin Hawlisch wrote:
> now that the date handlers pass all tests and we are close to a new
> release I would suggest to leave that as is for now (it is working)
> before starting to create complicated regex. Well, those new regex is not
> complicated for developers but maybe too complicates for new translators.
> Simply translating a list is much more easy.

There are still bugs in the date parsing that I mentioned in the earlier
mail, because with regex group alternatives the longest alternative is
not matched automatically, just the first one that matches. The
automatic tester doesn't cover these because they are specific to the
strings used by the locale-specific parsers.

1. The attached patch should fix the following issues in the localized
   date parsers:
   - In the RU parser the _range1 alternatives were in an order which
     matches the shorter form first
   - In the ES parser the _range1 alternatives were in an order which
     matches the shorter form ('ent' instead of 'ent.') first
   - In the FR parser the range alternatives were in an order which
     matches the shorter form ('ent' instead of 'ent.') first

2. In DateParser.py it fixes a similar issue (e.g. with the Islamic
   calendar) by improving the code in the following way:
   - Adding a utility method that sorts the keys for the regex so that
     the longest key is first, and quotes the '.' characters. The method
     returns a string containing the keys as an RE group
   - Changing the init_strings() method to use the new utility method.
     (The current code does this operation only for some of the arrays
     which can be overridden by the locale-specific parsers, and in a
     somewhat different way for each one...)
   - Removing the quoting from the BCE strings, as it's now done by the
     utility method

The patch modifies the DE, FI & RU date parsers accordingly by removing
the quotes from their string lists (as they are now quoted
automatically).
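
To illustrate the problem and the fix described in 2., here is a minimal
sketch. It is not the code from the patch: only the name
re_longest_first comes from this mail, while the function body, the use
of re.escape() for the quoting, and the sample strings are assumptions.

```python
import re

def re_longest_first(keys):
    """Return the keys as one RE group, longest alternative first.

    Hypothetical stand-alone version of the utility method; the real
    one is a method of the parser class and may differ.
    """
    # Longest key first, so that 'ent.' is tried before 'ent'.
    ordered = sorted(keys, key=len, reverse=True)
    # re.escape() quotes '.' (and any other special character).
    return "(" + "|".join(re.escape(k) for k in ordered) + ")"

# The problem: the first alternative that matches wins, not the longest.
naive = "(" + "|".join(["ent", r"ent\."]) + ")"
assert re.match(naive, "ent. 1900").group(1) == "ent"

# With the keys sorted longest first, the full form is matched.
pattern = re_longest_first(["ent", "ent."])
assert re.match(pattern, "ent. 1900").group(1) == "ent."
```

With something like this, the translators can keep plain, unquoted
string lists in any order, and the sorting and quoting happen in one
place.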

Note that for 1), the RU & ES parsers could also be fixed by calling the
new method (self.re_longest_first) instead of join() for the string
arrays.

3. Additionally the patch contains the following DateParser.py
   modifications:
   - Does not add the temporary strings to self in init_strings() (makes
     the code slightly more readable, and insignificantly faster and
     less memory-hungry)
   - Moves the Gregorian validity check from parse_calendar() to the
     _parse_greg_julian() method, as it doesn't seem to be used anywhere
     else. This "improvement" could be dropped in case the other
     calendars later get their own validation functions

        - Eero