From: engelbert g. <gr...@us...> - 2008-09-01 14:26:24

hello,

any objections to apply this patch

--- docutils/nodes.py (revision 5503)
+++ docutils/nodes.py (working copy)
@@ -1766,13 +1766,183 @@
     .. _HTML 4.01 spec: http://www.w3.org/TR/html401
     .. _CSS1 spec: http://www.w3.org/TR/REC-CSS1
     """
-    id = _non_id_chars.sub('-', ' '.join(string.lower().split()))
+    if isinstance(string, unicode):
+        id = string.lower().translate(_non_id_translate)
+    else:
+        try:
+            id = string.decode().lower().translate(_non_id_translate)
+        except UnicodeDecodeError:
+            id = string.lower()
+    id = _non_id_chars.sub('-', ' '.join(id.split()))
     id = _non_id_at_ends.sub('', id)
     return str(id)
 
 _non_id_chars = re.compile('[^a-z0-9]+')
 _non_id_at_ends = re.compile('^[-0-9]+|-+$')
+_non_id_translate = {
+    # From Latin-1 Supplement
+    0x00df: u'ss', # sharp s
+    0x00e0: ord('a'), # a with grave

and 180 other mappings

is the test ``isinstance(string, unicode)`` required ?

tests pass and i would extend test_nodes.test_make_id a little

cheers

From: David G. <go...@py...> - 2008-09-03 15:21:44

Please include a link next time:
https://sourceforge.net/tracker/?func=detail&atid=422032&aid=1878977&group_id=38414

On Mon, Sep 1, 2008 at 10:26, engelbert gruber <gr...@us...> wrote:
> any objections to apply this patch

Yes, because the _non_id_translate dictionary is huge and incomplete,
and most of its function can automatically be calculated from
unicodedata. Far fewer explicit expansions would be necessary. See the
2008-02-03 comment from mgeisler. I would be in favor of the patch if
these changes are made. Note that this would require Python 2.3, but I
think that is fine.

> is the test ``isinstance(string, unicode)`` required ?

It shouldn't be required, but I'm not sure. Try it without.

> tests pass and i would extend test_nodes.test_make_id a little

--
David Goodger <http://python.net/~goodger>

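The suggestion above, deriving most of the table from unicodedata instead of listing it by hand, could look roughly like the sketch below. It is written for Python 3 (the thread predates it), and the function name, the scanned code point range and the "first element is an ASCII letter" rule are all assumptions for illustration, not what the tracker patch does.

```python
import unicodedata

def build_translate_table(start=0x00C0, stop=0x0250):
    """Map accented letters to their ASCII base letter (hypothetical helper).

    Only canonical decompositions whose first element is an ASCII letter
    are used; everything else would still need an explicit entry.
    """
    table = {}
    for cp in range(start, stop):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp or decomp.startswith('<'):
            # No decomposition at all, or a compatibility one such as
            # "<compat> 0049 004A": skip it.
            continue
        base = int(decomp.split()[0], 16)
        if base < 0x80 and chr(base).isalpha():
            table[cp] = base
    return table

table = build_translate_table()
```

A character such as the sharp s (0x00df) has no decomposition, so it does not appear in the generated table and remains one of the "far fewer explicit expansions".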
From: engelbert g. <gr...@us...> - 2008-09-04 06:48:24

> See the 2008-02-03 comment from mgeisler. I would be in favor of the
> patch if these changes are made. Note that this would require Python
> 2.3, but I think that is fine.

i uploaded a patch using unicodedata.normalize.
this reduces the translate dictionary to 41 entries.

unicodedata.normalize does not exist in python2.2.
possibly unicodedata.decomposition does something similar, but mapping
from ``\u00df`` to ``sz`` is not supported by python2.2's
string.translate either.

https://sourceforge.net/tracker/index.php?func=detail&aid=1878977&group_id=38414&atid=422032

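For readers without access to the tracker, here is a minimal sketch of the normalize-based approach described above. It is written for Python 3, and it is not the uploaded patch: the two-entry table is a hypothetical stand-in for the real 41-entry one, and the choice of NFKD plus an ASCII encode step is an assumption. The two regular expressions are the ones shown in the original diff.

```python
import re
import unicodedata

_non_id_chars = re.compile('[^a-z0-9]+')
_non_id_at_ends = re.compile('^[-0-9]+|-+$')

# Hypothetical stand-in for the explicit table: only characters that
# normalization cannot reduce to ASCII need an entry here.
_non_id_translate = {
    0x00df: 'sz',  # sharp s (the first patch in this thread used 'ss')
    0x00f8: 'o',   # o with stroke
}

def make_id(string):
    """Turn arbitrary text into an identifier, transliterating accents."""
    id = string.lower()
    id = id.translate(_non_id_translate)
    # Split accented letters into base letter + combining mark,
    # then drop everything that is not ASCII.
    id = unicodedata.normalize('NFKD', id)
    id = id.encode('ascii', 'ignore').decode('ascii')
    id = _non_id_chars.sub('-', ' '.join(id.split()))
    id = _non_id_at_ends.sub('', id)
    return id
```

With this sketch, `make_id('Caf\u00e9 Stra\u00dfe')` gives `cafe-strasze` where the pre-patch code would have replaced the accented characters with hyphens.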
From: David G. <go...@py...> - 2008-09-04 20:37:24

I uploaded an updated patch that handles Python 2.2 (disables the
feature by catching exceptions). When we drop 2.2-compatibility, the
try/except can be removed.

Add some tests (the more comprehensive, the better) and it's good to
go.

https://sourceforge.net/tracker/index.php?func=detail&aid=1878977&group_id=38414&atid=422032

--
David Goodger <http://python.net/~goodger>
