Thread: [Docutils-develop] Patch [ 1878977 ] make_id(): deaccent characters


Hello,

Any objections to applying this patch?

--- docutils/nodes.py   (revision 5503)
+++ docutils/nodes.py   (working copy)
@@ -1766,13 +1766,183 @@
    .. _HTML 4.01 spec: http://www.w3.org/TR/html401
    .. _CSS1 spec: http://www.w3.org/TR/REC-CSS1
    """
-    id = _non_id_chars.sub('-', ' '.join(string.lower().split()))
+    if isinstance(string, unicode):
+        id = string.lower().translate(_non_id_translate)
+    else:
+        try:
+            id = string.decode().lower().translate(_non_id_translate)
+        except UnicodeDecodeError:
+            id = string.lower()
+    id = _non_id_chars.sub('-', ' '.join(id.split()))
    id = _non_id_at_ends.sub('', id)
    return str(id)

 _non_id_chars = re.compile('[^a-z0-9]+')
 _non_id_at_ends = re.compile('^[-0-9]+|-+$')
+_non_id_translate = {
+    # From Latin-1 Supplement
+    0x00df: u'ss',      # sharp s
+    0x00e0: ord('a'),   # a with grave
and 180 other mappings
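
For anyone who wants to try the idea without applying the patch, here is a minimal standalone sketch. It assumes Python 3 ``str`` (so the ``unicode``/``decode`` branches from the patch are left out), carries only the two mappings shown above rather than the full table, and uses the hypothetical name ``make_id_sketch`` so it is not confused with the real ``nodes.make_id``.

```python
import re

_non_id_chars = re.compile('[^a-z0-9]+')
_non_id_at_ends = re.compile('^[-0-9]+|-+$')

# Only the two mappings quoted in the patch excerpt; the full table is omitted.
_non_id_translate = {
    0x00df: 'ss',      # sharp s
    0x00e0: ord('a'),  # a with grave
}


def make_id_sketch(string):
    """Hypothetical Python 3 rendering of the patched logic (not the real API)."""
    # Lowercase first, then replace accented characters via the table.
    id = string.lower().translate(_non_id_translate)
    # Collapse whitespace and turn every run of non-id characters into '-'.
    id = _non_id_chars.sub('-', ' '.join(id.split()))
    # Strip leading hyphens/digits and trailing hyphens.
    id = _non_id_at_ends.sub('', id)
    return str(id)


print(make_id_sketch('Straße'))
print(make_id_sketch('à la carte'))
```

Without the translate step, the leading ``à`` in the second call would be turned into a hyphen and then stripped, so the accented letter would be lost from the id instead of being kept as ``a``.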

Is the test ``isinstance(string, unicode)`` required?

The tests pass, and I would extend test_nodes.test_make_id a little.

Cheers



