On Fri, Apr 19, 2013 at 1:25 AM, Edward K. Ream <edreamleo@gmail.com> wrote:

Simple changes to the script yield this very fast way to compute the delimiters:

import sys
ch = unichr if sys.version_info < (3,) else chr  # unichr exists only on Python 2
delimiters = ''.join([
    ch(45), # hyphen-minus
    # [snip]
])

It's more elegant to do the following:

import sys
import unicodedata

delim_list = [
    45, # hyphen-minus
    47, # solidus
    58, # colon
    # [snip]
]
ch = unichr if sys.version_info < (3,) else chr
delimiters = ''.join([ch(i) for i in delim_list])
# One "code, # char name" line per entry, ready to paste back into delim_list.
comments = '\n'.join([
    '%s, # %s %s' % (i, ch(i), unicodedata.name(ch(i)).lower())
    for i in delim_list])
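For reference, here is a self-contained Python 3 sketch of the same technique, limited to the first four code points. The helper name `build_delimiters` is hypothetical, not something in Leo or docutils. Since each generated line has the form `code, # char name`, the output can be pasted back into `delim_list`; the final `assert` checks that with `ast.literal_eval`.

```python
import ast
import unicodedata


def build_delimiters(codes):
    """Hypothetical helper: return (delimiters, comments) for a list of code points."""
    delimiters = ''.join(chr(i) for i in codes)
    comments = '\n'.join(
        '%s, # %s %s' % (i, chr(i), unicodedata.name(chr(i)).lower())
        for i in codes)
    return delimiters, comments


delim_list = [45, 47, 58, 92]  # first four entries only
delimiters, comments = build_delimiters(delim_list)
print(comments)

# The generated lines are valid list entries (comments included), so they round-trip.
assert ast.literal_eval('[\n' + comments + '\n]') == delim_list
```

The printed lines match the first four lines of the log output shown next.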

Sending the comments string to Leo's log pane, rather than the Windows console, yields::

45, # - hyphen-minus
47, # / solidus
58, # : colon
92, # \ reverse solidus
161, # ¡ inverted exclamation mark
183, # · middle dot
191, # ¿ inverted question mark
894, # ; greek question mark
903, # · greek ano teleia
1370, # ՚ armenian apostrophe
1371, # ՛ armenian emphasis mark
1372, # ՜ armenian exclamation mark
1373, # ՝ armenian comma
1374, # ՞ armenian question mark
1375, # ՟ armenian abbreviation mark
1417, # ։ armenian full stop
1418, # ֊ armenian hyphen
1470, # ־ hebrew punctuation maqaf
1472, # ׀ hebrew punctuation paseq
1475, # ׃ hebrew punctuation sof pasuq
1478, # ׆ hebrew punctuation nun hafukha
1523, # ׳ hebrew punctuation geresh
1524, # ״ hebrew punctuation gershayim
1545, # ؉ arabic-indic per mille sign
1546, # ؊ arabic-indic per ten thousand sign
1548, # ، arabic comma
1549, # ؍ arabic date separator
1563, # ؛ arabic semicolon
1566, # ؞ arabic triple dot punctuation mark
1567, # ؟ arabic question mark
1642, # ٪ arabic percent sign
1643, # ٫ arabic decimal separator
1644, # ٬ arabic thousands separator
1645, # ٭ arabic five pointed star
1748, # ۔ arabic full stop
1792, # ܀ syriac end of paragraph
1793, # ܁ syriac supralinear full stop
1794, # ܂ syriac sublinear full stop
1795, # ܃ syriac supralinear colon
1796, # ܄ syriac sublinear colon
1797, # ܅ syriac horizontal colon
1798, # ܆ syriac colon skewed left
1799, # ܇ syriac colon skewed right
1800, # ܈ syriac supralinear colon skewed left
1801, # ܉ syriac sublinear colon skewed right
1802, # ܊ syriac contraction
1803, # ܋ syriac harklean obelus
1804, # ܌ syriac harklean metobelus
1805, # ܍ syriac harklean asteriscus
2039, # ߷ nko symbol gbakurunen
2040, # ߸ nko comma
2041, # ߹ nko exclamation mark
2096, # ࠰ samaritan punctuation nequdaa
2097, # ࠱ samaritan punctuation afsaaq
2098, # ࠲ samaritan punctuation anged
2099, # ࠳ samaritan punctuation bau
2100, # ࠴ samaritan punctuation atmaau
2101, # ࠵ samaritan punctuation shiyyaalaa
2102, # ࠶ samaritan abbreviation mark
2103, # ࠷ samaritan punctuation melodic qitsa
2104, # ࠸ samaritan punctuation ziqaa
2105, # ࠹ samaritan punctuation qitsa
2106, # ࠺ samaritan punctuation zaef
2107, # ࠻ samaritan punctuation turu
2108, # ࠼ samaritan punctuation arkaanu
2109, # ࠽ samaritan punctuation sof mashfaat
2110, # ࠾ samaritan punctuation annaau
2404, # । devanagari danda
2405, # ॥ devanagari double danda
2416, # ॰ devanagari abbreviation sign
3572, # ෴ sinhala punctuation kunddaliya
3663, # ๏ thai character fongman
3674, # ๚ thai character angkhankhu
3675, # ๛ thai character khomut
3844, # ༄ tibetan mark initial yig mgo mdun ma
3845, # ༅ tibetan mark closing yig mgo sgab ma
3846, # ༆ tibetan mark caret yig mgo phur shad ma
3847, # ༇ tibetan mark yig mgo tsheg shad ma
3848, # ༈ tibetan mark sbrul shad
3849, # ༉ tibetan mark bskur yig mgo
3850, # ༊ tibetan mark bka- shog yig mgo
3851, # ་ tibetan mark intersyllabic tsheg
3852, # ༌ tibetan mark delimiter tsheg bstar
3853, # ། tibetan mark shad
3854, # ༎ tibetan mark nyis shad
3855, # ༏ tibetan mark tsheg shad
3856, # ༐ tibetan mark nyis tsheg shad
3857, # ༑ tibetan mark rin chen spungs shad
3858, # ༒ tibetan mark rgya gram shad
3973, # ྅ tibetan mark paluta
4048, # ࿐ tibetan mark bska- shog gi mgo rgyan
4049, # ࿑ tibetan mark mnyam yig gi mgo rgyan
4050, # ࿒ tibetan mark nyis tsheg
4051, # ࿓ tibetan mark initial brda rnying yig mgo mdun ma
4052, # ࿔ tibetan mark closing brda rnying yig mgo sgab ma
4170, # ၊ myanmar sign little section
4171, # ။ myanmar sign section
4172, # ၌ myanmar symbol locative
4173, # ၍ myanmar symbol completed
4174, # ၎ myanmar symbol aforementioned
4175, # ၏ myanmar symbol genitive
4347, # ჻ georgian paragraph separator
4961, # ፡ ethiopic wordspace
4962, # ። ethiopic full stop
4963, # ፣ ethiopic comma
4964, # ፤ ethiopic semicolon
4965, # ፥ ethiopic colon
4966, # ፦ ethiopic preface colon
4967, # ፧ ethiopic question mark
4968, # ፨ ethiopic paragraph separator
5120, # ᐀ canadian syllabics hyphen
5741, # ᙭ canadian syllabics chi sign
5742, # ᙮ canadian syllabics full stop
5867, # ᛫ runic single punctuation
5868, # ᛬ runic multiple punctuation
5869, # ᛭ runic cross punctuation
5941, # ᜵ philippine single punctuation
5942, # ᜶ philippine double punctuation
6100, # ។ khmer sign khan
6101, # ៕ khmer sign bariyoosan
6102, # ៖ khmer sign camnuc pii kuuh
6104, # ៘ khmer sign beyyal
6105, # ៙ khmer sign phnaek muan
6106, # ៚ khmer sign koomuut
6144, # ᠀ mongolian birga
6145, # ᠁ mongolian ellipsis
6146, # ᠂ mongolian comma
6147, # ᠃ mongolian full stop
6148, # ᠄ mongolian colon
6149, # ᠅ mongolian four dots
6150, # ᠆ mongolian todo soft hyphen
6151, # ᠇ mongolian sibe syllable boundary marker
6152, # ᠈ mongolian manchu comma
6153, # ᠉ mongolian manchu full stop
6154, # ᠊ mongolian nirugu
6468, # ᥄ limbu exclamation mark
6469, # ᥅ limbu question mark
6622, # ᧞ new tai lue sign lae
6623, # ᧟ new tai lue sign laev
6686, # ᨞ buginese pallawa
6687, # ᨟ buginese end of section
6816, # ᪠ tai tham sign wiang
6817, # ᪡ tai tham sign wiangwaak
6818, # ᪢ tai tham sign sawan
6819, # ᪣ tai tham sign keow
6820, # ᪤ tai tham sign hoy
6821, # ᪥ tai tham sign dokmai
6822, # ᪦ tai tham sign reversed rotated rana
6824, # ᪨ tai tham sign kaan
6825, # ᪩ tai tham sign kaankuu
6826, # ᪪ tai tham sign satkaan
6827, # ᪫ tai tham sign satkaankuu
6828, # ᪬ tai tham sign hang
6829, # ᪭ tai tham sign caang
7002, # ᭚ balinese panti
7003, # ᭛ balinese pamada
7004, # ᭜ balinese windu
7005, # ᭝ balinese carik pamungkah
7006, # ᭞ balinese carik siki
7007, # ᭟ balinese carik pareren
7008, # ᭠ balinese pameneng
7227, # ᰻ lepcha punctuation ta-rol
7228, # ᰼ lepcha punctuation nyet thyoom ta-rol
7229, # ᰽ lepcha punctuation cer-wa
7230, # ᰾ lepcha punctuation tshook cer-wa
7231, # ᰿ lepcha punctuation tshook
7294, # ᱾ ol chiki punctuation mucaad
7295, # ᱿ ol chiki punctuation double mucaad
7379, # ᳓ vedic sign nihshvasa
8208, # ‐ hyphen
8209, # ‑ non-breaking hyphen
8210, # ‒ figure dash
8211, # – en dash
8212, # — em dash
8213, # ― horizontal bar
8214, # ‖ double vertical line
8215, # ‗ double low line
8224, # † dagger
8225, # ‡ double dagger
8226, # • bullet
8227, # ‣ triangular bullet
8228, # ․ one dot leader
8229, # ‥ two dot leader
8230, # … horizontal ellipsis
8231, # ‧ hyphenation point
8240, # ‰ per mille sign
8241, # ‱ per ten thousand sign
8242, # ′ prime
8243, # ″ double prime
8244, # ‴ triple prime
8245, # ‵ reversed prime
8246, # ‶ reversed double prime
8247, # ‷ reversed triple prime
8248, # ‸ caret
8251, # ※ reference mark
8252, # ‼ double exclamation mark
8253, # ‽ interrobang
8254, # ‾ overline
8257, # ⁁ caret insertion point
8258, # ⁂ asterism
8259, # ⁃ hyphen bullet
8263, # ⁇ double question mark
8264, # ⁈ question exclamation mark
8265, # ⁉ exclamation question mark
8266, # ⁊ tironian sign et
8267, # ⁋ reversed pilcrow sign
8268, # ⁌ black leftwards bullet
8269, # ⁍ black rightwards bullet
8270, # ⁎ low asterisk
8271, # ⁏ reversed semicolon
8272, # ⁐ close up
8273, # ⁑ two asterisks aligned vertically
8275, # ⁓ swung dash
8277, # ⁕ flower punctuation mark
8278, # ⁖ three dot punctuation
8279, # ⁗ quadruple prime
8280, # ⁘ four dot punctuation
8281, # ⁙ five dot punctuation
8282, # ⁚ two dot punctuation
8283, # ⁛ four dot mark
8284, # ⁜ dotted cross
8285, # ⁝ tricolon
8286, # ⁞ vertical four dots
11513, # ⳹ coptic old nubian full stop
11514, # ⳺ coptic old nubian direct question mark
11515, # ⳻ coptic old nubian indirect question mark
11516, # ⳼ coptic old nubian verse divider
11518, # ⳾ coptic full stop
11519, # ⳿ coptic morphological divider
11776, # ⸀ right angle substitution marker
11777, # ⸁ right angle dotted substitution marker
11782, # ⸆ raised interpolation marker
11783, # ⸇ raised dotted interpolation marker
11784, # ⸈ dotted transposition marker
11787, # ⸋ raised square
11790, # ⸎ editorial coronis
11791, # ⸏ paragraphos
11792, # ⸐ forked paragraphos
11793, # ⸑ reversed forked paragraphos
11794, # ⸒ hypodiastole
11795, # ⸓ dotted obelos
11796, # ⸔ downwards ancora
11797, # ⸕ upwards ancora
11798, # ⸖ dotted right-pointing angle
11799, # ⸗ double oblique hyphen
11800, # ⸘ inverted interrobang
11801, # ⸙ palm branch
11802, # ⸚ hyphen with diaeresis
11803, # ⸛ tilde with ring above
11806, # ⸞ tilde with dot above
11807, # ⸟ tilde with dot below
11818, # ⸪ two dots over one dot punctuation
11819, # ⸫ one dot over two dots punctuation
11820, # ⸬ squared four dot punctuation
11821, # ⸭ five dot mark
11822, # ⸮ reversed question mark
11824, # ⸰ ring point
11825, # ⸱ word separator middle dot
12289, # 、 ideographic comma
12290, # 。 ideographic full stop
12291, # 〃 ditto mark
12316, # 〜 wave dash
12336, # 〰 wavy dash
12349, # 〽 part alternation mark
12448, # ゠ katakana-hiragana double hyphen
12539, # ・ katakana middle dot
42238, # ꓾ lisu punctuation comma
42239, # ꓿ lisu punctuation full stop
42509, # ꘍ vai comma
42510, # ꘎ vai full stop
42511, # ꘏ vai question mark
42611, # ꙳ slavonic asterisk
42622, # ꙾ cyrillic kavyka
42738, # ꛲ bamum njaemli
42739, # ꛳ bamum full stop
42740, # ꛴ bamum colon
42741, # ꛵ bamum comma
42742, # ꛶ bamum semicolon
42743, # ꛷ bamum question mark
43124, # ꡴ phags-pa single head mark
43125, # ꡵ phags-pa double head mark
43126, # ꡶ phags-pa mark shad
43127, # ꡷ phags-pa mark double shad
43214, # ꣎ saurashtra danda
43215, # ꣏ saurashtra double danda
43256, # ꣸ devanagari sign pushpika
43257, # ꣹ devanagari gap filler
43258, # ꣺ devanagari caret
43310, # ꤮ kayah li sign cwi
43311, # ꤯ kayah li sign shya
43359, # ꥟ rejang section mark
43457, # ꧁ javanese left rerenggan
43458, # ꧂ javanese right rerenggan
43459, # ꧃ javanese pada andap
43460, # ꧄ javanese pada madya
43461, # ꧅ javanese pada luhur
43462, # ꧆ javanese pada windu
43463, # ꧇ javanese pada pangkat
43464, # ꧈ javanese pada lingsa
43465, # ꧉ javanese pada lungsi
43466, # ꧊ javanese pada adeg
43467, # ꧋ javanese pada adeg adeg
43468, # ꧌ javanese pada piseleh
43469, # ꧍ javanese turned pada piseleh
43486, # ꧞ javanese pada tirta tumetes
43487, # ꧟ javanese pada isen-isen
43612, # ꩜ cham punctuation spiral
43613, # ꩝ cham punctuation danda
43614, # ꩞ cham punctuation double danda
43615, # ꩟ cham punctuation triple danda
43742, # ꫞ tai viet symbol ho hoi
43743, # ꫟ tai viet symbol koi koi
44011, # ꯫ meetei mayek cheikhei
65040, # ︐ presentation form for vertical comma
65041, # ︑ presentation form for vertical ideographic comma
65042, # ︒ presentation form for vertical ideographic full stop
65043, # ︓ presentation form for vertical colon
65044, # ︔ presentation form for vertical semicolon
65045, # ︕ presentation form for vertical exclamation mark
65046, # ︖ presentation form for vertical question mark
65049, # ︙ presentation form for vertical horizontal ellipsis
65072, # ︰ presentation form for vertical two dot leader
65073, # ︱ presentation form for vertical em dash
65074, # ︲ presentation form for vertical en dash
65093, # ﹅ sesame dot
65094, # ﹆ white sesame dot
65097, # ﹉ dashed overline
65098, # ﹊ centreline overline
65099, # ﹋ wavy overline
65100, # ﹌ double wavy overline
65104, # ﹐ small comma
65105, # ﹑ small ideographic comma
65106, # ﹒ small full stop
65108, # ﹔ small semicolon
65109, # ﹕ small colon
65110, # ﹖ small question mark
65111, # ﹗ small exclamation mark
65112, # ﹘ small em dash
65119, # ﹟ small number sign
65120, # ﹠ small ampersand
65121, # ﹡ small asterisk
65123, # ﹣ small hyphen-minus
65128, # ﹨ small reverse solidus
65130, # ﹪ small percent sign
65131, # ﹫ small commercial at
65281, # ！ fullwidth exclamation mark
65282, # ＂ fullwidth quotation mark
65283, # ＃ fullwidth number sign
65285, # ％ fullwidth percent sign
65286, # ＆ fullwidth ampersand
65287, # ＇ fullwidth apostrophe
65290, # ＊ fullwidth asterisk
65292, # ， fullwidth comma
65293, # － fullwidth hyphen-minus
65294, # ． fullwidth full stop
65295, # ／ fullwidth solidus
65306, # ： fullwidth colon
65307, # ； fullwidth semicolon
65311, # ？ fullwidth question mark
65312, # ＠ fullwidth commercial at
65340, # ＼ fullwidth reverse solidus
65377, # ｡ halfwidth ideographic full stop
65380, # ､ halfwidth ideographic comma
65381, # ･ halfwidth katakana middle dot
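The log pane was used for the listing above presumably because the Windows console's code page cannot encode most of these characters, so printing them there raises `UnicodeEncodeError` (cp437 is assumed below purely for illustration). A minimal Python 3 sketch of one workaround, using a hypothetical helper `safe_text`: encode with the standard `backslashreplace` error handler so unencodable characters appear as escapes instead of crashing the script.

```python
def safe_text(text, encoding):
    """Hypothetical helper: replace characters `encoding` can't represent with escapes."""
    # backslashreplace writes \xNN, \uNNNN or \UNNNNNNNN for each unencodable character.
    return text.encode(encoding, 'backslashreplace').decode(encoding)


line = '1372, # \u055c armenian exclamation mark'
print(safe_text(line, 'cp437'))  # the Armenian character becomes the six characters \u055c
```

This trades the glyphs for escapes, so a Unicode-capable pane like Leo's log remains the better place to inspect the list itself.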

I'll use this code (minus the Unicode characters in the comments) to create the delimiters constants in Leo's ASCII-only, unified-code-base version of docutils.
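One way to get an ASCII-only constant is to let Python's `unicode_escape` codec write the non-ASCII characters as `\xNN`/`\uNNNN` escapes, and to drop the character itself from each comment. This is a sketch assuming Python 3 and a six-entry subset of the list; the `u'...'` literal form is chosen only because both Python 2 and Python 3.3+ accept it, and it is not necessarily how Leo's docutils spells the constant.

```python
import unicodedata

delim_list = [45, 47, 58, 92, 161, 1370]  # a few entries from the list above

delimiters = ''.join(chr(i) for i in delim_list)
# unicode_escape doubles backslashes and escapes everything outside printable ASCII.
# Note: it does not escape quotes, which is fine here because ASCII ' is not in the list.
escaped = delimiters.encode('unicode_escape').decode('ascii')
source_line = "delimiters = u'%s'" % escaped

# ASCII-only comments: code point and name, without the character itself.
ascii_comments = '\n'.join(
    '%s, # %s' % (i, unicodedata.name(chr(i)).lower()) for i in delim_list)

print(source_line)     # delimiters = u'-/:\\\xa1\u055a'
print(ascii_comments)
```

Every line printed is pure ASCII, so it can be pasted into a source file without an encoding declaration.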

Edward