On Thu, Apr 18, 2013 at 4:57 PM, Guenter Milde <milde@users.sf.net> wrote:
On 2013-04-18, Edward K. Ream wrote:

Hi Edward,

thanks for the report.

Hi Guenter,

Thanks for the reply.
 
The strange thing is that in both sets we see a lot of characters that
are definitely not delimiters (and not in the hard-coded set). There seems
to be a problem with the encoding used for printing: it seems your terminal
does not support UTF-8, which makes it hard to spot the difference.

Yeah. There are big problems with the Windows console in this regard.
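
One way to sidestep the console entirely is to print only ASCII: the code point as U+XXXX plus the Unicode name. A minimal sketch (the helper name `ascii_label` is made up for illustration and is not part of Docutils):

```python
from __future__ import print_function  # print() as a function on Python 2
import unicodedata

def ascii_label(uc):
    """Return an ASCII-only description of a single unicode character."""
    # unicodedata.name() raises ValueError for unnamed characters unless
    # a default is given, so pass 'Unknown' as the fallback.
    return 'U+%04X %s' % (ord(uc), unicodedata.name(uc, 'Unknown'))

print(ascii_label(u'\u00b7'))  # U+00B7 MIDDLE DOT
print(ascii_label(u'\u2010'))  # U+2010 HYPHEN
```

Output built this way survives any console code page, at the cost of not showing the glyph itself.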

Differences can be safely ignored for production. They might be a false
positive or a hint that it is time to update the pre-generated strings.
However, I cannot reproduce the problem here (Debian/testing, Python 2.7,
Docutils 0.11svn, locale de_DE.utf8).
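
For reference, a rough sketch of how such a string could be regenerated from the local Unicode database. It assumes the delimiters are the characters in the general categories Pd (dash punctuation) and Po (other punctuation). That is a guess, and the real Docutils generator may add or exclude characters. `chars_in_categories` is a hypothetical helper, not a Docutils function.

```python
from __future__ import print_function
import sys
import unicodedata

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3

def chars_in_categories(categories):
    """Collect every character whose general category is in `categories`."""
    # sys.maxunicode is 0xFFFF on narrow builds, so those builds only
    # scan the Basic Multilingual Plane.
    return u''.join(unichr(n) for n in range(sys.maxunicode + 1)
                    if unicodedata.category(unichr(n)) in categories)

# Assumption: delimiters are roughly the Pd and Po categories.
candidates = chars_in_categories(('Pd', 'Po'))
print(len(candidates))
```

The result depends on the Unicode version bundled with the Python in use, which is one way two machines can legitimately disagree.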

I wrote the following function to make the differences easier to spot::

from __future__ import print_function  # print() as a function on Python 2
import unicodedata

def compare(s1, s2):
    """Print a per-code-point comparison of the characters in s1 and s2."""
    print(len(s1), len(s2))
    # Map each code point to its character, one dict per string (Python 2).
    d1, d2 = {}, {}
    for uc in s1:
        assert isinstance(uc, (str, unicode)), type(uc)
        d1[ord(uc)] = uc
    for uc in s2:
        assert isinstance(uc, (str, unicode)), type(uc)
        d2[ord(uc)] = uc
    matches = 0
    # Walk the union of both key sets in code-point order.
    for n in sorted(set(d1) | set(d2)):
        uc1 = d1.get(n)
        uc2 = d2.get(n)
        if uc1 is None:
            # Present only in s2.
            print('%5s' % n, 'missing1', uc2, unicodedata.name(uc2, 'Unknown'))
        elif uc2 is None:
            # Present only in s1.
            print('%5s' % n, 'missing2', uc1, unicodedata.name(uc1, 'Unknown'))
        elif uc1 == uc2:
            print('%5s' % n, 'match', uc1, unicodedata.name(uc1, 'Unknown'))
            matches += 1
        else:
            print('%5s' % n, uc1, unicodedata.name(uc1, 'Unknown'),
                  uc2, unicodedata.name(uc2, 'Unknown'))
    print('matches: %s' % matches)

And here is the result for compare(delimiters, d):

434 366
   45 match - HYPHEN-MINUS
   47 match / SOLIDUS
   58 match : COLON
   92 match \ REVERSE SOLIDUS
  161 match ┬í INVERTED EXCLAMATION MARK
  183 match ┬╖ MIDDLE DOT
  191 match ┬┐ INVERTED QUESTION MARK
  894 match ═╛ GREEK QUESTION MARK
  903 match ╬ç GREEK ANO TELEIA
 1370 match ╒Ü ARMENIAN APOSTROPHE
 1371 match ╒¢ ARMENIAN EMPHASIS MARK
 1372 match ╒£ ARMENIAN EXCLAMATION MARK
 1373 match ╒¥ ARMENIAN COMMA
 1374 match ╒₧ ARMENIAN QUESTION MARK
 1375 match ╒ƒ ARMENIAN ABBREVIATION MARK
 1417 match ╓ë ARMENIAN FULL STOP
 1418 match ╓è ARMENIAN HYPHEN
 1470 match ╓╛ HEBREW PUNCTUATION MAQAF
 1472 match ╫Ç HEBREW PUNCTUATION PASEQ
 1475 match ╫â HEBREW PUNCTUATION SOF PASUQ
 1478 match ╫å HEBREW PUNCTUATION NUN HAFUKHA
 1523 match ╫│ HEBREW PUNCTUATION GERESH
 1524 match ╫┤ HEBREW PUNCTUATION GERSHAYIM
 1545 match ╪ë ARABIC-INDIC PER MILLE SIGN
 1546 match ╪è ARABIC-INDIC PER TEN THOUSAND SIGN
 1548 match ╪î ARABIC COMMA
 1549 match ╪ì ARABIC DATE SEPARATOR
 1563 match ╪¢ ARABIC SEMICOLON
 1566 match ╪₧ ARABIC TRIPLE DOT PUNCTUATION MARK
 1567 match ╪ƒ ARABIC QUESTION MARK
 1642 match ┘¬ ARABIC PERCENT SIGN
 1643 match ┘½ ARABIC DECIMAL SEPARATOR
 1644 match ┘¼ ARABIC THOUSANDS SEPARATOR
 1645 match ┘¡ ARABIC FIVE POINTED STAR
 1748 match █ö ARABIC FULL STOP
 1792 match ▄Ç SYRIAC END OF PARAGRAPH
 1793 match ▄ü SYRIAC SUPRALINEAR FULL STOP
 1794 match ▄é SYRIAC SUBLINEAR FULL STOP
 1795 match ▄â SYRIAC SUPRALINEAR COLON
 1796 match ▄ä SYRIAC SUBLINEAR COLON
 1797 match ▄à SYRIAC HORIZONTAL COLON
 1798 match ▄å SYRIAC COLON SKEWED LEFT
 1799 match ▄ç SYRIAC COLON SKEWED RIGHT
 1800 match ▄ê SYRIAC SUPRALINEAR COLON SKEWED LEFT
 1801 match ▄ë SYRIAC SUBLINEAR COLON SKEWED RIGHT
 1802 match ▄è SYRIAC CONTRACTION
 1803 match ▄ï SYRIAC HARKLEAN OBELUS
 1804 match ▄î SYRIAC HARKLEAN METOBELUS
 1805 match ▄ì SYRIAC HARKLEAN ASTERISCUS
 2039 match ▀╖ NKO SYMBOL GBAKURUNEN
 2040 match ▀╕ NKO COMMA
 2041 match ▀╣ NKO EXCLAMATION MARK
 2096 match αá░ SAMARITAN PUNCTUATION NEQUDAA
 2097 match αá▒ SAMARITAN PUNCTUATION AFSAAQ
 2098 match αá▓ SAMARITAN PUNCTUATION ANGED
 2099 match αá│ SAMARITAN PUNCTUATION BAU
 2100 match αá┤ SAMARITAN PUNCTUATION ATMAAU
 2101 match αá╡ SAMARITAN PUNCTUATION SHIYYAALAA
 2102 match αá╢ SAMARITAN ABBREVIATION MARK
 2103 match αá╖ SAMARITAN PUNCTUATION MELODIC QITSA
 2104 match αá╕ SAMARITAN PUNCTUATION ZIQAA
 2105 match αá╣ SAMARITAN PUNCTUATION QITSA
 2106 match αá║ SAMARITAN PUNCTUATION ZAEF
 2107 match αá╗ SAMARITAN PUNCTUATION TURU
 2108 match αá╝ SAMARITAN PUNCTUATION ARKAANU
 2109 match αá╜ SAMARITAN PUNCTUATION SOF MASHFAAT
 2110 match αá╛ SAMARITAN PUNCTUATION ANNAAU
 2404 match αÑñ DEVANAGARI DANDA
 2405 match αÑÑ DEVANAGARI DOUBLE DANDA
 2416 match αÑ░ DEVANAGARI ABBREVIATION SIGN
 3572 match α╖┤ SINHALA PUNCTUATION KUNDDALIYA
 3663 match α╣Å THAI CHARACTER FONGMAN
 3674 match α╣Ü THAI CHARACTER ANGKHANKHU
 3675 match α╣¢ THAI CHARACTER KHOMUT
 3844 match α╝ä TIBETAN MARK INITIAL YIG MGO MDUN MA
 3845 match α╝à TIBETAN MARK CLOSING YIG MGO SGAB MA
 3846 match α╝å TIBETAN MARK CARET YIG MGO PHUR SHAD MA
 3847 match α╝ç TIBETAN MARK YIG MGO TSHEG SHAD MA
 3848 match α╝ê TIBETAN MARK SBRUL SHAD
 3849 match α╝ë TIBETAN MARK BSKUR YIG MGO
 3850 match α╝è TIBETAN MARK BKA- SHOG YIG MGO
 3851 match α╝ï TIBETAN MARK INTERSYLLABIC TSHEG
 3852 match α╝î TIBETAN MARK DELIMITER TSHEG BSTAR
 3853 match α╝ì TIBETAN MARK SHAD
 3854 match α╝Ä TIBETAN MARK NYIS SHAD
 3855 match α╝Å TIBETAN MARK TSHEG SHAD
 3856 match α╝É TIBETAN MARK NYIS TSHEG SHAD
 3857 match α╝æ TIBETAN MARK RIN CHEN SPUNGS SHAD
 3858 match α╝Æ TIBETAN MARK RGYA GRAM SHAD
 3973 match α╛à TIBETAN MARK PALUTA
 4048 match α┐É TIBETAN MARK BSKA- SHOG GI MGO RGYAN
 4049 match α┐æ TIBETAN MARK MNYAM YIG GI MGO RGYAN
 4050 match α┐Æ TIBETAN MARK NYIS TSHEG
 4051 match α┐ô TIBETAN MARK INITIAL BRDA RNYING YIG MGO MDUN MA
 4052 match α┐ö TIBETAN MARK CLOSING BRDA RNYING YIG MGO SGAB MA
 4170 match ßüè MYANMAR SIGN LITTLE SECTION
 4171 match ßüï MYANMAR SIGN SECTION
 4172 match ßüî MYANMAR SYMBOL LOCATIVE
 4173 match ßüì MYANMAR SYMBOL COMPLETED
 4174 match ßüÄ MYANMAR SYMBOL AFOREMENTIONED
 4175 match ßüÅ MYANMAR SYMBOL GENITIVE
 4347 match ßâ╗ GEORGIAN PARAGRAPH SEPARATOR
 4961 match ßìí ETHIOPIC WORDSPACE
 4962 match ßìó ETHIOPIC FULL STOP
 4963 match ßìú ETHIOPIC COMMA
 4964 match ßìñ ETHIOPIC SEMICOLON
 4965 match ßìÑ ETHIOPIC COLON
 4966 match ßìª ETHIOPIC PREFACE COLON
 4967 match ßìº ETHIOPIC QUESTION MARK
 4968 match ßì¿ ETHIOPIC PARAGRAPH SEPARATOR
 5120 match ßÉÇ CANADIAN SYLLABICS HYPHEN
 5741 match ßÖ¡ CANADIAN SYLLABICS CHI SIGN
 5742 match ßÖ« CANADIAN SYLLABICS FULL STOP
 5867 match ߢ½ RUNIC SINGLE PUNCTUATION
 5868 match ߢ¼ RUNIC MULTIPLE PUNCTUATION
 5869 match ߢ¡ RUNIC CROSS PUNCTUATION
 5941 match ߣ╡ PHILIPPINE SINGLE PUNCTUATION
 5942 match ߣ╢ PHILIPPINE DOUBLE PUNCTUATION
 6100 match ߃ö KHMER SIGN KHAN
 6101 match ߃ò KHMER SIGN BARIYOOSAN
 6102 match ߃û KHMER SIGN CAMNUC PII KUUH
 6104 match ߃ÿ KHMER SIGN BEYYAL
 6105 match ៙ KHMER SIGN PHNAEK MUAN
 6106 match ៚ KHMER SIGN KOOMUUT
 6144 match ßáÇ MONGOLIAN BIRGA
 6145 match ßáü MONGOLIAN ELLIPSIS
 6146 match ßáé MONGOLIAN COMMA
 6147 match ßáâ MONGOLIAN FULL STOP
 6148 match ßáä MONGOLIAN COLON
 6149 match ßáà MONGOLIAN FOUR DOTS
 6150 match ßáå MONGOLIAN TODO SOFT HYPHEN
 6151 match ßáç MONGOLIAN SIBE SYLLABLE BOUNDARY MARKER
 6152 match ßáê MONGOLIAN MANCHU COMMA
 6153 match ßáë MONGOLIAN MANCHU FULL STOP
 6154 match ßáè MONGOLIAN NIRUGU
 6468 match ßÑä LIMBU EXCLAMATION MARK
 6469 match ßÑà LIMBU QUESTION MARK
 6622 match ߺ₧ NEW TAI LUE SIGN LAE
 6623 match ߺƒ NEW TAI LUE SIGN LAEV
 6686 match ß¿₧ BUGINESE PALLAWA
 6687 match ß¿ƒ BUGINESE END OF SECTION
 6816 match ߬á TAI THAM SIGN WIANG
 6817 match ߬í TAI THAM SIGN WIANGWAAK
 6818 match ߬ó TAI THAM SIGN SAWAN
 6819 match ߬ú TAI THAM SIGN KEOW
 6820 match ߬ñ TAI THAM SIGN HOY
 6821 match ᪥ TAI THAM SIGN DOKMAI
 6822 match ߬ª TAI THAM SIGN REVERSED ROTATED RANA
 6824 match ߬¿ TAI THAM SIGN KAAN
 6825 match ߬⌐ TAI THAM SIGN KAANKUU
 6826 match ߬¬ TAI THAM SIGN SATKAAN
 6827 match ߬½ TAI THAM SIGN SATKAANKUU
 6828 match ߬¼ TAI THAM SIGN HANG
 6829 match ߬¡ TAI THAM SIGN CAANG
 7002 match ß¡Ü BALINESE PANTI
 7003 match ß¡¢ BALINESE PAMADA
 7004 match ß¡£ BALINESE WINDU
 7005 match ß¡¥ BALINESE CARIK PAMUNGKAH
 7006 match ß¡₧ BALINESE CARIK SIKI
 7007 match ß¡ƒ BALINESE CARIK PAREREN
 7008 match ß¡á BALINESE PAMENENG
 7227 match ß░╗ LEPCHA PUNCTUATION TA-ROL
 7228 match ß░╝ LEPCHA PUNCTUATION NYET THYOOM TA-ROL
 7229 match ß░╜ LEPCHA PUNCTUATION CER-WA
 7230 match ß░╛ LEPCHA PUNCTUATION TSHOOK CER-WA
 7231 match ß░┐ LEPCHA PUNCTUATION TSHOOK
 7294 match ß▒╛ OL CHIKI PUNCTUATION MUCAAD
 7295 match ß▒┐ OL CHIKI PUNCTUATION DOUBLE MUCAAD
 7379 match ß│ô VEDIC SIGN NIHSHVASA
 8208 match ΓÇÉ HYPHEN
 8209 match ΓÇæ NON-BREAKING HYPHEN
 8210 match ΓÇÆ FIGURE DASH
 8211 match ΓÇô EN DASH
 8212 match ΓÇö EM DASH
 8213 match ΓÇò HORIZONTAL BAR
 8214 match ΓÇû DOUBLE VERTICAL LINE
 8215 match ΓÇù DOUBLE LOW LINE
 8224 match ΓÇá DAGGER
 8225 match ΓÇí DOUBLE DAGGER
 8226 match ΓÇó BULLET
 8227 match ΓÇú TRIANGULAR BULLET
 8228 match ΓÇñ ONE DOT LEADER
 8229 match ΓÇÑ TWO DOT LEADER
 8230 match ΓǪ HORIZONTAL ELLIPSIS
 8231 match ΓǺ HYPHENATION POINT
 8240 match ΓÇ░ PER MILLE SIGN
 8241 match ΓÇ▒ PER TEN THOUSAND SIGN
 8242 match ΓÇ▓ PRIME
 8243 match ΓÇ│ DOUBLE PRIME
 8244 match ΓÇ┤ TRIPLE PRIME
 8245 match ΓÇ╡ REVERSED PRIME
 8246 match ΓÇ╢ REVERSED DOUBLE PRIME
 8247 match ΓÇ╖ REVERSED TRIPLE PRIME
 8248 match ΓÇ╕ CARET
 8251 match ΓÇ╗ REFERENCE MARK
 8252 match ΓÇ╝ DOUBLE EXCLAMATION MARK
 8253 match ΓÇ╜ INTERROBANG
 8254 match ΓÇ╛ OVERLINE
 8257 match Γüü CARET INSERTION POINT
 8258 match Γüé ASTERISM
 8259 match Γüâ HYPHEN BULLET
 8263 match Γüç DOUBLE QUESTION MARK
 8264 match Γüê QUESTION EXCLAMATION MARK
 8265 match Γüë EXCLAMATION QUESTION MARK
 8266 match Γüè TIRONIAN SIGN ET
 8267 match Γüï REVERSED PILCROW SIGN
 8268 match Γüî BLACK LEFTWARDS BULLET
 8269 match Γüì BLACK RIGHTWARDS BULLET
 8270 match ΓüÄ LOW ASTERISK
 8271 match ΓüÅ REVERSED SEMICOLON
 8272 match ΓüÉ CLOSE UP
 8273 match Γüæ TWO ASTERISKS ALIGNED VERTICALLY
 8275 match Γüô SWUNG DASH
 8277 match Γüò FLOWER PUNCTUATION MARK
 8278 match Γüû THREE DOT PUNCTUATION
 8279 match Γüù QUADRUPLE PRIME
 8280 match Γüÿ FOUR DOT PUNCTUATION
 8281 match ΓüÖ FIVE DOT PUNCTUATION
 8282 match ΓüÜ TWO DOT PUNCTUATION
 8283 match Γü¢ FOUR DOT MARK
 8284 match Γü£ DOTTED CROSS
 8285 match Γü¥ TRICOLON
 8286 match Γü₧ VERTICAL FOUR DOTS
11513 match Γ│╣ COPTIC OLD NUBIAN FULL STOP
11514 match Γ│║ COPTIC OLD NUBIAN DIRECT QUESTION MARK
11515 match Γ│╗ COPTIC OLD NUBIAN INDIRECT QUESTION MARK
11516 match Γ│╝ COPTIC OLD NUBIAN VERSE DIVIDER
11518 match Γ│╛ COPTIC FULL STOP
11519 match Γ│┐ COPTIC MORPHOLOGICAL DIVIDER
11776 match ⸀ RIGHT ANGLE SUBSTITUTION MARKER
11777 match ⸁ RIGHT ANGLE DOTTED SUBSTITUTION MARKER
11782 match ⸆ RAISED INTERPOLATION MARKER
11783 match ⸇ RAISED DOTTED INTERPOLATION MARKER
11784 match ⸈ DOTTED TRANSPOSITION MARKER
11787 match ⸋ RAISED SQUARE
11790 match ⸎ EDITORIAL CORONIS
11791 match ⸏ PARAGRAPHOS
11792 match ⸐ FORKED PARAGRAPHOS
11793 match ⸑ REVERSED FORKED PARAGRAPHOS
11794 match ⸒ HYPODIASTOLE
11795 match ⸓ DOTTED OBELOS
11796 match ⸔ DOWNWARDS ANCORA
11797 match ⸕ UPWARDS ANCORA
11798 match ⸖ DOTTED RIGHT-POINTING ANGLE
11799 match ⸗ DOUBLE OBLIQUE HYPHEN
11800 match Γ╕ÿ INVERTED INTERROBANG
11801 match ⸙ PALM BRANCH
11802 match ⸚ HYPHEN WITH DIAERESIS
11803 match ⸛ TILDE WITH RING ABOVE
11806 match ⸞ TILDE WITH DOT ABOVE
11807 match ⸟ TILDE WITH DOT BELOW
11818 match ⸪ TWO DOTS OVER ONE DOT PUNCTUATION
11819 match ⸫ ONE DOT OVER TWO DOTS PUNCTUATION
11820 match ⸬ SQUARED FOUR DOT PUNCTUATION
11821 match ⸭ FIVE DOT MARK
11822 match ⸮ REVERSED QUESTION MARK
11824 match Γ╕░ RING POINT
11825 match Γ╕▒ WORD SEPARATOR MIDDLE DOT
12289 match 、 IDEOGRAPHIC COMMA
12290 match 。 IDEOGRAPHIC FULL STOP
12291 match 〃 DITTO MARK
12316 match 〜 WAVE DASH
12336 match 〰 WAVY DASH
12349 match 〽 PART ALTERNATION MARK
12448 match πéá KATAKANA-HIRAGANA DOUBLE HYPHEN
12539 match ・ KATAKANA MIDDLE DOT
42238 match Ωô╛ LISU PUNCTUATION COMMA
42239 match Ωô┐ LISU PUNCTUATION FULL STOP
42509 match Ωÿì VAI COMMA
42510 match ΩÿÄ VAI FULL STOP
42511 match ΩÿÅ VAI QUESTION MARK
42611 match ΩÖ│ SLAVONIC ASTERISK
42622 match ΩÖ╛ CYRILLIC KAVYKA
42738 match ꛲ BAMUM NJAEMLI
42739 match ꛳ BAMUM FULL STOP
42740 match ꛴ BAMUM COLON
42741 match ꛵ BAMUM COMMA
42742 match ꛶ BAMUM SEMICOLON
42743 match ꛷ BAMUM QUESTION MARK
43124 match ꡴ PHAGS-PA SINGLE HEAD MARK
43125 match ꡵ PHAGS-PA DOUBLE HEAD MARK
43126 match ꡶ PHAGS-PA MARK SHAD
43127 match ꡷ PHAGS-PA MARK DOUBLE SHAD
43214 match ΩúÄ SAURASHTRA DANDA
43215 match ΩúÅ SAURASHTRA DOUBLE DANDA
43256 match Ωú╕ DEVANAGARI SIGN PUSHPIKA
43257 match Ωú╣ DEVANAGARI GAP FILLER
43258 match Ωú║ DEVANAGARI CARET
43310 match Ωñ« KAYAH LI SIGN CWI
43311 match Ωñ» KAYAH LI SIGN SHYA
43359 match ꥟ REJANG SECTION MARK
43457 match ꧁ JAVANESE LEFT RERENGGAN
43458 match ꧂ JAVANESE RIGHT RERENGGAN
43459 match ꧃ JAVANESE PADA ANDAP
43460 match ꧄ JAVANESE PADA MADYA
43461 match ꧅ JAVANESE PADA LUHUR
43462 match ꧆ JAVANESE PADA WINDU
43463 match ꧇ JAVANESE PADA PANGKAT
43464 match ꧈ JAVANESE PADA LINGSA
43465 match ꧉ JAVANESE PADA LUNGSI
43466 match ꧊ JAVANESE PADA ADEG
43467 match ꧋ JAVANESE PADA ADEG ADEG
43468 match ꧌ JAVANESE PADA PISELEH
43469 match ꧍ JAVANESE TURNED PADA PISELEH
43486 match ꧞ JAVANESE PADA TIRTA TUMETES
43487 match ꧟ JAVANESE PADA ISEN-ISEN
43612 match ꩜ CHAM PUNCTUATION SPIRAL
43613 match ꩝ CHAM PUNCTUATION DANDA
43614 match ꩞ CHAM PUNCTUATION DOUBLE DANDA
43615 match ꩟ CHAM PUNCTUATION TRIPLE DANDA
43742 match Ω½₧ TAI VIET SYMBOL HO HOI
43743 match ꫟ TAI VIET SYMBOL KOI KOI
44011 match Ω»½ MEETEI MAYEK CHEIKHEI
55296 missing2 φáÇ Unknown
55298 missing2 φáé Unknown
55300 missing2 φáä Unknown
55305 missing2 φáë Unknown
56407 missing2 φ▒ù Unknown
56432 missing2 φ▒░ Unknown
56433 missing2 φ▒▒ Unknown
56434 missing2 φ▒▓ Unknown
56435 missing2 φ▒│ Unknown
56507 missing2 φ▓╗ Unknown
56508 missing2 φ▓╝ Unknown
56510 missing2 φ▓╛ Unknown
56511 missing2 φ▓┐ Unknown
56512 missing2 φ│Ç Unknown
56513 missing2 φ│ü Unknown
56576 missing2 φ┤Ç Unknown
56577 missing2 φ┤ü Unknown
56607 missing2 φ┤ƒ Unknown
56639 missing2 φ┤┐ Unknown
56912 missing2 φ╣É Unknown
56913 missing2 φ╣æ Unknown
56914 missing2 φ╣Æ Unknown
56915 missing2 φ╣ô Unknown
56916 missing2 φ╣ö Unknown
56917 missing2 φ╣ò Unknown
56918 missing2 φ╣û Unknown
56919 missing2 φ╣ù Unknown
56920 missing2 φ╣ÿ Unknown
56959 missing2 φ╣┐ Unknown
57145 missing2 φ╝╣ Unknown
57146 missing2 φ╝║ Unknown
57147 missing2 φ╝╗ Unknown
57148 missing2 φ╝╝ Unknown
57149 missing2 φ╝╜ Unknown
57150 missing2 φ╝╛ Unknown
57151 missing2 φ╝┐ Unknown
57247 missing2 φ╛ƒ Unknown
57296 missing2 φ┐É Unknown
65040 match ︐ PRESENTATION FORM FOR VERTICAL COMMA
65041 match ︑ PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
65042 match ︒ PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
65043 match ︓ PRESENTATION FORM FOR VERTICAL COLON
65044 match ︔ PRESENTATION FORM FOR VERTICAL SEMICOLON
65045 match ︕ PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
65046 match ︖ PRESENTATION FORM FOR VERTICAL QUESTION MARK
65049 match ︙ PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS
65072 match ︰ PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
65073 match ︱ PRESENTATION FORM FOR VERTICAL EM DASH
65074 match ︲ PRESENTATION FORM FOR VERTICAL EN DASH
65093 match ﹅ SESAME DOT
65094 match ﹆ WHITE SESAME DOT
65097 match ﹉ DASHED OVERLINE
65098 match ﹊ CENTRELINE OVERLINE
65099 match ﹋ WAVY OVERLINE
65100 match ﹌ DOUBLE WAVY OVERLINE
65104 match ﹐ SMALL COMMA
65105 match ﹑ SMALL IDEOGRAPHIC COMMA
65106 match ﹒ SMALL FULL STOP
65108 match ﹔ SMALL SEMICOLON
65109 match ﹕ SMALL COLON
65110 match ﹖ SMALL QUESTION MARK
65111 match ﹗ SMALL EXCLAMATION MARK
65112 match ﹘ SMALL EM DASH
65119 match ﹟ SMALL NUMBER SIGN
65120 match ﹠ SMALL AMPERSAND
65121 match ﹡ SMALL ASTERISK
65123 match ﹣ SMALL HYPHEN-MINUS
65128 match ﹨ SMALL REVERSE SOLIDUS
65130 match ﹪ SMALL PERCENT SIGN
65131 match ﹫ SMALL COMMERCIAL AT
65281 match ! FULLWIDTH EXCLAMATION MARK
65282 match " FULLWIDTH QUOTATION MARK
65283 match # FULLWIDTH NUMBER SIGN
65285 match % FULLWIDTH PERCENT SIGN
65286 match & FULLWIDTH AMPERSAND
65287 match ' FULLWIDTH APOSTROPHE
65290 match * FULLWIDTH ASTERISK
65292 match , FULLWIDTH COMMA
65293 match - FULLWIDTH HYPHEN-MINUS
65294 match . FULLWIDTH FULL STOP
65295 match / FULLWIDTH SOLIDUS
65306 match : FULLWIDTH COLON
65307 match ; FULLWIDTH SEMICOLON
65311 match ? FULLWIDTH QUESTION MARK
65312 match @ FULLWIDTH COMMERCIAL AT
65340 match \ FULLWIDTH REVERSE SOLIDUS
65377 match 。 HALFWIDTH IDEOGRAPHIC FULL STOP
65380 match 、 HALFWIDTH IDEOGRAPHIC COMMA
65381 match ・ HALFWIDTH KATAKANA MIDDLE DOT
matches: 364

In other words, the delimiters set has extra, unnamed entries for code points between 55296 and 57296 (0xD800 to 0xDFD0) that d lacks.
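
Those values all lie in the UTF-16 surrogate block (U+D800 to U+DFFF), which would be consistent with a narrow Python build storing characters beyond U+FFFF as surrogate pairs. A quick check, as a sketch: the helper names are made up, and the pairing of 55296 with 56576 is only an illustrative guess, since the output above does not show which units belong together.

```python
from __future__ import print_function
import sys

def is_surrogate(n):
    """True if code point n is a UTF-16 surrogate code unit."""
    return 0xD800 <= n <= 0xDFFF

def combine(high, low):
    """Combine a high and a low surrogate into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# First and last of the 'missing2' code points reported above.
print(is_surrogate(55296), is_surrogate(57296))
# Hypothetical pairing, for illustration only.
print('U+%04X' % combine(55296, 56576))
# 0xFFFF (65535) indicates a narrow build; 0x10FFFF a wide one.
print(sys.maxunicode)
```

If sys.maxunicode reports 65535 on the affected machine, iterating over the string yields surrogate halves one at a time, and unicodedata.name() has no name for them.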

HTH.

Edward