There are significant, um, anomalies in punctuation_chars.py. Fixing them is essential for Leo: Leo must have a unified code base that runs on both Python 2.6+ and Python 3.0+.

Even the slightest change to punctuation_chars.py is fraught with consequences.  I ran all unit tests with Python 2.6 after every change I describe below.

All *expected* unit tests pass. By that I mean that some unit tests fail because system error messages differ between Windows and Linux. I disabled those tests, and disabling them (I think) causes three dependency tests to fail. In short, the changes described here cause no *other* unit tests to fail.

Anomaly 1: Trailing unknown characters in delimiters.

The unknown characters are the 34 characters that follow ・ at the end of the delimiters string:

𐄀𐄁𐎟𐏐𐡗𐤟𐤿𐩐𐩑𐩒𐩓𐩔𐩕𐩖𐩗𐩘𐩿𐬹𐬺𐬻𐬼𐬽𐬾𐬿𑂻𑂼𑂾𑂿𑃀𑃁𒑰𒑱𒑲𒑳

:-)

There are other unknown characters in the delimiters string, but I haven't yet written a script to get rid of them.
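Such a script might start as the minimal sketch below. It assumes that "unknown" means code points outside the Basic Multilingual Plane (above U+FFFF), which holds for all 34 characters listed above; other unknown characters may need a different test. It is written for Python 3 (or a wide Python 2 build): on a narrow Python 2 build those characters are stored as surrogate pairs, so `ord()` of each half never exceeds U+FFFF. The name `split_astral` is hypothetical.

```python
import unicodedata


def split_astral(s):
    """Split s into (bmp, astral).

    bmp keeps, in order, every character at or below U+FFFF;
    astral collects, in order, the characters above U+FFFF
    (the "unknown" ones, under the assumption stated above).
    """
    bmp = "".join(c for c in s if ord(c) <= 0xFFFF)
    astral = "".join(c for c in s if ord(c) > 0xFFFF)
    return bmp, astral


if __name__ == "__main__":
    # A short stand-in string: a few ASCII delimiters, one BMP character
    # (U+30FB) and two code points above U+FFFF.
    sample = "\\-/:\u30fb\U00010100\U00010101"
    bmp, astral = split_astral(sample)
    print(repr(bmp))
    for c in astral:
        # unicodedata.name() accepts a default for unnamed code points.
        print("U+%05X %s" % (ord(c), unicodedata.name(c, "<unnamed>")))
```

The BMP part could then be pasted back into the source file, and the printed list of removed code points reviewed by hand.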

Anomaly 2: Duplicate '\' (reverse solidus) characters in delimiters.

The delimiters string starts with::

    ur"\-\/\:

This defines the reverse solidus (backslash) three times! Replacing the string so that it starts with ur"\-/: has no effect on the unit tests.

The same redundancy occurs in the other strings defined in the file, namely openers, closers and closing_delimiters.
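Removing the redundancy amounts to dropping repeated characters while keeping the first occurrence of each. The minimal sketch below does that (the name `dedupe_chars` is hypothetical). Applied to the raw prefix `\-\/\:` it yields the four characters backslash, hyphen, solidus and colon, which is the same string as `u"\\-/:"`. It treats each string as a plain set of characters; if a string is later pasted into a regular-expression character class, its backslashes may be acting as escapes, so each deduplicated string should also be checked in that context.

```python
def dedupe_chars(s):
    """Return s with repeated characters removed, keeping first occurrences in order."""
    seen = set()
    out = []
    for c in s:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return "".join(out)


prefix = r"\-\/\:"  # the six characters: \ - \ / \ :
print(repr(dedupe_chars(prefix)))  # '\\-/:'
```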

Anomaly 3: No need for ur"...

At this point, I conjectured that the ur string could be replaced by a u string. Indeed it can, at least as far as the unit tests are concerned. The final definition of delimiters starts with::

    delimiters = u"\\-/:
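One way to see why the unit tests cannot tell the two definitions apart is sketched below. It assumes the string is interpolated into a regular-expression character class, which is what the backslash escapes in the original suggest. Both prefixes then yield a class that matches exactly hyphen, solidus and colon: in `[\-/:]` the single remaining backslash escapes the hyphen, so no range is formed and no backslash is matched. `old` and `new` are only the prefixes quoted above, not the full strings.

```python
import re

old = r"\-\/\:"  # prefix of the original ur"..." string
new = "\\-/:"    # prefix of the proposed u"..." string

old_class = re.compile("[%s]" % old)
new_class = re.compile("[%s]" % new)


def matched(pattern):
    """Return, in code-point order, the ASCII characters a one-character class matches."""
    return "".join(chr(i) for i in range(128) if pattern.match(chr(i)))


print(matched(old_class))  # -/:
print(matched(new_class))  # -/:
```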

This is **very** important! Python 3 rejects ur strings as a syntax error, and PEP 414, which restored the u prefix in Python 3.3, deliberately left ur out. Happily, the delimiters string contains no \uXXXX or \UXXXXXXXX escapes, so the ur definition of delimiters and the u definition define the same characters.
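The syntax side of this can be checked on any given interpreter by asking the built-in `compile()` whether a literal parses. In the minimal sketch below the helper name `parses` is hypothetical. Run under Python 3.3 or later, the ur literal is rejected and the u literal is accepted; u literals were themselves a syntax error in Python 3.0 through 3.2.

```python
def parses(source):
    """Return True if source is a valid expression for this interpreter."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


old_literal = 'ur"\\-\\/\\:"'  # the text ur"\-\/\:" as it appears in the file
new_literal = 'u"\\\\-/:"'     # the text u"\\-/:" as it appears in the file

print(parses(old_literal))  # False on Python 3
print(parses(new_literal))  # True on Python 3.3+
```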

Summary

1. Making all of these changes to the delimiters string has no effect on the unit tests. Perhaps there are subtleties not caught by the unit tests...

2. Similar changes should be made to openers, closers and closing_delimiters.

3. The present definitions include the reverse solidus (backslash) in *all* the strings, that is, in openers, closers, delimiters and closing_delimiters. This seems dubious. Investigation by those who know more than I do seems warranted.

4. Most importantly, from Leo's point of view, eliminating all the ur constants from this file makes it possible to use it unchanged in both Python 2 and Python 3.3+ (u literals are a syntax error in Python 3.0 through 3.2).

Your comments and corrections, please.

Edward