From: David G. <go...@py...> - 2017-01-16 22:08:14
On Tue, Jan 10, 2017 at 3:59 AM, Guenter Milde <mi...@us...> wrote:
> On 2017-01-09, Edward d'Auvergne wrote:
>> On 5 January 2017 at 11:45, Guenter Milde <mi...@us...> wrote:
>
> ...
>
>>> * replace the current handling of combining characters with a version
>>>   counting for all zero-width characters.
>
>>> * clarify in the specs, that "line length" or similar in definitions like
>
>>>     An underline/overline is a single repeated punctuation character that
>>>     begins in column 1 and forms a line extending at least as far as the
>>>     right edge of the title text.
>
>>>   are valid for monospace characters of unit width with some listed
>>>   exceptions.
>
>
>> I was wondering if you have heard about the wcwidth() and wcswidth()
>> implementations [1, 2]?
>
> Thank you for the pointer.
>
>> If this fast bisect algorithm is of interest,
>> the Python wcwidth package might need to be downgraded to the 10+ year
>> old 5.x Unicode standard used in Python 2.
>
> There are several issues when using the wcwidth module:
>
> +1 don't reinvent the wheel:
>    maintained implementation of a column-width determination function
>
> +1 stability: character tables are part of the module, do not depend on
>    Python version.
>
>    The current implementation of wide-char correction depends on
>    unicodedata from the installed Python version.
>
> -2 external dependency
>
> -1 updating this module may break rST documents
>
>
> In addition, also the external module cannot solve the ambiguity:
>
> Example::
>
>     from wcwidth import wcswidth
>     text = u'wait ⌚ or ⌛'
>     print text
>     print 'x'*len(text)
>     print 'x'*wcswidth(text)
>
>
> For wcswidth, WATCH and HOURGLASS are 2 columns wide.
> In my text editor, WATCH and HOURGLASS are single-width characters (which
> also makes most sense to me).
> On some terminals, both characters are followed by space to make them double
> width. In `geany`, the text panel uses single width and the terminal panel
> double width.
>
> The problem is generic:
>
>     No established formal standards exist at present on which Unicode
>     character shall occupy how many cell positions on character terminals.
>     -- Markus Kuhn http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>
> IMO, Docutils should account for the display in "common" text editors using
> monospaced fonts. Speed is no primary issue.
> Maybe using a local implementation is best.
>
>
> The documentation must make clear the remaining ambiguity and point to
> fail-safe text source:
>
> * additional underline characters in section headings and simple tables
>
> * avoid "critical" characters in grid tables (use substitutions if required).
>
>> Where is the width
>> algorithm implemented in docutils?
>
> docutils/docutils/statemachine.py:1450:    def pad_double_width(self, pad_char):
>
> Uses `unicodedata.east_asian_width`.
>
> @David:
>
> How about using a wcswidth()-like implementation instead of len() when
> determining text length for section headings and tables instead of the
> padding with `double_width_pad_char`?

Sure, sounds fine to me.

> +1 works also for zero-width characters and combining characters
>    (solves https://sourceforge.net/p/docutils/bugs/128/)
>
> -1 API change

What exactly would the API change be?

David Goodger
<http://python.net/~goodger>
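For illustration, here is a minimal sketch of the "wcswidth()-like implementation instead of len()" idea, built only on the standard library `unicodedata` module rather than the external wcwidth package. The helper names `column_width` and `underline_is_long_enough` are hypothetical and not part of the Docutils API. The width rules are simplifying assumptions: characters in the general categories Mn, Me and Cf count as zero columns, East Asian Wide/Fullwidth characters count as two, and everything else (including East Asian Ambiguous) counts as one. Like the current `east_asian_width`-based padding, the result still depends on the Unicode tables of the running Python, so it does not remove the ambiguity described above.

```python
import unicodedata

# Hypothetical helpers, not part of Docutils or the wcwidth package.

# Assumption: these general categories are drawn with zero width:
# Mn (non-spacing marks, e.g. COMBINING ACUTE ACCENT),
# Me (enclosing marks) and Cf (format characters, e.g. ZERO WIDTH SPACE).
ZERO_WIDTH_CATEGORIES = {'Mn', 'Me', 'Cf'}

# Assumption: East Asian Wide (W) and Fullwidth (F) characters take two
# columns; Ambiguous (A) characters are treated as narrow, as in most
# Western monospaced text editors.
DOUBLE_WIDTH_CLASSES = {'W', 'F'}


def column_width(text):
    """Return the number of monospace columns *text* is assumed to occupy.

    The result depends on unicodedata.unidata_version, i.e. on the
    Unicode tables of the running Python, and may change between versions.
    """
    width = 0
    for char in text:
        if unicodedata.category(char) in ZERO_WIDTH_CATEGORIES:
            continue  # combining marks and format characters add no column
        if unicodedata.east_asian_width(char) in DOUBLE_WIDTH_CLASSES:
            width += 2
        else:
            width += 1
    return width


def underline_is_long_enough(title, underline):
    """Check that *underline* reaches the right edge of *title*.

    The title is measured in columns instead of code points. The underline
    is assumed to consist of ASCII punctuation, so len() is its width.
    """
    return len(underline) >= column_width(title)


if __name__ == '__main__':
    samples = {
        'Cafe + COMBINING ACUTE ACCENT': 'Cafe\u0301',
        'three CJK ideographs': '\u65e5\u672c\u8a9e',
        'a + ZERO WIDTH SPACE + b': 'a\u200bb',
    }
    for label, title in samples.items():
        # len() counts code points; column_width() estimates display columns
        print(label, len(title), column_width(title))
```

Running it prints `len()` and the estimated column width side by side: 5 vs. 4 for the decomposed "Café", 3 vs. 6 for the CJK title, and 3 vs. 2 for the string containing a ZERO WIDTH SPACE. Feeding it Guenter's `'wait ⌚ or ⌛'` shows the remaining ambiguity: with the Unicode 9.0+ tables of current Python 3 releases, WATCH and HOURGLASS are expected to be classified as Wide, so this sketch would agree with wcswidth() rather than with a text editor that draws them single-width.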