From: Beni C. <cb...@te...> - 2003-08-06 16:59:17
|
However docutils doesn't interpret unicode characters as markup, but in
many cases it would make a lot of sense. Actually, we've recently got one
real unicode char handled as markup: a real em-dash in attributions. Here
are the missing things I can think of:

* Many unicode chars should be accepted for bullet lists (including
  BULLET (U+2022), of course).

* Many unicode chars should be accepted for section adornments.
  Example: OVERLINE (U+203E).

* When using OVERLINE below the section title, it would make sense to
  use underline above it in a double adornment style. Should we open
  the spec to different characters for overline and underline?

* The same characters should also be allowed in transitions.

* Many punctuation characters should get the same status for inline
  markup recognition as ASCII punctuation. This is tricky because the
  currently allowed punctuation was hand-picked, with end-of-sentence
  punctuation allowed only after end-strings. But consider e.g.
  Spanish, where questions and exclamations also have inverted
  question/exclamation marks at the *beginning* of the sentence.

* Should we allow superscript digits for footnote references? I think
  not; superscript digits are a hack...

* Should we allow line drawing characters in tables? They certainly
  look neat if one has the nerve to draw them ;-). I've seen some
  editors that help with them, but only on DOS (in IBM PC encoding).

Obviously most if not all of the above cases apply to long lists of
Unicode characters, maintaining which by hand in docutils would be a
bad idea. If possible we should define the behavior in terms of Unicode
character properties / block names. In cases where that's not possible,
perhaps it's not worth it.

So, is there some Unicode expert on this list who could spare us the
research for choosing appropriate character properties for these roles?

-- Beni Cherniavsky <cb...@tx...> |
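[Editorial note: the property-based approach Beni proposes can be sketched with Python's standard `unicodedata` module. The function name `is_adornment_char` is purely illustrative, not part of Docutils.]

```python
import unicodedata

def is_adornment_char(ch):
    """True if ch falls in a punctuation (P*) or symbol (S*)
    general category -- the printable non-alphabetic classes."""
    return unicodedata.category(ch)[0] in ('P', 'S')

# Every ASCII adornment character currently allowed qualifies:
ascii_adornments = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
assert all(is_adornment_char(c) for c in ascii_adornments)

# So do OVERLINE (U+203E) and BULLET (U+2022):
assert is_adornment_char('\u203e')
assert is_adornment_char('\u2022')

# Letters, digits, and whitespace do not:
assert not any(is_adornment_char(c) for c in 'aZ9 \t')
```

A rule expressed this way needs no hand-maintained character list; it tracks whatever Unicode version Python's database ships with.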
From: David G. <go...@py...> - 2003-08-06 18:20:54
|
Beni Cherniavsky wrote:
> However docutils doesn't interpret unicode characters as markup but
> in many cases it would make a lot of sense.

I agree. Patches welcome!

> * When using OVERLINE below the section title, it would make sense
>   to use underline above it in a double adornment style. Should we
>   open the spec to different characters for overline and underline?

That would be a significant change. I'm not sure it'd be worthwhile.
There are plenty of characters that can be used (especially if Unicode
characters are added!) without complicating the spec or parser.

> * Should we allow line drawing characters in tables? They certainly
>   look neat if one has the nerve to draw them ;-). I've seen some
>   editors that help with them but only on DOS (in IBM PC encoding).

If the demand is there -- along with the patch contributions -- I don't
see why not. Especially if reStructuredText ever takes off in Japan,
such a feature would be very useful. I don't know how it is now, but
when I was teaching English in Japan most teachers had a "wapuro" (word
processor) for editing text, and these operated in a grid-editing
fashion (like Emacs' Picture mode). Tables were built with line-drawing
characters (via arrow keys). It helps that Japanese typically uses
monospaced type.

-- David Goodger    http://starship.python.net/~goodger
For hire: http://starship.python.net/~goodger/cv
Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html) |
From: Beni C. <cb...@te...> - 2003-08-07 07:26:02
|
David Goodger wrote on 2003-08-06:
> Beni Cherniavsky wrote:
> > However docutils doesn't interpret unicode characters as markup but
> > in many cases it would make a lot of sense.
>
> I agree. Patches welcome!

OK. But I want to figure out what characters to add where first.
What about my other points?

> > * When using OVERLINE below the section title, it would make sense
> >   to use underline above it in a double adornment style. Should we
> >   open the spec to different characters for overline and underline?
>
> That would be a significant change. I'm not sure it'd be worthwhile.

I leave it to you to decide ;-).

> There are plenty of characters that can be used (especially if Unicode
> characters are added!) without complicating the spec or parser.

True. I'd say all non-alphabetic and non-control/whitespace characters
should be allowed.

> > * Should we allow line drawing characters in tables? They certainly
> >   look neat if one has the nerve to draw them ;-). I've seen some
> >   editors that help with them but only on DOS (in IBM PC encoding).
>
> If the demand is there -- along with the patch contributions -- I
> don't see why not. Especially if reStructuredText ever takes off in
> Japan, such a feature would be very useful. I don't know how it is
> now, but when I was teaching English in Japan most teachers had a
> "wapuro" (word processor) for editing text, and these operated in a
> grid-editing fashion (like Emacs' Picture mode). Tables were built
> with line-drawing characters (via arrow keys). It helps that Japanese
> typically uses monospaced type.

This is the most complex of the proposed extensions to add. It would
have to recognize all the different line drawing chars at the proper
places. No demand from me, and I won't do it ;-).

-- Beni Cherniavsky <cb...@tx...> |
From: David G. <go...@py...> - 2003-08-07 13:16:04
|
[David Goodger]
>> I agree. Patches welcome!

[Beni Cherniavsky]
> OK. But I want to figure out what characters to add where first.
> What about my other points?

I agree, in principle, with all your points, except for those I dealt
with specifically. Details and implementation can wait for sufficient
demand and motivation. No demand or motivation from me at present
either.

-- David Goodger |
From: Beni C. <cb...@te...> - 2003-08-07 13:53:56
|
Beni Cherniavsky wrote on 2003-08-06:

I've read a bit on `Unicode character classes`__. I used `Zvon's
character search`__ and good old grep over UnicodeData.txt a lot.

__ http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values
__ http://zvon.org/other/charSearch/PHP/search.php

Here are my findings:

> * Many unicode chars should be accepted for section adornments.
>   Example: OVERLINE (U+203E).
>
> * The same characters should also be allowed in transitions.

Anything with a major class of `P` (punctuation) or `S` (symbol) should
be allowed. These are the printable non-alphabetic categories. In the
ASCII range this gives precisely the currently allowed set.

> * Many punctuation characters should get the same status for inline
>   markup recognition as ASCII punctuation. This is tricky because the
>   currently allowed punctuation was hand-picked, with end-of-sentence
>   punctuation allowed only after end-strings. But consider e.g.
>   Spanish, where questions and exclamations also have inverted
>   question/exclamation marks at the *beginning* of the sentence.

This is mostly solved by the Punctuation major class.

- `Pc` (connector minor class) should probably be excluded; it's for
  underscore and similar-in-spirit characters (however, it contains the
  KATAKANA MIDDLE DOT, which is said to function like a dot; Unicode
  has a special `Hyphen` property that includes all dashes + it, how
  unicodish).

- `Pd` (dashes) and `Po` (other) should be allowed both before and
  after. `Po` is very big (see below) but I don't think it can cause
  problems here.

- `Ps` and `Pe` are not wide: ``([{`` and ``)]}`` respectively. Do
  what we do now (also for ``<``, ``>``).

- `Pi` and `Pf` include various quotation marks. Because they are all
  ambiguous, these classes are described by Unicode as "sometimes
  opening, sometimes closing". So allow them in both positions, as we
  do for ``'`` and ``"``.

It seems that Unicode doesn't make any separation for
start/end-of-sentence punctuation, like question marks. I propose to
retain the special-casing of ``.,;!?\`` and allow all characters of the
Punctuation/other class in both positions; the alternative would be
hand-picking, which is a bad idea for so many characters.

> * Many unicode chars should be accepted for bullet lists (including
>   BULLET (U+2022), of course).

This is tricky. I don't want too many false positives. For example,
there are 13 characters with "BULLET" in their name. 5 have
Punctuation/other class, 5 have Symbol/other class, but 3 have
Symbol/math class. Now, math symbols should probably be excluded
because many can appear at the beginning of a paragraph that is a
formula or just a sentence with math shorthands (think about the
"exists" and "for all" quantifiers). OTOH, the Symbol/math category
includes some things we do want for bullets: ASCII ``-`` and ``+``,
many arrows and triangles (see about blocks_ below) and perhaps others.

The problem is by what criterion to include the others. Character
classes don't help here. There are 195 Punctuation/other characters
(18 of them in ASCII!), of which by manual inspection, only these look
appropriate:

- 0040;COMMERCIAL AT (``@``, not sure about it)
- 00B7;MIDDLE DOT
- 2020;DAGGER
- 2021;DOUBLE DAGGER
- 2022;BULLET
- 2023;TRIANGULAR BULLET
- 2042;ASTERISM
- 2043;HYPHEN BULLET
- 204C;BLACK LEFTWARDS BULLET
- 204D;BLACK RIGHTWARDS BULLET
- 2051;TWO ASTERISKS ALIGNED VERTICALLY

There are above 2069 Symbol/other characters. It'd be futile to
hand-pick. However, there is also a concept of _`blocks` - codepoint
ranges, with a much more fine-grained division of character groups.
I'd say the following blocks can be taken as a whole:

- `Arrows 2190-21FF`__.

  __ http://www.unicode.org/charts/PDF/U2190.pdf

- `Geometric Shapes 25A0-25FF`__.

  __ http://www.unicode.org/charts/PDF/U25A0.pdf

- `Miscellaneous Symbols 2600-26FF`__. Some are borderline, e.g. chess
  pieces could be the first things in lines of a chess game recording;
  what saves us is that things only act as bullets if followed by a
  space.

  __ http://www.unicode.org/charts/PDF/U2600.pdf

- `Dingbats 2700-27BF`__ except perhaps for the numbers in white/black
  circles (2776-2793) (see below).

  __ http://www.unicode.org/charts/PDF/U2700.pdf

I don't see other significant groups that qualify. Phew ;-).

-----

More points I missed:

* Enumerated lists only allow ASCII characters now. This should be
  extended. The problem is that there are so many digits in Unicode.
  There are about 42 characters that express the number ONE! Luckily,
  Unicode defines `Numeric_Type` and `Numeric_Value` properties for
  all such characters. Even more luckily, Python's `int()` correctly
  handles all decimal numbers! It does not handle characters that are
  "digits" but not "decimal":

  - 2460 CIRCLED DIGIT ONE
  - 2474 PARENTHESIZED DIGIT ONE
  - 2488 DIGIT ONE FULL STOP
  - 24F5 DOUBLE CIRCLED DIGIT ONE
  - 2776 DINGBAT NEGATIVE CIRCLED DIGIT ONE
  - 2780 DINGBAT CIRCLED SANS-SERIF DIGIT ONE
  - 278A DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE

  All these seem to be intended to be directly usable as list items
  without separators; should we allow them? Many of these go up to 20.

  It would be nifty to handle roman numeral characters (e.g. there is
  a single character for VII); I leave this to the author of the
  `roman` module ;-).

  Finally, to really i18n things, we should support letter-enumerated
  lists in languages other than English. This is tricky, because some
  languages using the same alphabet assign different orders to it. I
  propose to allow each language to define its order of enumeration,
  as part of the i18n modules, and allow only Latin and the document
  language's own.

* Canonization of Reference Names. Do we handle Unicode whitespace
  here? Should we pass the names through canonical (or compatibility)
  decomposition? Anything else to fix here?

* Wide characters - do we handle those correctly w.r.t. tables?

> * Should we allow superscript digits for footnote references? I think
>   not, superscript digits are a hack...

I'm dropping this idea.

-- Beni Cherniavsky <cb...@tx...> |
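[Editorial note: the distinction above between "decimal" digits (which Python's `int()` accepts) and "digit" characters like CIRCLED DIGIT ONE (which it rejects) can be verified with the standard `unicodedata` module. A quick verification sketch, not Docutils code:]

```python
import unicodedata

# Decimal digits from any script are understood by int():
# ARABIC-INDIC DIGIT ONE, TWO, THREE (U+0661..U+0663)
assert int('\u0661\u0662\u0663') == 123

# CIRCLED DIGIT ONE (U+2460) is a "digit" but not "decimal":
assert unicodedata.digit('\u2460') == 1      # has a digit value...
try:
    unicodedata.decimal('\u2460')            # ...but no decimal value
except ValueError:
    circled_is_decimal = False
assert not circled_is_decimal

try:
    int('\u2460')                            # so int() rejects it too
except ValueError:
    int_accepts_circled = False
assert not int_accepts_circled
```

So an enumerator parser could lean on `int()` for all true decimal digits and treat the circled/parenthesized variants as a separate, explicitly listed case.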
From: David G. <go...@py...> - 2003-08-12 17:06:13
|
Beni Cherniavsky wrote:
> I've read a bit on `Unicode character classes`__.

Interesting results. Is there any way we can use Unicode character
classes directly in the parser? I don't see any way to specify a
character class in a regular expression, or get a set of characters
belonging to a specific class cheaply. Perhaps the best way would be
to write a helper program which analyzes UnicodeData.txt or the
built-in Unicode database (via the unicodedata module), and produces
data which can be copied into the parser source. It may be "by hand",
but I can live with it.

Again, I agree, in principle, with all your points. Exceptions are
dealt with specifically.

> - `Pc` (connector minor class) should probably be excluded; it's for
>   underscore and similar-in-spirit characters (however, it contains
>   the KATAKANA MIDDLE DOT, which is said to function like a dot;
>   Unicode has a special `Hyphen` property that includes all dashes +
>   it, how unicodish).

When I write my name transliterated in Japanese (Katakana), I use the
KATAKANA MIDDLE DOT (U+30FB) to separate my given name from my family
name.

>> * Many unicode chars should be accepted for bullet lists (including
>>   BULLET (U+2022), of course).
...
> The problem is by what criterion to include the others.

Probably subjective criteria only.

> Character classes don't help here. There are 195 Punctuation/other
> characters (18 of them in ASCII!), of which by manual inspection,
> only these look appropriate:
>
> - 0040;COMMERCIAL AT (``@``, not sure about it)

No; its meaning is too specific to be useful for a bullet.

> - 00B7;MIDDLE DOT
...
> - 2022;BULLET
> - 2023;TRIANGULAR BULLET
...
> - 2043;HYPHEN BULLET
> - 204C;BLACK LEFTWARDS BULLET
> - 204D;BLACK RIGHTWARDS BULLET

Sure.

> - 2020;DAGGER
> - 2021;DOUBLE DAGGER

Inappropriate, IMO.

> - 2042;ASTERISM

I looked up "asterism": "Three asterisks placed ... to direct
attention to a particular passage." Inappropriate as a bullet, IMO.

> - 2051;TWO ASTERISKS ALIGNED VERTICALLY

Don't know about this one. Probably inappropriate.

> There are above 2069 Symbol/other characters.

Do you mean "more than 2069 characters" or "characters above code
point U+2069"?

> It'd be futile to hand-pick.

But if we don't hand-pick, we're liable to include inappropriate
symbols, as can be seen in the list above. I'd say either hand-pick,
or leave out the entire class, since probably most of them have
meanings beyond "bullet".

> However, there is also a concept of blocks - codepoint ranges, with a
> much more fine-grained division of character groups. I'd say the
> following blocks can be taken as a whole:
>
> - `Arrows 2190-21FF`__.

I wouldn't include these as bullets. The arrows in "Dingbats" may be
appropriate, but these ones don't seem so.

> - `Geometric Shapes 25A0-25FF`__.

Many/most of these would be fine as bullets. Nothing obviously
inappropriate.

> - `Miscellaneous Symbols 2600-26FF`__. Some are borderline, e.g.
>   chess pieces could be the first things in lines of a chess game
>   recording; what saves us is that things only act as bullets if
>   followed by a space.

Almost all of these symbols would be inappropriate IMO. Only the
stars (U+2605, U+2606) seem like good bullets to me.

> - `Dingbats 2700-27BF`__ except perhaps for the numbers in
>   white/black circles (2776-2793) (see below).

I'd exclude a lot more than the numbers. Some look appropriate, some
not. Many could be controversial (would you want to see the Star of
David used as a list bullet?).

> More points I missed:
>
> * Enumerated lists only allow ASCII characters now. This should be
>   extended.

These are tricky. I can't see people using the graphical variations
of Arabic numerals much. I could see extending the spec & parser for
languages which use other number systems, like Japanese & Chinese, and
many others; that would be an i18n issue.

> All these seem to be intended to be directly usable as list items
> without separators; should we allow them? Many of these go up to 20.

I'd rather not. Let's at least wait until someone files a bug
report. :-)

> It would be nifty to handle roman numeral characters (e.g. there is
> a single character for VII); I leave this to the author of the
> `roman` module ;-).

Nifty, perhaps, but would anybody ever use it? Again, let's wait for
demand.

> Finally, to really i18n things, we should support letter-enumerated
> lists in languages other than English. This is tricky, because some
> languages using the same alphabet assign different orders to it. I
> propose to allow each language to define its order of enumeration,
> as part of the i18n modules,

Seems reasonable.

> and allow only Latin and the document language's own.

Allow both simultaneously? Wouldn't this lead to the described
ambiguities? I'd say that the English order should be the default,
and other languages can override it; either replace it completely, or
add to it.

> * Canonization of Reference Names. Do we handle Unicode whitespace
>   here?

No, we don't. We should though.

> Should we pass the names through canonical (or compatibility)
> decomposition?

What exactly does this mean? Can you provide examples?

> Anything else to fix here?

(Possibly part of the above.) Non-ASCII alphabetic characters
(accented characters) are inadequately normalized. A reference name
like in the target "_`Montreal, Quebec`" ought to be normalized to
"montreal-quebec", not to "montr-al-qu-bec" as is done now. Further
afield, names in non-alphabetic languages (like Japanese or Arabic)
ought to be transliterated. Or, perhaps they ought to be left alone.
The "unicodedata.decomposition" function and "isalnum" Unicode string
method look like they may be useful here.

> * Wide characters - do we handle those correctly w.r.t. tables?

No, Docutils doesn't know anything about wide characters. See the
second item under <http://docutils.sf.net/spec/notes.html#bugs>.

-- David Goodger    http://starship.python.net/~goodger
For hire: http://starship.python.net/~goodger/cv
Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html) |
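[Editorial note: the helper program David suggests could scan the built-in Unicode database and emit compact codepoint ranges for pasting into the parser source. A sketch under those assumptions; `category_ranges` is a hypothetical name, not a Docutils function:]

```python
import unicodedata

def category_ranges(prefixes, limit=0x10000):
    """Collect contiguous codepoint ranges (inclusive) whose general
    category starts with one of the given prefixes, e.g. {'P', 'S'}."""
    ranges = []
    start = None
    for cp in range(limit):
        if unicodedata.category(chr(cp))[0] in prefixes:
            if start is None:
                start = cp
        elif start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges

# Emit a regex character class usable in the parser source:
ranges = category_ranges({'P', 'S'})
char_class = '[%s]' % ''.join('\\u%04x-\\u%04x' % r for r in ranges)
print(char_class)
```

Running this once per Unicode update and pasting the output into the parser avoids both hand-maintained lists and per-character database lookups at parse time.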
From: Beni C. <cb...@te...> - 2003-08-12 18:36:48
Attachments:
composed.txt
|
David Goodger wrote on 2003-08-12:
> Beni Cherniavsky wrote:
> > I've read a bit on `Unicode character classes`__.
>
> Interesting results. Is there any way we can use Unicode character
> classes directly in the parser? I don't see any way to specify a
> character class in a regular expression, or get a set of characters
> belonging to a specific class cheaply. Perhaps the best way would be
> to write a helper program which analyzes UnicodeData.txt or the
> built-in Unicode database (via the unicodedata module), and produces
> data which can be copied into the parser source. It may be "by
> hand", but I can live with it.

The classes thing helped in only some of the cases; the others need
hardcoding anyway. Can we use codepoint ranges in the parser? I
believe all the groups below can be compiled to rather short sets of
ranges.

> Again, I agree, in principle, with all your points. Exceptions are
> dealt with specifically.
>
> > - `Pc` (connector minor class) should probably be excluded; it's
> >   for underscore and similar-in-spirit characters (however, it
> >   contains the KATAKANA MIDDLE DOT, which is said to function like
> >   a dot; Unicode has a special `Hyphen` property that includes all
> >   dashes + it, how unicodish).
>
> When I write my name transliterated in Japanese (Katakana), I use
> the KATAKANA MIDDLE DOT (U+30FB) to separate my given name from my
> family name.

I know zero Japanese, I just quoted what I saw ;-). So we include it.

> >> * Many unicode chars should be accepted for bullet lists
> >>   (including BULLET (U+2022), of course).
> ...
> > The problem is by what criterion to include the others.
>
> Probably subjective criteria only.
>
> > - 0040;COMMERCIAL AT (``@``, not sure about it)
>
> No; its meaning is too specific to be useful for a bullet.

OK. It's not pretty anyway.

> > - 00B7;MIDDLE DOT
> ...
> > - 2022;BULLET
> > - 2023;TRIANGULAR BULLET
> ...
> > - 2043;HYPHEN BULLET
> > - 204C;BLACK LEFTWARDS BULLET
> > - 204D;BLACK RIGHTWARDS BULLET
>
> Sure.
>
> > - 2020;DAGGER
> > - 2021;DOUBLE DAGGER
>
> Inappropriate, IMO.
>
> > - 2042;ASTERISM
>
> I looked up "asterism": "Three asterisks placed ... to direct
> attention to a particular passage." Inappropriate as a bullet, IMO.
>
> > - 2051;TWO ASTERISKS ALIGNED VERTICALLY
>
> Don't know about this one. Probably inappropriate.

I was picking by the look. Going by meaning is a good idea, as it
allows fewer characters.

> > There are above 2069 Symbol/other characters.
>
> Do you mean "more than 2069 characters" or "characters above code
> point U+2069"?

More than 2069; don't ask me where I took the number from (I see
2496).

> > It'd be futile to hand-pick.
>
> But if we don't hand-pick, we're liable to include inappropriate
> symbols, as can be seen in the list above. I'd say either hand-pick,
> or leave out the entire class, since probably most of them have
> meanings beyond "bullet".

OK. Some characters, notably dingbats, have no useful meaning at all.

> > However, there is also a concept of blocks - codepoint ranges,
> > with a much more fine-grained division of character groups. I'd
> > say the following blocks can be taken as a whole:
> >
> > - `Arrows 2190-21FF`__.
>
> I wouldn't include these as bullets. The arrows in "Dingbats" may be
> appropriate, but these ones don't seem so.

Given the duplication with the arrows in dingbats, you have a point.
These have a meaning of "real" arrows (what's that?).

> > - `Geometric Shapes 25A0-25FF`__.
>
> Many/most of these would be fine as bullets. Nothing obviously
> inappropriate.
>
> > - `Miscellaneous Symbols 2600-26FF`__. Some are borderline, e.g.
> >   chess pieces could be the first things in lines of a chess game
> >   recording; what saves us is that things only act as bullets if
> >   followed by a space.
>
> Almost all of these symbols would be inappropriate IMO. Only the
> stars (U+2605, U+2606) seem like good bullets to me.

OK, only the stars go.

> > - `Dingbats 2700-27BF`__ except perhaps for the numbers in
> >   white/black circles (2776-2793) (see below).
>
> I'd exclude a lot more than the numbers. Some look appropriate, some
> not. Many could be controversial (would you want to see the Star of
> David used as a list bullet?).

Yes, in a Zionistic presentation, why not ;-). I don't know. Looking
at presentations people make, they use just about anything as bullets,
certainly all imaginable dingbats. But why should such practices have
a place in reST? The only real benefit, besides empty claims of
"industry-quality unicode support", would be the ability to select
appropriate bullets in some writers. Since currently no writers pay
any attention to the bullet kind in the source (does any?), it can be
left until someone really wants it... Perhaps the best approach is to
allow [-+*], the real BULLET characters, and that's it. All the rest
would be dumped into the notes/alternatives files...

> > More points I missed:
> >
> > * Enumerated lists only allow ASCII characters now. This should
> >   be extended.
>
> These are tricky. I can't see people using the graphical variations
> of Arabic numerals much. I could see extending the spec & parser for
> languages which use other number systems, like Japanese & Chinese,
> and many others; that would be an i18n issue.

OK.

> > All these seem to be intended to be directly usable as list items
> > without separators; should we allow them? Many of these go up to
> > 20.
>
> I'd rather not. Let's at least wait until someone files a bug
> report. :-)
>
> > It would be nifty to handle roman numeral characters (e.g. there
> > is a single character for VII); I leave this to the author of the
> > `roman` module ;-).
>
> Nifty, perhaps, but would anybody ever use it? Again, let's wait
> for demand.
>
> > Finally, to really i18n things, we should support
> > letter-enumerated lists in languages other than English. This is
> > tricky, because some languages using the same alphabet assign
> > different orders to it. I propose to allow each language to define
> > its order of enumeration, as part of the i18n modules,
>
> Seems reasonable.
>
> > and allow only Latin and the document language's own.
>
> Allow both simultaneously? Wouldn't this lead to the described
> ambiguities? I'd say that the English order should be the default,
> and other languages can override it; either replace it completely,
> or add to it.

Latin-based languages should override; in other scripts there are no
conflicts (e.g. sometimes Latin enumerations are seen in Hebrew
documents). Another reason: not allowing them to coexist would render
existing documents in other languages backwards-incompatible if they
contain any Latin enumerations. In languages where there are
conflicts, I guess people stayed away from Latin enumerations so far.

> > * Canonization of Reference Names. Do we handle Unicode whitespace
> >   here?
>
> No, we don't. We should though.
>
> > Should we pass the names through canonical (or compatibility)
> > decomposition?
>
> What exactly does this mean? Can you provide examples?

- 00F1;LATIN SMALL LETTER N WITH TILDE

is canonically equivalent to the two characters:

- 006E;LATIN SMALL LETTER N
- 0303;COMBINING TILDE

Compatibility decomposition is a more aggressive, non-reversible
process, converting "compatibility characters" (ones that are only
there for round-trip compatibility with obscure encodings) to more
sane characters (e.g. many variants on decimal digits would be
converted into simple ASCII digits). I tend to think this is none of
our business. Canonical equivalence is nicer because it's really
equivalent, in all known encodings. So if you write a reference
target with one and the reference with another, will it work?

I'm attaching a micro-test file that fails. [Just now discovered that
quicktest.py doesn't check links; now I can't trust any test I
previously did ;]. Moreover, I noticed that I can't write the n-tilde
with a trailing underscore as a simple reference; I must use backticks
- with both forms. This should be fixed: any sequence of letters and
combining characters should be considered a single word.

> > Anything else to fix here?
>
> (Possibly part of the above.) Non-ASCII alphabetic characters
> (accented characters) are inadequately normalized. A reference name
> like in the target "_`Montreal, Quebec`" ought to be normalized to
> "montreal-quebec", not to "montr-al-qu-bec" as is done now.

Yes, this is part of the need to redefine "word". Perhaps we should
take a look at Nameprep (RFC 3491, encodings.idna module in Python
2.3).

> Further afield, names in non-alphabetic languages (like Japanese or
> Arabic) ought to be transliterated. Or, perhaps they ought to be
> left alone.

Transliteration seems a bad idea. Only a few languages (if any) have
simple standard transliteration algorithms. It's better to know that
Ierushalaim written in Hebrew can't ever match Jerusalem in English
than to wonder whether it will match or not...

> The "unicodedata.decomposition" function and "isalnum" Unicode
> string method look like they may be useful here.

Yes. I see no reason not to decompose the whole document on input.
Currently what you write is what you get (checked with the attached
file on html.py) but there is no legal reason for somebody to
fine-control the output - it's really equivalent. At most, we might
want to control the normalization form used on output (e.g. HTML
should be NFC (Norm. Form C = precomposed) per W3C's
recommendations).

> > * Wide characters - do we handle those correctly w.r.t. tables?
>
> No, Docutils doesn't know anything about wide characters. See the
> second item under <http://docutils.sf.net/spec/notes.html#bugs>.

I'll not touch these issues because I don't speak wide languages...

-- Beni Cherniavsky <cb...@tx...> |
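[Editorial note: the canonical-equivalence example above (U+00F1 versus n + COMBINING TILDE) can be demonstrated with `unicodedata.normalize`. The `canonical` helper below is only an illustrative sketch of a normalization-aware reference-name comparison, not the Docutils implementation:]

```python
import unicodedata

precomposed = '\u00f1'    # LATIN SMALL LETTER N WITH TILDE
decomposed = 'n\u0303'    # 'n' + COMBINING TILDE

# The two spellings differ as strings but are canonically equivalent:
assert precomposed != decomposed
assert unicodedata.normalize('NFD', precomposed) == decomposed
assert unicodedata.normalize('NFC', decomposed) == precomposed

def canonical(name):
    """Sketch: compare reference names in a single normalization
    form, so either spelling resolves to the same target."""
    return unicodedata.normalize('NFC', name).lower()

# A target written with U+00F1 matches a reference typed as n + tilde:
assert canonical('Espa\u00f1a') == canonical('Espan\u0303a')
```

Normalizing the whole document on input (and picking the output form per writer, e.g. NFC for HTML) would make such mismatches impossible by construction.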
From: Mark N. <no...@so...> - 2003-08-12 18:48:58
|
David Goodger wrote:
> > Finally, to really i18n things, we should support letter-enumerated
> > lists in languages other than English. This is tricky, because some
> > languages using the same alphabet assign different orders to it. I
> > propose to allow each language to define its order of enumeration,
> > as part of the i18n modules,
>
> Seems reasonable.

Another complication is that some languages consider digraphs (two
characters) to comprise a single letter. For example, the Welsh
alphabet is

    A B C CH D DD E F FF G NG H I J L LL M N O P PH R RH S T TH U W Y

where CH is considered to be a different letter from C, etc. (Notice
that NG comes right after G, which can make looking words up in a
Welsh dictionary tricky.)

--Mark |
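[Editorial note: one way the i18n modules could accommodate digraph letters like Mark describes is to define each language's enumeration as an ordered sequence rather than a character range. A hypothetical sketch; `ALPHABETS` and `enumerator` are not Docutils names:]

```python
# Per-language enumeration sequences. Digraph letters (Welsh CH, DD,
# NG, ...) are multi-character entries, so enumerators must be looked
# up in an ordered sequence, not computed from a codepoint range.
ALPHABETS = {
    'en': list('abcdefghijklmnopqrstuvwxyz'),
    'cy': ['a', 'b', 'c', 'ch', 'd', 'dd', 'e', 'f', 'ff', 'g', 'ng',
           'h', 'i', 'j', 'l', 'll', 'm', 'n', 'o', 'p', 'ph', 'r',
           'rh', 's', 't', 'th', 'u', 'w', 'y'],
}

def enumerator(lang, index):
    """Return the letter enumerator for a 1-based list-item index."""
    return ALPHABETS[lang][index - 1]

assert enumerator('cy', 4) == 'ch'   # CH is the fourth Welsh letter
assert enumerator('cy', 11) == 'ng'  # NG comes right after G
```

The parser would then match enumerators longest-first against the active language's sequence, so `ch.` is one Welsh letter rather than `c` followed by stray text.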