Thread: [Docutils-develop] [ docutils-Bugs-3166907 ] rst parser doesn't handle languages without spaces


Bugs item #3166907, was opened at 2011-01-28 20:20
Message generated for change (Tracker Item Submitted) made by gbiggs
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3166907&group_id=38414

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: G Biggs (gbiggs)
Assigned to: Nobody/Anonymous (nobody)
Summary: rst parser doesn't handle languages without spaces

Initial Comment:
When parsing a reStructuredText document written in a language that does not separate words with spaces, such as Japanese, inline markup is not recognised as it should be. The problem relates to the unicode_delimiters and end_string_suffix variables of the Inliner class in parsers/rst/states.py. When inline_obj() in that file checks for an end-of-inline match ("endmatch = end_pattern.search(string[matchend:])"), the RE fails because end_string_suffix does not accept an arbitrary character immediately after the inline markup's end-string. The attached files demonstrate this with inline literals. Even the full-width space used in East Asian languages such as Japanese and Chinese doesn't work (lines 10 and 12). Adding a newline or an ASCII space before/after the inline markup allows it to be parsed normally.
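
To illustrate the failure mode described above, here is a minimal, self-contained sketch. The suffix pattern is a simplified stand-in written for this example, not the actual value from states.py, and has_end_string() is a hypothetical helper. It only shows how a lookahead restricted to ASCII delimiters rejects an end-string that is directly followed by a Japanese character or a full-width space.

```python
import re

# Hypothetical, simplified suffix (NOT copied from docutils): the closing
# marker must be followed by end of string, an ASCII space or newline,
# or common ASCII punctuation.
end_string_suffix = r"""(?=$|[-/:.,;!? \n'")\]}>])"""
end_pattern = re.compile(r"``" + end_string_suffix)


def has_end_string(rest):
    """Return True if a closing `` with an accepted follower is found."""
    return end_pattern.search(rest) is not None


# Text remaining after the opening `` of an inline literal.
ascii_rest = "literal`` followed by a space"
japanese_rest = "literal``の後に文字"        # hiragana directly after ``
fullwidth_rest = "literal``\u3000の後"        # U+3000 full-width space after ``

for label, rest in [("ascii", ascii_rest),
                    ("japanese", japanese_rest),
                    ("fullwidth", fullwidth_rest)]:
    print(label, has_end_string(rest))
```

Under these assumptions only the ASCII case finds an end-string; the other two would lead to a "start-string without end-string" style warning.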

Parsing demo.txt with rst2html.py gives the following errors:

demo.txt:6: (WARNING/2) Inline literal start-string without end-string.
demo.txt:12: (WARNING/2) Inline literal start-string without end-string.
demo.txt:14: (WARNING/2) Inline literal start-string without end-string.

In addition, on lines 4, 8 and 10, the inline literal is not even detected because it is preceded by a Japanese character (including the full-width space).

Probably either unicode_delimiters needs to be expanded to include the full character set of languages such as Japanese and Chinese (can it do a Unicode range?), or the patterns used to find the start and end of inline markup need to be changed.
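
On the question of Unicode ranges: Python's re module does accept ranges of non-ASCII characters inside a character class. The sketch below is only an illustration of that idea. The base suffix is a simplified stand-in rather than the real docutils pattern, and CJK_RANGES is a hypothetical, non-exhaustive selection of blocks chosen for this example.

```python
import re

# Assumed ranges for illustration only; not an exhaustive or vetted list.
CJK_RANGES = (
    "\u3000-\u303f"   # CJK Symbols and Punctuation (includes U+3000 space)
    "\u3040-\u309f"   # Hiragana
    "\u30a0-\u30ff"   # Katakana
    "\u4e00-\u9fff"   # CJK Unified Ideographs
)

# Simplified ASCII-only delimiter class (hypothetical), extended with the
# ranges above so that a CJK character may directly follow the end-string.
extended_suffix = r"""(?=$|[-/:.,;!? \n'")\]}>""" + CJK_RANGES + "])"
extended_end = re.compile("``" + extended_suffix)

print(extended_end.search("literal``の後に文字") is not None)
print(extended_end.search("literal``\u3000の後") is not None)
print(extended_end.search("literal``abc") is not None)
```

A matching change would presumably be needed for the pattern that recognises the start of inline markup, since the report notes that a preceding Japanese character also prevents detection. Treating every CJK character as a delimiter could also change how existing documents are parsed, so this is a sketch of the option rather than a recommendation.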

----------------------------------------------------------------------
