From: Guenter M. <mi...@us...> - 2011-01-31 07:58:54
On 2011-01-28, SourceForge.net wrote:

> Summary: rst parser doesn't handle languages without spaces

> Initial Comment:
> When parsing a reStructuredText document in a language without standard
> spaces, such as Japanese, inlining does not work as it should. The
> problem relates to the unicode_delimiters and end_string_suffix
> variables of the Inliner class in parsers/rst/states.py. When
> inline_obj() in that file checks for an end-of-inline match
> ("endmatch = end_pattern.search(string[matchend:])"), the RE fails
> because end_string_suffix doesn't handle the use of any character after
> the inlined string's suffix.

This is the documented behaviour. It reduces the number of false
positives (allowing in-line markup inside words).

> Inline literals are demonstrated in the attached files. Even the
> full-width space used in East Asian languages such as Japanese and
> Chinese doesn't work (lines 10 and 12). Adding a new line or ASCII
> space before/after the inlined string allows it to be parsed normally.

The "official" workaround is to use a protected space (``\ ``).

> Probably, either unicode_delimiters needs to be expanded to include the
> full character set from languages such as Japanese and Chinese (can it
> do a Unicode range?) or the patterns used to find the start and end of
> inlined strings need to be.

IMV, the full-width space (and maybe other space variants from "General
Punctuation") could be accepted around inline markup.

I cannot say whether adding the full CJK character set is an
improvement, as it would require escaping the inline markup characters
inside a Chinese/Japanese/Korean word.

Günter
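
To make the behaviour discussed above concrete, here is a minimal, self-contained sketch using only Python's `re` module. It is not the docutils implementation: the delimiter sets are simplified stand-ins for `start_string_prefix` and `end_string_suffix`, and the names `build_literal_pattern`, `find_literals` and `IDEOGRAPHIC_SPACE` are hypothetical, introduced only for this illustration. It shows why an inline literal surrounded by full-width spaces is not recognised when only ASCII space and newline count as whitespace delimiters, and how accepting U+3000 as an extra delimiter would change that.

```python
import re

# Full-width (ideographic) space used in CJK text.
IDEOGRAPHIC_SPACE = "\u3000"


def build_literal_pattern(extra_delimiters=""):
    """Build a toy pattern for ``inline literals``.

    Assumption: the character classes below are simplified stand-ins,
    not the actual docutils delimiter sets. `extra_delimiters` lists
    additional characters accepted directly before and after the markup.
    """
    extra = re.escape(extra_delimiters)
    # The start-string must be at the start of the text or follow a delimiter.
    before = r"(?:^|(?<=[ \n(\[{<" + extra + r"]))"
    # The end-string must be at the end of the text or precede a delimiter.
    after = r"(?:$|(?=[ \n)\]}>.,:;!?/\-" + extra + r"]))"
    return re.compile(before + r"``(?P<text>\S(?:.*?\S)?)``" + after)


def find_literals(text, pattern):
    """Return the contents of all inline literals the pattern recognises."""
    return [match.group("text") for match in pattern.finditer(text)]


strict = build_literal_pattern()
relaxed = build_literal_pattern(IDEOGRAPHIC_SPACE)

sample = "日本語\u3000``code``\u3000日本語"
print(find_literals(sample, strict))   # []
print(find_literals(sample, relaxed))  # ['code']
```

In this sketch, markup embedded directly in a word (`日本語``code``日本語`) stays unrecognised under both patterns, which mirrors the trade-off mentioned above: accepting space variants is a narrow change, whereas treating every CJK character as a delimiter would turn markup characters inside such words into inline markup unless they are escaped.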