From: SourceForge.net <no...@so...> - 2011-01-28 07:20:47
Bugs item #3166907, was opened at 2011-01-28 20:20
Message generated for change (Tracker Item Submitted) made by gbiggs
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3166907&group_id=38414

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: G Biggs (gbiggs)
Assigned to: Nobody/Anonymous (nobody)
Summary: rst parser doesn't handle languages without spaces

Initial Comment:
When parsing a reStructuredText document in a language without standard spaces, such as Japanese, inlining does not work as it should. The problem relates to the unicode_delimiters and end_string_suffix variables of the Inliner class in parsers/rst/states.py. When inline_obj() in that file checks for an end-of-inline match ("endmatch = end_pattern.search(string[matchend:])"), the RE fails because end_string_suffix doesn't handle the use of any character after the inlined string's suffix.

Inline literals are demonstrated in the attached files. Even the full-width space used in East Asian languages such as Japanese and Chinese doesn't work (lines 10 and 12). Adding a new line or ASCII space before/after the inlined string allows it to be parsed normally.

Parsing demo.txt with rst2html.py gives the following errors:

demo.txt:6: (WARNING/2) Inline literal start-string without end-string.
demo.txt:12: (WARNING/2) Inline literal start-string without end-string.
demo.txt:14: (WARNING/2) Inline literal start-string without end-string.

In addition, on lines 4, 8 and 10, the inline literal is not even detected because it is preceded by a Japanese character (including the full-width space).

Probably, either unicode_delimiters needs to be expanded to include the full character set from languages such as Japanese and Chinese (can it do a Unicode range?) or the patterns used to find the start and end of inlined strings need to be.

----------------------------------------------------------------------
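The two symptoms in the report (a start-string without an end-string, and a literal that is not detected at all) can be reproduced with a minimal sketch. The patterns below are simplified, hypothetical stand-ins for the start-string prefix and end-string suffix rules; they are not the actual regular expressions in parsers/rst/states.py, and the sample sentences are invented for illustration.

```python
import re

# Simplified, hypothetical stand-ins for the Inliner's rules (assumption:
# only ASCII space, newline and a few punctuation marks act as delimiters).
START = re.compile(r"(?:^|(?<=[-/: \n(\[{<]))``(?=\S)")
END = re.compile(r"(?<=\S)``(?=$|[-/:.,;!? \n)\]}>])")


def classify(line):
    """Report how an inline literal in `line` fares under the simplified rules."""
    start = START.search(line)
    if start is None:
        return "not detected"
    # Mirrors the quoted call: the end pattern is searched in the remainder.
    if END.search(line[start.end():]) is None:
        return "start-string without end-string"
    return "literal"


for sample in (
    "run ``ls`` now",             # ASCII spaces on both sides
    "コマンド ``ls``を実行",        # Japanese character right after the literal
    "コマンド``ls`` を実行",        # Japanese character right before the literal
    "コマンド\u3000``ls``\u3000を",  # full-width spaces (U+3000) on both sides
):
    print(repr(sample), "->", classify(sample))
```

Under these assumptions, only the first sample is accepted as a literal; the full-width space behaves like any other non-delimiter character.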
From: SourceForge.net <no...@so...> - 2011-06-27 12:14:27
Bugs item #3166907, was opened at 2011-01-28 07:20
Message generated for change (Settings changed) made by milde

Category: None
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Private: No
Submitted By: G Biggs (gbiggs)
Assigned to: Nobody/Anonymous (nobody)
Summary: rst parser doesn't handle languages without spaces

----------------------------------------------------------------------

>Comment By: Günter Milde (milde)
Date: 2011-06-27 12:14

Message:
The behaviour is according to the specs:
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#inline-markup

Use escaped spaces (``\ ``) as a workaround to get inline markup without surrounding spaces.

Adding more Unicode characters to the recognized delimiters can be suggested in an enhancement request. CJK Punctuation seems a sensible choice.

----------------------------------------------------------------------
From: Guenter M. <mi...@us...> - 2011-01-31 07:58:54
On 2011-01-28, SourceForge.net wrote:

> Summary: rst parser doesn't handle languages without spaces

> Initial Comment:
> When parsing a reStructuredText document in a language without standard
> spaces, such as Japanese, inlining does not work as it should. The
> problem relates to the unicode_delimiters and end_string_suffix
> variables of the Inliner class in parsers/rst/states.py. When
> inline_obj() in that file checks for an end-of-inline match
> ("endmatch = end_pattern.search(string[matchend:])"), the RE fails
> because end_string_suffix doesn't handle the use of any character after
> the inlined string's suffix.

This is the documented behaviour. It reduces the number of false positives (which allowing inline markup inside words would produce).

> Inline literals are demonstrated in the attached files. Even the
> full-width space used in East Asian languages such as Japanese and
> Chinese doesn't work (lines 10 and 12). Adding a new line or ASCII
> space before/after the inlined string allows it to be parsed normally.

The "official" workaround is to use a protected space (``\ ``).

> Probably, either unicode_delimiters needs to be expanded to include the
> full character set from languages such as Japanese and Chinese (can it
> do a Unicode range?) or the patterns used to find the start and end of
> inlined strings need to be.

IMV, the full-width space (and maybe other space variants from "General Punctuation") could be accepted around inline markup. I cannot say whether adding the full CJK character set is an improvement, as it would require escaping the inline markup characters inside a Chinese/Japanese/Korean word.

Günter
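The narrower proposal, accepting only the full-width space, can be sketched as follows. The delimiter set and make_end_pattern are hypothetical and simplified; they only illustrate the idea and do not reproduce the end_string_suffix pattern in states.py.

```python
import re

# Simplified, hypothetical delimiter set for the end-string rule.
BASE_DELIMITERS = "-/:.,;!? \n"


def make_end_pattern(extra_delimiters=""):
    """Build a pattern for a closing `` followed by a delimiter or end of text."""
    delimiters = re.escape(BASE_DELIMITERS + extra_delimiters)
    return re.compile(r"``(?=$|[" + delimiters + r"])")


strict = make_end_pattern()
relaxed = make_end_pattern("\u3000")  # U+3000 IDEOGRAPHIC SPACE (full-width)

after_space = "ls``\u3000を実行"  # closing `` followed by a full-width space
after_kana = "ls``を実行"         # closing `` followed directly by a kana

print(strict.search(after_space) is not None)   # full-width space rejected
print(relaxed.search(after_space) is not None)  # full-width space accepted
print(relaxed.search(after_kana) is not None)   # kana still not a delimiter
```

Adding one space character keeps the rule against markup inside words intact, whereas passing a whole script range as extra_delimiters would make every CJK character a delimiter, which is the trade-off raised in the reply.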