Text.Regex.Lazy / Bugs / #4 Strange behaviour with some non-ASCII symbols

#4 Strange behaviour with some non-ASCII symbols

Milestone: v1.0 (example)

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2015-09-23

Created: 2014-08-26

Creator: Valentin Shirokov

Private: No

Strange error when working with a non-ASCII symbol: the Russian letter "п" ("pe").

Text.Regex> let x = mkRegex "(а|б|в|г|д|е|ё|ж|з|и|к|л|м|н|о)" -- First half of the Russian alphabet
Text.Regex> matchRegex x ""
Nothing

Text.Regex> let x = mkRegex "(а|б|в|г|д|е|ё|ж|з|и|к|л|м|н|о|п)" -- Next letter added
Text.Regex> matchRegex x ""
*** Exception: user error (Text.Regex.Posix.String died: (ReturnCode 13,"Invalid preceding regular expression"))

Text.Regex> let x = mkRegex "(а|б|в|г|д|е|ё|ж|з|и|к|л|м|н|о|[п])" -- Same letter wrapped in a bracket expression
Text.Regex> matchRegex x ""
Nothing

Text.Regex> print 'п'
'\1087'
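
A possible explanation, offered as a hypothesis rather than a confirmed diagnosis: code point 1087 is 0x43F, and its low byte 0x3F is the ASCII character `?`. If the pattern reaches the C regex library with every `Char` truncated to 8 bits (an assumption; to my understanding `Foreign.C.String.withCAString` marshals this way, but nothing in this report confirms which call the library uses), then `о|п` would arrive as `>|?`, where `?` has nothing to repeat. That would fit the "Invalid preceding regular expression" message, and it would also explain why `[п]` works, since `[?]` is a valid bracket expression. The sketch below uses only `base`; `truncate8` and `patternAsBytes` are illustrative names, not part of Text.Regex.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- | Keep only the low 8 bits of a character's code point.
-- This mimics a marshalling step that truncates each Char to one byte
-- (assumed here, not confirmed for this library).
truncate8 :: Char -> Char
truncate8 c = chr (ord c .&. 0xFF)

-- | The pattern as a C function would see it under that assumption.
patternAsBytes :: String -> String
patternAsBytes = map truncate8
```

Loaded into GHCi, `truncate8 'п'` evaluates to `'?'` and `patternAsBytes "(о|п)"` to `"(>|?)"`, while `patternAsBytes "(о|[п])"` gives `"(>|[?])"`.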

Discussion

Yuji Nishida - 2015-09-23

Let me add another example, this time in Japanese.

Both inputs have a space between "aaa" and "bbb", but in the first one it is a Japanese (ideographic) space, U+3000, whose UTF-8 encoding is E3 80 80. The match result should not differ between the two.

Prelude Text.Regex.Posix> "<tag>aaa　bbb ccc</tag>" =~ "</tag>" :: Bool
False
Prelude Text.Regex.Posix> "<tag>aaa bbb ccc</tag>" =~ "</tag>" :: Bool
True
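
This result would also be consistent with 8-bit truncation of each `Char`, though that remains an assumption and is not confirmed anywhere in this ticket. The low byte of U+3000 is 0x00, so a C function reading a NUL-terminated string would stop at the Japanese space and never reach `</tag>`. The sketch below uses only `base`; `subjectAsSeenByC` is an illustrative name, not part of Text.Regex.Posix.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- | Truncate every Char to its low 8 bits, then cut the string at the
-- first NUL, as a C function reading a NUL-terminated string would.
-- The truncation step is an assumption about how the text is marshalled.
subjectAsSeenByC :: String -> String
subjectAsSeenByC = takeWhile (/= '\0') . map lowByte
  where
    lowByte c = chr (ord c .&. 0xFF)
```

Under that assumption, `subjectAsSeenByC "<tag>aaa\12288bbb ccc</tag>"` evaluates to `"<tag>aaa"`, which contains no `</tag>`, whereas the input with an ASCII space passes through unchanged.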
