Strange behaviour with some non-ASCII symbols
Status: Beta
Brought to you by:
laboratoryman
Strange error when working with a non-ASCII symbol: the Russian letter "п" ("pe").
Text.Regex> let x = mkRegex "(а|б|в|г|д|е|ё|ж|з|и|к|л|м|н|о)" --First half of russian alphabet
Text.Regex> matchRegex x ""
Nothing
Text.Regex> let x = mkRegex "(а|б|в|г|д|е|ё|ж|з|и|к|л|м|н|о|п)" --Next letter
Text.Regex> matchRegex x ""
*** Exception: user error (Text.Regex.Posix.String died: (ReturnCode 13,"Invalid preceding regular expression"))
Text.Regex> let x = mkRegex "(а|б|в|г|д|е|ё|ж|з|и|к|л|м|н|о|[п])" --Wrapped it
Text.Regex> matchRegex x ""
Nothing
Text.Regex> print 'п'
'\1087'
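One possible explanation, offered as a guess rather than a confirmed diagnosis: if the String backend hands the pattern to the C library as an 8-bit string and keeps only the low byte of each Char, then 'п' (code point 1087, 0x43F) would arrive as 0x3F, which is '?'. The sketch below only simulates that assumed truncation; `lowByte` and `truncatePattern` are hypothetical helpers, not part of Text.Regex.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Hypothetical helper (not from regex-posix): keep only the low 8 bits
-- of a code point, as an assumed 8-bit C string conversion would.
lowByte :: Char -> Char
lowByte c = chr (ord c .&. 0xFF)

-- Apply the assumed truncation to a whole pattern.
truncatePattern :: String -> String
truncatePattern = map lowByte
```

Under that assumption the three observations would line up:

    ghci> truncatePattern "(о|п)"
    "(>|?)"
    ghci> truncatePattern "(о|[п])"
    "(>|[?])"

In the first result the `?` follows `|` with nothing to repeat, which would fit the "Invalid preceding regular expression" error. Inside a bracket expression `?` is an ordinary character, which would fit the wrapped version compiling. The letters up to 'о' (0x43E, low byte '>') map to characters that are harmless in a pattern.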
Let me add another example, this time in Japanese.
In both inputs there is a space between "aaa" and "bbb", but in the first one it is a Japanese (ideographic) space, U+3000, whose UTF-8 encoding is E3 80 80. The match result should be the same in both cases.
Prelude Text.Regex.Posix> "<tag>aaa　bbb ccc</tag>" =~ "</tag>" :: Bool
False
Prelude Text.Regex.Posix> "<tag>aaa bbb ccc</tag>" =~ "</tag>" :: Bool
True
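To make the byte-level picture concrete, here is a small sketch using only `base`. It hand-encodes a Char as UTF-8 to confirm the E3 80 80 figure, and it simulates one hypothetical failure mode: if the String backend keeps only the low byte of each Char, U+3000 becomes 0x00, and a C string would end there. `utf8Encode`, `lowByte` and `asCString` are illustrative names, not library functions, and the truncation itself is an assumption.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hand-rolled UTF-8 encoder for a single Char (illustration only).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)                      -- leading-byte payload
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)  -- continuation byte

-- Assumed truncation: keep only the low 8 bits of each code point.
lowByte :: Char -> Char
lowByte c = chr (ord c .&. 0xFF)

-- What C code would see after that truncation: everything up to the first NUL.
asCString :: String -> String
asCString = takeWhile (/= '\0') . map lowByte
```

In GHCi, `utf8Encode '\12288'` gives `[227,128,128]`, that is E3 80 80, and `asCString "<tag>aaa\12288bbb ccc</tag>"` gives `"<tag>aaa"`. If the truncation assumption is right, the closing tag never reaches the matcher, which would explain the False result. In that case, matching on UTF-8 encoded ByteString values instead of String might avoid the problem, though that is not verified here.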