Often, when cutting and pasting Maxima expressions from other programs (Gmail, Word, etc.), various invisible characters get introduced and confuse the Maxima parser:
ex: 23;
incorrect syntax: 23 is not an infix operator
ex: 23;
(In the GitHub Markdown editor, these characters show up as a red dot in edit mode.)
In this example, there is an invisible zero-width space character (U+200B) before 23, but the error message is mysterious. Other such characters include the zero-width joiner, the zero-width non-joiner, etc. Fortunately, Maxima does treat the non-breaking space (NBSP, U+00A0) as a space.
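For anyone who wants to check a pasted expression before feeding it to Maxima, here is a minimal sketch in Python (not part of Maxima; the function name `find_invisible` is made up for illustration). It assumes that "suspicious" means Unicode format characters (category Cf, which covers the zero-width space and the joiners) plus any space separator (category Zs) other than the plain ASCII space.

```python
import unicodedata

def find_invisible(text):
    """Return (index, code point, Unicode name) for each suspicious character."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        # Cf = format characters (zero-width space, joiners, ...);
        # Zs = space separators; the plain ASCII space is not reported.
        if cat == "Cf" or (cat == "Zs" and ch != " "):
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# The failing input from this report, with the zero-width space written explicitly.
print(find_invisible("ex: \u200b23;"))
# prints [(4, 'U+200B', 'ZERO WIDTH SPACE')]
```

Running this on a pasted line at least makes the cause of the "not an infix operator" message visible.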
There are three reasonable possibilities here:
* Give an error
* Ignore
* Treat it as a space
The error option is fail-safe: code won't inadvertently mean what it wasn't meant to mean.
But the ignore and space options are more convenient most of the time, since they'll probably (!) do what was intended.
Complication: What about these characters within a quoted string?
I think such characters should be treated as spaces by the parser. Also they should be preserved in quoted strings.
I opened bug #4039 about a related topic, punctuation characters.
I'm working on a patch to treat Unicode space characters as whitespace. The characters to be handled are:
which I got from a list on the web (https://jkorpela.fi/chars/spaces.html).
This is going to make handling every non-whitespace character a little bit slower. I haven't investigated, but if it turns out the effect is too much, we can cut down the list. I suspect the effect on the speed of the parser won't be an issue, but I don't know that for sure yet.
If we avoid iterating over the Unicode string from the user once for each of these characters separately, replacing all of them with spaces should be quite fast.
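To illustrate the single-pass idea, here is a rough sketch in Python rather than in Maxima's Lisp. The character set `SPACE_LIKE` is a small hypothetical subset chosen for the example, not the list the patch uses, and `normalize_spaces` is an invented name.

```python
# Hypothetical subset of space-like characters; NOT the list from the patch.
SPACE_LIKE = "\u00a0\u2002\u2003\u2009\u200b\u200c\u200d\u3000\ufeff"

# One translation table built up front: each listed character maps to " ".
TO_SPACE = str.maketrans(SPACE_LIKE, " " * len(SPACE_LIKE))

def normalize_spaces(text):
    """Replace every listed character with an ASCII space in one pass."""
    return text.translate(TO_SPACE)
```

The point is only that the input is walked once with a table lookup per character, instead of one full scan per character in the list.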
Do we already handle TAB characters correctly? And would it make sense to collapse consecutive spaces into one when not in a string?
In wxMaxima I always try to exempt strings from character changes, hoping that the next step isn't a parse_string().
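A sketch of what exempting strings could look like, again in Python and purely illustrative (this is not wxMaxima's code). It assumes strings are delimited by double quotes with backslash escapes, and it does not try to handle `/* ... */` comments. The function name and the default character set are made up for the example.

```python
def normalize_outside_strings(text, space_like=frozenset("\u00a0\u200b\u200c\u200d")):
    """Replace space-like characters with " ", except inside "quoted strings"."""
    out = []
    in_string = False
    escaped = False  # True right after a backslash inside a string
    for ch in text:
        if in_string:
            out.append(ch)  # keep string contents untouched
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch in space_like:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)
```

As noted above, this only helps if the string contents are not parsed again later, for example by parse_string().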
Does it not work to add these characters to *whitespace-chars*?
Fixed by commit 682395f, which does indeed just put the Unicode space characters on *WHITESPACE-CHARS* (for Unicode-aware Lisps; no change otherwise) in src/nparse.lisp. Closing this ticket as fixed.