

#74 Syntax highlighting for JOE 3.5 regular expressions

Feature (36)
Derek Peschel

The included file, regex.jsf, shows how JOE's regular expression parser (regex.c) would interpret the current buffer if you copied the buffer to a "find" prompt. Eventually I would love to see JOE use similar highlight rules at the "find" prompt itself, and a smaller set at the "replace" prompt. But regex.jsf still needs testing and optimization. Also, JOE prompt histories hold a series of lines, but regex.jsf currently highlights one multiline buffer.

The colors are designed for a 16-color terminal with a white background. They avoid underline and reverse so all 256 characters can appear without ambiguity. I'm ignoring Unicode because I believe neither the syntax highlighter nor the real regex parser supports it.

The colors mean:

black (on my terminal)
characters that would match themselves

light gray (on my terminal)
backslashes in backslash sequences (meant to be an unobtrusive color)

characters after a backslash, where the sequence would match one copy of one character, like the a in \a or the 000 in \000

characters after a backslash, where the sequence would match a sequence of different characters, like the c in \c

characters after a backslash, where the sequence would anchor the search, like the $ in \$

light gray (on my terminal)
the [ in \[
the ] paired with the [
the inverse markers * or *
the dash - when characters or backslash sequences come before and after, or ] comes after

bold black, bold green
the characters before and after the dash

the + in \+

bright red
all characters that would make the match fail. Failure sometimes happens because a character comes at the end of the buffer, and sometimes because a backslash sequence is not recognized at any position in the buffer. There is no guessing about the meaning of unrecognized sequences; text after one remains red.

Backslashes change color as you edit the buffer, but only to show that they are literal, cause failure, or change the meaning of the characters that follow. Backslashes do not change color if they begin a sequence that's next to a dash in a character class, for example. That's deliberate, to keep the state machine from getting even bigger than it is now.
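
As a rough illustration of the failure rule described above (text after an unrecognized backslash sequence stays red), here is a minimal Python sketch. It is not taken from regex.jsf or regex.c. The function name classify, the label names, and the KNOWN_ESCAPES set are all hypothetical, and the set contains only the sequences named in this ticket, so it is deliberately incomplete. Character classes, dashes, and octal sequences such as \000 are ignored entirely.

```python
# Hypothetical sketch of the "failure stays red" rule; not the real state machine.
# KNOWN_ESCAPES is an assumption: it lists only sequences mentioned in this ticket
# (\a, \c, \$, \[, \+). The real parser recognizes more.
KNOWN_ESCAPES = set("ac$[+")


def classify(buffer):
    """Return one label per character: 'literal', 'backslash', 'escape' or 'fail'."""
    labels = []
    i = 0
    while i < len(buffer):
        ch = buffer[i]
        if ch != "\\":
            # An ordinary character matches itself.
            labels.append("literal")
            i += 1
        elif i + 1 < len(buffer) and buffer[i + 1] in KNOWN_ESCAPES:
            # A recognized two-character sequence.
            labels += ["backslash", "escape"]
            i += 2
        else:
            # Unrecognized sequence, or a backslash at the end of the buffer:
            # no guessing, so everything from here to the end is marked as failing.
            labels += ["fail"] * (len(buffer) - i)
            break
    return labels


if __name__ == "__main__":
    for sample in ["a\\c", "x\\qyz", "ab\\"]:
        print(repr(sample), classify(sample))
```

The early break is the point of the sketch: once a sequence cannot be interpreted, the rest of the buffer is not classified further, which mirrors the "no guessing" behavior described for bright red.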

I am still writing documentation and test cases. regex.c has many special cases which I've tried to handle. Comments welcome.

