Thread: [joe] An alternative syntax-highlighting approach

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Just remembered something I meant to post a week or so ago.

I happened to be talking to Philip Hazel, who told me that he was intending
to release a new version of PCRE shortly (the Perl-Compatible Regular
Expressions library). A new feature he's going to include is the ability to
indicate partial matches. That is, if the expression runs out of input data
but the RE has matched so far, then it can indicate this condition. It's
intended for match-as-you-type applications.
http://www.pcre.org/

This suggests to me another approach which Joe could use for syntax
highlighting. In each state, you give a set of regular expressions with
colours and next state. As you type, you match expressions from an anchor
point; if more than one matches you colour using the first. When exactly one
expression matches and is a complete match, then you move the anchor point
on, switch state, and start again. In fact, for simple highlighters you
probably don't need a 'state' at all, just the anchor point.

A simple one-state XML highlighter might look like:

<?.*?>			PI
<!--.*-->		Comment
<!\[CDATA\[.*\]\]>	Cdata
<!.*>			Entity
</.*>			EndTag
<.*/>			EmptyTag
<.*>			StartTag
.			Content

A more advanced one might want to use states to do more intelligent checks:
e.g. for comments,

:State0
<!--			Comment		:State1

:State1
-->			Comment		:State0
--			Error		:State0
.			Comment		:State1

That in itself simplifies the design of highlighters considerably (and
pretty much eliminates the need for 'recolor'), but it also adds a lot of
expressive power. Using back reference matching, you could for example match
the perl construct s/foo/bar/ using

s(.).*\1.*\1

[which also matches s,foo,bar,  s@foo@bar@  etc]. If you want to colour the
individual parts differently then you'd also need a way to indicate it: e.g.

s(.)(.*)(\1)(.*)(\1)	Keyword Delim Data Delim Data Delim

Means "color the first parenthesised expression as Delim, the second as
Data, the third as Delim, the fourth as Data, the fifth as Delim, and
everything else as Keyword"

Precompiling the REs at startup time should still make this a pretty
efficient approach.

What do you think?

Regards,

Brian.

Thread: [joe] An alternative syntax-highlighting approach

joe-editor-general