Syntax Highlighters

Michiel Hendriks

To keep the following text clear, these terms are used:

  • xslthl-config: the configuration file for xslthl where a language id is associated with a syntax highlighting configuration.
  • configuration: the syntax highlighting configuration, which is discussed here.
  • highlighter: a mechanism that recognizes a string of characters and associates a style with it.

A syntax highlighting configuration contains a list of highlighters to use. The order in which the highlighters are listed is important, with a few exceptions.

A configuration starts with a <highlighters> element. Within this element, a <highlighter> element is used for each highlighter that needs to be used. In xslthl 1.x there was also the <wholehighlighter> element, but starting from 2.x <highlighter> and <wholehighlighter> are considered to be the same.

<highlighters>
    <!-- highlighters to use -->
</highlighters>
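
As a rough sketch of how a complete configuration could be laid out, the fragment below lists the comment highlighters before the string and keywords highlighters (all of them are described further down this page). It assumes that highlighters listed earlier get the first chance to match, so that a quote inside a comment is not picked up as a string. The delimiters and keywords are only illustrative.

<highlighters>
    <!-- comments first (assumption: earlier entries are tried first) -->
    <highlighter type="multiline-comment">
        <start>/*</start>
        <end>*/</end>
    </highlighter>
    <highlighter type="oneline-comment">//</highlighter>
    <!-- strings after comments -->
    <highlighter type="string">
        <string>"</string>
        <escape>\</escape>
    </highlighter>
    <!-- keywords last -->
    <highlighter type="keywords">
        <keyword>if</keyword>
        <keyword>else</keyword>
    </highlighter>
</highlighters>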

Highlighter element

A <highlighter> element has a single attribute, type, which defines the highlighter to use. Within the <highlighter> element the parameters for the highlighter are defined. Parameters are simply elements, often with text content as the value. Some parameters are switches and do not take a value. Whitespace is stripped from the text content of the parameters. If whitespace should be preserved in the value, enclose it within a CDATA section.

For example, <param> text </param> has the value 'text', and <param><![CDATA[ text ]]></param> has the value ' text '.
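
To illustrate the rule in a real parameter, the sketch below defines two oneline-comment highlighters (described further down) for a hypothetical language that starts comments with REM. Going by the whitespace rule above, the first start value should end up as 'REM' and the second as 'REM ' with the trailing space kept.

<highlighter type="oneline-comment">
    <!-- plain text content: surrounding whitespace is stripped -->
    <start> REM </start>
</highlighter>
<highlighter type="oneline-comment">
    <!-- CDATA content: the trailing space is preserved -->
    <start><![CDATA[REM ]]></start>
</highlighter>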

On this page you'll find all known highlighters and the parameters they take.

Parameters

  • style: all highlighters accept the style parameter; it defines which style is applied to the highlighted part. A style is always required, but most highlighters have a default style, so this parameter only needs to be included when a different style should be applied. The style name must be a valid XML name because the style will be used as the name of the element in the return of the highlight function (see Processing xslthl results for more information). There are two special style names:

      • none: the matched part is not highlighted. This can be used to prevent the next highlighter from adding a style.

      • hidden: parts with this style name will be excluded from the output.
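
A minimal sketch of the none style, using highlighters described further down: strings are matched but left without a style, which, under the behavior described above, should keep the keywords highlighter that follows from styling an if inside a string literal. The delimiters and the keyword are only illustrative.

<highlighter type="string">
    <string>"</string>
    <escape>\</escape>
    <!-- match strings, but do not apply a style to them -->
    <style>none</style>
</highlighter>
<highlighter type="keywords">
    <keyword>if</keyword>
</highlighter>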

The next few sections discuss the available highlighters. The highlighter name in each section heading is the value to use in the type attribute.

keywords highlighter

This highlighter is used for highlighting keywords of a language. Keywords start with a letter or underscore, followed by a sequence of letters, numbers, or underscores. A keyword is at least 2 characters long.

Parameters

Default style: keyword

  • keyword: can be used multiple times to define multiple keywords. The parameter value contains the word to be considered a keyword.
  • ignoreCase: this is a switch, if present the keywords are considered to be case insensitive. Thus begin, Begin, BEGIN, etc. would all be recognized by the defined keyword begin. Without this switch the keywords are considered to be case sensitive.
  • beginChars: a string of characters which should also be considered as beginning part of an identifier. Together with partChars this will allow you to extend the range of characters which make up an identifier in the language. By default the Java functions Character.isJavaIdentifierStart and Character.isJavaIdentifierPart are used to find identifiers. (Since 2.1)
  • partChars: a string of characters which should also be considered as part of an identifier. (Since 2.1)
  • exclusiveChars: this is a switch, if present only the characters defined by beginChars and partChars will be considered as part of the identifiers. (Since 2.1)

Example

<highlighter type="keywords">
    <keyword>begin</keyword>
    <keyword>end</keyword>
    <keyword>if</keyword>
    <keyword>then</keyword>
    <keyword>else</keyword>
    <ignoreCase />
</highlighter>
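
The partChars parameter can be sketched as follows for a hypothetical language whose identifiers may contain hyphens. The keywords are only illustrative. Without partChars, the hyphen would not count as part of an identifier under the default Java identifier rules.

<highlighter type="keywords">
    <keyword>for-each</keyword>
    <keyword>call-template</keyword>
    <!-- also treat '-' as part of an identifier -->
    <partChars>-</partChars>
</highlighter>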

multiline-comment highlighter

This highlighter is used for comments which can span multiple lines. Everything between the start and end is considered to be part of the comment.

Parameters

Default style: comment

  • start (required): the string sequence that defines how the multiline comment starts
  • end (required): the string sequence that defines how the multiline comment ends

Example

<highlighter type="multiline-comment">
    <start>/**</start>
    <end>*/</end>
    <style>doccomment</style>
</highlighter>

nested-multiline-comment highlighter

Just like the multiline-comment except that multiline comments can be nested. For example:

/* multiline comment
   /* nested comment
   */
*/

This is accepted as a whole by this highlighter, whereas the normal multiline-comment highlighter will stop at the first occurrence of "*/".

Parameters

Accepts the same parameters as the multiline-comment highlighter.
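
A minimal sketch of a matching configuration for the comment shown above, assuming the same start and end parameters as the multiline-comment highlighter:

<highlighter type="nested-multiline-comment">
    <start>/*</start>
    <end>*/</end>
</highlighter>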

oneline-comment highlighter

This highlighter is used for comments that end at a new line.

Parameters

Default style: comment

  • start (required): the string sequence that starts a single line comment. Alternatively this parameter can be defined as direct value of the highlighter element.
  • lineBreakEscape: the string sequence used to escape a new line. When this sequence is encountered right before a new line the comment continues. This can be useful to define highlighters for preprocessor directives. This is used in the C highlighter configuration to recognize the following as a directive:
#define foo(a,b) \
    doStuff(a); \
    doStuff(b);

Example

<highlighter type="oneline-comment">
    <start>#</start>
    <lineBreakEscape>\</lineBreakEscape>
    <style>directive</style>
</highlighter>
<highlighter type="oneline-comment">
    //
</highlighter>

string highlighter

This highlighter is used for string recognition.

Parameters

Default style: string

  • string (required): the sequence that defines how a string starts.
  • endString: the sequence that defines how a string ends. If omitted the same value as string is used.
  • escape: the character used to escape certain characters in a string, often a backslash is used to escape characters.
  • doubleEscapes: a switch, if present a double occurrence of the endString value produces an escape. This is used in the Delphi/Pascal highlighter configuration where a single quote is escaped using two single quotes: 'string with an escape '' character'
  • spanNewLines: a switch, if present the string continues after a new line. In a lot of languages a string does not continue after a new line is encountered (this is often a syntax error). This switch enables strings to continue after a new line was encountered.

Example

<highlighter type="string">
    <string>@"</string>
    <endString>"</endString>
    <escape>\</escape>
    <spanNewLines/>
</highlighter>
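
The doubleEscapes switch described above can be sketched for Pascal-style strings as follows. This is an illustration and not necessarily the exact content of the shipped Delphi/Pascal configuration.

<highlighter type="string">
    <string>'</string>
    <!-- two single quotes inside the string count as an escaped quote -->
    <doubleEscapes />
</highlighter>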

heredoc highlighter

This highlighter is used for highlighting HEREDOC constructions.

Parameters

Default style: string

  • start (required): the character sequence that starts the heredoc construction. After this sequence comes the identifier that is used to end the heredoc.
  • quote: can be used multiple times. Defines a character that may be used to quote the heredoc identifier. Perl uses quotes to define certain special behavior in the processing of the heredoc content (i.e. enable or disable variable expansion). These quote elements make sure the proper heredoc identifier is recognized.
  • noWhiteSpace: a switch, if present no whitespace may be used between the start sequence and the identifier. In quite a few cases the start sequence is also used in a different context. The absence of a space between the identifier and the start sequence triggers the heredoc construction.
  • looseTerminator: usually the heredoc identifier must start on the beginning of a new line in order to be considered the end of the heredoc construction. If this switch is present the heredoc highlighter will stop at the first occurrence of the heredoc identifier.
  • flag: flags that can be put after the start. This parameter can be used more than once. For example in the Bourne shell the following is a valid start of a heredoc section: <<-FOO. In this case '-' would be a flag.

Example

<highlighter type="heredoc">
    <start>&lt;&lt;&lt;</start>
</highlighter>
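
A sketch that combines the other parameters for a shell-like language where heredocs look like <<-FOO or <<'FOO'. The values are illustrative and only use the parameters listed above.

<highlighter type="heredoc">
    <start>&lt;&lt;</start>
    <!-- the identifier may be quoted with single or double quotes -->
    <quote>'</quote>
    <quote>"</quote>
    <!-- '-' may follow the start sequence, as in <<-FOO -->
    <flag>-</flag>
    <!-- no whitespace allowed between the start sequence and the identifier -->
    <noWhiteSpace />
</highlighter>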

annotation highlighter

A highlighter used to recognize annotations (or attributes as they are called in .NET).

Parameters

Default style: annotation

  • start (required): a character sequence that defines how an annotation starts, for example, in Java this is an @
  • end: the character sequence that defines the end of an annotation, if not defined the annotation ends when a non-alphanumeric (or non-underscore) character is encountered (usually whitespace).
  • valueStart: the character sequence that defines how annotation values start. These are usually parentheses.
  • valueEnd: the character sequence that defines the end of the value section of an annotation. The value start and end characters can be nested. The annotation will not end until the value part is completely closed (matching number of valueStart and valueEnd sequences).

Example

<highlighter type="annotation">
    <start>[</start>
    <end>]</end>
    <valueStart>(</valueStart>
    <valueEnd>)</valueEnd>
</highlighter>
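
For Java-style annotations such as @SuppressWarnings("unchecked"), a sketch without the end parameter could look like this. It relies on the default described above, where the annotation name ends at the first character that is not alphanumeric or an underscore.

<highlighter type="annotation">
    <start>@</start>
    <valueStart>(</valueStart>
    <valueEnd>)</valueEnd>
</highlighter>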

word highlighter

A highlighter that recognizes arbitrary words. It behaves much like the keywords highlighter except that it does not enforce any rules on the content of the word; even embedded whitespace is allowed (but not leading or trailing whitespace). This highlighter is quite slow because each entry in the list of words is evaluated against the current buffer, so avoid this highlighter when possible.

Parameters

Default style: none

  • word: the words to recognize, can be used multiple times
  • ignoreCase: case is ignored while recognizing words

Example

<highlighter type="word">
    <word>&lt;?php</word>
    <word>&lt;?=</word>
    <word>?&gt;</word>
    <style>directive</style>
</highlighter>

regex highlighter

Performs highlighting based on regular expressions. In the 1.x branch of xslthl this was a so-called "whole highlighter". Starting from version 2.0.0 beta 2 this highlighter is a normal highlighter, which means that it follows the normal ordering guidelines for highlighters.

Parameters

Default style: none

  • pattern: the regular expression pattern to use. See Java Pattern documentation for information on what regular expression features are supported.
  • flags: a comma delimited list of pattern flags. See the Java Pattern documentation for information about the supported flags. Use the name of the constants (not their integer value).

Example

<highlighter type="regex">
    <pattern>^(.+)(?==)</pattern>
    <flags>MULTILINE</flags>
    <style>attribute</style>
</highlighter>
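
Several flags can be combined in the comma delimited list. The sketch below uses the Pattern constants MULTILINE and CASE_INSENSITIVE to match whole lines that begin with the word todo in any letter case. The pattern and the chosen style are only illustrative.

<highlighter type="regex">
    <pattern>^\s*todo\b.*$</pattern>
    <flags>MULTILINE,CASE_INSENSITIVE</flags>
    <style>comment</style>
</highlighter>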

number highlighter

A highlighter that recognizes numbers, including integers and floating points (depending on the settings). Only the characters between 0 and 9 are considered to be numbers. There are no other special requirements or limitations. For example, 0123 is usually an octal number; it is simply recognized as a number.

Parameters

Default style: number

  • point: character used for the decimal point. If not declared no decimal points are accepted.
  • thousands: thousand separator
  • exponent: the string used for recognizing the exponent part of a floating point
  • pointStarts: a switch, when set the value defined as point can also be used to start a number. For example ".1234" would also be accepted as a number.
  • prefix: required start of a number, can be useful in the hexnumber highlighter to define how a hexadecimal number is started
  • suffix: an optional string that can be found after a number, can be defined multiple times. This is often used to set the "size" of an integer or floating point.
  • ignoreCase: a switch, if present all string parameters are case insensitive
  • letterNoFollow: a switch, if set numbers may not contain a letter at the end. For example: 123kg will not be seen as a number. This was the default before version 2.1.

Example

<highlighter type="number">
    <point>.</point>
    <pointStarts />
    <exponent>e</exponent>
    <suffix>ul</suffix>
    <suffix>lu</suffix>
    <suffix>u</suffix>
    <suffix>f</suffix>
    <suffix>l</suffix>
    <ignoreCase />
</highlighter>
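
A second sketch shows the thousands and letterNoFollow parameters, which the example above does not use. With these illustrative settings, a value such as 1,234.5 should be accepted as a number, while 123kg should not.

<highlighter type="number">
    <point>.</point>
    <thousands>,</thousands>
    <exponent>e</exponent>
    <!-- a trailing letter, as in 123kg, prevents recognition -->
    <letterNoFollow />
    <ignoreCase />
</highlighter>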

hexnumber highlighter

A subclass of the number highlighter. It works exactly the same except that it recognizes the characters 0 to 9 and A to F as numbers (case insensitive).

Example

<highlighter type="hexnumber">
    <prefix>0x</prefix>
    <suffix>ul</suffix>
    <suffix>lu</suffix>
    <suffix>u</suffix>
    <suffix>l</suffix>
    <ignoreCase />
</highlighter>

xml highlighter

The XML highlighter is a special highlighter. It is executed separately from all other highlighters (at the very end). The XML highlighter recognizes both XML and SGML syntax; in fact, it does not require well-formed content.

This highlighter assigns the following styles to the following XML/SGML elements:

  • directive: for processing instructions
  • comment: for comments
  • tag: for normal elements
  • attribute: for tag attributes
  • value: for the values of attributes
  • doctype: for the complete DOCTYPE element (including all parts of it)

CDATA sections are also recognized. CDATA tags will be highlighted as tag, the actual content will not be highlighted.

Parameters

The style parameter is not used for this highlighter.

  • styleElement: the style to use for the elements (tags). Defaults to: tag
  • styleAttribute: style to use for element attributes. Defaults to: attribute
  • styleValue: style to use for attribute values. Defaults to: value
  • styleComment: style to use for comments. Defaults to: comment
  • stylePi: style to use for processing instructions. Defaults to: directive
  • styleDoctype: style to use for the doctype and its contents. Defaults to: doctype
  • elementSet: allows you to override the style for certain elements based on their name. This parameter contains three sub-parameters:

      • element: the name of an element, can be used multiple times
      • style: the overridden style name
      • ignoreCase: a switch, when set the tag names are case insensitive, useful in case of HTML tags

  • elementPrefix: allows you to override the style for certain elements based on their prefix. This parameter contains two sub-parameters:

      • prefix: the prefix of an element
      • style: the overridden style name

Example

<highlighter type="xml">
    <elementPrefix>
        <style>xslt</style>
        <prefix>xsl:</prefix>
        <prefix>xslt:</prefix>
    </elementPrefix>
</highlighter>
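
The elementSet parameter can be sketched in the same way. The style name htmltag and the listed element names are hypothetical and only serve as an illustration. The ignoreCase switch is included so that, for example, IMG and img would be treated alike.

<highlighter type="xml">
    <elementSet>
        <style>htmltag</style>
        <element>a</element>
        <element>img</element>
        <element>table</element>
        <ignoreCase />
    </elementSet>
</highlighter>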

Plugin highlighter

Additional highlighters can be loaded via a plugin mechanism. For example:

<highlighter type="java:net.sf.xslthl.plugins.AltOnelineComment" classpath="plugins/plugins.jar">##</highlighter>

For an example implementation see: http://xslthl.svn.sourceforge.net/viewvc/xslthl/trunk/xslthl/examples/sources/plugins/


Related

Wiki: Home
Wiki: Overview
Wiki: Processing xslthl results
