User Defined Languages
Notepad++ makes it possible to define "languages", or more precisely highlighting schemes. The original purpose of the feature is to enable proper highlighting of files in a programming language for which an internal Scintilla lexer is not available. Some users have reported successfully using a custom defined language in areas unrelated to programming, such as to-do lists. A directory of known User Defined Language Files is maintained so that such files can be shared across the whole user community.
As you can see, the syntax highlighting and the syntax folding are applied on the document userDefineLang.xml (on the left) thanks to the definition in the User Defined Language dialog (on the right).
Please note that:
- When the dialog is undocked, a slider is displayed so as to make its transparency adjustable;
- When displaying the default ("User Define Language") language, the Rename and Remove buttons are not displayed.
Built-in versus User Defined Languages
Some of the 'standard' or 'more common' programming languages, Batch for example, are not built with this tool. They cannot be, because they often have very idiosyncratic syntax features which are nearly unique. Turning these into something parametrisable, as in the User Defined Languages panel, seems both unrealistic and inconvenient, as it would lead to an even more complex interface. Some of the bounds of what can be achieved appear in Limitations Of User Defined Languages.
If you want to modify a language that was not built with the User Define Dialog, use Settings -> Styler Configurator. Note that there are some features in the 'more common' languages you can't modify. For example, you can't add to the list of operators in Batch, nor can you add keyword categories.
Overriding a built-in language
If you want to replace a built-in language such as Batch, do the following.
- Go to Settings -> Preferences -> Language Menu. Disable Batch by moving it to the right hand panel. Now there is no definition for the '.bat' extension.
- Create a new language with User Define Dialog and specify the extension (.bat for batch files). The common limitations of user defined languages will apply to yours.
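The reason both steps are needed can be sketched as a small lookup. This is only an illustrative model, not Notepad++'s actual code: the function `resolve_language` and the two dictionaries are hypothetical names, and the only behaviour assumed is the one stated in this page, namely that an extension still bound to an enabled built-in language is ignored for a user defined language.

```python
def resolve_language(extension, enabled_builtin, user_defined):
    """Return the language name chosen for a file extension, or None.

    Hypothetical model: an enabled built-in language keeps its extension,
    so a user defined language only applies when no built-in claims it.
    """
    if extension in enabled_builtin:
        return enabled_builtin[extension]
    return user_defined.get(extension)


# Hypothetical sample data: extension -> language name.
builtin = {"bat": "Batch", "py": "Python"}
udl = {"bat": "MyBatch", "todo": "TodoList"}

# While Batch is enabled, the user defined language never sees .bat files.
print(resolve_language("bat", builtin, udl))

# Moving Batch to the disabled list is modelled by dropping its entry.
builtin_without_batch = {ext: name for ext, name in builtin.items() if name != "Batch"}
print(resolve_language("bat", builtin_without_batch, udl))
```

In this model the first call yields the built-in language and the second yields the user defined one, which mirrors the two-step procedure above.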
The Language menu
The Language menu of Notepad++ lists user selected (by default, all) Built-in Languages, as well as any user defined languages found in userDefineLang.xml, plus imported user defined languages and those using external lexers. If there are duplicate entries in userDefineLang.xml, the last one wins.
Using the Settings -> Preferences -> Language Menu tab, you can select which built-in languages will appear on that menu. This is especially handy as the full menu is more than 50 items tall and may not fit on all monitors; you may check the Make language menu compact checkbox above the lists for better results. The languages are presented in the order they are loaded from langs.xml and stylers.xml.
User defined languages are all listed beneath the built-in ones, after a menu separator. They cannot be hidden. To temporarily cause a user defined language not to appear, cut and paste its data from userDefineLang.xml to some text file. The specific procedures described in Editing Configuration Files apply.
How To Create or Modify User-Defined Languages
You get the tool for creating or changing your language from View -> User Define Dialog.... Be patient, it can take quite a while to load.
Because the dialog is very tall and is not expected to fit on an average monitor, it is recommended to dock it, and then use the vertical scrollbar to access contents which might not be visible. You will need to undock the dialog in order to close it when done. A screen resolution of 1280 x 1024 ensures most parts are visible.
Overview of User Define Language dialog
It consists of two parts: the global functionalities part and the definition part.
The global functionalities are for the global operations:
- Choosing the language.
- Creating a user language. The new name must not exist already.
- Renaming a user language. The new name must not exist already.
- Removing a user defined language.
- Deciding whether the language is case sensitive
- Declaring the file extensions it will be associated with. If such an extension is already bound to a built-in language, it will be ignored.
The definition part is about:
- Defining the default style, typically used to display identifiers
- Defining the folder symbols for the language.
- Defining the keyword lists for the language
- Defining the comment, string and character delimiters
- Defining operators.
"Defining" involves choosing background and foreground colors, as well as font face, size and attributes, much like on the Styler Configurator dialog.
The definition is split across several tabs for convenience. The global part remains visible at all times.
The default language
You can define your keywords, folder block and comment under the default User Define Language.
However, all the definitions you make under the default User Define Language are temporary (i.e. they will disappear in the next session). If you want to keep your definitions, you have to save them by clicking the Save as... button. Once you give the language a name and save it, you can use it or modify it afterward.
You can also associate file extensions with your defined language - so every time you open the files with the extension that you associated, the highlighting of your defined language will be applied on the document automatically.
Of course, you can rename your language, use it as a model (Save as...), or remove it.
Folder & Default panel
This panel allows users to define the default style, folder keywords and folder styles.
The default style applies to everything that no other style covers (i.e. all the non-defined words). This is how identifiers will be rendered.
However, the background color of the white space, i.e. the common background over which text is drawn, is a global property which is not set per language. Use Settings -> Styler Configurator -> Global styles, Default style property for that, or change the theme using the dropdown list at the top of that dialog box.
Fold words or symbols
The folder definition consists of two parts: the open folder definition and the close folder definition. They should work together as a pair.
The open folder words determine where a fold should start, and the close folder words where it should end. They must not appear on any keyword list for the language. Likewise, you must choose whether a symbol is to act as a fold point or be highlighted as a delimiter; the latter takes precedence.
Let us look again at the introductory screenshot. In that figure, two blocks are defined by the keywords blockBegin, blockEnd, if and fi. With these keywords defined, the User Language Define System is able to form the blocks that the user can fold or unfold.
Notice that if you define several keywords in the open folder definition and/or in the close folder definition, any close folder symbol will close a fold regardless of the open folder symbol it started with. In the given example, blockBegin and fi may form a block if you treat them as a pair.
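The pairing behaviour described above can be illustrated with a minimal sketch. It assumes folding is driven purely by a depth counter over whitespace-separated words; the function `fold_levels` and the sample lines are hypothetical and do not reproduce the real lexer.

```python
def fold_levels(lines, open_words, close_words):
    """Return the fold depth in effect at the start of each line.

    Illustrative model only: any close word closes the innermost open fold,
    whichever open word started it.
    """
    levels = []
    depth = 0
    for line in lines:
        levels.append(depth)
        for word in line.split():
            if word in open_words:
                depth += 1
            elif word in close_words:
                depth = max(depth - 1, 0)
    return levels


# Hypothetical document using the four fold keywords from the example.
source = [
    "blockBegin",
    "    if x",
    "        work",
    "    blockEnd",
    "fi",
    "done",
]
print(fold_levels(source, {"blockBegin", "if"}, {"blockEnd", "fi"}))
```

Here the fold opened by if is closed by blockEnd, and the one opened by blockBegin is closed by fi, so the depth returns to zero even though the keywords are not matched in their intended pairs.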
Keywords lists panel
There are 4 groups of keywords, which means 4 styles are available for general use.
If Prefix mode is checked in a category, then all the words with the prefixes defined will be recognized. For example, with definition "bla", all the following words are recognized: "blapu" "blabla" "blablablabla" "blabc" etc.
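The difference between exact matching and Prefix mode can be sketched as follows. The helper `matches_keyword` is a hypothetical name, and case sensitivity is left out of this simplified model.

```python
def matches_keyword(word, keywords, prefix_mode=False):
    """Return True if `word` is recognised against a keyword list.

    With prefix_mode, a word is recognised when it starts with any
    listed keyword; otherwise it must equal a keyword exactly.
    """
    if prefix_mode:
        return any(word.startswith(keyword) for keyword in keywords)
    return word in keywords


keywords = ["bla"]
for word in ["blapu", "blabla", "blablablabla", "blabc", "albla"]:
    print(word, matches_keyword(word, keywords, prefix_mode=True),
          matches_keyword(word, keywords))
```

In this sketch the first four words are recognised only when Prefix mode is on, while "albla" is never recognised because "bla" is not at its start.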
Spaces don't count as being part of a word, although non-alphabetic characters do. The exact set of rules is shown in Operators Panel.
Comment & Numbers panel
In this panel, the user can define the style of numbers and the style of comments (as well as the comment symbols). There are 2 kinds of comment that you can define in the User Language Define System: comment line and comment block. Unlike the other styles, the comment style is applied not only to the defined symbols, but also to the whole comment block or comment line.
Numbers are highlighted whether in integral, decimal or scientific notation.
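As a rough illustration of those three notations, the sketch below classifies tokens with a regular expression. The pattern is an assumption made for this example; the exact rules Notepad++ applies (signs, hexadecimal, suffixes and so on) are not specified here and may differ.

```python
import re

# Assumed pattern: digits, an optional fractional part, an optional exponent.
NUMBER = re.compile(r"\d+(\.\d+)?([eE][+-]?\d+)?")


def is_number(token):
    """Return True if the whole token is integral, decimal or scientific."""
    return NUMBER.fullmatch(token) is not None


for token in ["42", "3.14", "6.02e23", "1E-9", "12ab", "abc"]:
    print(token, is_number(token))
```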
There are two rules of the User Defined Language System that you should keep in mind:
- An elementary unit (a token) is always terminated by a white space, a TAB symbol ('\t'), a new line symbol ('\n'), an illegal character or an operator that you defined.
- All symbol characters are part of an elementary unit (a token), unless they are defined as operators.
Keeping these 2 points in mind, we can easily understand the following example: The token "INTEGER" is recognized thanks to the keywords definition (i.e. "INTEGER" is in the keyword lists). If '(' is not defined as an operator, the second "INTEGER" won't be recognized, because "(INTEGER" will be treated as a unit (or a token), and it is not in the keyword lists.
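The two rules can be made concrete with a small tokenizer sketch. It is a simplified model, assuming single-character operators and ignoring the "illegal character" case; the function `tokenize` and the sample line are hypothetical.

```python
def tokenize(text, operators):
    """Split text into tokens: whitespace and operators terminate a token.

    Any other symbol character stays attached to the token it appears in.
    """
    tokens = []
    current = ""
    for ch in text:
        if ch.isspace() or ch in operators:
            if current:
                tokens.append(current)
                current = ""
            if ch in operators:
                tokens.append(ch)  # the operator is a token of its own
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


keywords = {"INTEGER"}
line = "INTEGER (INTEGER)"  # hypothetical sample line

for operators in (set(), set("()")):
    tokens = tokenize(line, operators)
    recognised = [token for token in tokens if token in keywords]
    print(sorted(operators), tokens, recognised)
```

With no operators defined, "(INTEGER)" stays a single token and only the first keyword is recognised; once '(' and ')' are operators, both occurrences of "INTEGER" become separate tokens and match the keyword list.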
Delimiters traditionally enclose strings or characters, although they could be used for anything else.
Since delimiters are characters, most languages allow embedding them in strings or quoted characters. To avoid erroneous parsing of such embedded delimiters as actual delimiters, they may be preceded by a special character, the escape character (often a backslash, \ ). You can enable this feature if your language includes it, and choose the escape character.
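The effect of an escape character on delimiter matching can be sketched like this. The function `find_closing` is a hypothetical helper written for this illustration, assuming single-character delimiters and a single-character escape.

```python
def find_closing(text, start, close, escape=None):
    """Return the index of the closing delimiter at or after `start`, or -1.

    When an escape character is given, it and the character that follows
    it are skipped, so an escaped delimiter does not end the string.
    """
    i = start
    while i < len(text):
        ch = text[i]
        if escape is not None and ch == escape:
            i += 2  # skip the escape character and the character it protects
            continue
        if ch == close:
            return i
        i += 1
    return -1


line = r'"say \"hi\" now" rest'  # hypothetical sample line

end_with_escape = find_closing(line, 1, '"', escape="\\")
end_without_escape = find_closing(line, 1, '"')
print(line[1:end_with_escape])
print(line[1:end_without_escape])
```

With the escape character enabled, the whole quoted text is treated as one string; without it, the string ends at the first embedded quote.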
Once a language has been saved (actually, made known) there is no need to save it again. Any changes made on the dialog page are immediately saved. A nice effect of this is that if you have the appropriate file open in Notepad++, any changes made to the language are immediately reflected in the Notepad++ file window.
Sharing of highlighting preferences, for either built-in or user created languages, is discussed in Syntax Highlighting Sharing.
Some languages have features that the User Define Language framework does not handle, like multipart keywords, keywords with non letters inside, escaped quotes, and more. Notepad++ can cause its embedded Scintilla controls to highlight source code in these languages properly, using external lexers, which are ordinary plugins with some extra functionality.
For generalities on plugin development, see Plugin Development, which also has detailed information on building lexers. The anatomy of a supposedly average internal lexer is exposed in Commenting LexPascal.cxx.