#168 Syntax Highlighting to Have Unlimited Number of Token types

open
nobody
core (195)
5
2007-09-23
2007-09-23
Ondra
No

Hello,

I find the fixed number of token types in jEdit very restrictive and unpleasant. When I have mixed code (HTML + PHP + possibly JavaScript + CSS), jEdit colors everything in similar colors. But having blocks written in different languages look different would be so nice...

I studied jEdit's highlighting definition syntax and figured out that the limitation comes not from the highlighting system itself, but from the number of token types.

I tried jEdit a long time ago; back then it was quite user-unfriendly, crashed, and ran slowly on the computer I had at the time. Now I've found that, with plugins, it has all the features I need to switch from my favorite but old editor, HomeSite (whose development has already ended), EXCEPT for HomeSite's brilliant syntax coloring... I will put a screenshot at http://ondra.zizka.cz/temp/HomeSite_screenshot.png . (It is an intentionally synthesized mix of all the languages together, which is bad practice.)

So, my feature request is:

Since the "parsing" system is already capable of the feature I ask for, and the mode files would not even have to be rewritten, I guess this is only a matter of the following:

Let's not have a fixed set of token types; instead, let's track all token types of each mode and make them configurable, similarly to shortcuts:
1) A separate color configuration for each mode, and
2) A global default color configuration for certain token types (comment, keyword1, operator), which would be applied if the mode-specific setting is "use default for this token type".
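
To make the two points above concrete, here is a minimal Java sketch of the lookup they imply: a per-mode color table consulted first, with a global table as the fallback. Everything in it (the class `TokenStyleResolver`, its method names, the mode names, and the color strings) is hypothetical and only illustrates the idea; it is not jEdit's actual API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: per-mode token colors with a global fallback. Not jEdit API. */
class TokenStyleResolver {
    // Global defaults keyed by token type name, e.g. "COMMENT1" -> "#808080".
    private final Map<String, String> globalDefaults = new HashMap<>();
    // Per-mode overrides: mode name -> (token type name -> color).
    private final Map<String, Map<String, String>> modeOverrides = new HashMap<>();

    void setGlobal(String tokenType, String color) {
        globalDefaults.put(tokenType, color);
    }

    void setForMode(String mode, String tokenType, String color) {
        modeOverrides.computeIfAbsent(mode, m -> new HashMap<>()).put(tokenType, color);
    }

    /** Dropping the override corresponds to "use default for this token type". */
    void useDefault(String mode, String tokenType) {
        Map<String, String> overrides = modeOverrides.get(mode);
        if (overrides != null) {
            overrides.remove(tokenType);
        }
    }

    /** Mode-specific color if one is set, otherwise the global default, otherwise empty. */
    Optional<String> resolve(String mode, String tokenType) {
        Map<String, String> overrides =
                modeOverrides.getOrDefault(mode, Collections.<String, String>emptyMap());
        String color = overrides.get(tokenType);
        if (color == null) {
            color = globalDefaults.get(tokenType);
        }
        return Optional.ofNullable(color);
    }

    public static void main(String[] args) {
        TokenStyleResolver resolver = new TokenStyleResolver();
        resolver.setGlobal("COMMENT1", "#808080");
        resolver.setForMode("php", "COMMENT1", "#ff8800");
        System.out.println(resolver.resolve("php", "COMMENT1"));
        System.out.println(resolver.resolve("html", "COMMENT1"));
    }
}
```

With this shape, a mode that sets nothing behaves exactly as today, so existing mode files would not need to change.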

Regards,
Ondra Žižka

Discussion

1 2 > >> (Page 1 of 2)
  • Ondra
    2007-09-23

    Screenshot of HomeSite highlighting, and separated file pane in Project View

     
  • Logged In: YES
    user_id=1477607
    Originator: NO

    I don't think we are limited by the number of tokens, although I agree that in principle there is no reason to limit the number of tokens.
    I've had a few discussions with some of the developers about syntax highlighting. I, too, suggested doing everything you ask for in this feature request, and more. However, since the jEdit core is fairly complex already, I've written a plugin named SyntaxHelper. Currently, this plugin only makes it easy to configure the style of each token: it shows the syntax highlighting option pane in a dockable window, follows the caret, and shows you the style of the token under the caret. Unfortunately, it depends on the latest jEdit development version and can't be released until jEdit 4.3pre11 (or 4.3final) comes out.
    A future plan for this plugin is to enable mode-specific token styles and names. That is, each mode will be able to bind its own names to the existing token types, names which will be more meaningful for the mode, and also suggest default styles for these token types. Users will be able to customize the global defaults as well as the mode-specific styles for each token type using the plugin. I don't want to introduce more complexity into jEdit for something that can easily be done in a plugin. I also thought of a more revolutionary idea, where the plugin lets the user pick an arbitrary editor window on the screen (of some other editor) and "import" the style settings from that editor by querying the OS for the text style where possible (or by applying some AI technique, but this gets out of scope for a SyntaxHelper plugin :-), and then uses that style for the token type that jEdit would map this text to. But both of these are long-term plans.

     
  • Ondra
    2007-09-23

    Logged In: YES
    user_id=1053064
    Originator: YES

    Ah... and was that discussion public?

    And if the jEdit core devs decide to keep a limited number of token types, could they at least add some? That shouldn't be that hard. My suggestion is below. I know that such a solution is not very systematic, but it is better than nothing.

    When do you expect the plugin to be ready for use? I would love to help, but I do not yet know anything about writing jEdit plugins. If you have a task that an ordinary Java programmer could do, tell me.

    Suggested token types:

    Most important: DELIMITER for delimiters between different languages (e.g. HTML / PHP <? ?>, C++ / embedded SQL, etc.). Such delimiters are very important when editing combined code.

    Also important: MARKUP2 - MARKUP8 for different colors for different XML/HTML tags. Having the same color for all tags is really ugly. See the attachment for an example of nicely colored HTML; you simply must agree that such coloring is much prettier and much clearer.

    FUNCTION2 (e.g. for PHP built-in functions)
    FUNCTION3 (e.g. for JavaScript built-in functions)
    FUNCTION4 (e.g. for CSS selectors)

    OPERATOR2 (e.g. for PHP's @ operator, which is kind of exceptional).
    OPERATOR3 (e.g. for PHP's . operator, which usually appears between strings and should be more visible).

    KEYWORD5 to KEYWORD8 - each language has its own keywords and its own highlighting style (or should have)...

     
  • Logged In: YES
    user_id=1477607
    Originator: NO

    My discussions were not public. I actually suggested that each mode file defines the token types it uses, and that jEdit collects the token types from the mode files and enables their customization in the Global Options dialog. At least some of the token types have hard-coded semantics, so the token types can't be purely defined in the mode files.
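
    A rough Java sketch of that suggestion follows: modes declare the token types they use, the editor collects them for the options dialog, and a fixed set of built-in types stays reserved because the core attaches its own meaning to them. The class `TokenTypeRegistry`, its methods, and the short built-in list are assumptions made up for illustration, not jEdit code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: collecting token types declared by mode files. Not jEdit API. */
class TokenTypeRegistry {
    // Types with hard-coded semantics in the core; this short list is illustrative only.
    private final Set<String> builtIn = new LinkedHashSet<>(
            Arrays.asList("COMMENT1", "KEYWORD1", "LITERAL1", "MARKUP", "OPERATOR"));
    // Extra types declared per mode, in declaration order.
    private final Map<String, Set<String>> declaredByMode = new LinkedHashMap<>();

    /**
     * Called while a mode is loaded. Returns false if the name is reserved
     * as a built-in type or was already declared by this mode.
     */
    boolean declare(String mode, String tokenType) {
        if (builtIn.contains(tokenType)) {
            return false; // modes cannot redefine types the core depends on
        }
        return declaredByMode
                .computeIfAbsent(mode, m -> new LinkedHashSet<>())
                .add(tokenType);
    }

    /** Everything an options dialog would list for one mode: built-ins first, then its own types. */
    List<String> customizableTypes(String mode) {
        List<String> all = new ArrayList<>(builtIn);
        all.addAll(declaredByMode.getOrDefault(mode, Collections.<String>emptySet()));
        return all;
    }

    public static void main(String[] args) {
        TokenTypeRegistry registry = new TokenTypeRegistry();
        registry.declare("php", "DELIMITER");
        System.out.println(registry.customizableTypes("php"));
    }
}
```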

    The plugin is currently very limited and only provides a user-friendly way to customize the existing token type styles. In its current, limited version, it will only be available after the next release of jEdit (4.3pre11 or something like that). Thanks for your offer to help; maybe we can work together on this. I suggest continuing the discussion you started on the community site.

    Regarding token types: I don't think we are limited by the number of token types. jEdit defines 17 token types; do you know of a language that requires more?
    Regarding delimiters between languages: This is not a matter of token types. The concept is that a buffer is opened in an edit mode (a single edit mode). I don't know whether jEdit currently handles several languages in the same buffer; as far as I know it does not (correct me if I'm wrong...). If that is the case, a massive change would be required in jEdit to support it, and I doubt this will ever happen due to the complexity. However, I think it can be done, to some extent, in a plugin. I suggest discussing this along with the other requested features on the community site until we reach some decision.

     
  • Logged In: YES
    user_id=1477607
    Originator: NO

    Sorry, I was completely mistaken in the last part. It turns out jEdit has support for mixed-language buffers, which is quite nice. I still think that the number of tokens is sufficient, unless a single language requires more token types for itself. I think that jEdit cannot define global semantics for the fixed set of token types: the semantics are set by the mode files, and each mode file can use the token types for whatever semantics it wishes. That's why I suggested enabling mode-specific names for the token types, which indicate the semantics of the types for the mode (to go along with mode-specific styles for the token types).

     
  • Ondra
    2007-09-26

    Screenshot of jEdit highlighting, and single file and folders pane in Project View

     
  • Ondra
    2007-09-26

    Logged In: YES
    user_id=1053064
    Originator: YES

    I am continuing the discussion here so that the arguments for implementing this are kept in one place.

    Yes, jEdit can handle multiple languages gracefully.

    And no, I think that the current set of token types does *not* suffice:

    jEdit (as it seems to me) was originally intended for editing single-language files, mainly Java and occasionally others. And the idea was that the programmer likes all languages to look the same, e.g. operators in red, keywords in blue, etc.

    Then, after a very capable syntax highlighting system was implemented and tweaked a few times, it became able to parse multiple-language files, like the HTML + PHP + CSS + JavaScript quartet. At that point, having all languages look the same turned out to be a great disadvantage: sources are hard to scan and navigate. Just try opening some mixed PHP file in jEdit and compare it with the attached image. I am also attaching a screenshot of jEdit. Notice how ugly the code is compared to HomeSite's highlighting.

    The main problem is that the token types are not differentiated by language. This leads to a situation where ALL HTML tags are simply "MARKUP", which is absolutely insufficient, and the PHP delimiters, which should be the most visible part of the document, look exactly the same! There are other problems too, such as all HTML attributes being colored the same way as PHP strings, using LITERAL<n>, and so on.

    Having token types differentiated by language would allow nice syntax highlighting, and for web editing jEdit would become my "weapon of choice" :)

    And about the global defaults: they would still be possible, using simple KISS-style string matching. Then:
    PHP::String could define LITERAL1 as its default (with the definition in the mode file, of course).
    PHP::LineComment, PHP::BlockComment, JavaScript::LineComment, JavaScript::BlockComment, and HTML::Comment could define COMMENT1 as their default. CSS::BlockComment could use a different one, for example.
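
    The string-matching idea above could look roughly like this in Java. This is only a sketch: the class `QualifiedTokenStyles`, its methods, and the color values are invented for illustration, and the `Language::Token` names are the ones proposed in this comment, not anything jEdit defines today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: namespaced token names that fall back to a generic type. */
class QualifiedTokenStyles {
    // "PHP::String" -> "LITERAL1": the default named by the (hypothetical) mode file.
    private final Map<String, String> fallbackType = new HashMap<>();
    // Style per name; the key is either a qualified name or a generic type.
    private final Map<String, String> styles = new HashMap<>();

    void declare(String qualifiedName, String defaultType) {
        fallbackType.put(qualifiedName, defaultType);
    }

    void setStyle(String name, String color) {
        styles.put(name, color);
    }

    /** The token's own style if the user set one, else the style of its default type, else null. */
    String styleFor(String qualifiedName) {
        String own = styles.get(qualifiedName);
        if (own != null) {
            return own;
        }
        String generic = fallbackType.get(qualifiedName);
        return generic == null ? null : styles.get(generic);
    }

    public static void main(String[] args) {
        QualifiedTokenStyles tokenStyles = new QualifiedTokenStyles();
        tokenStyles.declare("PHP::LineComment", "COMMENT1");
        tokenStyles.declare("HTML::Comment", "COMMENT1");
        tokenStyles.setStyle("COMMENT1", "#808080");
        tokenStyles.setStyle("PHP::LineComment", "#ff8800");
        System.out.println(tokenStyles.styleFor("PHP::LineComment"));
        System.out.println(tokenStyles.styleFor("HTML::Comment"));
    }
}
```

    Plain string keys keep the lookup trivial, and a user who never customizes a qualified name still gets the familiar global colors.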

    That's almost my whole idea. It is very simple in principle, and I guess it would not take that much effort to implement, judging from my experience with programming syntax highlighting on several occasions. Better to do it now, before too many plugins would have to be modified after such a change ;-)
    File Added: jEdit_screenshot.png

     
  • Ondra
    2007-09-26

    Logged In: YES
    user_id=1053064
    Originator: YES

    In case someone implements some of the functionality described, I should be able to rewrite the PHP mode file to test it with.

     
  • Logged In: YES
    user_id=285591
    Originator: NO

    Hi, I'm sorry but I don't agree with you. I looked at the screenshots, and the difference between jEdit and HomeSite is that in HomeSite the HTML tags get only very basic syntax highlighting.
    HomeSite seems to choose a color for each tag and use this color for the entire tag and its attributes.
    I don't think we should have different tokens for different languages. But it may be possible to have a different style for the same token depending on the language.
    Another idea, though I don't know if it would be easy to do:
    add a background color when delegating to another edit mode.
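
    The background-color idea could be sketched in Java as below: the foreground still comes from the token type, and only the background depends on which mode's rules produced the token. The class `DelegateBackgrounds`, its methods, and the colors are hypothetical, and the sketch assumes the highlighter can tell the host mode from the mode currently delegated to.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a background color per delegated edit mode. Not jEdit API. */
class DelegateBackgrounds {
    private final String defaultBackground;
    // Mode name -> background used while that mode is delegated to.
    private final Map<String, String> backgroundByMode = new HashMap<>();

    DelegateBackgrounds(String defaultBackground) {
        this.defaultBackground = defaultBackground;
    }

    void setBackground(String mode, String color) {
        backgroundByMode.put(mode, color);
    }

    /**
     * hostMode is the mode the buffer was opened in; activeMode is the mode
     * whose rules matched the current token (assumed to be known to the caller).
     */
    String backgroundFor(String hostMode, String activeMode) {
        if (activeMode.equals(hostMode)) {
            return defaultBackground; // no delegation in effect
        }
        return backgroundByMode.getOrDefault(activeMode, defaultBackground);
    }

    public static void main(String[] args) {
        DelegateBackgrounds backgrounds = new DelegateBackgrounds("#ffffff");
        backgrounds.setBackground("javascript", "#fffbe6");
        System.out.println(backgrounds.backgroundFor("html", "javascript"));
        System.out.println(backgrounds.backgroundFor("html", "html"));
    }
}
```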

     
  • Ondra
    2007-09-28

    Logged In: YES
    user_id=1053064
    Originator: YES

    First, that is not the only difference; either I chose a bad example or you did not look closely.

    Second, HomeSite uses a similar model of syntax highlighting: external files describing the syntax (compiled into an executable). The coloring depends on the parser, not on HomeSite. The point I am making is that HomeSite can have an unlimited number of tokens, so attributes like "onclick" could be parsed as JavaScript.

    Third, different styles according to the language would be a workaround; having the same set of tokens for all languages in the world is (IMHO) generally a bad idea, as the languages themselves do not have the same tokens. E.g., for HTML, it would result in using different token types for different tags. Having just one MARKUP is far too few. And for other languages, the extra MARKUPs would remain unused.

    Fourth, what do you mean by "language"? If you have ever opened the mode files, you will have seen that the languages are mixed together in one file and currently cannot be told apart, as they have just a name and no namespace. And if you mean "different mode files", that would not solve the coloring of mixed HTML + PHP + ... files. The same is true of a different background color for each edit mode: HTML and PHP are in the same file.

     