From: Benny B. <Ben...@gm...> - 2009-01-21 14:41:11
Hi,

Stelio wrote:
> I just found out about GeSHi; what a fantastic tool! Well done and
> thanks to everyone involved in creating it. :-)

Thanks ;-)

> I have been working on producing a language file for DCS (a data
> conversion tool for the actuarial modelling platform, Prophet). (Current
> work in progress is located at:
> "http://stelio.net/wiki/extensions/SyntaxHighlight_GeSHi/geshi/geshi/dcs.php";
> but it's not quite ready to submit yet.)

I'll have a look at it at the weekend and might give some hints if you
like.

> The only issue I have is with embedded C code. It is possible to write C
> code within a DCS program, where anything between the keywords
> 'INSERT_C_CODE' and 'END_C_CODE' is interpreted as C. This is so that
> any limitations in DCS's functionality can be overcome through the use
> of the more flexible C language.

This isn't possible with the GeSHi 1.0.X series, but works fine with
GeSHi 1.1.X (the upcoming 1.2.X release branch), though development
there has been stuck for some time now for various reasons. You might
have a look at 1.1.X anyway. For 1.0.X you have to mask the C code out,
probably with COMMENT_REGEXP.

> At the moment I don't believe that GeSHi can cope with nested languages,
> but it seems like a good thing to add in. After all, we have a huge list
> of language files already available, and it would be counterproductive
> to duplicate the contents of one language file with another. Perhaps
> there could be an additional set of fields that works like REGEXPS, and
> as well as defining a style, you also set a language field. Then if you
> have a set of code in language 'foo' that contains embedded code for
> language 'bar', the style for the 'bar' code should be the style as
> defined in the 'foo' language file for that embedded language,
> *overwritten* by any styles from the 'bar' language file.
> (This allows you to highlight a section of embedded code - say with a
> background colour or different font - whilst still consistently marking
> up its style.)

See the 1.1.X release. It does this and even a bit more.

> Thinking about it, there would be a possible issue with precedence.
> 'foo' code that is commented out and includes embedded 'bar' code should
> be treated as a comment using foo's comment style. 'bar' code embedded
> within 'foo' code should not be scanned for 'foo'-style comments; just
> 'bar'-style comments. So to do this correctly, GeSHi would need to check
> for nesting of comments and embedded code (I don't know if it already
> includes similar functionality for other purposes?). As a specific
> language example: DCS uses a semicolon to denote single-line comments,
> DCS can include embedded C code, C code uses semicolons to mark the ends
> of statements; any semicolons between the INSERT_C_CODE and END_C_CODE
> should not be treated as DCS comments, but an entire
> INSERT_C_CODE...END_C_CODE block that is commented out in DCS *should*
> be treated as a DCS comment.

Well, actually this needs "context" support, which GeSHi 1.0.X lacks
but 1.1.X introduces. As a result, the language files of the two
branches are incompatible.

> (Another example I can think of is that you can embed assembly language
> (in Intel syntax) in emulated BBC BASIC on the PC. I can dig out the
> program at some point if needs be.)

I have enough sources myself that mix languages ...

> But in the absence of such functionality, I've been trying ways to
> simply mark the embedded C code so that it is not treated like DCS code
> (although the DCS IDE itself completely fails to do this and marks it up
> as though it were DCS code - an interesting effect to see given that in
> DCS a semicolon is used for comments and in C it is used to end each
> statement).
> I have tried using regular expressions, multiline comments,
> and a combination of the two, but nothing is working in the correct way.
> Here are the results of my attempts so far...
>
>
> Using regular expressions:
>
> 'REGEXPS' => array(
>     0 => array(
>         GESHI_SEARCH => '(INSERT_C_CODE)(.*?)(END_C_CODE)',
>         GESHI_REPLACE => '\\2',
>         GESHI_MODIFIERS => 'si',
>         GESHI_BEFORE => '\\1',
>         GESHI_AFTER => '\\3'
>     )
> ),
>
> - 'INSERT_C_CODE' is correctly treated as a keyword for the purposes of
> applying a style, but 'END_C_CODE' is not.
> - Doesn't work if 'INSERT_C_CODE' and 'END_C_CODE' are on different
> lines. I've not used PHP before, so perhaps there's something better
> than (.*?) to denote any string of characters optionally split over
> multiple lines.
> - Styles are still applied to code identified as DCS keywords or comments.
>

REGEXPS are applied to non-string parts, i.e. as a last resort.
COMMENT_REGEXP entries are handled even before (multiline) comments.

>
> Using multiline comments:
>
> 'COMMENT_MULTI' => array('INSERT_C_CODE' => 'END_C_CODE'),
> or
> 'COMMENT_MULTI' => array('insert_c_code' => 'end_c_code'),
>
> - This is case insensitive on 'INSERT_C_CODE' (as required) but case
> sensitive on 'END_C_CODE', only matching the exact expression. If trying
> to use both of the above expressions in an array, only the first is
> applied (and I wouldn't want to have to include every case variation
> anyway: 8 letters in 'END_C_CODE' would be 256 expressions).
> - The two keywords 'INSERT_C_CODE' and 'END_C_CODE' are treated as part
> of the comment and styles are applied as though they were commented out.
> Rather we want them to be treated as keywords.
>

Use COMMENT_REGEXP. That's what it's for ;-)

> Using BOTH multiline comments and regular expressions:
>
> - Multiline comments take precedence, and the regular expression is
> ignored. This is true even if the regular expression definition is above
> the multiline comment definition in the language file.

See above.

> If anyone has a way for me to mark the embedded C code in some way so
> that it is not treated as DCS code, I would be grateful for the
> assistance. Primarily I just want to have it marked in a single style so
> that it isn't treated like DCS code. If there's a way to correctly mark
> up the C code as well, that would be fantastic.

See above.

> Thanks,
> Stelio.
>

Regards,
BenBE.
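
P.S. A minimal sketch of what the COMMENT_REGEXP suggestion could look like in a 1.0.X language file. This is an illustration only, not the actual dcs.php: the key number 2, both style strings and the sample input are assumptions, and all other language-file keys are left out. The pattern is an ordinary PCRE expression with its own delimiters; the `s` modifier lets `.` span line breaks and `i` makes both markers case insensitive.

```php
<?php
// Hypothetical fragment of a GeSHi 1.0.X language file for DCS.
// Only the comment-related keys are shown; key 2 and the styles are
// illustrative choices, not taken from the real dcs.php.
$language_data = array(
    'LANG_NAME' => 'DCS',
    // DCS single-line comments start with a semicolon.
    'COMMENT_SINGLE' => array(1 => ';'),
    // Mask the embedded C block as one "comment" so that its contents
    // are not parsed as DCS code.
    'COMMENT_REGEXP' => array(
        2 => '/\bINSERT_C_CODE\b.*?\bEND_C_CODE\b/si'
    ),
    'STYLES' => array(
        'COMMENTS' => array(
            1 => 'color: #808080; font-style: italic;',       // DCS comments
            2 => 'color: #000000; background-color: #f0f0f0;' // embedded C
        )
    )
);

// Quick check of the pattern outside GeSHi, on made-up input.
$sample = "x = 1 ; DCS comment\ninsert_c_code\nint i = 0;\nEnd_C_Code\ny = 2\n";
preg_match($language_data['COMMENT_REGEXP'][2], $sample, $m);

// Prints the three-line block from insert_c_code to End_C_Code.
echo $m[0], "\n";
```

With this approach the two marker words sit inside the masked span, so they get style 2 rather than a keyword style, and the pattern does not try to detect a block that has itself been commented out with a semicolon.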