To Menno Vogels: Update of the ASM parser

2013-12-02
2013-12-11
  • THEVENOT Guy

    THEVENOT Guy - 2013-12-02

    Hello Menno,

    Maybe it would be better to update your first contribution, the ASM parser that you posted in the general thread FunctionList Parsers: Additions & Suggestions, to its latest state, following our discussion about the right comment regex !

    So, the latest ASM parser should be as below :

    <association langID = "32" id="ASM_function" />
    
    <parser
        id="ASM_function" displayName="Assembly" commentExpr="(?<=;).*?" >
        <function
            mainExpr="^[\t ]*[A-Za-z_$][A-Za-z0-9_$]*:"
            displayMode="$functionName" >
            <functionName>
                <nameExpr expr="[A-Za-z_$][A-Za-z0-9_$]*(?=:)" />
            </functionName>
        </function>
    </parser>
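
As a quick sanity check of the two expressions above, here is a minimal sketch that runs them through Python's `re` module. Python's engine is only a stand-in and is not the engine FunctionList uses; the helper `find_labels` and the sample source are hypothetical and exist only for this illustration.

```python
import re

# Patterns copied from the ASM parser above. Python's `re` is used as a
# stand-in engine here (an assumption, not what FunctionList runs).
MAIN_EXPR = re.compile(r"^[\t ]*[A-Za-z_$][A-Za-z0-9_$]*:", re.MULTILINE)
NAME_EXPR = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?=:)")


def find_labels(source):
    """Apply mainExpr to the text, then nameExpr to each main match."""
    labels = []
    for match in MAIN_EXPR.finditer(source):
        name = NAME_EXPR.search(match.group(0))
        if name:
            labels.append(name.group(0))
    return labels


# Hypothetical assembly snippet: two real labels, plus one label-like
# token that only appears inside a trailing comment.
SAMPLE = """\
start:
    mov ax, 1      ; not_a_label: inside a comment
\t_loop$1:
    dec ax
    jnz _loop$1
"""

print(find_labels(SAMPLE))  # ['start', '_loop$1']
```

In this sketch the label-like text inside the comment is skipped simply because mainExpr is anchored at the start of the line, independently of commentExpr.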
    

    Many thanks for your involvement and for all the tests you ran to work out the various parsers :-)

    Cheers,

    guy038

    P.S. :

    I suppose it would be useful to note, in comments, for each parser created :

    • The name of the author
    • The date of the version
    • A version number
     
    Last edit: THEVENOT Guy 2013-12-02
  • Menno Vogels

    Menno Vogels - 2013-12-02

    Hi guy038,

    Good point, I will add my latest Assembly parser to the FunctionList Parsers: Additions & Suggestions thread.

    The question in the thread that led to our discussion was not specifically about the correct comment regex, and the assembly parser was only intended as an example.

    The question was about the flavor/type of regex-engine used in FunctionList as it directly determines which tokens are and which are not supported.

    Regards, Menno

     
  • THEVENOT Guy

    THEVENOT Guy - 2013-12-04

    Hi Menno,

    Oh yes ! In the meantime, I had forgotten your main point on that topic ! But we could try to find out, by ourselves, which flavour is used, by testing various forms of regexes with the help of Jan Goyvaerts's excellent regex reference site, at the address below :

    http://www.regular-expressions.info/reference.html

    Just click on a regex category from Regex Reference, on the left part of the screen.

    Tomorrow, I'll begin some tests with .INI files, based on my tiny parser for .INI sections, described at the END of my previous post below :

    http://sourceforge.net/p/notepad-plus/discussion/331753/thread/627d7ac2/#9ff7

    Maybe we'll be able to narrow down the number of possible flavours ?

    See you soon,

    guy038

     
    Last edit: THEVENOT Guy 2013-12-04
  • Menno Vogels

    Menno Vogels - 2013-12-05

    FYI, I have been using the following text as .INI file contents for my parser test:

    ; comment text starts with a semi-colon at the start of the line.
    
    ;also valid comment text (Note: no space after semi-colon).
    
    # some implementations also accept the number-sign (#) as 
    #comment start indicator.
    
    ; use multi single line comment texts to create a
    ; multi line
    ; comment
    ; text.
    
    ; 'global' keys/properties can be defined before the start of 
    ; any section.
    key_name=key_value
    ; a.k.a.
    proprty_name=property_value
    
    ; trailing line comment is not supported, it will be interpreted 
    ; as part of the value in e.g.
    name=value    ; example
    ; the value will equal "value    ; example" (excluding the quotes)
    
    ; let's define a section
    [section]
    key_name=key_value
    ; a.k.a.
    proprty_name=property_value
    
    @=default_value (used in dot-reg files)
    "name"=value
    "name"="value"
    'name'=value
    'name'='value'
    
    ; A left/opening square bracket ([) indicates the start of new
    ; section and the end of the previous section.
    ; Nesting of sections is not supported, but can be emulated as
    ; follows:
    [section\sub_section\sub_sub_section]
    name=value
    
    ; or
    
    [section.sub_section.sub_sub_section]
    name=value
    
    ; or
    
    [another_section]
    sub_section.sub_sub_section.name=value
    
    ; End of File automatically defines end of last section.
    

    Regards, Menno

     
    Last edit: Menno Vogels 2013-12-05
  • THEVENOT Guy

    THEVENOT Guy - 2013-12-11

    Hello Menno,

    Sorry, but I wasn't at home this weekend, and other things have kept me away from the computer and N++ :(

    After some tests with your .INI test file, I could establish a few facts :


    Comments, sections, keys and properties can each optionally be preceded by any number of these four characters : Vertical Tabulation ( \x0B ), Form Feed ( \f ), Space or Tabulation ( \t )


    The simplest form to match the sections of an .INI file could be ( Case A ) :

    <parser id="ini_section" displayName="INI Section" commentExpr="^[\t \f\x0b]*[#;].*?$">
        <function
            mainExpr="^[\t \f\x0b]*\[\K.+?(?=\]$)"
            displayMode="$functionName">
        </function>
    </parser>
    

    Just note that the commentExpr expression isn't strictly necessary here !

    A more elaborate form, using the functionName, could be ( Case B ) :

    <parser id="ini_section" displayName="INI Section" commentExpr="^[\t \f\x0b]*[#;].*?$">
        <function
            mainExpr="^[\t \f\x0b]*\[.+?\]$"
            displayMode="$functionName">
            <functionName>
                <nameExpr expr="\[\K.+(?=\]$)"/>
            </functionName>
        </function>
    </parser>
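
To see what Case B picks up, here is a minimal sketch in Python. Python's `re` module has no `\K`, so the name pattern below swaps in the fixed-width lookbehind `(?<=\[)` instead; that swap, the helper `find_sections` and the sample text are assumptions of this sketch, and Python's engine is not the one FunctionList uses.

```python
import re

# mainExpr is copied from Case B. The name pattern replaces \K (not
# available in Python's `re`) with an equivalent one-character lookbehind.
MAIN_EXPR = re.compile(r"^[\t \f\x0b]*\[.+?\]$", re.MULTILINE)
NAME_EXPR = re.compile(r"(?<=\[).+(?=\]$)")


def find_sections(text):
    """Return the section names found by mainExpr followed by nameExpr."""
    return [
        NAME_EXPR.search(match.group(0)).group(0)
        for match in MAIN_EXPR.finditer(text)
    ]


# A few lines in the spirit of the .INI test file earlier in this thread.
SAMPLE = r"""; let's define a section
[section]
key_name=key_value
  [section\sub_section\sub_sub_section]
name=value
; [commented_out]
"""

print(find_sections(SAMPLE))
# ['section', 'section\\sub_section\\sub_sub_section']
```

The commented-out header is skipped here because the line starts with a semicolon, so mainExpr never reaches the opening bracket.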
    

    With the help of Goyvaerts's site, at the address below :

    http://www.regular-expressions.info/refadv.html

    I tried to find out the right flavour of the FunctionList parser. But, no luck ! It's really not clear at all :((

    For example, since the \K form obviously works, it should be one of these 6 flavours : Perl from 5.10, PCRE from 7.2, PHP from 5.2.4, Delphi, R or Ruby from 2.0

    But, as all the alternatives inside a lookbehind must have the same length, the only possible flavour seems to be Perl !?

    For example, in case B above, the regex of nameExpr can be written :

    <nameExpr expr="(?<=\[|§).+(?=\]$)"/>
    

    But the syntax below doesn't work at all :

    <nameExpr expr="(?<=\[|§§).+(?=\]$)"/>
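
For comparison only, Python's `re` module has the same restriction: it accepts a lookbehind whose alternatives all have the same width and rejects one whose alternatives differ. The sketch below shows this with the two nameExpr variants; it says nothing definite about which engine FunctionList uses, and the helper `compiles` is hypothetical.

```python
import re


def compiles(pattern):
    """Return True if Python's `re` accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The two nameExpr variants discussed in this post.
SAME_WIDTH = r"(?<=\[|§).+(?=\]$)"    # both alternatives are 1 character
MIXED_WIDTH = r"(?<=\[|§§).+(?=\]$)"  # alternatives are 1 and 2 characters

print(compiles(SAME_WIDTH))   # True
print(compiles(MIXED_WIDTH))  # False: look-behind requires fixed-width pattern
print(re.search(SAME_WIDTH, "[section]").group(0))  # section
```

So the behaviour reported here is shared by at least one engine outside the six listed for `\K`, which is one reason this test alone may not be enough to pin down the flavour.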
    

    Moreover, unlike in case B, the use of a simple lookbehind in the mainExpr of case A, shown below, does NOT work either. Strange, isn't it !?

    mainExpr="^[\t \f\x0b]*(?<=\[).+?(?=\]$)"
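
One plausible reading of this failure, offered as a guess rather than a confirmed explanation: the lookbehind is checked right after `^[\t \f\x0b]*`, so the character just before that position is either one of the four blank characters or a line break, never an opening bracket. Under that reading the expression cannot match in any engine. The sketch below illustrates it with Python's `re` as a stand-in; the `WORKING` pattern is a hypothetical alternative that consumes the bracket and captures the name.

```python
import re

# The mainExpr from above: the lookbehind is tested immediately after the
# optional leading blanks, where the previous character is never '['.
FAILING = re.compile(r"^[\t \f\x0b]*(?<=\[).+?(?=\]$)", re.MULTILINE)

# Hypothetical alternative: consume the bracket, then capture the name.
WORKING = re.compile(r"^[\t \f\x0b]*\[(.+?)(?=\]$)", re.MULTILINE)

SAMPLE = "[section]\n  [another_section]\n"

print(FAILING.findall(SAMPLE))  # []
print(WORKING.findall(SAMPLE))  # ['section', 'another_section']
```

In case B the lookbehind is not anchored to the start of the line, so the engine is free to try it one character after the bracket, which would explain why it works there.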
    

    So I think that determining the right flavour will be quite difficult :(


    As an exercise, here's a version that finds BOTH sections and the name part of all name=value expressions, provided the name doesn't begin with a single or double quote :

    <parser id="ini_section" displayName="INI Section" commentExpr="^[\t \f\x0b]*[#;].*?$">
        <function
            mainExpr="^[\t \f\x0b]*(\[.+?\]|[\w.]+=.+?)$"
            displayMode="$functionName">
            <functionName>
                <nameExpr expr="\[\K.+(?=\]$)|[\w.]+(?==)"/>
            </functionName>
        </function>
    </parser>
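
The combined version can be tried out the same way. This is again a minimal sketch with Python's `re` as a stand-in engine: `\K` is replaced by the lookbehind `(?<=\[)` because Python does not support it, and the helper `find_entries` and the sample text are hypothetical.

```python
import re

# mainExpr is copied from the parser above; in the name pattern, \K is
# replaced by a one-character lookbehind (Python's `re` has no \K).
MAIN_EXPR = re.compile(r"^[\t \f\x0b]*(\[.+?\]|[\w.]+=.+?)$", re.MULTILINE)
NAME_EXPR = re.compile(r"(?<=\[).+(?=\]$)|[\w.]+(?==)")


def find_entries(text):
    """Return section names and key names, in document order."""
    names = []
    for match in MAIN_EXPR.finditer(text):
        name = NAME_EXPR.search(match.group(0))
        if name:
            names.append(name.group(0))
    return names


# Lines borrowed from the .INI test file earlier in this thread.
SAMPLE = """\
; comment
key_name=key_value
[another_section]
sub_section.sub_sub_section.name=value
"name"=value
@=default_value
"""

print(find_entries(SAMPLE))
# ['key_name', 'another_section', 'sub_section.sub_sub_section.name']
```

The quoted name and the `@=` entry are left out, as intended, because neither line starts with a character from `[\w.]`.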
    

    Best Regards,

    guy038

     
    Last edit: THEVENOT Guy 2013-12-11
