Python function list parser

Artfunkel
2013-12-12
2015-02-02
  • Artfunkel

    Artfunkel - 2013-12-12

    This took ages due to a total lack of debug output and deficient documentation (the result of classRange's mainExpr needs to be the WHOLE class block), but it's working now.

    <parser id="py_function" displayName="Python class" commentExpr="(#.*?$|'''.*?('''|\Z))">
        <classRange mainExpr="(?<=^class ).*?(?=\n\S|\Z)">
            <className>
                <nameExpr expr="\w+(?=[\(|:])"/>
            </className>
            <function mainExpr="(?<=def ).+?(?=:)">
                <functionName>
                    <funcNameExpr expr=".*"/>
                </functionName>
            </function>
        </classRange>
        <function mainExpr="(?<=def ).+?(?=:)">
            <functionName>
                <funcNameExpr expr=".*"/>
            </functionName>
        </function>
    </parser>
    

    The function list doesn't do nested classes, but otherwise the parser above should pick up everything!

    Insert the element into %appdata%\Notepad++\functionList.xml, then add this line to associationMap to register the association:

    <association langID="22" id="py_function"/>
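
The two registration steps above can be sketched in Python. This is only an illustration under assumptions: the sample layout (a `parsers` element next to `associationMap`) and the helper name `register_parser` are hypothetical stand-ins, and the parser element is a shortened placeholder. It works on an in-memory string, not on the real functionList.xml.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for functionList.xml; the real file's layout is assumed here.
SAMPLE = """<NotepadPlus>
  <functionList>
    <associationMap>
      <association langID="1" id="php_function"/>
    </associationMap>
    <parsers>
    </parsers>
  </functionList>
</NotepadPlus>"""

# Shortened placeholder parser. ElementTree is a strict XML parser, so a
# literal '<' inside an attribute value would have to be written as '&lt;'.
PARSER = '<parser id="py_function" displayName="Python class"/>'


def register_parser(xml_text, parser_xml, lang_id, parser_id):
    """Hypothetical helper: add a parser element and its association."""
    root = ET.fromstring(xml_text)
    assoc_map = root.find(".//associationMap")
    parsers = root.find(".//parsers")
    # Step 1: insert the parser element.
    parsers.append(ET.fromstring(parser_xml))
    # Step 2: register the association between language ID and parser ID.
    ET.SubElement(assoc_map, "association", {"langID": lang_id, "id": parser_id})
    return root


root = register_parser(SAMPLE, PARSER, "22", "py_function")
print(ET.tostring(root, encoding="unicode"))
```

Editing the file by hand, as described above, achieves the same result; the sketch only makes the two separate insertion points explicit.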
    
     
  • svenn

    svenn - 2014-01-29

    Thanks!

     
  • mitch rosefelt

    mitch rosefelt - 2014-02-16

    Thanks a million!
    Note: on Win7 x64, there is a FunctionList.xml in "C:\Program Files (x86)\Notepad++" and also in "...AppData\Roaming\Notepad++". Placing the code in the "Program Files" copy did not work, but the Roaming copy did.

     
  • Nutznieser

    Nutznieser - 2014-07-28

    There is a problem with Python classes that contain no functions.
    You can test it with the standard Python zipfile module. Hopefully someone has a solution to this problem.

    Here is a slice of the zipfile module. The following classes are ignored:

    class BadZipfile(Exception):
        pass
    
    class LargeZipFile(Exception):
        """
        Raised when writing a zipfile, the zipfile requires ZIP64 extensions
        and those extensions are disabled.
        """
    
     
  • itraveller

    itraveller - 2015-01-21

    Hi all,

    Here's a version of the parser configuration for Python, which includes:
    1) Strings are treated as comments
    2) Detection of empty and nested classes
    3) Ability to go to the class header

    <parser id="python_function" displayName="Python" commentExpr="#.*?$|(('|\x22){3}).*?(\1|\Z)|('|\x22).*?\4">
    <!-- "comments|doc strings|strings" -->
        <classRange mainExpr="^class\K\h.+?(?=\n(def|class)\h|\Z)">
        <!-- class ranges are limited via first level headers -->
            <className>
                <nameExpr expr="\h+\K.+?(?=:)"/>
            </className>
            <function mainExpr="\A\h*\K\h\w+|^\h+def\h+\K.+?(?=:)|^\h+\Kclass\h.+?(?=:)">
            <!-- "class link|nested functions|nested classes" -->
            <!-- leading space is left to place the class link at the top of the list -->
                <functionName>
                    <funcNameExpr expr=".+"/>
                </functionName>
            </function>
        </classRange>
        <function mainExpr="^\h*def\h+\K.+?(?=:)|^\h+\Kclass\h.+?(?=:)">
        <!-- upper and nested functions|nested classes -->
            <functionName>
                <nameExpr expr=".+"/>
            </functionName>
        </function>
    </parser>
    

    Limitations that come from the parser's executable code:
    1) Comments are ignored only for 'classRange'.
    2) Only top-level classes get their own separate lists.

    I hope this version will be useful.

     
    Last edit: itraveller 2015-02-03
  • itraveller

    itraveller - 2015-01-21

    Here's another variant that displays the nesting structure:

    <parser id="python_function" displayName="Python" commentExpr="#.*?$|(('|\x22){3}).*?(\1|\Z)|('|\x22).*?\4"> <!-- "comments|doc strings|strings" -->
        <classRange mainExpr="^class\h+\K.+?(?=\n(def|class)\h|\Z)"> <!-- class ranges are limited via first level headers -->
            <className>
                <nameExpr expr="\w+"/>
            </className>
            <function mainExpr="\A\w.*?(?=:)|^\h+(def|class)\h.+?(?=:)"> <!-- "class link|nested functions and classes" -->
                <functionName>
                    <funcNameExpr expr=".+"/>
                </functionName>
            </function>
        </classRange>
        <function mainExpr="^\h*(def|class)\h.+?(?=:)"> <!-- upper and nested functions and classes -->
            <functionName>
                <nameExpr expr=".+"/>
            </functionName>
        </function>
    </parser>
    
     
    Last edit: itraveller 2015-02-03
  • THEVENOT Guy

    THEVENOT Guy - 2015-02-01

    Hello itraveller,

    I don't know the Python language well enough to give you reliable feedback on your class and function regexes.

    However, for the regex used for comment detection, I found another regex, which seems to give better results than yours :-)


    First of all, I would like to point out a peculiarity of alternation in regular expressions:

    Consider the subject string below:

    '''1234567890 ABCDEFGHIJK'''....'12345'
    

    And note the different behaviour of the two regexes :

    • '.*?'|'''.*?'''

    • '''.*?'''|'.*?'

    Obviously, only the second one has the right behaviour!

    A second example :

    Let's suppose the simple text ABCDEF ABC DEF and the two regexes :

    • ABC|DEF|ABCDEF

    • ABCDEF|ABC|DEF

    The second regex is the only one that correctly finds the 3 strings ABCDEF, ABC and DEF.

    So, from these examples, we deduce that when an alternative is part of another alternative, it must be placed after it. Indeed, the regex engine tries the alternatives from left to right and keeps the first one that matches, so the longest string, ABCDEF, must be placed first to be detected!
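
Both examples can be reproduced outside Notepad++. A minimal sketch, assuming Python's `re` module as a stand-in for the Notepad++ engine (both try alternatives from left to right); the helper `all_matches` is hypothetical:

```python
import re


def all_matches(pattern, text):
    """Return every non-overlapping match, scanning left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


words = "ABCDEF ABC DEF"
short_first = all_matches(r"ABC|DEF|ABCDEF", words)  # ABCDEF is never reached
long_first = all_matches(r"ABCDEF|ABC|DEF", words)   # longest alternative first

quoted = "'''1234567890 ABCDEFGHIJK'''....'12345'"
single_first = all_matches(r"'.*?'|'''.*?'''", quoted)  # triple quotes get split
triple_first = all_matches(r"'''.*?'''|'.*?'", quoted)  # triple quotes stay whole

print(short_first)   # ['ABC', 'DEF', 'ABC', 'DEF']
print(long_first)    # ['ABCDEF', 'ABC', 'DEF']
print(single_first)  # ["''", "'1234567890 ABCDEFGHIJK'", "''", "'12345'"]
print(triple_first)  # ["'''1234567890 ABCDEFGHIJK'''", "'12345'"]
```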


    Secondly, as you know, in Python the two forms '''......''' and """......""" may also span several lines.

    With the PCRE version of the regex engine, included since v6.0 of Notepad++, we can use the four modifiers (?i), (?m), (?s) and (?x) and their negative forms (?-i), (?-m), (?-s) and (?-x).

    For now, we'll just consider the (?s) modifier (single line). Let's suppose we want to match the longest string between a 0 digit and a 9 digit. If, in addition,

    • the digits 0 and 9 must be on the SAME line, we'll use the regex 0.*9 or the regex (?-s)0.*9

    • the digits 0 and 9 may be on DIFFERENT lines, we'll use the regex (?s)0.*9
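
The effect of the modifier can be checked with a small sketch, assuming Python's `re` module as a stand-in for the Notepad++ engine. Python only accepts a flag change in the middle of a pattern in its scoped form, so `(?-s:...)` and `(?s:...)` are used here; the sample text is made up:

```python
import re

# Hypothetical sample: the first line holds both digits, the others only one each.
text = "a0b9c\nd0e\nf9g"

# With (?-s) the dot stops at a newline, so 0 and 9 must share a line.
same_line = re.findall(r"(?-s:0.*9)", text)

# With (?s) the dot also matches newlines, so the match may span lines.
across_lines = re.findall(r"(?s:0.*9)", text)

print(same_line)     # ['0b9']
print(across_lines)  # ['0b9c\nd0e\nf9']
```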


    Now, here is my regex for Python comments, which should be easy enough to understand given the previous hints!

    (?-s)#.*|(?s)(('|\x22){3}).*?(\1|\Z)|(?-s)('|\x22).*?\4

    This regex contains 3 alternatives :

    • (?-s)#.* matches any SINGLE-line sequence of characters beginning with the hash character (#)

    • (?s)(('|\x22){3}).*?(\1|\Z) matches:

      • Any MULTI or SINGLE-line sequence of characters enclosed between 3 single or double quotes

      • Any MULTI or SINGLE-line sequence of characters between 3 single or double quotes and the END of the current file

    • (?-s)('|\x22).*?\4 matches any SINGLE-line sequence of characters enclosed between single or double quotes
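
The three alternatives can be tried on a small sample. This is a sketch under assumptions: Python's `re` module stands in for the Notepad++ engine, and since Python rejects a global flag in the middle of a pattern, the (?s) part is written in the scoped form `(?s:...)` while the dot keeps its default single-line behaviour elsewhere. The helper `find_comments` and the sample strings are hypothetical:

```python
import re

# Same three alternatives; group numbers are unchanged, so \1 and \4 still apply.
COMMENT_RE = re.compile(
    r"#.*"                            # single-line comment
    r"|(?s:(('|\x22){3}).*?(\1|\Z))"  # triple-quoted block, possibly unterminated
    r"|('|\x22).*?\4"                 # single-line quoted string
)


def find_comments(source):
    """Return every span that the comment expression would skip."""
    return [m.group(0) for m in COMMENT_RE.finditer(source)]


sample = "x = 1  # note\ns = 'a#b'\n'''doc\nmore'''\n"
found = find_comments(sample)

# A single-quoted string split over two lines is not matched at all.
split_string = find_comments("'ABCDE\nGHIJK'")

# An unterminated triple-quoted block runs to the end of the text.
unterminated = find_comments("'''ABC#123\n456789DEFG\nHIJK\n")

print(found)
print(split_string)
print(unterminated)
```

Note that the `#` inside `'a#b'` does not start a comment, because the quoted-string alternative matches first at the position of the opening quote.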

    Below is a test example showing the differences in behaviour between my regex and yours:

    --------------------------------------------
    
    The 3 following cases are NOT matched
    
    "ABCDE
    GHIJK"
    
    'ABCDE
    GHIJK'
    
    ""ABCDEFGHIJK""
    ''ABCDEFGHIJK''
    
    --------------------------------------------
    
    ALL the following cases are MATCHED
    
    '''ABCDEFGHIJK'''
    """ABCDEFGHIJK"""
    
    '''ABCDEF#123456789GHIJK'''
    """ABCDEF#123456789GHIJK"""
    
    'ABCDEFGHIJK'
    "ABCDEFGHIJK"
    
    'ABCDEF#123456789GHIJK'
    "ABCDEF#123456789GHIJK"
    
    #'''ABCDEFGHIJK'''
    #"""ABCDEFGHIJK"""
    
    #'ABCDEFGHIJK'
    #"ABCDEFGHIJK"
    
    ''''''
    """"""
    
    '''
    '''
    
    """
    """
    
    '''ABC
    DEFG
    HIJK'''
    
    """ABC
    DEFG
    HIJK"""
    
    '''ABC#123
    456789DEFG
    HIJK'''
    
    """ABC#123
    456789DEFG
    HIJK"""
    
    This LAST case, at the END of the file
    
    '''ABC#123
    456789DEFG
    HIJK
    

    NOTE :

    To test my regex directly in Notepad++, I preferred to replace the form &quot; with the form \x22, which has the advantage of being accepted both by N++ and in the functionList.xml file :-)
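
The equivalence of the two forms can be illustrated with a short sketch, assuming Python's `xml.etree.ElementTree` and `re` modules as stand-ins for the XML loader and regex engine of Notepad++. The one-line `parser` elements and the sample text are made up for the demonstration:

```python
import re
import xml.etree.ElementTree as ET

# The same expression written with the XML entity and with the hex escape.
ENTITY_XML = r"""<parser commentExpr="('|&quot;).*?\1"/>"""
HEX_XML = r"""<parser commentExpr="('|\x22).*?\1"/>"""

# The XML parser turns &quot; into a literal double quote; \x22 is left as-is
# and is only interpreted later, by the regex engine.
entity_expr = ET.fromstring(ENTITY_XML).get("commentExpr")
hex_expr = ET.fromstring(HEX_XML).get("commentExpr")

sample = "say(\"hi\") or 'yo'"
entity_hits = [m.group(0) for m in re.finditer(entity_expr, sample)]
hex_hits = [m.group(0) for m in re.finditer(hex_expr, sample)]

print(entity_expr)  # ('|").*?\1
print(hex_expr)     # ('|\x22).*?\1
print(entity_hits == hex_hits)
```

Only the \x22 form can be pasted unchanged into a search field, since a search field does not decode XML entities.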

    So, your regex can be rewritten, as below :

    #.*?$|('|\x22).*?(\1)|((\1){3}).*?((\2)|\Z)

    Best Regards

    guy038

     
    Last edit: THEVENOT Guy 2015-02-01
    • itraveller

      itraveller - 2015-02-02

      Hi Guy038,

      Thank you for your valuable detailed comments!
      Unfortunately, my hasty tests gave me no reason to notice such obvious mistakes:

      • Searching for alternatives nested in one another: a classic of the genre! :\

      • As for multi-line mode, for some reason I was sure it was enabled by default. Apparently I just carried it over from a search string in the program itself. This remark is especially important for me.

      • Hexadecimal character escapes are really very useful for debugging in parallel. Thank you for sharing the experience!

      Now everything has fallen into place. Once again, thank you very much!
      I have made the necessary changes in my previous posts.

      Kind regards!

      itraveller