pymlreplace Workarounds

Help
JesterEE
2011-03-11
2013-01-25
  • JesterEE
    2011-03-11

    Hello all,

    Today I was writing a script to remove all the comment lines from a source code file.  I know this is silly, but so is a compiler that reports errors with line numbers that don't count comment lines, so I can't trace them.  I got it to work, but I had to use some workarounds that I wouldn't normally need with plain Python regex.  Here's my code:

    # Non-greedy deletion of comment lines
    # /* ... */   or   //
    # Tokens must start (and end) on a newline
    import re
    repl_str = "PYSCRIPT_ML_REPLACE_STRING"
    editor.beginUndoAction()
    editor.pymlreplace(r'(^/\*.*?^\*/)', repl_str, 0, re.DOTALL)
    editor.pyreplace(r'^' + repl_str + r'\r{0,1}?$\n', '', 0, Editor.INCLUDELINEENDINGS)
    editor.pyreplace(r'^//.*?\r{0,1}?$\n', '', 0, Editor.INCLUDELINEENDINGS)
    notepad.save()
    editor.endUndoAction()
    # Doesn't work - but this is a simpler and quicker operation
    # editor.pymlreplace(r'(^/\*.*?^\*/|^//.*?\r{0,1}?$)', '', 0, re.DOTALL)
    

    As you can see, for the multiline replace I need to put something in place of my block comment and then go back on a second pass and remove it from the file.  Because of this, I also need to remove the single-line comments in yet a third replace command.  I could probably condense the 2nd and 3rd replace, but I would rather get rid of all 3 and go with the one I have commented out.  I have 2 issues with that pymlreplace line:
    1. For some reason it doesn't catch all the lines that start with //, just a majority of them.
    2. Replacing with an empty string will yield a NUL character in NPP when used in pymlreplace.  Further, NPP will go nuts inserting NUL characters everywhere and tie up the memory and CPU.  I could understand that, but the pyreplace command doesn't do the same thing with the empty string, so I'm just confused.

    Any ideas would be greatly appreciated,
    -JesterEE

    PS. Here are some test cases:

    // 
    // Comment Line
    //
    //
    var a; // Comment but not removed
    /*
    Block Comment Line 2
    Block Comment Line 3
    Block Comment Line 4
    Block Comment Line 6
    */
    //                      One more for good measure
    
     
  • Hi,

    Whoa.  Thanks for reporting this.  This was a bug in pymlreplace() (actually, 2 bugs), that caused the offsets to be invalid, and potentially caused the never-ending NUL character that you saw.

    I've fixed the bugs, and uploaded a testing version to http://github.com/davegb3/PythonScript/downloads  (0.9.0.2RC). 

    Your regex needs to be tweaked a little bit to get it to do exactly what you want, I think.  I'm not quite sure what "\r{0,1}?" means - that just says "\r maybe maybe" doesn't it? "\r?" should do fine.
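For reference, in Python's standard `re` module a trailing `?` after `{0,1}` makes the quantifier lazy rather than doubly optional. The sketch below uses plain `re` calls outside Notepad++ (the sample strings are made up for illustration) to show that the two spellings differ in isolation but give the same match once `$\n` follows, so "\r?" is indeed sufficient here.

```python
import re

# Hypothetical sample line with a Windows line ending.
line = "// comment\r\n"

# In isolation the lazy form prefers to match nothing, the greedy form takes the \r.
lazy_alone = re.match(r'//\r{0,1}?', "//\r").group()
greedy_alone = re.match(r'//\r?', "//\r").group()

# Followed by $\n, both are forced to consume the \r, so the matches are identical.
lazy = re.search(r'^//.*?\r{0,1}?$\n', line, re.MULTILINE).group()
simple = re.search(r'^//.*?\r?$\n', line, re.MULTILINE).group()

print(repr(lazy_alone), repr(greedy_alone))
print(lazy == simple)
```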

    As $ is a zero-width assertion, i.e. it sees the \n but doesn't "eat" it, you need a \n after the $ to ensure that the \n gets removed as well.  You then need a ? after the \n to cope with the last line, which obviously doesn't have a \r or \n on the end of it, but does have a "$" - as it's the end of the "string".

    Hope that makes sense :)
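The point about $ and \n can be tried outside Notepad++ with Python's standard `re` module. This is only a sketch: the sample text is invented, and `re.MULTILINE` is passed explicitly so that ^ and $ work per line.

```python
import re

# Invented sample: two comment lines, then a code line with no trailing newline.
text = "// first\n// second\nvar a;"

# '$' only asserts the end of the line; the newline itself is left behind.
without_newline = re.sub(r'^//.*?$', '', text, flags=re.MULTILINE)

# '$\n?' also consumes the line break; the '?' lets a final line without '\n' match.
with_newline = re.sub(r'^//.*?$\n?', '', text, flags=re.MULTILINE)

# A comment on the very last line has no '\n', but '$' still matches at end of string.
last_line = re.sub(r'^//.*?$\n?', '', "var a;\n// last", flags=re.MULTILINE)

print(repr(without_newline))
print(repr(with_newline))
print(repr(last_line))
```

Without the \n, the comment text goes away but the empty lines stay; with it, the lines are removed entirely.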

    For me,

    editor.pymlreplace(r'(^/\*.*?^\*/|^//.*?\r?$\n?)', '', 0, re.DOTALL)
    

    works fine in this testing version.  I'd be really grateful if you could check this out and confirm it works for you.

    Many many thanks for reporting this, and sorry you got caught by a bug.  I'll add your sample as a unit test, then it shouldn't come up again.

    Cheers,
    Dave.

     
  • JesterEE
    2011-03-12

    Dave,

    Thanks for looking into this, and I'm glad I can help the project.  I really like Python, and if I could use 1 dynamic programming language EVERYWHERE, I would.  You help me get one step closer to my dream and for that I thank you. :)

    Ok, so I just tried your version 0.9.0.2RC and I am still getting the same results with the NUL characters and having to force close NP++.  I'm not sure what else I can tell you to help debug further, so if there is something specific, please let me know.

    Here are my system specs.
    Windows XP SP3 in a VMWare Player Virtual Machine
    Notepad++ V5.8.7 Unicode

    I will try this on a Windows 7 Host PC later this weekend.

    I also took your above recommendations, tweaked a little on my own, and modified the regex with this:

    editor.pymlreplace(r'(^/\*.*?^\*/\s*\r?$\n?|^//.*?\r?$\n?)', '', 0, re.DOTALL)
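The same pattern can be checked outside Notepad++ with plain `re.sub`. This is a rough sketch, not the plugin's implementation: it assumes `pymlreplace` treats ^ and $ as per-line anchors (hence the explicit `re.MULTILINE`), and `strip_comment_lines` is a hypothetical helper name. The sample is a shortened version of the test cases from the first post.

```python
import re

# Pattern from the post above: block comments whose delimiters start a line,
# or whole-line // comments, each including the trailing line break.
COMMENT_LINES = re.compile(
    r'(^/\*.*?^\*/\s*\r?$\n?|^//.*?\r?$\n?)',
    re.MULTILINE | re.DOTALL,  # MULTILINE is an assumption about pymlreplace
)

def strip_comment_lines(source):
    """Hypothetical helper: remove whole-line comments from source text."""
    return COMMENT_LINES.sub('', source)

sample = (
    "// \n"
    "// Comment Line\n"
    "//\n"
    "var a; // Comment but not removed\n"
    "/*\n"
    "Block Comment Line 2\n"
    "Block Comment Line 3\n"
    "*/\n"
    "//                      One more for good measure\n"
)

cleaned = strip_comment_lines(sample)
print(repr(cleaned))
```

Only the code line survives, with its trailing comment intact, because that comment does not start at the beginning of a line.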
    

    As a side note, I would recommend adding a blurb to the documentation for Editor.pysearch and Editor.pyreplace on the preferred syntax for compound regular expressions.  Something along the lines of: the whole regex must be contained in a single string, as shown above.  If I do this instead (which someone might think is legal Python syntax), it will not do anything:

    editor.pymlreplace('^/\*.*?^\*/|^/\#.*?^\#/', repl_str, 0, re.DOTALL) # Will Work
    editor.pymlreplace('^/\*.*?^\*/' | '^/\#.*?^\#/', repl_str, 0, re.DOTALL) # Does Nothing
    

    -JesterEE

     
  • JesterEE
    2011-03-12

    On my side note, the latter example will raise an error in the Console, as strings don't support the OR (|) operator.  So someone debugging will see the error, but it's not apparent when you first dive in.
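A small sketch of the difference, using only standard Python: applying | to two strings raises a TypeError, while joining the alternatives into one pattern string works. The variable names and the sample text are made up for illustration, and `re.MULTILINE` is passed explicitly because this runs with plain `re` rather than `pymlreplace`.

```python
import re

block_comment = r'^/\*.*?^\*/'
hash_comment = r'^/\#.*?^\#/'

# The | operator is not defined between two str objects.
try:
    combined = block_comment | hash_comment
except TypeError as exc:
    error_message = str(exc)

# Alternation has to live inside a single pattern string.
combined = "|".join([block_comment, hash_comment])
pattern = re.compile(combined, re.MULTILINE | re.DOTALL)

# Invented sample with one comment block of each style.
sample = "/*\na\n*/\n/#\nb\n#/\nkeep\n"
result = pattern.sub('', sample)

print(error_message)
print(repr(result))
```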

    -JesterEE

     
  • Hum… odd.  Your modified regex works perfectly for me (and correctly removes the extra line break).

    Just a stupid question, can you double check the about box says 0.9.0.2?  Assuming it does, could you test it on your example code in your original post?   If that works, maybe try to dig out a section from your real code that doesn't?

    I've tried a few combinations, and they all work for me with the new version. 

    Many thanks,
    Dave.

     
  • JesterEE
    2011-03-12

    I feel silly … it does indeed work with the updated version.  It looks like I had a redundant installation in my directory and I replaced the wrong PythonScript.dll.  All fixed now!

    Thanks Dave!
    -JesterEE

     
  • No worries.  Good news it's fixed.  I'll see what else I can double check and then release it.

    Thanks again,
    Dave.