Intricacies of Regex Replace via Python Script

Help
2020-05-31
2020-06-04
  • mazeckenrode

    mazeckenrode - 2020-05-31

    I’m new to Python Script (and Python itself), but I’ve been using Notepad++ for about 10 years and regular expressions (mostly in basic but occasionally semi-advanced ways) for about 4 years. I’ve recorded a few regex-replace macros with Notepad++’s built-in macro recorder, but as everybody here no doubt knows, they are absolute bears to understand and edit, so I looked for other options and decided to give Python Script a try. I’m running into problems, though, even though I don’t think I’ve tried anything particularly complicated. Example:

    I often export folder content lists as text from my file manager of choice, Directory Opus, usually for folders containing multiple images (jpg, png, etc.). DOpus lets me print additional information about the files, and I choose to have it print file size and resolution, which are separated from the filename and from each other by tabs:

    Test.png	398	740 x 2065 x 1

    For my usual purposes, I want the file list entries to look like this:

    “Test.png” (398) [740 x 2065 x 1]

    Using NPP’s built-in replace dialog, I’ve successfully accomplished what I wanted with the following strings:

    Find: ^(.*)?\.(jpg|jpeg|png)\t(.*)\t(.*)

    Replace: “\1.\2” \(\3\) [\4]

    When I recorded that using NPP’s built-in macro recorder, it recorded the replace string as follows:

    “\1.\2” \(\3\) [\4]

    Now trying to translate the above strings for use via Python Script. The sample script “Python Regex Replacements.py” that came with Python Script contains the following example code:

    editor.rereplace(r"([A-Z]{3})\1", r"\1")

    While the Python Script help does include the following example in the Introduction section:

    editor.pyreplace(r"^Code: ([A-Z]{4,8})", r"The code is \1")

    …in Editor Object > Helper Methods, it states the following:

    Editor.pyreplace(search, replace[, count[, flags[, startLine[, endLine]]]]) This method has been removed from version 1.0. It was last present in version 0.9.2.0

    In any case, I tried translating my own find and replace strings for Python Script use as follows:

    editor.rereplace(r"^(.*)?\.(jpg|jpeg|png)\t(.*)\t(.*)",r" “\1.\2” \(\3\) [\4]")

    That gave me the following result:

    “Test.png” (398) [740 x 2065 x 1]

    I also tried it WITHOUT the ‘r’ preceding the opening double-quote marks, with result as shown:

    “SOH.STX” (ETX) [EOT]

    (SOH/STX/ETX/EOT being control characters represented in NPP by those letter combos in white, on a black background.)
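The control characters are consistent with how Python itself reads string literals, before any regex engine sees them. The following is a minimal sketch in plain Python (it does not use Python Script or `editor`; the variable names are only illustrative) showing that `"\1"` in a non-raw literal is an octal escape for the single character U+0001 (SOH), and that the `u` prefix does not change this:

```python
# A non-raw literal turns "\1" into an octal escape: one character, U+0001 (SOH).
plain = "\1.\2"

# The u prefix only marks the literal as Unicode; escapes are still processed.
unicode_plain = u"\1.\2"

# The r prefix keeps each backslash, so a regex engine receives the text \1.\2
raw = r"\1.\2"

print([hex(ord(c)) for c in plain])   # ['0x1', '0x2e', '0x2']
print(plain == unicode_plain)         # True
print(list(raw))
```

Only the raw form still contains the backreferences `\1` and `\2` by the time it is handed to a replace function.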

    Note that most of the text I’m processing is plain ANSI, and typographical single- and double-quote marks are included in the ANSI character set. Nevertheless, I did find a reference in the Python Script help to using ‘u’ to designate Unicode strings, and I’d like my script to still work for the occasional Unicode text file, so I tried this code:

    editor.rereplace(u"^(.*)?\.(jpg|jpeg|png)\t(.*)\t(.*)",u" “\1.\2” \(\3\) [\4]")

    Result:

    “SOH.STX” (ETX) [EOT]

    So I finally got the typographical quote marks to come through, but the filename, size and resolution are still bungled. What am I doing wrong?
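As a way to check the search and replace strings outside Notepad++, here is a hedged sketch that uses the standard `re` module as a stand-in for `editor.rereplace` (an assumption: the two engines are not identical). The names `PATTERN`, `REPLACEMENT` and `format_listing` are hypothetical. One known difference is that Python's `re` does not need the parentheses in the replacement escaped, so `\(` and `\)` are written as plain `(` and `)` here.

```python
import re

# Raw string: the backslashes reach the regex engine untouched.
# re.MULTILINE lets ^ and $ match at the start and end of every line.
PATTERN = re.compile(r"^(.*)\.(jpg|jpeg|png)\t(.*)\t(.*)$", re.MULTILINE)

# Raw replacement: \1..\4 stay as backreferences instead of control characters.
REPLACEMENT = r"“\1.\2” (\3) [\4]"


def format_listing(text):
    """Rewrite each 'name.ext<TAB>size<TAB>resolution' line; leave other lines alone."""
    return PATTERN.sub(REPLACEMENT, text)


sample = "Test.png\t398\t740 x 2065 x 1"
print(format_listing(sample))  # “Test.png” (398) [740 x 2065 x 1]
```

If the script runs under Python 2, the `ur"..."` prefix combines a Unicode literal with raw-string handling of `\1`; that prefix is a syntax error in Python 3, where a plain `r"..."` string is already Unicode.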


    Last edit: mazeckenrode 2020-05-31
    • Sasumner

      Sasumner - 2020-06-03

      I'm pretty sure you'll get some good help if you post your question on the "Community" site.
      The spam protection wouldn't let me put the link here.
      Look on Notepad++'s "question mark" menu for "Community"

       
      • mazeckenrode

        mazeckenrode - 2020-06-04

        Ok, thanks, I’ll try there.

         
