

pymlreplace - very slow

  • Good day,

    I started using Python scripts some weeks ago and I really like them. Now my problem:

    The function pymlreplace freaks out when I use it to replace every single \r\n with a long string including some other \r\n, for example with:

    editor.pymlreplace(r'\r\n', '</seg>\r\n</tuv>\r\n</tu>\r\n<tu creationdate=\"20110105T121031Z\" creationid=\"NICOLAS-TM\">\r\n<tuv lang=\"EN-US\">\r\n<seg>')

    If I use that, then only every second line will be taken into account.

    So, since that didn't work, I split the long replacement into several editor.pymlreplace calls, as below. That does work, but it is then very, very slow (about 100 lines per second for each editor.pymlreplace…). At that speed, I would be better off doing it without scripts. Still, my aim is to optimize my workflow, and I would like to use Python for scripts…

    editor.pymlreplace(r'\r\n', r'</seg>\r\n')
    editor.pymlreplace(r'(\r\n)', r'\1</tuv>\r\n')
    editor.pymlreplace(r'(</tuv>\r\n)', r'\1</tu>\r\n')
    editor.pymlreplace(r'(</tu>\r\n)', r'\1<tu creationdate=')
    editor.pymlreplace(r'creationdate=', 'creationdate="20110105T121031Z\"')
    editor.pymlreplace(r'(\d{6}Z\")', r'\1 creationid="NICOLAS-TM">\r\n<tuv lang="EN-US">\r\n<seg>')

    Is there a solution for this? Does PythonScript limit the length of the string that can be handled in a replace? Or could I speed up the process by lifting some memory limits?

    Thank you in advance.

  • There's a new version of PythonScript coming out this weekend that fixes some bugs with pymlreplace. That will fix the "freaking out". It *should* be a bit quicker too, but you'll have to see how it performs for you.

    The only limitation is the amount of available memory - I believe a Python string can be 2 GB in size, and Scintilla would freak way before then! The current version, due to Python internals, makes two copies of the text. The new version will make one copy.

    Incidentally, it looks like you're replacing XML - you might want to look at XSLT as a quicker and more reliable form for this kind of replacement.

    As for performance, see how you go with the new version (out today or tomorrow), but other than that there's not too much we can do, as the loop is already running at "C" speed. However, obviously, when you're using the fixed version, you'll be able to do it in one call, which will be much faster.
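
    To illustrate what the single-call version amounts to, here is a minimal sketch using only Python's standard re module on an ordinary string. It is not the pymlreplace implementation; the names SEGMENT_BREAK and split_segments are made up for this example, and the replacement text is the one from the original post.

```python
import re

# Replacement text from the original post, inserted at every line break.
# SEGMENT_BREAK and split_segments are hypothetical names for this sketch.
SEGMENT_BREAK = (
    '</seg>\r\n</tuv>\r\n</tu>\r\n'
    '<tu creationdate="20110105T121031Z" creationid="NICOLAS-TM">\r\n'
    '<tuv lang="EN-US">\r\n<seg>'
)


def split_segments(text):
    """Replace every \\r\\n in text with SEGMENT_BREAK in a single pass."""
    # A function is used as the replacement so that re.sub inserts
    # SEGMENT_BREAK literally, without interpreting backslash escapes.
    # re.sub scans the original text only, so the \r\n sequences inside
    # SEGMENT_BREAK are not matched again.
    return re.sub(r'\r\n', lambda match: SEGMENT_BREAK, text)


sample = 'first line\r\nsecond line\r\nthird line'
converted = split_segments(sample)
```

    Because the whole replacement happens in one scan, the text is traversed once instead of six times. Inside Notepad++ the same function could presumably be applied to the buffer contents (for example via editor.getText() and editor.setText()), but that part is an assumption and untested here.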