

#371 Performance fix for XmlTagsMatcher from v6+

milestone: Next_release
status: closed
owner: Don HO
labels: UI (101)
priority: 9
updated: 2012-06-10
created: 2012-05-12
private: No

As reported on the 6.1.2 announcement thread (https://sourceforge.net/projects/notepad-plus/forums/forum/331753/topic/5226372) and on previous 6.0+ announcement threads, tag matching on large documents is noticeably slower. In extreme cases (10-20 MB XML documents), moving the cursor can take several seconds, because the XML tag matching algorithm has to run and performs reverse regex lookups, which are relatively slow with the new Boost regex engine.

This patch rewrites the main tag matching algorithm to use plain text searching instead of regular expressions. On a 12 MB document, matching is instant for all tags except the root element, where the search itself takes 4-5 seconds (i.e. simply searching the document for </tag> takes a similar amount of time), so there doesn't seem to be an easy solution for that case.
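
To illustrate what regex-free matching can look like, here is a minimal sketch, not the code from the patch. The names findMatchingClose, startsTag and endsTagName are hypothetical, and the sketch assumes the text contains no quoted '>' characters, comments or CDATA sections. It walks forward from the end of a start tag using plain std::string searches and a nesting counter:

```cpp
#include <cstddef>
#include <string>

namespace {

// True if `c` may directly follow a tag name, so "<tag" does not match "<tagged".
bool endsTagName(char c)
{
    return c == '>' || c == '/' || c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// True if `tag` (e.g. "<a" or "</a") starts at `pos` and is followed by a delimiter.
bool startsTag(const std::string& text, std::size_t pos, const std::string& tag)
{
    return text.compare(pos, tag.size(), tag) == 0
        && pos + tag.size() < text.size()
        && endsTagName(text[pos + tag.size()]);
}

}  // namespace

// Hypothetical helper: `from` is the index just past the start tag of element
// `name`. Returns the index of the '<' of the matching "</name", or
// std::string::npos if there is none.
std::size_t findMatchingClose(const std::string& text, const std::string& name, std::size_t from)
{
    const std::string openTag = "<" + name;
    const std::string closeTag = "</" + name;
    int depth = 1;  // we start inside one open element

    std::size_t pos = text.find('<', from);
    while (pos != std::string::npos)
    {
        if (startsTag(text, pos, closeTag))
        {
            if (--depth == 0)
                return pos;
        }
        else if (startsTag(text, pos, openTag))
        {
            const std::size_t end = text.find('>', pos);
            if (end == std::string::npos)
                return std::string::npos;  // unterminated tag
            if (text[end - 1] != '/')
                ++depth;  // a self-closing <name/> does not nest
        }
        pos = text.find('<', pos + 1);
    }
    return std::string::npos;
}
```

For the root element this kind of scan still has to walk the whole document, which is consistent with the slow case described above.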

The new algorithm also copes with a couple of extra cases that the old version didn't.

e.g. <tag with="nasty>attribute">text</tag>
According to the XML 1.0 Spec, this is valid XML and the > in the attribute value should be treated as character data.
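
One way to handle this case, shown only as a sketch and not as the patch's actual code, is to track whether the scan is currently inside a quoted attribute value. The function name findTagEnd is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `open` is the index of the '<' that starts a tag.
// Returns the index of the '>' that really closes the tag, skipping any '>'
// inside a single- or double-quoted attribute value, or std::string::npos
// if the tag is unterminated.
std::size_t findTagEnd(const std::string& text, std::size_t open)
{
    char quote = '\0';  // active quote character, '\0' when outside a value
    for (std::size_t i = open + 1; i < text.size(); ++i)
    {
        const char c = text[i];
        if (quote != '\0')
        {
            if (c == quote)
                quote = '\0';  // closing quote ends the attribute value
        }
        else if (c == '"' || c == '\'')
        {
            quote = c;  // opening quote starts an attribute value
        }
        else if (c == '>')
        {
            return i;  // first '>' outside any quoted value
        }
    }
    return std::string::npos;
}
```

On the example above, the scan skips the > inside "nasty>attribute" and stops at the > immediately before "text".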

Tags in CDATA sections are also ignored.
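
As a rough sketch of how CDATA sections can be skipped during a plain search (again an illustration, not the patch itself), the hypothetical helper below returns the next '<' that is not the start of, or inside, a CDATA section. It assumes the starting position is itself outside any CDATA section:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the next '<' at or after `from`
// that does not belong to a CDATA section, or std::string::npos if none.
// Assumes `from` is not already inside a CDATA section.
std::size_t findNextTagOpen(const std::string& text, std::size_t from)
{
    static const std::string cdataStart = "<![CDATA[";
    static const std::string cdataEnd = "]]>";

    std::size_t pos = text.find('<', from);
    while (pos != std::string::npos)
    {
        if (text.compare(pos, cdataStart.size(), cdataStart) != 0)
            return pos;  // an ordinary '<', not a CDATA opener

        // Jump past the whole CDATA section; anything inside it is character data.
        const std::size_t end = text.find(cdataEnd, pos + cdataStart.size());
        if (end == std::string::npos)
            return std::string::npos;  // unterminated CDATA swallows the rest
        pos = text.find('<', end + cdataEnd.size());
    }
    return std::string::npos;
}
```

With the input <a><![CDATA[<b>]]></a>, a search starting just after the first '<' skips the <b> inside the CDATA section and lands on the '<' of </a>.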

The patch zip file includes the patch itself, and the two affected files for easy integration. I'll be posting a binary to the forums to allow people to test this. I've tested as many cases as I can think of, but this really needs user testing.

Cheers,
Dave.

Discussion

    • assigned_to: nobody --> donho
     
  • Assigning to Don.

     
  • Don HO
    2012-05-19

    Thank you.
    It'll be in the next release.

     
  • Don HO
    2012-05-19

    • priority: 5 --> 9
     
  • Don HO
    2012-06-10

    • status: open --> closed