#3 MMAX2 GUI writes out invalid XML

Status: open
Owner: nobody
Labels: GUI (2)
Priority: 5
Updated: 2011-11-09
Created: 2011-11-09
Private: No

My input "words" XML file contains characters that are reserved in XML (e.g., < and >), which I handle by wrapping the text of each word in a CDATA section. MMAX2 reads these in fine; however, if I modify the base data from within MMAX2, the words XML file that MMAX2 writes neither escapes these characters nor wraps them in CDATA. As a result, the file fails with XML parsing errors the next time I open it in MMAX2. The proper fix is to always write out valid XML.

Discussion

  • Matthew Gerber

    Matthew Gerber - 2011-11-09

    I am currently fixing this. It looks like there is some simple logic in MMAX2Discourse.java that is supposed to handle this, but it is not correct: it checks whether the entire word is equal to "<", etc., so it misses cases like "<40". The simplest solution is to use CDATA sections. I will post the patch soon.
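To illustrate the difference between the two checks, here is a minimal Java sketch. It is not the MMAX2 source: the class and method names are hypothetical, and `escapeWholeWord` is only a reconstruction of the behaviour described above.

```java
final class WordEscapeSketch {
    // Whole-word check (reconstruction of the described behaviour):
    // only words that consist of exactly one reserved character are escaped.
    static String escapeWholeWord(String word) {
        if (word.equals("<")) return "&lt;";
        if (word.equals(">")) return "&gt;";
        if (word.equals("&")) return "&amp;";
        return word;
    }

    // Per-character escaping: every reserved character is replaced,
    // wherever it occurs inside the word.
    static String escapeEachChar(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeWholeWord("<40")); // prints "<40", still not valid as XML text
        System.out.println(escapeEachChar("<40"));  // prints "&lt;40"
    }
}
```

Per-character escaping is an alternative to CDATA; either approach should yield a words file that parses again.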

     
  • Matthew Gerber

    Matthew Gerber - 2011-11-09

    The diff I created was messy because my editor's formatting settings differ from the project's. Here is the relevant code (very simple):

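    // Write the word element, wrapping its text in CDATA so characters such as < and & stay valid.
    fw.write("<word " + currentAttributes.trim() + ">");
    Node childNode = currentWordNode.getFirstChild();
    // An empty word element has no child node; write empty text in that case.
    String childText = (childNode == null) ? "" : childNode.getNodeValue();
    // Split any "]]>" in the text, which would otherwise end the CDATA section early.
    fw.write("<![CDATA[" + childText.replace("]]>", "]]]]><![CDATA[>") + "]]></word>\n");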

    No need to do all the if-else checking.
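One caveat with CDATA is that word text containing "]]>" would end the section early. The following self-contained Java sketch shows one way to guard against that and to check the result with a parse round trip. The class, the helper names, and the `word_1` ID are hypothetical and not part of MMAX2; only standard JDK XML classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class CdataSketch {
    // Wrap text in a CDATA section, splitting any "]]>" across two sections
    // so that the terminator never appears inside the content.
    static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parse a single-element XML string and return the element's text content.
    static String parseWordText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML did not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String original = "<40 & a]]>b";
        String xml = "<word id=\"word_1\">" + wrapInCdata(original) + "</word>";
        // The parsed text should equal the original if the output is valid XML.
        System.out.println(parseWordText(xml).equals(original));
    }
}
```

A round-trip check like this could serve as a quick regression test for whatever code writes the words file.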

     
  • Matthew Gerber

    Matthew Gerber - 2011-11-10

    I've attached a patch for the fix described above. The patch also changes the output so that markables are written sorted by markable ID, which keeps diffs small when the annotation data is under version control.
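As a rough illustration of writing markables in a stable order, here is a Java sketch that sorts IDs by their numeric suffix. It assumes IDs of the form "markable_N"; the class and method names are hypothetical, and the attached patch may order the IDs differently (for example, lexicographically).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class MarkableOrderSketch {
    // Assumption: IDs look like "markable_12"; extract the number after the last underscore.
    static int numericSuffix(String id) {
        return Integer.parseInt(id.substring(id.lastIndexOf('_') + 1));
    }

    // Return a sorted copy so the caller's list is left unchanged.
    static List<String> sortedById(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingInt(MarkableOrderSketch::numericSuffix));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("markable_10", "markable_2", "markable_1");
        System.out.println(sortedById(ids)); // prints [markable_1, markable_2, markable_10]
    }
}
```

Sorting numerically rather than as plain strings keeps "markable_2" ahead of "markable_10", so the file order does not jump around as new markables are added.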

     
