
#10 Strange behaviour handling comment and CDATA sections

Milestone: v0.1.3
Status: open
Labels: Lexer (4)
Priority: 7
Updated: 2014-08-14
Created: 2005-07-13
Private: No

Hi.

I was trying to use HotSAX but I noticed some strange
behaviour in its handling of <!-- comments -->
and <![CDATA[]]> sections.

As a test I wrote a very simple ContentHandler and
LexicalHandler (attached) which just write the SAX
event sequence to System.out.
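
The attached handler is not reproduced in this ticket. A minimal sketch of what such an event tracer might look like follows. The class name SaxEventTracer and the trace helper are made up for illustration, and main runs against the JDK's bundled SAX parser rather than HotSAX, so the exact chunking of characters events may differ from the traces in this report.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical tracer: DefaultHandler2 implements both ContentHandler and
// LexicalHandler, so one object can receive both kinds of event.
class SaxEventTracer extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();

    // Record the event and echo it to System.out.
    private void log(String event) {
        events.add(event);
        System.out.println(event);
    }

    @Override public void startDocument() { log("startDocument"); }
    @Override public void endDocument() { log("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        log("startElement: [" + uri + ", " + localName + ", " + qName + "]");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log("endElement: [" + uri + ", " + localName + ", " + qName + "]");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        log("characters: [" + new String(ch, start, length) + "]");
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        log("comment: [" + new String(ch, start, length) + "]");
    }

    @Override public void startCDATA() { log("startCDATA"); }
    @Override public void endCDATA() { log("endCDATA"); }

    // Parse xml with the given reader and return the recorded event sequence.
    static List<String> trace(XMLReader reader, String xml) throws Exception {
        SaxEventTracer tracer = new SaxEventTracer();
        reader.setContentHandler(tracer);
        // Standard SAX2 property id for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", tracer);
        reader.parse(new InputSource(new StringReader(xml)));
        return tracer.events;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        trace(reader, "<x>\nText\n<!-- comment -->\n</x>");
    }
}
```

Any other XMLReader can be passed to trace in place of the JDK one, provided it supports the lexical-handler property.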

Test 1:

Input:
<x>
Text
<!-- comment -->
</x>

Resulting event sequence:
startDocument
startElement: [, x, ]
characters: [ Text ]
comment: [ comment ]
characters: [ Text ]
endElement: [, x, ]
endDocument

As you can see, the characters event for " Text " is
fired twice.

Test 2:

Input:
<x>
Text
<![CDATA[Cdata]]>
</x>

Resulting event sequence:
startDocument
startElement: [, x, ]
characters: [ Text ]
startCDATA
characters: [ Text ]
endElement: [, x, ]
endDocument

Again, the result is very similar, with the added problem
that none of the CDATA content is delivered to the content
handler.
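
For comparison, under the SAX2 contract each character of text is reported exactly once (possibly split over several characters events), and the content of a CDATA section arrives through the regular characters callback between startCDATA and endCDATA. The sketch below checks both points against the JDK's bundled parser, not HotSAX. The class name CdataCheck and its fields are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical checker: concatenates all character data, and separately
// the character data reported between startCDATA and endCDATA.
class CdataCheck extends DefaultHandler2 {
    final StringBuilder allText = new StringBuilder();
    final StringBuilder cdataText = new StringBuilder();
    private boolean inCdata;

    @Override
    public void characters(char[] ch, int start, int length) {
        // Concatenating makes the check independent of how the parser
        // chooses to split text into characters events.
        allText.append(ch, start, length);
        if (inCdata) {
            cdataText.append(ch, start, length);
        }
    }

    @Override public void startCDATA() { inCdata = true; }
    @Override public void endCDATA() { inCdata = false; }

    // Parse xml with the JDK parser and return the populated checker.
    static CdataCheck parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CdataCheck check = new CdataCheck();
        reader.setContentHandler(check);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", check);
        reader.parse(new InputSource(new StringReader(xml)));
        return check;
    }

    public static void main(String[] args) throws Exception {
        CdataCheck test1 = parse("<x>\nText\n<!-- comment -->\n</x>");
        CdataCheck test2 = parse("<x>\nText\n<![CDATA[Cdata]]>\n</x>");
        // "Text" should occur once in each concatenated result.
        System.out.println(test1.allText.toString().replace("\n", "\\n"));
        System.out.println(test2.allText.toString().replace("\n", "\\n"));
        System.out.println(test2.cdataText);
    }
}
```

Running the same two inputs through the parser under test and comparing the concatenated text is one way to confirm whether " Text " is really duplicated or the CDATA content dropped.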

Discussion

  • Tom Fennelly - 2005-07-13

    Parser code - includes the Content/Lexical Handler implementation.

     
  • Tom Fennelly - 2005-07-13
    • priority: 5 --> 7
     
  • Simon Massey - 2012-01-05

    You have a line break before your end tag.

    If you tried:

    <x>
    Text
    <!-- comment --></x>

    <x>
    Text
    <![CDATA[Cdata]]></x>

    then you don't get the problem. The behaviour with the line break is shown by this test:

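    import static org.hamcrest.CoreMatchers.is;

    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.junit.Assert;
    import org.junit.Test;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    // Method of a JUnit 4 test class. The `parser` field and the
    // DefaultContentHandler base class are used as posted; their
    // definitions are not shown in this ticket.
    @Test
    public void testSimpleComment() throws Exception {
        // Same input as Test 1, with a line break before the end tag.
        String html = "<x>\n" +
                "Text\n" +
                "<!-- comment -->\n" +
                "</x>";

        final List<String> text = new ArrayList<String>();

        // Collect the text of every characters event.
        ContentHandler handler = new DefaultContentHandler() {
            @Override
            public void characters(char[] ch, int start, int length)
                    throws SAXException {
                String t = new String(ch, start, length);
                text.add(t);
            }
        };

        parser.setContentHandler(handler);

        InputSource input = new InputSource(new StringReader(html));

        parser.parse(input);

        // Two events: the text before the comment, then the line
        // break between the comment and the end tag.
        Assert.assertThat(text.size(), is(2));
        Assert.assertThat(text.get(0), is("\nText\n"));
        Assert.assertThat(text.get(1), is("\n"));
    }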

    This is on version 0.1.2b, which I am running.

     
