#133 wrong parsing

Status: closed-invalid
Owner: nobody
Labels: scanner (58)
Priority: 5
Updated: 2012-03-08
Created: 2012-02-21
Creator: M_D
Private: No

Hello.
There is an unfinished "<li" in the output when parsing this input:

input: <p>List</p>a<ul>b<li></li>c</ul>
output: <p>List</p>\n <p>a</p>\n <ul>b <li</li> c</ul>
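
The symptom can be checked mechanically in the output of any serializer by looking for a '<' that is followed by another '<' (or by the end of the text) before a closing '>'. The following is only a minimal sketch using the plain JDK. The class name UnclosedTagCheck and the method firstUnclosedTag are hypothetical and are not part of CyberNeko or any other library, and the scan is deliberately naive: it ignores comments, CDATA, scripts and '<' or '>' inside attribute values.

```java
public final class UnclosedTagCheck {

    private UnclosedTagCheck() {
        // static helper only (hypothetical class, not part of any library)
    }

    /**
     * Returns the index of the first '<' that is not closed by a '>' before
     * the next '<' or the end of the text, or -1 if no such '<' exists.
     */
    public static int firstUnclosedTag(String html) {
        int open = -1; // index of the '<' that is still waiting for its '>'
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                if (open >= 0) {
                    return open; // a new tag started before the previous one ended
                }
                open = i;
            } else if (c == '>') {
                open = -1;
            }
        }
        return open; // still >= 0 if the text ends inside a tag
    }

    public static void main(String[] args) {
        // The two strings from the report above.
        String input = "<p>List</p>a<ul>b<li></li>c</ul>";
        String output = "<p>List</p>\n <p>a</p>\n <ul>b <li</li> c</ul>";

        System.out.println("input:  " + firstUnclosedTag(input));
        System.out.println("output: " + firstUnclosedTag(output));
    }
}
```

Running the same check on the output of each library involved helps narrow down which layer produces the broken tag.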

Discussion

  • RBRi

    RBRi - 2012-03-06

    I tried to reproduce your problem with the latest source from the repository, but I can't see any problem. Please have a look at the attached patch. Maybe there is an error in my test.

     
  • RBRi

    RBRi - 2012-03-06

    Hm, there is no way to attach a file, so here is the patch inline:

    ### Eclipse Workspace Patch 1.0
    #P nekohtml
    Index: test/java/org/cyberneko/html/filters/WriterTest.java
    ===================================================================
    --- test/java/org/cyberneko/html/filters/WriterTest.java (revision 294)
    +++ test/java/org/cyberneko/html/filters/WriterTest.java (working copy)
    @@ -3,6 +3,8 @@
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    +import java.io.StringReader;
    +import java.io.StringWriter;

    import junit.framework.TestCase;

    @@ -18,29 +20,54 @@
    */
    public class WriterTest extends TestCase {

    - /**
    - * Regression test for bug: writer changed attribute value causing NPE in 2nd writer.
    - * http://sourceforge.net/support/tracker.php?aid=2815779
    - */
    - public void testEmptyAttribute() throws Exception {
    -
    - final String content = "<html><head>"
    - + "<meta name='COPYRIGHT' content='SOMEONE' />"
    - + "</head><body></body></html>";
    - final InputStream inputStream = new ByteArrayInputStream(content.getBytes());
    -
    - final XMLDocumentFilter[] filters = {
    - new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8"),
    - new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8")
    - };
    -
    + /**
    + * Regression test for bug: writer changed attribute value causing NPE in
    + * 2nd writer. http://sourceforge.net/support/tracker.php?aid=2815779
    + */
    + public void testEmptyAttribute() throws Exception {
    +
    + final String content = "<html><head>"
    + + "<meta name='COPYRIGHT' content='SOMEONE' />"
    + + "</head><body></body></html>";
    + final InputStream inputStream = new ByteArrayInputStream(
    + content.getBytes());
    +
    + final XMLDocumentFilter[] filters = {
    + new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8"),
    + new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8") };
    +
    // create HTML parser
    - final XMLParserConfiguration parser = new HTMLConfiguration();
    + final XMLParserConfiguration parser = new HTMLConfiguration();
    parser.setProperty("http://cyberneko.org/html/properties/filters", filters);

    - XMLInputSource source = new XMLInputSource(null, "currentUrl", null, inputStream, "UTF-8");
    -
    - parser.parse(source);
    + XMLInputSource source = new XMLInputSource(null, "currentUrl", null, inputStream, "UTF-8");
    +
    + parser.parse(source);
    inputStream.close();
    - }
    + }
    +
    + /**
    + * Regression test for bug 3490070
    + * http://sourceforge.net/support/tracker.php?aid=3490070
    + */
    + public void testUnfinishedLIOutput() throws Exception {
    + String string = "<html><head></head>"
    + + "<body>"
    + + "<p>List</p>a<ul>b<li></li>c</ul>"
    + + "</body></html>";
    +
    + final XMLParserConfiguration parser = new HTMLConfiguration();
    + final StringReader sr = new StringReader(string);
    + final XMLInputSource in = new XMLInputSource(null, "foo", null, sr, null);
    +
    + final StringWriter out = new StringWriter();
    + final XMLDocumentFilter[] filters = { new Writer(out, "UTF-8") };
    + parser.setProperty("http://cyberneko.org/html/properties/filters", filters);
    +
    + parser.parse(in);
    +
    + assertEquals(
    + "<HTML><HEAD></HEAD><BODY><P>List</P>a<UL>b<LI></LI>c</UL></BODY></HTML>",
    + out.toString());
    + }
    }

     
  • M_D

    M_D - 2012-03-08

    Hello. Thank you. I tested it with this code:

    Policy fragmentPolicy;
    fragmentPolicy = Policy.getInstance(<myconfig.xml>);
    fragmentPolicy.setDirective(Policy.OMIT_DOCTYPE_DECLARATION, "true");
    fragmentPolicy.setDirective(Policy.OMIT_XML_DECLARATION, "true");

    AntiSamy scanner = new AntiSamy(fragmentPolicy);

    String inputHtml = "<p>List</p>a<ul>b<li></li>c</ul>";
    CleanResults cr = scanner.scan(inputHtml);

    System.out.println(cr.getCleanHTML());

    The output was:
    <p>List</p>
    a<ul>b<li</li>c</ul>

    Note the unfinished "<li".

     
  • RBRi

    RBRi - 2012-03-08

    Hi laubrino,

    Your sample is not CyberNeko code. Please strip it down to CyberNeko-only code. The developers do not have enough time to figure out whether this is a CyberNeko problem or one from the other libraries. Your sample appears to be based on 'owaspantisamy', so the source is available and you can verify whether the problem comes from CyberNeko or is introduced by another library.
    If you can reproduce the problem with only CyberNeko involved, I will try to fix it.

     
  • M_D

    M_D - 2012-03-08
    • status: open --> closed-invalid
     
  • M_D

    M_D - 2012-03-08

    Oops, I'm stupid. Sorry for that.

     