I have tried to reproduce your problem with the latest source from the repository, but I can't see any problem. Please have a look at the attached patch. Maybe there is some error in my test.
Hm, there is no way to attach a file here, so the patch follows inline.
### Eclipse Workspace Patch 1.0
#P nekohtml
Index: test/java/org/cyberneko/html/filters/WriterTest.java
===================================================================
--- test/java/org/cyberneko/html/filters/WriterTest.java (revision 294)
+++ test/java/org/cyberneko/html/filters/WriterTest.java (working copy)
@@ -3,6 +3,8 @@
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
+import java.io.StringReader;
+import java.io.StringWriter;
import junit.framework.TestCase;
@@ -18,29 +20,54 @@
*/
public class WriterTest extends TestCase {
- /**
- * Regression test for bug: writer changed attribute value causing NPE in 2nd writer.
- * http://sourceforge.net/support/tracker.php?aid=2815779
- */
- public void testEmptyAttribute() throws Exception {
-
- final String content = "<html><head>"
- + "<meta name='COPYRIGHT' content='SOMEONE' />"
- + "</head><body></body></html>";
- final InputStream inputStream = new ByteArrayInputStream(content.getBytes());
-
- final XMLDocumentFilter[] filters = {
- new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8"),
- new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8")
- };
-
+ /**
+ * Regression test for bug: writer changed attribute value causing NPE in
+ * 2nd writer. http://sourceforge.net/support/tracker.php?aid=2815779
+ */
+ public void testEmptyAttribute() throws Exception {
+
+ final String content = "<html><head>"
+ + "<meta name='COPYRIGHT' content='SOMEONE' />"
+ + "</head><body></body></html>";
+ final InputStream inputStream = new ByteArrayInputStream(
+ content.getBytes());
+
+ final XMLDocumentFilter[] filters = {
+ new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8"),
+ new org.cyberneko.html.filters.Writer(new ByteArrayOutputStream(), "UTF-8") };
+
// create HTML parser
- final XMLParserConfiguration parser = new HTMLConfiguration();
+ final XMLParserConfiguration parser = new HTMLConfiguration();
parser.setProperty("http://cyberneko.org/html/properties/filters", filters);
- XMLInputSource source = new XMLInputSource(null, "currentUrl", null, inputStream, "UTF-8");
-
- parser.parse(source);
+ XMLInputSource source = new XMLInputSource(null, "currentUrl", null, inputStream, "UTF-8");
+
+ parser.parse(source);
inputStream.close();
- }
+ }
+
+ /**
+ * Regression test for bug 3490070
+ * http://sourceforge.net/support/tracker.php?aid=3490070
+ */
+ public void testUnfinishedLIOutput() throws Exception {
+ String string = "<html><head></head>"
+ + "<body>"
+ + "<p>List</p>a<ul>b<li></li>c</ul>"
+ + "</body></html>";
+
+ final XMLParserConfiguration parser = new HTMLConfiguration();
+ final StringReader sr = new StringReader(string);
+ final XMLInputSource in = new XMLInputSource(null, "foo", null, sr, null);
+
+ final StringWriter out = new StringWriter();
+ final XMLDocumentFilter[] filters = { new Writer(out, "UTF-8") };
+ parser.setProperty("http://cyberneko.org/html/properties/filters", filters);
+
+ parser.parse(in);
+
+ assertEquals(
+ "<HTML><HEAD></HEAD><BODY><P>List</P>a<UL>b<LI></LI>c</UL></BODY></HTML>",
+ out.toString());
+ }
}
Hello. Thank you. I tested it with this code:
Policy fragmentPolicy;
fragmentPolicy = Policy.getInstance(<myconfig.xml>);
fragmentPolicy.setDirective(Policy.OMIT_DOCTYPE_DECLARATION, "true");
fragmentPolicy.setDirective(Policy.OMIT_XML_DECLARATION, "true");
AntiSamy scanner = new AntiSamy(fragmentPolicy);
String inputHtml = "<p>List</p>a<ul>b<li></li>c</ul>";
CleanResults cr = scanner.scan(inputHtml);
System.out.println(cr.getCleanHTML());
The output was:
<p>List</p>
a<ul>b<li</li>c</ul>
Note the "<li": its closing ">" is missing.
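To spot this kind of defect without reading the output by eye, a small check can be run over the serialized string. This is only a rough sketch using the plain JDK: the class and method names (UnclosedTagCheck, firstUnclosedTag) are made up for illustration and are not part of CyberNeko or AntiSamy, and it assumes the markup has no '<' inside attribute values, comments or scripts.

```java
// Hypothetical helper, not part of any library mentioned in this thread.
final class UnclosedTagCheck {

    /**
     * Returns the index of the first '<' that is followed by another '<'
     * (or by the end of the string) before any '>', or -1 if none is found.
     * Assumption: no '<' appears inside attribute values, comments or scripts.
     */
    static int firstUnclosedTag(String html) {
        int open = -1; // index of the most recent '<' not yet closed by '>'
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                if (open >= 0) {
                    return open; // a new tag starts before the previous one ended
                }
                open = i;
            } else if (c == '>') {
                open = -1;
            }
        }
        return open; // still >= 0 if the string ends inside a tag
    }

    public static void main(String[] args) {
        String reported = "<p>List</p>\na<ul>b<li</li>c</ul>";
        String expected = "<p>List</p>a<ul>b<li></li>c</ul>";
        System.out.println("reported output: " + firstUnclosedTag(reported));
        System.out.println("expected output: " + firstUnclosedTag(expected));
    }
}
```

A result of -1 means no tag was left open. Any other value is the position of the broken tag, which makes it easy to run the same check on the output of each library in turn.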
Hi laubrino,
your sample is not CyberNeko code. Please strip the sample down to code that uses only CyberNeko. The developers do not have enough time to figure out whether this is a CyberNeko problem or one from the other libraries. It looks like your sample is based on 'owaspantisamy', so the source is available and it is possible (for you) to verify whether the problem comes from CyberNeko or is introduced by some other library.
If you can reproduce the problem with only CyberNeko involved, I will try to fix it.
Oops, I'm stupid. Sorry for that.