There appears to be an error with nested svg tags. Unit test:
    import org.htmlcleaner.HtmlCleaner;
    import org.junit.Test;

    public class SvgNestingTest // illustrative class name
    {
        @Test
        public void testSvg()
        {
            // Case 1: nested svg, but the xmlns value does not contain "svg": no exception.
            String html = "<!DOCTYPE html>\n"
                + "<html lang=\"en\">\n"
                + "<head>\n"
                + "</head>\n"
                + "<body itemscope itemtype=\"http://schema.org/WebPage\">\n"
                + "<svg xmlns=\"http://www.w3.org/2000/\">\n"
                + " <svg></svg>\n"
                + "</svg>\n"
                + "</body>\n"
                + "</html>";
            new HtmlCleaner().clean(html);

            // Case 2: "svg" appears in the xmlns value, but nothing is nested: no exception.
            html = "<!DOCTYPE html>\n"
                + "<html lang=\"en\">\n"
                + "<head>\n"
                + "</head>\n"
                + "<body itemscope itemtype=\"http://schema.org/WebPage\">\n"
                + "<svg xmlns=\"http://www.w3.org/2000/svg\">\n"
                + " <circle cx=\"50\" cy=\"50\" r=\"40\" stroke=\"black\" stroke-width=\"3\" fill=\"red\" />\n"
                + "</svg>\n"
                + "</body>\n"
                + "</html>";
            new HtmlCleaner().clean(html);

            // Case 3: nested svg AND "svg" in the xmlns value: throws an exception on 2.24.
            html = "<!DOCTYPE html>\n"
                + "<html lang=\"en\">\n"
                + "<head>\n"
                + "</head>\n"
                + "<body itemscope itemtype=\"http://schema.org/WebPage\">\n"
                + "<svg xmlns=\"http://www.w3.org/2000/svg\">\n"
                + " <svg></svg>\n"
                + "</svg>\n"
                + "</body>\n"
                + "</html>";
            new HtmlCleaner().clean(html);
        }
    }
The final call to clean() throws an exception. That input has a second svg nested inside the first, and the string "svg" also appears in the value of the xmlns attribute. Perhaps the cleaner is picking up that string and erroneously adding it to the stack.
By contrast, the first call succeeds because the xmlns value does not contain that string, and the second call succeeds because there is no nested svg.
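The hypothesis above is that text inside an attribute value is being treated as a tag name. For contrast, here is a minimal sketch of a tag stack driven only by parsed tag names, so attribute values never reach the stack. It is not HtmlCleaner's code and not a diagnosis of the actual bug; the class and method names (`TagNameStack`, `maxSvgDepth`) are hypothetical, and the regex tokenizer is a simplification that assumes no `>` inside attribute values and skips `<!DOCTYPE>`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT HtmlCleaner's implementation: the stack only ever
// sees the captured tag name, never the attribute text.
class TagNameStack {

    // Group 1: "/" for a closing tag; group 2: tag name; group 3: attributes.
    // <!DOCTYPE ...> is not matched because '!' is neither '/' nor a letter.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9:-]*)([^>]*)>");

    /** Returns the deepest svg nesting level; throws if tags are unbalanced. */
    public static int maxSvgDepth(String html) {
        Deque<String> stack = new ArrayDeque<>();
        int svgDepth = 0;
        int maxDepth = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean selfClosing = m.group(3).trim().endsWith("/");
            if (closing) {
                // A close tag must match the most recently opened element by name.
                if (stack.isEmpty() || !stack.peek().equals(name)) {
                    throw new IllegalStateException("Unexpected </" + name + ">");
                }
                stack.pop();
                if (name.equals("svg")) {
                    svgDepth--;
                }
            } else if (!selfClosing) {
                // Only the tag name is pushed; "svg" inside xmlns="..." is ignored.
                stack.push(name);
                if (name.equals("svg")) {
                    svgDepth++;
                    maxDepth = Math.max(maxDepth, svgDepth);
                }
            }
        }
        if (!stack.isEmpty()) {
            throw new IllegalStateException("Unclosed <" + stack.peek() + ">");
        }
        return maxDepth;
    }

    public static void main(String[] args) {
        String nested = "<svg xmlns=\"http://www.w3.org/2000/svg\">\n <svg></svg>\n</svg>";
        System.out.println(maxSvgDepth(nested)); // prints 2
    }
}
```

Because only the captured name group is pushed, the "svg" in `http://www.w3.org/2000/svg` cannot add an extra stack entry. A real cleaner also has to repair malformed markup rather than reject it, which this sketch does not attempt.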
Thanks for the bug report, Remi. I'll check it out.
Remi, I just tested this with 2.2.4 and I don't see an exception when I run the test. Which version did you see the error in?
Hi Scott, I can reproduce this on 2.24 (as opposed to 2.2.4).
Sorry, yes, I meant 2.24!
I'm testing on Java 11, if that makes a difference.
Aha, it looks like I managed to fix this bug during the refactoring I did after the 2.24 release. So if you use the current trunk build, it should work; otherwise, it will be fixed in the next release.
If you don't mind, could you confirm that it's fixed for you using the current trunk?
Thanks, that's great. Looking forward to the next release!