Hi Scott, Thanks very much for the speedy investigation! Remi
Hi, I've found an HTML page which results in "INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified." when I try to use HtmlCleaner to create a DOM out of it. I've attached the page. Drilling down into the DomSerializer, there is a call to setAttribute with the attrName "dispariție.", and I suspect the accented character is causing the issue. Shouldn't the attribute name be sanitised before that call? Let me know whether this counts as invalid input. Thanks
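To narrow this down outside HtmlCleaner, here is a minimal sketch that asks the JDK's own DOM implementation whether it accepts a given attribute name. It uses only the standard javax.xml and org.w3c.dom APIs. The class and method names (AttributeNameCheck, isAcceptedAttributeName) are made up for illustration and are not part of HtmlCleaner, and the result for a particular name may depend on which DOM implementation is on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttributeNameCheck {

    // Hypothetical helper: returns true if the DOM implementation lets us
    // set an attribute with this name, false if it throws a DOMException
    // (for example INVALID_CHARACTER_ERR).
    static boolean isAcceptedAttributeName(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element el = doc.createElement("div");
            el.setAttribute(name, "");
            return true;
        } catch (DOMException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "title" is an ordinary name; "1abc" starts with a digit, which XML
        // names do not allow; the last one is the name from the attached page.
        for (String name : new String[] {"title", "1abc", "dispariție."}) {
            System.out.println(name + " -> " + isAcceptedAttributeName(name));
        }
    }
}
```

If the last name is rejected here as well, then the exception comes from the DOM's XML name check rather than from anything specific to the DomSerializer, and skipping or rewriting such attribute names before calling setAttribute would be one way to avoid it.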
Hi Martin, Thanks a lot for patching this. I now get expected behaviour! Kind regards, Remi
Hi, I've noticed that the Jericho Renderer omits button elements from its toString() output. This is presumably because button is mapped to a RemoveElementHandler in Renderer. I would be interested to hear the rationale behind this, but more importantly, is there a way to override this behaviour on my end? You can reproduce it with something as simple as: <html><body><button>My Button</button></body></html> which results in an empty string. Many thanks
Hi, I've noticed that the Jericho Renderer omits button elements from its toString() output. This is presumably because button is mapped to a RemoveElementHandler in Renderer. I would be interested to hear the rationale behind this, but more importantly, is there a way to override this behaviour on my end? You can reproduce it with something as simple as: <button>My Button</button> which results in an empty string. Many thanks
Running out of memory cleaning HTML
Thanks for clearing this up, Martin!
Query parameter names in hyperlinks being incorrectly decoded