Hi Andy. Thanks for reporting the issue. I see what you mean about SourceForge. I just noticed they removed all of the documentation from my project's website a couple of months ago without notifying me, and I have just fixed that. But no, I don't have any intention of moving the project to GitHub at this point in time. Firstly, you might like to try using the latest DEV version 3.5. There have been a few improvements and bug fixes to the Renderer class. You can download it here: http://jericho.htmlparser.net/temp/jericho-html-3.5-dev.zip...
Page Broken (500): https://sourceforge.net/p/jerichohtml/discussion/350025/moderate/save_moderation
Hi Ethan, thank you for the suggestion. Yes, I already got a request for this last year: https://sourceforge.net/p/jerichohtml/bugs/93/ The biggest barrier at the moment is that I implemented a major new feature a few years ago (a web crawler API), but it remains poorly documented and could probably use a couple of minor enhancements before it is officially released. As a result, all bug fixes since then have gone only into the DEV release: http://jericho.htmlparser.net/temp/jericho-html-3.5-dev.zip...
Typo in StreamEncodingDetector.isDifinitive
Haha, thanks!
Hi Samuel. Thank you for doing all of this. Unfortunately this library is way down my priority list these days. I still use it in heaps of projects, it still works well, and I still fix the occasional reported bug, but I haven't done an official release for years, and I can't see myself getting to it in the near future. When I do eventually release a new version, I will definitely incorporate your suggestions. Cheers, Martin
P.S. When you want to include HTML in your post, you need to enclose it in a code block; otherwise the HTML is parsed and doesn't display properly. For example, your sample document should look like this: <html> <head> <meta http-equiv="Content-Type" content="html; charset=UTF-8"> </head> </html>
Hi Davy, the sample HTML you are feeding it doesn't specify a valid encoding, so the parser is in fact handling it correctly. Because the quotes are encoded, they are included in the value of the content attribute, which is why the end quote is interpreted as part of the encoding name. You say that the sample content occurs when it is "inserted into an iframe". I assume you mean it appears as the value of the iframe srcdoc attribute. In that case, your sample document should be the HTML containing the iframe,...
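To illustrate why encoded quotes break encoding detection, here is a minimal plain-Java sketch. It is not Jericho's API and not how StreamEncodingDetector actually works; the class and the `extractCharset` / `isValidCharset` helpers are hypothetical. It assumes that once `&quot;` is decoded, the content attribute value contains literal quote characters, so the text after `charset=` becomes `UTF-8"`, which the JVM rejects as a charset name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetFromContent {
    // Hypothetical helper: returns the text after "charset=" up to the next ';' (or end of string).
    static String extractCharset(String contentValue) {
        String lower = contentValue.toLowerCase();
        int start = lower.indexOf("charset=");
        if (start < 0) return null;
        start += "charset=".length();
        int end = contentValue.indexOf(';', start);
        String name = (end < 0) ? contentValue.substring(start) : contentValue.substring(start, end);
        return name.trim(); // trim() removes whitespace only, so a stray quote is kept
    }

    // Hypothetical helper: true only if the JVM recognises the name as a supported charset.
    static boolean isValidCharset(String name) {
        if (name == null || name.isEmpty()) return false;
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // a name containing '"' is not even a legal charset name
        }
    }

    public static void main(String[] args) {
        // Value when the quotes are real attribute delimiters (not part of the value).
        String plain = "text/html; charset=UTF-8";
        // Value when the quotes were written as &quot; and survive decoding as literal characters.
        String withQuotes = "\"text/html; charset=UTF-8\"";

        System.out.println(extractCharset(plain) + " -> " + isValidCharset(extractCharset(plain)));
        System.out.println(extractCharset(withQuotes) + " -> " + isValidCharset(extractCharset(withQuotes)));
    }
}
```

Running it prints `UTF-8 -> true` for the first value and `UTF-8" -> false` for the second: the trailing quote ends up inside the encoding name, so no valid encoding is specified.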