Doing HTML validation with this library
Brought to you by:
derrickoswald
I'm looking at your samples and unit tests to figure out whether HTMLParser can be used to check that a body of text contains valid HTML. Is that a use case you support?
I was looking at some of your unit tests that do this, and they seem to do it by counting the expected nodes. For example:
package org.htmlparser.tests.parserHelperTests;

class StringParserTests {
    public void testTextBug1() throws ParserException {
        createParser("<HTML><HEAD><TITLE>Google</TITLE>");
        parser.setNodeFactory(new PrototypicalNodeFactory(true));
        parseAndAssertNodeCount(5);
        // The fourth node should be a Text node with the text "Google"
        assertTrue("Fourth node should be a Text", node[3] instanceof Text);
        Text stringNode = (Text) node[3];
        assertEquals("Text of the Text", "Google", stringNode.getText());
    }
}
What I'm looking for is a code sample, or some indication of whether your library can tell if HTML is valid or not.
// Most HTML validators would reject this
String html = "<title>We are blah</title>"
+ "Click Here "
+ ""
+ "";
A simple yea/nay (is this valid?) would do, or perhaps a list of the exceptions and warnings raised during parsing.
I tried doing this, and it didn't detect the errors. Am I doing this wrong? Does the library already support this, and am I just missing a sample that shows how?
btpang
--------
private MyFeedBack parserFeedback = new MyFeedBack();
private List<ParserException> exceptions = new ArrayList<>();

public void parseHtml(String html, String template) throws Exception {
    try {
        Lexer lexer = new Lexer(html);
        // Route parser messages to the collecting feedback object
        Parser parser = new Parser(lexer, parserFeedback);
        parser.setEncoding("UTF-8");
        // A null filter returns every node
        parser.parse(null);
    } catch (ParserException e) {
        exceptions.add(e);
    }
}

// Collects parser messages by severity; @Getter (Lombok) exposes the lists
@Getter
class MyFeedBack implements ParserFeedback {
    private final List<String> warnings = new ArrayList<>();
    private final List<String> errors = new ArrayList<>();
    private final List<String> infos = new ArrayList<>();

    @Override
    public void info(String message) {
        infos.add(message);
    }

    @Override
    public void warning(String message) {
        warnings.add(message);
    }

    @Override
    public void error(String message, ParserException e) {
        errors.add(message);
    }
}
I don't think it would be suitable for your use-case.
The intent was actually to accept everything that could plausibly be understood, and to output no warnings or errors in that case.
You do realize that the code hasn't been updated in 18 years, right?
So it lags behind the evolution of HTML by at least that much.
You should probably look elsewhere for a solution to your problem.
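If a rough yea/nay is all that's needed, one option outside this library is a strict tag-balance check. What follows is only a minimal sketch in plain Java (9 or later), under stated assumptions: TagBalanceChecker, check and isBalanced are hypothetical names and not part of HTMLParser; the void-element list is an assumed subset; and comments, doctypes, script contents and HTML's optional-end-tag rules are ignored, so this is not a conforming validator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an HTMLParser class): reports unbalanced tags.
final class TagBalanceChecker {

    // Elements that never take a closing tag (assumed subset of the void elements).
    private static final Set<String> VOID_TAGS = Set.of(
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr");

    // Matches <name ...>, </name> and <name ... />.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)\\b[^>]*?(/?)>");

    // Returns one message per problem; an empty list means the tags balance.
    static List<String> check(String html) {
        List<String> problems = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosed = !m.group(3).isEmpty();
            if (VOID_TAGS.contains(name) || selfClosed) {
                continue; // nothing to balance
            }
            if (!closing) {
                open.push(name);
            } else if (open.isEmpty()) {
                problems.add("Unexpected </" + name + "> at offset " + m.start());
            } else if (open.peek().equals(name)) {
                open.pop();
            } else {
                problems.add("Expected </" + open.peek() + "> but found </"
                        + name + "> at offset " + m.start());
            }
        }
        // Whatever is still on the stack was never closed.
        for (String name : open) {
            problems.add("Unclosed <" + name + ">");
        }
        return problems;
    }

    static boolean isBalanced(String html) {
        return check(html).isEmpty();
    }
}
```

For the truncated document used in the unit test quoted earlier, TagBalanceChecker.check("<HTML><HEAD><TITLE>Google</TITLE>") reports the unclosed head and html elements, which is the kind of feedback a deliberately lenient parser does not produce.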