I am trying to use HTML Parser to convert HTML files to XML files, so I need to obtain the contents of some tags and attributes (e.g. title, href, ...). If anyone has documents related to this, could you please send them to me? I have no idea how to use HTML Parser to handle this. Thanks.
I am also working on something similar to yours. I need to extract some tags (td, anchor, ...). Did you find an effective way to do that using HTML Parser? I did the following to extract the td tags; see if this works for you:
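import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.util.NodeList;
Parser parser = new Parser("http://"); // URL left as in the post; these calls can throw ParserException
NodeList nl = parser.parse(null); // a null filter returns every top-level node
NodeList a1 = nl.extractAllNodesThatMatch(new TagNameFilter("td"), true); // true = search recursively for tags named "TD"
for (int i = 0; i < a1.size(); i++) { // visit each matching tag
    TagNode tag = (TagNode) a1.elementAt(i);
    String tit = tag.toPlainTextString(); // extracts the text into a string
    System.out.println("td is " + tit);
}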
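For the title and href part of the original question, here is a minimal sketch of collecting those values and writing them out as XML. Note that it does not use the HTML Parser library from this thread: it swaps in the JDK's built-in javax.swing.text.html parser so it runs without extra jars. The class name TitleAndLinks, the helpers toXml and escape, and the page/link output element names are all made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TitleAndLinks {
    /** Collects the title text and every href value of an anchor while the parser runs. */
    static class Collector extends HTMLEditorKit.ParserCallback {
        final StringBuilder title = new StringBuilder();
        final List<String> hrefs = new ArrayList<>();
        private boolean inTitle = false;

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.TITLE) {
                inTitle = true;
            } else if (t == HTML.Tag.A) {
                Object href = a.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    hrefs.add(href.toString());
                }
            }
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            if (t == HTML.Tag.TITLE) {
                inTitle = false;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            // Only keep text that appears between <title> and </title>.
            if (inTitle) {
                title.append(data);
            }
        }
    }

    /** Escapes the characters that are not allowed raw in XML text or attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses the HTML string and returns a small XML document (hypothetical layout). */
    static String toXml(String html) throws IOException {
        Collector c = new Collector();
        new ParserDelegator().parse(new StringReader(html), c, true);
        StringBuilder xml = new StringBuilder("<page>\n");
        xml.append("  <title>").append(escape(c.title.toString())).append("</title>\n");
        for (String href : c.hrefs) {
            xml.append("  <link href=\"").append(escape(href)).append("\"/>\n");
        }
        return xml.append("</page>\n").toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><a href=\"http://example.com/a\">A</a></body></html>";
        System.out.print(toXml(html));
    }
}
```

With HTML Parser itself the same idea should carry over to the NodeList approach above: filter for the title and anchor tags instead of td, then read the text or the href attribute from each one. I believe the tag classes LinkTag and TitleTag offer getLink() and getTitle() for this, but please check the javadoc of your version before relying on that.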