1. The parser does not seem to recognize the <PRE> tag; it looks as if only predefined tags are handled.
2. Manifestation: nl.elementAt(i).toPlainTextString() returns "", but it should not.
3. Sample code:
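import java.net.HttpURLConnection;
import java.net.URL;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;

URL url = new URL("http://ec.europa.eu/food/plant/propagation/catalogues/comcat_agri_2008/I.html");
Parser parser = new Parser();
parser.setConnection((HttpURLConnection) url.openConnection());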
//**************************************
// Temporary workaround, which is ugly, but it proves that
// the content between <PRE> and </PRE> is not the cause of
// the bug. Comment out this block to see the problem.
NodeList nl = parser.parse(null);
String hhh = nl.toHtml();
hhh = hhh.replaceAll("<PRE>", "<P>");
hhh = hhh.replaceAll("</PRE>", "</P>");
parser.setInputHTML(hhh);
//**************************************
TagNameFilter tnf = new TagNameFilter();
//tnf.setName("PRE");
tnf.setName("P");
nl = parser.parse(tnf);
for (int i = 0; i < nl.size(); i++)
{
    // With the original PRE tag this returns "".
    // With the forced P tag this returns the real content.
    String test = nl.elementAt(i).toPlainTextString();
}