Re: [Htmlparser-user] Dealing with *repeated* EncodingChangeException
From: Antony S. <ant...@gm...> - 2006-03-23 23:38:11
Hi Subramanya,

My solution to the same problem is

    Parser parser = new Parser(urlOb.openConnection());
    NodeList nl = null;
    for (int i = 0; i < 4; i++) { // 4 decoding tries max
        try {
            nl = parser.parse(null);
        } catch (EncodingChangeException e) {
            String s = parser.getEncoding(); // use detected encoding
            log.fine("restarting parse with " + s + " for " + url);
            continue;
        }
        break;
    }

If yours is better, I'd like to use it. I hope someone who knows better can tell.
I am sort of stuck doing a bunch of other things at this time and not paying
attention to this particular issue.

Is mine buggy because I don't call reset?

-Antony Sequeira

> Code snippet below:
> ---------------------------------------------------------------------
> private static void IgnoreCharSetChanges(Parser p)
> {
>     PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
>     factory.unregisterTag(new MetaTag());
>     // Unregister meta tag so that char set changes are ignored!
>     p.setNodeFactory (factory);
> }
>
> private static String ParseNow(Parser p, MyVisitor visitor) throws org.htmlparser.util.ParserException
> {
>     try {
>         System.out.println("START encoding is " + p.getEncoding());
>         p.visitAllNodesWith(visitor);
>     }
>     catch (org.htmlparser.util.EncodingChangeException e) {
>         try {
>             System.out.println("Caught you! CURRENT encoding is " + p.getEncoding());
>             visitor.Init();
>             p.reset();
>             p.visitAllNodesWith(visitor);
>         }
>         catch (org.htmlparser.util.EncodingChangeException e2) {
>             System.out.println("CURRENT encoding is " + p.getEncoding());
>             System.out.println("--- CAUGHT you yet again! IGNORE meta tags now! ---");
>             visitor.Init();
>             p.reset();
>             IgnoreCharSetChanges(p);
>             p.visitAllNodesWith(visitor);
>         }
>     }
>     System.out.println("ENCODING IS " + p.getEncoding());
>     return p.getEncoding();
> }
> ---------------------------------------------------------------------
>
> If, in future versions of HTMLParser, the MetaTag class starts doing other
> important things in future besides setting text encoding, then, a new class
> could be derived from the existing MetaTag class whose "doSemanticAction()"
> code simply ignores char set changes for "content-type" meta tags and
> calls super.doSemanticAction for others ...
>
> If there are gotchas in this technique, I would appreciate feedback on
> that front too!
>
> Thanks,
>
> Best,
> Subbu.
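
For comparison, the two approaches in this thread differ mainly in whether reset() is called before the parse is retried. Below is a minimal sketch of a bounded retry loop that resets before each retry. It uses hypothetical stand-in types (RestartableParser, EncodingChanged, FakeParser) rather than the real HTMLParser classes, so it only illustrates the control flow and does not settle whether Parser itself requires reset().

```java
public class EncodingRetrySketch {

    /** Hypothetical stand-in for an "encoding changed mid-parse" exception. */
    public static class EncodingChanged extends Exception {
        public EncodingChanged(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the parser operations the loop relies on. */
    public interface RestartableParser<T> {
        T parse() throws EncodingChanged;
        void reset();
        String getEncoding();
    }

    /**
     * Tries to parse up to maxTries times. After each encoding change the
     * parser is reset, so the next attempt starts from the beginning of the
     * input. If every attempt fails, the last exception is rethrown.
     */
    public static <T> T parseWithRetries(RestartableParser<T> parser, int maxTries)
            throws EncodingChanged {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        EncodingChanged last = null;
        for (int attempt = 0; attempt < maxTries; attempt++) {
            try {
                return parser.parse();
            } catch (EncodingChanged e) {
                last = e;
                parser.reset(); // rewind before the next attempt
            }
        }
        throw last;
    }

    /** Fake parser for the demo: switches encoding once per declared entry, then succeeds. */
    public static class FakeParser implements RestartableParser<String> {
        private final String[] declared;
        private int index = 0;
        public int resets = 0;

        public FakeParser(String... declared) {
            this.declared = declared;
        }

        @Override
        public String parse() throws EncodingChanged {
            if (index < declared.length - 1) {
                index++;
                throw new EncodingChanged("switching to " + declared[index]);
            }
            return "parsed as " + declared[index];
        }

        @Override
        public void reset() {
            resets++;
        }

        @Override
        public String getEncoding() {
            return declared[index];
        }
    }

    public static void main(String[] args) throws EncodingChanged {
        FakeParser parser = new FakeParser("ISO-8859-1", "UTF-8", "Shift_JIS");
        String result = parseWithRetries(parser, 4); // 4 decoding tries max
        System.out.println(result + " after " + parser.resets + " resets");
    }
}
```

If the retry budget runs out, the caller sees the last exception and can then fall back to something like Subbu's IgnoreCharSetChanges step, instead of silently continuing with a null result.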