Re: [Htmlparser-user] Dealing with *repeated* EncodingChangeException
From: Antony S. <ant...@gm...> - 2006-03-23 23:38:11
Hi Subramanya,
My solution to the same problem is:

    Parser parser = new Parser(urlOb.openConnection());
    NodeList nl = null;
    for (int i = 0; i < 4; i++) { // 4 decoding tries max
        try {
            nl = parser.parse(null);
        } catch (EncodingChangeException e) {
            String s = parser.getEncoding(); // use detected encoding
            log.fine("restarting parse with " + s + " for " + url);
            continue;
        }
        break;
    }

If yours is better I'd like to use it. I hope someone who knows
better can tell. I am sort of stuck doing a bunch of other things at
this time and am not paying attention to this particular issue.
Is mine buggy because I don't call reset()?
-Antony Sequeira
> Code snippet below:
> ---------------------------------------------------------------------
> private static void IgnoreCharSetChanges(Parser p)
> {
>     PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
>     factory.unregisterTag(new MetaTag());
>     // Unregister meta tag so that char set changes are ignored!
>     p.setNodeFactory (factory);
> }
>
> private static String ParseNow(Parser p, MyVisitor visitor) throws org.htmlparser.util.ParserException
> {
>     try {
>         System.out.println("START encoding is " + p.getEncoding());
>         p.visitAllNodesWith(visitor);
>     }
>     catch (org.htmlparser.util.EncodingChangeException e) {
>         try {
>             System.out.println("Caught you! CURRENT encoding is " + p.getEncoding());
>             visitor.Init();
>             p.reset();
>             p.visitAllNodesWith(visitor);
>         }
>         catch (org.htmlparser.util.EncodingChangeException e2) {
>             System.out.println("CURRENT encoding is " + p.getEncoding());
>             System.out.println("--- CAUGHT you yet again! IGNORE meta tags now! ---");
>             visitor.Init();
>             p.reset();
>             IgnoreCharSetChanges(p);
>             p.visitAllNodesWith(visitor);
>         }
>     }
>     System.out.println("ENCODING IS " + p.getEncoding());
>     return p.getEncoding();
> }
> ---------------------------------------------------------------------
>
> If future versions of HTMLParser have the MetaTag class do other
> important things besides setting the text encoding, a new class
> could be derived from the existing MetaTag class whose doSemanticAction()
> code simply ignores char set changes for "content-type" meta tags and
> calls super.doSemanticAction() for the others ...
>
> If there are gotchas in this technique, I would appreciate feedback on
> that front too!
>
> Thanks,
>
> Best,
> Subbu.
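
For comparison, the two approaches in this thread differ mainly in whether reset() is called before retrying: the quoted ParseNow calls p.reset() before each retry, while the bounded loop at the top of this message does not. The following is only a sketch of how the two ideas could be combined (a bounded number of tries, with a reset before each retry). It does not use HTMLParser itself: EncodingChange, ResettableParser and FakeParser are hypothetical stand-ins invented for illustration, so it says nothing about whether the real Parser requires the reset() call.

```java
class EncodingRetrySketch {

    /** Hypothetical stand-in for org.htmlparser.util.EncodingChangeException. */
    static class EncodingChange extends Exception {
        EncodingChange(String message) { super(message); }
    }

    /** Hypothetical, minimal view of a parser: only what the retry loop needs. */
    interface ResettableParser {
        String parse() throws EncodingChange; // returns the parsed result
        void reset();                          // rewind to the start of the input
        String getEncoding();
    }

    /** Tries parse() up to maxTries times, calling reset() after each encoding change. */
    static String parseWithRetries(ResettableParser parser, int maxTries) throws EncodingChange {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        EncodingChange last = null;
        for (int attempt = 0; attempt < maxTries; attempt++) {
            try {
                return parser.parse();
            } catch (EncodingChange e) {
                last = e;        // remember the failure in case we run out of tries
                parser.reset();  // start over, keeping the newly detected encoding
            }
        }
        throw last; // every attempt hit an encoding change
    }

    /** Fake parser that reports an encoding change a fixed number of times, then succeeds. */
    static class FakeParser implements ResettableParser {
        private int changesLeft;
        private String encoding = "ISO-8859-1";
        int resets = 0;

        FakeParser(int encodingChanges) { this.changesLeft = encodingChanges; }

        public String parse() throws EncodingChange {
            if (changesLeft > 0) {
                changesLeft--;
                encoding = "UTF-8"; // pretend a META tag announced a new charset
                throw new EncodingChange("encoding changed to " + encoding);
            }
            return "parsed as " + encoding;
        }

        public void reset() { resets++; }

        public String getEncoding() { return encoding; }
    }

    public static void main(String[] args) throws EncodingChange {
        FakeParser parser = new FakeParser(2);
        System.out.println(parseWithRetries(parser, 4));
        System.out.println("resets: " + parser.resets);
    }
}
```

Unlike the loop that falls through silently after the fourth failure, this version rethrows the last exception when the tries run out, so the caller can tell a failed parse from an empty result.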