From: Michelle H. <cs...@us...> - 2008-04-12 03:45:27
Here is the font problem. I use the parser to parse
https://www.google.com/accounts/ServiceLoginBox?service=analytics&nui=1&hl=en&continue=http://www.google.com/analytics/home/%3Fet%3Dreset%26hl%3D.
I get the text node "Sign in to Google Analytics with your". In the HTML
source I see

<font size="-1">Sign in to Google Analytics with your</font>

but when I call textnode.getParentNode().getNodeName(), I get TR, the
parent of the <font> element, rather than the <font> element itself.
Could you please help me check this?

Regards,
Michelle

On Sat, Apr 12, 2008 at 1:14 AM, Andy Clark <an...@cy...> wrote:
> Michelle Hong wrote:
> > I am quite interested in your neko parser. I now have several questions:
> >
> > 1. How do you handle the font and span? I tried to parse a document using
> >
> > DOMParser parser = new DOMParser();
> > parser.setFeature("http://cyberneko.org/html/features/scanner/script/strip-comment-delims", true);
> > parser.setFeature("http://cyberneko.org/html/features/scanner/ignore-specified-charset", false);
> > parser.setProperty("http://cyberneko.org/html/properties/default-encoding", "UTF-8");
> >
> > and I found that after the parse the font tag is gone. Can I keep
> > all the tags?
>
> Are you saying that <font> tags are gone after
> you parse a document? This should not happen. Can
> you send a small sample document that demonstrates
> the problem?
>
> > 2. About the encoding. I found a popular Hong Kong webpage which declares
> > its encoding as big5, but in practice it uses UTF-8. Can you handle this
> > problem?
>
> If you don't want the parser to switch the
> encoding when it finds a <meta> tag with a charset,
> then you should use a Reader object to parse the
> document. When you do this, you are responsible
> for picking the correct encoding for reading. For
> example:
>
>   InputStream stream = new FileInputStream("index.html");
>   Reader reader = new InputStreamReader(stream, "big5");
>
>   InputSource source = new InputSource(reader);
>   parser.parse(source);
>
> > Thank you very much for your help.
>
> You're welcome.
>
> If you are going to have more questions about
> NekoHTML, please send them to the mailing list
> (nek...@li...) so that
> everyone has a chance to answer (and learn from)
> your questions.
>
> -AndyC
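The parent check Michelle describes can be isolated in a small helper so it is easy to rerun on a minimal sample document. This is a hedged sketch, not NekoHTML code: the class and method names (`ParentCheck`, `parentNameOf`, `parse`) are hypothetical, and it uses the JDK's own XML parser on a well-formed stand-in fragment only so that it runs without NekoHTML on the classpath. It walks the DOM depth-first, finds the first text node whose trimmed content matches, and reports the name of that node's parent.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of NekoHTML.
public class ParentCheck {

    /** Parses a well-formed string with the JDK parser (stand-in for NekoHTML). */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Depth-first search for the first text node whose trimmed value equals
     * {@code text}; returns its parent's node name, or null if none matches.
     */
    static String parentNameOf(Node node, String text) {
        if (node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().trim().equals(text)) {
            return node.getParentNode().getNodeName();
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            String found = parentNameOf(child, text);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed stand-in for the fragment quoted in the message.
        Document doc = parse("<table><tr><td><font size=\"-1\">"
                + "Sign in to Google Analytics with your</font></td></tr></table>");
        // Prints the parent's name; "font" here, since the text sits inside <font>.
        System.out.println(parentNameOf(doc, "Sign in to Google Analytics with your"));
    }
}
```

NekoHTML's DOMParser also yields an `org.w3c.dom.Document` (via `parser.getDocument()` after `parser.parse(...)`), so the same helper could be pointed at the real page to see which element NekoHTML actually made the parent. Note that NekoHTML reports element names in upper case by default, so there an intact `<font>` parent would show up as `FONT` rather than `font`.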