[Htmlparser-user] Re: Re: Tag Nodes not getting recognized...Please Help


Thanks a ton, Derrick, for your message; your help is greatly appreciated. I had already tried parser.setEncoding("UTF-8"), but it did not work as expected. Today I tried getting the contents of the file as a string with parser.setInputHTML(getContentsAsString(testFile)), but that did not work either.
The only thing that worked was opening the HTML file separately in TextPad, saving it again with the encoding 'ANSI', and then running my code on the new file.
Could you please suggest a way to do the same conversion using HTMLParser or by some other means? I tried reading the file one line at a time and converting each line with the following:

    byte[] stringBytesUTF = line.getBytes("UTF-8");
    ansiString = new String(stringBytesUTF, "ANSI");

But it seems "ANSI" is not a valid charset name. Any advice on this would be very valuable to me.
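A minimal sketch of the conversion Kumar asks about, assuming Java 7 or later and assuming that TextPad's "ANSI" means the Windows code page windows-1252 (the usual one on Western European systems; Java has no charset called "ANSI"). The key difference from the snippet above is that the bytes are decoded with the file's real encoding first and only then re-encoded, rather than encoded as UTF-8 and decoded as something else. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ToAnsi {
    // Assumption: "ANSI" in Windows editors means the system code page;
    // windows-1252 (Western European) is assumed here.
    static final Charset ANSI = Charset.forName("windows-1252");

    /** Decodes UTF-8 bytes, drops a leading byte order mark, re-encodes as windows-1252. */
    static byte[] utf8ToAnsi(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1); // the BOM is a marker, not document content
        }
        // Characters with no windows-1252 equivalent are replaced with '?'.
        return text.getBytes(ANSI);
    }

    // Usage (hypothetical file names): java ToAnsi atest.htm atest-ansi.htm
    public static void main(String[] args) throws IOException {
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), utf8ToAnsi(in));
    }
}
```

Converting the file is usually unnecessary, though: decoding it once with the correct charset and handing the resulting string to the parser avoids losing characters that windows-1252 cannot represent.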
Thanking you,
Kumar.

On Sat, 28 Jul 2007 13:48:36 -0700 (PDT) htmlparser user list wrote:

It appears the file is Unicode, probably UTF-8, so you'll need to get the contents as a string yourself, or try parser.setEncoding("UTF-8") before performing the parse. Some operating systems support a byte order mark prefix (like 0xFEFF) within the file to identify such files as other than plain ASCII.

----- Original Message ----
From: k
To: htm...@li...
Sent: Saturday, July 28, 2007 8:12:19 AM
Subject: [Htmlparser-user] Tag Nodes not getting recognized...Please Help

Hi All,

First of all, thanks very much for your precious time. I hope I will get help here, as I have no other way.

For more than 2 days I have been trying to parse (and process all the nodes of) one of my HTML files using the different parsers available, but I was not able to get the list of tag nodes for this particular HTML file only. When I process this HTML file with HTMLParser, it does not detect the TagNodes; it just detects the whole HTML page as one TextNode. When I try other, simpler HTML files, it does detect TagNodes. Please kindly help me out with this issue. Could my HTML file's character set be different?
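Following Derrick's suggestion to get the contents as a string yourself, here is a minimal sketch (assuming Java 7 or later; class and method names are hypothetical) that checks the first bytes of the file for a byte order mark and decodes accordingly. U+FEFF appears as EF BB BF in UTF-8, FE FF in big-endian UTF-16, and FF FE in little-endian UTF-16; without a BOM it falls back to a charset you choose. The decoded string could then be passed to parser.setInputHTML(...), as in Kumar's message.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomSniffer {
    /** Returns the charset indicated by a leading byte order mark, or null if there is none. */
    static Charset charsetFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
            return StandardCharsets.UTF_8;
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return StandardCharsets.UTF_16BE;
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)
            return StandardCharsets.UTF_16LE; // a UTF-32LE BOM also starts FF FE; not handled here
        return null;
    }

    /** Decodes using the BOM's charset (skipping the BOM bytes), else the fallback charset. */
    static String decode(byte[] b, Charset fallback) {
        Charset cs = charsetFromBom(b);
        if (cs == null) {
            return new String(b, fallback);
        }
        int bomLength = cs.equals(StandardCharsets.UTF_8) ? 3 : 2;
        return new String(b, bomLength, b.length - bomLength, cs);
    }

    // Usage (hypothetical): java BomSniffer atest.htm
    public static void main(String[] args) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        Charset cs = charsetFromBom(raw);
        System.out.println("BOM charset: " + (cs == null ? "none" : cs.name()));
        String html = decode(raw, StandardCharsets.UTF_8);
        System.out.println(html.length() + " characters decoded");
    }
}
```

If the BOM turns out to be a UTF-16 one rather than UTF-8, that would be worth knowing before choosing an encoding to pass to setEncoding.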
Or should I choose some encoding option? Here is my code. Also attached is my HTML file; it has images, but I am not attaching them.

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.nodes.RemarkNode;
    import org.htmlparser.nodes.TagNode;
    import org.htmlparser.nodes.TextNode;
    import org.htmlparser.util.NodeIterator;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.util.SimpleNodeIterator;

    // In main(), which declares "throws ParserException":
    Parser parser = new Parser("atest.htm");
    for (NodeIterator i = parser.elements(); i.hasMoreNodes(); ) {
        processMyNodes(i.nextNode());
    }

    // Recursively visits a node and its children, printing the text nodes.
    static void processMyNodes(Node node) throws ParserException {
        if (node instanceof TextNode) {
            TextNode text = (TextNode) node;
            System.out.println(text.getText());
        } else if (node instanceof RemarkNode) {
            // HTML comments are ignored.
        } else if (node instanceof TagNode) {
            TagNode tag = (TagNode) node;
            NodeList nl = tag.getChildren();
            if (null != nl) {
                for (SimpleNodeIterator i = nl.elements(); i.hasMoreNodes(); ) {
                    processMyNodes(i.nextNode());
                }
            }
        }
    }

Kumar.