Re: [Htmlparser-user] Re :Re: Tag Nodes not getting recognized...Please Help
From: Derrick O. <der...@ro...> - 2007-07-30 22:33:30
The ISO-8859-1 encoding contains ASCII as a subset, so you might try that.
If there aren't any funny characters in the file, it should work OK.
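A minimal JDK-only sketch of why ISO-8859-1 is a reasonable fallback. It maps every byte 0x00-0xFF to exactly one character, so decoding never fails, and bytes in the ASCII range come out unchanged. Non-ASCII text saved as UTF-8 is where the "funny characters" show up. The class and method names are illustrative, not part of HTMLParser.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so decoding never fails.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Pure ASCII markup decodes to the same text under ISO-8859-1 as under UTF-8.
        byte[] ascii = "<html><body>Hi</body></html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decodeLatin1(ascii).equals(new String(ascii, StandardCharsets.UTF_8)));

        // A UTF-8 e-acute is two bytes (C3 A9); ISO-8859-1 turns them into two
        // characters, U+00C3 U+00A9, which is the kind of "funny character" to expect.
        byte[] accented = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeLatin1(accented).length());
    }
}
```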
----- Original Message ----
From: k <km...@re...>
To: htm...@li...
Sent: Monday, July 30, 2007 8:24:07 AM
Subject: [Htmlparser-user] Re :Re: Tag Nodes not getting recognized...Please Help
Thanks a ton, Derrick, for your message; your help is greatly appreciated.
I had already tried parser.setEncoding("UTF-8"), but it did not work as expected. Today I tried reading the file's contents into a string and passing them in with parser.setInputHTML(getContentsAsString(testFile)), but that did not work either.
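The getContentsAsString helper is not shown in the message. One common pitfall is reading the file with FileReader or new String(bytes), which (before Java 18) silently use the platform default charset, so handing the result to parser.setInputHTML(...) does not help if that default is wrong. The version below is hypothetical: the signature, with an explicit Charset parameter, is an assumption rather than the helper from the original post.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithCharset {
    // Hypothetical helper: read the whole file and decode it with an explicit
    // charset instead of the platform default that FileReader would use.
    static String getContentsAsString(Path file, Charset charset) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        return new String(raw, charset);
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file, then read it back with the matching charset.
        Path tmp = Files.createTempFile("atest", ".htm");
        try {
            Files.write(tmp, "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8));
            String html = getContentsAsString(tmp, StandardCharsets.UTF_8);
            // With HTMLParser on the classpath, html could now go to parser.setInputHTML(html).
            System.out.println(html.length() + " characters read");
        } finally {
            Files.delete(tmp);
        }
    }
}
```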
The only thing that worked was opening the HTML file in TextPad, saving it again with the encoding 'ANSI', and then running my code on the new file.
Could you please suggest a way to do the above using htmlParser or by any other means? I tried reading the file a line at a time and using the following for the conversion:
byte[] stringBytesUTF = line.getBytes("UTF-8");
ansiString = new String(stringBytesUTF, "ANSI");
But it seems "ANSI" is not a valid charset name.
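Java has no charset called "ANSI". On US and Western European Windows systems, what TextPad and Notepad call "ANSI" is normally the windows-1252 code page (Java also accepts the alias "Cp1252"); whether that holds on your machine is an assumption to check. Note also that new String(line.getBytes("UTF-8"), ...) only re-decodes bytes, it does not convert text. A hedged sketch of a real conversion: decode the bytes with the charset they were written in, then encode the resulting string with the target charset. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToAnsi {
    // Assumption: "ANSI" means the windows-1252 code page (US/Western European Windows).
    static final Charset ANSI = Charset.forName("windows-1252");

    // Convert UTF-8 encoded bytes to windows-1252 bytes: decode first, then re-encode.
    // Characters with no windows-1252 mapping are replaced with '?'.
    static byte[] utf8ToAnsi(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(ANSI);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 63 61 66 C3 A9
        byte[] ansi = utf8ToAnsi(utf8);                             // 63 61 66 E9
        System.out.println(utf8.length + " -> " + ansi.length);     // prints "5 -> 4"
    }
}
```

The same decode-then-encode idea applies to a whole file: read it with the charset it was saved in, and write it back out with windows-1252 if an "ANSI" copy is really needed.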
Any advice on this would be very valuable to me.
Thanking You,
Kumar.
On Sat, 28 Jul 2007 13:48:36 -0700 (PDT) htmlparser user list wrote
It appears the file is Unicode, probably UTF-8, so you'll need to get the contents as a string yourself, or try parser.setEncoding ("UTF-8") before performing the parse. Some operating systems support a byte order mark prefix (like 0xFEFF) at the start of the file to identify such files as something other than plain ASCII.
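One way to act on the byte order mark (BOM) is to check for it yourself. The sketch below (JDK only; class and method names are hypothetical) looks at the first bytes for U+FEFF as written in UTF-8 (EF BB BF), UTF-16 big-endian (FE FF) and UTF-16 little-endian (FF FE), and falls back to a caller-supplied charset when there is no mark. The detected name could then be passed to parser.setEncoding(...), or used to decode the file into a string yourself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++)
            if ((data[i] & 0xFF) != prefix[i]) return false;
        return true;
    }

    // Charset indicated by a leading byte order mark (U+FEFF), or fallback if none.
    static Charset detectCharset(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return fallback;
    }

    public static void main(String[] args) {
        // "<p>" saved as UTF-16LE with a BOM, as Notepad's "Unicode" option writes it.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x70, 0x00, 0x3E, 0x00};
        System.out.println(detectCharset(sample, StandardCharsets.ISO_8859_1).name()); // UTF-16LE
    }
}
```

Java's "UTF-8", "UTF-16BE" and "UTF-16LE" decoders keep the mark as a leading U+FEFF character, so strip it (or decode with plain "UTF-16", which consumes the mark) before handing the text to the parser. UTF-32 signatures are not handled here.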
----- Original Message ----
From: k
To: htm...@li...
Sent: Saturday, July 28, 2007 8:12:19 AM
Subject: [Htmlparser-user] Tag Nodes not getting recognized...Please Help
Hi All,
First of all, thanks very much for your precious time. I hope I can get help here, as I have run out of other options.
For more than two days I have been trying to parse one of my HTML files (and process all of its nodes) using different parsers,
but for this particular HTML file I was not able to get the list of tag nodes.
When I process this HTML file with HTMLParser, it does not detect any TagNodes; it just returns the whole HTML page as one TextNode.
With other, simpler HTML files it does detect TagNodes. Please help me with this issue.
I am not sure whether my HTML file's character set is different. Should I set some encoding option?
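One way to answer the character-set question is to look at the file's first few raw bytes. A hypothetical JDK-only helper is sketched below. For reference: in a plain ASCII/ANSI file, "<html" starts 3C 68 74; a UTF-8 file with a BOM starts EF BB BF; and a UTF-16LE file (what Notepad's "Unicode" option writes) starts FF FE and has a 00 byte after every ASCII character, e.g. FF FE 3C 00 68 00.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HeadDump {
    // Format up to n leading bytes as space-separated two-digit hex values.
    static String hex(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, data.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // With a file argument (e.g. java HeadDump atest.htm) dump its first 16 bytes;
        // otherwise dump a sample: "<h" in UTF-16LE with a BOM.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x68, 0x00};
        System.out.println(hex(data, 16)); // sample prints: FF FE 3C 00 68 00
    }
}
```

If the dump shows a 00 after every markup character, a parser reading the file as single-byte text sees "<" followed by a NUL byte rather than a tag name, which is consistent with getting no tag nodes at all.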
Here is my code. My HTML file is also attached; it has images, but I am not attaching them.
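import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.nodes.RemarkNode;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.nodes.TextNode;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class ParseTest {
    public static void main (String[] args) throws ParserException {
        Parser parser = new Parser ("atest.htm");
        // Visit each top-level node; processMyNodes recurses into the children.
        for (NodeIterator i = parser.elements (); i.hasMoreNodes (); )
            processMyNodes (i.nextNode ());
    }

    static void processMyNodes (Node node) throws ParserException {
        if (node instanceof TextNode) {
            TextNode text = (TextNode)node;
            System.out.println (text.getText ());
        }
        else if (node instanceof RemarkNode) {
            RemarkNode remark = (RemarkNode)node; // comments are not used yet
        }
        else if (node instanceof TagNode) {
            TagNode tag = (TagNode)node;
            NodeList nl = tag.getChildren ();
            if (null != nl)
                for (NodeIterator i = nl.elements (); i.hasMoreNodes (); )
                    processMyNodes (i.nextNode ());
        }
    }
}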
Kumar.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user