Re: [Htmlparser-developer] HTML Comments/Remarks
From: Somik R. <so...@ya...> - 2002-12-12 05:21:26
Hi Sam,

> Also, I solved my problem with the debugging output. The problem was
> with the code I was using to output the final data. The print() command
...

Oops..

> Thanks for the help. I think I would like to see the
> toPlainTextString() method remain. Although I'm not quite sure of the
> difference between HTMLRemarkNode.toString and
> HTMLRemarkNode.toPlainTextString.

This is actually based on your suggestion (eons back..): toPlainTextString() is the uniform way of getting a string representation of a page, meaningful and hopefully semantic data. I think you'd probably want to use toPlainTextString() instead of toString(), as toString() always gives some output for all the tags, while toPlainTextString() works only for specific ones like string nodes, link text and strings inside forms.

It was also enabled earlier for comments, but was taken out last week. I am thinking of putting it back in. What this will mean is that if folks have commented-out tags, you will get that sort of data in your string filter. I think you can live with that (?)

Also, I am thinking of a better approach, wherein, should one require pure strings within a comment, one could create a new parser that operates on the contents of the string node (it would be an interesting approach to try..)

Regards,
Somik

----- Original Message -----
From: "Sam Joseph" <ga...@yh...>
To: <htm...@li...>
Sent: Wednesday, December 11, 2002 8:05 PM
Subject: Re: [Htmlparser-developer] HTML Comments/Remarks

> Hi Somik,
>
> Thanks for the help. I think I would like to see the
> toPlainTextString() method remain. Although I'm not quite sure of the
> difference between HTMLRemarkNode.toString and
> HTMLRemarkNode.toPlainTextString.
>
> Trying out both in my code I see that toPlainTextString() seems to
> generate a blank while toString() gives me the contents of the
> remark/comment. To be specific about my objectives, I'm trying to
> handle meta-data by the Creative Commons group, which currently involves
> placing a big chunk of rdf/xml in a remark within the page. I'm very
> much hoping to be able to extract that comment verbatim and then pass it
> over to my rdf/xml parser.
>
> I'll be happy as long as I can achieve that.
>
> Also, I solved my problem with the debugging output. The problem was
> with the code I was using to output the final data. The print() command
> was being called on links and meta-tags, and the way that ant formatted
> things it made it look like the associated System.out calls were being
> made during the parsing process rather than at the end. Sorry about
> that, all fixed now, so don't worry about looking at the code that I
> sent you in my previous email.
>
> Thanks again for all your help. I'm looking forward to fully
> integrating HTMLParser with NeuroGrid over the next two days.
>
> CHEERS> SAM
>
> Somik Raha wrote:
>
> >Hi Sam,
> > HTMLRemarkNode is a special class - it is not a
> >scanner.
> > It is registered by default - so you don't have to do
> >anything - just check if the node object is a remark
> >node.
> >
> > However, last week, I removed the
> >toPlainTextString() implementation, as often a lot
> >of HTML code is commented out, and I thought it might
> >interfere with a simple string representation of a
> >page. If that is not the case and you need to use
> >toPlainTextString(), pls let us know, and we should
> >put that functionality back in.
> >
> >Regards,
> >Somik
> >--- Sam Joseph <ga...@yh...> wrote:
> >
> >>Hi Somik
> >>
> >>Sorry to ask so much this week, but I was wondering
> >>if there is some operation for picking up HTML comments
> >>using the HTMLParser (<!-- a comment -->) or are
> >>they automatically ignored?
> >>
> >>I can see from the API that there is HTMLRemarkNode,
> >>but I can't see any similar tag or scanner.
> >>Must a special tag/scanner be created to handle
> >>comments/remarks?
> >>
> >>Thanks in advance.
> >>
> >>CHEERS> SAM
> >>
> >-------------------------------------------------------
> >>This sf.net email is sponsored by:
> >>With Great Power, Comes Great Responsibility
> >>Learn to use your power at OSDN's High Performance
> >>Computing Channel
> >>http://hpc.devchannel.org/
> >>_______________________________________________
> >>Htmlparser-developer mailing list
> >>Htm...@li...
> >>https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
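The approach Somik describes (walk the parsed nodes and check whether each one is a remark node, then read its raw contents rather than its plain text) can be sketched in plain Java. This is a minimal, self-contained illustration only: `SketchNode`, `SketchRemarkNode`, `RemarkSketch.parse`, `extractRemarks` and `plainText` are hypothetical names, not HTMLParser's API, and the tokenizer is deliberately naive (no CDATA, no `<script>` handling, no `>` inside attribute values). The remark node's `toPlainTextString()` returns a blank to mirror what Sam reported, while its verbatim contents stay available for something like an RDF/XML parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node model for illustration only; NOT the HTMLParser classes.
abstract class SketchNode {
    abstract String toPlainTextString();
}

final class SketchStringNode extends SketchNode {
    private final String text;
    SketchStringNode(String text) { this.text = text; }
    @Override String toPlainTextString() { return text; }
    @Override public String toString() { return text; }
}

final class SketchTagNode extends SketchNode {
    private final String markup; // e.g. "<p>", kept verbatim
    SketchTagNode(String markup) { this.markup = markup; }
    @Override String toPlainTextString() { return ""; } // tags add no plain text
    @Override public String toString() { return markup; }
}

final class SketchRemarkNode extends SketchNode {
    private final String contents; // text between "<!--" and "-->", verbatim
    SketchRemarkNode(String contents) { this.contents = contents; }
    String getContents() { return contents; }
    // Blank plain text, matching the behaviour described in the thread.
    @Override String toPlainTextString() { return ""; }
    @Override public String toString() { return "<!--" + contents + "-->"; }
}

final class RemarkSketch {
    // Naive tokenizer: splits the input into remark, tag and text nodes.
    static List<SketchNode> parse(String html) {
        List<SketchNode> nodes = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            if (html.startsWith("<!--", i)) {
                int end = html.indexOf("-->", i + 4);
                if (end < 0) end = html.length(); // unterminated: take the rest
                nodes.add(new SketchRemarkNode(html.substring(i + 4, end)));
                i = Math.min(end + 3, html.length());
            } else if (html.charAt(i) == '<') {
                int end = html.indexOf('>', i);
                if (end < 0) end = html.length() - 1; // unterminated tag
                nodes.add(new SketchTagNode(html.substring(i, end + 1)));
                i = end + 1;
            } else {
                int end = html.indexOf('<', i);
                if (end < 0) end = html.length();
                nodes.add(new SketchStringNode(html.substring(i, end)));
                i = end;
            }
        }
        return nodes;
    }

    // "Just check if the node object is a remark node" and keep its contents.
    static List<String> extractRemarks(String html) {
        List<String> remarks = new ArrayList<>();
        for (SketchNode node : parse(html)) {
            if (node instanceof SketchRemarkNode) {
                remarks.add(((SketchRemarkNode) node).getContents());
            }
        }
        return remarks;
    }

    // Concatenated plain text: remarks and tags contribute nothing.
    static String plainText(String html) {
        StringBuilder sb = new StringBuilder();
        for (SketchNode node : parse(html)) {
            sb.append(node.toPlainTextString());
        }
        return sb.toString();
    }
}
```

For example, `RemarkSketch.extractRemarks("<p>Licence</p><!-- <rdf:RDF></rdf:RDF> -->")` yields a one-element list holding `" <rdf:RDF></rdf:RDF> "` unchanged, while `plainText` on the same input yields only `"Licence"`. With HTMLParser itself, the corresponding check would be an `instanceof HTMLRemarkNode` test, as Somik suggests; the accessor for the raw contents is not named in this thread.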