htmlparser-user Mailing List for HTML Parser (Page 84)
Brought to you by:
derrickoswald
From: Aminudin K. <ami...@mi...> - 2003-02-14 03:42:45
|
Hi, I'm sure somebody has asked this before. My program fetches HTML from other websites. This works without problems unless the site redirects. What can I do if it is a redirecting website? It only returns some META tags. I know that we can analyze the META tags, but is there a smarter solution? (Maybe there is an existing module for this task so I don't have to reinvent the wheel.) Is there any fast and simple solution to this kind of problem? TQ |
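For readers landing on this question: one way to handle such pages is to look for a META refresh tag and fetch the URL it names. The following is only a minimal sketch using the Java standard library, not part of HTML Parser; the class and method names are hypothetical, and the regular expression assumes that http-equiv appears before content and that the content value is quoted.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaRefresh {
    // Assumption: http-equiv comes before content, and content is quoted,
    // e.g. <meta http-equiv="refresh" content="0; url=http://host/page">
    private static final Pattern META_REFRESH = Pattern.compile(
        "<meta\\s+[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*"
            + "content\\s*=\\s*[\"']\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"'>\\s]+)",
        Pattern.CASE_INSENSITIVE);

    /** Returns the redirect target named by a META refresh tag, if one is found. */
    public static Optional<String> findRedirectTarget(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String page = "<html><head><META HTTP-EQUIV=\"Refresh\" "
            + "CONTENT=\"0; URL=http://example.com/new.html\"></head></html>";
        // If a target is present, the caller would fetch and parse that URL instead.
        System.out.println(findRedirectTarget(page).orElse("(no redirect)"));
    }
}
```

The returned target may itself be relative, in which case it would still need to be resolved against the URL of the page that contained it.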
From: Aminudin K. <ami...@mi...> - 2003-02-14 02:03:42
|
Problem solved, sorry :)

Aminudin Khalid wrote:
> Hi,
>
> I have successfully parsed my HTML and modified the strings that make up the content of the website. However, I still fail to write back some of the modifications I made to an HTMLTag instance.
>
> In HTMLTag, I noticed a method named setText(String text). I called HTMLTag::getText() and modified the returned string, but I failed to write the modification back with setText(String text). Calling setText(String text) had no effect on the final HTML output.
>
> What is your solution? :)
>
> Thanks :)

--
Mohd. Aminudin bin Mohd. Khalid
Linux Software Engineer
Open Source Development
Mimos Berhad, Malaysia
http://opensource.mimos.my |
From: Aminudin K. <ami...@mi...> - 2003-02-14 00:30:16
|
Hi, I have successfully parsed my HTML and modified the strings that make up the content of the website. However, I still fail to write back some of the modifications I made to an HTMLTag instance. In HTMLTag, I noticed a method named setText(String text). I called HTMLTag::getText() and modified the returned string, but I failed to write the modification back with setText(String text). Calling setText(String text) had no effect on the final HTML output. What is your solution? :) Thanks :) |
From: <wf...@ma...> - 2003-02-13 08:44:21
|
Sorry to bother the community with something rather off-topic. However, I could use some Domino-related advice. Thank you. -- Mit freundlichen Grüßen / Kind regards Wolfgang Flamme wf...@ma... |
From: Somik R. <so...@ya...> - 2003-02-12 06:44:31
|
Dhaval Udani wrote:
> At this point in time we may not be able to understand the different requirements of different applications. As a rule, we must not keep anything binding in the parser. Everything should be read/write. It's up to the user of the API to use it properly. We keep seeing new areas where the parser is being deployed. Along with this come new requirements and unforeseen needs. It's best to keep it open so that anyone can change any attribute/tag/node etc.

I'd be inclined to agree with you: to have a setAttribute method on HTMLTag. But I'd like to add that I do not agree in general about adding stuff that may or may not be used; that is a code smell by itself (in fact, one of the code smells in Fowler's book talks about unnecessary setter methods). This always adds up to maintenance problems later. To accommodate future needs, I think we need to be agile (I hope we are agile enough for our community). In this situation, though, we might save some duplication by allowing the attribute to be set in HTMLTag, and it might be a good feature to have. Regards, Somik |
From: Somik R. <so...@ya...> - 2003-02-12 06:39:54
|
> But... I just said I want to modify the <frame> location, not the form! I just think there can be a problem when you have an <input> with type=image, because this can be treated as an <img> tag (with a relative path)...

Sorry about that. For some strange reason, I read that as the form tag! We'll ship with this change very soon. Regards, Somik |
From: Elodie T. <et...@in...> - 2003-02-11 07:14:38
|
> > Yes, of course; that's why you are allowed to modify URLs and image locations.
> > But of what benefit would it be to remap the form URL? You'd need a real server serving your requests, unless you have a local server to map to.
> > I am curious about your scenario for form location modification.

But... I just said I want to modify the <frame> location, not the form! I just think there can be a problem when you have an <input> with type=image, because this can be treated as an <img> tag (with a relative path)...

> > > And when you use a frame, the src attribute can be a relative path, so I need to modify it too.
> >
> > Yes, I agree this should be allowed.
> >
> > Regards,
> > Somik

Regards, Elodie |
From: <dha...@or...> - 2003-02-11 05:30:12
|
Hi all, my 2 cents' worth on this issue.

> Yes, of course; that's why you are allowed to modify URLs and image locations.
> But of what benefit would it be to remap the form URL? You'd need a real server serving your requests, unless you have a local server to map to.
> I am curious about your scenario for form location modification.

At this point in time we may not be able to understand the different requirements of different applications. As a rule, we must not keep anything binding in the parser. Everything should be read/write. It's up to the user of the API to use it properly. We keep seeing new areas where the parser is being deployed. Along with this come new requirements and unforeseen needs. It's best to keep it open so that anyone can change any attribute/tag/node etc. Dhaval |
From: Somik R. <so...@ya...> - 2003-02-11 05:03:07
|
> I already answered, too, when you asked me to give a scenario.

Sorry, I seem to have missed your earlier mail.

> In my case, I work in a portal where you can import any kind of file.
> The filesystem is such that the "logical paths" differ from the "physical paths": for example, /my_html_files/index.html is "really" /rep1024/fic1035.html.
> So, when you want to view HTML files from this portal, the 'href' and 'src' paths aren't valid anymore, so I must take them all, "translate" them and replace them.

Yes, of course; that's why you are allowed to modify URLs and image locations. But of what benefit would it be to remap the form URL? You'd need a real server serving your requests, unless you have a local server to map to. I am curious about your scenario for form location modification.

> And when you use a frame, the src attribute can be a relative path, so I need to modify it too.

Yes, I agree this should be allowed. Regards, Somik |
From: Elodie T. <et...@in...> - 2003-02-10 07:17:59
|
> I thought I had already replied to this - my question
> to you is - why do you want to change the frame
> location?

I already answered, too, when you asked me to give a scenario. In my case, I work in a portal where you can import any kind of file. The filesystem is such that the "logical paths" differ from the "physical paths": for example, /my_html_files/index.html is "really" /rep1024/fic1035.html. So, when you want to view HTML files from this portal, the 'href' and 'src' paths aren't valid anymore, so I must take them all, "translate" them and replace them. And when you use a frame, the src attribute can be a relative path, so I need to modify it too.

> If you can answer this convincingly, the
> next release of the parser will contain this feature.

I hope I answered convincingly enough! ;o)

> Regards,
> Somik

Best regards, Elodie |
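The translation step described in this message (resolve each href/src against the page's logical path, then look up the physical path) could be sketched roughly as below. This is an illustration under assumptions, not the poster's actual code: the class name, the lookup table, and the second physical path (fic1036.html) are hypothetical, and only the standard library is used.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class PathTranslator {
    // Hypothetical lookup table from logical portal paths to physical paths.
    private final Map<String, String> logicalToPhysical = new HashMap<>();

    public void register(String logicalPath, String physicalPath) {
        logicalToPhysical.put(logicalPath, physicalPath);
    }

    /**
     * Resolves a possibly relative href/src value against the logical path of
     * the page that contains it, then maps the result to its physical path.
     * References with no known mapping are returned unchanged.
     */
    public String translate(String pageLogicalPath, String reference) {
        // URI.resolve also normalizes "../" segments in the reference.
        String logical = URI.create(pageLogicalPath).resolve(reference).getPath();
        return logicalToPhysical.getOrDefault(logical, reference);
    }

    public static void main(String[] args) {
        PathTranslator translator = new PathTranslator();
        translator.register("/my_html_files/index.html", "/rep1024/fic1035.html");
        translator.register("/my_html_files/menu.html", "/rep1024/fic1036.html");
        // A frame src of "menu.html" inside index.html maps to the physical file.
        System.out.println(translator.translate("/my_html_files/index.html", "menu.html"));
    }
}
```

The translated value would then be written back through whatever setter the tag class offers (setLink, setImageURL, or the frame setter requested in this thread). Note that URI.create rejects references containing illegal characters such as unescaped spaces, so real input may need escaping first.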
From: Mohd-Taqiyuddin Z. <mt...@ec...> - 2003-02-08 16:52:22
|
Hi there, I know this may sound stupid. I want a program that, when reading an <li> or <l0> tag, adds something such as "1" to the elements in the HTMLNode vector and increments it whenever it sees the tag. Another question is how I can use the Translate class to translate all the codes such as &nbsp; into "." and that kind of stuff. |
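The counting part of this question can be illustrated without the parser at all. The sketch below is only a standard-library approximation (the class and method names are hypothetical): it numbers list items by rewriting the opening <li> tags in a string rather than by editing an HTMLNode vector.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListItemNumberer {
    // \b keeps the pattern from matching longer tag names such as <link>.
    private static final Pattern LI_OPEN =
        Pattern.compile("<li\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Inserts "1. ", "2. ", ... after each opening <li> tag, in document order. */
    public static String numberItems(String html) {
        Matcher m = LI_OPEN.matcher(html);
        StringBuffer out = new StringBuffer();
        int counter = 0;
        while (m.find()) {
            counter++;
            // quoteReplacement guards against '$' or '\' inside the matched tag.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group() + counter + ". "));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(numberItems("<ul><li>apple</li><li>pear</li></ul>"));
    }
}
```

The counter here runs across the whole input; restarting it for each list would require tracking the enclosing list tags as well.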
From: Somik R. <so...@ya...> - 2003-02-08 06:03:25
|
Try:

    thewriter.write(HTMLParserUtils.removeEscapeCharacters(node.toPlainTextString()));

That should make it better. Regards, Somik

----- Original Message -----
From: "ChennaDulla" <che...@go...>
To: <htm...@li...>
Sent: Friday, February 07, 2003 6:26 AM
Subject: [Htmlparser-user] format problem of text file after convertion of html to text file

> Hi, I downloaded the htmlparser1.2 zip and put the htmlparser.jar file under lib on my server and the org folder under web_inf. It works fine for converting HTML to a text file, but the problem is the format of the text file. When I look at the text file after conversion, the format is very poor. Why is this happening? There is no consistent format when writing into the text file.
> Here is the code I am using to convert HTML to a text file:
>
>     import org.htmlparser.util.HTMLEnumeration;
>     import org.htmlparser.util.HTMLParserException;
>     import org.htmlparser.HTMLNode;
>     import org.htmlparser.HTMLParser;
>     import java.io.*;
>     import java.util.Properties;
>
>     public class StringExtractor {
>         // String htmlFile = "/export/a.html";
>         public StringExtractor() {
>         }
>
>         public void extractStrings(String htmlFile) throws HTMLParserException {
>             try {
>                 HTMLParser parser = new HTMLParser(htmlFile);
>                 BufferedWriter thewriter = new BufferedWriter(new FileWriter("/export/d.txt"));
>                 HTMLNode node;
>                 StringBuffer results = new StringBuffer();
>                 for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
>                     node = e.nextHTMLNode();
>                     thewriter.write(node.toPlainTextString());
>                 }
>                 thewriter.close();
>             } catch (IOException e) {
>                 System.out.println("error in ConvertJspToHtml.java===" + e);
>             }
>         }
>     }
>
> What changes do I have to make to see the HTML file in a readable format? If I run the above, the text file is generated but the format doesn't look good.
> Any help on this, please.
> I am sending one file as an attachment; the output I am getting in the text file looks like that.
>
> thanks. |
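If the extracted text is still ragged after the suggestion above, a small post-processing pass over each string before writing it may help. The following is a hedged sketch using only the standard library; the class and method names are hypothetical and it is not a substitute for the parser's own utilities.

```java
public class PlainTextTidy {
    /** Collapses stray whitespace so extracted text reads more like prose. */
    public static String tidy(String text) {
        return text
            .replace('\u00A0', ' ')            // non-breaking space to plain space
            .replaceAll("[ \\t]+", " ")        // collapse runs of spaces and tabs
            .replaceAll(" ?\\r?\\n ?", "\n")   // drop spaces around line breaks
            .replaceAll("\\n{3,}", "\n\n")     // keep at most one blank line in a row
            .trim();
    }

    public static void main(String[] args) {
        System.out.println(tidy("Hello   world \n\n\n\n  Next\tline "));
    }
}
```

In the loop from the quoted code, this would wrap the string passed to thewriter.write(...).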
From: Somik R. <so...@ya...> - 2003-02-07 19:26:48
|
I thought I had already replied to this. My question to you is: why do you want to change the frame location? If you can answer this convincingly, the next release of the parser will contain this feature. Regards, Somik

--- Elodie Tasia <et...@in...> wrote:
> Hi,
>
> I noticed that some of the Tag classes have methods that let you modify (or I guess they do) the "source" attribute (like href or src). These methods are, for example: setBaseURL, setImageURL, setLink...
>
> That seems perfect to me, as I have to modify all relative paths in an HTML file... but I can't find a method that sets the source location in a frame tag, nor in an input tag (when type=image).
>
> What can I do? Would it be too complex for me to try to add such a method to the HTMLFrameTag class?
>
> And for the InputTag class?
>
> Regards,
> Elodie |
From: Somik R. <so...@ya...> - 2003-02-07 19:25:27
|
--- Aminudin Khalid <ami...@mi...> wrote:
> Could you guys help me analyze what is wrong in the following code? In HTMLParser there is a method called *visitAllNodesWith(visitor)*. The argument's type is *HTMLVisitor*. However, the following class uses StringTranslatingVisitor, which extends HTMLVisitor, as the argument. JAVAC keeps complaining about this.
>
> Your help is appreciated. Thanks
>
> p/s: Notice that I've commented out "import org.htmlparser.visitors".

You have to import like this:

    import org.htmlparser.visitors.*;

or

    import org.htmlparser.visitors.HTMLVisitor;

Regards, Somik |
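The two import forms suggested above are the only ones Java accepts; a bare package name cannot be imported. A small standard-library illustration (the class and method names are hypothetical, and java.util stands in for org.htmlparser.visitors):

```java
import java.util.*;               // type-import-on-demand: all public types in java.util
import java.util.regex.Pattern;   // single-type import: exactly one class
// import java.util.regex;        // would NOT compile: a package name alone is not a type

public class ImportForms {
    private static final Pattern HTML_PREFIX = Pattern.compile("^HTML");

    /** Removes a leading "HTML" from a class name, e.g. HTMLVisitor becomes Visitor. */
    public static String stripPrefix(String className) {
        return HTML_PREFIX.matcher(className).replaceFirst("");
    }

    public static void main(String[] args) {
        // List, ArrayList and Arrays all come from the on-demand import.
        List<String> names = new ArrayList<>(Arrays.asList("HTMLVisitor", "HTMLParser"));
        for (String name : names) {
            System.out.println(stripPrefix(name));
        }
    }
}
```

With the import commented out, the compiler cannot resolve the superclass name at all, which is consistent with the "cannot be applied to (StringTranslatingVisitor)" error reported in this thread.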
From: ChennaDulla <che...@go...> - 2003-02-07 14:27:44
|
Hi, I downloaded the htmlparser1.2 zip and put the htmlparser.jar file under lib on my server and the org folder under web_inf. It works fine for converting HTML to a text file, but the problem is the format of the text file. When I look at the text file after conversion, the format is very poor. Why is this happening? There is no consistent format when writing into the text file.

Here is the code I am using to convert HTML to a text file:

    import org.htmlparser.util.HTMLEnumeration;
    import org.htmlparser.util.HTMLParserException;
    import org.htmlparser.HTMLNode;
    import org.htmlparser.HTMLParser;
    import java.io.*;
    import java.util.Properties;

    public class StringExtractor {
        // String htmlFile = "/export/a.html";
        public StringExtractor() {
        }

        public void extractStrings(String htmlFile) throws HTMLParserException {
            try {
                HTMLParser parser = new HTMLParser(htmlFile);
                BufferedWriter thewriter = new BufferedWriter(new FileWriter("/export/d.txt"));
                HTMLNode node;
                StringBuffer results = new StringBuffer();
                // Write the plain-text form of every top-level node to the output file.
                for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
                    node = e.nextHTMLNode();
                    thewriter.write(node.toPlainTextString());
                }
                thewriter.close();
            } catch (IOException e) {
                System.out.println("error in ConvertJspToHtml.java===" + e);
            }
        }
    }

What changes do I have to make to see the HTML file in a readable format? If I run the above, the text file is generated but the format doesn't look good. Any help on this, please. I am sending one file as an attachment; the output I am getting in the text file looks like that.

thanks. |
From: <dha...@or...> - 2003-02-07 09:10:40
|
Your import line should be

    import org.htmlparser.visitors.*;

-----Original Message-----
From: aminudin [mailto:ami...@mi...]
Sent: Friday, February 07, 2003 2:39 PM
To: htmlparser-user
Cc: aminudin
Subject: Re: [Htmlparser-user] HTML parser for HTML translation

Hi,

You're right, HTMLVisitor does exist in htmlparser.jar. Many strange things happened during compilation, but I've managed to reduce some errors.

Could you guys help me analyze what is wrong in the following code? In HTMLParser there is a method called visitAllNodesWith(visitor). The argument's type is HTMLVisitor. However, the following class uses StringTranslatingVisitor, which extends HTMLVisitor, as the argument. JAVAC keeps complaining about this.

Your help is appreciated. Thanks

p/s: Notice that I've commented out "import org.htmlparser.visitors". I couldn't compile if I include this line. (Any reason/idea?)
FYI, my development platform is Linux.

--------------- Error ------------------------------------------
StringTranslatingVisitor.java:45: visitAllNodesWith(org.htmlparser.visitors.HTMLVisitor) in org.htmlparser.HTMLParser cannot be applied to (StringTranslatingVisitor)
        parser.visitAllNodesWith(visitor);
              ^
1 error
-----------------------------------------------------------------

----------------------------------- JAVA Code -------------------------
    import org.htmlparser.HTMLParser;
    import org.htmlparser.HTMLRemarkNode;
    import org.htmlparser.HTMLStringNode;
    import org.htmlparser.tags.HTMLEndTag;
    import org.htmlparser.tags.HTMLTag;
    import org.htmlparser.util.HTMLParserException;
    //import org.htmlparser.visitors;

    public class StringTranslatingVisitor extends HTMLVisitor {
        StringBuffer htmlData = new StringBuffer();

        public void visitStringNode(HTMLStringNode stringNode) {
            String yourStuff = "htmlTrans";
            // Perform modifications here.
            // finally, add to htmlData
            htmlData.append(yourStuff);
        }

        public void visitEndTag(HTMLEndTag endTag) {
            htmlData.append(endTag.toHTML());
        }

        public void visitTag(HTMLTag tag) {
            htmlData.append(tag.toHTML());
        }

        public String getHtml() {
            return htmlData.toString();
        }

        public void visitRemarkNode(HTMLRemarkNode remarkNode) {
            htmlData.append(remarkNode.toHTML());
        }

        public static void main(String args[]) {
            try {
                HTMLParser parser = new HTMLParser("http://www.yahoo.com");
                parser.registerScanners();
                StringTranslatingVisitor visitor = new StringTranslatingVisitor();
                parser.visitAllNodesWith(visitor);
            } catch (HTMLParserException e) {
                System.out.println("error :) ");
            }
        }
    }
--------------------------------------------------------------------------- |
From: Aminudin K. <ami...@mi...> - 2003-02-07 09:04:12
|
Hi,

You're right, HTMLVisitor does exist in htmlparser.jar. Many strange things happened during compilation, but I've managed to reduce some errors.

Could you guys help me analyze what is wrong in the following code? In HTMLParser there is a method called *visitAllNodesWith(visitor)*. The argument's type is *HTMLVisitor*. However, the following class uses StringTranslatingVisitor, which extends HTMLVisitor, as the argument. JAVAC keeps complaining about this.

Your help is appreciated. Thanks

p/s: Notice that I've commented out "import org.htmlparser.visitors". I couldn't compile if I include this line. (Any reason/idea?)
FYI, my development platform is Linux.

--------------- Error ------------------------------------------
StringTranslatingVisitor.java:45: visitAllNodesWith(org.htmlparser.visitors.HTMLVisitor) in org.htmlparser.HTMLParser cannot be applied to (StringTranslatingVisitor)
        parser.visitAllNodesWith(visitor);
              ^
1 error
-----------------------------------------------------------------

----------------------------------- JAVA Code -------------------------
    import org.htmlparser.HTMLParser;
    import org.htmlparser.HTMLRemarkNode;
    import org.htmlparser.HTMLStringNode;
    import org.htmlparser.tags.HTMLEndTag;
    import org.htmlparser.tags.HTMLTag;
    import org.htmlparser.util.HTMLParserException;
    //import org.htmlparser.visitors;

    public class StringTranslatingVisitor extends HTMLVisitor {
        StringBuffer htmlData = new StringBuffer();

        public void visitStringNode(HTMLStringNode stringNode) {
            String yourStuff = "htmlTrans";
            // Perform modifications here.
            // finally, add to htmlData
            htmlData.append(yourStuff);
        }

        public void visitEndTag(HTMLEndTag endTag) {
            htmlData.append(endTag.toHTML());
        }

        public void visitTag(HTMLTag tag) {
            htmlData.append(tag.toHTML());
        }

        public String getHtml() {
            return htmlData.toString();
        }

        public void visitRemarkNode(HTMLRemarkNode remarkNode) {
            htmlData.append(remarkNode.toHTML());
        }

        public static void main(String args[]) {
            try {
                HTMLParser parser = new HTMLParser("http://www.yahoo.com");
                parser.registerScanners();
                StringTranslatingVisitor visitor = new StringTranslatingVisitor();
                parser.visitAllNodesWith(visitor);
            } catch (HTMLParserException e) {
                System.out.println("error :) ");
            }
        }
    }
--------------------------------------------------------------------------- |
From: Elodie T. <et...@in...> - 2003-02-07 07:22:26
|
Hi, I noticed that some of the Tag classes have methods that let you modify (or I guess they do) the "source" attribute (like href or src). These methods are, for example: setBaseURL, setImageURL, setLink... That seems perfect to me, as I have to modify all relative paths in an HTML file... but I can't find a method that sets the source location in a frame tag, nor in an input tag (when type=image). What can I do? Would it be too complex for me to try to add such a method to the HTMLFrameTag class? And for the InputTag class? Regards, Elodie |
From: Somik R. <so...@ya...> - 2003-02-07 05:41:50
|
Aminudin Khalid writes:
> Currently I am testing HTMLParser for my HTML translation engine. FYI, I am using the latest integration module, Version 1.3 dated 3 February, 2003.
>
> I had a problem when using htmlparser.jar: it couldn't find HTMLVisitor (I mean org.htmlparser.visitors) but it could find HTMLParser. Does this mean that HTMLVisitor is not included in the pre-compiled binary that comes along with the integration release?

It is. I just cross-checked; HTMLVisitor is very much a part of the release. Please verify again. (It is in lib/htmlparser.jar.) Regards, Somik |
From: <dha...@or...> - 2003-02-07 04:48:05
|
No no... I would want all the comments in a CSS/JavaScript file or code to be stripped out when the file is sent for deployment to the production site. This would decrease the size of the file substantially and allow it to load faster.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

-----Original Message-----
From: joshua [mailto:jo...@in...]
Sent: Thursday, February 06, 2003 10:20 PM
To: htmlparser-user
Cc: joshua
Subject: Re: [Htmlparser-user] strip comments HTML source

> I would love that and something similar for css and javascript files as
> well.

Are you saying you'd like to have any css/javascript data in an html page stripped out, so all you have is plain html? regards jk |
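For the CSS half of this request, a regular expression is often enough, since CSS only has block comments. The sketch below uses the standard library only and is not an HTML Parser feature; the class and method names are hypothetical. It assumes no comment-like text appears inside string literals, and JavaScript would need a real tokenizer because of strings, regex literals, and // comments.

```java
import java.util.regex.Pattern;

public class CssCommentStripper {
    // DOTALL lets '.' cross line breaks; the reluctant '*?' stops at the first "*/".
    private static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    /** Removes every block comment from a CSS source string. */
    public static String strip(String css) {
        return BLOCK_COMMENT.matcher(css).replaceAll("");
    }

    public static void main(String[] args) {
        String css = "body { color: red; /* main colour */ }\n"
            + "/* footer\n rules */\n"
            + "p { margin: 0; }";
        System.out.println(strip(css));
    }
}
```

Whitespace left behind where a comment used to be is not removed here; a minifier would collapse it in a separate step.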
From: ChennaDulla <che...@go...> - 2003-02-06 22:17:22
|
Hi, I downloaded the htmlparser1.2 zip and put htmlparser.jar under lib on my server, and it works fine for converting HTML to a text file. The problem is that the format of the converted text file doesn't look good. I am sending the text file generated from the HTML file as an attachment. Here is the code I am using to convert HTML to a text file:

    import org.htmlparser.util.HTMLEnumeration;
    import org.htmlparser.util.HTMLParserException;
    import org.htmlparser.HTMLNode;
    import org.htmlparser.HTMLParser;
    import java.io.*;
    import java.util.Properties;

    public class StringExtractor {
        public StringExtractor() {
        }

        public void extractStrings(String htmlFile) throws HTMLParserException {
            try {
                HTMLParser parser = new HTMLParser(htmlFile);
                BufferedWriter thewriter = new BufferedWriter(new FileWriter("/export/home/mailfiles/d.txt"));
                HTMLNode node;
                StringBuffer results = new StringBuffer();
                // Write the plain-text form of every top-level node to the output file.
                for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
                    node = e.nextHTMLNode();
                    thewriter.write(node.toPlainTextString());
                }
                thewriter.close();
            } catch (IOException e) {
                System.out.println("error in ConvertJspToHtml.java===" + e);
            }
        }
    }

Thanks,
Chenna Dulla, GoneHome Inc.
1278 SouthMain St. Canton, Ohio - 44720
tel: 330-649-9258 (W) 440-605-1628 (R) |
From: Joshua K. <jo...@in...> - 2003-02-06 16:48:23
|
> I would love that and something similar for css and javascript files as
> well.

Are you saying you'd like to have any css/javascript data in an html page stripped out, so all you have is plain html? regards jk |
From: Mohd-Taqiyuddin Z. <mt...@ec...> - 2003-02-06 14:44:16
|
Hi, I would like help on how to extract certain text from a site. I want to extract sets of questions and answers from a site where the relevant information is put in a table, which is in turn placed inside a form. Could you please guide me on this? Below is the URL. http://developer.java.sun.com/developer/Books/certification/testyourself.html Thank you. |
From: <dha...@or...> - 2003-02-06 09:27:02
|
I would love that and something similar for css and javascript files as well.

-----Original Message-----
From: aminudin [mailto:ami...@mi...]
Sent: Thursday, February 06, 2003 2:50 PM
To: htmlparser-user
Cc: aminudin
Subject: [Htmlparser-user] strip comments HTML source

Hi, Is there any way / class that could strip all comments from HTML source and produce plain and clean HTML source without any comment. Thanks |
From: Aminudin K. <ami...@mi...> - 2003-02-06 09:15:02
|
Hi, is there any way or class that could strip all comments from HTML source and produce plain, clean HTML source without any comments? Thanks |
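Independently of the parser, the simplest approach to this is a regular expression over the raw source. The following is a minimal sketch with the standard library only; the class and method names are hypothetical. It is deliberately naive: it would also remove script or style content that is wrapped in comment markers, as well as conditional comments, so a node-based approach that skips remark nodes is safer for real pages.

```java
import java.util.regex.Pattern;

public class HtmlCommentStripper {
    // DOTALL lets a comment span several lines; '*?' stops at the first "-->".
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /** Removes every <!-- ... --> comment from an HTML source string. */
    public static String strip(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hi</p><!-- internal\n note --><p>Bye</p>"));
    }
}
```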