HTML Parser / Discussion / Help: HTML saving

angelo78 - 2005-02-19

Hi,
I want to extract the links from a specific URL and
save its HTML to a file.
I can't find a way to do that while downloading the HTML only once.
Can someone please tell me how to do that?
I want the program to download the page only once,
then save the HTML and extract the links
(with the prefix of the URL I downloaded it from).
Thanks

- Derrick Oswald - 2005-02-20
  
  There is an example of what you want to do in org.htmlparser.parserapplications.SiteCapturer
  
  That example uses custom tags, but the principle of printing out a list of nodes with toHtml() is the same in any case:
  
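  // needs java.io.FileNotFoundException, java.io.FileOutputStream,
  // java.io.PrintWriter and org.htmlparser.util.NodeList
  // get a node list somehow, either iterating or with a filter
              NodeList list = parser.parse (null);
              PrintWriter out = null;
              try
              {
                  out = new PrintWriter (new FileOutputStream (file));
                  // write each top-level node back out as HTML
                  for (int i = 0; i < list.size (); i++)
                      out.print (list.elementAt (i).toHtml ());
              }
              catch (FileNotFoundException fnfe)
              {
                  fnfe.printStackTrace ();
              }
              finally
              {
                  // close the writer even if printing fails
                  if (null != out)
                      out.close ();
              }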
  
- angelo78 - 2005-02-21
  
  Can I parse a local file but give the parser, or a LinkTag,
  the URL prefix I want?
  Thanks
  
  - Derrick Oswald - 2005-02-21
    
    I believe the SiteCapturer example program does what you want, except the prefix is set for local storage. You should be able to modify this.
    
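
For readers who want to see the whole idea from this thread in one place, here is a minimal sketch under some loud assumptions. It does not use the HTML Parser API at all: it takes the page as a String that was obtained once (for example by concatenating toHtml() of the parsed nodes, as above), writes that same string to a file, and pulls the links out of it with a crude regular expression instead of LinkTag. The class name SinglePassCapture, the method saveAndExtract and the example URLs are hypothetical, and the prefix is applied with the JDK's java.net.URI rather than with anything from SiteCapturer.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the HTML Parser library.
final class SinglePassCapture {

    // Crude pattern for quoted href attributes on <a> tags.
    // A real parser (e.g. LinkTag) copes with far more markup than this.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])([^\"'>]*)\\1",
            Pattern.CASE_INSENSITIVE);

    // Saves the already-downloaded HTML to 'target', then returns every
    // link found in the same string, resolved against 'baseUrl'.
    static List<String> saveAndExtract(String html, Path target, String baseUrl)
            throws IOException {
        // Step 1: save the text we already hold; no second download.
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));

        // Step 2: extract links from that same in-memory text.
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            try {
                // Relative links get the chosen prefix; absolute ones are kept.
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException badHref) {
                // Skip hrefs that are not valid URIs (e.g. unescaped spaces).
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"a.html\">A</a></body></html>";
        Path file = Files.createTempFile("page", ".html");
        for (String link : saveAndExtract(html, file,
                "http://example.com/docs/index.html")) {
            System.out.println(link);
        }
    }
}
```

Because the base URL is just an argument, the same method should also work when the HTML was read from a local file and the links need the prefix of the site it originally came from.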
