In your "Web Ripper - Modifying Links and Image Locations" sample, the code calls writeToFileOnDisk(node.toHTML()), but that method is not defined anywhere.
Does that mean I have to write the output myself, e.g. with the write method of a FileWriter, inside the loop:
for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
    node = e.nextHTMLNode();
    writeToFileOnDisk(node.toHTML());
}
Is that enough?
Could you write a full working sample?
rgds
Hi,
There is no writeToFileOnDisk(..) because you are supposed to write that method yourself, to suit your application. The code is simply a demonstration.
By the way, I'd advise you to hold off a bit before using this example: some API changes to HTMLLinkTag and HTMLImageTag will be released next week (on Sunday).
Regards,
Somik
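As the reply says, writeToFileOnDisk() is yours to write. Below is a minimal sketch, assuming plain JDK file I/O (java.nio.file); the DiskWriter class is hypothetical and not part of HTMLParser, and the hard-coded fragments stand in for the node.toHTML() strings your parser loop would produce. It opens one writer for the whole page instead of reopening the file per node. Calling write inside the loop is not quite enough on its own: a buffered writer must also be flushed and closed, which try-with-resources takes care of here.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the sample leaves writeToFileOnDisk() for you to implement.
class DiskWriter implements AutoCloseable {
    private final Writer out;

    DiskWriter(Path target) throws IOException {
        // One buffered writer for the whole page (creates or truncates the file).
        this.out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
    }

    // Called once per node, with node.toHTML(), inside the parser loop.
    void writeToFileOnDisk(String html) throws IOException {
        out.write(html);
    }

    // Flushes buffered output and releases the file; skipping this can lose data.
    @Override
    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("ripper", ".html");
        // Stand-ins for the node.toHTML() strings produced by the parser loop.
        String[] fragments = {"<html>", "<body>", "<a href=\"page.html\">link</a>", "</body>", "</html>"};
        try (DiskWriter writer = new DiskWriter(target)) {
            for (String html : fragments) {
                writer.writeToFileOnDisk(html);
            }
        }
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```

In the real loop you would create the DiskWriter before `parser.elements()` and call `writer.writeToFileOnDisk(node.toHTML())` for each node.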
Hi,
Try the latest version of the parser: it has been refactored somewhat, and writing to disk is now more uniform (see the sample programs again).
Regards,
Somik
Hello again,
I tried version 1.2 and didn't manage to do the following:
1) download the HTML code from http://www.google.com
2) parse this HTML code
3) reconstruct the HTML code that has been parsed (this is the step that doesn't work for me)
In my crawl method, nothing is printed between the "entering reconstruction" and "exiting reconstruction" messages.
Is it possible to reconstruct the HTML file from what has been parsed (i.e. from what is still in memory)?
How can I do that?
rgds,
M.Beauvais
Here is my crawl method:
/**
 * Crawl using a given crawl depth.
 *
 * @param crawlDepth Depth of crawling
 * @exception HTMLParserException if the crawl fails
 */
public void crawl(int crawlDepth) throws HTMLParserException {
    try {
        crawl(parser, crawlDepth);
    } catch (HTMLParserException e) {
        throw new HTMLParserException("HTMLParserException at crawl(" + crawlDepth + ")", e);
    }
    System.out.println("entering reconstruction ");
    RipperRenderer renderer = new RipperRenderer();
    HTMLNode node;
    for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes(); ) {
        node = e.nextHTMLNode();
        node.print();
    }
    System.out.println("exiting reconstruction ");
}
I don't see you using node.toHTML(). Instead of node.print(), use System.out.println(node.toHTML());
Regards,
Somik
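Following that advice, the in-memory page can be rebuilt by concatenating toHTML() for each node returned by the enumeration. The sketch below is a minimal illustration, not HTMLParser code: HtmlNode is a hypothetical stand-in interface with only a toHTML() method, so the example runs without the library. In the crawl method above, the equivalent loop would append `e.nextHTMLNode().toHTML()` for each node from `parser.elements()`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the parser's node type; only toHTML() matters here.
interface HtmlNode {
    String toHTML();
}

class Reconstruct {
    // Rebuilds the page by appending each node's HTML in document order.
    static String reconstruct(Iterator<? extends HtmlNode> nodes) {
        StringBuilder page = new StringBuilder();
        while (nodes.hasNext()) {
            page.append(nodes.next().toHTML());
        }
        return page.toString();
    }

    public static void main(String[] args) {
        // Stand-in nodes; in practice these come from the parser's enumeration.
        List<HtmlNode> nodes = Arrays.<HtmlNode>asList(
                () -> "<html>", () -> "<body>", () -> "Hello", () -> "</body>", () -> "</html>");
        System.out.println(reconstruct(nodes.iterator()));
    }
}
```

Building one string in a single pass also means the result can be printed, compared, or handed to a file writer afterwards without enumerating the parsed page a second time.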