Hi,
I'm using code like the yahoo-canon example. The code is working fine, but I'm
trying to modify it so it won't need maxloops anymore. In fact, I want to
parse all "next" pages, and I don't know how many pages there are.
I want it to keep parsing until the nextPath var is empty. I modified the
functions code here: http://web-harvest.sourceforge.net/samples.php?num=0 to:
But I'm still getting errors:
org.webharvest.exception.ScriptException: Error during script execution: Parse error at line 1, column 102. Encountered: <EOF>
at org.webharvest.runtime.scripting.BeanShellScriptEngine.eval(Unknown Source)
at org.webharvest.runtime.templaters.BaseTemplater.execute(Unknown Source)
at org.webharvest.runtime.processors.TemplateProcessor.execute(Unknown Source)
at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
........
I admit that I'm not sure about the if-else structure, but I tried using
return, (), {}, ... with no success. Can you please help me fix this? :)
Well, I'll rephrase my question: if I want to parse all pages but I don't
know how many pages there are, how can I use the function without specifying
the maxloops attribute?
In other words, I want the while loop to stop when the nextLinkUrl returned by
nextXPath is empty.
This modified function never stops!
http://pastebin.com/0qqNDbvb
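The stopping rule asked for above can be sketched in plain Java, outside Web-Harvest. This is only an illustration under assumptions: FollowNextLinks, crawl and nextLinkOf are hypothetical names, and nextLinkOf stands in for "download the page and apply nextXPath". The sketch also stops on a repeated URL, which is one possible guard against a loop that never ends.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class FollowNextLinks {
    // nextLinkOf is a hypothetical stand-in for fetching a page and evaluating
    // nextXPath on it; it should return null or "" when there is no "next" link.
    static List<String> crawl(String startUrl, Function<String, String> nextLinkOf) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String url = startUrl;
        // Stop on a missing or empty link, and also on a URL already visited,
        // so a page whose "next" link points back at itself cannot loop forever.
        while (url != null && !url.isEmpty() && seen.add(url)) {
            visited.add(url);
            url = nextLinkOf.apply(url);
        }
        return visited;
    }

    public static void main(String[] args) {
        // A fake three-page site: each key maps to its "next" link.
        Map<String, String> site = new HashMap<>();
        site.put("p1", "p2");
        site.put("p2", "p3");
        site.put("p3", "");   // last page: empty next link
        System.out.println(crawl("p1", site::get)); // prints [p1, p2, p3]
    }
}
```

Note that the test is made on the raw extracted link. If the link is first resolved against the current page URL, an empty link may no longer be empty by the time it is tested.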
Try comparing with string1.equals(string2), because the == operator compares
references, not string contents.
Or use !string1.equals(string2) instead of != in your case.
I solved the same problem while keeping maxloops: I read the field on the web
page that says how many results there are and then calculate maxloops from it.
You can use this, but first try to fix your code.
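A small Java illustration of the comparison advice above. StringCompare and hasNextPage are hypothetical names used only for this sketch; nothing here is taken from the poster's configuration.

```java
public class StringCompare {
    // True when the extracted link holds something other than whitespace.
    static boolean hasNextPage(String nextLinkUrl) {
        // equals() compares characters; calling it on the literal avoids a
        // NullPointerException when nextLinkUrl itself is null.
        return nextLinkUrl != null && !"".equals(nextLinkUrl.trim());
    }

    public static void main(String[] args) {
        String empty = new String("");            // distinct object, same (empty) content
        System.out.println(empty == "");          // false: the references differ
        System.out.println(empty.equals(""));     // true: the characters match
        System.out.println(hasNextPage(empty));   // false
        System.out.println(hasNextPage("page2.html")); // true
    }
}
```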
Well, that's what I did indeed. I look up how many pages there are before
starting to parse them, but it's not always easy :-/
Unfortunately, the other method is still not working even when using
!string1.equals(string2). Too bad...
Thanks anyway for your help :)
Sorry, .equals and == are different in Java, my mistake :) For JavaScript, ==
is fine.
I used to get the number of results, parse it, divide it by the number of
results per page, and round up to the next whole number.
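The page-count arithmetic described above, as a minimal Java sketch. PageCount, parseCount and pagesNeeded are hypothetical names, and parseCount assumes it is handed only the fragment of text that contains the count.

```java
public class PageCount {
    // Strips everything except digits, so "1,234 results" becomes 1234.
    // Assumes the text contains exactly one number (the result count).
    static long parseCount(String text) {
        return Long.parseLong(text.replaceAll("[^0-9]", ""));
    }

    // Ceiling division: the number of pages needed to show totalResults items.
    static long pagesNeeded(long totalResults, int resultsPerPage) {
        if (totalResults < 0 || resultsPerPage <= 0) {
            throw new IllegalArgumentException("counts must be non-negative, page size positive");
        }
        return (totalResults + resultsPerPage - 1) / resultsPerPage;
    }

    public static void main(String[] args) {
        long total = parseCount("1,234 results");
        System.out.println(total);                  // prints 1234
        System.out.println(pagesNeeded(total, 10)); // prints 124
    }
}
```

The result of pagesNeeded could then be passed as the maxloops value.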
The number of results is a reasonable solution in most cases. However, it does
not work for Google Search, where the reported number is uninformative: "Page 58
of about 226,000,000 results (0.77 seconds)". Most pages have about 10 results
and never more than 15, which would imply about 22.6 million pages.
That's why the general solution is preferable. I am not having any luck with
it either.