getProcessReport() - user_abort and abort_reason not returning correct values
Status: Beta
Brought to you by:
huni
Hi,
When you
The function getProcessReport() will return:
When in fact it should (I think so, but I'm not sure if this is behavior by design) return:
In my project I need it to show that the user aborted the crawling process, even when the process has already ended because it has no more URLs to process, so I've edited line 606 of PHPCrawler.class.php (where the problem is located) to this:
(excuse the formatting)
Just wanted to let you know. Could you perhaps tell me whether this is by design or a bug?
Anonymous
Hi,
thanks a lot for this detailed report!
I think you are right: in that special case the crawler should report a user-abort.
Or to be even more concrete: the crawler should report BOTH, because that's the case here: there's a user-abort AND - at the same time - there's nothing more to do (passed through / no more URLs in the queue).
Problem: Right now the crawler isn't capable of reporting multiple abort-reasons.
What do you think?
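To make the "report BOTH" idea a bit more concrete, here is a minimal sketch of how several abort reasons could be carried in one value using bit flags. This is only an illustration and not how the crawler works today: the constants REASON_PASSEDTHROUGH and REASON_USERABORT and the function collectAbortReasons() are hypothetical names made up for this example, and their values are not the ones the library defines.

```php
<?php
// Hypothetical flag constants for illustration only; NOT the library's own values.
const REASON_PASSEDTHROUGH = 1; // nothing left to do, URL queue is empty
const REASON_USERABORT     = 2; // handleDocumentInfo() returned a negative value

/**
 * Combine every condition that holds when the process stops into one bitmask.
 * (Hypothetical helper, not part of the crawler.)
 */
function collectAbortReasons(bool $queueEmpty, bool $userAborted): int
{
    $reasons = 0;
    if ($queueEmpty) {
        $reasons |= REASON_PASSEDTHROUGH;
    }
    if ($userAborted) {
        $reasons |= REASON_USERABORT;
    }
    return $reasons;
}

// The special case from this report: the user aborts on the very last URL.
$reasons = collectAbortReasons(true, true);

// Each reason can be tested independently of the other.
$userAbortReported     = ($reasons & REASON_USERABORT) !== 0;
$passedThroughReported = ($reasons & REASON_PASSEDTHROUGH) !== 0;
```

With flags like these a caller could check for a user-abort without caring whether the queue also happened to be empty, but it would change what abort_reason means for existing code, so it is probably not a drop-in change.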
Hi, excuse me for the late response.
The crawler should indeed report both, but since that is not possible at the moment (and would probably take some time to implement), I'd rather have it return ABORTREASON_USERABORT than ABORTREASON_PASSEDTHROUGH.
This is because the negative return value in handleDocumentInfo occurs before the crawler is done with the URL/website, so ABORTREASON_USERABORT should have priority.
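The priority I have in mind could be sketched roughly like this. It is a standalone illustration, not the actual code from PHPCrawler.class.php: the function resolveAbortReason() is a hypothetical helper, and the constant values here are assumed just so the example runs on its own.

```php
<?php
// Constant names as used in this thread; the values are assumed for this sketch.
const ABORTREASON_PASSEDTHROUGH = 1;
const ABORTREASON_USERABORT     = 4;

/**
 * Pick a single abort reason when more than one condition may hold.
 * The user-abort is checked first, so it wins over "passed through".
 * (Hypothetical helper, not part of the crawler.)
 */
function resolveAbortReason(bool $userAborted, bool $queueEmpty): ?int
{
    if ($userAborted) {
        return ABORTREASON_USERABORT;
    }
    if ($queueEmpty) {
        return ABORTREASON_PASSEDTHROUGH;
    }
    return null; // neither condition holds, the process keeps running
}

// User aborted on the last URL: both conditions are true at once.
$reason = resolveAbortReason(true, true);
```

The only design choice here is the order of the two checks: because the user-abort is tested first, an empty queue can never hide it.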
Last edit: Anonymous 2015-04-08