Hi peeeps :)
I'm trying to activate the SQLite cache in the crawler, but I run into problems.
The settings are the same as in the example.php that comes with the crawler; I only added the setUrlCacheType call:
$crawler = new MyCrawler();
$crawler->setURL("www.php.net");
// Only receive content of documents with content-type "text/html"
$crawler->addContentTypeReceiveRule("#text/html#");
// Ignore links to pictures
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");
// Store and send cookies
$crawler->enableCookieHandling(true);
// The line I added: cache URLs in a SQLite database on disk instead of in memory
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);
// Stop crawling after receiving 200 KB of traffic
$crawler->setTrafficLimit(200 * 1024);
$crawler->go();
I get these errors when running from the CLI on Windows 7:
php.exe -f C:\wamp\www\crawler\classes\external\PHPCrawl\example.php
Warning: unlink(C:\Users\me\AppData\Local\Temp/phpcrawl_tmp_53321384859087\cookiecache.db3): Permission denied in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawlerUtils.class.php on line 481
Warning: rmdir(C:\Users\me\AppData\Local\Temp/phpcrawl_tmp_53321384859087): Directory not empty in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawlerUtils.class.php on line 486
I read on http://cuab.de/spidering_huge_websites.html:
"Please note that the PHP PDO-extension together with the SQLite-driver (PDO_SQLITE) has to be installed and activated to use this type of caching."
These are my settings in WAMP:
http://picpaste.com/pics/aZc3Se9v.1384859627.jpg
PHP version 5.3.13
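If it helps, a quick way to check whether PDO_SQLITE is actually loaded (and which php.ini gets used - in WAMP the CLI php.exe and the Apache PHP can load different ones) would be something like:
var_dump(extension_loaded('pdo_sqlite'));  // should print bool(true)
var_dump(PDO::getAvailableDrivers());      // should contain "sqlite"
echo php_ini_loaded_file(), "\n";          // the php.ini this PHP instance uses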
Can anybody tell me what I am doing wrong? :)
Lars.
Last edit: Anonymous 2013-11-19
Hi Lars,
when do these errors occur? At the end of the crawling process?
It seems like just the "cleanup" at the end of a crawling process fails. For some reason the crawler created the SQLite cookie-DB file correctly (cookiecache.db3), but then it isn't allowed to delete it anymore at the end. Strange.
This has nothing to do with PDO_SQLITE, it's "just" a permission thing.
Did you try changing the working directory?
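Any existing, writable directory should do; for example something like this (just an illustration, the path is made up - note the trailing slash):
$crawler->setWorkingDirectory("C:/wamp/tmp/"); // directory has to exist and be writable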
Thank you for the fast reply. I'm doing my bachelor project with the crawler :)
I did not change the working directory.
So I have now tried the following:
Both settings
$crawler->setWorkingDirectory("tmp/");
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);
Result in browser:
Exception: Error creating working directory 'tmp/phpcrawl_tmp_38321384868535\' in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
Result in commandline:
Error creating working directory 'tmp/phpcrawl_tmp_53681384868674\' in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
(Error shown at the top, no crawling result.)
setUrlCacheType only
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);
Result in browser:
Warning: unlink(C:\Windows\Temp/phpcrawl_tmp_38321384868771\urlcache.db3) [function.unlink]: Permission denied in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawlerUtils.class.php on line 481
Warning: rmdir(C:\Windows\Temp/phpcrawl_tmp_38321384868771) [function.rmdir]: Directory not empty in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawlerUtils.class.php on line 486
Result in commandline:
Warning: unlink(C:\Users\larsmqller\AppData\Local\Temp/phpcrawl_tmp_63721384868967\urlcache.db3): Permission denied in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawlerUtils.class.php on line 481
Warning: rmdir(C:\Users\larsmqller\AppData\Local\Temp/phpcrawl_tmp_63721384868967): Directory not empty in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawlerUtils.class.php on line 486
(Crawling runs; the errors appear at the bottom.)
setWorkingDirectory only
$crawler->setWorkingDirectory("tmp/");
Result in browser:
Exception: Error creating working directory 'tmp/phpcrawl_tmp_38321384869100\' in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
Result in commandline:
Exception: Error creating working directory 'tmp/phpcrawl_tmp_31601384869252\' in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
(Error shown at the top, no crawling result.)
I created the tmp dir myself (setWorkingDirectory only):
Result in browser:
It runs, and I see temp files in the tmp dir until the crawler is finished
(run.png)
Result in commandline:
Uncaught exception 'Exception' with message 'Error creating working directory '/tmp/phpcrawl_tmp_70681384869580\'' in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
(no crawling result.)
Both settings with the tmp dir i created:
Result in browser:
Exception: Error creating working directory '/tmp/phpcrawl_tmp_38321384869712\' in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
(no crawling result.)
Result in commandline:
Error creating working directory '/tmp/phpcrawl_tmp_70401384869773\' in C:\wamp\www\kierkegaard\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
(no crawling result.)
Lars.
Hi Lars again,
It can't be a big problem, just permission issues and/or path separators. I'll take a look at it later on.
I strongly recommend using a Linux OS together with phpcrawl (if possible for your work); that's what it was made for, and it's stable there!
AND: you can use multiple processes out of the box for spidering websites, which will speed things up a LOT for you! (See the sketch at the end of this post.)
I'll let you know when I know more.
One more question: which Windows version do you use? (7/8? 32-bit or 64-bit?)
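For later, when you try it on Linux: multi-process spidering is basically just calling goMultiProcessed() instead of go(). A rough sketch, assuming I remember the requirements right - it needs the SQLite URL cache plus the pcntl/posix/semaphore extensions and only works from the CLI:
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE); // required for multi-process mode
$crawler->goMultiProcessed(5); // spider the site with 5 processes instead of $crawler->go()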
Thanks for the advice, I will look into that after the bachelor project :)
I'm using Windows 7, 64-bit (for now).
Hi Lars,
I finally figured out the problem.
To fix the problem:
In the cleanup() method of PHPCrawler.class.php, insert these two lines at the beginning (line 797):
$this->CookieCache = null;
$this->LinkCache = null;
So it should look like this:
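Roughly like this - only the two lines at the top are new, the rest of the method body stays exactly as it is in your copy; the comments are just my explanation of why it helps:
// Beginning of cleanup() in PHPCrawler.class.php (around line 797):
$this->CookieCache = null;  // drops the last reference, so the SQLite/PDO handle to cookiecache.db3 gets closed
$this->LinkCache = null;    // same for the URL cache (urlcache.db3)
// ... the existing body of cleanup() follows unchanged here
// (it removes the temporary working directory, which failed before
//  because Windows still had the open .db3 files locked) ...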
Please let me know if it works for you over there too.
I'm opening a bug report for this; it will get officially fixed in the next version.
THANKS for the report!
When running in the browser it works :)
When running from the command line I get this error:
Error creating working directory 'tmp/phpcrawl_tmp_71241384980199\' in C:\wamp\www\me\classes\external\PHPCrawl\libs\PHPCrawler.class.php on line 782
The settings I added:
// Set working directory
$crawler->setWorkingDirectory('tmp/');
// Set cache to harddisk instead of the memory - For crawling huge websites
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);
For the browser test to work, I need to create the "tmp" folder before the crawler starts; I don't know if this is supposed to be done manually.
Try it without $crawler->setWorkingDirectory('tmp/') (let it use the system's default temp dir); then it should work in the browser and in the CLI.
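In other words, just something like this (with the working-directory line removed):
// No setWorkingDirectory() call - phpcrawl then creates its temporary
// folder in the system's default temp directory by itself.
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);
$crawler->go();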
Yep. Working :)
Awesome.
Thank you :)
Glad I could help.
Good luck with your work!