Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2016-04-20 | 1.3 kB | |
v0.3.7.tar.gz | 2016-04-20 | 2.2 MB | |
v0.3.7.zip | 2016-04-20 | 2.3 MB | |
Totals: 3 Items | | 4.5 MB | 0 |
- ThreadBaseScheduler added to improve scheduler performance
- robots.txt supported!
- elasticsearch database backend supported!
- new script callback `on_finished`, see http://docs.pyspider.org/en/latest/About-Projects/#on_finished-callback
- you can now set the delay time between retries: `retry_delay` is a dict that specifies retry intervals. Its items are `{retried: seconds}`, and the special key `''` (empty string) gives the default delay for any retry count not listed.
- dict parameters in `crawl_config` and `@config` are now merged (e.g. `headers`), thanks to @ihipop
- add parameter `max_redirects` in `self.crawl` to control the maximum number of redirects followed during a fetch, thanks to @AtaLuZiK
- add parameter `validate_cert` in `self.crawl` to ignore server certificate errors
- new property `etree` for Response; `etree` is a cached lxml.html.HtmlElement object, thanks to @waveyeung
- you can now pass arguments to phantomjs from the command line or config file
- support for pymongo 3.0
- local.projectdb now accepts a glob path (e.g. script/*.py) to load multiple projects from the local filesystem
- fixed: queue size in the dashboard not working on OS X, thanks to @xyb
- counters in the dashboard are now shown for stopped projects
- other bug fixes
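
The `retry_delay` lookup described above can be sketched in plain Python. This is only an illustration of the `{retried: seconds}` rule with the `''` default key, not pyspider's actual scheduler code; the helper name `resolve_retry_delay`, the constant `FALLBACK_DELAY`, and the sample values are all assumptions made for this example.

```python
# Illustrative sketch only: this is NOT pyspider's implementation.
# resolve_retry_delay and FALLBACK_DELAY are hypothetical names.

FALLBACK_DELAY = 24 * 60 * 60  # assumed fallback (seconds) when no '' key exists


def resolve_retry_delay(retry_delay, retried):
    """Return the delay in seconds before the next retry.

    retry_delay maps {retried: seconds}; the '' key holds the default
    used for any retry count that has no entry of its own.
    """
    if retried in retry_delay:
        return retry_delay[retried]
    return retry_delay.get('', FALLBACK_DELAY)


# Example mapping with made-up values: 30 s after the first failure,
# 1 hour after the second, 12 hours for every later retry.
retry_delay = {
    0: 30,
    1: 60 * 60,
    '': 12 * 60 * 60,
}

for retried in (0, 1, 5):
    print(retried, resolve_retry_delay(retry_delay, retried))
```

A dict with an explicit default key keeps the common cases short while still letting a project spell out a delay for individual retry counts.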