Scrapy-Redis

You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls. Scraped items gets pushed into a redis queued meaning that you can start as many as needed post-processing processes sharing the items queue. Scheduler + Duplication Filter, Item Pipeline, Base Spiders. Default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between python versions. Version 0.3 changed the requests serialization from marshal to cPickle, therefore persisted requests using version 0.2 will not able to work on 0.3. The class scrapy_redis.spiders.RedisSpider enables a spider to read the urls from redis. The urls in the redis queue will be processed one after another, if the first request yields more requests, the spider will process those requests before fetching another url from redis.

Features

Distributed crawling/scraping
Distributed post-processing
Scrapy plug-and-play components
Python 2.7, 3.4 or 3.5 required
Redis >= 2.8 required
Scheduler + Duplication Filter, Item Pipeline, Base Spiders

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow Scrapy-Redis

Scrapy-Redis Web Site

Other Useful Business Software

MongoDB Atlas runs apps anywhere

Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free

Rate This Project

User Reviews

Be the first to post a review of Scrapy-Redis!

Additional Project Details

Programming Language

Python

Related Categories

Python Browsers, Python Web Scrapers

Registered

2021-11-09

Similar Business Software

Gaffa

Gaffa is a web scraping and browser automation API that gives developers full, real-browser control with a single API call no headless browsers, proxies, CAPTCHA handling, or scaling infrastructure to manage. JavaScript rendering is handled by default, so pages load exactly as they would for a...

See Software
Oxylabs

Oxylabs is a market leader in web intelligence with enterprise-grade, ethical, and compliant solutions. Its proxy infrastructure spans one of the largest global networks, offering residential, ISP, mobile, datacenter, & dedicated datacenter proxies, along with Web Unblocker – an AI-driven...

See Software
Apify

Apify is a full-stack web scraping and automation platform helping anyone get value from the web. At its core is Apify Store, a marketplace with over 10,000 Actors where developers build, publish, and monetize automation tools. Actors are serverless cloud programs that extract data, automate...

See Software
NetNut

Get ready to experience unmatched control and insights with our user-friendly dashboard tailored to your needs. Monitor and adjust your proxies with just a few clicks. Track your usage and performance with detailed statistics. Our team is devoted to providing customers with proxy solutions...

See Software
Shift

Shift is a fully customizable browser where you can drag and drop apps, bars, and controls to create a central hub that adapts to however you work. Integrate 1,500+ apps, swap instantly between Spaces for work, side hustles or personal browsing, and stay logged into multiple accounts at once....

See Software
Kasm Workspaces

Kasm Workspaces streams your workplace environment directly to your web browser…on any device and from any location. Kasm uses our high-performance streaming and secure isolation technology to provide web-native Desktop as a Service (DaaS), application streaming, and secure/private web...

See Software

Report inappropriate content

Scrapy-Redis

Redis-based components for Scrapy

Get an email when there's a new version of Scrapy-Redis

Features

Project Samples

Project Activity

Categories

License

Follow Scrapy-Redis

User Reviews

Additional Project Details

Programming Language

Related Categories

Registered