Scrapyd can manage multiple projects, and each project can have multiple versions uploaded, but only the latest one is used for launching new spiders. A common (and useful) convention is to use the revision number from the version control tool tracking your Scrapy project code as the version name, for example: r23. Versions are not compared alphabetically but with a smarter algorithm (the same one the packaging library uses), so r10 compares greater than r9, for example.

Scrapyd is an application (typically run as a daemon) that listens for requests to run spiders and spawns a process for each one. Scrapyd can also run multiple processes in parallel, allocating them to a fixed number of slots given by the max_proc and max_proc_per_cpu options, and starting as many processes as possible to handle the load.
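The version ordering described above can be sketched with a natural-sort key that compares numeric runs as numbers rather than character by character. This is only an illustration of the idea, not Scrapyd's actual implementation; the `version_key` helper is hypothetical:

```python
import re

def version_key(v):
    # Split "r10" into ["r", 10] so numeric parts compare numerically,
    # not character by character.
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", v) if p]

versions = ["r9", "r10", "r2"]
print(sorted(versions, key=version_key))  # ['r2', 'r9', 'r10']
print(max(versions, key=version_key))     # 'r10'
```

With a plain alphabetical sort, "r10" would come before "r9" because "1" < "9"; the numeric-aware key gives the ordering the text describes.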
Features
- Scrapyd is a service for running Scrapy spiders
- It lets you deploy your Scrapy projects and control their spiders using an HTTP JSON API
- Documentation is available
- Scrapyd comes with a minimal web interface for monitoring running processes and accessing logs
- You can use ScrapydWeb to manage your Scrapyd cluster
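The HTTP JSON API mentioned above can be called with any HTTP client. The sketch below, using only the standard library, assumes a Scrapyd instance on the default port 6800 and a deployed project; "myproject" and "myspider" are placeholder names:

```python
import json
import urllib.parse
import urllib.request

# Assumption: Scrapyd is listening on localhost at its default port, 6800.
BASE = "http://localhost:6800"

def schedule(project, spider):
    # schedule.json takes form-encoded POST data and returns a JSON body
    # that includes the id of the newly scheduled job.
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    with urllib.request.urlopen(f"{BASE}/schedule.json", data=data) as resp:
        return json.load(resp)

def list_jobs(project):
    # listjobs.json reports the project's pending, running, and finished jobs.
    url = f"{BASE}/listjobs.json?" + urllib.parse.urlencode({"project": project})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Example (requires a running Scrapyd with "myproject" deployed):
# schedule("myproject", "myspider")
# list_jobs("myproject")
```

The same pattern works for the other endpoints (e.g. daemonstatus.json, listprojects.json); each returns a JSON object with a "status" field indicating success or failure.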