crawler free download

Showing 13 open source projects for "crawler"

1

WebMagic

A scalable web crawler framework for Java

WebMagic is a simple but scalable crawler framework that covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It simplifies the development of a specific crawler. WebMagic has a simple, highly flexible core and a simple API for HTML extraction.

Downloads: 0 This Week

Last Update: 2025-02-10
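WebMagic itself is a Java library, and the sketch below does not use its API. As a rough illustration of the content-extraction stage described for WebMagic, this Python snippet (standard library only; `LinkExtractor` and `extract_links` are hypothetical names) collects absolute link URLs from an HTML page. That list is the kind of output that feeds a crawler's URL management.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page being parsed.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<a href="/docs">Docs</a> <a href="https://example.org/x">X</a> <a>no href</a>'
print(extract_links("https://example.com/index.html", page))
# ['https://example.com/docs', 'https://example.org/x']
```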
2

Colly

Elegant Scraper and Crawler Framework for Golang

Colly provides a clean interface for writing any kind of crawler, scraper, or spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, such as data mining, data processing, or archiving. Features include a clean API; speed (over 1k requests/sec on a single core); management of request delays and maximum concurrency per domain; and automatic cookie and session handling.

Downloads: 0 This Week

Last Update: 2025-03-27
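Colly's description mentions per-domain request delays and maximum concurrency. In Colly this is configured through its `LimitRule` type (fields such as `DomainGlob`, `Delay`, and `Parallelism`); the Python sketch below is not Colly code but a hypothetical `DomainLimiter` showing one way the two limits can be combined. It assumes the delay is measured between request starts on the same host, and it takes injectable `clock` and `sleep` functions so it can be tested without actually waiting.

```python
import threading
import time
from collections import defaultdict
from urllib.parse import urlsplit


class DomainLimiter:
    """Hypothetical per-domain limiter: at most `max_concurrency` in-flight
    requests per host, and at least `delay` seconds between request starts
    on the same host."""

    def __init__(self, delay, max_concurrency, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self._lock = threading.Lock()
        # One bounded semaphore per host caps concurrent requests to that host.
        self._slots = defaultdict(lambda: threading.BoundedSemaphore(max_concurrency))
        # Earliest time the next request to each host may start.
        self._next_start = defaultdict(float)

    def acquire(self, url):
        host = urlsplit(url).hostname or ""
        with self._lock:
            slot = self._slots[host]
        slot.acquire()  # blocks while max_concurrency requests to this host are in flight
        with self._lock:
            now = self.clock()
            start = max(now, self._next_start[host])
            # Reserve this start time so concurrent callers are spaced by `delay`.
            self._next_start[host] = start + self.delay
        if start > now:
            self.sleep(start - now)
        return host

    def release(self, host):
        self._slots[host].release()
```

A caller would wrap each fetch as `host = limiter.acquire(url)` followed by `try: ... finally: limiter.release(host)`, so the concurrency slot is returned even when a download fails.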
3

Laravel Sitemap

Create and generate sitemaps with ease

...The destination of the sitemap should be specified by $path. If you don't want a crawled link to appear in the sitemap, just don't return it in the callable you pass to hasCrawled. You can also instruct the underlying crawler to not crawl some pages by passing a callable to shouldCrawl. You can configure the crawler used by the sitemap generator. The sitemap generator can execute JavaScript on each page so it will discover links that are generated by your JS scripts. You can enable this feature by setting execute_javascript in the config file to true.

Downloads: 0 This Week

Last Update: 2026-03-12
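The Laravel Sitemap entry describes filtering crawled links through a `hasCrawled` callable before they reach the sitemap. The Python sketch below is not the package's PHP API; `build_sitemap` and its `has_crawled` parameter are hypothetical names that mimic that behavior. The callback returns the URL to keep it (optionally rewritten) or `None` to drop it, and the surviving URLs are written as `<url><loc>` entries in the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str],
                  has_crawled: Callable[[str], Optional[str]] = lambda u: u) -> str:
    """Return sitemap XML for the crawled URLs that `has_crawled` keeps.

    Hypothetical hook: returning None drops the URL, returning a string keeps
    it (possibly rewritten)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        kept = has_crawled(url)
        if kept is None:
            continue  # the callback chose not to list this crawled link
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = kept
    return ET.tostring(urlset, encoding="unicode")


xml = build_sitemap(
    ["https://example.com/", "https://example.com/admin"],
    has_crawled=lambda u: None if "/admin" in u else u,
)
print(xml)
```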
4

Gerapy

Distributed Crawler Management Framework Based on Scrapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy. Scrapy is indeed a very powerful crawler framework, with high crawling efficiency and good scalability. It is essentially an indispensable tool for developing crawlers in Python.

Downloads: 0 This Week

Last Update: 2023-07-19
5

ReconSpider

Most Advanced Open Source Intelligence (OSINT) Framework

...Reconnaissance is a mission to obtain information, by various detection methods, about the activities and resources of an enemy or potential enemy, or about the geographic characteristics of a particular area. A web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing (web spidering).

Downloads: 0 This Week

Last Update: 2022-11-25
6

CEF Python

Python bindings for the Chromium Embedded Framework (CEF)

CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development, while CEF focuses on facilitating embedded browser use cases in third-party applications. Many applications use the CEF control; there are more than 100 million CEF instances installed around the world. There are...

Downloads: 8 This Week

Last Update: 2022-05-03
7

koa-isbot

Fast middleware to detect bot crawlers for Koa

Fast Koa middleware that detects bots and crawlers.

Downloads: 0 This Week

Last Update: 2024-01-18
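The koa-isbot entry does not say which detection rules it uses. A common approach for this kind of middleware is matching the `User-Agent` header against known bot keywords; the Python sketch below shows that idea with a small, hypothetical pattern list (`BOT_PATTERN`, `is_bot`). In a Koa app, equivalent logic would run inside middleware on each request. Treating a missing `User-Agent` as a non-bot is an assumption made here; some projects choose the opposite.

```python
import re

# Hypothetical, deliberately small keyword list; real bot-detection
# libraries maintain much larger, regularly updated lists.
BOT_PATTERN = re.compile(
    r"bot|crawler|spider|crawling|slurp|facebookexternalhit|curl|wget",
    re.IGNORECASE,
)


def is_bot(user_agent):
    """Return True if the User-Agent header looks like an automated client."""
    if not user_agent:
        return False  # assumption: no header means "unknown", not "bot"
    return BOT_PATTERN.search(user_agent) is not None


print(is_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
# True
```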
8

ShadowSocksShare

Python ShadowSocks framework

This project crawls ss(r) account-sharing websites to collect shared ss(r) accounts, parses them and verifies their connectivity, and then redistributes the accounts by generating a subscription link. Since Google Plus was scheduled to shut down on April 2, 2019, and almost all of the usable accounts crawled before came from Google Plus, if you are building your own website, please keep an eye on this project's updates and redeploy using the latest source code.

Downloads: 0 This Week

Last Update: 2022-11-09
9

Catberry

Catberry is an isomorphic framework

...The entire architecture of the framework is built using the Service Locator pattern, which helps manage module dependencies and create plugins, and Flux for the data layer. A search crawler receives a full page from the server, and the whole state of the application is restored from the URL. Rendering is progressive on the server side, based on Node.js streams, with parallel rendering of components in the browser. The framework is well tested (code coverage is about 90%) and is already used in production.

Downloads: 0 This Week

Last Update: 2022-12-01
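The Catberry entry notes that the whole application state is restored from the URL, which is why a search crawler requesting that URL receives the same full page a user would. Catberry is a JavaScript framework and the sketch below does not use its API; it is a hypothetical Python illustration (`state_to_url`, `state_from_url`) of keeping flat, string-valued state in the query string so it can be rebuilt from the URL alone.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_url(path, state):
    """Serialize a flat dict of non-empty string values into the URL query.
    Keys are sorted so the same state always yields the same URL."""
    query = urlencode(sorted(state.items()))
    return f"{path}?{query}" if query else path


def state_from_url(url):
    """Rebuild the flat state dict from the URL alone (first value per key).
    Note: parse_qs drops keys with empty values by default."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}


url = state_to_url("/search", {"q": "web crawler", "page": "2"})
print(url)  # /search?page=2&q=web+crawler
print(state_from_url(url))
```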
10

DHT

BitTorrent DHT Protocol && DHT Spider.

...Set MaxNodes and BlackListMaxSize to suit your needs. DHT aims to implement the standard BitTorrent DHT protocol; it was not built specifically for crawling the DHT network. NAT traversal is an issue if you run the crawler on a local network. It blocks IPs that look bad, so a good IP may be misjudged.

Downloads: 0 This Week

Last Update: 2023-01-19
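The DHT entry mentions a `BlackListMaxSize` setting and warns that a good IP may be misjudged. The Python sketch below is not the project's Go implementation; it is a hypothetical `BlackList` with a size cap (oldest entry evicted first) and an expiry time. Both are assumptions chosen to show one way of keeping memory bounded while giving a wrongly blocked IP another chance later.

```python
import time
from collections import OrderedDict


class BlackList:
    """Hypothetical bounded blacklist: at most `max_size` entries, oldest
    evicted first; entries expire after `ttl` seconds."""

    def __init__(self, max_size, ttl, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self._entries = OrderedDict()  # ip -> time it was blacklisted

    def add(self, ip):
        if ip in self._entries:
            self._entries.move_to_end(ip)  # re-adding refreshes its position
        self._entries[ip] = self.clock()
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the oldest entry

    def __contains__(self, ip):
        added = self._entries.get(ip)
        if added is None:
            return False
        if self.clock() - added > self.ttl:
            del self._entries[ip]  # expired: a misjudged IP gets another chance
            return False
        return True
```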
11

go_spider

An awesome Go concurrent Crawler(spider) framework

go_spider is a concurrent crawler (spider) framework for Go. The crawler is flexible and modular: you can easily extend it into a customized crawler or use only the default crawl components. The Spider gets a Request, which holds the URL to be crawled, from the Scheduler. The Downloader then downloads the result of the Request (HTML, JSON, JSONP, or text), which is saved in a Page for parsing by the PageProcesser.

Downloads: 0 This Week

Last Update: 2023-01-27
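The go_spider entry walks through a pipeline: the Spider takes a Request from the Scheduler, the Downloader fetches it, and the result is stored in a Page for the PageProcesser. The Python sketch below mirrors that flow under simplifying assumptions: it is sequential rather than concurrent, it reads pages from an in-memory dict instead of downloading them, and the class and function names are hypothetical rather than go_spider's Go API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Request:
    url: str


@dataclass
class Page:
    request: Request
    body: str  # raw downloaded result (HTML, JSON, text, ...)
    items: dict = field(default_factory=dict)         # data extracted by the processor
    new_requests: list = field(default_factory=list)  # follow-up Requests found on the page


class Scheduler:
    """FIFO queue of Requests that ignores URLs it has already accepted."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self._seen: set = set()

    def push(self, request: Request) -> None:
        if request.url not in self._seen:
            self._seen.add(request.url)
            self._queue.append(request)

    def poll(self) -> Optional[Request]:
        return self._queue.popleft() if self._queue else None


def run_spider(start_url: str,
               download: Callable[[str], str],
               process: Callable[[Page], None]) -> dict:
    """Sequential loop: Scheduler -> download -> Page -> process -> Scheduler."""
    scheduler = Scheduler()
    scheduler.push(Request(start_url))
    results = {}
    while (request := scheduler.poll()) is not None:
        page = Page(request, download(request.url))
        process(page)  # the page processor fills page.items and page.new_requests
        results[request.url] = page.items
        for follow_up in page.new_requests:
            scheduler.push(follow_up)
    return results


# Usage with an in-memory "site" instead of real HTTP downloads.
SITE = {"/a": "/b,/c", "/b": "/a", "/c": ""}


def link_processor(page: Page) -> None:
    links = [url for url in page.body.split(",") if url]
    page.items["link_count"] = len(links)
    page.new_requests = [Request(url) for url in links]


print(run_spider("/a", SITE.__getitem__, link_processor))
# {'/a': {'link_count': 2}, '/b': {'link_count': 1}, '/c': {'link_count': 0}}
```

Deduplicating in the Scheduler, rather than in the processor, means a URL discovered on several pages is still fetched only once.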
12

Macs CMS

** Guys, I have built a much more powerful, fully featured CMS at: https://github.com/MacdonaldRobinson/FlexDotnetCMS
Macs CMS is a flat-file (XML and SQLite) based AJAX content management system. It focuses mainly on the edit-in-place editing concept. It comes with a built-in blog with moderation support, a user manager section, a roles manager section, SEO / SEF URL

Downloads: 0 This Week

Last Update: 2019-01-26
13

JavaWAC

Web-as-corpus tools in Java:
* Simple crawler (with integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate code
* Language recognition
* Corpus builder

Downloads: 0 This Week

Last Update: 2013-04-19