Showing 35 open source projects for "crawler"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 1
    Spatie Crawler

    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    SiteOne Crawler

    SiteOne Crawler

    SiteOne Crawler is a website analyzer and exporter

    ...Watch a detailed video with a sample report for Astro. build website. This crawler can be used as a command-line tool (see releases and video), or you can use a multi-platform desktop application with a graphical interface (see a video about the app).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Crawler Detect

    Crawler Detect

    CrawlerDetect is a PHP class for detecting bots/crawlers/spiders

    Crawler Detect is a PHP library that detects bots, crawlers, and spiders by analyzing user-agent headers and comparing them against a constantly updated list of known crawlers. It's useful for analytics, rate-limiting, or displaying alternative content for automated tools. It is fast, lightweight, and easy to integrate into any PHP application.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    Heritrix

    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives† and META nofollow tags. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    Colly

    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k request/sec on a single core) Manages request delays and maximum concurrency per domain. Automatic cookie and session handling.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Laravel Sitemap

    Laravel Sitemap

    Create and generate sitemaps with ease

    ...The destination of the sitemap should be specified by $path. If you don't want a crawled link to appear in the sitemap, just don't return it in the callable you pass to hasCrawled. You can also instruct the underlying crawler to not crawl some pages by passing a callable to shouldCrawl. You can configure the crawler used by the sitemap generator. The sitemap generator can execute JavaScript on each page so it will discover links that are generated by your JS scripts. You can enable this feature by setting execute_javascript in the config file to true.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    X-Crawl

    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    gitcrawl

    gitcrawl

    Local-first GitHub issue and pull request crawler

    gitcrawl is a repository crawling and indexing tool associated with the OpenClaw ecosystem, designed to collect, organize, and analyze Git-based project data at scale. The project focuses on automating repository inspection and enabling structured access to source code, commit history, and metadata for research, automation, or AI-assisted workflows. Its architecture likely emphasizes efficient crawling, local indexing, and integration with other OpenClaw tooling for autonomous analysis and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    miniblink49

    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    ...After turning off the cross-domain switch, you can use various cross-domain functions (support cross-domain). Headless mode, which greatly saves resources and is suitable for crawlers (headless mode, be suitable for Web Crawler).
    Downloads: 9 This Week
    Last Update:
    See Project
  • 11
    DocSearch

    DocSearch

    The easiest way to add search to your documentation

    Initially created to fulfill our own developers' needs, DocSearch quickly evolved into a successful community project. Over the years, we've explored new ways to address the complexities of search for the open-source community. DocSearch understands how the user input fits into the context of your project and instantly presents the most relevant content with fewer interactions than any other method. With a design very close to the native experience on mobile, we leverage users acquaintance...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    SiteOne Crawler (desktop app)

    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Python-Spider

    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Gerapy

    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Someone who has worked as a crawler with Python may use Scrapy. Scrapy is indeed a very powerful crawler framework. It has high crawling efficiency and good scalability. It is basically a necessary tool for developing crawlers using Python.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    ...Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may create and pass an HttpClient instance to Goutte. For example, to add a 60 second request timeout. Read the documentation of the BrowserKit, DomCrawler, and HttpClient Symfony Components for more information about what you can do with Goutte. Goutte is a thin wrapper around the following Symfony Components: BrowserKit, CssSelector, DomCrawler, and HttpClient.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    File System Crawler for Elasticsearch

    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps to index binary documents such as PDF, Open Office, MS Office. Local file system (or a mounted drive) crawling and indexing new files, updating existing ones, and removing old ones. Remote file system over SSH/FTP crawling. REST interface to let you “upload” your binary documents to elastic search.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    ReconSpider

    ReconSpider

    Most Advanced Open Source Intelligence (OSINT) Framework

    ...Reconnaissance is a mission to obtain information by various detection methods, about the activities and resources of an enemy or potential enemy, or geographic characteristics of a particular area. A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    CEF Python

    CEF Python

    Python bindings for the Chromium Embedded Framework (CEF)

    Python bindings for the Chromium Embedded Framework (CEF). CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development while CEF focuses on facilitating embedded browser use cases in third-party applications. Lots of applications use CEF control, there are more than 100 million CEF instances installed around the world. There are...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 19

    sponge

    A website crawler and links downloader command line tool

    sponge is a website crawler and links downloader command line tool
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    koa-isbot

    koa-isbot

    Fast Middleware detect bot crawler for Koa

    Koa detects robots. Fast Middleware detects bot crawler for Koa.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    ShadowSocksShare

    ShadowSocksShare

    Python ShadowSocks framework

    This project obtains the shared ss(r) account from the ss(r) shared website crawler, redistributes the account and generates a subscription link by parsing and verifying the account connectivity. Since Google plus will be closed on April 2, 2019, almost all the available accounts crawled before come from Google plus. So if you are building your own website, please keep an eye on the updates of this project and redeploy using the latest source code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Catberry

    Catberry

    Catberry is an isomorphic framework

    ...The entire architecture of the framework is built using the Service Locator pattern, which helps to manage module dependencies and create plugins, and Flux, for the data layer. Search crawler receives a full page from the server. The whole state of the application is restored from URL. Server-side progressive rendering based on node.js streams and parallel rendering of components in a browser. The framework is well-tested (code coverage is about 90%) and it is already used in production.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    phoneutria
    A Java Web crawler: multi-threaded, scalable, with high performance, extensible and polite. It can be used to crawl and index any web or enterprise domain and is configurable through a XML configuration file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    DHT

    DHT

    BitTorrent DHT Protocol && DHT Spider.

    ...Set MaxNodes and BlackListMaxSize to fit yourself. DHT aims to implement the standard BitTorrent DHT protocol, not born for crawling the DHT network. NAT Traversal issue. You run the crawler in a local network. It will block ip which looks bad and a good ip may be misjudged.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    frsi

    Fast Remote SVN Info

    ...Windows Users: This tool requires the subversion command line tools: https://sourceforge.net/projects/win32svn/ Credits: Subversion https://subversion.apache.org win32svn https://sourceforge.net/projects/win32svn/ fast-svn-crawler https://sourceforge.net/projects/fastsvncrawler/
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB