Showing 16 open source projects for "crawler"

View related business solutions
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 1
    Nebula libp2p DHT

    Nebula libp2p DHT

    A libp2p DHT crawler, monitor, and measurement tool

    A libp2p DHT crawler and monitor that tracks the liveness of peers. The crawler connects to DHT bootstrap peers and then recursively follows all entries in their k-buckets until all peers have been visited. The crawler supports the IPFS, Filecoin, Polkadot, Kusama, Rococo, Westend networks and more. The crawler can store its results as JSON documents or in a postgres database - the --dry-run flag prevents it from doing either.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Pholcus

    Pholcus

    Distributed high-concurrency crawler software written in pure golang

    ...This software is only used for academic research, users need to abide by the relevant laws and regulations of their location, please do not use it for illegal purposes! Provide users with a certain Go or JS programming foundation with a heavyweight crawler tool that only needs to pay attention to rule customization and complete functions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Colly

    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k request/sec on a single core) Manages request delays and maximum concurrency per domain. Automatic cookie and session handling.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    katana

    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser...
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5
    gitcrawl

    gitcrawl

    Local-first GitHub issue and pull request crawler

    gitcrawl is a repository crawling and indexing tool associated with the OpenClaw ecosystem, designed to collect, organize, and analyze Git-based project data at scale. The project focuses on automating repository inspection and enabling structured access to source code, commit history, and metadata for research, automation, or AI-assisted workflows. Its architecture likely emphasizes efficient crawling, local indexing, and integration with other OpenClaw tooling for autonomous analysis and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Modlishka

    Modlishka

    Powerful and flexible HTTP reverse proxy

    Modlishka is a powerful and flexible HTTP reverse proxy. It implements an entirely new and interesting approach of handling browser-based HTTP traffic flow, which allows to transparently proxy of multi-domain destination traffic, both TLS and non-TLS, over a single domain, without the requirement of installing any additional certificate on the client. What exactly does this mean? In short, it simply has a lot of potential, that can be used in many use case scenarios. Modlishka was written as...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    Scope Sentry

    Scope Sentry

    Cyberspace asset mapping and vulnerability scanning platform

    ScopeSentry is an open source cybersecurity tool designed for cyberspace asset mapping and automated security analysis. It helps security researchers and penetration testers discover, monitor, and analyze internet-facing assets belonging to a target scope. ScopeSentry combines multiple reconnaissance and vulnerability assessment capabilities such as subdomain enumeration, port scanning, directory scanning, and sensitive information detection. ScopeSentry can automatically identify assets and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    crawley

    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    crawlergo

    crawlergo

    Headless Chrome crawler for collecting URLs for vulnerability scans

    crawlergo is a browser-based web crawler designed to collect URLs and request data that can be used by web vulnerability scanning tools. It uses a Chrome headless environment to render web pages and observe behavior during the DOM rendering stage in order to capture as many accessible endpoints as possible. By monitoring the page lifecycle and interacting with web elements, the crawler automatically triggers JavaScript events and navigational actions that would normally occur during real user interaction. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Hakrawler

    Hakrawler

    Fast Go web crawler for discovering URLs and web app endpoints

    hakrawler is a lightweight command-line web crawler built in Go that is designed to quickly discover URLs, endpoints, and assets within web applications. It is primarily used during the reconnaissance phase of security testing, bug bounty hunting, and penetration testing. It works by automatically crawling web pages and extracting links, JavaScript file locations, and other resources that may reveal additional attack surface or hidden functionality. hakrawler is implemented as a simple and efficient crawler using the Gocolly library, which allows it to perform fast and concurrent crawling of web pages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    gocrawl

    gocrawl

    Polite concurrent web crawler library for Go with flexible hooks

    gocrawl is a lightweight web crawling library written in the Go programming language that enables developers to build custom web crawlers and data extraction tools. gocrawl focuses on providing a minimal yet powerful crawling engine that can be easily extended and adapted for different web scraping or indexing tasks. It is designed to be polite when accessing websites by respecting crawling rules such as robots.txt policies and applying crawl delays for each host. It executes requests...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    proxypool

    proxypool

    Proxy crawler that aggregates, tests, and serves usable proxy nodes

    ...After collecting these nodes, proxypool processes them by removing duplicates and verifying whether each node is functional. proxypool then provides a usable list of proxy nodes that have passed availability checks. proxypool supports several popular proxy protocols, allowing it to work with multiple types of proxy infrastructures. The behavior of the crawler and the sources it scans can be configured through configuration files, enabling users to customize how nodes are gathered and maintained. It also supports scheduled crawling to continuously update the proxy list and keep the pool current with newly discovered nodes.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    crawler

    crawler

    A toolchain for bringing web2 to web3

    A toolchain for bringing web2 to web3. IPFS daemon should be launched. Download enwiki-latest-all-titles to crawler root dir. Basically, there are two main functions provided by crawler tool. The first one is to parse wiki titles and submit links between keywords and wiki pages. Also, crawler has separate command upload-duras-to-ipfs to upload files to local IPFS node. All DURAs are collected under single root unixfs directory.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    DHT

    DHT

    BitTorrent DHT Protocol && DHT Spider.

    ...Set MaxNodes and BlackListMaxSize to fit yourself. DHT aims to implement the standard BitTorrent DHT protocol, not born for crawling the DHT network. NAT Traversal issue. You run the crawler in a local network. It will block ip which looks bad and a good ip may be misjudged.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    go_spider

    go_spider

    An awesome Go concurrent Crawler(spider) framework

    An awesome Go concurrent Crawler(spider) framework. The crawler is flexible and modular. It can be expanded to an Individualized crawler easily or you can use the default crawl components only. Spider gets a Request in Scheduler that has url to be crawled. Then Downloader downloads the result(html, json, jsonp, text) of the Request. The result is saved in Page for parsing in PageProcesser.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB