Showing 245 open source projects for "crawler"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    Abot

    Abot

    Fast and flexible C# framework for building customizable web crawlers

    Abot is an open source C# web crawler framework designed to help developers efficiently crawl and process web content. It focuses on speed, flexibility, and extensibility while handling the complex low-level tasks involved in web crawling. It manages essential components such as multithreading, HTTP requests, scheduling, and link parsing so developers can focus on processing the collected data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    gocrawl

    gocrawl

    Polite concurrent web crawler library for Go with flexible hooks

    gocrawl is a lightweight web crawling library written in the Go programming language that enables developers to build custom web crawlers and data extraction tools. gocrawl focuses on providing a minimal yet powerful crawling engine that can be easily extended and adapted for different web scraping or indexing tasks. It is designed to be polite when accessing websites by respecting crawling rules such as robots.txt policies and applying crawl delays for each host. It executes requests...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    lxspider

    lxspider

    Educational Python web scraping case collection for many sites

    lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms, social media services, content sites, research databases, and information portals. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    CEF Python

    CEF Python

    Python bindings for the Chromium Embedded Framework (CEF)

    Python bindings for the Chromium Embedded Framework (CEF). CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development while CEF focuses on facilitating embedded browser use cases in third-party applications. Lots of applications use CEF control, there are more than 100 million CEF instances installed around the world. There are...
    Downloads: 8 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5

    Mowglee

    Mowglee - The Geo Crawler!

    Mowglee is a distributed, multi-threaded, asynchronous task execution based web crawler in Java.It is designed for geographic affinity and is highly modular.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    proxypool

    proxypool

    Proxy crawler that aggregates, tests, and serves usable proxy nodes

    ...After collecting these nodes, proxypool processes them by removing duplicates and verifying whether each node is functional. proxypool then provides a usable list of proxy nodes that have passed availability checks. proxypool supports several popular proxy protocols, allowing it to work with multiple types of proxy infrastructures. The behavior of the crawler and the sources it scans can be configured through configuration files, enabling users to customize how nodes are gathered and maintained. It also supports scheduled crawling to continuously update the proxy list and keep the pool current with newly discovered nodes.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    RED HAWK

    RED HAWK

    All-in-one reconnaissance and vulnerability scanning toolkit for sites

    RED HAWK is an open source command-line security tool designed for information gathering, vulnerability scanning, and web reconnaissance tasks. It combines multiple scanning and analysis capabilities into a single toolkit to help security researchers and penetration testers quickly analyze a target website. It can collect a wide range of information about domains, servers, and web applications, including network details, hosting configuration, and content management system detection. It also...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 8

    PHP mini vulnerability suite

    Multiple server/webapp vulnerability scanner

    github: https://github.com/samedog/phpmvs
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    sponge

    A website crawler and links downloader command line tool

    sponge is a website crawler and links downloader command line tool
    Downloads: 0 This Week
    Last Update:
    See Project
  • Streamline Azure Security with Palo Alto Networks VM-Series Icon
    Streamline Azure Security with Palo Alto Networks VM-Series

    Centrally manage physical and virtualized firewalls with Panorama

    Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.
    Learn more
  • 10

    Universal Proxy Software 2.0

    Universal Proxy Software 2.0 is the most advanced proxy software.

    ...Features: Proxy Grabber Proxy Checker Proxy(IP) Changer Mega Proxy Grabber Mega Proxy Checker Auto Proxy Changer Proxy Editor Proxy Leecher Proxy Scrapper Mega Proxy Editor Mega Proxy Leecher Mega Proxy Scrapper Proxy Combiner Proxy Lookup Text Proxy Leecher Mega Proxy Combiner Mega Proxy Lookup Mega Text Proxy Leecher Mor Crawler Proxy Viewer Proxy URL Grabber Proxy Grabber: Proxy Grabber is one of the unique features of Universal Proxy Software.This grabber can grab https proxies upto 17k and socks 5 upto 37k and socks 4 also. Mega Proxy Checker Mega Proxy Checker is very fast and it can support a large amount of proxies. Auto Proxy Changer Auto Proxy Changer changes our ip automatically. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 11
    koa-isbot

    koa-isbot

    Fast Middleware detect bot crawler for Koa

    Koa detects robots. Fast Middleware detects bot crawler for Koa.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    WebSploit Framework

    WebSploit Framework

    WebSploit is a high level MITM Framework

    WebSploit Advanced MITM Framework [+]Autopwn - Used From Metasploit For Scan and Exploit Target Service [+]wmap - Scan,Crawler Target Used From Metasploit wmap plugin [+]format infector - inject reverse & bind payload into file format [+]phpmyadmin Scanner [+]CloudFlare resolver [+]LFI Bypasser [+]Apache Users Scanner [+]Dir Bruter [+]admin finder [+]MLITM Attack - Man Left In The Middle, XSS Phishing Attacks [+]MITM - Man In The Middle Attack [+]Java Applet Attack [+]MFOD Attack Vector [+]ARP Dos Attack [+]Web Killer Attack [+]Fake Update Attack [+]Fake Access point Attack [+]Wifi Honeypot [+]Wifi Jammer [+]Wifi Dos [+]Wifi Mass De-Authentication Attack [+]Bluetooth POD Attack Project In Github : https://github.com/websploit
    Downloads: 11 This Week
    Last Update:
    See Project
  • 13
    BotSlayer

    BotSlayer

    BotSlayer Community Edition

    BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. The tool is developed by the Observatory on Social Media at Indiana University --- the same lab that brought to you Botometer and Hoaxy. BotSlayer is not a tool to detect and remove likely social bots from your list of Twitter followers or friends. For that purpose, check out Botometer. If you just want to visualize the spread of some piece of information, consider Hoaxy....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    ECommerceCrawlers

    ECommerceCrawlers

    Collection of Python ecommerce and website crawler examples projects

    ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    crawler

    crawler

    A toolchain for bringing web2 to web3

    A toolchain for bringing web2 to web3. IPFS daemon should be launched. Download enwiki-latest-all-titles to crawler root dir. Basically, there are two main functions provided by crawler tool. The first one is to parse wiki titles and submit links between keywords and wiki pages. Also, crawler has separate command upload-duras-to-ipfs to upload files to local IPFS node. All DURAs are collected under single root unixfs directory.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    X-RAY

    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Photon

    Photon

    Incredibly fast crawler designed for OSINT

    Photon is an extremely fast web crawler built specifically for OSINT and reconnaissance use cases. It is designed to extract URLs, endpoints, files, and other intelligence artifacts from target websites with minimal overhead. The crawler prioritizes speed and breadth, making it suitable for mapping web attack surfaces and discovering hidden resources. Photon is commonly used during early reconnaissance phases to build a comprehensive inventory of reachable assets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    ShadowSocksShare

    ShadowSocksShare

    Python ShadowSocks framework

    This project obtains the shared ss(r) account from the ss(r) shared website crawler, redistributes the account and generates a subscription link by parsing and verifying the account connectivity. Since Google plus will be closed on April 2, 2019, almost all the available accounts crawled before come from Google plus. So if you are building your own website, please keep an eye on the updates of this project and redeploy using the latest source code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    pxer

    pxer

    Pixiv crawler userscript for downloading artwork and galleries easily

    Pxer is an open source tool designed to help users collect and download artwork from the Pixiv illustration platform. It is implemented primarily in client-side JavaScript and runs directly in the browser through a userscript environment, allowing it to integrate seamlessly with Pixiv pages. Pxer provides functionality to crawl and gather images, artwork metadata, and other related content from supported Pixiv pages. It is designed to be accessible even for users who are not developers,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    mzitu

    mzitu

    Python crawler that downloads image galleries and analyzes titles

    mzitu is a Python-based web crawling project designed to automatically download and organize image galleries from a specific photography site. It demonstrates how to build a scraper that navigates gallery pages, retrieves image links, and saves the images locally in a structured directory layout. It focuses on automating the collection of large sets of images by programmatically parsing page content and iterating through gallery entries. mzitu also includes a simple analysis script that...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 22
    Headless Chrome Crawler

    Headless Chrome Crawler

    Distributed crawler powered by Headless Chrome

    ...Dynamic crawlers based on PhantomJS and Selenium work magically on such dynamic applications. However, PhantomJS's maintainer has stepped down and recommended to switch to Headless Chrome, which is fast and stable. This crawler is dynamic and based on Headless Chrome.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    WeChatSogou

    WeChatSogou

    Python library to crawl and retrieve data from WeChat accounts

    ...It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. Internally, the library separates its functionality into several layers including an API interface, request handling, and response parsing components to organize the crawling workflow. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    OpenSearchServer Search Engine

    OpenSearchServer Search Engine

    An open source search engine with RESTFul API and crawlers

    OpenSearchServer is a powerful, enterprise-class, search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API , Ruby, Rails, Node.js, PHP, Perl) you will be able to integrate quickly and easily advanced full-text search capabilities in your application: Full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexation, web scrapping,etc. OpenSearchServer runs on...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    crawler4j

    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can setup a multi-threaded web crawler in few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. shouldVisit function decides whether the given URL should be crawled or not.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB