Showing 18 open source projects for "web spider"

  • 1
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 1 This Week
  • 2
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests and HTML parsing, possibly concurrency or scheduling for crawling multiple pages, and techniques for handling common web-scraping issues. For people who want hands-on practice building scrapers, collecting data, or learning web programming in Python, this repository serves as a teaching reference or starting point.
    Downloads: 0 This Week
  • 3
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
  • 4
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue, which is best suited to broad multi-domain crawls. Scraped items get pushed into a Redis queue, so you can start as many post-processing processes as needed, all sharing the items queue. Components include a scheduler with duplication filter, an item pipeline, and base spiders. The default request serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version...
    Downloads: 0 This Week
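
The Scrapy-Redis description above says the pickle serializer can be swapped for anything that exposes loads and dumps. A minimal sketch of that idea follows, using only the standard library. The `JsonSerializer` class and `round_trip` helper are hypothetical names invented for illustration, and the request is modeled as a plain dict rather than a real Scrapy request.

```python
import json
import pickle


class JsonSerializer:
    """Hypothetical stand-in exposing the same loads/dumps pair as pickle."""

    @staticmethod
    def dumps(obj):
        # Encode to bytes so the result can be stored like a pickle payload.
        return json.dumps(obj, sort_keys=True).encode("utf-8")

    @staticmethod
    def loads(data):
        return json.loads(data.decode("utf-8"))


def round_trip(serializer, request):
    """Serialize and deserialize with any object offering loads and dumps."""
    return serializer.loads(serializer.dumps(request))


# A request modeled as a plain dict (an assumption made for this sketch).
request = {"url": "https://example.com/", "meta": {"depth": 1}}

# Both serializers satisfy the same two-function contract.
assert round_trip(pickle, request) == request
assert round_trip(JsonSerializer, request) == request
```

Unlike pickle, a JSON payload of this kind does not depend on the Python version that wrote it, which is one reason such a swap might be considered.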
  • 5
    Grab Framework Project

    Web Scraping Framework

    ...The API is built on top of the urllib3 and lxml libraries. The Spider API is for building asynchronous web crawlers: you write classes that define handlers for each type of network request, and each handler can spawn new network requests. Network requests are processed concurrently with a pool of asynchronous web sockets. Grab provides an interface called Spider for developing multithreaded website scrapers.
    Downloads: 0 This Week
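
The handler pattern described for Grab above (one handler per request type, each able to spawn new requests) can be sketched with a toy dispatcher. This is not Grab's actual Spider class: `ToySpider`, its `handle_page` method, and the in-memory `SITE` map are hypothetical, and the loop runs sequentially rather than concurrently.

```python
from collections import deque


class ToySpider:
    """Hypothetical mini-dispatcher; not Grab's real Spider API."""

    # Fake site used instead of the network: page name -> links on that page.
    SITE = {"index": ["a", "b"], "a": ["b"], "b": []}

    def __init__(self):
        self.visited = []

    def handle_page(self, name):
        # Handler for tasks of kind "page"; it spawns one task per link.
        self.visited.append(name)
        for link in self.SITE[name]:
            yield ("page", link)

    def run(self, start):
        tasks = deque([("page", start)])
        seen = {start}
        while tasks:
            kind, arg = tasks.popleft()
            # Dispatch on the task kind by looking up the matching handler.
            handler = getattr(self, f"handle_{kind}")
            for new_task in handler(arg):
                if new_task[1] not in seen:
                    seen.add(new_task[1])
                    tasks.append(new_task)
        return self.visited


print(ToySpider().run("index"))
```

Adding a second task kind would only require another `handle_<kind>` method, which is the appeal of dispatching on the task type.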
  • 6
    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 2 This Week
  • 7
    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform supporting various languages, including Python, NodeJS, Go, Java, and PHP, and various web crawler frameworks, including Scrapy, Puppeteer, and Selenium. Use docker-compose to start everything with one command; that way you don't even have to configure the MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS, and worker nodes.
    Downloads: 0 This Week
  • 8
    ReconSpider

    Most Advanced Open Source Intelligence (OSINT) Framework

    ...ReconSpider can be used by infosec researchers, penetration testers, bug hunters, and cyber crime investigators to find in-depth information about their target. ReconSpider aggregates all the raw data, visualizes it on a dashboard, and facilitates alerting and monitoring on the data. ReconSpider also combines the capabilities of Wave, Photon, and Recon Dog to do a comprehensive enumeration of attack surfaces. Reconnaissance is a mission to obtain information, by various detection methods, about the activities and resources of an enemy or potential enemy, or about the geographic characteristics of a particular area. A web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing (web spidering).
    Downloads: 0 This Week
  • 9
    ruia

    Async Python framework for fast and flexible web scraping spiders

    Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development.
    Downloads: 0 This Week
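
The asyncio-based concurrency that the Ruia description above relies on can be sketched with the standard library alone. This is not Ruia's API: `fetch` and `crawl` are hypothetical names, and `asyncio.sleep` stands in for a real aiohttp request so the example runs without a network.

```python
import asyncio


async def fetch(url, delay=0.01):
    # Placeholder for a real HTTP call (an assumption made for this sketch).
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl(urls, limit=2):
    # The semaphore caps how many requests are in flight at once.
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return url, await fetch(url)

    # gather schedules all coroutines concurrently and keeps input order.
    return dict(await asyncio.gather(*(bounded(u) for u in urls)))


pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
print(sorted(pages))
```

While one request waits on I/O the event loop runs the others, so total time tracks the slowest batch rather than the sum of all delays.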
  • 10
    BlackWidow

    Python web scanner for OSINT gathering and OWASP vulnerability fuzzing

    BlackWidow is a Python-based web application scanning tool designed to crawl target websites and collect open-source intelligence (OSINT) while identifying potential security vulnerabilities. It functions as a web spider that systematically explores a site to gather valuable information such as URLs, dynamic parameters, subdomains, email addresses, and phone numbers associated with the target domain.
    Downloads: 2 This Week
  • 11
    Photon

    Incredibly fast crawler designed for OSINT

    ...Despite its speed focus, the tool still provides useful filtering and extraction capabilities for analysts who need structured results. Overall, Photon functions as a lightweight yet powerful reconnaissance spider for web intelligence gathering.
    Downloads: 3 This Week
  • 12
    pyspider

    A powerful Spider(Web Crawler) system in Python

    pyspider is a powerful spider (web crawler) system in Python. Components are connected by a message queue. Every component, including the message queue, runs in its own process or thread and is replaceable. That means that when processing is slow, you can run many instances of the processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast.
    Downloads: 0 This Week
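
The queue-connected architecture described for pyspider above can be illustrated with a small standard-library sketch. It is not pyspider's code: `processor` and `run_pipeline` are hypothetical names, threads stand in for separate processes or machines, and "processing" a page is reduced to measuring its length.

```python
import queue
import threading


def processor(inbox, outbox):
    # Each processor instance pulls work from the shared queue until told to stop.
    while True:
        page = inbox.get()
        if page is None:  # sentinel: no more work for this worker
            break
        outbox.put((page, len(page)))


def run_pipeline(pages, workers=3):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=processor, args=(inbox, outbox))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for page in pages:
        inbox.put(page)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Completion order varies between runs, so sort for a stable result.
    return sorted(outbox.get() for _ in pages)


print(run_pipeline(["<p>a</p>", "<p>bb</p>"]))
```

Because the workers only share the two queues, the number of processor instances can be changed without touching the producer, which is the property the description highlights.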
  • 13
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 1 This Week
  • 14

    sitecheck

    Modular web site spider for web developers.

    More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...
    Downloads: 0 This Week
  • 15

    SauceWalk Proxy Helper

    Enumeration and automation of file discovery for your sec tools.

    SauceWalk is a freeware (.exe) and open source (.py) tool that aids in enumerating web application structure. It consists of two parts: a local executable (walk.exe) and a remote agent. Walk.exe iterates through the local files and folders of your target web application (for example, a local copy of WordPress) and generates requests via your favourite proxy (for example, Burp Suite) against a given target URL. The remote agent can be used to identify target files and folders on a live system via a PHP script on the target server (ASP/JSP coming soon). ...
    Downloads: 0 This Week
  • 16
    Web spider and SERP scraper
    Downloads: 0 This Week
  • 17

    Monkey-Spider

    Moved to https://github.com/aikinci/monkeyspider

    The Monkey-Spider is a crawler-based, low-interaction honeyclient project. It is not restricted to this use, but it was developed for it. The Monkey-Spider crawls websites to expose their threats to web clients.
    Downloads: 0 This Week
  • 18
    This plug-in for Google Desktop is a simple web spider (Könguló is Icelandic for spider) that crawls websites you specify, e.g. intranet websites, and dumps them into Google Desktop. You must install Google Desktop prior to installing the plug-in.
    Downloads: 0 This Week