Showing 15 open source projects for "crawler"

View related business solutions
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    crawler

    crawler

    Collection of JS reverse engineering examples for web scraping study

    ...Many examples illustrate techniques such as debugging scripts, intercepting requests, analyzing encrypted parameters, and understanding authentication flows. crawler also explores common anti-scraping defenses and demonstrates how developers can examine them through debugging tools and reverse engineering techniques.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    EasySpider

    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    fess

    fess

    Open source enterprise search server for websites, files, and data

    ...Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can collect content from sources such as databases, CSV files, and shared storage, making it suitable for centralized knowledge discovery. It supports indexing and searching across many document formats including office documents, PDFs, and compressed archives. It also provides a web-based administrative interface that allows administrators to configure crawling targets, manage indexing tasks, and adjust search settings from a graphical dashboard.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    douyin

    douyin

    Open source Douyin crawler for collecting and downloading public data

    DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It also integrates with the Aria2 download utility to enable large-scale downloading of videos and images associated with collected content. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 5
    katana

    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    diskover-community

    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    ScrapBot 1.40 64bits

    ScrapBot 1.40 64bits

    Task automation software for accessing and manipulating website data.

    ScrapBot is a task automation software that allows you to access, authenticate, extract, and insert data on any website. The software utilizes JavaScript to execute tasks, eliminating the need for server or additional software installations. The system can control the accessed webpage through JavaScript, and the entire navigation can be viewed in the program window. The main.js script runs in a separate frame from the navigation frame but can access all page content without any restrictions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    X-RAY

    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Secure File Transfer for Windows with Cerberus by Redwood Icon
    Secure File Transfer for Windows with Cerberus by Redwood

    Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

    Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.
    Try for Free
  • 10
    pxer

    pxer

    Pixiv crawler userscript for downloading artwork and galleries easily

    Pxer is an open source tool designed to help users collect and download artwork from the Pixiv illustration platform. It is implemented primarily in client-side JavaScript and runs directly in the browser through a userscript environment, allowing it to integrate seamlessly with Pixiv pages. Pxer provides functionality to crawl and gather images, artwork metadata, and other related content from supported Pixiv pages. It is designed to be accessible even for users who are not developers,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    lightcrawler

    lightcrawler

    Website crawler that audits site pages automatically with Lighthouse

    ...This allows developers to audit multiple pages of a site automatically instead of manually running Lighthouse on each individual page. Lightcrawler supports configuration through a JSON configuration file, enabling users to customize how the crawler operates and which Lighthouse audits should be executed. Settings such as crawl depth and the number of concurrent browser instances can be configured to control how aggressively the crawler scans a site. It was created as a developer utility to help identify issues across an entire website more efficiently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    OpenWebSpider
    OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Methanol is a scriptable multi-purpose web crawling system with an extensible configuration system and speed-optimized architectural design. Methabot is the web crawler of Methanol.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Spider is web crawler written in the Java.Based on an Regular expression string the spider parses the internet for web pages matching this string and stores it in an MYSQL database.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    studiMaps is a web based application for visualization and analysis of social networks. It consists of two software components: a web-crawler for getting data and the web based application for visualization.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB