Showing 245 open source projects for "crawler"

  • 1
    pyspider

    A powerful Spider(Web Crawler) system in Python

    pyspider is a powerful spider (web crawler) system written in Python. Its components are connected by a message queue. Every component, including the message queue, runs in its own process or thread and is replaceable. That means that when processing is slow, you can run many instances of the processor to make full use of multiple CPUs, or deploy them to multiple machines. This architecture makes pyspider fast.
    Downloads: 0 This Week
  • 2
    haipproxy

    Distributed proxy IP pool for web crawlers using Scrapy and Redis

    ...HAipproxy aims to maintain a high availability proxy pool with low latency so that scraping frameworks can rotate proxies efficiently and avoid blocking during large-scale data collection. Its architecture supports distributed deployment, allowing multiple crawler workers and validators to run across different machines.
    Downloads: 0 This Week
  • 3
    The Teachingbox uses advanced machine learning techniques to relieve developers from programming hand-crafted, sophisticated behaviors for autonomous agents (such as robots, game players, etc.). In its current state, it provides a well-founded reinforcement learning core in Java with many popular use cases, environments, policies, and learners. Obtaining the Teachingbox: FOR USERS: If you want to download the latest releases, please visit:...
    Downloads: 1 This Week
  • 4
    Gecco

    Lightweight Java web crawler framework with jQuery-style extraction

    Gecco is a lightweight web crawler framework written in Java that simplifies the process of building web scraping applications. It is designed to make crawler development straightforward by allowing developers to extract page elements using jQuery-style selectors rather than complex parsing logic. It integrates several well-known Java libraries and frameworks, including tools for HTTP requests, HTML parsing, JSON processing, and application development.
    Downloads: 0 This Week
  • 5
    diskover

    File system crawler and disk space usage software

    diskover is a file system crawler and disk space usage tool that uses Elasticsearch to index your file metadata. It crawls and indexes your files on a local computer or on a remote storage server over network mounts. diskover helps you manage your storage by identifying old and unused files, and gives better insight into data change ("hotfiles"), file duplication ("dupes"), and wasted space.
    Downloads: 0 This Week
  • 6
    YouSeer is an open source search engine framework built on top of other open source components. It is part of the general SeerSuite framework. YouSeer uses Heritrix as its crawler and Solr as its indexing system.
    Downloads: 0 This Week
  • 7

    Blind digger

    crawler manager

    blind-digger is a project that integrates crawlers (iMacros, Selenium) with a tool that controls and manages them, including an ML controller and a dynamic user interface built with WinBatch.
    Downloads: 0 This Week
  • 8
    lightcrawler

    Website crawler that audits site pages automatically with Lighthouse

    ...This allows developers to audit multiple pages of a site automatically instead of manually running Lighthouse on each individual page. Lightcrawler supports configuration through a JSON configuration file, enabling users to customize how the crawler operates and which Lighthouse audits should be executed. Settings such as crawl depth and the number of concurrent browser instances can be configured to control how aggressively the crawler scans a site. It was created as a developer utility to help identify issues across an entire website more efficiently.
    Downloads: 0 This Week
  • 9
    Perl Web Scraping Project

    Web scraping (web harvesting or web data extraction) is data scraping used for extracting data from websites.[1] Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis. Web scraping a web page involves fetching it and extracting from it.[1][2] Fetching is the downloading of a page (which a browser does when you view the page). ...
    Downloads: 0 This Week
  • 10
    Catberry

    Catberry is an isomorphic framework

    ...The entire architecture of the framework is built using the Service Locator pattern, which helps to manage module dependencies and create plugins, and Flux, for the data layer. Search crawler receives a full page from the server. The whole state of the application is restored from URL. Server-side progressive rendering based on node.js streams and parallel rendering of components in a browser. The framework is well-tested (code coverage is about 90%) and it is already used in production.
    Downloads: 0 This Week
  • 11
    phoneutria
    A Java web crawler: multi-threaded, scalable, high-performance, extensible, and polite. It can be used to crawl and index any web or enterprise domain and is configurable through an XML configuration file.
    Downloads: 0 This Week
  • 13
    OpenWebSpider
    OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
    Downloads: 10 This Week
  • 14
    DHT

    BitTorrent DHT Protocol && DHT Spider.

    ...Set MaxNodes and BlackListMaxSize to suit your needs. DHT aims to implement the standard BitTorrent DHT protocol; it was not built primarily for crawling the DHT network. NAT traversal is an issue if you run the crawler in a local network. It blocks IPs that look bad, so a good IP may be misjudged.
    Downloads: 0 This Week
  • 15

    sourcegreed

    A Java-based crawler
    Downloads: 0 This Week
  • 16

    WebCrawler

    Get a web page, including its HTML, CSS, and JS files

    This tool is for people who want to learn from a web site or web page, especially web developers. It helps you get a web page's source code. Enter the page's address and press the start button; the tool fetches the page and, following the references in the page, downloads all files used by it, including CSS and JavaScript files. The HTML file is saved as 'index.html', and the other files keep their source names. Note: only the Windows platform and the HTTP protocol are supported.
    Downloads: 0 This Week
  • 17
    Pathfinder Wiki-fr Crawler

    All spells, monsters, feats, and magic items in French

    All of the information comes from http://www.pathfinder-fr.org/Wiki/Pathfinder-RPG.MainPage.ashx (the Pathfinder-fr wiki). The software also lets you create detailed spell lists and export each type of data.
    Downloads: 0 This Week
  • 18

    frsi

    Fast Remote SVN Info

    ...Windows users: This tool requires the Subversion command-line tools: https://sourceforge.net/projects/win32svn/ Credits: Subversion (https://subversion.apache.org), win32svn (https://sourceforge.net/projects/win32svn/), fast-svn-crawler (https://sourceforge.net/projects/fastsvncrawler/)
    Downloads: 0 This Week
  • 19
    ToroSearch Search Engine
    ...You can add websites to your search engine or pages of your website, and you can search for websites on your own search engine or for pages of your website. ATTENTION: This is not a crawler. It just lists websites or pages. Originally I hosted it myself, and nobody could see the source code. I no longer have the time to host and develop it myself, and on SourceForge anyone can see the code and adapt it for their own use. I am still working on this project and still fixing errors.
    Downloads: 0 This Week
  • 20

    WebCollector

    WebCollector is an open source web crawler framework based on Java.

    WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
    Downloads: 0 This Week
  • 21
    Products of the project: Java HTMLParser - VietSpider Web Data Extractor - Extractor VietSpider News. Click on "Show project details" to see more features of each product.
    Downloads: 0 This Week
  • 22
    go_spider

    An awesome Go concurrent Crawler(spider) framework

    An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components. The Spider gets a Request from the Scheduler containing the URL to be crawled. The Downloader then downloads the result (HTML, JSON, JSONP, or text) of the Request. The result is saved in a Page for parsing by the PageProcesser.
    Downloads: 0 This Week
  • 23
    Node Crawler

    Web Crawler/Spider for NodeJS + server-side jQuery

    Most powerful, popular and production crawling/scraping package for Node, happy hacking.
    Downloads: 0 This Week
  • 25
    KGP TnP Crawler

    Access TnP notices over the internet

    This script is written solely to crawl the notices of the Training and Placement center, Kharagpur. NOTE: The Windows SmartScreen filter may block this exe. If it does, click "More info"; a new button named "Run anyway" will appear beside OK. For any enquiry or suggestion, send mail to writetomansa@live.com
    Downloads: 0 This Week
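
The pyspider and go_spider entries above both describe a crawler split into components that hand work to each other: a scheduler, a downloader, and one or more page processors, connected by queues. A minimal Python sketch of that shape follows. It is an illustration under stated assumptions, not code from either project: `FAKE_WEB`, `fetch`, and `crawl` are hypothetical names, and the downloader reads from an in-memory dictionary instead of making HTTP requests.

```python
import queue
import threading

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would issue HTTP requests instead.
FAKE_WEB = {
    "http://example.test/": ("home", ["http://example.test/a", "http://example.test/b"]),
    "http://example.test/a": ("page a", ["http://example.test/b"]),
    "http://example.test/b": ("page b", ["http://example.test/"]),
}


def fetch(url):
    """Stand-in downloader: look the page up instead of doing HTTP."""
    return FAKE_WEB.get(url, ("", []))


def crawl(start_url, num_processors=2):
    """Crawl FAKE_WEB from start_url and return {url: page text}."""
    fetch_q = queue.Queue()    # scheduler -> downloader
    process_q = queue.Queue()  # downloader -> processors
    lock = threading.Lock()    # guards `seen` and `results`
    seen = {start_url}
    results = {}

    def downloader():
        # None is a shutdown sentinel.
        while (url := fetch_q.get()) is not None:
            text, links = fetch(url)
            process_q.put((url, text, links))

    def processor():
        while (item := process_q.get()) is not None:
            url, text, links = item
            new_links = []
            with lock:
                results[url] = text
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        new_links.append(link)
            # Schedule newly discovered URLs before marking this one done,
            # so fetch_q.join() cannot return while work is still pending.
            for link in new_links:
                fetch_q.put(link)
            fetch_q.task_done()

    workers = [threading.Thread(target=downloader)]
    workers += [threading.Thread(target=processor) for _ in range(num_processors)]
    for worker in workers:
        worker.start()

    fetch_q.put(start_url)
    fetch_q.join()  # blocks until every scheduled URL has been processed

    # Shut the components down with sentinels.
    fetch_q.put(None)
    for _ in range(num_processors):
        process_q.put(None)
    for worker in workers:
        worker.join()
    return results
```

Calling `crawl("http://example.test/")` returns a dictionary mapping each of the three reachable URLs to its page text. Because the processors only talk to the downloader through queues, raising `num_processors` adds workers without changing any other component, which is the property the pyspider description emphasizes.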