Showing 245 open source projects for "crawler"

  • 1
    zSearch is a simple Python-based crawler and search engine. Raw HTML is stored in bzip2 archives, the index is created using PyLucene, and Twisted provides the internal HTTP server. Results are sent back as XML over HTTP.
    Downloads: 0 This Week
  • 2
    WebNews Crawler is a specific web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site-specific extraction to extract only the actual news content, filtering out the advertising and other cruft.
    Downloads: 0 This Week
  • 3
    Crawler.NET is a component-based distributed framework for web traversal intended for the .NET platform. It comprises loosely coupled units, each realizing a specific web crawler task. The main design goals are efficiency and flexibility.
    Downloads: 0 This Week
  • 4
    Course Crawler is an application that compiles term-definition pairs from multiple web glossaries into a centralized, stable, and searchable location.
    Downloads: 0 This Week
  • 5
    Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.
    Downloads: 0 This Week
  • 6
    GronoSpy is a WWW crawler which tries to extract knowledge based on the data from grono.net - a community portal.
    Downloads: 0 This Week
  • 7
    SNT is a search engine for SMB and FTP shares with a crawler running on Win32. A web interface is provided for searching files and browsing share contents. It also provides a list of shared films with user ratings and comments.
    Downloads: 0 This Week
  • 8
    DisSearch is a crawler of FTP servers with a web interface for searching files.
    Downloads: 0 This Week
  • 9
    J-Obey is a Java library/package that gives people writing their own crawlers a stable robots.txt parser. If you are writing a web crawler of some sort, you can use J-Obey to take out the hassle of writing a robots.txt parser/interpreter.
    Downloads: 0 This Week
  • 10
    Light network file search engine: a crawler of FTP servers and SMB shares (Windows shares and UNIX systems running Samba). A WWW Perl (Mason) interface is provided for searching files.
    Downloads: 0 This Week
  • 11
    Very simple single-threaded crawler for the web (HTTP) written in Perl. Supports link-following rules and collections to grab information from visited pages (regexp-based).
    Downloads: 0 This Week
  • 12
    A configurable knowledge management framework. It works out of the box, but it's meant mainly as a framework to build complex information retrieval and analysis systems. The three major components (Crawler, Analyzer, and Indexer) can also be used separately.
    Downloads: 0 This Week
  • 13
    Larbin is a Web crawler intended to fetch a large number of Web pages; it should be able to fetch more than 100 million pages on a standard PC with much u/d. This set of PHP and Perl scripts, called webtools4larbin, can handle the output of Larbin and p
    Downloads: 0 This Week
  • 14
    JCrawler is a perfect crawling/load-testing tool which is cookie-enabled and follows a human crawling pattern (hits/second).
    Downloads: 0 This Week
  • 15
    Fast File Search is a crawler of FTP servers and SMB shares (Windows shares and UNIX systems running Samba). A WWW interface is provided for searching files. FFS is similar to FemFind but optimized for speed.
    Downloads: 1 This Week
  • 16
    HtmlTester is a tool for testing any active/passive web site. Its crawler collects all URLs and forms in a config file; after that, you have to type in the data to test how your server/pages will react. Run the Tester and see the results in the console.
    Downloads: 0 This Week
  • 17
    SmartCrawler is a Java-based, fully configurable, multi-threaded, and extensible crawler that can fetch and analyze the contents of a web site using dynamically pluggable filters.
    Downloads: 0 This Week
  • 18
    DSE (Distributed Search Engine) is a highly scalable, open source, component-based search engine for crawling and searching the Web. It encapsulates a crawler, an indexer, a query manager, and a web front-end for queries.
    Downloads: 0 This Week
  • 19
    WebLoupe is a Java-based tool for analysis, interactive visualization (sitemap), and exploration of the information architecture and specific properties of local or publicly accessible websites. It is based on web spider (or web crawler) technology.
    Downloads: 0 This Week
  • 20
    WWW Universal Tester is a Java application designed to gather information about the WWW. It works as a spider (robot, crawler) and collects information about the size of files used on the web, the structure of connections between pages, and so on.
    Downloads: 0 This Week
  • 21
    The Jobcrawler search engine is a research project that aims to index the available applications on the internet. Our mission is to really help people who seek a job or an employee on a one-to-one basis and to rule mediators (job agencies) out.
    Downloads: 0 This Week
  • 22
    Nomad is a tiny but efficient search engine and web crawler. It works very well for searching within a set of corporate websites on the internet and/or an intranet's HTML documents or knowledge repositories.
    Downloads: 0 This Week
  • 23
    A new Web Crawler including a sophisticated search process specialized by language.
    Downloads: 0 This Week
  • 24
    Web Textual eXtraction Tools: C++ parallel web crawler, noun phrase identification, multi-lingual part-of-speech tagging, Tarjan's algorithm, co-relationship mappings...
    Downloads: 0 This Week
  • 25
    Larbin is an HTTP Web crawler with an easy interface that runs under Linux. It can fetch more than 5 million pages a day on a standard PC (with a good network).
    Downloads: 1 This Week