Showing 3 open source projects for "webcrawler"

  • 1
    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java that provides a simple interface for crawling the Web. With it, you can set up a multi-threaded web crawler in a few minutes. You create a crawler class that extends WebCrawler; this class decides which URLs should be crawled and handles each downloaded page. The shouldVisit method decides whether a given URL should be crawled. For example, it can reject .css, .js and media files and only allow pages within the ics domain. The visit method is called after the content of a URL has been downloaded successfully. ...
  • 2
    This is a simple web crawler for Facebook (TM) written in Java. The crawler browses public user pages (so you do not need to provide an account) to reconstruct the friendship graph for further study and analysis.
  • 3
    Spidertron is a multithreaded web-crawling API for websites of moderate size (hundreds of thousands of pages). It lets you focus on processing the retrieved information rather than on the crawling itself.
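The crawler4j entry says that shouldVisit filters out .css, .js and media files and keeps the crawl inside the ics domain. The sketch below shows that kind of filter as plain Java so it runs without the library. The class name UrlFilterSketch, the exact extension list and the allowed prefix https://www.ics.uci.edu/ are assumptions chosen for illustration. In crawler4j itself, this check would go in your subclass of WebCrawler, inside its shouldVisit override.

```java
import java.util.regex.Pattern;

// Hypothetical standalone sketch of a shouldVisit-style URL filter.
// It does not depend on crawler4j; the extension list and domain prefix are assumptions.
public class UrlFilterSketch {

    // Reject URLs ending in stylesheet, script, image, audio/video or archive extensions.
    private static final Pattern FILTERS =
            Pattern.compile(".*(\\.(css|js|gif|jpg|png|mp3|mp4|zip|gz))$");

    // Assumed stand-in for "pages within the ics domain".
    private static final String ALLOWED_PREFIX = "https://www.ics.uci.edu/";

    // Returns true only for non-filtered URLs that stay inside the allowed prefix.
    public static boolean shouldVisit(String url) {
        String href = url.toLowerCase();
        return !FILTERS.matcher(href).matches() && href.startsWith(ALLOWED_PREFIX);
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://www.ics.uci.edu/about/",
            "https://www.ics.uci.edu/style.css",
            "https://www.example.com/"
        };
        for (String u : urls) {
            System.out.println(u + " -> " + shouldVisit(u));
        }
    }
}
```

Lowercasing before matching means that an uppercase extension such as .PNG is filtered the same way as .png.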
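The second project rebuilds the Facebook friendship graph by crawling public user pages. The sketch below shows one plausible way to do that: a breadth-first crawl from a seed profile that records friendships as undirected edges. It is not taken from that project. The fetchFriends function stands in for downloading a profile page and extracting its friend list, and the names FriendGraphSketch, fetchFriends and maxProfiles are all hypothetical.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical breadth-first friendship-graph crawler; not the listed project's code.
public class FriendGraphSketch {

    // Crawl outward from seed, fetching at most maxProfiles profiles.
    // fetchFriends is an assumed stand-in for "download public page, parse friend list".
    public static Map<String, Set<String>> crawl(String seed,
                                                 Function<String, List<String>> fetchFriends,
                                                 int maxProfiles) {
        Map<String, Set<String>> graph = new HashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(seed);
        queue.add(seed);
        int fetched = 0;
        while (!queue.isEmpty() && fetched < maxProfiles) {
            String user = queue.poll();
            fetched++;
            for (String friend : fetchFriends.apply(user)) {
                // Friendship is symmetric, so store the edge in both directions.
                graph.computeIfAbsent(user, k -> new TreeSet<>()).add(friend);
                graph.computeIfAbsent(friend, k -> new TreeSet<>()).add(user);
                // Enqueue each profile only the first time it is discovered.
                if (seen.add(friend)) {
                    queue.add(friend);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        // In-memory fake "pages" in place of real HTTP fetches.
        Map<String, List<String>> fakePages = Map.of(
            "alice", List.of("bob", "carol"),
            "bob", List.of("alice", "dave"),
            "carol", List.of("alice"),
            "dave", List.of("bob"));
        Map<String, Set<String>> graph =
            crawl("alice", u -> fakePages.getOrDefault(u, List.of()), 10);
        System.out.println(new TreeMap<>(graph));
    }
}
```

The maxProfiles cap bounds the crawl. A real crawler would also need politeness delays and would have to respect the site's terms of use.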