Showing 23 open source projects for "crawling"

  • 1
    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ...ACHE also automatically learns how to prioritize links in order to efficiently locate relevant content while avoiding the retrieval of irrelevant pages. While ACHE was originally designed to perform focused crawls, it also supports other crawling tasks, including crawling all pages in a given web site and crawling Dark Web sites (using the Tor protocol).
    Downloads: 0 This Week
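    The link prioritization described above is the core of a focused (best-first) crawler. The Java sketch below is a hypothetical illustration, not ACHE's API: ACHE learns its priorities, whereas the `FocusedFrontier` class here scores each link with an assumed keyword count on its anchor text and always hands out the highest-scoring URL next.

    ```java
    import java.util.HashSet;
    import java.util.List;
    import java.util.PriorityQueue;
    import java.util.Set;

    // Hypothetical best-first crawl frontier (not ACHE's API). ACHE learns link
    // priorities; this toy scores a link by counting topic keywords in its anchor text.
    class FocusedFrontier {
        record Link(String url, double score) {}

        // Highest score first.
        private final PriorityQueue<Link> queue =
                new PriorityQueue<Link>((a, b) -> Double.compare(b.score(), a.score()));
        private final Set<String> seen = new HashSet<>();
        private final List<String> topicKeywords;

        FocusedFrontier(List<String> topicKeywords) {
            this.topicKeywords = topicKeywords;
        }

        // Assumed relevance score: how many topic keywords occur in the anchor text.
        double score(String anchorText) {
            String text = anchorText.toLowerCase();
            return topicKeywords.stream().filter(text::contains).count();
        }

        // Enqueue each URL at most once.
        void add(String url, String anchorText) {
            if (seen.add(url)) {
                queue.add(new Link(url, score(anchorText)));
            }
        }

        // Most promising unvisited URL, or null when the frontier is empty.
        String next() {
            Link best = queue.poll();
            return best == null ? null : best.url();
        }

        public static void main(String[] args) {
            FocusedFrontier frontier = new FocusedFrontier(List.of("crawler", "search"));
            frontier.add("https://example.org/recipes", "Cooking recipes");
            frontier.add("https://example.org/crawlers", "Web crawler search tools");
            frontier.add("https://example.org/news", "Search news");
            // Prints the crawlers, news and recipes URLs, in that order.
            for (String url = frontier.next(); url != null; url = frontier.next()) {
                System.out.println(url);
            }
        }
    }
    ```

    Replacing the keyword count with a trained classifier would leave the frontier structure itself unchanged.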
  • 2
    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps index binary documents such as PDF, OpenOffice, and MS Office files. It crawls the local file system (or a mounted drive), indexing new files, updating existing ones, and removing old ones; it can also crawl remote file systems over SSH/FTP; and it provides a REST interface that lets you “upload” your binary documents to Elasticsearch.
    Downloads: 0 This Week
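    Indexing new files, updating existing ones, and removing old ones can be reduced to comparing two snapshots of the file system. The following Java sketch assumes each snapshot is a map from path to last-modified time; `SnapshotDiff` and its `Changes` record are hypothetical names, not part of FS Crawler.

    ```java
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    // Hypothetical sketch of one incremental crawl pass (not FS Crawler's code):
    // compare the previous snapshot (path -> last-modified time) with the current one.
    class SnapshotDiff {
        record Changes(Set<String> added, Set<String> updated, Set<String> removed) {}

        static Changes diff(Map<String, Long> previous, Map<String, Long> current) {
            Set<String> added = new TreeSet<>();
            Set<String> updated = new TreeSet<>();
            Set<String> removed = new TreeSet<>(previous.keySet());
            for (Map.Entry<String, Long> e : current.entrySet()) {
                Long before = previous.get(e.getKey());
                if (before == null) {
                    added.add(e.getKey());             // new file: index it
                } else if (!before.equals(e.getValue())) {
                    updated.add(e.getKey());           // modified file: re-index it
                }
                removed.remove(e.getKey());            // still present on disk
            }
            return new Changes(added, updated, removed); // removed: delete from the index
        }

        public static void main(String[] args) {
            Map<String, Long> before = Map.of("a.pdf", 1L, "b.docx", 1L, "c.odt", 1L);
            Map<String, Long> now = Map.of("a.pdf", 1L, "b.docx", 2L, "d.pdf", 1L);
            Changes changes = diff(before, now);
            System.out.println("index:    " + changes.added());
            System.out.println("re-index: " + changes.updated());
            System.out.println("delete:   " + changes.removed());
        }
    }
    ```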
  • 3
    YaCy Peer-to-Peer Search Engine

    Decentralized Web Search Engine

    YaCy is a free search engine that anyone can use to search the internet (WWW and FTP) or to create a search portal for others (internet or intranet). The scale of YaCy is limited only by the number of users, and it can index billions of web pages. In P2P mode it is fully decentralized: all users of the search engine network are equal, and no one can censor the content of the distributed index.
    Downloads: 2 This Week
  • 4
    Web Book Downloader

    Download websites as e-book: pdf, txt, epub.

    This application lets the user download chapters from a website in three ways: from a table of contents; from a range (first chapter address to last chapter address); or by crawling from the first chapter to chapter n. In the settings you can customize the language and the input (website) encoding; for simplicity, the output uses the same encoding. To add your language, add a new class to the strings package, and new fields to the Settings class and the GUI menu (initialize method).
    Downloads: 2 This Week
  • 5
    Firing Range

    Firing Range is a test bed for web application security scanners

    ...The project doesn’t just include simple XSS forms; it spans variants such as DOM-based issues, context-sensitive sinks, template mishandling, CSRF, open redirects, and mixed content problems. Each scenario is crafted to reflect how bugs appear in production—behind frameworks, in odd encodings, or across redirects—so scanners must demonstrate accurate crawling and context understanding. Because the behaviors are stable and documented, teams can run comparative tests over time and quantify regression or improvement in their pipelines. It’s equally useful for human training, giving analysts a safe playground to practice exploitation and triage skills.
    Downloads: 0 This Week
  • 6
    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can set up a multi-threaded web crawler in a few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The shouldVisit function decides whether a given URL should be crawled; in the project's example, it rejects .css, .js, and media files and only allows pages within the ics domain. The visit function is called after the content of a URL has been downloaded successfully. ...
    Downloads: 0 This Week
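    In crawler4j, shouldVisit is a method you override in your WebCrawler subclass. The standalone Java sketch below only reproduces the filtering rule described above so that it runs without the library; the `UrlFilter` class, the list of excluded extensions, and the `https://www.ics.uci.edu/` prefix standing in for "the ics domain" are all assumptions for illustration.

    ```java
    import java.util.Locale;
    import java.util.regex.Pattern;

    // Hypothetical standalone version of a crawler4j-style shouldVisit rule.
    class UrlFilter {
        // Reject stylesheets, scripts and common media/archive files (assumed list).
        private static final Pattern FILTERS = Pattern.compile(
                ".*(\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz))$");
        // Assumed prefix standing in for "pages within the ics domain".
        private static final String ALLOWED_PREFIX = "https://www.ics.uci.edu/";

        static boolean shouldVisit(String url) {
            String href = url.toLowerCase(Locale.ROOT);
            return !FILTERS.matcher(href).matches() && href.startsWith(ALLOWED_PREFIX);
        }

        public static void main(String[] args) {
            System.out.println(shouldVisit("https://www.ics.uci.edu/about/"));    // true
            System.out.println(shouldVisit("https://www.ics.uci.edu/style.css")); // false
            System.out.println(shouldVisit("https://example.com/"));              // false
        }
    }
    ```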
  • 7
    The Teachingbox uses advanced machine learning techniques to relieve developers of hand-programming sophisticated behaviors for autonomous agents (such as robots, game players, etc.). At present we have implemented a well-founded reinforcement learning core in Java with many popular use cases, environments, policies, and learners. Obtaining the Teachingbox: FOR USERS: If you want to download the latest releases, please visit:...
    Downloads: 0 This Week
  • 8

    Excel file analysis toolkit

    Lays out dependencies (links) between Excel files in a visual graph

    Do you have a haystack of (legacy) Excel files that reference each other, and you can no longer see clearly? This tool crawls Excel files and builds a visual graph. It shows which files might be outdated because they reference a file that was modified more recently. The graph is dynamic: you can choose between different layout algorithms and rearrange it by hand. It allows you to quickly open a file, dump the graph, dump all links within a file, highlight all the cells that reference other...
    Downloads: 0 This Week
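    The "possibly outdated" rule can be expressed as a check over the link graph. In this hypothetical Java sketch, `StaleLinkCheck` works on assumed in-memory maps (file name to last-modified time, and file name to the files it links to) rather than reading real workbooks, and flags every file that links to a more recently modified file.

    ```java
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    // Hypothetical sketch: a workbook is flagged when it links to another
    // workbook that was modified more recently.
    class StaleLinkCheck {
        static Set<String> possiblyOutdated(Map<String, Long> lastModified,
                                            Map<String, List<String>> links) {
            Set<String> stale = new TreeSet<>();
            for (Map.Entry<String, List<String>> e : links.entrySet()) {
                // Assumes every linking file has a known timestamp.
                long ownTime = lastModified.get(e.getKey());
                for (String target : e.getValue()) {
                    Long targetTime = lastModified.get(target);
                    // Targets without a timestamp (e.g. missing files) are ignored.
                    if (targetTime != null && targetTime > ownTime) {
                        stale.add(e.getKey());
                    }
                }
            }
            return stale;
        }

        public static void main(String[] args) {
            Map<String, Long> modified = Map.of(
                    "report.xlsx", 100L, "prices.xlsx", 200L, "rates.xlsx", 50L);
            Map<String, List<String>> links = Map.of(
                    "report.xlsx", List.of("prices.xlsx", "rates.xlsx"),
                    "prices.xlsx", List.of("rates.xlsx"));
            System.out.println(possiblyOutdated(modified, links)); // [report.xlsx]
        }
    }
    ```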
  • 9

    WebCollector

    WebCollector is an open source web crawler framework based on Java.

    WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
    GitHub: https://github.com/CrawlScript/WebCollector
    Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
    Downloads: 0 This Week
  • 10
    Linkcrawler

    Crawls a site and returns a report of all its links

    Java desktop application that crawls a site and reports the status of every link on a page, then moves on to another internal page, and so on. Linkcrawler produces an easy-to-read HTML5 report with information on all links per web page. This tool is useful for web QA testers.
    Downloads: 5 This Week
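    Before reporting each link's status, a crawler like this has to collect the links on a page and decide which ones are internal (and therefore crawled next). The Java sketch below covers only that step and is a hypothetical illustration, not Linkcrawler's code: `LinkExtractor` uses a simple regex for `href` attributes where a real tool would use an HTML parser, and it performs no HTTP status checks.

    ```java
    import java.net.URI;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical sketch: collect the links on one page and separate internal
    // links (to crawl next) from external ones. The regex is a simplification.
    class LinkExtractor {
        private static final Pattern HREF =
                Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

        static List<URI> extract(URI page, String html) {
            List<URI> links = new ArrayList<>();
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                links.add(page.resolve(m.group(1))); // resolve relative links
            }
            return links;
        }

        // Internal means "same host as the page being crawled".
        static boolean isInternal(URI page, URI link) {
            return page.getHost() != null && page.getHost().equalsIgnoreCase(link.getHost());
        }

        public static void main(String[] args) {
            URI page = URI.create("https://example.org/docs/index.html");
            String html = "<a href=\"intro.html\">Intro</a> <a href=\"https://other.net/\">Other</a>";
            for (URI link : extract(page, html)) {
                System.out.println((isInternal(page, link) ? "internal " : "external ") + link);
            }
            // internal https://example.org/docs/intro.html
            // external https://other.net/
        }
    }
    ```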
  • 11
    ...It adheres to the Robots Exclusion Protocol, and it can be configured to operate anonymously by connecting through the predominant types of web proxy servers. webStraktor extends the functionality of web crawlers, spiders, or bots by integrating scraping and crawling capabilities.
    Downloads: 0 This Week
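    Adhering to the Robots Exclusion Protocol means checking a site's robots.txt before fetching a path. The hypothetical `RobotsRules` class below is a deliberately simplified Java sketch, not webStraktor's implementation: it reads only the `User-agent: *` group, treats `Disallow` values as plain path prefixes, and ignores `Allow` lines and wildcards.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical, simplified robots.txt check: only the "User-agent: *" group
    // is read, Disallow values are plain path prefixes, and Allow lines and
    // wildcard patterns are ignored.
    class RobotsRules {
        private final List<String> disallowed = new ArrayList<>();

        RobotsRules(String robotsTxt) {
            boolean inStarGroup = false;
            for (String raw : robotsTxt.split("\n")) {
                String line = raw.strip();
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash).strip(); // drop comments
                }
                String lower = line.toLowerCase();
                if (lower.startsWith("user-agent:")) {
                    inStarGroup = line.substring("user-agent:".length()).strip().equals("*");
                } else if (inStarGroup && lower.startsWith("disallow:")) {
                    String path = line.substring("disallow:".length()).strip();
                    if (!path.isEmpty()) { // an empty Disallow allows everything
                        disallowed.add(path);
                    }
                }
            }
        }

        // True if no Disallow prefix of the "*" group matches the path.
        boolean isAllowed(String path) {
            return disallowed.stream().noneMatch(path::startsWith);
        }

        public static void main(String[] args) {
            String robots = "User-agent: *\nDisallow: /private/\nDisallow: /tmp # scratch\n\n"
                    + "User-agent: BadBot\nDisallow: /\n";
            RobotsRules rules = new RobotsRules(robots);
            System.out.println(rules.isAllowed("/index.html"));     // true
            System.out.println(rules.isAllowed("/private/a.html")); // false
            System.out.println(rules.isAllowed("/tmpfile"));        // false (prefix match)
        }
    }
    ```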
  • 12

    Jurpe

    Role Playing Game Engine

    JURPE (Java Universal Role Playing Engine) is a Java API that supports writing new computer RPG games based on a skill-based, three-dice RPG system. It comes with a full-featured game demo of a medieval warrior crawling into a monster-filled dungeon.
    Downloads: 2 This Week
  • 13
    Java examples for information retrieval, covering themes like indexing, search, ranking, information extraction, regular expressions, and crawling, based on libraries such as Lucene. They provide support for learning information retrieval.
    Downloads: 0 This Week
  • 14

    HITSearchEngine

    You can quickly find what you need on the HIT campus

    A HIT campus search engine system that clusters your search results into topics, so you can quickly find what you are looking for. The system has four parts: 1. Web crawling, 2. HTML parsing, 3. Indexing, 4. Searching. We use the open source software "Heritrix" as our crawler and Lucene to build the index. To help you quickly find what you are looking for, we use Carrot2 to cluster the search results into topics. We also wrote a script that fetches the campus websites every day and updates the index automatically.
    Downloads: 0 This Week
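    Steps 3 and 4 (indexing and searching) rest on an inverted index that maps each term to the documents containing it. The project uses Lucene for this; the hypothetical `TinyIndex` class below is only a toy Java illustration of the idea, with boolean AND queries and a tokenizer that splits on non-word characters (so, as written, it handles ASCII text only and would not tokenize Chinese pages).

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    // Hypothetical toy inverted index (the project itself uses Lucene).
    class TinyIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>();

        // Indexing: record which documents contain each lower-cased token.
        void addDocument(int docId, String text) {
            for (String token : text.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
                }
            }
        }

        // Searching: ids of documents that contain every query token (boolean AND).
        Set<Integer> search(String query) {
            Set<Integer> result = null;
            for (String token : query.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                Set<Integer> docs = postings.getOrDefault(token, Set.of());
                if (result == null) {
                    result = new TreeSet<>(docs);
                } else {
                    result.retainAll(docs);
                }
            }
            return result == null ? Set.of() : result;
        }

        public static void main(String[] args) {
            TinyIndex index = new TinyIndex();
            index.addDocument(1, "Library opening hours");
            index.addDocument(2, "Campus library news");
            index.addDocument(3, "Campus sports news");
            System.out.println(index.search("campus news"));    // [2, 3]
            System.out.println(index.search("library campus")); // [2]
        }
    }
    ```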
  • 15
    WikiCrawler provides a means of crawling through wiki websites (specifically Wikipedia). I am currently looking for applications for it; please send me your suggestions.
    Downloads: 0 This Week
  • 16
    Zero-player dungeon-crawling game with the possibility to program your own AI.
    Downloads: 0 This Week
  • 17
    Ex-Crawler
    Ex-Crawler is divided into 3 subprojects (Crawler Daemon, distributed GUI Client, (web) search engine) which together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net
    Downloads: 0 This Week
  • 18
    Agent-based Regional Crawler strategy implementation. It gathers users' common needs and interests in a certain domain and crawls based on these interests, instead of crawling the web without any predefined order.
    Downloads: 0 This Week
  • 19
    Spidertron is a multithreaded web crawling API for websites of moderate size (hundreds of thousands of pages) that allows you to focus not on the crawling but on processing the retrieved information.
    Downloads: 0 This Week
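    A multithreaded crawler typically separates fetching, done in parallel by worker threads, from scheduling and deduplication. The Java sketch below is a hypothetical illustration, not Spidertron's API: `ParallelCrawl` replaces HTTP requests with an assumed in-memory link graph, fetches each breadth-first level with a thread pool, and keeps the visited set on the coordinating thread.

    ```java
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical sketch: a multithreaded breadth-first crawl over an in-memory
    // link graph that stands in for real HTTP fetches.
    class ParallelCrawl {
        static Set<String> crawl(Map<String, List<String>> web, String start, int threads)
                throws Exception {
            Set<String> visited = new HashSet<>(); // only touched by the coordinating thread
            visited.add(start);
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<String> frontier = List.of(start);
                while (!frontier.isEmpty()) {
                    // Workers "fetch" every page of the current level in parallel;
                    // per-page processing would also run here.
                    List<Callable<List<String>>> tasks = new ArrayList<>();
                    for (String url : frontier) {
                        tasks.add(() -> web.getOrDefault(url, List.of()));
                    }
                    // The coordinator deduplicates outgoing links to build the next level.
                    List<String> next = new ArrayList<>();
                    for (Future<List<String>> result : pool.invokeAll(tasks)) {
                        for (String link : result.get()) {
                            if (visited.add(link)) {
                                next.add(link);
                            }
                        }
                    }
                    frontier = next;
                }
            } finally {
                pool.shutdown();
            }
            return visited;
        }

        public static void main(String[] args) throws Exception {
            Map<String, List<String>> web = Map.of(
                    "/", List.of("/a", "/b"),
                    "/a", List.of("/b", "/c"),
                    "/b", List.of("/"),
                    "/c", List.of());
            System.out.println(new TreeSet<>(crawl(web, "/", 4))); // [/, /a, /b, /c]
        }
    }
    ```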
  • 20
    JLinkCheck is an Ant task written in Java for checking links in websites. Rather than checking just a single page, it crawls a whole site like a spider and generates a report in XML and (X)HTML. JReptator will be its successor, with many more features.
    Downloads: 0 This Week
  • 21
    JCrawler is a crawling/load-testing tool that is cookie-enabled and follows a human crawling pattern (hits per second).
    Downloads: 0 This Week
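    Following a hits-per-second pattern amounts to spacing requests at a fixed interval. The hypothetical `HitPacer` class below is a Java sketch of such pacing, not JCrawler's implementation; it takes the current time as a parameter (in nanoseconds) so the scheduling logic can be tested without sleeping.

    ```java
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch: pace requests to a fixed hits-per-second rate.
    class HitPacer {
        private final long intervalNanos;
        private long nextSlot;

        HitPacer(double hitsPerSecond, long startNanos) {
            this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / hitsPerSecond);
            this.nextSlot = startNanos;
        }

        // Reserves the next free slot and returns how long (ns) the caller must
        // wait from nowNanos before issuing its hit.
        synchronized long reserve(long nowNanos) {
            long wait = Math.max(0, nextSlot - nowNanos);
            nextSlot = Math.max(nextSlot, nowNanos) + intervalNanos;
            return wait;
        }

        public static void main(String[] args) throws InterruptedException {
            HitPacer pacer = new HitPacer(5.0, System.nanoTime()); // 5 hits per second
            for (int i = 0; i < 5; i++) {
                TimeUnit.NANOSECONDS.sleep(pacer.reserve(System.nanoTime()));
                System.out.println("hit " + i); // a real tool would send a request here
            }
        }
    }
    ```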
  • 22
    A command line utility for automatic crawling and downloading of files on the internet
    Downloads: 0 This Week
  • 23

    Luanium

    A Lua-based crawling scripting language that leverages Selenium

    ...The trick here is to add the crawling commands into the Lua interpreter.
    Downloads: 0 This Week