For SaaS businesses looking to monetize payments through a turnkey PayFac-as-a-Service solution.
Exact Payments delivers easy-to-integrate embedded payment solutions enabling you to rapidly onboard merchants, instantly activate a variety of payment methods and accelerate your revenue — delivering an end-to-end payment processing platform for SaaS businesses.
Addresses the needs of small businesses and large global organizations with thousands of users in multiple locations.
Choose from a complete set of software solutions across EHSQ that address all aspects of top performing Environmental, Health and Safety, and Quality management programs.
This project is a Java webspider (webcrawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.
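As a rough illustration of the resumable-download part (not this project's actual code), the sketch below restarts an interrupted HTTP download from the current length of a partial file using a Range request; the file name and URL are placeholders.

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

/** Illustrative resume-by-Range sketch; file name and URL are placeholders. */
public class ResumeDownload {
    public static void main(String[] args) throws IOException {
        File target = new File("download.part");           // partial file from a previous run
        long offset = target.exists() ? target.length() : 0;

        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://example.com/big.zip").openConnection();
        if (offset > 0) {
            conn.setRequestProperty("Range", "bytes=" + offset + "-");
        }

        // 206 Partial Content means the server honoured the Range header;
        // anything else means we must start over from byte 0.
        boolean append = conn.getResponseCode() == HttpURLConnection.HTTP_PARTIAL;
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(target, append)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
        }
    }
}
```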
Other spiders have a limited link depth, follow links in a fixed rather than randomized order, or are combined with heavy indexing machines. This spider has no link depth limit and picks the next URL to check for new URLs at random.
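The randomized, depth-unlimited frontier described above can be pictured with a small hypothetical sketch like the following; the class and method names are illustrative, not taken from this project.

```java
import java.util.*;

/** Hypothetical sketch of a depth-unlimited, randomized URL frontier. */
public class RandomFrontier {
    private final List<String> frontier = new ArrayList<>();
    private final Set<String> seen = new HashSet<>();
    private final Random random = new Random();

    public void add(String url) {
        if (seen.add(url)) {            // enqueue each URL only once, no depth bookkeeping
            frontier.add(url);
        }
    }

    /** Remove and return a uniformly random URL to check next, or null if empty. */
    public String next() {
        if (frontier.isEmpty()) return null;
        int i = random.nextInt(frontier.size());
        Collections.swap(frontier, i, frontier.size() - 1);  // swap-remove keeps this O(1)
        return frontier.remove(frontier.size() - 1);
    }
}
```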
An automated website testing framework. It includes a utility to spider a site to determine its content, plus a variety of testing plugins to check that the content meets validity and accessibility standards. A report is then generated with the results of the test.
Accurately convert voice to text in over 125 languages and variants by applying Google's powerful machine learning models with an easy-to-use API.
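A minimal transcription sketch in Java, following the pattern of Google's published client-library quickstart; the Cloud Storage path is a placeholder, and exact class names can vary between client-library versions.

```java
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionResult;

public class TranscribeSketch {
    public static void main(String[] args) throws Exception {
        try (SpeechClient client = SpeechClient.create()) {
            RecognitionConfig config = RecognitionConfig.newBuilder()
                    .setEncoding(AudioEncoding.LINEAR16)
                    .setSampleRateHertz(16000)
                    .setLanguageCode("en-US")           // one of the supported language codes
                    .build();
            RecognitionAudio audio = RecognitionAudio.newBuilder()
                    .setUri("gs://my-bucket/audio.raw") // placeholder Cloud Storage path
                    .build();
            RecognizeResponse response = client.recognize(config, audio);
            for (SpeechRecognitionResult result : response.getResultsList()) {
                System.out.println(result.getAlternatives(0).getTranscript());
            }
        }
    }
}
```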
New customers get $300 in free credits to spend on Speech-to-Text. All customers get 60 minutes of audio transcription and analysis free per month, not charged against your credits.
MuSE-CIR is a Multigram-based Search Engine and Collaborative Information Retrieval system. Written in Java/JSP, it supports any JDBC-connectable database; it has been thoroughly tested only with OracleXE, and somewhat with MySQL, with the JSP layer running on Apache Tomcat 5.5.
Web-as-corpus tools in Java.
* Simple Crawler (and also integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate (a small cleaning sketch follows this list)
* Language recognition
* Corpus builder
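As a rough illustration of the HTML-cleaning step, the sketch below strips scripts, styles, and common layout chrome with jsoup before extracting the visible text; jsoup is an assumption here, not necessarily the library this toolkit uses.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/** Rough boilerplate-stripping sketch using jsoup (not necessarily this project's cleaner). */
public class CleanHtml {
    public static String visibleText(String html) {
        Document doc = Jsoup.parse(html);
        // drop scripts, styles and typical layout chrome before extracting text
        doc.select("script, style, nav, header, footer, aside").remove();
        return doc.body().text();
    }

    public static void main(String[] args) {
        String html = "<html><body><nav>menu</nav><p>Actual article text.</p></body></html>";
        System.out.println(visibleText(html));   // prints: Actual article text.
    }
}
```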
nxs crawler is a program to crawl the internet. The program generates random IP numbers and attempts to connect to the hosts. If a host answers, the result is saved in an XML file. After that the crawler disconnects... Additionally you can
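The random-IP probing idea can be sketched as follows; this is illustrative only and not the nxs crawler's actual code.

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Random;

/** Illustrative sketch of the random-IP probing idea (not the actual nxs crawler code). */
public class RandomIpProbe {
    public static void main(String[] args) {
        Random random = new Random();
        for (int attempt = 0; attempt < 10; attempt++) {
            // build a random dotted-quad address
            String ip = (1 + random.nextInt(223)) + "." + random.nextInt(256) + "."
                      + random.nextInt(256) + "." + random.nextInt(256);
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(ip, 80), 1000); // 1 s timeout
                System.out.println(ip + " answered on port 80");     // a real crawler would record this in XML
            } catch (Exception e) {
                // host did not answer; move on to the next random address
            }
        }
    }
}
```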
Network monitoring and troubleshooting is hard. TotalView makes it easy.
This means every device on your network, and every interface on every device, is automatically analyzed for performance, errors, QoS, and configuration.
The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol.
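A rough usage sketch against the crawler-commons SiteMapParser, which now hosts this code; method names may differ slightly between releases, and the sitemap file path and URL are placeholders.

```java
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

import crawlercommons.sitemaps.AbstractSiteMap;
import crawlercommons.sitemaps.SiteMap;
import crawlercommons.sitemaps.SiteMapParser;
import crawlercommons.sitemaps.SiteMapURL;

public class SitemapSketch {
    public static void main(String[] args) throws Exception {
        byte[] content = Files.readAllBytes(Paths.get("sitemap.xml")); // fetched beforehand
        SiteMapParser parser = new SiteMapParser();
        AbstractSiteMap parsed =
                parser.parseSiteMap(content, new URL("https://example.com/sitemap.xml"));
        if (!parsed.isIndex()) {
            for (SiteMapURL entry : ((SiteMap) parsed).getSiteMapUrls()) {
                System.out.println(entry.getUrl()); // URLs discovered for the crawler's frontier
            }
        }
    }
}
```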
This project has been incorporated into crawler-commons (https://github.com/crawler-commons/crawler-commons) and is no longer being maintained.
Retriever is a simple crawler packed as a Java library that allows developers to collect and manipulate documents reachable by a variety of protocols (e.g. http, smb). You'll easily crawl documents shared in a LAN, on the Web, and many other sources.
The DeDuplicator is an add-on module (plug-in) for the webcrawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
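The underlying idea, comparing a content digest against digests recorded in an earlier snapshot, can be sketched as follows; this illustrates the technique only, not the DeDuplicator's actual Heritrix integration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;

/** Illustration of the idea only: skip re-storing content whose digest was already
 *  seen in an earlier snapshot (not the DeDuplicator's actual API). */
public class DigestDedup {
    private final Set<String> seenDigests = new HashSet<>(); // would be loaded from a previous crawl's index

    public boolean isDuplicate(byte[] body) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest(body)) {
            hex.append(String.format("%02x", b));
        }
        return !seenDigests.add(hex.toString()); // false the first time, true on an exact repeat
    }

    public static void main(String[] args) throws Exception {
        DigestDedup dedup = new DigestDedup();
        byte[] page = "<html>same content</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(dedup.isDuplicate(page)); // false
        System.out.println(dedup.isDuplicate(page)); // true
    }
}
```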
LogCrawler is an ANT task for automatic testing of web applications. Using an HTTP crawler, it visits all pages of a website and checks the server log files for errors. Use it as a "smoke test" with a CI system such as CruiseControl.
WebNews Crawler is a specific webcrawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site specific extraction to extract the actual news content only, filtering out the advertising and other cruft.
This project will provide a tool for users to get a better understanding of the content and structure of an existing website. It will do this by providing a customised webspider as well as extensions to the GUESS graph visualisation application.
Crawl-By-Example runs a crawl which classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin to the Heritrix crawler and was developed as part of the GSoC06 program.
J-Obey is a Java library/package that gives people writing their own crawlers a stable robots.txt parser. If you are writing a webcrawler of some sort, you can use J-Obey to take the hassle out of writing a robots.txt parser/interpreter.
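To show the kind of work such a parser saves you, here is a deliberately minimal robots.txt Disallow check; it is a hypothetical sketch of the problem, not J-Obey's API, and real parsers handle far more cases (agent grouping, Allow rules, wildcards).

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal robots.txt Disallow check for a single user-agent section
 *  (an illustration of the problem J-Obey solves, not its actual API). */
public class RobotsSketch {
    public static List<String> disallowedPaths(String robotsTxt, String userAgent) {
        List<String> rules = new ArrayList<>();
        boolean inSection = false;
        for (String line : robotsTxt.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.toLowerCase().startsWith("user-agent:")) {
                String agent = trimmed.substring(11).trim();
                inSection = agent.equals("*") || agent.equalsIgnoreCase(userAgent);
            } else if (inSection && trimmed.toLowerCase().startsWith("disallow:")) {
                String path = trimmed.substring(9).trim();
                if (!path.isEmpty()) rules.add(path);
            }
        }
        return rules;
    }

    public static boolean allowed(String path, List<String> disallowed) {
        return disallowed.stream().noneMatch(path::startsWith);
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        List<String> rules = disallowedPaths(robots, "MyCrawler");
        System.out.println(allowed("/private/data.html", rules)); // false
        System.out.println(allowed("/index.html", rules));        // true
    }
}
```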
A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. Its three major components (Crawler, Analyzer, and Indexer) can also be used separately.
JLinkCheck is an Ant task written in Java for checking links in websites. It does not just check a single page but crawls a whole site like a spider, generating a report in XML and (X)HTML. JReptator will be its successor, with many more features.
SmartCrawler is a Java-based, fully configurable, multi-threaded and extensible crawler that can fetch and analyze the contents of a web site using dynamically pluggable filters.
Sperowider Website Archiving Suite is a set of Java applications, the primary purpose of which is to spider dynamic websites, and to create static distributable archives with a full text search index usable by an associated Java applet.
Robust, featureful, multi-threaded CLI webspider using Apache Commons HttpClient v3.0, written in Java. ASpider downloads any files matching your given MIME types from a website. By default it also tries to match email addresses with regular expressions, logging all results using log4j.
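As a rough sketch of the email-matching step, the snippet below scans text with a simple regular expression; the pattern is illustrative and not ASpider's actual expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough sketch of the kind of email extraction a spider like this performs
 *  (the pattern is illustrative, not ASpider's actual expression). */
public class EmailScan {
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    public static void main(String[] args) {
        String page = "Contact us at info@example.com or sales@example.org.";
        Matcher m = EMAIL.matcher(page);
        while (m.find()) {
            System.out.println(m.group()); // a real spider would hand this to its log4j logger
        }
    }
}
```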