Deploy in 115+ regions with the modern database for every enterprise.
MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
Start Free
$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.
Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
Google() meets the Matrix. Red Piranha combines Lucene (Searching Ability), XML-RDF (ability to learn), Tomcat (for P2P Power) and Spring (Ease of use) to not only let you find anything, anywhere, but to actually understand what you are looking for.
WebNews Crawler is a specific web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site specific extraction to extract the actual news content only, filtering out the advertising and other cruft.
jGetFile is a command-line scriptable recursive file downloader for the web. Where other downloaders fail, jGetFile succeeds in downloading the files you want with simplicity and ease of use.
list2db reads digested email files generated by the mailman mailing list software and converts them into SQL for a relational database. The project also includes a PHP frontend for users to search and browse archived list emails.
Cross-platform searchable CD-ROM. Vicaya is a search engine and indexing tool for use on a local file system or CDROM, written in Java and based on Apache Nutch, and Tomcat. The goal is to replicate a website on a CD-ROM to be used on any platform.
Java program to extract postings and comments from http://www.livejournal.com (blog) into DB and view/classify/process it. LJ loader. Components to reuse: perl-like, but efficient Web pages scraper, trees analyzer, concurrent scheduler.
Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.
Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
SmartCrawler is a java-based fully configurable, multi-threaded and extensible crawler, which is able to fetch and analyze the contents of a web site by using dinamically pluggable filters
Sperowider Website Archiving Suite is a set of Java applications, the primary purpose of which is to spider dynamic websites, and to create static distributable archives with a full text search index usable by an associated Java applet.
myDbSearcher is a search engine for MySQL Databases. It is written in Java. It scans several tables on different databases. A XMLRPC-Server will give you access to the Index.
Currently it runs on http://www.idowa.de/ueberblick/suche/index_html
Robust featureful multi-threaded CLI web spider using apache commons httpclient v3.0 written in java. ASpider downloads any files matching your given mime-types from a website. Tries to reg.exp. match emails by default, logging all results using log4j.
Dr. Micheal Kay: "Saxon 8.7 is the first release to be released simultaneously by Saxonica on the Java and .NET platforms." MDP: Mission accomplished! Saxon for the .NET platform from Saxonica is now available and supported via the http://saxon.sf.net
Moonglow is a command-line application which queries data from a variety of sources and formats the results using user-defined Velocity templates. The data sources are generally XML web services, such as Google or Amazon.
Xapian is a Search Engine Library, written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C# and Ruby. Xapian allows you to easily add advanced indexing and search facilities to your applications. See www.xapian.org for more information.
JoBo is a web site mirroring tool. It has a graphical UI but there is a also command line version. Supports robot exclusion protocol (but this can be disabled)