Deploy in 115+ regions with the modern database for every enterprise.
MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
Start Free
Ship Agents Faster
Transform your applications and workflows into powerful agentic systems at global scale.
Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
A general purpose source code indexer and cross-referencer that provides web-based browsing of source code with links to the definition and usage of any identifier. Supports multiple languages. Up-to-date information in http://lxr.sourceforge.net
**CAUTION!** Releases are now issued on Codeberg due to legal reasons. See https://codeberg.org/ajlittoz/CB_LXRsource/releases
panFMP is a generic framework suitable for harvested XML metadata that is searchable through Apache Lucene without any additional RDBMS. Fields can be defined by XPath allowing for full text queries on all types of fields including numerical ranges.
The code was moved to Github: https://github.com/pangaea-data-publisher/panfmp
cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
Framework for search and display of heterogenous document collections.
NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates.
Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.
Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
Project moved to GitHub!
https://github.com/carrot2/carrot2
Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories. Carrot2 integrates very well with both Open Source and proprietary search engines.
Simple application for downloading pictures from Zerochan.net
Simplejava application for downloading high-quality pictures from Zerochan.net.
You can find images by size or a tag. It's simple. And flat.
All you need to do: download .jar file and run it with Oracle JVM
(or any another JVM supporting image decoding)
Auto Rescanning - Search Terms - Regularly Updated With New Features
==========
NOTE: (AS OF 11/05/2015)
4chan html structure has changed, full images are downloaded as well as the thumbnail. Fix coming shortly (after my exams are over) to stop the thumbnails from downloading.
==========
This is the first release of my 4chan image downloader. This downloader packs loads of great features such as the search ability. Check the features section and be sure to let me know if you want a feature added.
Coming Soon:
- Wiki, explaining in depth how to...
SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform that uses reasoning to semantically integrate disparate data and services on the web. Running live at http://sswap.info.
Framework (scripts, configuration, code) to build free and public services around travel and leisure data. That project makes an extensive use of already existing data sources such as Geonames and dbPedia, and adds some glue around those (eg, links).
Compliant and Reliable File Transfers Backed by Top Security Certifications
Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.
Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
Other spiders has a limited link depth, follows links not randomized or are combined with heavy indexing machines. This spider will has not link depth limits, randomize next url, that will be checked for new urls.
Java/Swish-e bridge. This application is built arround a simple API and a Web container to provide access to the search facility (via web-services) and management/indexing (wep app).
The WhereIsNow Web Service Client Library project is a java library used to query the WhereIsNow webservices. You can freely embed it in your code to easily develop new clients and integrate the WhereIsNow features in your own applications.
Retriever is a simple crawler packed as a Java library that allows developers to collect and manipulate documents reachable by a variety of protocols (e.g. http, smb). You'll easily crawl documents shared in a LAN, on the Web, and many other sources.
Command line application written in Java useful for automation of downloading process and filtering contents of downloaded files. jDownloader uses simple script file to configure downloading and filtering processes.
JxtASK is a P2P system that is aimed to search, download and share academic content hosted on websites that will join the JxtASK community. Joining is simple: siteadmins must generate(even automatically)a XML catalog which describes the files.
Lude is an XML-RPC Lucene Daemon written in Java. Clients in any environment can create indexes, add/update/delete documents, and query the index through a simple XML-RPC API.
Panda Publisher is an easy way to run a website. Panda Publisher is different from other CMS's as it's main goals are to be light-weight, fast, standards compliant, meta-data rich and simple to use.
Create Content, not Code.
Group-CCS development Components, templates, tools, accessories, tutorial, modules, translations, documentation, codes, scripts, everything that can improve the work of who uses the powerful tool of development, CCS - CodeCharge Studio.
The goal of this project is to develop a fast, simple, robust and fully JCR (JSR-170) compliant Content Repository on top of a number of RDBMS.
A dual-licensed CMS, Mosaďka-CMS, will be developped on top of this repository by Logyka Technologies.
Develop a java API (JAR library, with an example web GUI) for content management. Simple but powerful, based on Apache Lucene project, it would be embeded on projects requiring content management.
This code supplies miniature pedagogical Java implementations of information retrieval, spidering, and text-processing software. It was initially developed for an introductory course on Intelligent Information Retrieval and Web Search in UT Austin.
Bookmark.inc is a PHP class that provides the essential functions to manage a bookmark. It provides an (multilingual) administrator interface, and allows the personalization of the bookmark visualization through a simple template system.