A general-purpose source code indexer and cross-referencer that provides web-based browsing of source code with links to the definition and usage of any identifier. Supports multiple languages. Up-to-date information at http://lxr.sourceforge.net
**CAUTION!** Releases are now issued on Codeberg due to legal reasons. See https://codeberg.org/ajlittoz/CB_LXRsource/releases
panFMP is a generic framework that makes harvested XML metadata searchable through Apache Lucene without any additional RDBMS. Fields can be defined by XPath, allowing full-text queries on all types of fields, including numerical ranges.
The code was moved to GitHub: https://github.com/pangaea-data-publisher/panfmp
cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techniques. A command-line executable is shipped that sorts documents by codepage.
Framework for search and display of heterogeneous document collections.
NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates.
Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
Project moved to GitHub!
https://github.com/carrot2/carrot2
Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories. Carrot2 integrates very well with both Open Source and proprietary search engines.
Simple application for downloading pictures from Zerochan.net
Simple Java application for downloading high-quality pictures from Zerochan.net.
You can find images by size or a tag. It's simple. And flat.
All you need to do: download the .jar file and run it with the Oracle JVM
(or any other JVM supporting image decoding)
Auto Rescanning - Search Terms - Regularly Updated With New Features
==========
NOTE: (AS OF 11/05/2015)
4chan's HTML structure has changed, so thumbnails are now downloaded along with the full images. A fix to stop the thumbnails from downloading is coming shortly (after my exams are over).
==========
This is the first release of my 4chan image downloader. This downloader packs loads of great features such as the search ability. Check the features section and be sure to let me know if you want a feature added.
Coming Soon:
- Wiki, explaining in depth how to...
SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform that uses reasoning to semantically integrate disparate data and services on the web. Running live at http://sswap.info.
Framework (scripts, configuration, code) to build free and public services around travel and leisure data. The project makes extensive use of existing data sources such as Geonames and dbPedia, and adds some glue around them (e.g., links).
Other spiders have a limited link depth, do not follow links in random order, or are tied to heavy indexing machinery. This spider has no link depth limit and picks the next URL to check for new URLs at random.
Java/Swish-e bridge. This application is built around a simple API and a web container to provide access to the search facility (via web services) and to management/indexing (web app).
The WhereIsNow Web Service Client Library project is a Java library used to query the WhereIsNow web services. You can freely embed it in your code to easily develop new clients and integrate the WhereIsNow features in your own applications.
Retriever is a simple crawler packed as a Java library that allows developers to collect and manipulate documents reachable by a variety of protocols (e.g. http, smb). You'll easily crawl documents shared in a LAN, on the Web, and many other sources.
Command-line application written in Java for automating downloads and filtering the contents of downloaded files. jDownloader uses a simple script file to configure the downloading and filtering processes.
JxtASK is a P2P system aimed at searching, downloading, and sharing academic content hosted on websites that join the JxtASK community. Joining is simple: site admins must generate (even automatically) an XML catalog that describes the files.
Lude is an XML-RPC Lucene Daemon written in Java. Clients in any environment can create indexes, add/update/delete documents, and query the index through a simple XML-RPC API.
Panda Publisher is an easy way to run a website. Panda Publisher is different from other CMSs, as its main goals are to be light-weight, fast, standards compliant, meta-data rich and simple to use.
Create Content, not Code.
Group-CCS develops components, templates, tools, accessories, tutorials, modules, translations, documentation, code, and scripts: everything that can improve the work of those who use the powerful development tool CCS (CodeCharge Studio).
The goal of this project is to develop a fast, simple, robust and fully JCR (JSR-170) compliant Content Repository on top of a number of RDBMS.
A dual-licensed CMS, Mosaïka-CMS, will be developed on top of this repository by Logyka Technologies.
The aim is to develop a Java API (JAR library, with an example web GUI) for content management. Simple but powerful and based on the Apache Lucene project, it is meant to be embedded in projects requiring content management.
This code supplies miniature pedagogical Java implementations of information retrieval, spidering, and text-processing software. It was initially developed for an introductory course on Intelligent Information Retrieval and Web Search at UT Austin.
IGLU is a Java class library designed to facilitate sharing of code among Artificial Intelligence/Information Retrieval researchers to illustrate how various problems can be solved in Java. It is developed and maintained by the IGLU Research Group.