Elasticsearch is a distributed, RESTful search and analytics engine that lets you store, search, and analyze data at scale. It lets you perform and combine many types of searches, scales horizontally, and returns results quickly, ranked by a variety of factors.
Elasticsearch can be used for a wide variety of use cases, from maps and metrics to site search and workplace search, and with all data types.
Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
Digital Learning Sciences (DLS) is a mission-centered, not-for-profit organization dedicated to improving learning through the use of digital content and tools.
This package contains tools that add NLP capabilities to Lucene 4.x (it has been tested with Lucene versions 4.6.x to 4.8.1). Although it was originally developed for German, it is mostly language-independent.
It allows the user to lemmatize words to be indexed, to weight terms by their parts of speech (e.g. weighting nouns more heavily than pronouns), and to add synonyms taken from GermaNet or from a list you provide to the search index, thereby increasing the recall of Lucene.
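The three steps described above (lemmatize, weight by part of speech, expand with synonyms) can be sketched roughly as follows. This is only an illustration in plain Python, not the package's code and not the Lucene API: the lookup tables `LEMMAS`, `POS_TAGS`, `POS_WEIGHTS` and `SYNONYMS`, the weight values, and the function `index_terms` are all hypothetical stand-ins for a real lemmatizer, tagger and GermaNet.

```python
from collections import Counter

# Hypothetical lookup tables. A real setup would use a lemmatizer,
# a part-of-speech tagger and GermaNet (or a user-supplied synonym list).
LEMMAS = {"häuser": "haus", "hauses": "haus"}
POS_TAGS = {"haus": "NOUN", "gebäude": "NOUN", "es": "PRON", "steht": "VERB"}
POS_WEIGHTS = {"NOUN": 2.0, "VERB": 1.0, "PRON": 0.25}  # assumed values
SYNONYMS = {"haus": ["gebäude"]}


def index_terms(tokens):
    """Return a mapping of index term -> accumulated weight for one document."""
    weights = Counter()
    for token in tokens:
        # 1. Lemmatize: fall back to the lowercased token if no lemma is known.
        lemma = LEMMAS.get(token.lower(), token.lower())
        # 2. Weight by part of speech: unknown tags get a neutral weight of 1.0.
        weight = POS_WEIGHTS.get(POS_TAGS.get(lemma), 1.0)
        weights[lemma] += weight
        # 3. Add synonyms with the same weight so they match at query time.
        for synonym in SYNONYMS.get(lemma, []):
            weights[synonym] += weight
    return dict(weights)
```

Indexing the synonym alongside the lemma is what raises recall: a query for "Gebäude" would then also match a document that only mentions "Häuser".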
Full SEO (Search Engine Optimization) app for Android.
It packs all the tools needed for SEO, including a URL submitter, Search Engine Ping tool, Site Stability Analyzer, All Major Webmaster Toolkits (Google, Bing, Yahoo, Yandex, Baidu), Keyword Density Checker, Rank Checkers, Backlink Builders, Sitemap Submitter, Social Media Metrics & Analysis tools, Mass traffic generator, and Blogging tools & tips.
CoinSparc delivers the power of a fully fledged PC application to your smartphone.
This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
A collection of Java Servlets relating to searching. Use of these servlets should make future transitions between search appliances less painful as well as simplify the query parameters.
Web-as-corpus tools in Java.
* Simple Crawler (and also integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate code
* Language recognition
* Corpus builder
This is an ***old archive*** of tools developed for facilitating the use of Creative Commons licenses and metadata. --- For the most up to date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
The Cornell Web Lab Collaboration Server is a suite of tools and services for GUI-based extraction, analysis and sharing of archived web data. See http://weblab.infosci.cornell.edu/ and http://www.cs.cornell.edu/~weigel for details about the project.
(Almost) everything a scholar in the Humanities needs (polytonic Greek fonts, stylistic and metrical analysis tools, search engines on TLG and PHI) on a single Linux Live CD, ready to use anywhere, at home or at university, without installation.
JLinkCheck is an Ant Task written in Java for checking links in websites. It does not just check a single page, but crawls a whole site like a spider and generates a report in XML and (X)HTML. JReptator will be its successor, with many more features.
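The general idea of a site-wide link check can be sketched as a breadth-first crawl that records every link it cannot retrieve. The sketch below is a Python illustration under stated assumptions, not JLinkCheck's implementation: `check_links` and the `fetch_links` callback are hypothetical names, and the "site" is an in-memory dictionary rather than real HTTP requests.

```python
from collections import deque


def check_links(start, fetch_links):
    """Crawl breadth-first from `start` and collect broken links.

    `fetch_links(url)` is a hypothetical callback that returns the list of
    links found on a page, or None if the page cannot be retrieved.
    Returns (set of checked URLs, list of (source, broken_target) pairs).
    Each broken URL is reported once, with the first page that linked to it.
    """
    checked = set()
    broken = []
    queue = deque([(None, start)])
    while queue:
        source, url = queue.popleft()
        if url in checked:
            continue  # already handled via another page
        checked.add(url)
        links = fetch_links(url)
        if links is None:
            broken.append((source, url))
            continue
        for link in links:
            if link not in checked:
                queue.append((url, link))
    return checked, broken


# Usage with a toy in-memory site (an assumption made for illustration).
SITE = {"/": ["/a", "/b"], "/a": ["/", "/missing"], "/b": []}
checked, broken = check_links("/", SITE.get)
```

A real checker would replace `SITE.get` with a function that downloads the page and parses its anchors, and would write `broken` out as an XML or (X)HTML report.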
Dr. Michael Kay: "Saxon 8.7 is the first release to be released simultaneously by Saxonica on the Java and .NET platforms." MDP: Mission accomplished! Saxon for the .NET platform from Saxonica is now available and supported via the http://saxon.sf.net
i-Tor is a set of Tools and Technologies for Open Repositories, based on Linux, Java, MySQL, Mirage and other free components. It harvests OAI and turns databases into Open Archives. It includes similarity, backlinks and related search based on Lucene.
TM4J is a topic map engine implemented entirely in Java. Topic maps are a standard paradigm for the interchange of knowledge structures. This project aims to produce a complete suite of tools for creating, processing and publishing topic map information.
Group-CCS development: components, templates, tools, accessories, tutorials, modules, translations, documentation, code, scripts, and anything else that can improve the work of those who use the development tool CCS (CodeCharge Studio).
jCV is a powerful multilingual Web application designed for creating, searching and printing resumes.
jCV is 100% developed in Java using "best-of-breed" Open Source J2EE frameworks (SOFIA) and reporting tools (JasperReports, iReport).
The Medlane project is an attempt to create a set of tools that will enable librarians to move from the standard MARC (MAchine Readable Cataloging) format to a new library/museum XML format. This move will ensure traditional library/museum data remains
This project is a Dmoz RDF parser and utilities to allow you to manipulate, display, and navigate the Dmoz RDF data on your web site. It will make use of software at jakarta.apache.org and xml.apache.org to display the data and will attempt to tightly int
Voambolana (pronounced VOO-BOO-LUH-NUH) is an on-line dictionary that converts foreign languages to a native language. Voambolana uses a SAX parser and an XSLT transformer. The tools used include Ant, Xerces, Xalan (XNI) and Apache from the Apache Group.
The BeeGram library is a portable open source search engine toolkit written in C. BeeGram provides a number of building blocks for the construction of powerful general-purpose text-based search tools.