text based free download

Showing 29 open source projects for "text based"

Filters: Search Engines, Java

1

Elasticsearch

A Distributed RESTful Search Engine

Elasticsearch is a distributed, RESTful search and analytics engine that lets you store, search and analyze with ease at scale. It lets you perform and combine many types of searches; it scales seamlessly, and offers answers incredibly fast with search results you can rank based on a variety of factors. Elasticsearch can be used for a wide variety of use cases, from maps and metrics to site search and workplace search, and with all data types.

Downloads: 10 This Week

Last Update: 2026-05-12
2

OpenSearch

Open source distributed and RESTful search engine

OpenSearch is a distributed search and analytics engine based on Apache Lucene. After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indices, boost fields, rank results by score, sort results by field, and aggregate results. Unsurprisingly, people often use search engines like OpenSearch as the backend for a search application such as Wikipedia or an online store.

Downloads: 1 This Week

Last Update: 2026-04-04
3

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 3 This Week

Last Update: 2025-10-27
4

ftserver-android

Self-hosted search engine with a web service for sharing discoveries

Full-text search engine for Android mobile, Windows desktop, and Linux server. You can use keywords to find related websites, dig into important information, and search for answers. It has a built-in web server that you can use to share discoveries with people. The app's source code is included and can be freely distributed over the internet in unchanged or changed form. Check the file size after downloading the Android APK: https://sourceforge.net/projects/ftserver-android/files/ The Code Repository...

Downloads: 0 This Week

Last Update: 2023-07-07
5

panFMP

panFMP is a generic framework suitable for harvested XML metadata that is searchable through Apache Lucene without any additional RDBMS. Fields can be defined by XPath, allowing full-text queries on all types of fields, including numerical ranges. The code was moved to GitHub: https://github.com/pangaea-data-publisher/panfmp

Downloads: 0 This Week

Last Update: 2019-05-01
6

OpenSearchServer Search Engine

An open source search engine with RESTful API and crawlers

OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...

31 Reviews

Downloads: 17 This Week

Last Update: 2018-08-26
7

cpDetector

cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techniques. A command-line executable is shipped that allows you to sort documents by codepage.

Downloads: 5 This Week

Last Update: 2018-04-05
8

eXtensible Text Framework (XTF)

Framework for search and display of heterogeneous document collections.

NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.

Downloads: 0 This Week

Last Update: 2019-07-29
9

IDRA InDexing & Retrieving Automatically

IDRA (InDexing and Retrieving Automatically) is a tool for indexing a wide range of text files (TXT, DOC, PDF) and image annotation files (XML), running query-based searches, visualizing an index, saving it for reuse, evaluation, etc.

Downloads: 0 This Week

Last Update: 2014-05-14
10

Infofuze

Data migration/conversion library based on STX and XSLT transformation

Infofuze is a Java library and server application that can be used to transform and combine data from various sources into a specific XML or other text output format that can be stored or indexed.

Downloads: 0 This Week

Last Update: 2014-03-05
11

ONDEX Suite

Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.

Downloads: 0 This Week

Last Update: 2019-05-15
12

DocInfoRetriever

DocInfoRetriever is a web-based document full-text search engine based on Lucene. It allows you to search the contents and metadata of documents. Supported document formats include doc, xls, pdf, odt, jpg, etc., and torrent files.

Downloads: 0 This Week

Last Update: 2013-04-02
13

Kneobase

Kneobase is an enterprise search engine based on the Lucene search engine and the Spring framework. It allows you to perform full-text search across many different content sources. It is highly adaptable out of the box and has a pluggable architecture.

Downloads: 0 This Week

Last Update: 2016-02-02
14

GHIRL

GHIRL is the Graph-based Heterogeneous Information Representation Language: a Java library for representing, querying, and navigating graph- or network-based data structures.

Downloads: 0 This Week

Last Update: 2013-04-03
15

Egothor

Egothor is a high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.

Downloads: 0 This Week

Last Update: 2024-03-08
16

NGramJ

Provides a robust and efficient implementation of n-gram based classifiers for Java. N-gram algorithms have been shown to be surprisingly good at tasks like guessing the language or encoding of an arbitrary text file, and there are many more applications.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-17
17

iVia

iVia is an Internet subject portal or virtual library system. It is a hybrid expert- and machine-built collection creation and management system in which resources can be crawled, and metadata and selected full text can be automatically generated or extracted.

Downloads: 0 This Week

Last Update: 2013-04-08
18

Hyper Estraier

Hyper Estraier is a full-text search system. It works like Google, but is based on a peer-to-peer architecture. Using Hyper Estraier, you can build a large-scale search engine with cheap computers.

Downloads: 5 This Week

Last Update: 2013-04-25
19

Sentensa

SENTENSA Knowledge Miner is a platform-independent tool for searching any text. SENTENSA uses robust methods of indexing and searching text, drawing on more than 20 years of experience in information retrieval.

Downloads: 0 This Week

Last Update: 2013-04-05
20

Phorminx

(Almost) everything a scholar in the Humanities needs (polytonic Greek fonts, stylistic and metrical analysis tools, search engines on TLG and PHI) on a single Linux Live CD, ready to use anywhere, at home or at university, without installation.

Downloads: 0 This Week

Last Update: 2013-04-05
21

Roosster.org

Roosster.org is a personal "on-demand" search engine. This means it indexes only the items, entries, files, and URLs you explicitly tell it to index and provides full-text search over the indexed items. Go to http://roosster.org/dev for all details.

1 Review

Downloads: 0 This Week

Last Update: 2013-03-07
22

Sperowider

Sperowider Website Archiving Suite is a set of Java applications, the primary purpose of which is to spider dynamic websites, and to create static distributable archives with a full text search index usable by an associated Java applet.

Downloads: 0 This Week

Last Update: 2013-04-15
23

Jorne

The Jorne project develops software and open standards for linking Lojban text with WWW and Semantic Web metadata (e.g. RDF/N3, RSS, XML). Lojban is an artificial spoken and written language based on predicate logic.

Downloads: 0 This Week

Last Update: 2013-03-13
24

IP.Drilldown

The application will be able to provide further information about the location of a host by analyzing the sender's IP address. It works like other localizer software and provides different types of visualisation (map, text).

Downloads: 0 This Week

Last Update: 2013-03-07
25

webExtractor

webExtractor is a Java application for extracting specific content from web-based HTML, XML, CSV, and free-form text. The extracted data can be used for data gathering and mining purposes.

Downloads: 3 This Week

Last Update: 2014-06-26