Search Results for "document library semantic search"

Showing 110 open source projects for "document library semantic search"

1

Open Semantic Search

Open source semantic search and text analytics for large document sets

Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources.

Downloads: 4 This Week

Last Update: 12 hours ago
2

SemTools

Semantic search and document parsing tools for the command line

SemTools is an open-source command-line toolkit designed for document parsing, semantic indexing, and semantic search workflows. The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead. ...

Downloads: 8 This Week

Last Update: 2026-03-13
3

Semantra

Multi-tool for semantic search

Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...

Downloads: 1 This Week

Last Update: 2026-03-11
4

RAG API

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.

Downloads: 3 This Week

Last Update: 2026-03-20
5

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 6 This Week

Last Update: 2025-07-01
6

Search-Index

A persistent, network resilient, full text search library

Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.

Downloads: 8 This Week

Last Update: 2025-03-12
7

Memvid

Video-based AI memory library. Store millions of text chunks in MP4

Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.

Downloads: 66 This Week

Last Update: 2026-03-13
8

paperless-gpt

Use LLMs and LLM Vision (OCR) to handle paperless-ngx

...The system uses OCR combined with LLM reasoning to extract text, classify documents, and generate metadata such as tags, titles, and categories automatically. It supports advanced workflows where documents can be analyzed contextually, enabling features like semantic search, summarization, and automated classification pipelines. The platform is particularly useful for individuals and organizations managing large volumes of paperwork, such as invoices, contracts, or records, as it reduces the need for manual data entry.

Downloads: 3 This Week

Last Update: 2026-03-19
9

Paperless-AI

AI-powered document analysis and tagging for Paperless-ngx

...A key capability is its use of retrieval-augmented generation, which enables semantic search and natural language interaction across an entire document archive. Users can ask contextual questions about their files and receive precise answers based on full document understanding rather than simple keyword matching. Paperless-AI also includes a web interface for manual review and tagging, allowing greater control when handling sensitive or complex documents.

Downloads: 4 This Week

Last Update: 2026-03-17
10

WeKnora

LLM framework for document understanding and semantic retrieval

WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware responses. ...

Downloads: 6 This Week

Last Update: 2 days ago
11

Kernel Memory

Research project. A Memory solution for users, teams, and applications

...The project focuses on enabling applications to store, index, and retrieve information so that AI systems can incorporate external knowledge when generating responses. It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. Kernel Memory can ingest documents in multiple formats, process them into embeddings, and store them in searchable indexes. Applications can then query these indexed data sources to retrieve relevant information and include it as context for AI responses.

Downloads: 0 This Week

Last Update: 2026-03-06
12

Haystack

Haystack is an open source NLP framework to interact with your data

Apply the latest NLP technology to your own data with the use of Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning, not just keywords! ...

Downloads: 16 This Week

Last Update: 2026-04-01
13

KnowNote

A local-first AI knowledge base & NotebookLM alternative

KnowNote is a local-first, open-source AI knowledge base and notebook application created as an Electron-based alternative to Google NotebookLM that emphasizes privacy, control, and simplicity. It lets users build an intelligent, searchable knowledge base from uploaded documents such as PDFs, Word files, PowerPoints, and web pages, and then interact with that content using LLM-powered chat, summarization, and reasoning tools. Unlike many NotebookLM alternatives that rely on Docker or cloud...

Downloads: 10 This Week

Last Update: 2026-01-30
14

Cherche

Neural Search

Cherche is fully compatible with the collaborative filtering library Implicit. This is useful when you have a history associated with users and want to retrieve or re-rank documents based on user preferences.

Downloads: 6 This Week

Last Update: 2024-06-01
15

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

FlagEmbedding is an open-source toolkit for building and deploying high-performance text embedding models used in information retrieval and retrieval-augmented generation systems. The project is part of the BAAI FlagOpen ecosystem and focuses on creating embedding models that transform text into dense vector representations suitable for semantic search and large language model pipelines. FlagEmbedding includes a family of models known as BGE (BAAI General Embedding), which are designed to...

Downloads: 1 This Week

Last Update: 2026-03-04
16

PageIndex

Document Index for Vectorless, Reasoning-based RAG

...This reasoning-driven retrieval aligns more naturally with how humans explore complex texts, improving relevance and traceability, especially in professional domains like financial reports, legal contracts, and technical manuals. The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.

Downloads: 0 This Week

Last Update: 6 days ago
17

PandaWiki

AI-powered open source platform for building intelligent wiki bases

PandaWiki is an open source knowledge base system designed to help users build intelligent documentation platforms powered by large language models. It combines traditional wiki functionality with modern AI capabilities, allowing teams and individuals to create and manage product documentation, technical manuals, FAQs, and blog-style knowledge resources. PandaWiki provides tools for managing knowledge bases through an administrative interface while also generating public-facing wiki sites...

Downloads: 9 This Week

Last Update: 6 days ago
18

SimpleMem

SimpleMem: Efficient Lifelong Memory for LLM Agents

SimpleMem is a lightweight memory-augmented model framework that helps developers build AI applications that retain long-term context and recall relevant information without overloading model context windows. It provides easy-to-use APIs for storing structured memory entries, querying those memories using semantic search, and retrieving context to augment prompt inputs for downstream processing. Unlike monolithic systems where memory management is ad-hoc, SimpleMem formalizes a memory lifecycle—write, index, retrieve, refine—so applications can handle user history, document collections, or dynamic contextual state systematically. It supports customizable embedding models, efficient vector indexes, and relevance weighting, making it practical for building assistants, personal agents, or domain-specific retrieval systems that need persistent knowledge.

Downloads: 1 This Week

Last Update: 2026-04-03
19

RAG from Scratch

Demystify RAG by building it from scratch

RAG From Scratch is an educational open-source project designed to teach developers how retrieval-augmented generation systems work by building them step by step. Instead of relying on complex frameworks or cloud services, the repository demonstrates the entire RAG pipeline using transparent and minimal implementations. The project walks through key concepts such as generating embeddings, building vector databases, retrieving relevant documents, and integrating the retrieved context into...

Downloads: 0 This Week

Last Update: 2026-03-11
20

ChatGPT Retrieval Plugin

The ChatGPT Retrieval Plugin lets you easily find personal documents

The chatgpt-retrieval-plugin repository implements a semantic retrieval backend that lets ChatGPT (or GPT-powered tools) access private or organizational documents in natural language by combining vector search, embedding models, and plugin infrastructure. It can serve as a custom GPT plugin or function-calling backend so that a chat session can “look up” relevant documents based on user queries, inject those results into context, and respond more knowledgeably about a private knowledge...

Downloads: 1 This Week

Last Update: 2025-10-02
21

SAG

SQL-Driven RAG Engine

SAG is an open-source SQL-driven retrieval-augmented generation engine that dynamically constructs knowledge graphs during query processing. Instead of relying on a static knowledge graph prepared in advance, the system automatically builds relational structures between entities while processing user queries. Documents are first decomposed into atomic semantic events, which are then represented using multidimensional natural language vectors. These vectors allow the system to identify...

Downloads: 0 This Week

Last Update: 2026-03-09
22

Node.js Client For NLP Cloud

NLP Cloud serves high performance pre-trained or custom models

This is the Node.js client (with TypeScript types) for the NLP Cloud API. NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keyword and keyphrase extraction, text generation, image generation, blog post generation, question answering, automatic speech...

Downloads: 8 This Week

Last Update: 2024-11-27
23

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

ModernBERT is an open-source research project that modernizes the classic BERT encoder architecture by incorporating recent advances in transformer design, training techniques, and efficiency improvements. The goal of the project is to bring BERT-style models up to date with the capabilities of modern large language models while preserving the strengths of bidirectional encoder architectures used for tasks such as classification, retrieval, and semantic search. ModernBERT introduces...

Downloads: 1 This Week

Last Update: 2026-03-06
24

Elastiknn

Elasticsearch plugin for nearest neighbor search

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity searches using exact and approximate algorithms. Methods like word2vec and convolutional neural nets can convert many data modalities (text, images, users, items, etc.) into numerical vectors, such that pairwise distance computations on the vectors correspond to semantic similarity of the original data. Elasticsearch is a ubiquitous search solution, but its support for vectors is limited. This plugin fills the...

Downloads: 7 This Week

Last Update: 1 day ago
25

marqo

Tensor search for humans

A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...

Downloads: 5 This Week

Last Update: 2026-04-02