Search Results for "document"

Showing 21 open source projects for "document" in the Semantic Search Tools category

1. SemTools

Semantic search and document parsing tools for the command line

SemTools is an open-source command-line toolkit designed for document parsing, semantic indexing, and semantic search workflows. The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead.

Downloads: 12 This Week

Last Update: 2026-03-13

2. Open Semantic Search

Open source semantic search and text analytics for large document sets

...It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.

Downloads: 6 This Week

Last Update: 6 days ago

3. Semantra

Multi-tool for semantic search

...The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. The system runs from the command line and automatically launches a local web interface where users can perform interactive searches and examine document passages related to a query. By relying on semantic embeddings and contextual analysis, the tool can identify passages that are relevant even when the query uses different wording than the source documents.

Downloads: 0 This Week

Last Update: 2026-03-11

4. RAG API

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

...It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.

Downloads: 2 This Week

Last Update: 2026-03-20

5. PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 7 This Week

Last Update: 2025-07-01

6. Kernel Memory

Research project. A Memory solution for users, teams, and applications

...The project focuses on enabling applications to store, index, and retrieve information so that AI systems can incorporate external knowledge when generating responses. It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. Kernel Memory can ingest documents in multiple formats, process them into embeddings, and store them in searchable indexes. Applications can then query these indexed data sources to retrieve relevant information and include it as context for AI responses.

Downloads: 0 This Week

Last Update: 2026-03-06

7. KnowNote

A local-first AI knowledge base & NotebookLM alternative

KnowNote is a local-first, open-source AI knowledge base and notebook application created as an Electron-based alternative to Google NotebookLM that emphasizes privacy, control, and simplicity. It lets users build an intelligent, searchable knowledge base from uploaded documents such as PDFs, Word files, PowerPoints, and web pages, and then interact with that content using LLM-powered chat, summarization, and reasoning tools. Unlike many NotebookLM alternatives that rely on Docker or cloud...

Downloads: 6 This Week

Last Update: 2026-01-30

8. PandaWiki

AI-powered open source platform for building intelligent wiki bases

PandaWiki is an open source knowledge base system designed to help users build intelligent documentation platforms powered by large language models. It combines traditional wiki functionality with modern AI capabilities, allowing teams and individuals to create and manage product documentation, technical manuals, FAQs, and blog-style knowledge resources. PandaWiki provides tools for managing knowledge bases through an administrative interface while also generating public-facing wiki sites...

Downloads: 4 This Week

Last Update: 2026-04-08

9. Haystack

Haystack is an open source NLP framework to interact with your data

Apply the latest NLP technology to your own data with Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning, not just keywords. Use and compare the latest pre-trained transformer-based language models such as OpenAI’s GPT-3, BERT, RoBERTa, DPR, and more. ...

Downloads: 11 This Week

Last Update: 2026-04-01

10. FlagEmbedding

Retrieval and Retrieval-augmented LLMs

FlagEmbedding is an open-source toolkit for building and deploying high-performance text embedding models used in information retrieval and retrieval-augmented generation systems. The project is part of the BAAI FlagOpen ecosystem and focuses on creating embedding models that transform text into dense vector representations suitable for semantic search and large language model pipelines. FlagEmbedding includes a family of models known as BGE (BAAI General Embedding), which are designed to...

Downloads: 2 This Week

Last Update: 2026-03-04

11. ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

ModernBERT is an open-source research project that modernizes the classic BERT encoder architecture by incorporating recent advances in transformer design, training techniques, and efficiency improvements. The goal of the project is to bring BERT-style models up to date with the capabilities of modern large language models while preserving the strengths of bidirectional encoder architectures used for tasks such as classification, retrieval, and semantic search. ModernBERT introduces...

Downloads: 1 This Week

Last Update: 2026-03-06

12. SimpleMem

SimpleMem: Efficient Lifelong Memory for LLM Agents

...Unlike monolithic systems where memory management is ad-hoc, SimpleMem formalizes a memory lifecycle—write, index, retrieve, refine—so applications can handle user history, document collections, or dynamic contextual state systematically. It supports customizable embedding models, efficient vector indexes, and relevance weighting, making it practical for building assistants, personal agents, or domain-specific retrieval systems that need persistent knowledge.

Downloads: 0 This Week

Last Update: 2026-04-03

13. RAG from Scratch

Demystify RAG by building it from scratch

RAG From Scratch is an educational open-source project designed to teach developers how retrieval-augmented generation systems work by building them step by step. Instead of relying on complex frameworks or cloud services, the repository demonstrates the entire RAG pipeline using transparent and minimal implementations. The project walks through key concepts such as generating embeddings, building vector databases, retrieving relevant documents, and integrating the retrieved context into...

Downloads: 0 This Week

Last Update: 2026-03-11

14. ChatGPT Retrieval Plugin

The ChatGPT Retrieval Plugin lets you easily find personal documents

The chatgpt-retrieval-plugin repository implements a semantic retrieval backend that lets ChatGPT (or GPT-powered tools) access private or organizational documents in natural language by combining vector search, embedding models, and plugin infrastructure. It can serve as a custom GPT plugin or function-calling backend so that a chat session can “look up” relevant documents based on user queries, inject those results into context, and respond more knowledgeably about a private knowledge...

Downloads: 0 This Week

Last Update: 2025-10-02

15. Language Models

Explore large language models in 512MB of RAM

...It is particularly useful for educational purposes, as it demonstrates the fundamental mechanics of language model inference and prompt-based applications. The repository includes multiple example applications such as chatbots, document question answering systems, and information retrieval tools.

Downloads: 0 This Week

Last Update: 2026-03-15

16. Paul Graham GPT

RAG on Paul Graham's essays

Paul Graham GPT is a specialized AI-powered search and chat app built on a corpus of essays from Paul Graham, giving users the ability to query and discuss his writings in a conversational way. The repo stores the full text of his essays (chunked), uses embeddings (e.g. via OpenAI embeddings) to allow semantic search over that corpus, and hosts a chat interface that combines retrieval results with LLM-based answering — enabling RAG (retrieval-augmented generation) over a fixed dataset. The...

Downloads: 0 This Week

Last Update: 2025-12-08

17. Hugging Face Transformer

CPU/GPU inference server for Hugging Face transformer models

Optimize and deploy Hugging Face Transformer models in production with a single command line. At Lefebvre Dalloz we run semantic search engines in production in the legal domain; in non-marketing language, it's a re-ranker, and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built over PyTorch and FastAPI....

Downloads: 1 This Week

Last Update: 2022-08-22

18. Vector AI

A platform for building vector based applications

Vector AI is a framework designed to make building production-grade vector-based applications as quick and easy as possible. Create, store, manipulate, search and analyze vectors alongside JSON documents to power applications such as neural search, semantic search, and personalized recommendations. Any data can be turned into vectors through machine learning (Image2Vec, Audio2Vec, etc.). Store your vectors alongside documents without having to do a db lookup for metadata...

Downloads: 1 This Week

Last Update: 2023-04-10

19. DOSE

DOSE: a distributed platform for semantic elaboration that provides semantic services such as automatic annotation of web resources at the document substructure level, semantic search facilities, semantic annotation storage and retrieval.

Downloads: 0 This Week

Last Update: 2013-06-04

20. askaitools-community-edition

A cutting-edge search engine project tailored specifically for AI apps

Our mission is to revolutionize the way users discover AI products by providing the most accurate, comprehensive, lightning-fast, and intelligent search experience. Developers can effortlessly integrate their own data on top of this framework, enabling them to swiftly build specialized vertical search engines or internal document search systems for their organizations. Under the hood, AskAITools employs a hybrid search engine architecture, seamlessly combining keyword search (full-text search) and semantic search (vector search/embedding search) capabilities. By leveraging statistical data and weighted fusion techniques, it achieves a balance between relevance and popularity. ...

Downloads: 0 This Week

Last Update: 2024-07-18

21. bge-large-en-v1.5

BGE-Large v1.5: High-accuracy English embedding model for retrieval

...This model is part of the BGE (BAAI General Embedding) family and delivers improved similarity distribution and state-of-the-art results on the MTEB benchmark. It is recommended for use in document retrieval tasks, semantic search, and passage reranking, particularly when paired with a reranker like BGE-Reranker. The model supports inference through multiple frameworks, including FlagEmbedding, Sentence-Transformers, LangChain, and Hugging Face Transformers. It accepts English text as input and returns normalized 1024-dimensional embeddings suitable for cosine similarity comparisons.

Downloads: 0 This Week

Last Update: 2025-07-02
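The bge-large-en-v1.5 description above mentions normalized embeddings suited to cosine similarity comparisons. As a rough illustration of why normalization is convenient, the sketch below shows that for unit-length vectors cosine similarity reduces to a plain dot product, which is what many vector indexes compute. It uses toy 4-dimensional vectors rather than real 1024-dimensional model output, and the helper names (`l2_normalize`, `cosine_similarity`, `dot`, `rank_by_dot`) are hypothetical, not part of FlagEmbedding or any other library listed here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper for illustration)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Full cosine similarity; works for vectors of any non-zero length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank_by_dot(query, passages):
    """Return passage names ordered from most to least similar to the query.

    Assumes every vector is already unit length, so the dot product
    equals the cosine similarity.
    """
    scored = [(dot(query, vec), name) for name, vec in passages.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy stand-ins for embeddings; real model output would be much longer.
query = l2_normalize([0.2, 0.1, 0.9, 0.3])
passages = {
    "passage_a": l2_normalize([0.1, 0.0, 0.8, 0.4]),
    "passage_b": l2_normalize([0.9, 0.3, 0.1, 0.0]),
}

ranking = rank_by_dot(query, passages)
```

With these toy vectors, `passage_a` points in nearly the same direction as the query and ranks first. Skipping the two square roots per comparison is the practical benefit when scoring many passages per query.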

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise