Showing 24 open source projects for "documents"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    Open Semantic Search

    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    Semantra

    Semantra

    Multi-tool for semantic search

    ...By relying on semantic embeddings and contextual analysis, the tool can identify passages that are relevant even when the query uses different wording than the source documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Haystack

    Haystack

    Haystack is an open source NLP framework to interact with your data

    ...Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning, not just keywords! Make use of and compare the latest pre-trained transformer-based languages models like OpenAI’s GPT-3, BERT, RoBERTa, DPR, and more. Pick any Transformer model from Hugging Face's Model Hub, experiment, find the one that works. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    txtai

    txtai

    Build AI-powered semantic search applications

    ...Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid pace, models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions to generate structured databases. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    ChatGPT Retrieval Plugin

    ChatGPT Retrieval Plugin

    The ChatGPT Retrieval Plugin lets you easily find personal documents

    The chatgpt-retrieval-plugin repository implements a semantic retrieval backend that lets ChatGPT (or GPT-powered tools) access private or organizational documents in natural language by combining vector search, embedding models, and plugin infrastructure. It can serve as a custom GPT plugin or function-calling backend so that a chat session can “look up” relevant documents based on user queries, inject those results into context, and respond more knowledgeably about a private knowledge base. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    kg-gen

    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    SemTools

    SemTools

    Semantic search and document parsing tools for the command line

    ...The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead. SemTools can parse documents, build semantic embeddings, and perform similarity searches across datasets, making it useful for research, knowledge management, and AI-assisted coding workflows. The toolkit is designed to work well with modern AI pipelines, particularly those involving large language models that require structured knowledge retrieval.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    LEANN

    LEANN

    Local RAG engine for private multimodal knowledge search on devices

    ...By recomputing embeddings during queries and using compact graph-based indexing structures, LEANN can maintain high search accuracy while minimizing disk usage. It aims to act as a unified personal knowledge layer that connects different types of data such as documents, code, images, and other local files into a searchable context for language models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Controllable-RAG-Agent

    Controllable-RAG-Agent

    This repository provides an advanced RAG

    ...A key focus is hallucination control: each answer is verified against retrieved context, and responses are reworked when they are not sufficiently grounded in the source documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 10
    UForm

    UForm

    Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion

    UForm is a Multi-Modal Modal Inference package, designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents, into a shared vector space! It comes with a set of homonymous pre-trained networks available on HuggingFace portal and extends the transfromers package to support Mid-fusion Models. Late-fusion models encode each modality independently, but into one shared vector space. Due to independent encoding late-fusion models are good at capturing coarse-grained features but often neglect fine-grained ones. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    QMD

    QMD

    mini cli search engine for your docs, knowledge bases, etc.

    ...Designed to keep all search activity local, it combines classic full-text search techniques with modern semantic features such as vector similarity and hybrid ranking so that queries return not just literal matches but conceptually relevant results. Users can organize content into named collections, embed documents for semantic retrieval, and then perform keyword searches, semantic searches, or hybrid natural-language queries to quickly surface the most useful information across all indexed sources. Because the entire system runs on the user’s machine, privacy is preserved and there’s no risk of exposing sensitive content to outside providers.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    Supermemory

    Supermemory

    Memory engine and app that is extremely fast, scalable

    Supermemory is an ambitious and extensible AI-powered personal knowledge management system that aims to help users capture, organize, retrieve, and reason over information in a manner that mimics human memory structures. The platform allows individuals to ingest text, documents, and other content forms, then uses advanced retrieval and embedding techniques to index and relate information intelligently so that users can recall relevant knowledge in context rather than just by keyword match. It often incorporates clustering, semantic search, and summarization modules to reduce cognitive load and surface key ideas, which makes it useful for research, study, writing, and long-term project tracking. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Eigenfocus

    Eigenfocus

    Self-Hosted - Project Management, Planning and Time Tracker

    Eigenfocus is an AI-powered personal knowledge management system that uses embeddings and semantic search to help users organize and retrieve ideas across documents. Designed for researchers and creatives, it enables deep linking between notes and supports querying based on meaning rather than keywords.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    RAG API

    RAG API

    ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

    rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    CocoIndex

    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Kernel Memory

    Kernel Memory

    Research project. A Memory solution for users, teams, and applications

    ...It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. Kernel Memory can ingest documents in multiple formats, process them into embeddings, and store them in searchable indexes. Applications can then query these indexed data sources to retrieve relevant information and include it as context for AI responses.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    mgrep

    mgrep

    A calm, CLI-native way to semantically grep everything, like code

    This project is a modern, semantic search tool that brings the simplicity of traditional command-line grep to the world of natural language and multimodal content, enabling users to search across codebases, documents, PDFs, and even images using meaning-aware queries. Built with a focus on calm CLI experiences, it lets you index and query your local files with semantic understanding, delivering results that are relevant to your intent rather than simple pattern matches, which is especially powerful in large or diverse projects. It also includes features such as background indexing to keep your search index up to date without interrupting your workflow and web search integration to expand the scope of queries beyond local files. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    KnowNote

    KnowNote

    A local-first AI knowledge base & NotebookLM alternative

    KnowNote is a local-first, open-source AI knowledge base and notebook application created as an Electron-based alternative to Google NotebookLM that emphasizes privacy, control, and simplicity. It lets users build an intelligent, searchable knowledge base from uploaded documents such as PDFs, Word files, PowerPoints, and web pages, and then interact with that content using LLM-powered chat, summarization, and reasoning tools. Unlike many NotebookLM alternatives that rely on Docker or cloud deployments, KnowNote runs natively on desktop platforms without complex setup, meaning all data stays local unless the user opts to integrate with self-managed or private LLM APIs. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    PandaWiki

    PandaWiki

    AI-powered open source platform for building intelligent wiki bases

    PandaWiki is an open source knowledge base system designed to help users build intelligent documentation platforms powered by large language models. It combines traditional wiki functionality with modern AI capabilities, allowing teams and individuals to create and manage product documentation, technical manuals, FAQs, and blog-style knowledge resources. PandaWiki provides tools for managing knowledge bases through an administrative interface while also generating public-facing wiki sites...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    RAG from Scratch

    RAG from Scratch

    Demystify RAG by building it from scratch

    ...Instead of relying on complex frameworks or cloud services, the repository demonstrates the entire RAG pipeline using transparent and minimal implementations. The project walks through key concepts such as generating embeddings, building vector databases, retrieving relevant documents, and integrating the retrieved context into language model prompts. Each example is written with detailed explanations so that developers can understand the internal mechanics of semantic search and context-aware language generation. The repository emphasizes learning through direct implementation, allowing users to see how each component of the RAG architecture functions independently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Pixeltable

    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    FlagEmbedding

    FlagEmbedding

    Retrieval and Retrieval-augmented LLMs

    ...The toolkit provides infrastructure for inference, fine-tuning, evaluation, and dataset preparation, enabling developers to train custom embedding models for specific domains or applications. It also includes reranker models that refine search results by re-evaluating candidate documents using cross-encoder architectures, improving retrieval accuracy in complex queries.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    CCIL
    A SOA framework for web content classification, clustering and automated interlinking of terms between documents. Will provide an expandable set of services such as semantic search, ranking, retrieval and classification of large scale web resources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Vector AI

    Vector AI

    A platform for building vector based applications

    Vector AI is a framework designed to make the process of building production-grade vector-based applications as quick and easily as possible. Create, store, manipulate, search and analyze vectors alongside json documents to power applications such as neural search, semantic search, personalized recommendations etc. Image2Vec, Audio2Vec, etc (Any data can be turned into vectors through machine learning). Store your vectors alongside documents without having to do a db lookup for metadata about the vectors. Enable searching of vectors and rich multimedia with vector similarity search. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB