Showing 72 open source projects for "document indexing"

  • 1
    Nitrite Database

    NoSQL embedded document store for Java

    Nitrite is an embedded NoSQL database for Java applications, offering lightweight document storage with indexing and query capabilities.
    Downloads: 0 This Week
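To make "document storage with indexing" concrete, here is a minimal Python sketch of an in-memory collection that keeps hash-based secondary indexes on chosen fields. It is not Nitrite's Java API: the `DocumentStore` class and its `create_index`, `insert`, and `find` methods are hypothetical names chosen for illustration, and field values are assumed to be hashable.

```python
from collections import defaultdict
from typing import Any


class DocumentStore:
    """Hypothetical in-memory document collection with hash-based secondary indexes."""

    def __init__(self) -> None:
        self._docs: dict[int, dict[str, Any]] = {}
        self._next_id = 1
        # field name -> (field value -> set of document ids)
        self._indexes: dict[str, dict[Any, set[int]]] = {}

    def create_index(self, field: str) -> None:
        index: dict[Any, set[int]] = defaultdict(set)
        # Backfill the new index from documents that are already stored.
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc: dict[str, Any]) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        # Keep every existing index in sync with the new document.
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        return doc_id

    def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        if field in self._indexes:
            # Indexed lookup: jump straight to the matching ids.
            ids = sorted(self._indexes[field].get(value, set()))
        else:
            # No index on this field: fall back to a full collection scan.
            ids = [i for i, d in self._docs.items() if d.get(field) == value]
        return [self._docs[i] for i in ids]
```

Indexed lookups skip the scan, at the cost of updating every index on each write; that is the basic trade-off an embedded store with secondary indexes makes.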
  • 2
    SemTools

    Semantic search and document parsing tools for the command line

    SemTools is an open-source command-line toolkit designed for document parsing, semantic indexing, and semantic search workflows. The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead. ...
    Downloads: 1 This Week
  • 3
    Search-Index

    A persistent, network resilient, full text search library

    Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.
    Downloads: 0 This Week
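Full-text search libraries of this kind are typically built around an inverted index that maps each term to the documents containing it. The Python sketch below shows that idea under simplifying assumptions (lowercase tokens split on non-word characters, AND semantics for multi-term queries). `InvertedIndex` and `tokenize` are hypothetical and do not mirror Search-Index's JavaScript API, persistence, or networking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class InvertedIndex:
    """Hypothetical minimal inverted index: term -> set of document ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Intersect posting sets; .get avoids creating entries for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```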
  • 4
    RavenDB

    ACID Document Database

    A NoSQL document database designed for high-performance, real-time applications with built-in distributed capabilities.
    Downloads: 3 This Week
  • 5
    LightRAG

    "LightRAG: Simple and Fast Retrieval-Augmented Generation"

    LightRAG is a lightweight Retrieval-Augmented Generation (RAG) framework designed for efficient document retrieval and response generation. It is optimized for speed and lower resource consumption, making it ideal for real-time applications.
    Downloads: 5 This Week
  • 6
    PoloDB

    PoloDB is an embedded document database

    PoloDB is an embedded document-oriented NoSQL database that provides MongoDB-like functionality in a lightweight package, ideal for local storage in applications.
    Downloads: 0 This Week
  • 7
    bleve

    A modern text indexing library for Go

    ...Bleve includes general-purpose analyzers as well as pre-built text analyzers for the following languages: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Persian, Portuguese, Romanian, Russian, Sorani, Spanish, Swedish, Thai, and Turkish. It supports aggregating facet information across search results; supported facet types include Terms Facet, Numeric Range Facet, and Date Range Facet. By indexing your data with bleve, you can compose query types such as Term, Phrase, Match, Match Phrase, Prefix, Conjunction, Disjunction, Boolean, Numeric and Date Ranges, and Query String. Scoring uses industry-standard tf-idf with query-time boosting, and matching text within document fragments can be highlighted.
    Downloads: 0 This Week
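The tf-idf scoring with query-time boosting mentioned above can be sketched in a few lines of Python. This sketch uses raw term counts for tf and idf(t) = ln(N / df(t)), one common variant; bleve's exact formula (normalization, smoothing) may differ, and `tfidf_scores` is a hypothetical helper, not part of bleve's Go API.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def tfidf_scores(docs: dict[str, str], query: dict[str, float]) -> dict[str, float]:
    """Score every document against boosted query terms.

    query maps term -> boost, and
    score(d) = sum over query terms t of tf(t, d) * idf(t) * boost(t),
    where tf is the raw count of t in d and idf(t) = ln(N / df(t)).
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    boosts = {term.lower(): boost for term, boost in query.items()}
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for c in counts.values() if t in c) for t in boosts}
    scores: dict[str, float] = {}
    for doc_id, c in counts.items():
        score = 0.0
        for term, boost in boosts.items():
            if c[term]:  # Counter returns 0 for absent terms
                score += c[term] * math.log(n_docs / df[term]) * boost
        scores[doc_id] = score
    return scores
```

With this idf variant, a term that appears in every document scores zero regardless of its boost; many engines add smoothing to avoid that.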
  • 8
    fess

    Open source enterprise search server for websites, files, and data

    ...It enables organizations to quickly deploy a scalable search environment without requiring deep knowledge of underlying search technologies. Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can collect content from sources such as databases, CSV files, and shared storage, making it suitable for centralized knowledge discovery. It supports indexing and searching across many document formats including office documents, PDFs, and compressed archives. ...
    Downloads: 2 This Week
  • 9
    Kivik

    Common interface to CouchDB or CouchDB-like databases for Go

    Kivik is a Go client library for interacting with CouchDB and PouchDB databases, providing an abstraction layer for NoSQL document storage and retrieval. It simplifies database operations for Go developers.
    Downloads: 0 This Week
  • 10
    ODMantic

    Sync and Async ODM (Object Document Mapper) for MongoDB

    ODMantic is an Object-Document Mapper (ODM) for MongoDB, designed for Python applications that use Pydantic models, providing seamless integration with type safety and validation.
    Downloads: 0 This Week
  • 11
    Morphia

    MongoDB object-document mapper in Java

    MongoDB Object Document Mapping for the JVM. Bidirectional mapping to and from the database. Transparently map your Java entities to MongoDB documents and back.
    Downloads: 0 This Week
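Bidirectional object-document mapping comes down to two conversions: entity to document on write, and document to entity on read. The Python sketch below uses dataclasses to show the idea, renaming an `id` field to MongoDB's `_id` key. `Author`, `to_document`, and `from_document` are hypothetical and are not Morphia's Java API; a full ODM typically also handles configuration, references between entities, and type conversion.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, TypeVar

T = TypeVar("T")


@dataclass
class Author:
    id: str
    name: str
    tags: list[str] = field(default_factory=list)


def to_document(entity: Any) -> dict[str, Any]:
    """Map a dataclass instance to a MongoDB-style document, renaming id -> _id."""
    doc = asdict(entity)
    doc["_id"] = doc.pop("id")
    return doc


def from_document(cls: type[T], doc: dict[str, Any]) -> T:
    """Rebuild an entity from a document, ignoring keys the class does not declare."""
    data = dict(doc)
    data["id"] = data.pop("_id")
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})
```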
  • 12
    WeKnora

    LLM framework for document understanding and semantic retrieval

    WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware responses. ...
    Downloads: 0 This Week
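The retrieve-then-generate loop of the RAG paradigm described above can be sketched in Python. As a deliberate simplification, bag-of-words cosine similarity stands in for vector embeddings, and the code stops at building the augmented prompt instead of calling a language model. `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, not WeKnora's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in 'embedding': a bag-of-words term-count vector (real systems use a model)."""
    return Counter(t for t in re.split(r"\W+", text.lower()) if t)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, segments: list[str], k: int = 2) -> list[str]:
    """Return the k segments most similar to the query."""
    q = embed(query)
    ranked = sorted(segments, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, segments: list[str], k: int = 2) -> str:
    """Assemble the augmented prompt that would be sent to a language model."""
    context = "\n".join(f"- {s}" for s in retrieve(query, segments, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```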
  • 13
    RAG API

    ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

    rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.
    Downloads: 3 This Week
  • 14
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and...
    Downloads: 4 This Week
  • 15
    OCRBase

    MD/JSON Document OCR and structured data extraction API

    ...The core output is designed for downstream automation, producing structured results like JSON according to user-defined schemas while also providing readable formats like Markdown for human review or indexing. It includes real-time job progress updates via WebSockets, which makes it easier to integrate into UIs, dashboards, or ingestion systems where users need feedback on long-running document processing.
    Downloads: 1 This Week
  • 16
    shuyuan

    Reading book source

    ...The name suggests “academy” or “study hall,” and the tool aims to help users ingest, organize, and manage reading content — possibly offering features like text parsing, annotation, metadata generation, translation, or storage for later reference. The repository is set up to support document ingestion, indexing, and maybe some AI-aided summarization or lookup functions, which helps users convert large text corpora into a structured, searchable knowledge base. For learners, researchers, or avid readers, Shuyuan offers a way to bridge from plain text files or eBooks into a manageable, interactive resource — one where notes, references, and reading progress can be tracked. ...
    Downloads: 1 This Week
  • 17
    ccls

    C/C++/ObjC language server supporting cross references & hierarchies

    Document symbols and approximate search of workspace symbols. Hover information. Diagnostics and code actions (clang FixIts). Semantic highlighting and preprocessor skipped regions.
    Downloads: 4 This Week
  • 18
    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    PageIndex is an innovative open-source framework that reimagines retrieval-augmented generation (RAG) by eliminating conventional vector similarity search and instead building hierarchical semantic indexes that mirror a document’s natural structure. Rather than chunking text and embedding it into a vector database, PageIndex constructs a tree-structured index — similar to a detailed, AI-enhanced table of contents — that a large language model can traverse to locate the most relevant sections...
    Downloads: 0 This Week
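The tree-traversal idea can be illustrated with a small Python sketch: each node is a section with a title and summary, and retrieval walks down from the root by picking the most promising child at each level. Here a keyword-overlap score stands in for the language model's reasoning step, and the walk is greedy with no backtracking. `Node`, `score`, and `traverse` are hypothetical and are not PageIndex's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in a hypothetical table-of-contents style index."""
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


def words(text: str) -> set[str]:
    return {t for t in re.split(r"\W+", text.lower()) if t}


def score(query: str, node: Node) -> int:
    """Stand-in for an LLM judgment: count query words in the node's title and summary."""
    return len(words(query) & words(node.title + " " + node.summary))


def traverse(root: Node, query: str) -> list[str]:
    """Walk from the root, descending into the best-scoring child until a leaf is reached."""
    path = [root.title]
    node = root
    while node.children:
        # max() keeps the first child on ties.
        node = max(node.children, key=lambda child: score(query, child))
        path.append(node.title)
    return path
```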
  • 19
    All-in-RAG

    Large Model Application Development in Practice, Part 1

    ...The repository provides a structured learning path that covers both theoretical foundations and practical implementation steps for RAG systems. It explains the full development pipeline required to create knowledge-aware AI assistants, including data preparation, document indexing, vector embedding generation, and retrieval strategies. The project also explores advanced topics such as hybrid retrieval methods, query optimization, and evaluation techniques for improving system accuracy. Alongside theoretical explanations, the repository includes hands-on exercises and example projects that demonstrate how to build production-ready RAG systems. ...
    Downloads: 1 This Week
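Hybrid retrieval often merges a keyword ranking with a vector ranking. One widely used way to do that is reciprocal rank fusion (RRF), sketched below in Python. It is offered as a generic example of the technique, not as the method the All-in-RAG course teaches, and `reciprocal_rank_fusion` is a hypothetical helper; k = 60 is a commonly used default for the constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because RRF uses only ranks, it needs no normalization across retrievers whose raw scores live on different scales.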
  • 20
    Pathway AI Pipelines

    Ready-to-run cloud templates for RAG

    Pathway AI Pipelines is a collection of ready-to-deploy AI pipeline templates designed to help developers rapidly build production-grade retrieval-augmented generation and enterprise search applications. The project provides end-to-end examples that connect live data sources to LLM workflows, enabling applications to stay synchronized with continuously changing information. It supports numerous connectors including local files, Google Drive, SharePoint, Kafka, PostgreSQL, and real-time APIs,...
    Downloads: 0 This Week
  • 21
    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 2 This Week
  • 22
    MongoDB Rust Driver

    The official MongoDB Rust Driver

    ...The crate also includes BSON encoding and decoding that maps cleanly to Rust types, so developers can work with rich document structures while retaining Rust’s performance guarantees.
    Downloads: 0 This Week
  • 23
    Kernel Memory

    Research project. A memory solution for users, teams, and applications

    Kernel Memory is an open-source reference architecture developed by Microsoft to help developers build memory systems for AI applications powered by large language models. The project focuses on enabling applications to store, index, and retrieve information so that AI systems can incorporate external knowledge when generating responses. It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using...
    Downloads: 0 This Week
  • 24
    MCM-ICM

    Mathematical Contest resources

    MCM-ICM is a curated archive of Outstanding Winner (referred to in Chinese as the “O Award” or “Special Prize”) solution papers from the Mathematical Contest in Modeling and the Interdisciplinary Contest in Modeling, spanning the early 2000s through recent years. The repository is organized by year, with per-year folders that collect the top-ranked reports and, in later years, additional materials such as problem statements or problem notes when available. It has evolved from a single-maintainer project into a collaborative effort, with...
    Downloads: 5 This Week
  • 25
    elasticsearch-php

    PHP low-level client for Elasticsearch

    Introduces the Elasticsearch DSL library, which provides an object-oriented query builder for the Elasticsearch bundle and the elasticsearch-php client. You can easily build any Elasticsearch query and transform it into an array. This agnostic package is a lightweight wrapper on top of the Elasticsearch PHP client. Its main goal is to make it easier to structure queries and indices in your application. It is not intended to hide or replace the functionality of the Elasticsearch PHP client. Feature complete, object...
    Downloads: 0 This Week
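The idea of building a query as an object and then transforming it into an array can be illustrated in Python, where the counterpart of a PHP array is a nested dict. The `BoolQuery` builder and its method names below are hypothetical and are not the PHP library's classes; only the resulting `bool`, `match`, `term`, and `range` structure follows Elasticsearch's query DSL.

```python
from typing import Any


class BoolQuery:
    """Hypothetical builder that assembles an Elasticsearch bool query as plain dicts."""

    def __init__(self) -> None:
        self._clauses: dict[str, list[dict[str, Any]]] = {}

    def _add(self, occur: str, clause: dict[str, Any]) -> "BoolQuery":
        self._clauses.setdefault(occur, []).append(clause)
        return self  # return self so calls can be chained

    def must_match(self, field: str, text: str) -> "BoolQuery":
        return self._add("must", {"match": {field: text}})

    def filter_term(self, field: str, value: Any) -> "BoolQuery":
        return self._add("filter", {"term": {field: value}})

    def filter_range(self, field: str, **bounds: Any) -> "BoolQuery":
        # Keyword bounds such as gte=..., lt=... map directly onto range query keys.
        return self._add("filter", {"range": {field: bounds}})

    def to_dict(self) -> dict[str, Any]:
        """Return the request body, ready to be serialized to JSON."""
        return {"query": {"bool": self._clauses}}
```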