document index free download

Showing 77 open source projects for "document index"

1

Search-Index

A persistent, network resilient, full text search library

Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.

Downloads: 0 This Week

Last Update: 2025-03-12
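Full-text search libraries of this kind are usually built around an inverted index, which maps each term to the documents that contain it. Here is a minimal Python sketch of that idea. It is not Search-Index's API (Search-Index is a JavaScript library). The tokenizer, the `InvertedIndex` class, and the AND semantics of `search` are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of IDs of the documents containing that term
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        # AND semantics: keep only documents that contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("a", "A persistent, network resilient, full text search library")
index.add("b", "Full text search in the browser")
print(sorted(index.search("search library")))  # ['a']
print(sorted(index.search("full text")))  # ['a', 'b']
```

Queries intersect posting sets, so they never scan document text. A persistent library would also store the postings on disk and rank the matches.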
2

Elasticsearch MCP Server

A Model Context Protocol (MCP) server implementation

This MCP server implementation provides interaction capabilities with Elasticsearch and OpenSearch, enabling functionalities such as document searching, index analysis, and cluster management through a set of tools.

Downloads: 0 This Week

Last Update: 2026-05-22
3

PageIndex

Document Index for Vectorless, Reasoning-based RAG

...The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.

Downloads: 1 This Week

Last Update: 2026-06-22
4

Sphinx

Main repository for the Sphinx documentation builder

...HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text. Semantic markup and automatic links for functions, classes, citations, glossary terms and similar pieces of information. Easy definition of a document tree, with automatic links to siblings, parents and children. General index as well as a language-specific module index. Automatic highlighting using the Pygments highlighter. Automatic testing of code snippets, the inclusion of docstrings from Python modules (API docs), and more.

Downloads: 10 This Week

Last Update: 2025-12-31
5

Sonic

Fast, lightweight & schema-less search backend

Sonic is a fast, lightweight, schema-less search backend that can be used in place of heavyweight, full-featured search backends such as Elasticsearch. It can normalize natural-language search queries, auto-complete them, and return the most relevant results. Because it is an identifier index rather than a document index, a query returns IDs that refer to the matched documents stored in an external database.

Downloads: 1 This Week

Last Update: 2 days ago
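The identifier-index design described for Sonic can be sketched as follows. The index keeps only term-to-ID mappings, and the caller looks up the matched IDs in its own database. This is a hypothetical Python sketch, not Sonic's client API or protocol. `IdentifierIndex`, its `add`/`query`/`suggest` methods, and the `external_db` dictionary (which stands in for the external database) are illustrative names. The prefix-based `suggest` is a naive stand-in for Sonic's auto-complete.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the external database that holds the full documents.
external_db = {
    101: {"title": "Sonic", "body": "Fast, lightweight search backend"},
    102: {"title": "Elasticsearch", "body": "Full-featured search engine"},
}


class IdentifierIndex:
    """Maps terms to object IDs only; document contents are never stored here."""

    def __init__(self) -> None:
        self.terms: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, object_id: int, text: str) -> None:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.terms[term].add(object_id)

    def query(self, word: str) -> list[int]:
        # Returns matching IDs, not documents.
        return sorted(self.terms.get(word.lower(), set()))

    def suggest(self, prefix: str) -> list[str]:
        # Naive auto-complete: every indexed term that starts with the prefix.
        prefix = prefix.lower()
        return sorted(t for t in self.terms if t.startswith(prefix))


index = IdentifierIndex()
for object_id, doc in external_db.items():
    index.add(object_id, f"{doc['title']} {doc['body']}")

ids = index.query("search")
docs = [external_db[i] for i in ids]  # the caller hydrates results from its own store
print(ids, index.suggest("li"))  # [101, 102] ['lightweight']
```

Keeping document bodies out of the index keeps the index small. The trade-off is an extra lookup in the primary database for every result.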
6

LlamaParse

Parse files for optimal RAG

LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load data from 160+ data sources and formats, ranging from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.

Downloads: 2 This Week

Last Update: 2026-02-13
7

bleve

A modern text indexing library for Go

Import one package, build an index with three lines of code, and query for documents with another three lines. Bleve includes general-purpose analyzers as well as pre-built text analyzers for the following languages: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Persian, Portuguese, Romanian, Russian, Sorani, Spanish, Swedish, Thai, and Turkish.

Downloads: 1 This Week

Last Update: 2026-04-30
8

Papermerge

Open Source Document Management System for Digital Archives

Papermerge is an open source document management system (DMS) designed primarily for archiving and retrieving your digital documents. Instead of keeping piles of paper documents on your desk, in your office, or in drawers, you can scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize, and index scanned documents in PDF, JPEG, and TIFF formats.

Downloads: 4 This Week

Last Update: 2025-07-24
9

RAG API

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.

Downloads: 0 This Week

Last Update: 2026-04-21
10

ArangoDB JavaScript Driver

The official ArangoDB JavaScript driver

ArangoJS is the official JavaScript client for ArangoDB, a multi-model NoSQL database that supports document, key-value, and graph data models. This client provides a powerful yet simple API to interact with ArangoDB from Node.js or browser-based applications.

Downloads: 0 This Week

Last Update: 2026-06-02
11

Paperless-ngx

A community-supported supercharged version of paperless

Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.

Downloads: 17 This Week

Last Update: 2026-04-27
12

PixelRAG

The beginning of scalable pixel-native search

PixelRAG is a visual retrieval-augmented generation system that searches documents by how they look, not only by the text they contain. It renders web pages, PDFs, and images into screenshot tiles, then performs retrieval over those visual representations. This approach preserves layout, tables, charts, diagrams, infographics, and other visual structure that traditional HTML or text parsing can miss. The project includes tools for rendering, chunking, embedding, indexing, and serving visual...

Downloads: 15 This Week

Last Update: 2026-06-23
13

SimpleMem

SimpleMem: Efficient Lifelong Memory for LLM Agents

...It provides easy-to-use APIs for storing structured memory entries, querying those memories using semantic search, and retrieving context to augment prompt inputs for downstream processing. Unlike monolithic systems where memory management is ad-hoc, SimpleMem formalizes a memory lifecycle—write, index, retrieve, refine—so applications can handle user history, document collections, or dynamic contextual state systematically. It supports customizable embedding models, efficient vector indexes, and relevance weighting, making it practical for building assistants, personal agents, or domain-specific retrieval systems that need persistent knowledge.

Downloads: 0 This Week

Last Update: 2026-05-21
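The write, index, retrieve, refine lifecycle described above can be illustrated with a minimal Python sketch. It is not SimpleMem's API. `MemoryStore` and its methods are hypothetical, and a bag-of-words count vector stands in for a real embedding model. "Refine" is assumed here to mean merging duplicate memories and summing their relevance weights.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words count vector, not a neural model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryEntry:
    text: str
    weight: float = 1.0  # relevance weight used when ranking
    vector: Counter = field(default_factory=Counter)


class MemoryStore:
    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def write(self, text: str) -> None:
        # Write and index in one step: store the text together with its vector.
        self.entries.append(MemoryEntry(text=text, vector=embed(text)))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Rank by similarity scaled by relevance weight; return the top k texts.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e.vector) * e.weight, reverse=True)
        return [e.text for e in ranked[:k]]

    def refine(self) -> None:
        # Merge entries whose normalized text is identical, summing their weights.
        merged: dict[str, MemoryEntry] = {}
        for e in self.entries:
            key = " ".join(re.findall(r"[a-z0-9]+", e.text.lower()))
            if key in merged:
                merged[key].weight += e.weight
            else:
                merged[key] = e
        self.entries = list(merged.values())


store = MemoryStore()
store.write("User prefers dark mode")
store.write("user prefers dark mode.")
store.write("Project deadline is Friday")
store.refine()
print(len(store.entries))  # 2
print(store.retrieve("when is the deadline"))  # ['Project deadline is Friday']
```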
14

Kernel Memory

Research project. A Memory solution for users, teams, and applications

Kernel Memory is an open-source reference architecture developed by Microsoft to help developers build memory systems for AI applications powered by large language models. The project focuses on enabling applications to store, index, and retrieve information so that AI systems can incorporate external knowledge when generating responses. It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. Kernel Memory can ingest documents in multiple formats, process them into embeddings, and store them in searchable indexes. ...

Downloads: 0 This Week

Last Update: 2026-03-06
15

elasticsearch-php

PHP low-level client for Elasticsearch

Introducing the Elasticsearch DSL library, which provides an object-oriented query builder for the Elasticsearch bundle and the elasticsearch-php client. You can easily build any Elasticsearch query and transform it into an array. This agnostic package is a lightweight wrapper on top of the Elasticsearch PHP client. Its main goal is to make it easier to structure queries and indices in your application; it does not aim to hide or replace the functionality of the Elasticsearch PHP client. Feature complete, object...

Downloads: 1 This Week

Last Update: 2026-05-06
16

goquery

A little like that j-thing, only in Go

...Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), and detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this. Syntax-wise, it is as close as possible to jQuery, with the same function names when possible, and that warm and fuzzy chainable interface. jQuery being the ultra-popular library that it is, I felt it was better for a similar HTML-manipulating library to follow its API than to start anew (in the same spirit as Go's fmt package), even though some of its methods are less than intuitive (looking at you, index()...).

Downloads: 0 This Week

Last Update: 2026-03-15
17

AnyTXT Searcher

A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

AnyTXT Searcher is a powerful full-text search engine for files: a desktop search application for fast document retrieval. It works like a Google search engine for your local disk and is much faster than Windows Search, which makes it well suited to full-text search of desktop file contents. It has a built-in document parsing engine that extracts the text of commonly used file formats without installing any other software, and it combines this with a built-in high-speed indexing system that stores the metadata of the...

14 Reviews

Downloads: 7,425 This Week

Last Update: 2026-06-15
18

ccls

C/C++/ObjC language server supporting cross references & hierarchies

...It starts indexing the whole project (including subprojects, if any) in parallel when you open the first file, while the main thread can serve requests before indexing is complete. Saving files incrementally updates the index. Hierarchies: call (caller/callee) hierarchy, inheritance (base/derived) hierarchy, member hierarchy. Symbol rename. Document symbols and approximate search of workspace symbols. Hover information. Diagnostics and code actions (clang FixIts). Semantic highlighting and preprocessor skipped regions.

Downloads: 6 This Week

Last Update: 2025-08-15
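Incremental re-indexing on save, as described for ccls, works by re-parsing a file only when its contents change and then replacing that file's old entries in the index. The Python sketch below is hypothetical and much simpler than ccls, which uses clang for parsing. A regular expression that spots C-style function definitions stands in for the real parser, and a SHA-256 content hash decides whether a file needs re-indexing.

```python
import hashlib
import re
from collections import defaultdict

# Toy stand-in for a real parser: matches C-style definitions such as `int add(`.
# It would also match statements like `return foo(`; a real indexer parses code properly.
DEFINITION = re.compile(r"\b[A-Za-z_]\w*\s+([A-Za-z_]\w*)\s*\(")


class IncrementalIndex:
    def __init__(self) -> None:
        self.file_hash: dict[str, str] = {}
        self.file_symbols: dict[str, set[str]] = {}
        self.symbol_files: defaultdict[str, set[str]] = defaultdict(set)
        self.reindexed: list[str] = []  # files that were actually re-parsed

    def on_save(self, path: str, source: str) -> None:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if self.file_hash.get(path) == digest:
            return  # contents unchanged: nothing to re-index
        # Remove this file's old symbols, then add the freshly parsed ones.
        for symbol in self.file_symbols.get(path, set()):
            self.symbol_files[symbol].discard(path)
        symbols = set(DEFINITION.findall(source))
        for symbol in symbols:
            self.symbol_files[symbol].add(path)
        self.file_symbols[path] = symbols
        self.file_hash[path] = digest
        self.reindexed.append(path)

    def where(self, symbol: str) -> set[str]:
        return set(self.symbol_files.get(symbol, set()))


idx = IncrementalIndex()
idx.on_save("a.c", "int add(int a, int b) { return a + b; }")
idx.on_save("b.c", "void helper(void) {}")
idx.on_save("a.c", "int add(int a, int b) { return a + b; }")  # unchanged: skipped
idx.on_save("a.c", "int sub(int a, int b) { return a - b; }")  # changed: re-indexed
print(idx.reindexed)  # ['a.c', 'b.c', 'a.c']
print(idx.where("add"), idx.where("sub"))  # set() {'a.c'}
```

Only the saved file's entries are touched, so the cost of an update depends on the size of that file, not on the size of the project.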
19

PaperQA2

High accuracy RAG for answering questions from scientific documents

PaperQA2 is a package for high-accuracy retrieval-augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our 2024 paper for examples of PaperQA2's superhuman performance in scientific tasks such as question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata (including citation counts and a retraction check), then parse and cache PDFs into a...

Downloads: 2 This Week

Last Update: 2026-03-18
20

LogicalDOC Document Management - DMS

smart and open source document management system

LogicalDOC is both a document management and a collaboration system. The software is loaded with features for securely organizing, indexing, retrieving, controlling, and distributing important business documents, for any organization or individual. Gone are the days when companies used paper-based processes such as printing, mailing, and manual filing of paper documents; our document management system replaces all of this with electronic procedures that allow your organization to reduce costs significantly. ...

36 Reviews

Downloads: 164 This Week

Last Update: 2025-08-11
21

Ladle

Develop, test and document your React story components faster

Ladle is a drop-in alternative to Storybook. It is a tool for developing and testing your React components in an isolated environment that is faster than most real-world applications. Ladle also creates an index of your components, so you can easily test them with tools like Playwright. Ladle is compatible with the Component Story Format and Controls. It supports links, themes, right-to-left, source code, a11y (axe), TypeScript, and Flow out of the box. Powered by Vite, using esbuild,...

Downloads: 0 This Week

Last Update: 2025-11-04
22

Create Index from PDF

PDF Indexing Script: Searches PDF for words, records page numbers

This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly. It is designed to be case-insensitive, ensuring that variations in capitalization do not affect the search results.

Downloads: 0 This Week

Last Update: 2025-03-03
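The page-numbering logic described above can be sketched in Python. This is not the project's script. It assumes the page text has already been extracted into a list of strings (PDF extraction is omitted), that body pages restart at 1 after the 24 Roman-numeral front-matter pages, and that words are matched as whole words. `to_roman`, `page_label`, and `build_index` are hypothetical helpers.

```python
import re

FRONT_MATTER_PAGES = 24  # pages numbered i-xxiv, per the project description

ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
         (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]


def to_roman(n: int) -> str:
    # Lowercase Roman numerals, as commonly used for front matter.
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(physical_page: int, front_matter: int = FRONT_MATTER_PAGES) -> str:
    # physical_page is 1-based; assume body pages restart at 1 after the front matter.
    if physical_page <= front_matter:
        return to_roman(physical_page)
    return str(physical_page - front_matter)


def build_index(words: list[str], pages: list[str]) -> dict[str, list[str]]:
    index: dict[str, list[str]] = {}
    for word in words:
        # Case-insensitive, whole-word match (the whole-word rule is an assumption).
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        index[word] = [page_label(i) for i, text in enumerate(pages, start=1) if pattern.search(text)]
    return index


pages = ["Preface: about Indexing"] + ["(blank)"] * 23 + ["Chapter 1: indexing basics", "Working with PDF files"]
print(build_index(["indexing", "pdf"], pages))  # {'indexing': ['i', '1'], 'pdf': ['2']}
```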
23

marqo

Tensor search for humans

A tensor-based search and analytics engine that integrates with your applications, websites, and workflows. Marqo is a versatile search and analytics engine that can be integrated into any website or application. Because it scales horizontally, Marqo keeps query times fast even with millions of documents. Marqo helps you configure deep-learning models like CLIP to extract semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...

Downloads: 0 This Week

Last Update: 2026-04-02
24

Cherche

Neural Search

Cherche allows the creation of efficient neural search pipelines using retrievers and pre-trained language models as rankers. Cherche's main strength is its ability to build diverse and end-to-end pipelines from lexical matching, semantic matching, and collaborative filtering-based models. Cherche provides modules dedicated to summarization and question answering. These modules are compatible with Hugging Face's pre-trained models and fully integrated into neural search pipelines. Search is...

Downloads: 0 This Week

Last Update: 2024-06-01
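A retriever-then-ranker pipeline of the kind Cherche builds has two stages. A cheap lexical retriever narrows the collection to candidates, and a more expensive ranker reorders only those candidates. This Python sketch does not use Cherche's API. All function names are hypothetical, and a character-trigram Jaccard similarity stands in for a pre-trained language-model ranker.

```python
import re
from typing import Callable


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_retriever(query: str, docs: dict[str, str], k: int = 10) -> list[str]:
    # Stage 1: cheap lexical matching; keep up to k documents sharing at least one term.
    q = words(query)
    scored = [(len(q & words(text)), doc_id) for doc_id, text in docs.items()]
    hits = sorted((pair for pair in scored if pair[0] > 0), key=lambda p: (-p[0], p[1]))
    return [doc_id for _, doc_id in hits[:k]]


def trigrams(text: str) -> set[str]:
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_ranker(query: str, text: str) -> float:
    # Stage 2 stand-in for a pre-trained ranking model: character-trigram Jaccard similarity.
    a, b = trigrams(query), trigrams(text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pipeline(query: str, docs: dict[str, str],
             ranker: Callable[[str, str], float] = trigram_ranker, k: int = 10) -> list[str]:
    candidates = lexical_retriever(query, docs, k)
    # Only the retrieved candidates are scored by the (more expensive) ranker.
    return sorted(candidates, key=lambda doc_id: ranker(query, docs[doc_id]), reverse=True)


docs = {
    "d1": "search neural networks",
    "d2": "neural search",
    "d3": "cooking recipes",
}
print(lexical_retriever("neural search", docs))  # ['d1', 'd2']
print(pipeline("neural search", docs))  # ['d2', 'd1']
```

The retriever ties d1 and d2 because both share two query terms. The ranker then breaks the tie in favor of the closer match, and d3 never reaches the ranker.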
25

WA2L/WinTools

End User Tools for Windows.

End-user utilities for the Windows operating system. The utilities can be called through the "Send To" context menu when right-clicking a file or directory in Explorer, or through the Windows "Start Menu". The package can be 'installed' portably and does not need admin rights. ◆ UTILITIES - https://sourceforge.net/projects/wa2l-wintools/files/ → README ◆ FEATURES - https://wa2l-wintools.sourceforge.net/html/man1/wintools.1.html -...

Downloads: 21 This Week

Last Update: 3 days ago