Search Results for "document search engine" - Page 4

Showing 1175 open source projects for "document search engine"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    PDFPatcher

    PDFPatcher

    A versatile toolkit for PDF manipulation

    PDFPatcher (aka “PDF补丁丁”) is a versatile toolkit for PDF manipulation—editing document metadata, bookmarks, page layout, content restrictions, rotation, compression, merging/splitting, image extraction, and more, all within an intuitive interface. Merge/split PDFs or images, preserve or add bookmarks, and set page dimensions. Batch style/color/target changes, regex/XPath search/replace, mid‑page positioning. Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.
    Downloads: 53 This Week
    Last Update:
    See Project
  • 2
    Robin

    Robin

    AI-powered tool for dark web OSINT search and investigation

    Robin is an AI-powered open source tool designed to assist investigators and researchers in conducting dark web OSINT (Open Source Intelligence) investigations. It combines automated dark web search capabilities with large language models (LLMs) to analyze and summarize information discovered across hidden services and Tor-based search engines. The tool helps refine investigative queries, collect results from multiple dark web sources, and filter relevant intelligence using AI-driven...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 3
    OpenKM Document Management - DMS

    OpenKM Document Management - DMS

    Document Management System and Content Management System

    OpenKM Community Edition is a free Document Management System (DMS) that helps businesses control the production, storage, management and distribution of electronic documents, boosting effectiveness and productivity. It integrates document management, collaboration and advanced search into one easy-to-use solution, including administration tools for user roles, access control, security levels, activity logs and automation setup.
    Leader badge
    Downloads: 514 This Week
    Last Update:
    See Project
  • 4
    iText

    iText

    iText for Java represents the next level of SDKs for developers

    iText for Java represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit, and enhance PDF documents, iText can be a boon to nearly every workflow. iText Suite refers to the complete line of products comprising the open-source iText Core PDF library and its add-ons. The iText Suite is a fully-featured SDK for PDF development that allows you to seamlessly embed extensive PDF functionality into your software or workflows. ...
    Downloads: 25 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    Web Archives

    Web Archives

    Browser extension for viewing archived and cached versions of websites

    ...When you grant access only to the current website, access must also be granted to each search engine in order to view search results. A handful of search modes are offered that serve different use cases. The search mode can be set independently for the context menu and the browser toolbar from the extension's options.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 6
    PaperQA2

    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    MyScaleDB

    MyScaleDB

    A @ClickHouse fork that supports high-performance vector search

    MyScaleDB is an open-source SQL vector database designed for building large-scale AI and machine learning applications that require both analytical queries and semantic vector search. The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. MyScaleDB enables developers to perform vector similarity searches using standard SQL syntax, eliminating the need to learn specialized vector database query languages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    SeaGOAT

    SeaGOAT

    local-first semantic code search engine

    SeaGOAT is an open-source semantic code search engine designed to help developers explore and understand large codebases more efficiently. Instead of relying solely on traditional keyword search, it uses vector embeddings to represent the meaning of code and queries, allowing users to perform semantic searches that find relevant code even when the exact keywords are not present.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Elasticsearch MCP Server

    Elasticsearch MCP Server

    A Model Context Protocol (MCP) server implementation

    This MCP server implementation provides interaction capabilities with Elasticsearch and OpenSearch, enabling functionalities such as document searching, index analysis, and cluster management through a set of tools. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    Nokogiri

    Nokogiri

    Tool to work with XML and HTML from Ruby

    Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (C) and xerces (Java). Be secure-by-default by treating all documents as untrusted by default. Be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers. "Native...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    bleve

    bleve

    A modern text indexing library for go

    ...Includes support for highlighting matching text within document fragments.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Scira

    Scira

    AI-powered search engine that helps you find information

    Scira is an open source AI-powered search and research assistant designed to provide fast, conversational answers grounded in web and knowledge sources. The project combines a modern web interface with retrieval-augmented generation techniques to deliver responses that are both natural language friendly and evidence oriented. It is built for developers who want to deploy their own Perplexity-style or AI search experience without relying on proprietary hosted services. Scira emphasizes speed,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 14
    Desktop Commander MCP

    Desktop Commander MCP

    AI-powered MCP server for desktop file and terminal automation

    ...It integrates with clients like Claude Desktop to enable AI-driven workflows such as editing files, executing commands, and automating development tasks from a single conversational interface. Desktop Commander MCP builds on top of an MCP filesystem server and enhances it with powerful search, replace, and code editing capabilities tailored for real-world development environments. It allows users to run terminal commands with streaming output, manage long-running processes, and even execute code in memory without saving files. It also supports working with structured and document formats such as Excel, PDF, and DOCX, enabling AI to read, modify, and generate these files directly.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 15
    Nano PDF Editor

    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 16
    SILE

    SILE

    The SILE Typesetter — Simon’s Improved Layout Engine

    SILE is a typesetting system; its job is to produce beautiful printed documents. Conceptually, SILE is similar to TeX—from which it borrows some concepts and even syntax and algorithms—but the similarities end there. Rather than being a derivative of the TeX family SILE is a new typesetting and layout engine written from the ground up using modern technologies and borrowing some ideas from graphical systems such as InDesign.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    FlagEmbedding

    FlagEmbedding

    Retrieval and Retrieval-augmented LLMs

    FlagEmbedding is an open-source toolkit for building and deploying high-performance text embedding models used in information retrieval and retrieval-augmented generation systems. The project is part of the BAAI FlagOpen ecosystem and focuses on creating embedding models that transform text into dense vector representations suitable for semantic search and large language model pipelines. FlagEmbedding includes a family of models known as BGE (BAAI General Embedding), which are designed to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Thulite

    Thulite

    Web framework designed for speed, security, and SEO

    Thulite is an AI-powered search and recommendation engine that enhances search functionality in applications. It provides intelligent query processing, result ranking, and personalized recommendations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    KnowNote

    KnowNote

    A local-first AI knowledge base & NotebookLM alternative

    KnowNote is a local-first, open-source AI knowledge base and notebook application created as an Electron-based alternative to Google NotebookLM that emphasizes privacy, control, and simplicity. It lets users build an intelligent, searchable knowledge base from uploaded documents such as PDFs, Word files, PowerPoints, and web pages, and then interact with that content using LLM-powered chat, summarization, and reasoning tools. Unlike many NotebookLM alternatives that rely on Docker or cloud...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    gse

    gse

    Go efficient multilingual NLP and text segmentation

    Go efficient multilingual NLP and text segmentation; support English, Chinese, Japanese and others. Gse is implements jieba by golang, and try add NLP support and more feature. Support common, search engine, full mode, precise mode and HMM mode multiple word segmentation modes. Support user and embed dictionary, Part-of-speech/POS tagging, analyze segment info, stop and trim words. Support multilingual: English, Chinese, Japanese and others. Support Traditional Chinese. Support HMM cut text use Viterbi algorithm. Support NLP by TensorFlow (in work). ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    CocoIndex

    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Papis

    Papis

    Powerful and highly extensible command-line based document

    Papis is a powerful and highly extensible CLI document and bibliography manager. With Papis, you can search your library for books and papers, add documents and notes, import and export to and from other formats, and much much more. Papis uses a human-readable and easily hackable .yaml file to store each entry's bibliographical data. It strives to be easy to use while providing a wide range of features.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Firebird

    Firebird

    Firebird server, client and tools

    ...It has been used in production systems, under a variety of names, since 1981. To enhance the Firebird functionality, IBSurgeon has sponsored the development and now released for public use the free open source "IBSurgeon Full Text Search UDR" to perform full-text search queries within SQL and PSQL. UDR works with Firebird 3 and 4, for Windows, there are ready-to-use binaries, for Linux, it is necessary to build the UDR. The UDR is based on Lucene++ engine, with all the powerful features required for full-text search and with very fast performance (build as native C++ library). ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 24
    Docusaurus

    Docusaurus

    Easy to maintain open source documentation websites

    Docusaurus is a project that makes maintaining, building and deploying open source documentation websites incredibly easy. Simple to set up and start, Docusaurus allows you to save time and focus on your documentation. All you have to do is write docs and blog posts with Markdown and Docusaurus will handle the rest of the website build process. Docusaurus comes with pre-configured localization, as well as all the key pages and sections you need to get started. It’s also customizable, so...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Supermemory

    Supermemory

    Memory engine and app that is extremely fast, scalable

    Supermemory is an ambitious and extensible AI-powered personal knowledge management system that aims to help users capture, organize, retrieve, and reason over information in a manner that mimics human memory structures. The platform allows individuals to ingest text, documents, and other content forms, then uses advanced retrieval and embedding techniques to index and relate information intelligently so that users can recall relevant knowledge in context rather than just by keyword match....
    Downloads: 1 This Week
    Last Update:
    See Project