Search Results for "document search engine" - Page 5

Showing 1175 open source projects for "document search engine"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    Databend

    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    ...It is designed with a separation of compute and storage, allowing compute nodes to scale independently while storing data in object storage systems. This architecture enables cost-efficient storage and elastic scaling for workloads that involve large datasets and complex queries. Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. Databend supports SQL-based workflows and enables real-time data ingestion, transformation, and analysis through streaming and task orchestration features. With its cloud-native design and distributed architecture, Databend can run both as a self-hosted system or within managed environments to power data analytics, AI workloads, and large-scale data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    vLLM

    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 43 This Week
    Last Update:
    See Project
  • 3
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on the go, or for users who prefer audio over reading. The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    carbone

    carbone

    Fast and simple report generator, from JSON to pdf, xslx, docx, odt

    Turn your JSON into PDF, DOCX, XLSX, PPTX, ODS and many more. Fast, Simple and Powerful report generator in any format PDF, DOCX, XLSX, ODT, PPTX, ODS, XML, CSV using templates and your JSON data as input.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    LEANN

    LEANN

    Local RAG engine for private multimodal knowledge search on devices

    LEANN is an open source system designed to enable retrieval-augmented generation (RAG) and semantic search across personal data while running entirely on local devices. It focuses on dramatically reducing the storage overhead typically required for vector search and embedding indexes, enabling efficient large-scale knowledge retrieval on consumer hardware. LEANN introduces a storage-efficient approximate nearest neighbor index combined with on-the-fly embedding recomputation to avoid storing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Social-Analyzer

    Social-Analyzer

    API, CLI, and Web App for analyzing and finding a person's profile

    Social Analyzer is an open source OSINT tool that helps investigators discover and analyze a person’s presence across a very large number of social media platforms. It provides a unified API, CLI, and web interface capable of scanning hundreds or thousands of sites for username matches and related metadata. The project includes modular detection and analysis components that users can enable depending on their investigative needs. It is commonly used in cybersecurity, digital forensics, and...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 7
    Papra

    Papra

    The minimalistic document archiving platform

    Papra is a minimalist document management and archiving platform created to help individuals and teams store, organize, and retrieve digital documents with simplicity and accessibility at its core. Papra provides basic yet essential capabilities like uploading files, managing archives, creating organizations for shared access, and performing full-text searches, all within a responsive and user-friendly interface that works across devices. The project’s focus on long-term storage and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    iText Community for .NET

    iText Community for .NET

    iText for .NET is the .NET version of the iText library

    iText for .NET is the .NET version of the iText library, formerly known as iTextSharp, which it replaces. iText represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit, and enhance PDFs.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 9
    Pathway AI Pipelines

    Pathway AI Pipelines

    Ready-to-run cloud templates for RAG

    Pathway AI Pipelines is a collection of ready-to-deploy AI pipeline templates designed to help developers rapidly build production-grade retrieval-augmented generation and enterprise search applications. The project provides end-to-end examples that connect live data sources to LLM workflows, enabling applications to stay synchronized with continuously changing information. It supports numerous connectors including local files, Google Drive, SharePoint, Kafka, PostgreSQL, and real-time APIs,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 10
    GooFuzz

    GooFuzz

    OSINT fuzzing tool using Google dorks to find exposed resources

    GooFuzz is an open source security tool designed to perform fuzzing using an OSINT-based approach by leveraging advanced Google search techniques. It is written in Bash and automates the use of Google Dorking queries to discover publicly accessible information related to a target domain. Instead of directly sending requests to the target server, GooFuzz gathers results through search engine indexing, allowing enumeration without leaving traces in the target’s server logs. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Scid++

    Scid++

    A collection of patches for the upstream Scid vs PC project.

    Focusing on the application of the Chess Query Language in the problem domain.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    ILSpy

    ILSpy

    .NET Decompiler with support for PDB generation, ReadyToRun, Metadata

    ILSpy is the open-source .NET assembly browser and decompiler. Visual Studio 2022 ships with decompilation support for F12 enabled by default (using our engine v7.1). In Visual Studio 2019, you have to manually enable F12 support. Go to Tools / Options / Text Editor / C# / Advanced and check Enable navigation to decompiled source. C# for Visual Studio Code ships with decompilation support as well. To enable, activate the setting "Enable Decompilation Support. ILSpy is distributed under the...
    Downloads: 337 This Week
    Last Update:
    See Project
  • 13
    RAG Web UI

    RAG Web UI

    RAG Web UI is an intelligent dialogue system based on RAG

    RAG Web UI is an open-source intelligent dialogue system built on retrieval-augmented generation technology, designed to enable users to create AI-powered question answering systems grounded in their own knowledge bases. It combines document retrieval with large language models to provide accurate, context-aware responses based on indexed data rather than generic model knowledge. The platform supports ingestion of multiple document formats, including PDFs, Word files, Markdown, and plain...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Inkdown

    Inkdown

    A WYSIWYG Markdown editor, improve reading and editing experience

    Inkdown (bluestone) is a Markdown reading, editing, and sharing tool. Almost fully compatible with the GitHub Flavored Markdown standard, while extending the Mermaid graphics and Katex formula, supporting light and dark styles, and somewhat different from other WYSIWYG editors, Inkdown does not pursue complete customization. Its core goal is comfortable reading, smooth editing of Markdown, and document sharing in the simplest way possible. As a document publisher, markdown source code mode...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Krixik

    Krixik

    Documentation for the Krixik Python client

    Small/specialized AI models are an oft-necessary complement—or alternative—to "big AI" offerings. However, infrastructure for small AI tends to be underwhelming, so building with specialized AI can be difficult, time-consuming, and even expensive. Iterating with different models, and particularly with different combinations of these models, can thus be rendered unfeasible.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Floki

    Floki

    Floki is a simple HTML parser that enables search for nodes using CSS

    Floki is a simple HTML parser that enables search for nodes using CSS selectors. Floki needs the :leex module in order to compile. Normally this module is installed with Erlang in a complete installation. By default, Floki uses a patched version of mochiweb_html for parsing fragments due to its ease of installation (it's written in Erlang and has no outside dependencies). fast_html is generally faster, according to the benchmarks conducted by its developers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    huggingface_hub

    huggingface_hub

    The official Python client for the Huggingface Hub

    The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 18
    PandaWiki

    PandaWiki

    AI-powered open source platform for building intelligent wiki bases

    PandaWiki is an open source knowledge base system designed to help users build intelligent documentation platforms powered by large language models. It combines traditional wiki functionality with modern AI capabilities, allowing teams and individuals to create and manage product documentation, technical manuals, FAQs, and blog-style knowledge resources. PandaWiki provides tools for managing knowledge bases through an administrative interface while also generating public-facing wiki sites...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    RAG from Scratch

    RAG from Scratch

    Demystify RAG by building it from scratch

    RAG From Scratch is an educational open-source project designed to teach developers how retrieval-augmented generation systems work by building them step by step. Instead of relying on complex frameworks or cloud services, the repository demonstrates the entire RAG pipeline using transparent and minimal implementations. The project walks through key concepts such as generating embeddings, building vector databases, retrieving relevant documents, and integrating the retrieved context into...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Korvus

    Korvus

    Korvus is a search SDK that unifies the entire RAG pipeline

    Korvus is an open-source retrieval-augmented generation (RAG) pipeline designed to run entirely inside PostgreSQL, allowing developers to build AI search and knowledge systems directly within a database environment. The project consolidates the typical steps of a RAG pipeline—including embedding generation, document retrieval, reranking, and text generation—into a single query executed within the Postgres ecosystem. By leveraging PostgresML and vector extensions such as pgvector, Korvus eliminates the need for external microservices typically used for AI search architectures, reducing both system complexity and latency. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Emacs for You (Emfy)

    Emacs for You (Emfy)

    A dark and sleek Emacs setup for general purpose editing

    This project provides a tiny .emacs file to set up Emacs quickly. This document provides a detailed description of how to set it up and get started with Emacs. Further this project also provides a tiny convenience command named em to start Emacs server and edit files using Emacs server. This helps in using Emacs efficiently. This script and its usage is explained in detail later in the Emacs Server and Emacs Launcher sections. If you are already comfortable with Emacs and only want to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    IPRanges

    IPRanges

    Daily updated lists of cloud, bot, and service IP ranges

    ...It also tracks IP ranges used by search engine bots and automated agents including Googlebot, Bingbot, and OpenAI’s GPTBot. Lists are published in both IPv4 and IPv6 formats and are regularly updated through automated processes to keep the data current. In addition to provider specific lists, the project also offers merged and combined datasets that aggregate ranges from multiple sources into a single file.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    Mailspring

    Mailspring

    A faster, leaner email client and fork of Nylas Mail

    Mailspring is a fast and lean mail client that’s a new version of Nylas Mail. It’s maintained by one of Nylas Mail’s original authors, with plenty of great features and enhancements such as a new C++ sync engine that allows it to use 50% less RAM. Heavy dependencies have also been removed and the package manager rewritten for even greater speed. Written in TypeScript with Electron and React, it is cross-platform (macOS, Windows and Linux) and designed to be easy to extend.
    Downloads: 65 This Week
    Last Update:
    See Project
  • 24
    Sync Server

    Sync Server

    Secure, open-source platform for file storage, sharing, collaboration

    ...It provides a sleek web interface where teams or individuals can upload, organize, and share files with fine-grained access permissions, and its security-minded design includes things like multi-factor authentication and role-based controls to help protect sensitive documents. Sync-in supports real-time collaboration through integrations with office editors and activity tracking, and it enhances productivity with deep content search across a variety of file types and comprehensive document management capabilities. The platform is built with TypeScript and Node.js and is suitable for self-hosting on your own infrastructure using Docker or standard Node deployments.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    ModernBERT

    ModernBERT

    Bringing BERT into modernity via both architecture changes and scaling

    ModernBERT is an open-source research project that modernizes the classic BERT encoder architecture by incorporating recent advances in transformer design, training techniques, and efficiency improvements. The goal of the project is to bring BERT-style models up to date with the capabilities of modern large language models while preserving the strengths of bidirectional encoder architectures used for tasks such as classification, retrieval, and semantic search. ModernBERT introduces...
    Downloads: 1 This Week
    Last Update:
    See Project