Showing 633 open source projects for "file text search"

  • 1
    Whoogle Search

    A self-hosted, ad-free, privacy-respecting metasearch engine

    Get Google search results, but without any ads, JavaScript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile. Supports autocomplete/search suggestions, and POST requests for search and suggestion queries (when possible).
    Downloads: 7 This Week
  • 2
    MCP Everything Search

    An MCP server that provides fast file searching capabilities

    Everything Search MCP Server is an MCP server that provides fast file searching capabilities across Windows, macOS, and Linux. On Windows, it utilizes the Everything SDK; on macOS, it leverages the built-in mdfind command; and on Linux, it uses the locate or plocate command.
    Downloads: 3 This Week
  • 3
    MCP Text Editor

    Provides line-oriented text file editing capabilities

    The MCP Text Editor Server provides line-oriented text file editing capabilities through a standardized API, optimized for integration with Large Language Models (LLMs). It enables efficient partial file access, minimizing token usage while ensuring safe concurrent editing.
    Downloads: 2 This Week
  • 4
    Text Embeddings Inference

    High-performance inference server for text embedding models

    Text Embeddings Inference is a high-performance server designed to serve text embedding models efficiently in production environments. It focuses on delivering fast and scalable embedding generation by leveraging optimized inference techniques and modern hardware acceleration. It is built to support transformer-based embedding models, making it suitable for tasks such as semantic search, clustering, and retrieval-augmented systems.
    Downloads: 7 This Week
  • 5
    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    ...Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...
    Downloads: 5 This Week
  • 6
    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 61 This Week
  • 7
    clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system

    clip-retrieval is an open-source toolkit designed to build large-scale semantic search systems for images and text by leveraging CLIP embeddings to enable multimodal retrieval. It allows developers to compute embeddings for both images and text efficiently and then index them for fast similarity search across massive datasets. The system is optimized for performance and scalability, capable of processing tens or even hundreds of millions of embeddings using GPU acceleration. ...
    Downloads: 7 This Week
  • 8
    yt-fts

    Search all of YouTube from the command line

    yt-fts, short for YouTube Full Text Search, is an open-source command-line tool that enables users to search the spoken content of YouTube videos by indexing their subtitles. The program automatically downloads subtitles from a specified YouTube channel using the yt-dlp utility and stores them in a local SQLite database. Once indexed, users can perform full-text searches across all transcripts to quickly locate keywords or phrases mentioned within the videos. ...
    Downloads: 17 This Week
  • 9
    kb

    A minimalist command line knowledge base manager

    ...Each entry in kb can be tagged, categorized, given metadata like author or status, and inspected with full-text search or regex-based grepping, helping users quickly find content even across large knowledge collections. While focused on text content, it also supports non-text artifacts such as PDFs and images, which can still be indexed and referenced, and it integrates with editors specified by the user’s $EDITOR environment variable to make detailed editing seamless.
    Downloads: 8 This Week
  • 10
    marqo

    Tensor search for humans

    ...It can seamlessly handle image-to-image, image-to-text and text-to-image search and analytics. Marqo adapts and stores your data in a fully schemaless manner. It combines tensor search with a query DSL that provides efficient pre-filtering. Tensor search allows you to go beyond keyword matching and search based on the meaning of text, images and other unstructured data. Be a part of the tribe and help us revolutionize the future of search. ...
    Downloads: 5 This Week
  • 11
    qBittorrent RuTracker plugin

    qBittorrent search engine plugin for rutracker

    qBittorrent RuTracker plugin is a lightweight search engine extension designed to integrate the RuTracker torrent index directly into the qBittorrent client, allowing users to search for torrents without leaving the application interface. The plugin follows qBittorrent’s official search plugin architecture and is implemented as a Python script that communicates with the RuTracker website to retrieve and display search results. By embedding this functionality into the client, it streamlines...
    Downloads: 31 This Week
  • 12
    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). ...
    Downloads: 8 This Week
  • 13
    EPUB to Audiobook Converter

    EPUB to audiobook converter, optimized for Audiobookshelf

    EPUB to Audiobook Converter is a tool designed to convert EPUB ebooks into chaptered audiobooks, optimized specifically for Audiobookshelf servers. It reads each chapter from an EPUB file, generates audio using a chosen text-to-speech backend, and outputs separate MP3 files with chapter titles preserved as metadata to make navigation easier. The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible endpoint, allowing users to choose between cloud and self-hosted voices. ...
    Downloads: 28 This Week
  • 14
    Tribler

    Privacy enhanced BitTorrent client with P2P content discovery

    ...It introduces built-in anonymity using a Tor-like onion routing network and integrates its own blockchain for economic incentives and trust management. Tribler supports standard torrenting features along with distributed search, self-contained channels, and peer reputation. Its goal is to provide a fully autonomous file-sharing network without relying on external servers, search engines, or trackers.
    Downloads: 50 This Week
  • 15
    MTEB

    MTEB: Massive Text Embedding Benchmark

    Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding...
    Downloads: 23 This Week
  • 16
    novelWriter

    Open source plain text editor designed for writing novels

    ...All text is saved as plain text files with a meta data header. The core project structure is stored in a single project XML file. Other meta data is primarily saved as JSON files.
    Downloads: 22 This Week
  • 17
    Python Client For NLP Cloud

    NLP Cloud serves high performance pre-trained or custom models for NER

    NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, source code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. It is ready for production, served through a REST API. You can either use the NLP Cloud pre-trained models, fine-tune your own models, or deploy your own models.
    Downloads: 4 This Week
  • 18
    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    ...This architecture enables cost-efficient storage and elastic scaling for workloads that involve large datasets and complex queries. Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. Databend supports SQL-based workflows and enables real-time data ingestion, transformation, and analysis through streaming and task orchestration features. With its cloud-native design and distributed architecture, Databend can run both as a self-hosted system or within managed environments to power data analytics, AI workloads, and large-scale data.
    Downloads: 21 This Week
  • 19
    RecoverPy

    Interactively find and recover deleted or overwritten files

    RecoverPy is a powerful tool that leverages your system capabilities to recover lost files. Unlike other tools, it can recover not only deleted files but also overwritten data. Every block of your partition is scanned, and you can even search for a string in binary files.
    Downloads: 11 This Week
  • 20
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a full-text search index, and finally answer the user question with an LLM agent.
    Downloads: 6 This Week
  • 21
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 104 This Week
  • 22
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    ...Developers define data transformations and AI operations using computed columns on tables, allowing pipelines to evolve incrementally as new data or models are added. The framework supports multimodal content including images, video, text, and audio, enabling applications such as retrieval-augmented generation systems, semantic search, and multimedia analytics.
    Downloads: 6 This Week
  • 23
    Nicotine+

    Graphical client for the Soulseek peer-to-peer network

    Nicotine+ is a free and open-source graphical client for the Soulseek peer-to-peer file-sharing network, popular for its focus on sharing music and niche content. Built in Python with a GTK-based GUI, Nicotine+ offers a lightweight yet feature-rich experience for users looking to share and discover files in a decentralized, user-governed environment. It includes search capabilities, bandwidth throttling, chat rooms, and user-to-user messaging, supporting a vibrant community of digital collectors and music lovers.
    Downloads: 24 This Week
  • 24
    PaddleNLP

    Easy-to-use and powerful NLP library with Awesome model zoo

    PaddleNLP is a natural language processing development library for PaddlePaddle with three major features: an easy-to-use text-domain API, application examples for multiple scenarios, and high-performance distributed training. It aims to improve the modeling and development efficiency of PaddlePaddle developers working in the text domain, and provides rich examples of NLP applications. Provide rich industry-level pre-task capabilities...
    Downloads: 5 This Week
  • 25
    SeaGOAT

    local-first semantic code search engine

    ...The tool runs locally on a developer’s machine and processes repositories using a combination of embedding models and conventional search utilities, enabling both semantic and text-based retrieval methods. By combining vector search with tools like ripgrep, SeaGOAT provides a hybrid approach that supports both natural language queries and precise keyword matching in source files. It is built primarily in Python and is intended to work on common operating systems such as Linux, macOS, and Windows.
    Downloads: 9 This Week