Showing 110 open source projects for "file text search"

  • 1
    Whoogle Search

    A self-hosted, ad-free, privacy-respecting metasearch engine

    Get Google search results, but without any ads, JavaScript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile. Supports autocomplete/search suggestions and POST request search and suggestion queries (when possible).
    Downloads: 7 This Week
  • 2
    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). ...
    Downloads: 8 This Week
  • 3
    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitle (hardsub) from videos

    Extracts hard-coded subtitles (hardsub) from videos and generates srt files. A deep learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. Text recognition runs locally with OCR, so there is no need to apply for or call a third-party API, or to access online OCR services such as Baidu and Ali. ...
    Downloads: 68 This Week
  • 4
    CineCLI

    CineCLI is a cross-platform command-line movie browser

    ...CineCLI also supports paginated results and filters so users can navigate large search outputs without overwhelming their screens. Because it runs entirely from the command line, it’s ideal for developers, movie enthusiasts in headless environments, or anyone who prefers text-based tools over web browsers.
    Downloads: 2 This Week
  • 5
    SentenceTransformers

    Multilingual sentence & image embeddings with BERT

    SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared, e.g. with cosine similarity, to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining. ...
    Downloads: 7 This Week
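
The cosine-similarity comparison mentioned in the description can be sketched in plain Python. This is a minimal illustration only, not the SentenceTransformers API: the vectors are made-up toy values standing in for real embeddings, and the names `cosine_similarity`, `query`, and `candidates` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (assumed values).
query = [1.0, 0.0, 1.0]
candidates = {
    "close": [2.0, 0.0, 2.0],      # same direction as the query
    "unrelated": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Because cosine similarity ignores vector length, the "close" vector scores as a perfect match even though its magnitude differs from the query.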
  • 6
    Papermerge

    Open Source Document Management System for Digital Archives

    ...OCR technology is a vital part of Papermerge. It extracts text from scanned documents and PDF, JPEG, and TIFF files.
    Downloads: 20 This Week
  • 7
    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. ...
    Downloads: 0 This Week
  • 8
    Toot

    toot - Mastodon CLI & TUI

    Toot is a CLI and TUI tool for interacting with Mastodon instances from the command line.
    Downloads: 4 This Week
  • 9
    repren

    Rename anything

    Repren is a “rename anything” command-line tool that performs regex-based search and replace across file contents while also renaming or moving files and directories according to patterns. It’s meant for sweeping refactors: change a class or package name everywhere and update filenames to match in one pass. The design favors explicitness and safety, providing dry-run output so you can preview exactly what will change before executing it.
    Downloads: 0 This Week
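
The idea of a regex-based rename with a dry-run preview can be sketched as follows. This is an illustrative sketch under stated assumptions, not repren's implementation or command-line interface: the function `plan_rename` and the in-memory `files` mapping are hypothetical, and nothing is written to disk.

```python
import re


def plan_rename(files, pattern, replacement):
    """Return planned changes without applying them (a dry run).

    `files` maps a path to its text contents. Each plan entry is
    (old_path, new_path, number_of_replacements_in_contents).
    """
    regex = re.compile(pattern)
    plan = []
    for path, text in files.items():
        new_path = regex.sub(replacement, path)
        _, count = regex.subn(replacement, text)
        if new_path != path or count:
            plan.append((path, new_path, count))
    return plan


# Hypothetical in-memory "file tree" used only for illustration.
files = {
    "src/OldName.py": "class OldName:\n    pass\n",
    "README.txt": "See OldName for details. OldName is stable.\n",
    "notes.txt": "nothing to change\n",
}

plan = plan_rename(files, r"OldName", "NewName")
for old_path, new_path, count in plan:
    print(f"{old_path} -> {new_path} ({count} replacement(s) in contents)")
```

Computing the plan separately from applying it is what makes a preview possible: the same list could later drive the actual writes and renames.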
  • 10
    TextDistance

    Compute distance between sequences

    ...TextDistance shows a benchmark results table for your system and saves library priorities to the libraries.json file in TextDistance's folder. TextDistance then uses this file to call the fastest algorithm implementation. A default libraries.json is already included in the package.
    Downloads: 0 This Week
  • 11
    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    PageIndex is an innovative open-source framework that reimagines retrieval-augmented generation (RAG) by eliminating conventional vector similarity search and instead building hierarchical semantic indexes that mirror a document’s natural structure. Rather than chunking text and embedding it into a vector database, PageIndex constructs a tree-structured index — similar to a detailed, AI-enhanced table of contents — that a large language model can traverse to locate the most relevant sections of long documents. ...
    Downloads: 0 This Week
  • 12
    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
  • 13
    Codespell

    Check code for common misspellings

    Codespell is a lightweight, open-source spell checker designed specifically for detecting and correcting common misspellings in source code, documentation, and text files. Unlike traditional spell checkers, Codespell is optimized for codebases, ensuring that it correctly identifies and suggests fixes for typographical errors without introducing false positives. It integrates easily into CI/CD pipelines, enabling developers to maintain clean and professional code and documentation. By...
    Downloads: 12 This Week
  • 14
    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches.
    Downloads: 1 This Week
  • 15
    sqlite-utils

    Python CLI utility and library for manipulating SQLite databases

    ...As a CLI, it lets you build databases from structured data in one line, run queries against local files or in-memory databases, output results as JSON, CSV, or pretty tables, and configure full-text search. As a library, it exposes high-level APIs for inserting records, creating or transforming tables, normalizing schemas, and running migrations that SQLite’s limited ALTER TABLE cannot handle directly. The project also embraces an ecosystem of plugins, so you can add custom SQL functions, extra commands, or UIs (including a terminal UI) via separate packages. ...
    Downloads: 0 This Week
  • 16
    Meta Package Manager

    Wraps all package managers with a unifying CLI

    ...MPM is like yt-dlp, but for package managers instead of videos. MPM solves XKCD #1654 - Universal Install Script. List installed packages. List duplicate installed packages. Search for packages. Install a package, remove a package, and list outdated packages. Sync local package infos. Upgrade all outdated packages. Backup list of installed packages to TOML file. Restore/install list of packages from TOML files. Pin-point commands to a subset of package managers (include/exclude selectors). Support plain, versioned, and purl package specifiers. ...
    Downloads: 40 This Week
  • 17
    Django Two-Factor Authentication

    Complete Two-Factor Authentication for Django

    ...Built on top of the one-time password framework django-otp and Django's built-in authentication framework django.contrib.auth to provide the easiest integration into most Django projects. Inspired by the user experience of Google's Two-Step Authentication, it allows users to authenticate through a phone call, text messages (SMS), or by using a token generator app like Google Authenticator or a YubiKey hardware token generator (optional). If you run into problems, please file an issue on GitHub, or contribute to the project by forking the repository and sending pull requests. The package is translated into English, Dutch and other languages. ...
    Downloads: 6 This Week
  • 18
    PyGitHub

    Typed interactions with the GitHub API v3

    PyGitHub is a Python library to access the GitHub REST API (v3). It enables you to manage GitHub resources such as repositories, user profiles, and organizations from your Python applications and scripts. Should you have any question, any remark, or if you find a bug, or if there is something you can do with the API but not with...
    Downloads: 7 This Week
  • 19
    Google Open Source Project Style Guide

    Chinese version of Google open source project style guide

    Each larger open source project has its own style guide, a series of conventions (sometimes arbitrary) on how to write code for the project. When all the code maintains a consistent style, large code bases are easier to understand. The meaning of "style" covers a wide range, from "variables use camelCase" to "never use global variables" to "never use exceptions". The English version of the project maintains the programming style guidelines used at Google. If the...
    Downloads: 5 This Week
  • 20
    Papis

    Powerful and highly extensible command-line based document and bibliography manager

    Papis is a powerful and highly extensible CLI document and bibliography manager. With Papis, you can search your library for books and papers, add documents and notes, import and export to and from other formats, and much much more. Papis uses a human-readable and easily hackable .yaml file to store each entry's bibliographical data. It strives to be easy to use while providing a wide range of features. And for those who still want more, Papis makes it easy to write scripts that extend its features even further.
    Downloads: 0 This Week
  • 21
    sqlmap

    Automatic SQL injection and database takeover tool

    sqlmap is a powerful, feature-filled, open source penetration testing tool. It makes detecting and exploiting SQL injection flaws and taking over the database servers an automated process. sqlmap comes with a great range of features that along with its powerful detection engine make it the ultimate penetration tester. It offers full support for MySQL, Oracle, PostgreSQL, Microsoft SQL Server, Microsoft Access, IBM DB2, SQLite, Firebird, and many other database management systems. It also...
    Downloads: 12 This Week
  • 22
    go1pylib

    go1pylib is a Python library designed to control the Go1 robot

    go1pylib is a Python library designed to control the Go1 robot by Unitree Robotics. It provides an easy-to-use interface for robot movement, state management, collision avoidance, battery monitoring, and MQTT communication. Ideal for research and robotics development.
    Downloads: 0 This Week
  • 23
    HumbleUI

    Clojure Desktop UI framework

    HumbleUI is a lightweight, declarative, and composable UI framework intended for building graphical user interfaces in a minimal, modular way. It emphasizes ease of use, customization, and modular components. Electron is a great landmark: normal shortcuts, an icon, its own window, file system access, notifications, OS integrations. Write once, run everywhere is no longer rejected by users. Performant enough not to noticeably lag.
    Downloads: 0 This Week
  • 24
    DocFetcher

    Desktop search application

    DocFetcher is an Open Source desktop search application: It allows you to search the contents of files on your computer. — You can think of it as Google for your local files. The application runs on Windows, Linux and Mac OS X.
    Downloads: 2,899 This Week
  • 25
    GitHub520

    Community-maintained approach to improving access to GitHub services

    GitHub520 is a community-maintained approach to improving access to GitHub services from regions with network friction by leveraging host mappings. The repository provides a regularly updated list of domain-to-IP entries meant to be appended to a system’s hosts file so certain GitHub endpoints resolve faster or more reliably. It includes scripts or guidance to automate updates, reducing the need for manual lookups when IPs change. The project’s goal is pragmatic: improve developer...
    Downloads: 0 This Week