Showing 368 open source projects for "text processing"

  • 1
    Google AI Edge Gallery

    A gallery that showcases on-device ML/GenAI use cases

    Gallery is a curated collection of on-device machine learning examples, demo apps, and model artifacts designed to help developers experiment with and deploy ML at the edge. The project bundles runnable samples that show how to run TensorFlow Lite/Edge TPU models (and similar lightweight runtimes) on mobile and embedded platforms, demonstrating common tasks like image classification, object detection, audio recognition, and pose estimation. Each sample is intended to be both a learning aid...
    Downloads: 220 This Week
  • 2
    TRIBE v2

    A multimodal model for brain response prediction

    ...TRIBE v2 allows researchers to simulate and analyze brain activity without requiring direct human experiments. Overall, it provides a powerful tool for studying perception, cognition, and multimodal processing in the brain.
    Downloads: 12 This Week
  • 3
    AudioCraft

    AudioCraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. ...
    Downloads: 7 This Week
  • 4
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.
    Downloads: 0 This Week
  • 5
    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...
    Downloads: 35 This Week
  • 6
    Supertonic

    Lightning-fast, on-device TTS, running natively via ONNX

    ...Supertonic is designed to handle real-world text gracefully, including numbers, dates, currency symbols, abbreviations, and technical units, without requiring heavy pre-processing or custom text normalization. The repository provides complete reference implementations across many programming ecosystems—Python, Node.js, browser (WebGPU/WASM), Java, C++, C#, Go, Swift, iOS, Rust, and Flutter.
    Downloads: 8 This Week
  • 7
    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common...
    Downloads: 37 This Week
  • 8
    BettaFish

    Public opinion analysis system

    ...Unlike simpler analytics tools, BettaFish employs agent collaboration and a “forum” style internal mechanism to combine diverse model outputs, making the analysis richer and more robust. It also integrates multimodal processing, enabling it to parse images and video alongside text.
    Downloads: 0 This Week
  • 9
    AUTOMATIC1111 Stable Diffusion web UI
    ...The interface also supports prompt editing, batch processing, custom scripts, and many community extensions, making it a highly customizable and continually evolving platform for creative AI art generation.
    Downloads: 171 This Week
  • 10
    Keras Hub

    Pretrained model hub for Keras 3

    Keras Hub is a repository of pre-trained models for Keras 3, offering a collection of ready-to-use models for various machine-learning tasks. KerasHub is an extension of the core Keras API; KerasHub components are provided as Layer and Model implementations. If you are familiar with Keras, congratulations. You already understand most of KerasHub.
    Downloads: 0 This Week
  • 11
    Label Sleuth

    Open source no-code system for text annotation and building of text

    An open-source no-code system for text annotation and building text classifiers. No AI knowledge needed. From task definition to working model in just a few hours! While domain experts label their data, Label Sleuth automatically trains appropriate machine learning models in the background. To avoid wasted labeling effort, Label Sleuth employs active learning techniques to guide the user on what to label next.
    Downloads: 0 This Week
  • 12
    stt

    Voice Recognition to Text Tool

    stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into text without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. The project is designed to be easy to deploy: you can run a local Python server that exposes an HTTP API for uploading audio/video files and retrieving transcriptions in different formats. It supports GPU acceleration where available, enabling faster processing on compatible hardware while still offering reliable performance on CPUs alone.
    Downloads: 2 This Week
  • 13
    Omi

    AI that sees your screen and listens to conversations

    ...At its core, omi uses a pipeline of speech-to-text systems, large language models, and memory storage services to transform raw audio and context into meaningful outputs like tasks and reminders. The architecture is modular and extensible, featuring APIs, SDKs, and plugin-like capabilities that allow developers to build custom applications.
    Downloads: 4 This Week
  • 14
    DeerFlow

    Deep Research framework, combining language models with tools

    DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”)...
    Downloads: 126 This Week
  • 15
    LightAutoML

    Fast and customizable framework for automatic ML model creation

    LightAutoML is an automated machine learning (AutoML) framework optimized for efficient model training and hyperparameter tuning, focusing on both tabular and text data.
    Downloads: 0 This Week
  • 16
    Chinese-XLNet

    Chinese XLNet pre-trained model

    Chinese-XLNet is a Chinese language pre-trained model based on the XLNet architecture, providing an advanced foundation for natural language processing tasks in Mandarin and other Chinese dialects. Unlike traditional masked language modeling, XLNet uses a permutation language modeling objective that captures bidirectional context more effectively by training over all possible token orderings, yielding richer contextual representations. This model is trained on large-scale Chinese text datasets to learn linguistic patterns, long-range dependencies, and semantic nuance typical of Chinese writing, making it useful for tasks like text classification, question answering, named entity recognition, and language generation. ...
    Downloads: 0 This Week
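The permutation language modeling objective mentioned above can be illustrated with a small sketch. It assumes a single, hand-picked factorization order and only builds the visibility mask implied by that order; the function names are hypothetical, and this is not the Chinese-XLNet code. A real XLNet additionally uses two-stream attention and predicts only a subset of positions per sampled order.

```python
import random


def permutation_mask(order):
    """Build a visibility mask for one factorization order.

    `order` is a permutation of range(n). mask[i][j] is True when
    position i may attend to position j, i.e. when j comes strictly
    earlier than i in the factorization order.
    """
    n = len(order)
    rank = [0] * n
    for step, pos in enumerate(order):
        rank[pos] = step  # step at which each position is predicted
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def sample_order(n, seed=None):
    """Sample a random factorization order (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    tokens = ["我", "爱", "北", "京"]
    order = [2, 0, 3, 1]  # illustrative order, chosen by hand
    mask = permutation_mask(order)
    for i, tok in enumerate(tokens):
        visible = [tokens[j] for j in range(len(tokens)) if mask[i][j]]
        print(tok, "may see", visible)
```

Because different orders expose different subsets of the context, averaging over many sampled orders lets each position learn from tokens on both sides while the model stays autoregressive.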
  • 17
    Vectorize MCP Server

    Official Vectorize MCP Server

    The Vectorize MCP Server is a Model Context Protocol server that integrates with Vectorize, offering advanced vector retrieval and text extraction capabilities.
    Downloads: 0 This Week
  • 18
    Hugging Face - Speech To Speech

    Open speech-to-speech models and pipelines by Hugging Face

    This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines.
    Downloads: 3 This Week
  • 19
    LiveKit Agents

    Framework for building realtime multimodal voice AI agent apps

    LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities. LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. ...
    Downloads: 2 This Week
  • 20
    TextWorld

    TextWorld is a sandbox learning environment for the training

    TextWorld is a learning environment designed to train reinforcement learning agents to play text-based games, where actions and observations are entirely in natural language. Developed by Microsoft Research, TextWorld focuses on language understanding, planning, and interaction in complex, narrative-driven environments. It generates games procedurally, enabling scalable testing of agents’ natural language processing and decision-making abilities.
    Downloads: 2 This Week
  • 21
    Milvus Bootcamp

    Dealing with all unstructured data, such as reverse image search

    Milvus Bootcamp is a collection of tutorials, examples, and best practices for using Milvus, an open-source vector database designed for AI-powered similarity search and retrieval applications.
    Downloads: 0 This Week
  • 22
    paperless-gpt

    Use LLMs and LLM Vision (OCR) to handle paperless-ngx

    paperless-gpt is an AI-powered extension for document management systems that enhances the capabilities of paperless-ngx by integrating large language models and vision-based OCR to automate document processing and organization. It is designed to transform scanned or uploaded documents into structured, searchable, and intelligently categorized data without requiring manual tagging or sorting. The system uses OCR combined with LLM reasoning to extract text, classify documents, and generate metadata such as tags, titles, and categories automatically. ...
    Downloads: 0 This Week
  • 23
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient handling of large datasets. It supports multiple extraction strategies for different document formats, balancing accuracy and throughput depending on the use case. ...
    Downloads: 0 This Week
  • 24
    LocalAI

    The free, Open Source alternative to OpenAI, Claude and others

    ...It acts as a drop-in replacement for APIs such as OpenAI, enabling developers to build AI-powered applications without relying on external cloud services. The platform supports a wide range of model types, including text generation, image creation, speech processing, and embeddings. LocalAI can run on consumer-grade hardware and does not necessarily require a GPU, making it accessible for local development and private deployments. It integrates with multiple backends like llama.cpp, transformers, and diffusers to support different AI workloads. With its self-hosted architecture and OpenAI-compatible API, LocalAI enables developers to build secure, local-first AI applications.
    Downloads: 15 This Week
  • 25
    OpenAI Privacy Filter

    Bidirectional token-classification model for identifiable info

    OpenAI Privacy Filter is an open-weight machine learning model designed to detect and mask personally identifiable information in text with high efficiency and contextual awareness. It operates as a bidirectional token classification system that labels sensitive data in a single forward pass rather than generating text sequentially, enabling fast processing for large datasets. The model supports long-context inputs, allowing it to analyze extensive documents without chunking, which improves consistency in redaction tasks. ...
    Downloads: 2 This Week
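To make the "label tokens, then mask" idea above concrete, here is a minimal sketch of the masking step only. It assumes a classifier has already produced one label per token along with character offsets; the label names (`EMAIL`, `PHONE`, `O`), the helper names, and the offsets are all hypothetical and are not taken from the model's actual output format.

```python
def merge_spans(labeled_tokens):
    """Merge adjacent tokens that share a sensitive label into spans.

    `labeled_tokens` is a list of (start, end, label) tuples sorted by
    start offset; the label "O" (an assumption) marks non-sensitive tokens.
    """
    spans = []
    for start, end, label in labeled_tokens:
        if label == "O":
            continue
        # Join with the previous span when the label matches and the
        # tokens are separated by at most one character (e.g. a space).
        if spans and spans[-1][2] == label and start - spans[-1][1] <= 1:
            spans[-1] = (spans[-1][0], end, label)
        else:
            spans.append((start, end, label))
    return spans


def redact(text, labeled_tokens):
    """Replace each sensitive span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each edit.
    for start, end, label in reversed(merge_spans(labeled_tokens)):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


if __name__ == "__main__":
    text = "Mail jane@example.com or call 555 0100."
    # Hypothetical per-token predictions from a single forward pass.
    predictions = [
        (0, 4, "O"), (5, 21, "EMAIL"), (22, 24, "O"), (25, 29, "O"),
        (30, 33, "PHONE"), (34, 38, "PHONE"), (38, 39, "O"),
    ]
    print(redact(text, predictions))
```

Since every token is labeled at once, redaction reduces to a span merge plus string replacement, with no sequential text generation involved.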