Showing 521 open source projects for "text based"

View related business solutions
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    natural

    natural

    General natural language facilities for node

    "Natural" is a general natural language facility for nodejs. It offers a broad range of functionalities for natural language processing. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections are currently supported. It’s still in the early stages, so we’re very interested in bug reports, contributions and the like. Note that many algorithms from Rob Ellis’s node-nltools are being merged into this project and will be maintained from here...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    llama.vim

    llama.vim

    Vim plugin for LLM-assisted code/text completion

    llama.vim is a lightweight Vim plugin that integrates large language model capabilities directly into the Vim text editor. The plugin enables developers to access AI-assisted text and code completion features without leaving their terminal-based development environment. Instead of relying on remote AI services, the plugin is designed to work with locally running LLM inference engines such as llama.cpp. This approach allows developers to benefit from AI-assisted coding features while maintaining full control over their data and avoiding external API dependencies. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Chandra

    Chandra

    OCR model for complex documents with layout-aware structured outputs

    ...Chandra can be run locally using transformer-based inference or deployed with a high-performance server setup for large-scale processing. It also includes command-line tools and optional web-based interfaces to simplify interaction and batch processing workflows.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    LiteParse

    LiteParse

    A fast, helpful, and open-source document parser

    LiteParse is an open-source lightweight parsing library designed to extract structured data from unstructured text using large language models in an efficient and cost-effective manner. It focuses on simplifying the process of turning raw text into structured outputs such as JSON by providing a streamlined interface for prompt-based parsing. The system is designed to minimize overhead, making it suitable for applications where performance and cost are critical considerations. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GenAIScript

    GenAIScript

    Automatable GenAI Scripting

    JavaScript-ish environment with convenient tooling for file ingestion, prompt development, and structured data extraction. A Microsoft tool that generates AI-powered text based on prompts, useful for content creation and automation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    LLaMA-Mesh

    LLaMA-Mesh

    Unifying 3D Mesh Generation with Language Models

    LLaMA-Mesh is a research framework that extends large language models so they can understand and generate 3D mesh data alongside text. The system introduces a method for representing 3D meshes in a textual format by encoding vertex coordinates and face definitions as sequences that can be processed by a language model. By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    dLLM

    dLLM

    dLLM: Simple Diffusion Language Modeling

    dLLM is an open-source framework designed to simplify the development, training, and evaluation of diffusion-based large language models. Unlike traditional autoregressive models that generate text sequentially token by token, diffusion language models generate text through an iterative denoising process that refines masked tokens over multiple steps. This approach allows models to reason over the entire sequence simultaneously and potentially produce more coherent outputs with bidirectional context. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    comfyui-mixlab-nodes

    comfyui-mixlab-nodes

    Workflow and speech recognition app

    comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 10
    Search-Index

    Search-Index

    A persistent, network resilient, full text search library

    Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    VideoCaptioner

    VideoCaptioner

    AI-powered tool for generating, optimizing, and translating subtitles

    VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps. After transcription, large language models...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 12
    EasyVoice

    EasyVoice

    Open source text-to-speech tool, supports extra-long text

    easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    BlogWizard

    BlogWizard

    Generate blog articles from video or audio

    BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. This bridges the gap between modern multimedia content (podcasts, YouTube videos, interviews) and traditional written content, making cross-format publishing more efficient. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    Orpheus TTS is a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone, treating speech synthesis as a large language model problem instead of a traditional TTS pipeline. It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    OpenAI Translator

    OpenAI Translator

    Browser extension and cross-platform desktop app based on ChatGPT API

    ...Our tool allows for mutual translation, polishing and summarization across 55 different languages. Streaming mode is supported! It allows users to customize their translation text. One-click copying, Text-to-Speech (TTS). Available on all platforms (Windows, macOS, and Linux) for both browsers and Desktop.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    WhisperSpeech

    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    Spark TTS

    Spark TTS

    Spark-TTS Inference Code

    Spark TTS is an open-source, PyTorch-based text-to-speech inference system that leverages large language models to produce highly natural, intelligible speech from text input. It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Hollama

    Hollama

    A minimal LLM chat app that runs entirely in your browser

    ...Because the application runs as a static web interface, it does not require complex backend infrastructure and can be easily deployed or self-hosted. Hollama supports both text-based and multimodal interactions, allowing users to work with models that process images as well as text. The interface includes features for editing prompts, retrying responses, copying generated code snippets, and storing conversation history locally within the browser. Mathematical expressions can be rendered using KaTeX, and Markdown formatting allows code blocks and structured outputs to appear clearly within conversations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    TextWorld

    TextWorld

    ​TextWorld is a sandbox learning environment for the training

    TextWorld is a learning environment designed to train reinforcement learning agents to play text-based games, where actions and observations are entirely in natural language. Developed by Microsoft Research, TextWorld focuses on language understanding, planning, and interaction in complex, narrative-driven environments. It generates games procedurally, enabling scalable testing of agents’ natural language processing and decision-making abilities.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    AutoSubs

    AutoSubs

    Instantly generate AI-powered subtitles on your device

    ...It supports both standalone usage and integration with professional video editing software such as DaVinci Resolve, allowing creators to generate and edit subtitles within their existing workflows. The tool leverages speech-to-text models, including OpenAI Whisper, to produce high-quality transcriptions and can differentiate between speakers using diarization techniques. Users can customize subtitle styling, adjust timing, and export results in multiple formats, making it suitable for content creators, filmmakers, and editors. AutoSubs is designed with performance in mind, offering efficient processing through a Rust-based backend and supporting multiple operating systems including Windows, macOS, and Linux.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 22
    Flowly AI

    Flowly AI

    Flowly is 100x faster than OpenClaw

    ...It features a multi-agent architecture where different specialized agents can collaborate, delegate tasks, and operate in parallel. Flowly also includes voice capabilities, enabling real-time phone interactions using speech-to-text and text-to-speech systems. Overall, it provides a powerful, extensible, and privacy-focused alternative to cloud-based AI assistants.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 23
    RealtimeTTS

    RealtimeTTS

    Converts text to speech in realtime

    RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and get audio output almost immediately, which keeps end-to-end latency very low. The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2, Edge TTS, Google TTS, system TTS and others, so you can swap providers without rewriting your pipeline. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    MedGemma

    MedGemma

    Collection of Gemma 3 variants that are trained for performance

    ...The multimodal versions pair a SigLIP-based image encoder pre-trained on diverse de-identified medical imaging data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    repo2txt

    repo2txt

    Web-based tool converts GitHub repository contents

    repo2txt is an open-source developer tool that converts the contents of a code repository into a single structured text file that can be easily consumed by large language models. The tool is designed to address the challenge of analyzing entire codebases with AI assistants, where code is normally distributed across many files and directories. By collecting repository contents and formatting them into a single text document, repo2txt allows developers to feed complete projects into AI systems...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB