Search Results for "text generation" - Page 2

Showing 574 open source projects for "text generation"

  • 1
    HY-World 2.0

    A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

    HY-World 2.0 is a multi-modal world model framework for reconstructing, generating, and simulating navigable 3D worlds from diverse inputs. It accepts text prompts, single-view images, multi-view images, and videos, and produces 3D world representations rather than limiting output to flat video generation. For text and single-image inputs, it generates high-fidelity 3D Gaussian Splatting scenes through a multi-stage pipeline that includes panorama generation, trajectory planning, world expansion, and world composition. ...
    Downloads: 17 This Week
  • 2
    Voicebox

    The open-source voice synthesis studio powered by Qwen3-TTS

    Voicebox is a local-first voice synthesis studio that aims to bring professional, DAW-like voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms by emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool supports downloading voice models, cloning voices from short audio samples, and generating speech locally, then organizing the...
    Downloads: 117 This Week
  • 3
    LlamaGen

    Autoregressive Model Beats Diffusion

    LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image tokenization techniques can produce competitive results compared with modern diffusion-based image generators. ...
    Downloads: 0 This Week
  • 4
    AudioCraft

    AudioCraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. ...
    Downloads: 6 This Week
  • 5
    Z-Image

    Image generation model with single-stream diffusion transformer

    Z-Image is an efficient, open-source image generation foundation model built to make high-quality image synthesis more accessible. With just 6 billion parameters — far fewer than many large-scale models — it uses a novel “single-stream diffusion Transformer” architecture to deliver photorealistic image generation, demonstrating that excellence does not always require extremely large model sizes.
    Downloads: 55 This Week
  • 6
    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers.
    Downloads: 48 This Week
  • 7
    StarVector

    StarVector is a foundation model for SVG generation

    StarVector is a multimodal foundation model designed for generating Scalable Vector Graphics (SVG) from images or textual descriptions. The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into structured vector representations, enabling high-quality vectorization and design generation.
    Downloads: 1 This Week
  • 8
    Mermaid

    Diagram and flowchart generation from text similar to markdown

    Mermaid is a JavaScript-based diagram and flowchart generating tool that uses markdown-inspired text for fast and easy generation of diagrams and charts. Forget about using heavy tools to explain your code. Mermaid greatly simplifies documentation with its simple markdown-like script language, and offers a great range of diagram and chart options. The latest version of Mermaid comes with a number of bug fixes and enhancements, as well as a new diagram type, entity relationship diagrams. ...
    Downloads: 105 This Week
  • 9
    Nextra

    Simple, powerful and flexible site generation framework

    Simple, powerful, and flexible site generation framework with everything you love from Next.js. Nextra automatically converts Markdown links and images to use Next.js Link and Next.js Image when possible. No slow navigation or layout shift.
    Downloads: 0 This Week
  • 10
    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like chatbots, podcasts, and applications where dynamic turn-taking between speakers is essential. ...
    Downloads: 3 This Week
  • 11
    FLUX.2

    Official inference repo for FLUX.2 models

    FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved.
    Downloads: 47 This Week
  • 12
    PHP Client For NLP Cloud

    NLP Cloud serves high performance pre-trained or custom models for NER

    NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbots, grammar and spelling correction, keyword and keyphrase extraction, text generation, image generation, blog post generation, code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. ...
    Downloads: 1 This Week
  • 13
    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    VibeVoice ComfyUI is a comprehensive wrapper that integrates Microsoft’s VibeVoice text-to-speech models directly into ComfyUI workflows. It exposes VibeVoice as a set of custom nodes so you can build single-speaker and multi-speaker voice generation pipelines visually, combining TTS with other audio or generative components. The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each speaker. ...
    Downloads: 5 This Week
  • 14
    LongCat-Image

    Foundation model for image generation

    ...The model excels at both text-to-image generation and instruction-guided image editing, offering users versatile capabilities for creative and practical tasks—whether generating art, mockups, or adjusting existing visuals with fine control.
    Downloads: 2 This Week
  • 15
    Hunyuan3D 2.0

    High-Resolution 3D Assets Generation with Large Scale Diffusion Models

    The Hunyuan3D-2 model, developed by Tencent, is designed for generating high-resolution 3D assets using large-scale diffusion models. This model offers advanced capabilities for creating detailed 3D models, including texture enhancements, multi-view shape generation, and rapid inference for real-time applications. It is particularly useful for industries requiring high-quality 3D content, such as gaming, film, and virtual reality. Hunyuan3D-2 supports various enhancements and is available...
    Downloads: 37 This Week
  • 16
    node-llama-cpp

    Run AI models locally on your machine with node.js bindings for llama

    ...The system automatically detects the available hardware on a machine and selects the most appropriate compute backend, including CPU or GPU acceleration. Developers can use the library to perform tasks such as text generation, conversational chat, embedding generation, and structured output generation. Because it runs models locally, the platform is particularly useful for privacy-sensitive environments or offline AI deployments.
    Downloads: 5 This Week
  • 17
    ACE-Step 1.5

    The most powerful local music generation model

    ...Beyond straightforward text-to-music synthesis, ACE-Step 1.5 enables flexible creative workflows, including tasks like cover generation, editing existing tracks, transforming vocals to background accompaniment, and stylistic personalization using low-rank adaptation from just a few example songs.
    Downloads: 82 This Week
  • 18
    Nexa SDK

    Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML

    Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), speech-to-text (ASR), and text-to-speech (TTS). Additionally, it offers an OpenAI-compatible API server with JSON schema mode for function calling and streaming support, and a user-friendly Streamlit UI. Users can run Nexa SDK on any device with a Python environment, and GPU acceleration is supported, including CUDA, Metal, and ROCm. ...
    Downloads: 3 This Week
  • 19
    Pocket TTS

    A TTS that fits in your CPU (and pocket)

    Pocket TTS is a lightweight text-to-speech project designed to run efficiently on CPUs, targeting developers who want local speech generation without depending on GPUs or hosted web APIs. It is built to feel practical in everyday applications, where installation and usage should be as simple as adding a dependency and calling a function. The project focuses on keeping the runtime footprint manageable while still producing natural-sounding speech, which makes it attractive for offline tools, prototypes, and privacy-sensitive workflows. ...
    Downloads: 3 This Week
  • 20
    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including photorealistic, impressionist, anime, and minimalist aesthetics. ...
    Downloads: 12 This Week
  • 21
    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance. The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. ...
    Downloads: 5 This Week
  • 22
    Easy Diffusion

    An easy 1-click way to create beautiful artwork on your PC using AI

    ...Because it’s designed to be easy to install and use, EasyDiffusion’s interface includes options for queuing multiple jobs, applying modifiers like upscaling or face correction, and adjusting generation parameters like guidance scale and resolution.
    Downloads: 34 This Week
  • 23
    AI Notes

    Curated AI engineering notes on LLMs, generative models, and tools

    ...It is designed to help software engineers quickly understand modern AI concepts, tools, and developments through structured documentation and research notes. It functions as a living knowledge base composed of numerous markdown files that organize topics such as text generation, image generation, AI infrastructure, and code generation models. These notes include observations, references, experiments, and summaries of important research and industry developments in AI. ai-notes also contains collections of prompts, curated learning materials, and categorized resources intended to help developers explore AI capabilities and practical applications.
    Downloads: 0 This Week
  • 24
    Cookbook (Google Gemini)

    Examples and guides for using the Gemini API

    ...It provides a structured learning path with quick-start tutorials for beginners and practical examples for advanced users. The repository covers a wide range of Gemini capabilities, including text, images, video, speech, robotics, and multimodal interactions. It highlights newly introduced features such as Gemini 2.5 models (Flash and Pro), Gemini’s native image generation, Veo for video generation, robotics-focused reasoning models, and Lyria for TTS and music generation. The Cookbook also includes tutorials on advanced API workflows such as grounding answers with external tools, batch-mode request handling, and live multimodal interactivity with LiveAPI. ...
    Downloads: 4 This Week
  • 25
    vim-ai

    AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim

    vim-ai is an AI-powered assistant plugin for Vim and Neovim that brings language-model features directly into the editor. It allows users to generate code or text, edit selections in place, and carry on interactive chat-style conversations without leaving the terminal editing environment. The plugin is built around OpenAI-compatible APIs, which means it can work not only with OpenAI itself but also with compatible proxies and alternative providers. Its command set covers text completion, editing, chat continuation, image generation, and debugging utilities, making it more versatile than a narrow autocomplete add-on. ...
    Downloads: 0 This Week