Showing 106 open source projects for "python text"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 1
    LongCat-Image

    LongCat-Image

    Foundation model for image generation

    LongCat-Image is an open-source foundation model for image generation and editing created by the LongCat team at Meituan, designed to deliver high-quality visual outputs while remaining efficient and accessible for developers and researchers. Rather than relying on massive parameter counts typical of many cutting-edge models, LongCat-Image achieves strong photorealism, stable structure, and accurate bilingual (Chinese and English) text rendering with a more compact ~6-billion parameter...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    Diffusion Bee

    Diffusion Bee

    Diffusion Bee is the easiest way to run Stable Diffusion locally

    Diffusion Bee is a user-friendly local application designed to make running the Stable Diffusion text-to-image generative model as simple as possible on macOS machines, including both Intel and Apple Silicon. It wraps Stable Diffusion and its dependencies into a one-click installer so users don’t need to manually install Python, drivers, or machine-learning frameworks to generate images. The app runs entirely on the local machine so images are created offline and no user data is sent to external servers unless explicitly chosen, preserving privacy. ...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 3
    Qwen3 Embedding

    Qwen3 Embedding

    Designed for text embedding and ranking tasks

    Qwen3-Embedding is a model series from the Qwen family designed specifically for text embedding and ranking tasks. It builds upon the Qwen3 base/dense models and offers several sizes (0.6B, 4B, 8B parameters), for both embedding and reranking, with high multilingual capability, long‐context understanding, and reasoning. It achieves state-of-the-art performance on benchmarks like MTEB (Multilingual Text Embedding Benchmark) and supports instruction-aware embedding (i.e. embedding task...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Qwen3-VL-Embedding

    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    ACE-Step 1.5

    ACE-Step 1.5

    The most powerful local music generation model

    ACE-Step 1.5 is an advanced open-source foundation model for AI-driven music generation that pushes beyond traditional limitations in speed, musical coherence, and controllability by innovating in architecture and training design. It integrates cutting-edge generative techniques—such as diffusion-based synthesis combined with compressed autoencoders and lightweight transformer elements—to produce high-quality full-length music tracks with rapid inference times, capable of generating a...
    Downloads: 67 This Week
    Last Update:
    See Project
  • 6
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    HY-World 2.0

    HY-World 2.0

    A Multi-Modal World Model for Reconstructing, Generating, Simulation

    HY-World 2.0 is a multi-modal world model framework for reconstructing, generating, and simulating navigable 3D worlds from diverse inputs. It accepts text prompts, single-view images, multi-view images, and videos, and produces 3D world representations rather than limiting output to flat video generation. For text and single-image inputs, it generates high-fidelity 3D Gaussian Splatting scenes through a multi-stage pipeline that includes panorama generation, trajectory planning, world...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    VibeVoice

    VibeVoice

    Open-source multi-speaker long-form text-to-speech model

    VibeVoice-1.5B is Microsoft’s frontier open-source text-to-speech (TTS) model designed for generating expressive, long-form, multi-speaker conversational audio such as podcasts. Unlike traditional TTS systems, it excels in scalability, speaker consistency, and natural turn-taking for up to 90 minutes of continuous speech with as many as four distinct speakers. A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz,...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 9
    Qwen-2.5-VL

    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...
    Downloads: 12 This Week
    Last Update:
    See Project
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • 10
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    FastSD CPU

    FastSD CPU

    Fast stable diffusion on CPU and AI PC

    FastSD CPU is an optimized fork of Stable Diffusion designed to run efficiently on CPUs and devices without dedicated GPUs by leveraging Latent Consistency Models and Adversarial Diffusion Distillation techniques that accelerate inference. It focuses on bringing fast text-to-image generation to mainstream hardware like desktop CPUs, lower-end laptops, or edge devices without requiring high-end graphics processors. The repository contains multiple interfaces including a desktop GUI for simple...
    Downloads: 32 This Week
    Last Update:
    See Project
  • 12
    Stable Diffusion Version 2

    Stable Diffusion Version 2

    High-Resolution Image Synthesis with Latent Diffusion Models

    Stable Diffusion (the stablediffusion repo by Stability-AI) is an open-source implementation and reference codebase for high-resolution latent diffusion image models that power many text-to-image systems. The repository provides code for training and running Stable Diffusion-style models, instructions for installing dependencies (with notes about performance libraries like xformers), and guidance on hardware/driver requirements for efficient GPU inference and training. It’s organized as a...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 13
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B),...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    HY-World 1.5

    HY-World 1.5

    A Systematic Framework for Interactive World Modeling

    HY-WorldPlay is a Hunyuan AI project focusing on immersive multimodal content generation and interaction within virtual worlds or simulated environments. It aims to empower AI agents with the capability to both understand and generate multimedia content — including text, audio, image, and potentially 3D or game-world elements — enabling lifelike dialogue, environmental interpretations, and responsive world behavior. The platform targets use cases in digital entertainment, game worlds,...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    FireRedTTS-2

    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 18
    OpenAI Privacy Filter

    OpenAI Privacy Filter

    Bidirectional token-classification model for identifiable info

    OpenAI Privacy Filter is an open-weight machine learning model designed to detect and mask personally identifiable information in text with high efficiency and contextual awareness. It operates as a bidirectional token classification system that labels sensitive data in a single forward pass rather than generating text sequentially, enabling fast processing for large datasets. The model supports long-context inputs, allowing it to analyze extensive documents without chunking, which improves...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 21
    MiniMax-01

    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Hunyuan3D 2.0

    Hunyuan3D 2.0

    High-Resolution 3D Assets Generation with Large Scale Diffusion Models

    The Hunyuan3D-2 model, developed by Tencent, is designed for generating high-resolution 3D assets using large-scale diffusion models. This model offers advanced capabilities for creating detailed 3D models, including texture enhancements, multi-view shape generation, and rapid inference for real-time applications. It is particularly useful for industries requiring high-quality 3D content, such as gaming, film, and virtual reality. Hunyuan3D-2 supports various enhancements and is available...
    Downloads: 32 This Week
    Last Update:
    See Project
  • 23
    FireRed-Image-Edit

    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    HunyuanDiT

    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    WorldGen

    WorldGen

    Generate Any 3D Scene in Seconds

    WorldGen is an AI model and library that can generate full 3D scenes in a matter of seconds from either text prompts or reference images. It is designed to create interactive environments suitable for games, simulations, robotics research, and virtual reality, rather than just static 3D assets. The core idea is that you describe a world in natural language and WorldGen produces a navigable 3D scene that you can freely explore in 360 degrees, with loop closure so that the space remains...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB