Showing 2564 open source projects for "whisper-windows"

  • 1
    FLUX.1

    Official inference repo for FLUX.1 models

    FLUX.1 repository contains inference code and tooling for the FLUX.1 text-to-image diffusion models, enabling developers and researchers to generate and edit images from natural-language prompts using open-weight versions of the model on their own hardware or within custom applications. The project is part of a larger family of FLUX models developed by Black Forest Labs, designed to produce high-quality, detailed visuals from text descriptions with competitive prompt adherence and artistic...
    Downloads: 29 This Week
  • 2
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 24 This Week
  • 3
    MarkItDown

    Python tool for converting files and office documents to Markdown

    MarkItDown is a lightweight Python utility developed by Microsoft for converting various files and office documents to Markdown format. It is particularly useful for preparing documents for use with large language models and related text analysis pipelines.
    Downloads: 21 This Week
  • 4
    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. It offers high-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 22 This Week
  • 5
    Spec Kit

    Toolkit to help you get started with Spec-Driven Development

    Spec Kit is an open-source toolkit designed to enable specification-driven development workflows powered by AI coding assistants. It introduces a structured process in which developers define detailed specifications first, then allow AI tools to generate plans, tasks, and implementation code aligned with those requirements. The toolkit provides scaffolding, prompt templates, and automation scripts that help teams maintain a clear source of truth throughout the development lifecycle. By...
    Downloads: 25 This Week
  • 6
    MoneyPrinterTurbo

    Generate short videos with one click using AI LLM

    MoneyPrinterTurbo is an AI-driven tool that enables users to generate high-definition short videos with minimal input. By providing a topic or keyword, the system automatically creates video scripts, sources relevant media assets, adds subtitles, and incorporates background music, resulting in a polished video ready for distribution.
    Downloads: 22 This Week
  • 7
    OBLITERATUS

    OBLITERATE THE CHAINS THAT BIND YOU

    OBLITERATUS is an advanced open-source toolkit designed to analyze and modify the internal behavior of large language models by identifying and removing mechanisms responsible for refusal or restricted responses. It implements a set of techniques collectively referred to as “abliteration,” which target specific internal representations within neural networks to alter how models respond to certain prompts. Unlike traditional fine-tuning approaches, OBLITERATUS operates directly on model...
    Downloads: 24 This Week
  • 8
    ebook2audiobook

    Generate audiobooks from e-books, voice cloning & 1107+ languages

    ebook2audiobook is a tool to convert legally obtained eBooks (non-DRM) into fully narrated audiobooks, complete with chapters and metadata. It automates the pipeline: it reads the eBook file, splits it into appropriate segments (chapters, paragraphs), uses text-to-speech (TTS) models to synthesize audio, optionally applies voice cloning, and outputs a final audiobook — ideal for people who prefer listening over reading, or for accessibility purposes. The tool supports a wide array of...
    Downloads: 26 This Week
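
The segmentation step described for ebook2audiobook (splitting a book into chapters and paragraphs before synthesis) might look roughly like the sketch below. This is not the project's code: the function name `split_into_segments`, the "Chapter N" heading rule, and the sample text are all assumptions made for illustration, and the TTS step itself is left out.

```python
import re


def split_into_segments(text):
    """Group blank-line-separated blocks of text under chapter headings.

    Hypothetical rule: a block that starts with "Chapter <number>" opens a
    new chapter; every other block is a paragraph of the current chapter.
    Text that appears before the first heading is ignored in this sketch.
    """
    chapters = []
    current = None
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        if re.match(r"Chapter\s+\d+", block):
            current = {"title": block, "paragraphs": []}
            chapters.append(current)
        elif current is not None:
            current["paragraphs"].append(block)
    return chapters


# Made-up sample text, used only to exercise the function.
book = (
    "Chapter 1\n\nIt was a dark night.\n\nThe wind howled.\n\n"
    "Chapter 2\n\nMorning came."
)
chapters = split_into_segments(book)

for chapter in chapters:
    print(chapter["title"], "has", len(chapter["paragraphs"]), "paragraph(s)")
```

Each paragraph in the result would then be handed to whichever TTS model is in use, one segment at a time.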
  • 9
    OpenMythos

    A theoretical reconstruction of the Claude Mythos architecture

    OpenMythos is an experimental, open-source implementation that attempts to reconstruct a hypothesized architecture behind advanced language models using a design called a Recurrent-Depth Transformer. The project explores the idea that instead of stacking hundreds of unique transformer layers, a smaller set of layers can be reused iteratively during inference to achieve deeper reasoning without increasing parameter count. It divides computation into three main stages, including a...
    Downloads: 24 This Week
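
The idea in the OpenMythos description, reusing a small set of layers iteratively rather than stacking many unique ones, can be illustrated with a toy sketch. It is not OpenMythos code and not a real transformer: the "layer" here is just a small weight matrix with a residual tanh update, and every name (`make_layer`, `run_recurrent`, and so on) is hypothetical. It only shows that looping shared layers reaches the same number of layer applications with fewer parameters.

```python
import math
import random


def make_layer(width, rng):
    """Create one toy layer: a width x width weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(width)]


def apply_layer(weights, x):
    """Residual update x + tanh(W x), a stand-in for a transformer block."""
    return [
        xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def count_params(layers):
    """Total number of weights across a list of layers."""
    return sum(len(row) for layer in layers for row in layer)


def run_stacked(layers, x):
    """Conventional depth: each step uses its own unique layer."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x


def run_recurrent(shared_layers, x, loops):
    """Recurrent depth: the same few layers are applied `loops` times."""
    for _ in range(loops):
        for layer in shared_layers:
            x = apply_layer(layer, x)
    return x


rng = random.Random(0)
width = 4
loops = 4
stacked = [make_layer(width, rng) for _ in range(8)]  # 8 unique layers
shared = [make_layer(width, rng) for _ in range(2)]   # 2 layers, reused

x0 = [1.0, 0.0, -1.0, 0.5]
y_stacked = run_stacked(stacked, x0)
y_recurrent = run_recurrent(shared, x0, loops)

stacked_params = count_params(stacked)    # 8 layers * 16 weights
recurrent_params = count_params(shared)   # 2 layers * 16 weights
effective_depth = len(shared) * loops     # layer applications in the loop

print("stacked params:", stacked_params)
print("recurrent params:", recurrent_params, "at depth", effective_depth)
```

Raising `loops` deepens the computation at inference time while `recurrent_params` stays fixed, which is the trade-off the description points to.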
  • 10
    Hunyuan3D-2.1

    From Images to High-Fidelity 3D Assets

    ...Physically Based Rendering texture synthesis to model realistic material effects, including reflections, subsurface scattering, etc. Cross-platform support (macOS, Windows, Linux) via Python / PyTorch, including diffusers-style APIs.
    Downloads: 18 This Week
  • 11
    Animated Drawings

    Code to accompany "A Method for Animating Children's Drawings"

    AnimatedDrawings is a framework that converts user sketches or line drawings into fully animated 2D motion sequences using learned motion priors. The idea is that you draw a simple static figure (stick figure, silhouette, or contour lines), and the system produces plausible skeletal motion (walking, jumping, dancing) that adheres to the drawn shape constraints. The architecture separates shape embedding (to understand user-drawn geometry) from motion embedding / generation (to produce...
    Downloads: 24 This Week
  • 12
    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. ...
    Downloads: 19 This Week
  • 13
    Qwen3

    Qwen3 is the large language model series developed by the Qwen team

    Qwen3 is a cutting-edge large language model (LLM) series developed by the Qwen team at Alibaba Cloud. The latest updated version, Qwen3-235B-A22B-Instruct-2507, features significant improvements in instruction-following, reasoning, knowledge coverage, and long-context understanding up to 256K tokens. It delivers higher quality and more helpful text generation across multiple languages and domains, including mathematics, coding, science, and tool usage. Various quantized versions,...
    Downloads: 20 This Week
  • 14
    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers. This design helps decouple semantic and acoustic...
    Downloads: 25 This Week
  • 15
    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Key features: ultra-lightweight (model size under 25 MB); CPU-optimized (runs without a GPU on any device); high-quality voices (several premium voice options available); fast inference (optimized for real-time speech synthesis).
    Downloads: 22 This Week
  • 16
    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    ...This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.
    Downloads: 18 This Week
  • 17
    AutoGPT

    Powerful tool that lets you create and run intelligent agents

    AutoGPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM "thoughts" to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, AutoGPT pushes the boundaries of what is possible with AI.
    Downloads: 19 This Week
  • 18
    Open Notebook

    An Open Source implementation of Notebook LM with more flexibility

    Open Notebook is an open-source, privacy-focused alternative to Google’s Notebook LM that gives users full control over their research and AI workflows. Designed to be self-hosted, it ensures complete data sovereignty by keeping your content local or within your own infrastructure. The platform supports 16+ AI providers—including OpenAI, Anthropic, Ollama, Google, and LM Studio—allowing flexible model choice and cost optimization. Open Notebook enables users to organize and analyze...
    Downloads: 22 This Week
  • 19
    pyttsx3

    Offline Text To Speech synthesis for Python

    ...It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. ...
    Downloads: 15 This Week
  • 20
    EasyOCR

    Ready-to-use OCR with 80+ supported languages

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc. EasyOCR is a Python module for extracting text from images. It is a general OCR that can read both natural scene text and dense text in documents. We are currently supporting 80+ languages and expanding. Second-generation models: multiple times smaller size, multiple times faster inference, additional characters and comparable accuracy to the first...
    Downloads: 20 This Week
  • 21
    Kimi Code CLI

    Kimi Code CLI is your next CLI agent

    Kimi CLI is a command-line AI agent that brings an intelligent software development assistant directly into your terminal, helping you with coding tasks, shell operations, and workflow automation without leaving your command prompt. It supports an interactive shell-like user interface where you can chat with the agent, request code edits, run shell commands, and receive contextual suggestions as you work, creating a seamless blend of AI-augmented development and traditional terminal usage....
    Downloads: 20 This Week
  • 22
    Qwen3-Coder

    Qwen3-Coder is the code version of Qwen3

    Qwen3-Coder is the latest and most powerful agentic code model developed by the Qwen team at Alibaba Cloud. Its flagship version, Qwen3-Coder-480B-A35B-Instruct, features a massive 480 billion-parameter Mixture-of-Experts architecture with 35 billion active parameters, delivering top-tier performance on coding and agentic tasks. This model sets new state-of-the-art results among open models for agentic coding, browser-use, and tool-use, with performance comparable to leading models...
    Downloads: 22 This Week
  • 23
    OpenHands

    Open-source autonomous AI software engineer

    Welcome to OpenHands (formerly OpenDevin), an open-source autonomous AI software engineer that is capable of executing complex engineering tasks and collaborating actively with users on software development projects. Use AI to tackle the toil in your backlog, so you can focus on what matters: hard problems, creative challenges, and over-engineering your dotfiles. We believe agentic technology is too important to be controlled by a few corporations. So we're building all our agents in the...
    Downloads: 17 This Week
  • 24
    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 18 This Week
  • 25
    LangChain

    ⚡ Building applications with LLMs through composability ⚡

    Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge. This library is aimed at assisting in the development of those types of applications.
    Downloads: 21 This Week