67 projects for "automatic" with 2 filters applied:

  • Grafana: The open and composable observability platform Icon
    Grafana: The open and composable observability platform

    Faster answers, predictable costs, and no lock-in built by the team helping to make observability accessible to anyone.

    Grafana is the open source analytics & monitoring solution for every database.
    Learn More
  • Run applications fast and securely in a fully managed environment Icon
    Run applications fast and securely in a fully managed environment

    Cloud Run is a fully-managed compute platform that lets you run your code in a container directly on top of scalable infrastructure.

    Run frontend and backend services, batch jobs, deploy websites and applications, and queue processing workloads without the need to manage infrastructure.
    Try for free
  • 1
    whisper.cpp

    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. ...
    Downloads: 324 This Week
    Last Update:
    See Project
  • 2
    ChatGLM-6B

    ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    ...It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference (INT4, INT8) to reduce GPU memory requirements. Automatic mode switching between precision/memory tradeoffs (full/quantized).
    Downloads: 12 This Week
    Last Update:
    See Project
  • 3
    Segment Anything

    Segment Anything

    Provides code for running inference with the SegmentAnything Model

    ...The architecture separates a powerful image encoder from a lightweight mask decoder, so the heavy vision work can be computed once and the interactive part stays fast. A bundled automatic mask generator can sweep an image and propose many object masks, which is useful for dataset bootstrapping or bulk annotation. The repository includes ready-to-use weights, Python APIs, and example notebooks demonstrating both interactive and automatic modes. Because SAM was trained with an extremely large and diverse mask dataset, it tends to generalize well to new domains, making it a practical starting point for research and production annotation tools.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Portkey AI Gateway

    Portkey AI Gateway

    A blazing fast AI Gateway with integrated guardrails

    ...It presents a single, friendly API through which you can route to 200+ LLMs, while applying configurable input/output guardrails to enforce policies or restrict certain content. It supports automatic retries, fallbacks, load balancing across providers or keys, and request timeouts to avoid latency spikes. The gateway is multimodal: it can handle text, vision, audio, and image models under a common interface. It also offers features for governance: role-based access, compliance with standards (SOC2, HIPAA, GDPR), secure key management, and logging/analytics of usage, latency, errors, and cost. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • viewneo - Smart software for digital advertising boards Icon
    viewneo - Smart software for digital advertising boards

    Smart digital signage for 1 to 1000+ screens.

    viewneo is a user-friendly, cloud-based solution that allows companies of all sizes to set up digital signage
    Learn More
  • 5
    RealtimeSTT

    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Bifrost

    Bifrost

    The Fastest LLM Gateway with built in OTel observability

    ...It is built to be high performance: in benchmark tests at 5,000 requests per second, it reportedly adds only microseconds of overhead and achieves perfect success rates with no failed requests. Bifrost supports features such as automatic fallback (failover between providers), load balancing across API keys/providers, and semantic caching to reduce latency and cost. It also includes observability with built-in metrics, tracing, logging, and supports governance features like rate limiting, access control, and cost budgeting. The architecture is modular: there is a core engine, plugin layers, and transport layers (HTTP APIs).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    AI YouTube Shorts Generator

    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    FastbuildAI

    FastbuildAI

    An open-source AI framework for developers and entrepreneurs

    FastbuildAI is a pragmatic framework for building agentic applications that can plan, call tools, and produce reliable outputs without forcing you into a heavy platform. It emphasizes fast iteration: you describe tasks declaratively, wire up tools with typed schemas, and let the runtime handle planning, retries, and result aggregation. The project leans into reproducibility with run records, seed control, and structured traces so you can compare behaviors across versions and inputs. Prompt...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    Alan AI

    Alan AI

    In-App assistant SDK to build a multimodal conversational UX websites

    ...A powerful web-based IDE where you can write, test and debug dialog scenarios for your voice assistant or chatbot. Alan's AI-backend powered by the industry’s best Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Speech Synthesis. The Alan Cloud provisions and handles the infrastructure required to maintain your voice deployments and perform all the voice processing tasks. To voice enable your app, you only need to get the Alan Client SDK and drop it to your app. No need to plan for, deploy and maintain any infrastructure or speech components - the Alan Platform does the bulk of the work.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Online database software to empower your business management Icon
    Online database software to empower your business management

    Create your own custom database in a fast and easy way, requiring zero technical knowledge.

    TeamDesk is the leading AI-Powered Low-Code platform for creating powerful and flexible web-based databases with no-coding. From small companies to large enterprises, from specific manufactures to vertical business integration, TeamDesk is scalable enough to grow with your business needs.
    Learn More
  • 10
    VideoChat

    VideoChat

    Real-time voice interactive digital human

    VideoChat is a real-time voice-interactive “digital human” system that combines automatic speech recognition, large language models, text-to-speech, and talking-head generation into a single conversational pipeline. It supports both pure end-to-end voice solutions based on multimodal large language models (GLM-4-Voice feeding directly into talking-head generation) and a more traditional cascaded pipeline using ASR → LLM → TTS → talking head.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    EasyVoice

    EasyVoice

    Open source text-to-speech tool, supports extra-long text

    easyVoice is an open-source text-to-speech platform aimed at turning long-form text and novels into high-quality audio, with a strong focus on usability and scalability. It provides a web interface where users can paste or upload large texts and generate speech and subtitles in a single workflow, even for works exceeding 100,000 characters. The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    nanochat

    nanochat

    The best ChatGPT that $100 can buy

    nanochat is a from-scratch, end-to-end “mini ChatGPT” that shows the entire path from raw text to a chatty web app in one small, dependency-lean codebase. The repository stitches together every stage of the lifecycle: tokenizer training, pretraining a Transformer on a large web corpus, mid-training on dialogue and multiple-choice tasks, supervised fine-tuning, optional reinforcement learning for alignment, and finally efficient inference with caching. Its north star is approachability and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Story Flicks

    Story Flicks

    Generate high-definition story short videos with one click using AI

    Story Flicks is another open-source project in the AI-assisted video generation / editing space, focused on creating short, story-style videos from script or prompt inputs. It aims to let users generate high-definition short movies or video stories with minimal manual effort, using AI models under the hood to assemble visuals, timing, and possibly narration or subtitles. For creators who want to produce narrative short-form content — whether for social media, storytelling, or prototyping...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    LLM Course

    LLM Course

    Course to get into Large Language Models (LLMs)

    ...The materials also cover inference optimization and quantization to make serving LLMs feasible on commodity GPUs or even CPUs, which is crucial for side projects and startups. Evaluation is treated as a first-class topic, with examples of automatic and human-in-the-loop methods to catch regressions and verify quality beyond simple loss values. By the end, students have a mental model and a practical toolkit for iterating on datasets, training configs, etc.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Lingvo

    Lingvo

    Framework for building neural networks

    ...It has been used to implement state of the art architectures such as recurrent neural networks, Transformer models, variational autoencoder hybrids, and multi task systems. Lingvo includes reference models and configurations for domains like machine translation, automatic speech recognition, language modeling, image understanding, and 3D object detection. Centralized hyperparameter configuration files allow researchers to share exact experiment setups so others can retrain and compare results reliably.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while still benefiting from the robust data preparation practices developed in the speech community. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Bailing

    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users. The project is modular: each core function — ASR, VAD, LLM, TTS — exists as a separately replaceable component, which allows flexibility in picking your preferred models depending on resources or languages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    PaSa

    PaSa

    An advanced paper search agent powered by large language models

    PaSa is an open-source “paper search agent” built around large language models (LLMs), designed to automate the process of academic literature retrieval with human-like decision making. Instead of simply translating a query into keywords and returning a flat list of matching papers, PaSa uses a dual-agent architecture (Crawler + Selector) that can iteratively search, read, analyze, and filter academic publications — simulating how a researcher might dig through citation networks, expand...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Armadillo

    Armadillo

    fast C++ library for linear algebra & scientific computing

    * Fast C++ library for linear algebra (matrix maths) and scientific computing * Easy to use functions and syntax, deliberately similar to Matlab / Octave * Uses template meta-programming techniques to increase efficiency * Provides user-friendly wrappers for OpenBLAS, Intel MKL, LAPACK, ATLAS, ARPACK, SuperLU and FFTW libraries * Useful for machine learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. * Downloads:...
    Leader badge
    Downloads: 2,116 This Week
    Last Update:
    See Project
  • 20
    Provides optical character recognition (OCR) solutions for Vietnamese language.
    Leader badge
    Downloads: 225 This Week
    Last Update:
    See Project
  • 21
    Google2SRT

    Google2SRT

    Download, save and convert multiple subtitles from YouTube videos

    Google2SRT allows you to download, save and convert multiple subtitles and translations from YouTube and Google Video to SubRip (.srt) format, which is recognized by most video players. You can download XML subtitles or simply type video's URL, Google2SRT will do the rest.
    Downloads: 74 This Week
    Last Update:
    See Project
  • 22
    ControlNet

    ControlNet

    Let us control diffusion models

    ...The project includes many trained model variants that accept different types of conditioning (e.g., canny edge input, normal maps, skeletal pose) and produce improved fidelity in stable diffusion outputs. It is widely adopted in the community as a go-to tool for semi-automatic image generation workflows, especially when users want structure plus creative freedom.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Anse

    Anse

    Supercharged experience for multiple models such as ChatGPT

    ...It emphasizes a clean, user-friendly experience and supports different conversation modes (single prompt, continuous dialogue, image generation, etc.). Anse uses client-side storage (IndexDB) to keep session history locally, prioritizing user privacy and avoiding automatic uploads of sensitive chat content. It is responsive and optimized for mobile, supports dark mode, and is designed for deployment in a variety of environments (Vercel, Netlify, Docker, etc.). The architecture includes a plugin/provider system so that new model endpoints or providers can be added easily, making it extensible. While it’s designed for end-users, developers can self-host and customize it, adjust parameters, or adapt its UI to match specific needs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Audio Webui

    Audio Webui

    A webui for different audio related Neural Networks

    ...The project supports multiple back-end models and toolchains (such as Bark, RVC, AudioLDM, Audiocraft, and other text-to-audio or voice-cloning tools), exposing them through a consistent UI for inference and experimentation. Installation is streamlined through automatic installers and platform-specific scripts that create a virtual environment, install dependencies, and launch the web app with minimal manual setup. For more advanced users, it exposes a rich set of command-line flags to control behavior such as skipping installation, disabling venv, changing model cache directories, sharing Gradio links, setting passwords, and specifying themes or ports.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    OpenAI OpenAPI

    OpenAI OpenAPI

    OpenAPI specification for the OpenAI API

    openai-openapi is the official OpenAPI specification for the OpenAI API, maintained by OpenAI to provide a standardized, machine-readable contract for all supported endpoints. The repository ensures that developers and toolchains can integrate with OpenAI services using a consistent schema across SDKs, code generators, and API clients. It contains both an automatically documented version of the spec and a manually updated branch for reference. Developers can use this specification to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next