20 projects for "gpu faster" with 1 filter applied:

  • 1
    Alacritty

    A cross-platform, GPU-accelerated terminal emulator

    ...With such a strong focus on simplicity and performance, Alacritty’s included features are very carefully considered, ensuring that it remains blazingly fast. It uses the GPU for rendering, which makes a whole lot of optimizations possible. In benchmarks against various terminals, Alacritty has been shown to be either faster or way faster than the others. Alacritty requires no additional setup, but still allows configuration of many aspects of the terminal. It supports Windows, macOS, Linux and BSD.
    Downloads: 13 This Week
  • 2
    LuxTTS

    A high-quality rapid TTS voice cloning model

    ...Its design emphasizes efficiency and practicality, fitting within modest GPU memory footprints.
    Downloads: 3 This Week
  • 3
    DeepEP

    DeepEP: an efficient expert-parallel communication library

    DeepEP is a communication library designed specifically to support Mixture-of-Experts (MoE) and expert parallelism (EP) deployments. Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP...
    Downloads: 0 This Week
  • 4
    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...
    Downloads: 13 This Week
  • 5
    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 5 This Week
  • 6
    XFrames

    GPU-accelerated GUI development for Node.js and the browser

    xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.
    Downloads: 0 This Week
  • 7
    SageAttention

    [NeurIPS 2025 Spotlight] Quantized Attention

    SageAttention is an open-source optimization library designed to accelerate the attention mechanism used in transformer-based neural networks. Since attention operations are often the most computationally expensive component of modern AI models, SageAttention introduces quantization techniques that significantly reduce computational overhead while preserving model accuracy. The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices...
    Downloads: 2 This Week
  • 8
    Secret Llama

    Fully private LLM chatbot that runs entirely within a browser

    Secret Llama is a privacy-first large-language-model chatbot that runs entirely inside your web browser, meaning no server is required and your conversation data never leaves your device. It focuses on open-source model support, letting you load families like Llama and Mistral directly in the client for fully local inference. Because everything happens in-browser, it can work offline once models are cached, which is helpful for air-gapped environments or travel. The interface mirrors the...
    Downloads: 5 This Week
  • 9
    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech...
    Downloads: 4 This Week
  • 10
    ProbabilisticCircuits.jl

    Probabilistic Circuits from the Juice library

    This module provides a Julia implementation of Probabilistic Circuits (PCs), tools to learn the structure and parameters of PCs from data, and tools to do tractable exact inference with them. Probabilistic Circuits provide a unifying framework for several families of tractable probabilistic models. PCs are represented as computational graphs that define a joint probability distribution as recursive mixtures (sum units) and factorizations (product units) of simpler distributions (input units)....
    Downloads: 0 This Week
  • 11
    DeepSpeed

    Deep learning optimization library: makes distributed training easy

    DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference. With DeepSpeed you can: 1. Train/Inference dense or sparse models with billions or trillions of parameters 2. Achieve excellent system throughput and efficiently scale to thousands of GPUs 3. Train/Inference on resource constrained GPU systems 4. Achieve unprecedented low latency and high throughput for inference 5. Achieve extreme...
    Downloads: 1 This Week
  • 12
    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent, extending the capabilities of HunyuanVideo. It allows for high-quality video creation from still images, using PyTorch and providing pre-trained model weights, inference code, and customizable training options. The system includes LoRA training code for adding special effects and enhancing video realism, aiming to offer versatile and scalable solutions for generating videos from static image inputs.
    Downloads: 8 This Week
  • 13
    Detectron

    FAIR's research platform for object detection research

    Detectron is an object detection and instance segmentation research framework that popularized many modern detection models in a single, reproducible codebase. Built on Caffe2 with custom CUDA/C++ operators, it provided reference implementations for models like Faster R-CNN, Mask R-CNN, RetinaNet, and Feature Pyramid Networks. The framework emphasized a clean configuration system, strong baselines, and a “model zoo” so researchers could compare results under consistent settings. It includes training and evaluation pipelines that handle multi-GPU setups, standard datasets, and common augmentations, which helped standardize experimental practice in detection research. ...
    Downloads: 0 This Week
  • 14
    AI-powered enterprise search engine

    AI-powered enterprise search engine

    Gerev is an open-source, AI-powered enterprise search engine designed to help organizations quickly locate and retrieve information scattered across multiple internal tools, documents, and communication platforms. It enables users to search across sources such as Slack, Confluence, Jira, Google Drive, and other enterprise systems, consolidating fragmented knowledge into a single, unified search experience. By leveraging natural language processing, Gerev allows...
    Downloads: 0 This Week
  • 15
    Point-E

    Point cloud diffusion for 3D model synthesis

    point-e is the official repository for Point-E, a generative model developed by OpenAI that produces 3D point clouds from textual (or image) prompts. Its principal advantage is speed: it can generate 3D assets in just 1–2 minutes on a single GPU, which is significantly faster than many competing text-to-3D models. The model works via a two-stage diffusion approach: first, it uses a text → image diffusion network to produce a synthetic 2D view consistent with the prompt; then a second diffusion model converts that image into a 3D point cloud. While it does not match the fine detail of some slower methods, the tradeoff in speed makes it practical for prototyping and interactive 3D generation. ...
    Downloads: 0 This Week
  • 16
    HiFi-GAN

    Generative Adversarial Networks for Efficient and High Fidelity Speech

    ...It introduces a generator architecture tailored to model the periodic structure of speech and a set of discriminators that focus on different scales and periods of the waveform to better capture naturalness. The model targets a sweet spot between sample quality and generation speed, outperforming many previous GAN vocoders while being far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
    Downloads: 0 This Week
  • 17
    maskrcnn-benchmark

    Fast, modular reference implementation of Instance Segmentation

    Mask R-CNN Benchmark is a PyTorch-based framework that provides high-performance implementations of object detection, instance segmentation, and keypoint detection models. Originally built to benchmark Mask R-CNN and related models, it offers a clean, modular design to train and evaluate detection systems efficiently on standard datasets like COCO. The framework integrates critical components—region proposal networks (RPNs), RoIAlign layers, mask heads, and backbone architectures such as...
    Downloads: 0 This Week
  • 18
    LUMINOTH

    Deep Learning toolkit for Computer Vision

    LUMINOTH is an open-source deep learning toolkit designed for computer vision tasks, particularly object detection. The framework is implemented in Python and built on top of TensorFlow and the Sonnet neural network library, providing a modular environment for training and deploying detection models. It was created to simplify the process of building and experimenting with deep learning models capable of identifying objects within images. Luminoth includes support for popular object...
    Downloads: 0 This Week
  • 19
    SOAP3-DP

    Fast, Accurate and Sensitive GPU-based Short Read Aligner

    Latest code on GitHub: https://github.com/aquaskyline/SOAP3-dp
    SOAP3-dp, through leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. ...
    Downloads: 0 This Week
  • 20
    MICA-aligner

    Next-generation sequencing short reads aligner based on Intel® MIC

    Latest code on GitHub: https://github.com/aquaskyline/MICA-aligner
    To better utilize MIC-enabled computers for NGS data analysis, we developed a new short-read aligner, MICA, that is optimized in view of MIC’s limitations and the extra parallelism inside each MIC core. Experiments on aligning 150bp paired-end reads show that MICA using one MIC board is ~4.85 times faster than the CPU-(multi-core)-based BWA-MEM and about the same speed as the GPU-based SOAP3-dp. Furthermore, MICA’s simplicity allows very efficient scale-up when multiple MIC boards are used in a node (3 cards give a 14-fold speedup over 6-core BWA-MEM).
    Downloads: 0 This Week