Showing 79 open source projects for "gpu faster"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    ChatGLM2-6B

    ChatGLM2-6B

    ChatGLM2-6B: An Open Bilingual Chat LLM

    ChatGLM2-6B is the second-gen Chinese-English conversational LLM from ZhipuAI/Tsinghua. It upgrades the base model with GLM’s hybrid pretraining objective, 1.4 TB bilingual data, and preference alignment—delivering big gains on MMLU, CEval, GSM8K, and BBH. The context window extends up to 32K (FlashAttention), and Multi-Query Attention improves speed and memory use. The repo includes Python APIs, CLI & web demos, OpenAI-style/FASTAPI servers, and quantized checkpoints for lightweight local...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Unsloth Studio

    Unsloth Studio

    Unified web UI for training and running open models locally

    Unsloth Studio is a web-based interface for running and training AI models locally with a unified and user-friendly experience. It allows users to work with a wide range of models for text, audio, vision, embeddings, and more without relying heavily on cloud infrastructure. Built on top of the Unsloth framework, it focuses on high-performance training with reduced VRAM usage and faster speeds compared to traditional methods. The platform supports fine-tuning, pretraining, and reinforcement...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    AlphaZero.jl

    AlphaZero.jl

    A generic, simple and fast implementation of Deepmind's AlphaZero

    Beyond its much publicized success in attaining superhuman level at games such as Chess and Go, DeepMind's AlphaZero algorithm illustrates a more general methodology of combining learning and search to explore large combinatorial spaces effectively. We believe that this methodology can have exciting applications in many different research areas. Because AlphaZero is resource-hungry, successful open-source implementations (such as Leela Zero) are written in low-level languages (such as C++)...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 4
    ncnn

    ncnn

    High-performance neural network inference framework for mobile

    ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It brings artificial intelligence right at your fingertips with no third-party dependencies, and speeds faster than all other known open source frameworks for mobile phone cpu. ncnn allows developers to easily deploy deep learning algorithm models to the mobile platform and create intelligent APPs. It is cross-platform and supports most commonly used CNN networks, including...
    Downloads: 15 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Secret Llama

    Secret Llama

    Fully private LLM chatbot that runs entirely with a browser

    Secret Llama is a privacy-first large-language-model chatbot that runs entirely inside your web browser, meaning no server is required and your conversation data never leaves your device. It focuses on open-source model support, letting you load families like Llama and Mistral directly in the client for fully local inference. Because everything happens in-browser, it can work offline once models are cached, which is helpful for air-gapped environments or travel. The interface mirrors the...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    SageAttention

    SageAttention

    NeurIPS2025 Spotlight] Quantized Attention

    SageAttention is an open-source optimization library designed to accelerate the attention mechanism used in transformer-based neural networks. Since attention operations are often the most computationally expensive component of modern AI models, SageAttention introduces quantization techniques that significantly reduce computational overhead while preserving model accuracy. The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    TensorRT

    TensorRT

    C++ library for high performance inference on NVIDIA GPUs

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers,...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    CosyVoice

    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Isaac ROS Visual SLAM

    Isaac ROS Visual SLAM

    Visual SLAM/odometry package based on NVIDIA-accelerated cuVSLAM

    Discover a faster, easier way to build advanced AI robotics applications with the NVIDIA Isaac™ ROS collection of accelerated computing packages and AI models, bringing NVIDIA acceleration to ROS developers everywhere. Isaac ROS Visual SLAM provides a high-performance, best-in-class ROS 2 package for VSLAM (visual simultaneous localization and mapping).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8 Monitoring Tools in One APM. Install in 5 Minutes. Icon
    8 Monitoring Tools in One APM. Install in 5 Minutes.

    Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

    AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.
    Start Free
  • 10
    ProbabilisticCircuits.jl

    ProbabilisticCircuits.jl

    Probabilistic Circuits from the Juice library

    This module provides a Julia implementation of Probabilistic Circuits (PCs), tools to learn structure and parameters of PCs from data, and tools to do tractable exact inference with them. Probabilistic Circuits provides a unifying framework for several family of tractable probabilistic models. PCs are represented as computational graphs that define a joint probability distribution as recursive mixtures (sum units) and factorizations (product units) of simpler distributions (input units)....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Tiny CUDA Neural Networks

    Tiny CUDA Neural Networks

    Lightning fast C++/CUDA neural network framework

    This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning-fast "fully fused" multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers. We provide a sample application where an image function (x,y) -> (R,G,B) is learned. The fully fused MLP component of this framework requires a very large amount of shared...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    marqo

    marqo

    Tensor search for humans

    A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Lightweight' GAN

    Lightweight' GAN

    Implementation of 'lightweight' GAN, proposed in ICLR 2021

    ...You can turn on automatic mixed precision with one flag --amp. You should expect it to be 33% faster and save up to 40% memory. Aim is an open-source experiment tracker that logs your training runs, and enables a beautiful UI to compare them.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    DeepSpeed

    DeepSpeed

    Deep learning optimization library: makes distributed training easy

    DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference. With DeepSpeed you can: 1. Train/Inference dense or sparse models with billions or trillions of parameters 2. Achieve excellent system throughput and efficiently scale to thousands of GPUs 3. Train/Inference on resource constrained GPU systems 4. Achieve unprecedented low latency and high throughput for inference 5. Achieve extreme...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    VCClient

    VCClient

    Software that uses AI to perform real-time voice conversion

    ...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.
    Downloads: 28 This Week
    Last Update:
    See Project
  • 16
    JOS

    JOS

    JOS is an N-body simulation system written in Java

    ...The idea was to have a simulation system which can use different interaction laws to calculate the force emerging between the objects. Current version implements only Newton's law. For GUI it uses Swing and Java 2D Graphics. GPU version (master branch) does not use Z coordinate of the objects. This is done for faster calculations but can be easily changed. CPU version (arbitrary_precision branch) use Z coordinate. Current visualization is 2D only. Aparapi library is used for GPU computations. The CPU version (jos-cpu.jar) introduces an abstraction for numbers which allows you to choose which implementation to use: primitive type double, common BigDecimal or arbitrary precision ApFloat. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Whishper

    Whishper

    Transcribe any audio to text, translate and edit subtitles 100% locall

    Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt). Easily edit your subtitles right in the Web-UI.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 18
    AnyTXT Searcher

    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    AnyTXT Searcher is a powerful file full-text search engine, a desktop search application for fast document retrieval. Just like a local disk Google search engine, much faster than Windows Search, it is your ideal desktop file content full-text search engine. It has a powerful document parsing engine built in, which extracts the text of commonly used file formats without installing any other software, and combines the built-in high-speed indexing system to store the metadata of the...
    Leader badge
    Downloads: 5,140 This Week
    Last Update:
    See Project
  • 19
    HunyuanVideo-I2V

    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent, extending the capabilities of HunyuanVideo. It allows for high-quality video creation from still images, using PyTorch and providing pre-trained model weights, inference code, and customizable training options. The system includes a LoRA training code for adding special effects and enhancing video realism, aiming to offer versatile and scalable solutions for generating videos from static image inputs.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 20
    ...RC1 is now came out🎉🎉🎉 Note: Old Archived ISOs are based on Devuan Chimaera and MX 21.3 Main Website: https://devuboxlinux.github.io/ Official Channel: https://www.youtube.com/channel/UCywtLk7H83pLhn9mVvEZgAw (Outdated) Ventoy is untested. Use Rufus, BalenaEtcher, or MX Live USB Maker. Created in indonesia🇮🇩 System Requirements: Processor: 1 gigahertz (Ghz) or faster​ with 1 or more cores Memory: 1GB Minimum* (With Swap/Zram) / 2GB+ Recommended Storage: 16GB HDD Minimum / 20GB+ HDD or SSD Recommended Integrated or Discrete GPU with 3D acceleration (128 MB+ VRAM) Legacy BIOS or UEFI (Non Secure Boot) 1024x768 VGA Minimum / 1366x768+ Recommended No TPM and Internet is not required for installation!
    Downloads: 6 This Week
    Last Update:
    See Project
  • 21
    mc-fast-sodium

    mc-fast-sodium

    faster minecraft than some modpacks, beats Fabulously Optimised

    fasest modpack possible, includes Niodium and stuff like that, also optimised for pvp! Minecraft is copyrighted by Mojand, the mods are avaliable either on Curseforge or Modrinth, i did NOT make them requires a minecraft license, PIRACY not allowed in ANY way! minecraft 1.21.1 with fabric 0.16.10, do not use any other config than this one! can get almost any display with almost any pc to 100% capacity(stonger the pc, better the stability and higher can be the resolution. at the start of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    EvaDB

    EvaDB

    Database system for building simpler and faster AI-powered application

    Over the last decade, AI models have radically changed the world of natural language processing and computer vision. They are accurate on various tasks ranging from question answering to object tracking in videos. To use an AI model, the user needs to program against multiple low-level libraries, like PyTorch, Hugging Face, Open AI, etc. This tedious process often leads to a complex AI app that glues together these libraries to accomplish the given task. This programming complexity prevents...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Detectron

    Detectron

    FAIR's research platform for object detection research

    Detectron is an object detection and instance segmentation research framework that popularized many modern detection models in a single, reproducible codebase. Built on Caffe2 with custom CUDA/C++ operators, it provided reference implementations for models like Faster R-CNN, Mask R-CNN, RetinaNet, and Feature Pyramid Networks. The framework emphasized a clean configuration system, strong baselines, and a “model zoo” so researchers could compare results under consistent settings. It includes training and evaluation pipelines that handle multi-GPU setups, standard datasets, and common augmentations, which helped standardize experimental practice in detection research. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    text-dedup

    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    GSvit

    GSvit

    Fast FDTD solver with graphics card support

    Fast FDTD solver with graphics card support. Optimized for nanoscale optics - scanning near field optical microscopy, rough surface scattering and solar cells. Uses CUDA environment for graphics card operation.
    Downloads: 4 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB