Showing 79 open source projects for "gpu faster"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 8 Monitoring Tools in One APM. Install in 5 Minutes. Icon
    8 Monitoring Tools in One APM. Install in 5 Minutes.

    Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

    AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.
    Start Free
  • 1
    AI-powered enterprise search engine

    AI-powered enterprise search engine

    AI-powered enterprise search engine

    AI-powered enterprise search engine is an open-source, AI-powered enterprise search engine designed to help organizations quickly locate and retrieve information scattered across multiple internal tools, documents, and communication platforms. It enables users to search across sources such as Slack, Confluence, Jira, Google Drive, and other enterprise systems, consolidating fragmented knowledge into a single, unified search experience. By leveraging natural language processing, Gerev allows...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    ChatLLM Web

    ChatLLM Web

    Chat with LLM like Vicuna totally in your browser with WebGPU

    ...Powered By web-llm. To use this app, you need a browser that supports WebGPU, such as Chrome 113 or Chrome Canary. Chrome versions ≤ 112 are not supported. You will need a GPU with about 6.4GB of memory. If your GPU has less memory, the app will still run, but the response time will be slower. The first time you use the app, you will need to download the model. For the Vicuna-7b model that we are currently using, the download size is about 4GB. After the initial download, the model will be loaded from the browser cache for faster usage.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    FFCV

    FFCV

    Fast Forward Computer Vision (and other ML workloads!)

    ffcv is a drop-in data loading system that dramatically increases data throughput in model training. From gridding to benchmarking to fast research iteration, there are many reasons to want faster model training. Below we present premade codebases for training on ImageNet and CIFAR, including both (a) extensible codebases and (b) numerous premade training configurations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Point-E

    Point-E

    Point cloud diffusion for 3D model synthesis

    point-e is the official repository for Point-E, a generative model developed by OpenAI that produces 3D point clouds from textual (or image) prompts. Its principal advantage is speed: it can generate 3D assets in just 1–2 minutes on a single GPU, which is significantly faster than many competing text-to-3D models. The model works via a two-stage diffusion approach: first, it uses a text → image diffusion network to produce a synthetic 2D view consistent with the prompt; then a second diffusion model converts that image into a 3D point cloud. While it does not match the fine detail of some slower methods, the tradeoff in speed makes it practical for prototyping and interactive 3D generation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 5
    G-Diffuser Bot

    G-Diffuser Bot

    Discord bot and Interface for Stable Diffusion

    The first release of the all-in-one installer version of G-Diffuser is here. This release no longer requires the installation of WSL or Docker and has a systray icon to keep track of and launch G-Diffuser components. The infinite zoom scripts have been updated with some improvements, notably a new compositer script that is hundreds of times faster than before. The first release of the all-in-one installer is here. It notably features much easier "one-click" installation and updating, as well...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Fairseq

    Fairseq

    Facebook AI Research Sequence-to-Sequence Toolkit written in Python

    Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. We provide reference implementations of various sequence modeling papers. Recent work by Microsoft and Google has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Tensorflow Transformers

    Tensorflow Transformers

    State of the art faster Transformer with Tensorflow 2.0

    ...Images, for tasks like image classification, object detection, and segmentation. Audio, for tasks like speech recognition and audio classification. Faster AutoReggressive Decoding, TFlite support, creating TFRecords is simple. Auto-Batching tf.data.dataset or tf.ragged tensors. Everything is dictionary (inputs and outputs) Multiple mask modes like causal, user-defined, prefix. tensorflow-text tokenizer support. Supports GPU, TPU, multi-GPU trainer with wandb, multiple callbacks, auto tensorboard.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Guild AI

    Guild AI

    Experiment tracking, ML developer tools

    Guild AI is an open-source experiment tracking toolkit designed to bring systematic control to machine learning workflows, enabling users to build better models faster. It automatically captures every detail of training runs as unique experiments, facilitating comprehensive tracking and analysis. Users can compare and analyze runs to deepen their understanding and incrementally improve models. Guild AI simplifies hyperparameter tuning by applying state-of-the-art algorithms through...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    TensorFlowOnSpark

    TensorFlowOnSpark

    TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters

    By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. It enables both distributed TensorFlow training and inferencing on Spark clusters, with a goal to minimize the amount of code changes required to run existing TensorFlow programs on a shared grid.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go from Code to Production URL in Seconds Icon
    Go from Code to Production URL in Seconds

    Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try it free
  • 10
    Hugging Face Transformer

    Hugging Face Transformer

    CPU/GPU inference server for Hugging Face transformer models

    ...Both are great tools but not very performant in inference. Then, if you spend some time, you can build something over ONNX Runtime and Triton inference server. You will usually get from 2X to 4X faster inference compared to vanilla Pytorch. It's cool! However, if you want the best in class performances on GPU, there is only a single possible combination: Nvidia TensorRT and Triton. You will usually get 5X faster inference compared to vanilla Pytorch.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    TSNE-CUDA

    TSNE-CUDA

    GPU Accelerated t-SNE for CUDA with Python bindings

    This repo is an optimized CUDA version of FIt-SNE algorithm with associated python modules. We find that our implementation of t-SNE can be up to 1200x faster than Sklearn, or up to 50x faster than Multicore-TSNE when used with the right GPU. You can install binaries with anaconda for CUDA version 10.1 and 10.2 using conda install tsnecuda -c conda-forge. Tsnecuda supports CUDA versions 9.0 and later through source installation, check out the wiki for up to date installation instructions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    MACE

    MACE

    Deep learning inference framework optimized for mobile platforms

    ...Runtime is optimized with NEON, OpenCL and Hexagon, and Winograd algorithm is introduced to speed up convolution operations. The initialization is also optimized to be faster. Chip-dependent power options like big.LITTLE scheduling, Adreno GPU hints are included as advanced APIs. UI responsiveness guarantee is sometimes obligatory when running a model. Mechanism like automatically breaking OpenCL kernel into small units is introduced to allow better preemption for the UI rendering task. Graph level memory allocation optimization and buffer reuse are supported. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Detectron2

    Detectron2

    Next-generation platform for object detection and segmentation

    ...Includes more features such as panoptic segmentation, Densepose, Cascade R-CNN, rotated bounding boxes, PointRend, DeepLab, etc. Can be used as a library to support different projects on top of it. We'll open source more research projects in this way. It trains much faster. Models can be exported to TorchScript format or Caffe2 format for deployment. With a new, more modular design, Detectron2 is flexible and extensible, and able to provide fast training on single or multiple GPU servers. Detectron2 includes high-quality implementations of state-of-the-art object detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Tez

    Tez

    Tez is a super-simple and lightweight Trainer for PyTorch

    Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch. tez (तेज़ / تیز) means sharp, fast & active. This is a simple, to-the-point, library to make your PyTorch training easy. This library is in early-stage currently! So, there might be breaking changes. Currently, tez supports cpu, single gpu and multi-gpu & tpu training. More coming soon! Using tez is super-easy. We don't want you to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    YOLO ROS

    YOLO ROS

    YOLO ROS: Real-Time Object Detection for ROS

    ...Darknet on the CPU is fast (approximately 1.5 seconds on an Intel Core i7-6700HQ CPU @ 2.60GHz × 8) but it's like 500 times faster on GPU! You'll have to have an Nvidia GPU and you'll have to install CUDA. The CMakeLists.txt file automatically detects if you have CUDA installed or not. CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    HiFi-GAN

    HiFi-GAN

    Generative Adversarial Networks for Efficient and High Fidelity Speech

    ...It introduces a generator architecture tailored to model the periodic structure of speech and a set of discriminators that focus on different scales and periods of the waveform to better capture naturalness. The model targets a sweet spot between sample quality and generation speed, outperforming many previous GAN vocoders while being far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    PyText

    PyText

    A natural language modeling framework based on PyTorch

    ...We use PyText at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale. Distributed-training support built on the new C10d backend in PyTorch 1.0. Mixed precision training support through APEX (trains faster with less GPU memory on NVIDIA Tensor Cores). Extensible components that allows easy creation of new models and tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Bangla TTS

    Bangla TTS

    Bangla text to speech synthesis in python

    Bangla text to speech Multilingual (Bangla, English) real-time ([almost] in a GPU) speech synthesis library. Installation -------------------------------------- * Install Anaconda * conda create -n new_virtual_env python==3.6.8 * conda activate new_virtual_env * pip install -r requirements.txt * While running for the first time, keep your internet connection on to download the weights of the speech synthesis models (>500 MB) * For...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    textgenrnn

    textgenrnn

    Easily train your own text-generating neural network

    ...Train on and generate text at either the character-level or word-level. Configure RNN size, the number of RNN layers, and whether to use bidirectional RNNs. Train on any generic input text file, including large files. Train models on a GPU and then use them to generate text with a CPU. Utilize a powerful CuDNN implementation of RNNs when trained on the GPU, which massively speeds up training time as opposed to typical LSTM implementations. Train the model using contextual labels, allowing it to learn faster and produce better results in some cases.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    maskrcnn-benchmark

    maskrcnn-benchmark

    Fast, modular reference implementation of Instance Segmentation

    Mask R-CNN Benchmark is a PyTorch-based framework that provides high-performance implementations of object detection, instance segmentation, and keypoint detection models. Originally built to benchmark Mask R-CNN and related models, it offers a clean, modular design to train and evaluate detection systems efficiently on standard datasets like COCO. The framework integrates critical components—region proposal networks (RPNs), RoIAlign layers, mask heads, and backbone architectures such as...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Tensorpack

    Tensorpack

    A Neural Net Training Interface on TensorFlow, with focus on speed

    Tensorpack is a neural network training interface based on TensorFlow v1. Uses TensorFlow in the efficient way with no extra overhead. On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably gets faster if written with Tensorpack. Scalable data-parallel multi-GPU / distributed training strategy is off-the-shelf to use. Squeeze the best data loading performance of Python with tensorpack.dataflow. Symbolic programming (e.g. tf.data) does not offer the data processing flexibility needed in research. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    LUMINOTH

    LUMINOTH

    Deep Learning toolkit for Computer Vision

    LUMINOTH is an open-source deep learning toolkit designed for computer vision tasks, particularly object detection. The framework is implemented in Python and built on top of TensorFlow and the Sonnet neural network library, providing a modular environment for training and deploying detection models. It was created to simplify the process of building and experimenting with deep learning models capable of identifying objects within images. Luminoth includes support for popular object...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    RavenCoin Wallet

    RavenCoin Wallet

    RavenCoin Wallet including CPU and GPU miners!

    Raven is an experimental digital currency that enables instant payments to anyone, anywhere in the world. Raven uses peer-to-peer technology to operate with no central authority: managing transactions and issuing money are carried out collectively by the network. Raven Core is the name of open source software that enables the use of this currency. A digital peer-to-peer network for the facilitation of asset transfer. In the fictional world of Westeros, ravens are used as messengers who carry...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24

    SoAx

    Structure of Arrays of multiple types

    Structures of arrays (SoA) are generally faster than arrays of structures (AoS) while AoS are more handy. This project (SoAx) combines the advantages of both. By means of C++(11) meta-template programming SoAx achieves maximal performance (efficient use of vector units and cache of modern CPUs) while providing a very convenient user interface (including object-oriented element handling) and flexibility. It has been designed to handle list-like sets of particles (similar to struct {int id;...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    CUDA-Quicksort

    CUDA-Quicksort: A GPU-based implementation of the quicksort algorithm

    ...CUDA-quicksort is an iterative GPU-based implementation of the quicksort algorithm. "Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort."[*]. *Copyright © 2015 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. (2015) DOI: 10.1002/cpe.3611 For further information, please see the corresponding publication: http://onlinelibrary.wiley.com/doi/10.1002/cpe.3611/abstract
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB