Showing 17 open source projects for "gpu speed"

  • 1
    Shumai

    Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

    ...It can automatically leverage GPU acceleration on Linux (via CUDA) and CPU computation on macOS.
    Downloads: 2 This Week
  • 2
    cuDF

    GPU DataFrame Library

    ...The RAPIDS suite of open-source software libraries aims to enable the execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization while exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
    Downloads: 4 This Week
  • 3
    CuPy

    A NumPy-compatible array library accelerated by CUDA

    CuPy is an open source implementation of a NumPy-compatible multi-dimensional array accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multi-dimensional array class, and many functions that operate on it. CuPy offers GPU-accelerated computing with Python, using CUDA-related libraries to fully utilize the GPU architecture. According to the project's benchmarks, it can speed up some operations by more than 100×. CuPy is highly compatible with NumPy and serves as a drop-in replacement in most cases. It is easy to install through pip, using precompiled binary packages (wheels) for recommended environments. ...
    Downloads: 1 This Week
  • 4
    TensorRT Node for ComfyUI

    Enables the best performance on NVIDIA RTX Graphics Cards

    ComfyUI_TensorRT is an extension that lets ComfyUI run AI inference through NVIDIA’s TensorRT, aiming for faster, more efficient execution on supported GPUs. It bridges ComfyUI’s flexible, node-based workflows and TensorRT’s highly optimized engine format, so complex diffusion or image-processing graphs can be accelerated without the user having to rewrite the pipeline. The repo typically includes instructions for converting models to TensorRT engines and...
    Downloads: 2 This Week
  • 5
    The Futhark Programming Language

    A data-parallel functional programming language

    ...While the Futhark language and compiler are an ongoing research project, they are quite usable for real programming and can compile nontrivial programs that then run on real machines at high speed.
    Downloads: 0 This Week
  • 6
    MNN

    MNN is a blazing fast, lightweight deep learning framework

    MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models and has industry-leading performance for on-device inference and training. At present, MNN has been integrated into more than 20 Alibaba apps, such as Taobao, Tmall, Youku, Dingtalk, and Xianyu, covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, product search by image, interactive marketing, equity...
    Downloads: 13 This Week
  • 7
    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    FlashMLA is a high-performance decoding kernel library designed for Multi-Head Latent Attention (MLA) workloads on NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, to reduce latency and increase throughput in inference systems that use MLA. The library supports both BF16 and FP16 data types and includes a paged KV cache implementation with a block size of 64 to...
    Downloads: 0 This Week
  • 8
    MegEngine

    Easy-to-use deep learning framework with 3 key features

    ...You can represent quantization, dynamic shapes, image preprocessing, and even differentiation in a single model. After training, put everything into the model and run inference on any platform with ease. Because training and inference share the same core, speed and precision problems won't bother you. In training, GPU memory usage can drop to one-third at the cost of only one additional line of code, which enables the DTR (Dynamic Tensor Rematerialization) algorithm. For inference, the pushdown memory planner keeps memory usage low. NOTE: MegEngine supports Python installation on Linux (64-bit), Windows (64-bit), macOS 10.14+ (CPU only), and Android 7+ (CPU only), with Python 3.5 to 3.8. ...
    Downloads: 0 This Week
  • 9
    FurMark

    GPU stress test and OpenGL/Vulkan graphics benchmark for Windows/Linux

    FurMark is an intensive benchmarking tool that evaluates the performance of graphics cards using fur rendering algorithms. It generates heavy workloads that can significantly raise GPU temperature, which makes it useful for testing the stability and stress tolerance of graphics cards. By simulating demanding rendering tasks, FurMark serves as a comprehensive test of the robustness and thermal performance of GPUs under...
    Downloads: 379 This Week
  • 10
    Bandicoot

    fast C++ library for GPU linear algebra & scientific computing

    * Fast GPU linear algebra library (matrix maths) for the C++ language, aiming for a good balance between speed and ease of use
    * Provides high-level syntax and functionality deliberately similar to Matlab
    * Provides an API that aims to be compatible with Armadillo, for easy transition between CPU and GPU linear algebra code
    * Useful for algorithm development directly in C++, or quick conversion of research code into production environments
    * Distributed under the permissive Apache 2.0 license, useful for both open-source and proprietary (closed-source) software
    * Can be used for machine learning, pattern recognition, computer vision, signal processing, bioinformatics, statistics, finance, etc.
    * Downloads: http://coot.sourceforge.io/download.html
    * Documentation: http://coot.sourceforge.io/docs.html
    * Bug reports: http://coot.sourceforge.io/faq.html
    * Git repo: https://gitlab.com/conradsnicta/bandicoot-code
    Downloads: 3 This Week
  • 11
    Neural Tangents

    Fast and Easy Infinite Neural Networks in Python

    Neural Tangents is a high-level neural network API for specifying complex, hierarchical models at both finite and infinite width, built in Python on top of JAX and XLA. It lets researchers define architectures from familiar building blocks—convolutions, pooling, residual connections, and nonlinearities—and obtain not only the finite network but also the corresponding Gaussian Process (GP) kernel of its infinite-width limit. With a single specification, you can compute NNGP and NTK kernels,...
    Downloads: 0 This Week
  • 12
    MACE

    Deep learning inference framework optimized for mobile platforms

    Mobile AI Compute Engine (MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux, and Windows devices. The runtime is optimized with NEON, OpenCL, and Hexagon, and the Winograd algorithm is used to speed up convolution operations. Initialization is also optimized for speed. Chip-dependent power options, such as big.LITTLE scheduling and Adreno GPU hints, are exposed as advanced APIs. Because UI responsiveness must sometimes be guaranteed while a model runs, mechanisms such as automatically breaking OpenCL kernels into small units are used to allow better preemption for the UI rendering task. ...
    Downloads: 1 This Week
  • 13
    Minkowski Engine

    Auto-diff neural network library for high-dimensional sparse tensors

    The Minkowski Engine is an auto-differentiation library for sparse tensors. It supports all standard neural network layers, such as convolution, pooling, unpooling, and broadcasting operations, for sparse tensors. The Minkowski Engine supports various functions that can be built on a sparse tensor. We list a few popular network architectures and applications here. To run the examples, please install the package and run the command in the package root directory. Compressing a neural network to...
    Downloads: 0 This Week
  • 14
    TFLearn

    Deep learning library featuring a higher-level API for TensorFlow

    ...Powerful helper functions to train any TensorFlow graph, with support for multiple inputs, outputs, and optimizers. Easy and beautiful graph visualization, with details about weights, gradients, activations, and more. Effortless device placement for using multiple CPUs/GPUs. The high-level API currently supports most recent deep learning models, such as Convolutions, LSTM, BiRNN, BatchNorm, etc.
    Downloads: 0 This Week
  • 15
    Caffe2

    Caffe2 is a lightweight, modular, and scalable deep learning framework

    Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind. It provides an easy and straightforward way for you to experiment with deep learning and leverage community contributions of new models and algorithms. You can bring your creations to scale using the power of GPUs in the cloud or to the masses on mobile with Caffe2’s cross-platform...
    Downloads: 0 This Week
  • 16
    Caffe

    A fast open framework for deep learning

    Caffe is an open source deep learning framework focused on expression, speed, and modularity. It has an expressive architecture that encourages application and innovation, and extensible code that is well suited to active development. Caffe also offers great speed: it can process over 60M images per day with a single NVIDIA K40 GPU. It’s arguably one of the fastest convnet implementations around. Caffe is developed by Berkeley AI Research (BAIR)/the Berkeley Vision and Learning Center (BVLC) and a great community of contributors that continue to make Caffe state-of-the-art in both code and models. ...
    Downloads: 0 This Week
  • 17
    overdrive5

    AMD GPU power/fan control via ADL OverDrive5 interface

    A command-line tool that uses the AMD ADL OverDrive5 interface to control the power and fan speed of AMD GPU boards. See the project's Wiki page for usage. Works on Windows and Linux (if the video driver is loaded).
    Downloads: 0 This Week
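Entry 7 (FlashMLA) mentions a paged KV cache with a block size of 64. The bookkeeping behind paging can be sketched in plain Python as follows; this is not FlashMLA's CUDA code or API, and the class and method names (`PagedKVCache`, `append`, `locate`, `release`) are hypothetical. Each sequence's cache is split into fixed-size blocks drawn from a shared pool, and a per-sequence block table maps logical token positions to physical blocks, so keys and values for one sequence can live in non-contiguous memory.

```python
BLOCK_SIZE = 64  # tokens per physical block, the block size quoted for FlashMLA


class PagedKVCache:
    """Hypothetical bookkeeping for a paged KV cache: block tables only, no tensors."""

    def __init__(self, num_blocks):
        # Stack of free physical block ids; pop() hands out the lowest id first.
        self.free_blocks = list(range(num_blocks - 1, -1, -1))
        self.block_tables = {}  # seq_id -> list of physical block ids, in logical order
        self.lengths = {}       # seq_id -> number of tokens cached so far

    def append(self, seq_id):
        """Reserve a slot for the next token of seq_id; return (block, offset)."""
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:
            # The sequence has no block yet, or its last block is full.
            if not self.free_blocks:
                raise MemoryError("KV cache pool exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return self.locate(seq_id, length)

    def locate(self, seq_id, position):
        """Translate a logical token position into (physical block, offset in block)."""
        if not 0 <= position < self.lengths.get(seq_id, 0):
            raise IndexError("position is not cached")
        block_index, offset = divmod(position, BLOCK_SIZE)
        return self.block_tables[seq_id][block_index], offset

    def release(self, seq_id):
        """Return every block of a finished sequence to the free pool."""
        self.free_blocks.extend(reversed(self.block_tables.pop(seq_id, [])))
        self.lengths.pop(seq_id, None)
```

With on-demand allocation, each sequence leaves at most one block partially filled, and blocks released by a finished sequence are immediately reusable by other sequences without copying.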
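Entry 11 (Neural Tangents) describes computing the NNGP kernel of a network's infinite-width limit. For a fully connected ReLU network that kernel has a well-known closed-form layer-by-layer recursion (the arc-cosine kernel), sketched below in plain Python. This illustrates the math only and is not the Neural Tangents API; the function name `relu_nngp` and the default variances (`sigma_w2=2.0`, `sigma_b2=0.0`) are assumptions.

```python
import math


def relu_nngp(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Return NNGP kernel entries (k11, k12, k22) of an infinitely wide ReLU MLP.

    The network is Dense -> (ReLU -> Dense) * depth, where each Dense layer has
    weight variance sigma_w2 / fan_in and bias variance sigma_b2 (assumed defaults).
    """
    d = len(x1)

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # Covariance of the first Dense layer's outputs for inputs x1 and x2.
    k11 = sigma_b2 + sigma_w2 * dot(x1, x1) / d
    k22 = sigma_b2 + sigma_w2 * dot(x2, x2) / d
    k12 = sigma_b2 + sigma_w2 * dot(x1, x2) / d
    for _ in range(depth):
        norm = math.sqrt(k11 * k22)
        # Correlation, clamped to [-1, 1] so rounding cannot break acos.
        cos_t = max(-1.0, min(1.0, k12 / norm)) if norm > 0 else 1.0
        theta = math.acos(cos_t)
        # E[relu(u) relu(v)] for zero-mean Gaussians u, v with covariance k12.
        e12 = norm / (2 * math.pi) * (math.sin(theta) + (math.pi - theta) * cos_t)
        k12 = sigma_b2 + sigma_w2 * e12
        # On the diagonal theta = 0, so E[relu(u)^2] = k / 2.
        k11 = sigma_b2 + sigma_w2 * k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2
    return k11, k12, k22
```

With `sigma_w2=2.0` and `sigma_b2=0.0` the diagonal entries stay constant with depth, while the correlation between two distinct inputs increases toward 1 as layers are added.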
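Entry 12 (MACE) says the Winograd algorithm is used to speed up convolution operations. Its smallest 1-D form, F(2, 3), shows where the savings come from: once the filter has been transformed, two outputs of a 3-tap filter cost 4 multiplications instead of 6. The sketch below is the standard F(2, 3) minimal filtering formula in plain Python, checked against a direct loop; it is not MACE's implementation, and the function names are hypothetical.

```python
def winograd_f23_filter(g):
    """Transform a 3-tap filter once; the result is reused for every input tile."""
    g0, g1, g2 = g
    return (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)


def winograd_f23_tile(d, u):
    """Two outputs of a 3-tap correlation over 4 inputs, using 4 multiplications."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_winograd(x, g):
    """Valid 1-D correlation of x with g, processed in overlapping 4-element tiles."""
    n_out = len(x) - 2
    if n_out % 2:
        raise ValueError("this sketch needs an even number of outputs")
    u = winograd_f23_filter(g)
    out = []
    for i in range(0, n_out, 2):  # consecutive tiles overlap by 2 inputs
        out.extend(winograd_f23_tile(x[i:i + 4], u))
    return out


def conv1d_direct(x, g):
    """Reference implementation: 3 multiplications per output."""
    return [sum(x[i + k] * g[k] for k in range(3)) for i in range(len(x) - 2)]
```

Convolution libraries typically apply the 2-D analogue of this transform to image tiles; the sketch keeps only the 1-D arithmetic to make the multiplication count visible.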
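Entry 13 (Minkowski Engine) centers on convolution over sparse tensors. The core idea can be sketched in plain Python: store only the active coordinates and their features, and compute outputs only at those coordinates, gathering whichever neighbors exist. This is not the Minkowski Engine API (which provides classes such as `SparseTensor` and `MinkowskiConvolution`); the function `sparse_conv2d`, the dict-of-coordinates layout, scalar features, and the stride-1 behavior (output coordinates equal input coordinates) are assumptions made for illustration.

```python
def sparse_conv2d(features, kernel):
    """Hypothetical stride-1 sparse 2-D convolution.

    features maps active coordinates (x, y) to scalar features; kernel maps
    offsets (dx, dy) to scalar weights. Outputs are produced only at the
    active input coordinates, and absent neighbors contribute nothing.
    """
    out = {}
    for (x, y) in features:
        acc = 0.0
        for (dx, dy), w in kernel.items():
            neighbor = features.get((x + dx, y + dy))
            if neighbor is not None:
                acc += w * neighbor
        out[(x, y)] = acc
    return out
```

The work grows with the number of active coordinates times the kernel size, not with the area of the grid that contains them.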
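Entry 16 (Caffe) quotes over 60M images per day on a single NVIDIA K40 GPU. Assuming the figure refers to 24 hours of continuous processing, it converts to a per-image time as follows:

```latex
\[
\frac{60 \times 10^{6}\ \text{images}}{86\,400\ \text{s}} \approx 694\ \text{images/s}
\qquad\Longrightarrow\qquad
\frac{1\ \text{s}}{694\ \text{images}} \approx 1.44\ \text{ms per image}
\]
```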