Showing 43 open source projects for "cuda"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    CV-CUDA

    CV-CUDA

    CV-CUDA™ is an open-source, GPU accelerated library

    CV-CUDA is an open-source project that enables building efficient cloud-scale Artificial Intelligence (AI) imaging and computer vision (CV) applications. It uses graphics processing unit (GPU) acceleration to help developers build highly efficient pre- and post-processing pipelines. CV-CUDA originated as a collaborative effort between NVIDIA and ByteDance.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 2
    Tiny CUDA Neural Networks

    Tiny CUDA Neural Networks

    Lightning fast C++/CUDA neural network framework

    ...It will likely only work on an RTX 3090, an RTX 2080 Ti, or high-end enterprise GPUs. Lower-end cards must reduce the n_neurons parameter or use the CutlassMLP (better compatibility but slower) instead. tiny-cuda-nn comes with a PyTorch extension that allows using the fast MLPs and input encodings from within a Python context. These bindings can be significantly faster than full Python implementations; in particular for the multiresolution hash encoding.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    CUDA Containers for Edge AI & Robotics

    CUDA Containers for Edge AI & Robotics

    Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

    CUDA Containers for Edge AI & Robotics is an open-source project that provides a modular container build system designed for running machine learning and AI workloads on NVIDIA Jetson devices. The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    TensorRT Backend For ONNX

    TensorRT Backend For ONNX

    ONNX-TensorRT: TensorRT backend for ONNX

    ...For building within docker, we recommend using and setting up the docker containers as instructed in the main (TensorRT repository). Note that this project has a dependency on CUDA. By default the build will look in /usr/local/cuda for the CUDA toolkit installation. If your CUDA path is different, overwrite the default path. ONNX models can be converted to serialized TensorRT engines using the onnx2trt executable.
    Downloads: 1 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 5
    CUTLASS

    CUTLASS

    CUDA Templates for Linear Algebra Subroutines

    CUTLASS is a collection of CUDA C++ template abstractions for implementing high-performance matrix-multiplication (GEMM) and related computations at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS and cuDNN. CUTLASS decomposes these "moving parts" into reusable, modular software components abstracted by C++ template classes.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    GPU Puzzles

    GPU Puzzles

    Solve puzzles. Learn CUDA

    ...Instead of presenting traditional lecture-style explanations, the project immerses learners directly in hands-on programming tasks that demonstrate how GPU computation works. The exercises are implemented using Python with the Numba CUDA interface, which allows Python code to compile into GPU kernels that run on CUDA-enabled hardware. By solving progressively more complex puzzles, learners gain a practical understanding of how parallel algorithms operate on graphics processing units. The project emphasizes experimentation and problem solving, encouraging learners to discover GPU programming techniques through trial and exploration. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Model Zoo

    Model Zoo

    Please do not feed the models

    ...The examples serve both as educational tools for learning Flux and as practical starting points for building new models. GPU acceleration is supported for most models through CUDA integration, enabling efficient training on compatible hardware. With community contributions encouraged, the Model Zoo acts as a hub for sharing and exploring diverse machine learning applications in Julia.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    TensorRT

    TensorRT

    C++ library for high performance inference on NVIDIA GPUs

    ...With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. TensorRT is built on CUDA®, NVIDIA’s parallel programming model, and enables you to optimize inference leveraging libraries, development tools, and technologies in CUDA-X™ for artificial intelligence, autonomous machines, high-performance computing, and graphics. With new NVIDIA Ampere Architecture GPUs, TensorRT also leverages sparse tensor cores providing an additional performance boost.
    Downloads: 35 This Week
    Last Update:
    See Project
  • 9
    Koila

    Koila

    Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code

    ...The system acts as a thin wrapper around PyTorch tensors and operations, meaning that it integrates easily into existing PyTorch code without requiring major changes to model implementations. It is particularly useful in environments where GPU resources are limited or where models frequently encounter CUDA memory errors.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • 10
    emgucv

    emgucv

    Cross platform .Net wrapper to the OpenCV image processing library

    Emgu CV is a cross platform .Net wrapper to the OpenCV image processing library. Allowing OpenCV functions to be called from .NET compatible languages. The wrapper can be compiled by Visual Studio and Unity, it can run on Windows, Linux, Mac OS, iOS and Android.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 11
    PyTorch Geometric

    PyTorch Geometric

    Geometric deep learning extension library for PyTorch

    ...We have outsourced a lot of functionality of PyTorch Geometric to other packages, which needs to be additionally installed. These packages come with their own CPU and GPU kernel implementations based on C++/CUDA extensions. We do not recommend installation as root user on your system python. Please setup an Anaconda/Miniconda environment or create a Docker image. We provide pip wheels for all major OS/PyTorch/CUDA combinations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Implicit

    Implicit

    Fast Python collaborative filtering for implicit feedback datasets

    ...All models have multi-threaded training routines, using Cython and OpenMP to fit the models in parallel among all available CPU cores. In addition, the ALS and BPR models both have custom CUDA kernels - enabling fitting on compatible GPU’s. This library also supports using approximate nearest neighbour libraries such as Annoy, NMSLIB and Faiss for speeding up making recommendations.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 13
    OpenCV

    OpenCV

    Open Source Computer Vision Library

    OpenCV (Open Source Computer Vision Library) is a comprehensive open-source library for computer vision, machine learning, and image processing. It enables developers to build real-time vision applications ranging from facial recognition to object tracking. OpenCV supports a wide range of programming languages including C++, Python, and Java, and is optimized for both CPU and GPU operations.
    Downloads: 29 This Week
    Last Update:
    See Project
  • 14
    Jittor

    Jittor

    Jittor is a high-performance deep learning framework

    ...Module Design and Dynamic Graph Execution is used in the front-end, which is the most popular design for deep learning framework interface. The back-end is implemented by high-performance languages, such as CUDA, C++. Jittor'op is similar to NumPy. Let's try some operations. We create Var a and b via operation jt.float32, and add them. Printing those variables shows they have the same shape and dtype.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 15
    FlashAttention

    FlashAttention

    Fast and memory-efficient exact attention

    FlashAttention is a high-performance deep learning optimization library that reimplements the attention mechanism used in transformer models to be significantly faster and more memory-efficient than standard implementations. It achieves this by using IO-aware algorithms that minimize memory reads and writes, reducing the quadratic memory overhead typically associated with attention operations. The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations...
    Downloads: 37 This Week
    Last Update:
    See Project
  • 16
    PyKEEN

    PyKEEN

    A Python library for learning and evaluating knowledge graph embedding

    ...PyKEEN is a Python package for reproducible, facile knowledge graph embeddings. PyKEEN has a function pykeen.env() that magically prints relevant version information about PyTorch, CUDA, and your operating system that can be used for debugging. If you’re in a Jupyter Notebook, it will be pretty-printed as an HTML table.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    cuML

    cuML

    RAPIDS Machine Learning Library

    cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    Torch-TensorRT

    Torch-TensorRT

    PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

    Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA’s TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch’s Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 19
    BentoML

    BentoML

    Unified Model Serving Framework

    ...Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    OpenVINO Training Extensions

    OpenVINO Training Extensions

    Trainable models and NN optimization tools

    OpenVINO™ Training Extensions provide a convenient environment to train Deep Learning models and convert them using the OpenVINO™ toolkit for optimized inference. When ote_cli is installed in the virtual environment, you can use the ote command line interface to perform various actions for templates related to the chosen task type, such as running, training, evaluating, exporting, etc. ote train trains a model (a particular model template) on a dataset and saves results in two files. ote...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    GoCV

    GoCV

    Go package for computer vision using OpenCV 4 and beyond

    GoCV gives programmers who use the Go programming language access to the OpenCV 4 computer vision library. The GoCV package supports the latest releases of Go and OpenCV v4.5.4 on Linux, macOS, and Windows. Our mission is to make the Go language a “first-class” client compatible with the latest developments in the OpenCV ecosystem. Computer Vision (CV) is the ability of computers to process visual information, and perform tasks normally associated with those performed by humans. CV software...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Instant Neural Graphics Primitives

    Instant Neural Graphics Primitives

    Instant neural graphics primitives: lightning fast NeRF and more

    Instant Neural Graphics Primitives, is an open-source research project developed by NVIDIA that enables extremely fast training and rendering of neural graphics representations. The system implements several neural graphics primitives including neural radiance fields, signed distance functions, neural images, and neural volumes. These representations are trained using a compact neural network combined with a multiresolution hash encoding that dramatically accelerates both training and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    TorchRec

    TorchRec

    Pytorch domain library for recommendation systems

    TorchRec is a PyTorch domain library built to provide common sparsity & parallelism primitives needed for large-scale recommender systems (RecSys). It allows authors to train models with large embedding tables sharded across many GPUs. Parallelism primitives that enable easy authoring of large, performant multi-device/multi-node models using hybrid data-parallelism/model-parallelism. The TorchRec sharder can shard embedding tables with different sharding strategies including data-parallel,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MegEngine

    MegEngine

    Easy-to-use deep learning framework with 3 key features

    MegEngine is a fast, scalable and easy-to-use deep learning framework with 3 key features. You can represent quantization/dynamic shape/image pre-processing and even derivation in one model. After training, just put everything into your model and inference it on any platform at ease. Speed and precision problems won't bother you anymore due to the same core inside. In training, GPU memory usage could go down to one-third at the cost of only one additional line, which enables the DTR...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo