Showing 38 open source projects for "cuda gpu"

View related business solutions
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    cuda-oxide

    cuda-oxide

    cuda-oxide is an experimental Rust-to-CUDA compiler

    cuda-oxide is an experimental NVIDIA Labs project that brings Rust closer to native CUDA GPU development. It works as a Rust-to-CUDA compiler path that lets developers write SIMT GPU kernels in idiomatic Rust instead of using a separate CUDA C++ workflow. The project compiles standard Rust code directly to PTX, avoiding DSLs, source-to-source translation, or foreign-language bindings.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    CUDA Python

    CUDA Python

    Performance meets Productivity

    CUDA Python is a unified Python interface for accessing and working with the NVIDIA CUDA platform, enabling developers to build GPU-accelerated applications entirely in Python. It acts as a metapackage composed of multiple submodules that provide both high-level and low-level access to CUDA functionality, including runtime APIs, driver APIs, and JIT compilation tools.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    Numba CUDA Target

    Numba CUDA Target

    The CUDA target for Numba

    Numba CUDA Target is NVIDIA’s maintained CUDA backend for the Numba JIT compiler, enabling developers to write GPU-accelerated code directly in Python. It allows users to define CUDA kernels using Python syntax, which are then compiled into efficient GPU code at runtime using LLVM-based toolchains. This approach significantly lowers the barrier to entry for GPU programming by eliminating the need to write CUDA C++ while still delivering high performance. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Numbast

    Numbast

    Build an automated pipeline that converts CUDA APIs into Numba

    Numbast is an automated toolchain that bridges CUDA C++ and Python by generating Numba-compatible bindings directly from CUDA header files. Its primary goal is to eliminate the manual effort required to expose CUDA libraries to Python, enabling developers to use GPU-accelerated functionality in Python environments more easily. The system parses CUDA C++ declarations and converts them into Python bindings that can be used within Numba, allowing seamless integration with Python-based GPU workflows. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 5
    CUDA Core Compute Libraries (CCCL)

    CUDA Core Compute Libraries (CCCL)

    CUDA Core Compute Libraries

    CCCL, or CUDA Core Compute Libraries, is a unified repository that consolidates several foundational CUDA C++ libraries into a single, cohesive development platform. It brings together Thrust, CUB, and libcudacxx, which collectively provide high-level abstractions, low-level performance primitives, and a CUDA-compatible standard library for GPU programming.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    NVIDIA GPU Operator

    NVIDIA GPU Operator

    NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

    ...However, configuring and managing nodes with these hardware resources requires the configuration of multiple software components such as drivers, container runtimes or other libraries which are difficult and prone to errors. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU. These components include the NVIDIA drivers (to enable CUDA), Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    CUDA API Wrappers

    CUDA API Wrappers

    Thin, unified, C++-flavored wrappers for the CUDA APIs

    CUDA API Wrappers is a C++ library providing high-level, modern wrappers for NVIDIA’s CUDA runtime and driver APIs, enhancing usability and efficiency. It is intended for those who would otherwise use these APIs directly, to make working with them more intuitive and consistent, making use of modern C++ language capabilities, programming idioms, and best practices. In a nutshell - making CUDA API work more fun.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    CuPy

    CuPy

    A NumPy-compatible array library accelerated by CUDA

    CuPy is an open source implementation of NumPy-compatible multi-dimensional array accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multi-dimensional array class and many functions on it. CuPy offers GPU accelerated computing with Python, using CUDA-related libraries to fully utilize the GPU architecture. According to benchmarks, it can even speed up some operations by more than 100X. CuPy is highly compatible with NumPy, serving as a drop-in replacement in most cases. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Tiny CUDA Neural Networks

    Tiny CUDA Neural Networks

    Lightning fast C++/CUDA neural network framework

    This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning-fast "fully fused" multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers. We provide a sample application where an image function (x,y) -> (R,G,B) is learned. The fully fused MLP component of this framework requires a very large amount of shared...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Triton

    Triton

    Development repository for the Triton language and compiler

    Triton is a programming language and compiler framework specifically designed for writing highly efficient custom deep learning operations, particularly for GPUs. It aims to bridge the gap between low-level GPU programming, such as CUDA, and higher-level abstractions by providing a more productive and flexible environment for developers. Triton enables users to write optimized kernels for machine learning workloads while maintaining readability and control over performance-critical aspects like memory access patterns and parallel execution. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    NVIDIA Warp

    NVIDIA Warp

    A Python framework for accelerated simulation, data generation

    NVIDIA Warp is a high-performance Python framework developed by NVIDIA for building and accelerating simulation, graphics, and physics-based workloads using GPU computing. It enables developers to write kernel-level code in Python that is automatically compiled into efficient CUDA kernels, combining ease of use with near-native performance. The framework is designed for applications such as robotics, reinforcement learning, physical simulation, and differentiable computing, where performance and flexibility are critical. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    CubeCL

    CubeCL

    Multi-platform high-performance compute language extension for Rust

    CubeCL is a low-level compute language and compiler framework designed to simplify and optimize GPU programming for high-performance workloads, particularly in machine learning and numerical computing. It provides an abstraction layer that allows developers to write portable, hardware-efficient compute kernels without directly dealing with complex GPU APIs such as CUDA or OpenCL. CubeCL focuses on delivering predictable performance and composability by exposing explicit control over memory layouts, parallelism, and execution patterns while still maintaining a developer-friendly syntax. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13

    Halide

    A language for fast, portable data-parallel computation

    ...It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems and with several CPU architectures (X86, ARM, MIPS, Hexagon, PowerPC) and GPU Compute APIs (CUDA, OpenCL, OpenGL, among others). It isn't a standalone programming language however; rather it is embedded in C++ which means that you write C++ code, building an in-memory representation of a Halide pipeline using Halide's C++ API. This representation can then be compiled to an object file, or a JIT-compile and run in the same process. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    MuJoCo Playground

    MuJoCo Playground

    An open source library for GPU-accelerated robot learning

    MuJoCo Playground, developed by Google DeepMind, is a GPU-accelerated suite of simulation environments for robot learning and sim-to-real research, built on top of MuJoCo MJX. It unifies a range of control, locomotion, and manipulation tasks into a consistent and scalable framework optimized for JAX and Warp backends. The project includes classic control benchmarks from dm_control, advanced quadruped and bipedal locomotion systems, and dexterous as well as non-prehensile manipulation setups....
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Faiss

    Faiss

    Library for efficient similarity search and clustering dense vectors

    Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research. Faiss contains several methods for similarity search. It...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    TensorRT

    TensorRT

    C++ library for high performance inference on NVIDIA GPUs

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers,...
    Downloads: 31 This Week
    Last Update:
    See Project
  • 17
    PyTorch Geometric

    PyTorch Geometric

    Geometric deep learning extension library for PyTorch

    ...We have outsourced a lot of functionality of PyTorch Geometric to other packages, which needs to be additionally installed. These packages come with their own CPU and GPU kernel implementations based on C++/CUDA extensions. We do not recommend installation as root user on your system python. Please setup an Anaconda/Miniconda environment or create a Docker image. We provide pip wheels for all major OS/PyTorch/CUDA combinations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Face Alignment

    Face Alignment

    2D and 3D Face alignment library build using pytorch

    ...However, the users can alternatively use dlib, BlazeFace, or pre-existing ground truth bounding boxes. While not required, for optimal performance(especially for the detector) it is highly recommended to run the code using a CUDA-enabled GPU. While here the work is presented as a black box, if you want to know more about the intrisecs of the method please check the original paper either on arxiv or my webpage.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    AWS Deep Learning Containers

    AWS Deep Learning Containers

    A set of Docker images for training and serving models in TensorFlow

    AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    Bend

    Bend

    A massively parallel, high-level programming language

    Bend is an interactive programming environment (REPL) built on top of the Kotlin language, designed to allow users to explore, experiment, and learn Kotlin in a live, feedback-driven manner. The tool lets you define variables, functions, or values at the prompt and iteratively refine them—immediately seeing output and types—while preserving state across commands. It emphasizes discoverability and experimentation: users can inspect functions, call them on sample inputs, and evolve logic...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Jupyter Docker Stacks

    Jupyter Docker Stacks

    Ready-to-run Docker images containing Jupyter applications

    Jupyter Docker Stacks provides a curated set of ready-to-run Docker container images that bundle Jupyter applications with popular data science and computing tools, enabling users to quickly start working in a reproducible environment. These stacks support a range of use cases, from lightweight base notebook images to full featured environments that include scientific computing libraries, machine learning tools, and IDE-like notebook interfaces, all within Docker containers that run...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    BentoML

    BentoML

    Unified Model Serving Framework

    ...Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    ArrayFire

    ArrayFire

    ArrayFire, a general purpose GPU library

    ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs, and other hardware acceleration devices. The library serves users in every technical computing market. Data structures in ArrayFire are smartly managed to avoid costly memory transfers and to take advantage of each performance feature provided by the underlying hardware. The community of ArrayFire developers invites you to build with us if...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Multimodal

    Multimodal

    TorchMultimodal is a PyTorch library

    This project, also known as TorchMultimodal, is a PyTorch library for building, training, and experimenting with multimodal, multi-task models at scale. The library provides modular building blocks such as encoders, fusion modules, loss functions, and transformations that support combining modalities (vision, text, audio, etc.) in unified architectures. It includes a collection of ready model classes—like ALBEF, CLIP, BLIP-2, COCA, FLAVA, MDETR, and Omnivore—that serve as reference...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MegEngine

    MegEngine

    Easy-to-use deep learning framework with 3 key features

    MegEngine is a fast, scalable and easy-to-use deep learning framework with 3 key features. You can represent quantization/dynamic shape/image pre-processing and even derivation in one model. After training, just put everything into your model and inference it on any platform at ease. Speed and precision problems won't bother you anymore due to the same core inside. In training, GPU memory usage could go down to one-third at the cost of only one additional line, which enables the DTR...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo