Showing 12 open source projects for "cuda benchmark"

  • 1
    CUDA Agent

    Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

    ...Its architecture combines large-scale data synthesis, a skill-augmented CUDA development environment, and long-horizon reinforcement learning to build intrinsic optimization capability rather than relying on simple post-hoc tuning. The system operates in a ReAct-style loop where the agent profiles baseline implementations, writes CUDA code, compiles it in a sandbox, and iteratively refines performance. CUDA-Agent has demonstrated strong benchmark results, achieving high pass rates and significant speedups compared with compiler baselines such as torch.compile.
    Downloads: 2 This Week
  • 2
    XMRig

    RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner

    XMRig is a high-performance, open-source, cross-platform unified CPU/GPU miner for RandomX, KawPow, CryptoNight, and AstroBWT, as well as a RandomX benchmark and a stratum proxy. Official binaries are available for Windows, Linux, macOS, and FreeBSD. The preferred way to configure the miner is the JSON config file, as it is more flexible and human-friendly. The command-line interface...
    Downloads: 40 This Week
  • 3
    PyTorch Geometric

    Geometric deep learning extension library for PyTorch

    It consists of various methods for deep learning on graphs and other irregular structures, also known as geometric deep learning, drawn from a variety of published papers. In addition, it provides an easy-to-use mini-batch loader for both many small graphs and single giant graphs, a large number of common benchmark datasets (with simple interfaces for creating your own), and helpful transforms for learning on arbitrary graphs as well as on 3D meshes or point clouds. A lot of PyTorch Geometric's functionality has been outsourced to other packages, which need to be installed separately. These packages come with their own CPU and GPU kernel implementations based on C++/CUDA extensions. ...
    Downloads: 4 This Week
  • 4
    DeepGEMM

    Clean and efficient FP8 GEMM kernels with fine-grained scaling

    DeepGEMM is a specialized CUDA library for efficient, high-performance general matrix multiplication (GEMM) operations, with particular focus on low-precision formats such as FP8 (and experimental support for BF16). The library is designed to work cleanly and simply, avoiding overly templated or heavily abstracted code, while still delivering performance that rivals expert-tuned libraries. It supports both standard and “grouped” GEMMs, which is useful for architectures like Mixture of...
    Downloads: 3 This Week
  • 5
    DeepSeek-V3.2-Exp

    An experimental version of the DeepSeek model

    ...The key innovation in this version is DeepSeek Sparse Attention (DSA), a sparse attention mechanism that aims to improve training and inference efficiency in long-context settings without degrading output quality. According to the authors, they aligned the training setup of V3.2-Exp with that of V3.1-Terminus so that benchmark results remain largely comparable, even though the internal attention mechanism has changed. In public evaluations across a variety of reasoning, code, and question-answering benchmarks (e.g. MMLU, LiveCodeBench, AIME, and Codeforces), V3.2-Exp performs very close to, and in some cases on par with, V3.1-Terminus. The repository includes tools and kernels that support the new sparse architecture, for instance CUDA kernels and logit indexers, and it invokes open-source modules such as FlashMLA and DeepGEMM for performance.
    Downloads: 19 This Week
  • 6
    OpenFace Face Recognition

    Face recognition with deep neural networks

    ...Accuracies from research papers have just begun to surpass human accuracies on some benchmarks. The accuracies of open source face recognition systems lag behind the state-of-the-art. See our accuracy comparisons on the famous LFW benchmark.
    Downloads: 0 This Week
  • 7
    BCI

    BCI: Breast Cancer Immunohistochemical Image Generation

    Breast Cancer Immunohistochemical Image Generation through Pyramid Pix2pix. We have released the trained models on the BCI and LLVIP datasets. We host a competition for breast cancer immunohistochemistry image generation on Grand Challenge. The pix2pix project provides a Python script to generate pix2pix training data in the form of image pairs {A,B}, where A and B are two different depictions of the same underlying scene; here, these can be {HE, IHC} pairs. Then we can learn to translate A (HE images)...
    Downloads: 0 This Week
  • 8
    Tiny

    Tiny Face Detector, CVPR 2017

    ...It provides training/testing scripts, a demo (tiny_face_detector.m), model loading, evaluation on WIDER FACE, and supporting utilities (e.g. cnn_widerface_eval.m). The code depends on MatConvNet, which must be compiled with GPU/CUDA/cuDNN support for full performance. Features include a pretrained model (ResNet101-based, plus alternatives), demo and evaluation scripts for benchmark datasets, and “foveal descriptors” that incorporate context for low-resolution faces.
    Downloads: 0 This Week
  • 9

    cocolib / light field suite

    CUDA library for continuous optimization and light field analysis

    Library for continuous convex optimization in image analysis, together with a command-line tool and a Matlab interface. It implements several recent algorithms for inverse problems and image segmentation with total variation regularizers and vectorial multilabel transition costs. Also included is a suite for variational light field analysis, which ties into the HCI light field benchmark set and gives reference implementations for a number of our recently published algorithms. *** NOTE: ...
    Downloads: 0 This Week
  • 10

    CUDA-Quicksort

    CUDA-Quicksort: A GPU-based implementation of the quicksort algorithm

    CUDA-quicksort is an iterative GPU-based implementation of the quicksort algorithm. "Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort."[*]. *Copyright © 2015 John Wiley & Sons, Ltd. Concurrency Computat.: Pract.
    Downloads: 0 This Week
  • 11

    SVMBenchmark

    CUDA SVM training benchmark

    This application can train SVMs using LibSVM and several CUDA implementations. Supported input file formats are LibSVM text files and Bottou's LaSVM binary files. The desired implementation is selected with a command-line parameter. Training, input data loading, and output data saving times are measured and reported. The output model is saved in LibSVM text format.
    Downloads: 0 This Week
  • 12
    GPUBench2

    GPUBench2 is a cross-platform suite for analyzing the performance of GPUs (Graphics Processing Units). GPUBench2 gathers state-of-the-art parameters from the available interfaces (OpenGL, Cg, CUDA).
    Downloads: 0 This Week
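The PyTorch Geometric entry mentions a mini-batch loader for many small graphs. PyG's documentation describes batching graphs by stacking their adjacency matrices block-diagonally, so a mini-batch becomes one large graph made of disconnected subgraphs: each graph's edge indices are shifted by the number of nodes in the graphs before it, and a batch vector records which graph every node came from. The plain-Python sketch below shows only that bookkeeping. The helper name `collate_graphs` and the `(num_nodes, edge_list)` input format are hypothetical and are not PyG's API.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def collate_graphs(graphs: List[Tuple[int, List[Edge]]]) -> Tuple[List[Edge], List[int]]:
    """Merge small graphs into one disconnected graph (hypothetical helper, not PyG's API).

    Each input graph is (num_nodes, edges) with node ids local to that graph.
    Returns the merged edge list with globally offset node ids and a batch
    vector mapping every merged node to the index of its source graph.
    """
    merged_edges: List[Edge] = []
    batch: List[int] = []
    offset = 0  # number of nodes contributed by earlier graphs
    for graph_idx, (num_nodes, edges) in enumerate(graphs):
        # Shift local node ids so they cannot collide with earlier graphs' ids.
        merged_edges.extend((src + offset, dst + offset) for src, dst in edges)
        batch.extend([graph_idx] * num_nodes)
        offset += num_nodes
    return merged_edges, batch


if __name__ == "__main__":
    triangle = (3, [(0, 1), (1, 2), (2, 0)])
    pair = (2, [(0, 1)])
    edges, batch = collate_graphs([triangle, pair])
    print(edges)  # [(0, 1), (1, 2), (2, 0), (3, 4)]
    print(batch)  # [0, 0, 0, 1, 1]
```

Because no edge crosses between subgraphs, message passing on the merged graph behaves as if each graph were processed separately, and the batch vector is what later allows per-graph pooling.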
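The DeepGEMM entry describes FP8 GEMM with fine-grained scaling. The idea is that instead of one scale for a whole tensor, each small block of values along the reduction dimension gets its own scale, so an outlier only costs precision inside its own block. The pure-Python sketch below illustrates that numerical idea only; it is not DeepGEMM's kernel or API. Assumptions: the block size is a free parameter, FP8 E4M3 (largest finite value 448) is emulated by rounding to 4 significant bits, and subnormals are ignored. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical illustration of per-block ("fine-grained") scaling; not DeepGEMM's API.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def round_to_e4m3(x: float) -> float:
    """Emulate E4M3 rounding: clamp to +/-448 and keep 4 significant bits.

    Simplification (assumption): subnormals and the minimum exponent are ignored.
    """
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    # 1 implicit + 3 stored mantissa bits; Python's round() rounds half to even.
    return round(mantissa * 16) / 16 * 2.0 ** exponent


def quantize_blocks(values: List[float], block: int) -> Tuple[List[float], List[float]]:
    """Quantize `values` in chunks of `block` elements, with one scale per chunk."""
    quantized: List[float] = []
    scales: List[float] = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # map the chunk's max onto 448
        scales.append(scale)
        quantized.extend(round_to_e4m3(v / scale) for v in chunk)
    return quantized, scales


def scaled_dot(a: List[float], b: List[float], block: int) -> float:
    """Dot product of two block-quantized vectors; each block's partial sum is rescaled."""
    qa, sa = quantize_blocks(a, block)
    qb, sb = quantize_blocks(b, block)
    total = 0.0
    for blk, start in enumerate(range(0, len(a), block)):
        partial = sum(x * y for x, y in zip(qa[start:start + block], qb[start:start + block]))
        total += partial * sa[blk] * sb[blk]  # dequantize using both blocks' scales
    return total


def block_scaled_matmul(A: List[List[float]], B: List[List[float]], block: int) -> List[List[float]]:
    """C = A @ B, quantizing each row of A and each column of B in blocks along K."""
    columns = [list(col) for col in zip(*B)]
    return [[scaled_dot(row, col, block) for col in columns] for row in A]
```

For example, `block_scaled_matmul([[1.0, 0.0], [0.0, 2.0]], [[1.0, 2.0], [3.0, 4.0]], block=2)` returns values close to the exact product `[[1, 2], [6, 8]]`; the small deviations come from the emulated FP8 rounding.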
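CUDA-quicksort is described as an iterative implementation of quicksort. The usual way to make quicksort iterative is to replace recursion with an explicit stack of pending index ranges; each pending range is independent of the others, which is the property a parallel implementation can exploit. The CPU sketch below illustrates only that iterative structure, using Lomuto partitioning; it is not the project's GPU algorithm, and the function name `iterative_quicksort` is hypothetical.

```python
from typing import List


def iterative_quicksort(items: List[int]) -> List[int]:
    """Sort a copy of `items` using an explicit stack instead of recursion (hypothetical helper)."""
    data = list(items)
    stack = [(0, len(data) - 1)]  # pending inclusive index ranges
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue  # ranges of 0 or 1 elements are already sorted
        pivot = data[hi]  # Lomuto partition: last element is the pivot
        boundary = lo  # data[lo:boundary] holds elements smaller than the pivot
        for i in range(lo, hi):
            if data[i] < pivot:
                data[i], data[boundary] = data[boundary], data[i]
                boundary += 1
        data[boundary], data[hi] = data[hi], data[boundary]  # pivot to its final slot
        # Both sides are independent subproblems; queue them instead of recursing.
        stack.append((lo, boundary - 1))
        stack.append((boundary + 1, hi))
    return data
```

For example, `iterative_quicksort([5, 3, 8, 1, 9, 2, 7])` returns `[1, 2, 3, 5, 7, 8, 9]`.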