Showing 4 open source projects for "pytorch"

  • 1
    XNNPACK

    High-efficiency floating-point neural network inference operators

    ...Rather than serving as a standalone ML framework, XNNPACK provides high-performance computational primitives (such as convolutions, pooling, activation functions, and arithmetic operations) that are integrated into higher-level frameworks like TensorFlow Lite, PyTorch Mobile, ONNX Runtime, TensorFlow.js, and MediaPipe. The library is written in C/C++ and designed for portability, efficiency, and performance, using platform-specific SIMD instruction sets (e.g., NEON and AVX) for optimized execution. It supports NHWC tensor layouts and allows flexible striding along the channel dimension, so channel-split and concatenation operations can be handled at no additional cost.
    Downloads: 3 This Week
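The channel-striding idea can be illustrated with a small pure-Python sketch. It assumes a flat NHWC buffer in which consecutive pixels are `pixel_stride` elements apart; the helpers `nhwc_offset` and `write_nhwc` are hypothetical and are not part of XNNPACK's API. Two producers write into disjoint channel ranges of one shared buffer, so the buffer already holds their channel-wise concatenation and no copy is needed.

```python
def nhwc_offset(n, h, w, c, height, width, pixel_stride):
    """Flat index of element (n, h, w, c) in an NHWC buffer.

    pixel_stride is the number of elements between consecutive pixels; it may
    be larger than the number of channels a given operator reads or writes.
    """
    return ((n * height + h) * width + w) * pixel_stride + c


def write_nhwc(buffer, shape, pixel_stride, channel_base, value_fn):
    """Write a tensor of the given (N, H, W, C) shape into `buffer`.

    Channel c of each pixel lands at position channel_base + c within that
    pixel, so several tensors can share one buffer side by side.
    """
    batch, height, width, channels = shape
    if channel_base + channels > pixel_stride:
        raise ValueError("channel range does not fit in the pixel stride")
    for n in range(batch):
        for h in range(height):
            for w in range(width):
                for c in range(channels):
                    offset = nhwc_offset(n, h, w, channel_base + c,
                                         height, width, pixel_stride)
                    buffer[offset] = value_fn(n, h, w, c)


# Concatenate a 2-channel and a 1-channel tensor (batch 1, 1x2 spatial) along C.
pixel_stride = 3  # total channel count after concatenation
out = [None] * (1 * 1 * 2 * pixel_stride)
write_nhwc(out, (1, 1, 2, 2), pixel_stride, 0, lambda n, h, w, c: f"a{w}{c}")
write_nhwc(out, (1, 1, 2, 1), pixel_stride, 2, lambda n, h, w, c: f"b{w}{c}")
print(out)  # ['a00', 'a01', 'b00', 'a10', 'a11', 'b10']
```

Each producer only needs its channel offset and the shared pixel stride; neither has to know about the other's data.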
  • 2
    Tiny CUDA Neural Networks

    Lightning-fast C++/CUDA neural network framework

    ...It will likely only work on an RTX 3090, an RTX 2080 Ti, or high-end enterprise GPUs. Lower-end cards must reduce the n_neurons parameter or use CutlassMLP instead (better compatibility, but slower). tiny-cuda-nn comes with a PyTorch extension that allows the fast MLPs and input encodings to be used from within a Python context. These bindings can be significantly faster than full Python implementations, particularly for the multiresolution hash encoding.
    Downloads: 2 This Week
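The multiresolution hash encoding maps each grid-cell corner to a slot in a fixed-size feature table. Below is a minimal sketch of that indexing for a single resolution level, assuming the XOR-of-products spatial hash described in the Instant-NGP paper (per-dimension primes 1, 2654435761, 805459861). The function names are hypothetical, this is not tiny-cuda-nn's API, and the sketch omits the learned feature lookup, interpolation, and any dense indexing of coarse levels.

```python
import itertools
import math

# Per-dimension primes of the Instant-NGP spatial hash (supports up to 3 dims).
PRIMES = (1, 2654435761, 805459861)


def hash_corner(corner, table_size):
    """Hash integer grid coordinates to an index in [0, table_size)."""
    h = 0
    for x, prime in zip(corner, PRIMES):
        h ^= (x * prime) & 0xFFFFFFFF  # emulate uint32 multiply overflow
    return h % table_size


def corner_indices(point, resolution, table_size):
    """Table indices of the 2**d corners of the grid cell containing `point`.

    `point` holds coordinates in [0, 1); `resolution` is this level's grid size.
    """
    base = [math.floor(x * resolution) for x in point]
    return [
        hash_corner([b + o for b, o in zip(base, offsets)], table_size)
        for offsets in itertools.product((0, 1), repeat=len(point))
    ]


table_size = 2 ** 19
print(corner_indices([0.5, 0.25, 0.75], resolution=16, table_size=table_size))
```

A full encoding would repeat this for every level, fetch the feature vectors stored at these indices, and blend them with interpolation weights before feeding the MLP.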
  • 3
    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    FlashMLA is a high-performance decoding kernel library designed for Multi-Head Latent Attention (MLA) workloads on NVIDIA Hopper GPUs. It provides optimized kernels for MLA decoding, including support for variable-length sequences, which helps reduce latency and increase throughput in inference systems that use this attention mechanism. The library supports both BF16 and FP16 data types and includes a paged KV cache implementation with a block size of 64 to...
    Downloads: 0 This Week
  • 4
    TensorRT Pro

    C++ library based on TensorRT integration

    High-level interface for C++/Python. Simplifies the implementation of custom plugins and encapsulates serialization and deserialization for easier use. Simplifies compiling FP32, FP16, and INT8 models for deployment from C++/Python on servers or embedded devices. Ready-to-use models with examples include RetinaFace, Scrfd, YoloV5, YoloX, Arcface, AlphaPose, CenterNet, and DeepSORT (C++).
    Downloads: 0 This Week
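INT8 deployment maps floating-point values to 8-bit integers through a scale factor. A minimal sketch of symmetric per-tensor quantization, assuming the dynamic range `amax` has already been obtained (for example from calibration); the helper names are hypothetical and the code does not use TensorRT or TensorRT Pro APIs.

```python
def quantize_int8(values, amax):
    """Symmetric per-tensor INT8 quantization with scale = amax / 127."""
    if amax <= 0:
        raise ValueError("amax must be positive")
    scale = amax / 127.0
    # Round to nearest, then clamp to the symmetric range [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [x * scale for x in q]


weights = [0.0, 0.25, -1.0, 2.0]
codes, scale = quantize_int8(weights, amax=1.0)
print(codes)  # [0, 32, -127, 127]; 2.0 exceeds amax and saturates
print(dequantize_int8(codes, scale))
```

Values inside [-amax, amax] come back within half a quantization step of the original; values outside it are clipped, which is why the choice of `amax` matters.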