Showing 50 open source projects for "gpu max performance"

  • 1
    TensorFlow Model Garden

    Models and examples built with TensorFlow

    The TensorFlow Model Garden is a repository with a number of different implementations of state-of-the-art (SOTA) models and modeling solutions for TensorFlow users. We aim to demonstrate the best practices for modeling so that TensorFlow users can take full advantage of TensorFlow for their research and product development. To improve the transparency and reproducibility of our models, training logs on TensorBoard.dev are also provided for models to the extent possible though not all models...
    Downloads: 0 This Week
  • 2
    tvm

    Open deep learning compiler stack for cpu, gpu, etc.

    Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend. The vision of the Apache TVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging...
    Downloads: 0 This Week
  • 3
    Deep Java Library (DJL)

    An engine-agnostic deep learning framework in Java

    ...Because DJL is deep learning engine agnostic, you don't have to make a choice between engines when creating your projects. You can switch engines at any point. To ensure the best performance, DJL also provides automatic CPU/GPU choice based on hardware configuration.
    Downloads: 2 This Week
  • 4
    The SpeechBrain Toolkit

    A PyTorch-based Speech Toolkit

    ...Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well. SpeechBrain provides efficient and GPU-friendly speech augmentation pipelines and acoustic feature extraction.
    Downloads: 0 This Week
  • 5
    fastai

    Deep learning library

    fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying...
    Downloads: 0 This Week
  • 6
    SSD in PyTorch 1.0

    High quality, fast, modular reference implementation of SSD in PyTorch

    This repository implements SSD (Single Shot MultiBox Detector). The implementation is heavily influenced by the projects ssd.pytorch, pytorch-ssd and maskrcnn-benchmark. This repository aims to be the code base for research based on SSD. Multi-GPU training and inference: We use DistributedDataParallel, you can train or test with arbitrary GPU(s), the training schema will change accordingly. Add your own modules without pain. We abstract backbone, Detector, BoxHead, BoxPredictor, etc. You can...
    Downloads: 0 This Week
  • 7
    FEDML Open Source

    The unified and scalable ML library for large-scale training

    ...Tightly integrated with the TensorOpera open source library, TensorOpera AI provides holistic support for three interconnected AI infrastructure layers: user-friendly MLOps, a well-managed scheduler, and high-performance ML libraries for running any AI job across GPU clouds. A typical workflow is shown in the figure above. When a developer wants to run a pre-built job in Studio or Job Store, TensorOperaLaunch pairs the job with the most economical GPU resources, auto-provisions them, and runs the job, eliminating complex environment setup and management.
    Downloads: 0 This Week
  • 8
    SIG Rust

    Rust language bindings for TensorFlow

    SIG Rust provides idiomatic Rust bindings for TensorFlow, making it possible for developers to work with TensorFlow functionality from within the Rust programming language. Rather than replacing TensorFlow itself, it acts as an integration layer that connects Rust applications to the TensorFlow C API. The repository is designed for developers who want Rust’s performance, safety, and systems programming strengths while still accessing TensorFlow’s machine learning capabilities. It includes...
    Downloads: 0 This Week
  • 9
    MLPACK is a C++ machine learning library with emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and flexibility for expert users.
    * More info + downloads: https://mlpack.org
    * Git repo: https://github.com/mlpack/mlpack
    Downloads: 0 This Week
  • 10
    Coqui STT

    The deep learning toolkit for speech-to-text

    Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. Coqui STT is battle-tested in both production and research. Multiple possible transcripts, each with an associated confidence score. Experience the immediacy of script-to-performance. With Coqui text-to-speech, production times go from months to minutes. With Coqui, the post is a pleasure. Effortlessly clone the voices of your talent and have the clone handle the problems...
    Downloads: 6 This Week
  • 11
    DeepMosaics

    Automatically remove the mosaics in images and videos, or add mosaics

    ...This project is based on "semantic segmentation" and "Image-to-Image Translation". You can run DeepMosaics either from a pre-built binary package or from source. Run time depends on the computer's performance (the GPU version is faster but requires CUDA to be installed). Different pre-trained models are suitable for different effects.
    Downloads: 86 This Week
  • 12
    YOLOv4-large

    Scaled-YOLOv4: Scaling Cross Stage Partial Network

    YOLOv4-large is an open-source implementation of the Scaled-YOLOv4 object detection architecture, designed to improve both the accuracy and scalability of real-time computer vision models. The project provides a PyTorch implementation of the Scaled-YOLOv4 framework, which extends the original YOLOv4 architecture using Cross Stage Partial (CSP) networks and new scaling techniques. Unlike earlier object detection systems that only scale depth or width, this architecture scales multiple aspects...
    Downloads: 0 This Week
  • 13
    TensorLayer

    Deep learning and reinforcement learning library for scientists

    TensorLayer is a novel TensorFlow-based deep learning and reinforcement learning library designed for researchers and engineers. It provides an extensive collection of customizable neural layers for building advanced AI models quickly. Building on this, the community has open-sourced a large number of tutorials and applications. TensorLayer was awarded the 2017 Best Open Source Software award by the ACM Multimedia Society. This project can also be found at OpenI and Gitee. 3.0.0 has been pre-released, the current version...
    Downloads: 0 This Week
  • 14
    BytePS

    A high performance and generic framework for distributed DNN training

    BytePS is a high-performance, general-purpose distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on either TCP or RDMA networks. BytePS outperforms existing open-source distributed training frameworks by a large margin. For example, on BERT-large training, BytePS can achieve ~90% scaling efficiency with 256 GPUs (see below), which is much higher than Horovod+NCCL.
    Downloads: 0 This Week
  • 15
    imgaug

    Image augmentation for machine learning experiments

    imgaug is a library for image augmentation in machine learning experiments. It supports a wide range of augmentation techniques, lets you easily combine them and execute them in random order or on multiple CPU cores, has a simple yet powerful stochastic interface, and can augment not only images but also key points/landmarks, bounding boxes, heatmaps and segmentation maps. Affine transformations, perspective transformations, contrast changes, gaussian noise, dropout of regions,...
    Downloads: 0 This Week
  • 16
    Torchreid

    Deep learning person re-identification in PyTorch

    Torchreid is a library for deep-learning person re-identification, written in PyTorch and developed for our ICCV’19 project, Omni-Scale Feature Learning for Person Re-Identification. In "deep-person-reid/scripts/", we provide a unified interface to train and test a model. See "scripts/main.py" and "scripts/default_config.py" for more details. The folder "configs/" contains some predefined configs which you can use as a starting point. The code will automatically (download and) load the...
    Downloads: 1 This Week
  • 17
    Tensorpack

    A Neural Net Training Interface on TensorFlow, with focus on speed

    ...On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably get faster if written with Tensorpack. A scalable data-parallel multi-GPU / distributed training strategy is available off the shelf. Squeeze the best data loading performance out of Python with tensorpack.dataflow. Symbolic programming (e.g. tf.data) does not offer the data processing flexibility needed in research. Tensorpack squeezes the most performance out of pure Python with various auto parallelization strategies. ...
    Downloads: 0 This Week
  • 18
    DIGITS

    Deep Learning GPU training system

    The NVIDIA Deep Learning GPU Training System (DIGITS) puts the power of deep learning into the hands of engineers and data scientists. DIGITS can be used to rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks. DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. ...
    Downloads: 1 This Week
  • 19
    Intel neon

    Intel® Nervana™ reference deep learning framework

    neon is Intel's reference deep learning framework committed to best performance on all hardware. Designed for ease of use and extensibility. See the new features in our latest release. We want to highlight that neon v2.0.0+ has been optimized for much better performance on CPUs by enabling Intel Math Kernel Library (MKL). The DNN (Deep Neural Networks) component of MKL that is used by neon is provided free of charge and downloaded automatically as part of the neon installation. ...
    Downloads: 0 This Week
  • 20
    NNVM

    Open deep learning compiler stack for cpu, gpu

    The vision of the Apache NNVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging machine learning models for any hardware platform. Compilation of deep learning models into minimum deployable modules. Infrastructure to automatically generate and optimize models on more backends with better performance....
    Downloads: 0 This Week
  • 21
    Caffe2

    Caffe2 is a lightweight, modular, and scalable deep learning framework

    Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind. Caffe2 is a deep learning framework that provides an easy and straightforward way for you to experiment with deep learning and leverage community contributions of new models and algorithms. You can bring your creations to scale using the power of GPUs in the cloud or to the masses on mobile with Caffe2’s cross-platform...
    Downloads: 0 This Week
  • 22
    GPU Machine Learning Library. This library aims to provide machine learning researchers and practitioners with a high-performance library by taking advantage of the GPU's enormous computational power. The library is developed in C++ and CUDA.
    Downloads: 0 This Week
  • 23

    LightSpMV

    Lightweight GPU-based sparse matrix-vector multiplication (SpMV)

    LightSpMV is a novel CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm using the standard compressed sparse row (CSR) storage format. We have evaluated LightSpMV using various sparse matrices and compared it to the CSR-based SpMV subprograms in the state-of-the-art CUSP and cuSPARSE. Performance evaluation reveals that on a single Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with a speedup of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE for single and double precision, respectively.
    Downloads: 2 This Week
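
For readers unfamiliar with the CSR format mentioned in the LightSpMV description, the following is a minimal, sequential sketch of a CSR matrix-vector product in plain Python. It is not LightSpMV's CUDA algorithm and says nothing about its performance; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions, not taken from the project.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Hypothetical helper for illustration only.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# A = [[4, 0, 1],
#      [0, 0, 2],
#      [3, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

A GPU implementation would distribute the rows (or parts of rows) across threads instead of looping over them sequentially; how LightSpMV does this is not described in the listing above.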
  • 24

    LBP in multiple platforms

    LBP implementation in multiple computing platforms (ARM, GPU, DSP...)

    The Local Binary Pattern (LBP) is a texture operator that is used in several different computer vision applications and implemented on a variety of platforms. When selecting a suitable LBP implementation platform, the specific application and its requirements in terms of performance, size, energy efficiency, cost and development time have to be carefully considered. This is a software toolbox that collects software implementations of the Local Binary Pattern operator on several platforms:
    - OpenCL for CPU & GPU
    - OpenCL for GPU (branchless)
    - C code optimized for ARM
    - OpenGL ES 2.0 shaders for mobile GPUs
    - C code for TI C64x DSP core (branchless)
    - C code for TTA processor synthesis
    If you use the code somewhere, please cite: Bordallo López M., Nieto A., Boutellier J., Hannuksela J., and Silvén O. ...
    Downloads: 0 This Week
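
As a rough illustration of what the LBP operator computes, here is a minimal sketch of the basic 3x3 variant in plain Python. It is not code from the toolbox; the neighbour ordering, the `>=` comparison, and the names `lbp_code` and `lbp_image` are assumptions chosen for illustration, and real implementations differ in these conventions.

```python
# Offsets (row, col) of the 8 neighbours, clockwise from the top-left pixel.
# This ordering is an assumed convention; other orderings are equally valid.
NEIGHBOUR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, 1), (1, 1), (1, 0),
    (1, -1), (0, -1),
]


def lbp_code(image, r, c):
    """Return the 8-bit LBP code of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # Set the bit when the neighbour is at least as bright as the centre.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """Compute LBP codes for all interior pixels (border pixels are skipped)."""
    rows, cols = len(image), len(image[0])
    return [
        [lbp_code(image, r, c) for c in range(1, cols - 1)]
        for r in range(1, rows - 1)
    ]


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)
```

Each pixel's code depends only on its own 3x3 neighbourhood, so the per-pixel work is independent across the image.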
  • 25
    SweetOnionCCG2PTBConverter

    A tool that converts CCGBank to PTB

    Conversion between different grammar frameworks is of great importance to comparative performance analysis of the parsers developed on them. This tool converts CCG derivations to PTB trees using Max Entropy models and can also visualize the tree graphs. The main technical innovation presented here is the effective conversion method, which achieves an F-score of over 95%.
    Downloads: 0 This Week