PyTorch library of curated Transformer models and their components
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
Framework dedicated to neural data processing
Visual Instruction Tuning: Large Language-and-Vision Assistant
Open-source tool designed to enhance the efficiency of workloads
MII makes low-latency and high-throughput inference possible
Low-latency REST API for serving text embeddings
A library for accelerating Transformer models on NVIDIA GPUs
Multilingual Automatic Speech Recognition with word-level timestamps
Open platform for training, serving, and evaluating language models
Uncover insights, surface problems, monitor, and fine-tune your LLM
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Run 100B+ language models at home, BitTorrent-style
A toolkit to optimize ML models for deployment with Keras & TensorFlow
High-quality, fast, modular reference implementation of SSD in PyTorch
Library for serving Transformers models on Amazon SageMaker
Powering Amazon's custom machine learning chips
Deep learning optimization library: makes distributed training easy
OpenMLDB is an open-source machine learning database
A GPU-accelerated library containing highly optimized building blocks
Implementation of "Tree of Thoughts"
Toolbox of models, callbacks, and datasets for AI/ML researchers
Lightweight anchor-free object detection model