OpenVINO™ Toolkit repository
Sparsity-aware deep learning inference runtime for CPUs
C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3, and GLM4(V)
Large Language Model Text Generation Inference
A GPU-accelerated library containing highly optimized building blocks
Unified Model Serving Framework
Data manipulation and transformation for audio signal processing
LLM.swift is a simple and readable library
Neural Network Compression Framework for enhanced OpenVINO
Standardized Serverless ML Inference Platform on Kubernetes
Efficient few-shot learning with Sentence Transformers
Libraries for applying sparsification recipes to neural networks
An easy-to-use LLM quantization package with user-friendly APIs
Bring the notion of Model-as-a-Service to life
OpenAI-style API for open large language models
Bolt is a high-performance deep learning library
A Unified Library for Parameter-Efficient Learning
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Lightweight Python library for adding real-time multi-object tracking
Run local LLMs like Llama, DeepSeek, and Kokoro inside your browser
Easy-to-use deep learning framework with 3 key features
Integrate, train and manage any AI models and APIs with your database
Library for serving Transformers models on Amazon SageMaker
A toolkit to optimize ML models for deployment for Keras & TensorFlow
Build Production-ready Agentic Workflow with Natural Language