Large Language Model Text Generation Inference
Sparsity-aware deep learning inference runtime for CPUs
Neural Network Compression Framework for enhanced OpenVINO
OpenAI-style API for open large language models
Efficient few-shot learning with Sentence Transformers
Libraries for applying sparsification recipes to neural networks
Bring the notion of Model-as-a-Service to life
A Unified Library for Parameter-Efficient Learning
An easy-to-use LLM quantization package with user-friendly APIs
Framework that is dedicated to making neural data processing
Database system for building simpler and faster AI-powered applications
Framework for Accelerating LLM Generation with Multiple Decoding Heads