Sparsity-aware deep learning inference runtime for CPUs
Efficient few-shot learning with Sentence Transformers
Probabilistic reasoning and statistical analysis in TensorFlow
Libraries for applying sparsification recipes to neural networks
A Unified Library for Parameter-Efficient Learning
Ready-to-use OCR with 80+ supported languages
Phi-3.5 for Mac: Locally-run Vision and Language Models
Library for OCR-related tasks powered by Deep Learning
DoWhy is a Python library for causal inference
Bring the notion of Model-as-a-Service to life
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Neural Network Compression Framework for enhanced OpenVINO
Tensor search for humans
A high-performance ML model serving framework that offers dynamic batching
Framework that is dedicated to making neural data processing
Database system for building simpler and faster AI-powered applications
A computer vision framework to create and deploy apps in minutes
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Implementation of model parallel autoregressive transformers on GPUs
CPU/GPU inference server for Hugging Face transformer models