Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Bayesian inference with probabilistic programming
AI interface for tinkerers (Ollama, Haystack RAG, Python)
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
A high-performance inference system for large language models
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
LLMs as Copilots for Theorem Proving in Lean
Unofficial (Golang) Go bindings for the Hugging Face Inference API
Framework which allows you to transform your Vector Database
Libraries for applying sparsification recipes to neural networks
FlashInfer: Kernel Library for LLM Serving
Neural Network Compression Framework for enhanced OpenVINO
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Lightweight, standalone C++ inference engine for Google's Gemma models
Efficient few-shot learning with Sentence Transformers
PyTorch domain library for recommendation systems
Official inference library for Mistral models
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Private Open AI on Kubernetes
A Pythonic framework to simplify AI service building
Adversarial Robustness Toolbox (ART) - Python Library for ML security
A Unified Library for Parameter-Efficient Learning
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods