Efficient few-shot learning with Sentence Transformers
Uncover insights, surface problems, monitor, and fine-tune your LLM
Fast inference engine for Transformer models
A lightweight vision library for performing large-scale object detection
A Unified Library for Parameter-Efficient Learning
Serve, optimize and scale PyTorch models in production
AI interface for tinkerers (Ollama, Haystack RAG, Python)
Pure C++ implementation of several models for real-time chatting
FlashInfer: Kernel Library for LLM Serving
Open-source tool designed to enhance the efficiency of workloads
Open-Source AI Camera. Empower any camera/CCTV
Tensor search for humans
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Multilingual Automatic Speech Recognition with word-level timestamps
A high-performance ML model serving framework offering dynamic batching
Unified Model Serving Framework
A high-performance inference system for large language models
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
LLMs as Copilots for Theorem Proving in Lean
An innovative library for efficient LLM inference
The unofficial Python package that returns responses from Google Bard
Trainable models and NN optimization tools
Probabilistic reasoning and statistical analysis in TensorFlow
A GPU-accelerated library containing highly optimized building blocks
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods