Large Language Model Text Generation Inference
Efficient few-shot learning with Sentence Transformers
Low-latency REST API for serving text-embeddings
Tensor search for humans
Phi-3.5 for Mac: Locally-run Vision and Language Models
LLM training code for MosaicML foundation models
State-of-the-art diffusion models for image and audio generation
MII makes low-latency and high-throughput inference possible
A graphical manager for ollama that can manage your LLMs
Framework that is dedicated to making neural data processing
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Implementation of "Tree of Thoughts"
Training & Implementation of chatbots leveraging GPT-like architecture
CPU/GPU inference server for Hugging Face transformer models