Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
PyTorch library of curated Transformer models and their components
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
Operating LLMs in production
Open-source tool designed to enhance the efficiency of workloads
Serve, optimize and scale PyTorch models in production
The unofficial Python package that returns responses from Google Bard
Swift async text-to-image for SwiftUI apps using OpenAI
Run 100B+ language models at home, BitTorrent-style
Self-contained Machine Learning and Natural Language Processing lib
A GPU-accelerated library containing highly optimized building blocks
An innovative library for efficient LLM inference
LLMFlows - Simple, Explicit and Transparent LLM Apps
Pure C++ implementation of several models for real-time chatting
Build your chatbot within minutes on your favorite device
20+ high-performance LLMs with recipes to pretrain, finetune at scale
Lightweight, standalone C++ inference engine for Google's Gemma models
GPU environment management and cluster orchestration
Bolt is a deep learning library with high performance
Low-latency REST API for serving text-embeddings
OpenMLDB is an open-source machine learning database
A library for accelerating Transformer models on NVIDIA GPUs
Multilingual Automatic Speech Recognition with word-level timestamps
Turn your existing data infrastructure into a feature store
A high-performance ML model serving framework offering dynamic batching