Run Local LLMs on Any Device. Open-source
AirLLM: 70B inference with a single 4GB GPU
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
High-performance inference framework for large language models
Implement a CPU from scratch and play with large model deployments
Parallax is a distributed model serving framework
Find the local LLM that actually runs and performs best
Performance-optimized AI inference on your GPUs
LLM training in simple, raw C/CUDA
A course on LLM inference serving on Apple Silicon
An LLM Compiler for Parallel Function Calling
Run PyTorch LLMs locally on servers, desktops, and mobile devices
A modular graph-based Retrieval-Augmented Generation (RAG) system
Examples of using E2B
Tools for merging pretrained large language models
The official Meta Llama 3 GitHub site
All-in-one WebUI for AI generative image and video creation
Universal LLM Deployment Engine with ML Compilation
Access large language models from the command-line
Free ChatGPT & DeepSeek API Key
Open-source AI hackers to find and fix your app’s vulnerabilities
A simple, performant and scalable JAX LLM
Phi-3.5 for Mac: Locally-run Vision and Language Models
Open source libraries and APIs to build custom preprocessing pipelines
Operating LLMs in production