Build Vision Agents quickly with any model or video provider
Toolkit for audio, music, and speech generation
Concatenate a directory full of files into a single prompt
One-click deployment (including offline integration package)
Code for the paper "Language Models are Unsupervised Multitask Learners"
Framework for building neural networks
A single Gradio + React WebUI with extensions for ACE-Step
Implementation of "MobileCLIP" CVPR 2024
SOTA discrete acoustic codec models with 40/75 tokens per second
Unified Multimodal Understanding and Generation Models
Official code for Style Aligned Image Generation via Shared Attention
Memory-efficient and performant finetuning of Mistral's models
Official Python implementation of UTCP. UTCP is an open standard
LLM powered fuzzing via OSS-Fuzz
Virtual AI anchor that combines state-of-the-art technology
The official PyTorch implementation of Google's Gemma models
Multimodal Diffusion with Representation Alignment
Framework that is dedicated to making neural data processing
Central interface to connect your LLMs with external data
Renderer for the harmony response format to be used with gpt-oss
⚡ Building applications with LLMs through composability ⚡
Phi-3.5 for Mac: Locally-run Vision and Language Models
PPTAgent: Generating and Evaluating Presentations
Machine Learning Systems: Design and Implementation
Generate 3D objects conditioned on text or images