LLM abstractions that aren't obstructions
Scalable generative AI framework built for researchers and developers
Machine-learning-based conversational dialog engine for creating chatbots
Multi-lingual large voice generation model, providing inference
A Python package for segmenting geospatial data with the Segment Anything Model (SAM)
Bailing is a voice dialogue robot similar to GPT-4o
MARS5 speech model (TTS) from CAMB.AI
Large Multimodal Models for Video Understanding and Editing
Stable Diffusion web UI
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Guiding Instruction-based Image Editing via Multimodal Large Language Models
ImageBind: One Embedding Space to Bind Them All
Diffusion Transformer with Fine-Grained Chinese Understanding
Sample code and notebooks for Generative AI on Google Cloud
Flexible Photo Recrafting While Preserving Your Identity
Build Vision Agents quickly with any model or video provider
Towards Real-World Vision-Language Understanding
Framework for building neural networks
Toolkit for audio, music, and speech generation
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Extract schema, statistics and entities from datasets
Conversational voice AI agents
ChatGPT extension for scientific research work
Data loaders and abstractions for text and NLP