Tensor search for humans
Free, high-quality text-to-speech API endpoint to replace OpenAI
Stable-diffusion-webui-pixelization
Generating Immersive, Explorable, and Interactive 3D Worlds
Code for running inference and finetuning with SAM 3 model
Circuit diagrams and firmware source code for Gboard DIY keyboards
A neural network that transforms a design mock-up into a static website
Fast stable diffusion on CPU and AI PC
Qwen3-TTS is an open-source series of TTS models
Designed for text embedding and ranking tasks
Implementation of Phenaki Video, which uses MaskGIT
A nearly-live implementation of OpenAI's Whisper
ImageBind: One Embedding Space to Bind Them All
GPT-4V-level open-source multimodal model based on Llama3-8B
Towards Real-World Vision-Language Understanding
State-of-the-art (SoTA) text-to-video pre-trained model
An open source implementation of CLIP
High-Resolution Image Synthesis with Latent Diffusion Models
This repo contains the code for a 1D tokenizer and generator
Edit PDF files with Nano Banana
RGBD video generation model conditioned on camera input
Code and models for the ICML 2024 paper NExT-GPT
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
InvokeAI is a leading creative engine for Stable Diffusion models
Image-to-Image Translation in PyTorch