Controllable & emotion-expressive zero-shot TTS
DeepMind model for tracking arbitrary points across videos & robotics
Code for Mesh R-CNN, ICCV 2019
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude
Repo of Qwen2-Audio chat & pretrained large audio language model
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
Open-weight, large-scale hybrid-attention reasoning model
Large language model & vision-language model based on linear attention
Capable of understanding text, audio, vision, and video
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
A Unified Framework for Text-to-3D and Image-to-3D Generation
ChatGPT interface with better UI
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
OCR expert VLM powered by Hunyuan's native multimodal architecture
Audio foundation model excelling in audio understanding
A state-of-the-art open visual language model
Stable Diffusion with Core ML on Apple Silicon
Towards Real-World Vision-Language Understanding
The ChatGPT Retrieval Plugin lets you easily find personal documents
Pushing the Limits of Mathematical Reasoning in Open Language Models
High-Resolution Image Synthesis with Latent Diffusion Models
AI Suite for upscaling, interpolating & restoring images/videos
Chat & pretrained large vision language model