Research code artifacts for Code World Model (CWM)
Controllable & emotion-expressive zero-shot TTS
DeepMind model for tracking arbitrary points across videos & robotics
Code for Mesh R-CNN, ICCV 2019
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Repo of Qwen2-Audio chat & pretrained large audio language model
Open-weight, large-scale hybrid-attention reasoning model
Large language model & vision-language model based on linear attention
Capable of understanding text, audio, vision, and video
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
A Unified Framework for Text-to-3D and Image-to-3D Generation
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
OCR expert VLM powered by Hunyuan's native multimodal architecture
Audio foundation model excelling in audio understanding
ChatGPT interface with better UI
A state-of-the-art open visual language model
Stable Diffusion with Core ML on Apple Silicon
Towards Real-World Vision-Language Understanding
The ChatGPT Retrieval Plugin lets you easily find personal documents
Pushing the Limits of Mathematical Reasoning in Open Language Models
High-Resolution Image Synthesis with Latent Diffusion Models
AI Suite for upscaling, interpolating & restoring images/videos