tiktoken is a fast BPE tokeniser for use with OpenAI's models
Repo of Qwen2-Audio, the chat & pretrained large audio language model
A Unified Framework for Text-to-3D and Image-to-3D Generation
Controllable & emotion-expressive zero-shot TTS
DeepMind model for tracking arbitrary points across videos, with applications in robotics
Code for Mesh R-CNN, ICCV 2019
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
Large language model & vision-language model based on linear attention
Capable of understanding text, audio, vision & video
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline
OCR expert VLM powered by Hunyuan's native multimodal architecture
Reproduction of Poetiq's record-breaking submission to ARC-AGI-1
Audio foundation model excelling in audio understanding
Towards Real-World Vision-Language Understanding
Stable Diffusion with Core ML on Apple Silicon
The ChatGPT Retrieval Plugin lets you easily find personal documents
Pushing the Limits of Mathematical Reasoning in Open Language Models
Chat & pretrained large vision language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
High-Resolution Image Synthesis with Latent Diffusion Models
Release for Improved Denoising Diffusion Probabilistic Models
Open-source, high-performance Mixture-of-Experts large language model
A Conversational Speech Generation Model
AI suite for upscaling, interpolating & restoring images and videos