SOTA discrete acoustic codec models with 40/75 tokens per second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
GitLab automatic code review tool based on large models
Multi-Agent daTa geneRation Infra and eXperimentation framework
Interface for OuteTTS models
A TTS model capable of generating ultra-realistic dialogue
Chinese and English multimodal conversational language model
Towards Human-Sounding Speech
Qwen3-Omni is a natively end-to-end, omni-modal LLM
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Turn your website into a GIF
Chinese Llama-3 LLMs developed from Meta Llama 3
Inference code for CodeLlama models
Algorithms for explaining machine learning models
Visual Automation IDE — automate anything you see on screen
This repository is a curated collection of links to various courses
Powerful open source image generation model
Overcoming Data Limitations for High-Quality Video Diffusion Models
High-quality multi-lingual text-to-speech library by MyShell.ai
Unify Efficient Fine-tuning of RAG Retrieval, including Embedding
Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, etc.
A Customizable Image-to-Video Model based on HunyuanVideo
Release for Improved Denoising Diffusion Probabilistic Models
AI Suite for upscaling, interpolating & restoring images/videos
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation