State-of-the-art TTS model under 25MB
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Phi-3.5 for Mac: Locally-run Vision and Language Models
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Chinese LLaMA & Alpaca large language model + local CPU/GPU training
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion
Wan2.2: Open and Advanced Large-Scale Video Generative Model
High-Resolution Image Synthesis with Latent Diffusion Models
Generating Immersive, Explorable, and Interactive 3D Worlds from Words
Powerful image generation foundation model
Powerful large language model (LLM) from Alibaba Cloud
Wan2.1: Open and Advanced Large-Scale Video Generative Model
A Conversational Speech Generation Model
Janus-Series: Unified Multimodal Understanding and Generation Models
Qwen (通义千问) chat/pretrained large language model from Alibaba Cloud
An implementation of model parallel GPT-2 and GPT-3-style models
Text-to-image diffusion model for high-quality image generation
Advanced base model for high-quality text-to-image generation
Llama-2-7B is a 7B-parameter transformer model for text generation
Efficient text-to-image model with enhanced quality and typography
7B-parameter foundational LLM by Meta for text generation tasks
GPT-2 is a 124M-parameter English language model for text generation
Dia-1.6B generates lifelike English dialogue and vocal expressions
Compact 360M text model with high efficiency and fine-tuning support