Open Source Speech Language Model
Open-source industrial-grade ASR models
A SOTA open-source image editing model
OCR expert VLM powered by Hunyuan's native multimodal architecture
GPT-4V-level open-source multimodal model based on Llama3-8B
Collection of Gemma 3 variants that are trained for performance
Ling-V2 is an MoE LLM open-sourced by InclusionAI
Open-source deep-learning framework
HY-Motion model for 3D character animation generation
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Revolutionizing Database Interactions with Private LLM Technology
Open-source framework for intelligent speech interaction
Implementation of the Surya Foundation Model for Heliophysics
Long-form streaming TTS system for multi-speaker dialogue generation
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Audio foundation model excelling in audio understanding
Diversity-driven optimization and large-model reasoning ability
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
High-Fidelity and Controllable Generation of Textured 3D Assets
Multi-modal large language model designed for audio understanding
Large Multimodal Models for Video Understanding and Editing
Generate Any 3D Scene in Seconds
GLM-4-Voice | End-to-End Chinese-English Conversational Model
LLM-based audio editing model trained with reinforcement learning
Tiny vision language model