Awesome multilingual OCR toolkits based on PaddlePaddle
Industrial-level controllable zero-shot text-to-speech system
A theoretical reconstruction of the Claude Mythos architecture
Visual Causal Flow
From Images to High-Fidelity 3D Assets
State-of-the-art LLM and coding model
A multimodal model for brain response prediction
Qwen3.5 is the large language model series developed by the Qwen team
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Video Object and Interaction Deletion
Controllable & emotion-expressive zero-shot TTS
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Bidirectional token-classification model for identifiable information
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
RGBD video generation model conditioned on camera input
Long-form streaming TTS system for multi-speaker dialogue generation
Qwen3-ASR is an open-source series of ASR models
Contexts Optical Compression
Python SDK for Claude Agent
Open Source Speech Language Model
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Audio foundation model excelling in audio understanding
Large Multimodal Models for Video Understanding and Editing
Claude Code mirror, a one-stop open-source relay service
Project Lyra: Open Generative 3D World Models