Awesome multilingual OCR toolkits based on PaddlePaddle
Python SDK for Claude Agent
Video Object and Interaction Deletion
Visual Causal Flow
From Images to High-Fidelity 3D Assets
Native and Compact Structured Latents for 3D Generation
A multimodal model for brain response prediction
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Qwen3.5 is the large language model series developed by the Qwen team
Open Source Speech Language Model
RGBD video generation model conditioned on camera input
Claude Code mirror, a one-stop open-source relay service
Bidirectional token-classification model for identifiable info
State-of-the-art LLM and coding model
ChatGPT interface with better UI
Contexts Optical Compression
Qwen3-ASR is an open-source series of ASR models
Long-form streaming TTS system for multi-speaker dialogue generation
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Open-source framework for intelligent speech interaction
Audio foundation model excelling in audio understanding
Project Lyra: Open Generative 3D World Models
Controllable & emotion-expressive zero-shot TTS
Genome modeling and design across all domains of life
Foundational Models for State-of-the-Art Speech and Text Translation