Awesome multilingual OCR toolkits based on PaddlePaddle
Python SDK for Claude Agent
From Images to High-Fidelity 3D Assets
Visual Causal Flow
Video Object and Interaction Deletion
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
A multimodal model for brain response prediction
Qwen3.5 is the large language model series developed by the Qwen team
Open Source Speech Language Model
RGBD video generation model conditioned on camera input
Claude Code image, a one-stop open-source transit service
Bidirectional token-classification model for identifiable info
State-of-the-art LLM and coding model
Contexts Optical Compression
Qwen3-ASR is an open-source series of ASR models
Long-form streaming TTS system for multi-speaker dialogue generation
Audio foundation model excelling in audio understanding
Project Lyra: Open Generative 3D World Models
Controllable & emotion-expressive zero-shot TTS
Foundational Models for State-of-the-Art Speech and Text Translation
Analyze computation-communication overlap in V3/R1
Pushing the Limits of Mathematical Reasoning in Open Language Models
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Example Discord bot written in Python that uses the completions API
Let us control diffusion models