Awesome multilingual OCR toolkits based on PaddlePaddle
Python SDK for Claude Agent
Visual Causal Flow
From Images to High-Fidelity 3D Assets
Video Object and Interaction Deletion
Native and Compact Structured Latents for 3D Generation
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Open Source Speech Language Model
RGBD video generation model conditioned on camera input
Bidirectional token-classification model for identifiable information
Contexts Optical Compression
ChatGPT interface with better UI
Qwen3-ASR is an open-source series of ASR models
Long-form streaming TTS system for multi-speaker dialogue generation
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Open-source framework for intelligent speech interaction
Audio foundation model excelling in audio understanding
Project Lyra: Open Generative 3D World Models
Controllable & emotion-expressive zero-shot TTS
Genome modeling and design across all domains of life
Pushing the Limits of Mathematical Reasoning in Open Language Models
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Example Discord bot written in Python that uses the completions API
Let us control diffusion models
LLM providing reasoning and conversational capabilities