Awesome multilingual OCR toolkits based on PaddlePaddle
Industrial-level controllable zero-shot text-to-speech system
AlphaFold 3 inference pipeline
RGBD video generation model conditioned on camera input
Visual Causal Flow
From Images to High-Fidelity 3D Assets
Contexts Optical Compression
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Open Source Speech Language Model
Qwen3-ASR is an open-source series of ASR models
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Python SDK for Claude Agent
General-purpose image editing model that delivers high-fidelity
Long-form streaming TTS system for multi-speaker dialogue generation
Controllable & emotion-expressive zero-shot TTS
Pushing the Limits of Mathematical Reasoning in Open Language Models
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Audio foundation model excelling in audio understanding
Official implementation of Watermark Anything with Localized Messages
Inference script for Oasis 500M
FAIR Sequence Modeling Toolkit 2
Pokee Deep Research Model Open Source Repo
Code for Mesh R-CNN, ICCV 2019
VGGSfM: Visual Geometry Grounded Deep Structure From Motion