Awesome multilingual OCR toolkits based on PaddlePaddle
AlphaFold 3 inference pipeline
Industrial-level controllable zero-shot text-to-speech system
Native and Compact Structured Latents for 3D Generation
From Images to High-Fidelity 3D Assets
Qwen3.5 is the large language model series developed by the Qwen team
A theoretical reconstruction of the Claude Mythos architecture
Video Object and Interaction Deletion
A multimodal model for brain response prediction
Contexts Optical Compression
Project Lyra: Open Generative 3D World Models
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Qwen3-ASR is an open-source series of ASR models
Python SDK for Claude Agent
Robust Speech Recognition Across Languages, Dialects
Visual Causal Flow
Open Source Speech Language Model
Code for Mesh R-CNN, ICCV 2019
1B text generation model based on the HRM architecture
Bidirectional token-classification model for identifiable info
Long-form streaming TTS system for multi-speaker dialogue generation
Controllable & emotion-expressive zero-shot TTS
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
General-purpose image editing model that delivers high-fidelity
Inference script for Oasis 500M