Sharp Monocular Metric Depth in Less Than a Second
Diffusion Transformer with Fine-Grained Chinese Understanding
CTC-based forced aligner for audio-text in 158 languages
Compact 8B multimodal instruct model optimized for edge deployment
Small 3B base multimodal model ideal for custom AI on edge hardware
Efficient 14B multimodal instruct model for edge deployment, with FP8 support