CTC-based forced aligner for audio-text pairs in 158 languages
Mirror of Ultralytics YOLO-World model weights for object detection
Speaker segmentation model for 10-second audio chunks with powerset labels
Open-weight, large-scale hybrid-attention reasoning model
Vision-language-action model for robot control via images and text
Detects speech activity in audio using the pyannote.audio 2.1 pipeline
Time series forecasting model using the T5 architecture with 46M parameters