DistilGPT2: Lightweight, distilled GPT-2 for faster text generation
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Llama 3.2 1B: Multilingual, instruction-tuned model for mobile AI
SigLIP: Zero-shot image-text model with shape-optimized ViT
Summarization model fine-tuned on CNN/DailyMail articles
CTC-based forced aligner for audio-text in 158 languages
Base Vision Transformer pretrained on ImageNet-21k at 224x224
Mirror of Ultralytics YOLO-World model weights for object detection
3B parameter ESM-2 model for protein sequence understanding
Lightweight ResNet-18 model trained on ImageNet with A1 recipe
CLIP model fine-tuned for zero-shot fashion product classification
Sentiment analysis model fine-tuned on SST-2 with DistilBERT
Task-adaptive multilingual embeddings covering 94 languages and diverse NLP tasks
Improved DeBERTa model with ELECTRA-style pretraining
Multilingual sentence embeddings for search and similarity tasks
Multimodal 7B model for image, video, and text understanding tasks
Transformer model for image classification with patch-based input
Efficient 13B MoE language model with long context and reasoning modes
Lightweight sentence embedding model for semantic search
Detects speech activity in audio using pyannote.audio 2.1 pipeline
Portuguese ASR model fine-tuned on XLSR-53 for 16kHz audio input
Efficient English embedding model for semantic search and retrieval
Efficient cross-encoder for MS MARCO passage re-ranking tasks
High-performance multilingual embedding model for 94 languages
Compact multi-vector retriever with state-of-the-art ranking accuracy
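Several of the embedding models above are built for semantic search and similarity, where query and document vectors are ranked by cosine similarity. A minimal, self-contained sketch of that ranking step, using hypothetical 4-dimensional vectors in place of real model outputs (an actual checkpoint would emit vectors with hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings standing in for model output.
query = [0.1, 0.9, 0.2, 0.0]
docs = {
    "doc_a": [0.1, 0.8, 0.3, 0.1],
    "doc_b": [0.9, 0.1, 0.0, 0.2],
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # doc_a: its vector points closest to the query's
```

The same ranking step underlies both the bi-encoder retrievers listed above and, with per-token vectors, the multi-vector approach; cross-encoders instead score each query-document pair jointly.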