Learning embeddings for classification, retrieval and ranking
Generate embeddings from large-scale graph-structured data
Dual LSTM encoder for dialog response generation
Open language model developed by NVIDIA as part of Nemotron-3 family
High-performance MoE model with MLA, MTP, and multilingual reasoning
Flagship MoE model for long-context agents and complex coding
Omnimodal AI model for agents, coding, and long-context tasks
QwQ-32B is a reasoning-focused language model for complex tasks
Self-evolving AI model for agents, coding, and complex workflows
Agentic 123B coding model optimized for large-scale engineering
Qwen3-Next: 80B instruct LLM with ultra-long context up to 1M tokens
Efficient 13B MoE language model with long context and reasoning modes
Efficient 8B multimodal model tuned for advanced reasoning tasks
High-precision 14B multimodal model built for advanced reasoning tasks
Metric monocular depth estimation (vision model)
Efficient MoE reasoning model for coding and math workloads
Multimodal agent model for coding, orchestration, and autonomy
Lightweight 24B agentic coding model with vision and long context
Grok-2.5: Large-scale xAI model for local inference with SGLang
T5-Small: Lightweight text-to-text transformer for NLP tasks
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Open agentic coding model optimized for local deployment
Flagship MoE model for advanced reasoning, coding, and agents
High-efficiency reasoning and agentic intelligence model
Small 3B-base multimodal model ideal for custom AI on edge hardware