Nemotron 3 Ultra
Nemotron 3 Nano is a compact, open large language model in NVIDIA’s Nemotron 3 family, designed for efficient agentic reasoning, conversational AI, and coding tasks. It uses a hybrid Mixture-of-Experts Mamba-Transformer architecture that activates only a small subset of parameters per token, enabling low-latency inference while maintaining strong accuracy and reasoning performance. It has approximately 31.6 billion total parameters with around 3.2 billion active (3.6 billion including embeddings), allowing it to achieve higher accuracy than previous Nemotron 2 Nano while using less computation per forward pass. Nemotron 3 Nano supports long-context processing of up to one million tokens, enabling it to handle large documents, multi-step workflows, and extended reasoning chains in a single pass. It is designed for high-throughput, real-time execution, excelling in multi-turn conversations, tool calling, and agent-based workflows where tasks require planning, reasoning, and more.
Learn more
Reka Flash 3
Reka Flash 3 is a 21-billion-parameter multimodal AI model developed by Reka AI, designed to excel in general chat, coding, instruction following, and function calling. It processes and reasons with text, images, video, and audio inputs, offering a compact, general-purpose solution for various applications. Trained from scratch on diverse datasets, including publicly accessible and synthetic data, Reka Flash 3 underwent instruction tuning on curated, high-quality data to optimize performance. The final training stage involved reinforcement learning using REINFORCE Leave One-Out (RLOO) with both model-based and rule-based rewards, enhancing its reasoning capabilities. With a context length of 32,000 tokens, Reka Flash 3 performs competitively with proprietary models like OpenAI's o1-mini, making it suitable for low-latency or on-device deployments. The model's full precision requires 39GB (fp16), but it can be compressed to as small as 11GB using 4-bit quantization.
Learn more
DeepSeek-V4
DeepSeek V4 is an advanced AI model designed to push the boundaries of large-scale artificial intelligence with an estimated 1 trillion parameters. It utilizes a Mixture-of-Experts architecture, activating only a fraction of its parameters per task to improve efficiency. The model supports a massive context window of up to 1 million tokens, enabling it to process long documents and complex codebases. It is natively multimodal, allowing it to understand and generate text, images, audio, and video. DeepSeek V4 introduces innovations such as Engram memory, sparse attention mechanisms, and improved training stability techniques. It is expected to deliver high performance in areas like software engineering and reasoning while maintaining lower operational costs. Overall, DeepSeek V4 aims to combine scalability, efficiency, and affordability to compete with leading AI models.
Learn more
Kimi K2 Thinking
Kimi K2 Thinking is an advanced open source reasoning model developed by Moonshot AI, designed specifically for long-horizon, multi-step workflows where the system interleaves chain-of-thought processes with tool invocation across hundreds of sequential tasks. The model uses a mixture-of-experts architecture with a total of 1 trillion parameters, yet only about 32 billion parameters are activated per inference pass, optimizing efficiency while maintaining vast capacity. It supports a context window of up to 256,000 tokens, enabling the handling of extremely long inputs and reasoning chains without losing coherence. Native INT4 quantization is built in, which reduces inference latency and memory usage without performance degradation. Kimi K2 Thinking is explicitly built for agentic workflows; it can autonomously call external tools, manage sequential logic steps (up to and typically between 200-300 tool calls in a single chain), and maintain consistent reasoning.
Learn more