Best Large Language Models - Page 13

Compare the Top Large Language Models as of May 2026 - Page 13

  • 1
    MiMo-V2-Omni

    Xiaomi Technology

    MiMo-V2-Omni is an advanced multimodal AI model designed to handle a wide range of real-world tasks across text, code, and other data formats. It is built to support agentic workflows, enabling seamless execution of complex, multi-step processes. The model integrates strong reasoning, tool usage, and contextual understanding to deliver reliable outputs. With its ability to process diverse inputs, it enhances productivity across development, automation, and enterprise use cases. MiMo-V2-Omni focuses on delivering consistent performance in both general and specialized tasks.
  • 2
    Claude Mythos

    Anthropic

    Claude Mythos Preview is a highly advanced AI model developed with strong capabilities in cybersecurity, particularly in identifying and exploiting software vulnerabilities. It demonstrates the ability to autonomously discover zero-day vulnerabilities across major operating systems, browsers, and critical software systems. The model can also generate complex exploit chains, including privilege escalation and remote code execution attacks. Its capabilities extend beyond vulnerability detection to reverse engineering and exploit development in both open-source and closed-source environments. Mythos Preview operates through agentic workflows, enabling it to analyze codebases, test hypotheses, and validate exploits independently. These abilities represent a significant leap compared to previous models, which struggled with exploit generation. Overall, Claude Mythos Preview highlights a new era where AI can both strengthen and challenge global cybersecurity practices.
  • 3
    Claude Sonnet 4.7
    Claude Sonnet 4.7 is an advanced AI model designed to deliver strong performance across everyday tasks, professional workflows, and technical problem-solving. It offers improved reasoning, faster responses, and more reliable outputs compared to earlier Sonnet versions. The model excels at writing, coding, analysis, and general productivity tasks with a balanced approach to speed and quality. It supports multimodal capabilities, allowing it to understand and work with both text and images. Claude Sonnet 4.7 is built to follow instructions more accurately, reducing errors and improving consistency. It is optimized for real-world applications such as business operations, content creation, and software development. The model also includes safety and alignment improvements to ensure responsible usage. Overall, Claude Sonnet 4.7 provides a versatile and efficient AI solution for a wide range of use cases.
  • 4
    Grok 4.3
    Grok 4.3 is the latest iteration of xAI’s Grok model, designed to deliver improved reasoning, real-time information access, and advanced task automation. It builds on earlier Grok 4 models by enhancing performance in complex problem-solving, coding, and analytical workflows. The model is integrated with real-time web and X (formerly Twitter) data, allowing it to provide up-to-date insights and answers. Grok 4.3 supports multimodal capabilities, enabling it to work with text, images, and other data types. It operates within the SuperGrok Heavy tier, offering access to more powerful compute and advanced features. The model is designed to handle long-context tasks and multi-step reasoning with greater accuracy. It also supports tool use and integrations, enabling it to interact with external systems and automate workflows. Overall, Grok 4.3 is positioned as a high-performance AI assistant for real-time, data-driven tasks.
  • 5
    Llama

    Meta

    Llama (Large Language Model Meta AI) is a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI. Smaller, more performant models such as Llama enable others in the research community who don’t have access to large amounts of infrastructure to study these models, further democratizing access in this important, fast-changing field. Training smaller foundation models like Llama is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others’ work, and explore new use cases. Foundation models train on a large set of unlabeled data, which makes them ideal for fine-tuning for a variety of tasks. We are making Llama available at several sizes (7B, 13B, 33B, and 65B parameters) and also sharing a Llama model card that details how we built the model in keeping with our approach to Responsible AI practices.
  • 6
    OPT

    Meta

    Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
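    Because the OPT weights are shared openly, even the smaller checkpoints can be loaded and studied directly. Below is a minimal sketch using the Hugging Face transformers library and the hub-hosted checkpoint facebook/opt-125m; the library and hub ID are assumptions not named in the description above.

```python
# Load the smallest OPT checkpoint and generate a continuation.
# "facebook/opt-125m" is the Hugging Face hub ID (an assumption here).
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

ids = tok("Large language models are", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```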
  • 7
    T5

    Google

    With T5, we propose reframing all NLP tasks into a unified text-to-text format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.
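    The text-to-text interface is easy to see in code. The sketch below uses the Hugging Face transformers library and the t5-small checkpoint, both assumptions not named above; each task is selected purely by a text prefix, and the output is always a string.

```python
# Every T5 task is plain text in, plain text out; the prefix picks the task.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: studies have shown that owning a dog is good for you ...",
    "cola sentence: The course is jumping well.",  # acceptability classification
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```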
  • 8
    PanGu-α

    Huawei

    PanGu-α is developed under the MindSpore framework and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task efficiently to 2048 processors: data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism, and rematerialization. To enhance the generalization ability of PanGu-α, we collect 1.1TB of high-quality Chinese data from a wide range of domains to pretrain the model. We empirically test the generation ability of PanGu-α in various scenarios including text summarization, question answering, and dialogue generation. Moreover, we investigate the effect of model scales on the few-shot performances across a broad range of Chinese NLP tasks. The experimental results demonstrate the superior capabilities of PanGu-α in performing various tasks under few-shot or zero-shot settings.
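    To make the composition of those parallelism dimensions concrete, here is an illustrative factoring of a 2048-device cluster; the specific factors are assumptions for the sketch, not PanGu-α's actual layout.

```python
# How 2048 devices might be covered by three grid-forming dimensions.
# dp/mp/pp values are illustrative assumptions, not PanGu-alpha's real config.
DEVICES = 2048

dp = 16   # data parallelism: 16 model replicas, each on a different batch shard
mp = 8    # op-level model parallelism: each weight matrix split across 8 devices
pp = 16   # pipeline parallelism: layers divided into 16 sequential stages

assert dp * mp * pp == DEVICES

# Optimizer parallelism shards optimizer state across the data-parallel group,
# and rematerialization trades recompute for activation memory; neither adds a
# device-grid dimension, so the product above already accounts for all devices.
print(f"{dp} replicas x {mp}-way tensor split x {pp} stages = {dp * mp * pp} devices")
```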
  • 9
    Megatron-Turing
    The Megatron-Turing Natural Language Generation model (MT-NLG) is the largest and most powerful monolithic transformer English language model, with 530 billion parameters. This 105-layer, transformer-based MT-NLG improves upon the prior state-of-the-art models in zero-, one-, and few-shot settings. It demonstrates unmatched accuracy in a broad set of natural language tasks such as completion prediction, reading comprehension, commonsense reasoning, natural language inference, and word sense disambiguation. With the intent of accelerating research on the largest English language model to date and enabling customers to experiment with, employ, and apply such a large language model on downstream language tasks, NVIDIA is pleased to announce an Early Access program for its managed API service to the MT-NLG model.
  • 10
    Galactica
    Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. Galactica is a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%.
  • 11
    PanGu-Σ

    Huawei

    Significant advancements in the field of natural language processing, understanding, and generation have been achieved through the expansion of large language models. This study introduces a system which utilizes Ascend 910 AI processors and the MindSpore framework to train a language model with over a trillion parameters, specifically 1.085T, named PanGu-Σ. This model, which builds upon the foundation laid by PanGu-α, takes the traditionally dense Transformer model and transforms it into a sparse one using a concept known as Random Routed Experts (RRE). The model was efficiently trained on a dataset of 329 billion tokens using a technique called Expert Computation and Storage Separation (ECSS), leading to a 6.3-fold increase in training throughput via heterogeneous computing. Experimentation indicates that PanGu-Σ sets a new standard in zero-shot learning for various downstream Chinese NLP tasks.
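    A minimal sketch of the Random Routed Experts idea follows, under simplifying assumptions: tokens are first grouped by domain, then mapped to one expert within that domain's group by a fixed random assignment rather than a learned gating network. The group size and hash-based mapping are illustrative, not PanGu-Σ's exact scheme.

```python
# Fixed (non-learned) expert routing: a deterministic hash replaces the gate.
import hashlib

EXPERTS_PER_DOMAIN = 4  # illustrative group size, not the paper's value

def route(token_id: int, domain: str) -> int:
    """Deterministically pick an expert index within the token's domain group."""
    h = int(hashlib.md5(f"{domain}:{token_id}".encode()).hexdigest(), 16)
    return h % EXPERTS_PER_DOMAIN

print(route(42, "code"))      # the same token + domain always routes the same way
print(route(42, "dialogue"))  # a different domain group may pick a different expert
```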
  • 12
    OpenELM

    Apple

    OpenELM is an open-source language model family developed by Apple. It uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy compared to existing open language models of similar size. OpenELM is trained on publicly available datasets and achieves state-of-the-art performance for its size.
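    As a rough illustration of layer-wise scaling, the sketch below grows the attention head count and FFN width linearly with depth instead of repeating identical layers; the endpoint values are assumptions for the demo, not OpenELM's published configuration.

```python
# Layer-wise scaling: shallow layers get fewer parameters, deep layers more.
NUM_LAYERS = 16
HEADS_MIN, HEADS_MAX = 4, 12    # attention heads at the first/last layer
FFN_MIN, FFN_MAX = 0.5, 4.0     # FFN width as a multiple of d_model

for i in range(NUM_LAYERS):
    t = i / (NUM_LAYERS - 1)    # 0.0 at the first layer, 1.0 at the last
    heads = round(HEADS_MIN + t * (HEADS_MAX - HEADS_MIN))
    ffn_mult = FFN_MIN + t * (FFN_MAX - FFN_MIN)
    print(f"layer {i:2d}: {heads:2d} heads, FFN multiplier {ffn_mult:.2f}")
```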
  • 13
    LTM-2-mini

    Magic AI

    LTM-2-mini is a 100M token context model. 100M tokens equals ~10 million lines of code or ~750 novels. For each decoded token, LTM-2-mini’s sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window. The contrast in memory requirements is even larger: running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache, whereas LTM requires a small fraction of a single H100’s HBM per user for the same context.
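    The KV-cache figure can be sanity-checked with back-of-envelope arithmetic, assuming Llama 3.1 405B's published shape (126 layers, 8 KV heads via grouped-query attention, head dimension 128) and fp16 cache entries; the exact accounting behind the quoted 638-H100 number may differ slightly.

```python
# KV cache per token = 2 (K and V) x layers x kv_heads x head_dim x bytes.
LAYERS, KV_HEADS, HEAD_DIM = 126, 8, 128   # Llama 3.1 405B shape (assumed here)
BYTES = 2                                   # fp16 cache entries
TOKENS = 100_000_000                        # 100M-token context
H100_HBM = 80e9                             # ~80 GB of HBM per H100

per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES
total = per_token * TOKENS
print(f"{per_token / 1e3:.0f} KB/token -> {total / 1e12:.1f} TB "
      f"-> ~{total / H100_HBM:.0f} H100s of HBM")
# ~516 KB/token -> ~51.6 TB -> ~645 H100s, in line with the ~638 quoted above.
```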
  • 14
    OpenAI o3-mini-high
    The o3-mini-high model from OpenAI advances AI reasoning by refining deep problem-solving in coding, mathematics, and complex tasks. It features adaptive thinking time with adjustable reasoning modes (low, medium, high) to optimize performance based on task complexity. Outperforming the o1 series by 200 Elo points on Codeforces, it delivers high efficiency at a lower cost while maintaining speed and accuracy. As part of the o3 family, it pushes AI problem-solving boundaries while remaining accessible, offering a free tier and expanded limits for Plus subscribers.
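    The adjustable reasoning modes map to a per-request parameter in the OpenAI API. A hedged sketch with the OpenAI Python SDK follows; the reasoning_effort parameter and model name reflect OpenAI's documented o-series interface, but availability may vary by API version.

```python
# Select how much thinking time the model spends via reasoning_effort.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low" / "medium" / "high"
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(resp.choices[0].message.content)
```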
  • 15
    Grounded Language Model (GLM)
    Contextual AI introduces its Grounded Language Model (GLM), engineered specifically to minimize hallucinations and deliver highly accurate, source-based responses for retrieval-augmented generation (RAG) and agentic applications. The GLM prioritizes faithfulness to the provided data, ensuring responses are grounded in specific knowledge sources and backed by inline citations. With state-of-the-art performance on the FACTS groundedness benchmark, the GLM outperforms other foundation models in scenarios requiring high accuracy and reliability. The model is designed for enterprise use cases like customer service, finance, and engineering, where trustworthy and precise responses are critical to minimizing risks and improving decision-making.
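    Purely as an illustration of grounding, the sketch below assembles the kind of prompt a RAG pipeline might feed a grounded model, with numbered sources so the answer can carry inline citations. This is a generic sketch with made-up source snippets, not Contextual AI's actual API or prompt format.

```python
# Build a grounded prompt: numbered sources + an instruction to cite inline.
sources = [
    "[1] Q3 revenue was $4.2M, up 12% year over year.",   # fabricated example
    "[2] The refund window is 30 days from delivery.",    # fabricated example
]
question = "How long do customers have to request a refund?"

prompt = (
    "Answer strictly from the numbered sources below and cite them inline, "
    "e.g. [2]. If the sources do not contain the answer, say so.\n\n"
    + "\n".join(sources)
    + f"\n\nQuestion: {question}"
)
print(prompt)  # a grounded model should answer: "30 days from delivery [2]"
```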
  • 16
    ERNIE 4.5 Turbo
    ERNIE 4.5 Turbo, unveiled by Baidu at the 2025 Baidu Create conference, is a cutting-edge AI model designed to handle a variety of data inputs, including text, images, audio, and video. It offers powerful multimodal processing capabilities that enable it to perform complex tasks across industries such as customer support automation, content creation, and data analysis. With enhanced reasoning abilities and reduced hallucinations, ERNIE 4.5 Turbo ensures that businesses can achieve higher accuracy and reliability in AI-driven processes. Additionally, this model is priced at just 1% of GPT-4.5’s cost, making it a highly cost-effective alternative for enterprises looking for top-tier AI performance.
  • 17
    ERNIE X1.1
    ERNIE X1.1 is Baidu’s upgraded reasoning model that delivers major improvements over its predecessor. It achieves 34.8% higher factual accuracy, 12.5% better instruction following, and 9.6% stronger agentic capabilities compared to ERNIE X1. In benchmark testing, it surpasses DeepSeek R1-0528 and performs on par with GPT-5 and Gemini 2.5 Pro. Built on the foundation of ERNIE 4.5, it has been enhanced with extensive mid-training and post-training, including reinforcement learning. The model is available through ERNIE Bot, the Wenxiaoyan app, and Baidu’s Qianfan MaaS platform via API. These upgrades are designed to reduce hallucinations, improve reliability, and strengthen real-world AI task performance.
  • 18
    ERNIE 5.0
    ERNIE 5.0 is a next-generation conversational AI platform developed by Baidu, designed to deliver natural, human-like interactions across multiple domains. Built on Baidu’s Enhanced Representation through Knowledge Integration (ERNIE) framework, it fuses advanced natural language processing (NLP) with deep contextual understanding. The model supports multimodal capabilities, allowing it to process and generate text, images, and voice seamlessly. ERNIE 5.0’s refined contextual awareness enables it to handle complex conversations with greater precision and nuance. Its applications span customer service, content generation, and enterprise automation, enhancing both user engagement and productivity. With its robust architecture, ERNIE 5.0 represents a major step forward in Baidu’s pursuit of intelligent, knowledge-driven AI systems.
  • 19
    Grok 4.20
    Grok 4.20 is an advanced artificial intelligence model developed by xAI to elevate reasoning and natural language understanding. Built on the high-performance Colossus supercomputer, it is engineered for speed, scale, and accuracy. Grok 4.20 processes multimodal inputs such as text and images, with video support planned for future releases. The model excels in scientific, technical, and linguistic tasks, delivering highly precise and context-aware responses. Its architecture supports deep reasoning and sophisticated problem-solving capabilities. Enhanced moderation improves output reliability and reduces bias compared to earlier versions. Overall, Grok 4.20 represents a significant step toward more human-like AI reasoning and interpretation.
  • 20
    Grok 4.4
    Grok 4.4 is expected to be the next iteration in xAI’s rapidly evolving AI lineup, building on Grok 4’s advanced reasoning, real-time search, and agentic capabilities. Designed to push performance even further, Grok 4.4 will likely focus on faster responses, deeper contextual understanding, and improved reliability across complex tasks. With tighter integration into live data streams and tools, it aims to deliver more accurate, up-to-date insights while reducing hallucinations and enhancing decision-making workflows.
  • 21
    Chinchilla

    Google DeepMind

    Chinchilla is a large language model. Chinchilla uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
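    The "same compute budget" claim can be checked with the standard training-FLOPs approximation C ≈ 6 × N × D; the token counts below (300B for Gopher, 1.4T for Chinchilla) are the papers' reported values, and the 6ND rule is an approximation, not an exact accounting.

```python
# Training compute ~= 6 x parameters x training tokens.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

gopher = train_flops(280e9, 300e9)       # 280B params, 300B tokens
chinchilla = train_flops(70e9, 1.4e12)   # 70B params, 1.4T tokens

print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.0e23
print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.9e23, the same ballpark
```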