Compare the Top Large Language Models in Canada as of June 2026 - Page 6

  • 1
    Ai2 OLMoE

    Ai2 OLMoE

    The Allen Institute for Artificial Intelligence

    Ai2 OLMoE is a fully open source mixture-of-experts language model that is capable of running completely on-device, allowing you to try our model privately and securely. Our app is intended to help researchers better explore how to make on-device intelligence better and to enable developers to quickly prototype new AI experiences, all with no cloud connectivity required. OLMoE is a highly efficient mixture-of-experts version of the Ai2 OLMo family of models. Experience which real-world tasks state-of-the-art local models are capable of. Research how to improve small AI models. Test your own models locally using our open-source codebase. Integrate OLMoE into other iOS applications. The Ai2 OLMoE app provides privacy and security by operating completely on-device. Easily share the output of your conversations with friends or colleagues. The OLMoE model and the application code are fully open source.
    Starting Price: Free
  • 2
    QwQ-Max-Preview
    QwQ-Max-Preview is an advanced AI model built on the Qwen2.5-Max architecture, designed to excel in deep reasoning, mathematical problem-solving, coding, and agent-related tasks. This preview version offers a sneak peek at its capabilities, which include improved performance in a wide range of general-domain tasks and the ability to handle complex workflows. QwQ-Max-Preview is slated for an official open-source release under the Apache 2.0 license, offering further advancements and refinements in its full version. It also paves the way for a more accessible AI ecosystem, with the upcoming launch of the Qwen Chat app and smaller variants of the model like QwQ-32B, aimed at developers seeking local deployment options.
    Starting Price: Free
  • 3
    Octave TTS

    Octave TTS

    Hume AI

    Hume AI has introduced Octave (Omni-capable Text and Voice Engine), a groundbreaking text-to-speech system that leverages large language model technology to understand and interpret the context of words, enabling it to generate speech with appropriate emotions, rhythm, and cadence, unlike traditional TTS models that merely read text, Octave acts akin to a human actor, delivering lines with nuanced expression based on the content. Users can create diverse AI voices by providing descriptive prompts, such as "a sarcastic medieval peasant," allowing for tailored voice generation that aligns with specific character traits or scenarios. Additionally, Octave offers the flexibility to modify the emotional delivery and speaking style through natural language instructions, enabling commands like "sound more enthusiastic" or "whisper fearfully" to fine-tune the output.
    Starting Price: $3 per month
  • 4
    QwQ-32B

    QwQ-32B

    Alibaba

    ​QwQ-32B is an advanced reasoning model developed by Alibaba Cloud's Qwen team, designed to enhance AI's problem-solving capabilities. With 32 billion parameters, it achieves performance comparable to state-of-the-art models like DeepSeek's R1, which has 671 billion parameters. This efficiency is achieved through optimized parameter utilization, allowing QwQ-32B to perform complex tasks such as mathematical reasoning, coding, and general problem-solving with fewer resources. The model supports a context length of up to 32,000 tokens, enabling it to process extensive input data effectively. QwQ-32B is accessible via Alibaba's chatbot service, Qwen Chat, and is open sourced under the Apache 2.0 license, promoting collaboration and further development within the AI community.
    Starting Price: Free
  • 5
    Gemma 3

    Gemma 3

    Google

    Gemma 3, introduced by Google, is a new AI model built on the Gemini 2.0 architecture, designed to offer enhanced performance and versatility. This model is capable of running efficiently on a single GPU or TPU, making it accessible for a wide range of developers and researchers. Gemma 3 focuses on improving natural language understanding, generation, and other AI-driven tasks. By offering scalable, powerful AI capabilities, Gemma 3 aims to advance the development of AI systems across various industries and use cases.
    Starting Price: Free
  • 6
    Command A

    Command A

    Cohere AI

    Command A, introduced by Cohere, is a high-performance AI model designed to maximize efficiency with minimal computational resources. This model outperforms or matches other top-tier models like GPT-4 and DeepSeek-V3 in agentic enterprise tasks while significantly reducing compute costs. It is tailored for applications requiring fast, efficient AI-driven solutions, providing businesses with the capability to perform advanced tasks across various domains, all while optimizing performance and computational demands.
    Starting Price: $2.50 / 1M tokens
  • 7
    Mistral Large 2
    Mistral AI has launched the Mistral Large 2, an advanced AI model designed to excel in code generation, multilingual capabilities, and complex reasoning tasks. The model features a 128k context window, supporting dozens of languages including English, French, Spanish, and Arabic, as well as over 80 programming languages. Mistral Large 2 is tailored for high-throughput single-node inference, making it ideal for large-context applications. Its improved performance on benchmarks like MMLU and its enhanced code generation and reasoning abilities ensure accuracy and efficiency. The model also incorporates better function calling and retrieval, supporting complex business applications.
    Starting Price: Free
  • 8
    Mistral Small 3.1
    ​Mistral Small 3.1 is a state-of-the-art, multimodal, and multilingual AI model released under the Apache 2.0 license. Building upon Mistral Small 3, this enhanced version offers improved text performance, and advanced multimodal understanding, and supports an expanded context window of up to 128,000 tokens. It outperforms comparable models like Gemma 3 and GPT-4o Mini, delivering inference speeds of 150 tokens per second. Designed for versatility, Mistral Small 3.1 excels in tasks such as instruction following, conversational assistance, image understanding, and function calling, making it suitable for both enterprise and consumer-grade AI applications. Its lightweight architecture allows it to run efficiently on a single RTX 4090 or a Mac with 32GB RAM, facilitating on-device deployments. It is available for download on Hugging Face, accessible via Mistral AI's developer playground, and integrated into platforms likeGemini Enterprise Agent Platform, with availability on NVIDIA NIM.
    Starting Price: Free
  • 9
    Llama 4 Behemoth
    Llama 4 Behemoth is Meta's most powerful AI model to date, featuring a massive 288 billion active parameters. It excels in multimodal tasks, outperforming previous models like GPT-4.5 and Gemini 2.0 Pro across multiple STEM-focused benchmarks such as MATH-500 and GPQA Diamond. As the teacher model for the Llama 4 series, Behemoth sets the foundation for models like Llama 4 Maverick and Llama 4 Scout. While still in training, Llama 4 Behemoth demonstrates unmatched intelligence, pushing the boundaries of AI in fields like math, multilinguality, and image understanding.
    Starting Price: Free
  • 10
    Llama 4 Maverick
    Llama 4 Maverick is one of the most advanced multimodal AI models from Meta, featuring 17 billion active parameters and 128 experts. It surpasses its competitors like GPT-4o and Gemini 2.0 Flash in a broad range of benchmarks, especially in tasks related to coding, reasoning, and multilingual capabilities. Llama 4 Maverick combines image and text understanding, enabling it to deliver industry-leading results in image-grounding tasks and precise, high-quality output. With its efficient performance at a reduced parameter size, Maverick offers exceptional value, especially in general assistant and chat applications.
    Starting Price: Free
  • 11
    Llama 4 Scout
    Llama 4 Scout is a powerful 17 billion active parameter multimodal AI model that excels in both text and image processing. With an industry-leading context length of 10 million tokens, it outperforms its predecessors, including Llama 3, in tasks such as multi-document summarization and parsing large codebases. Llama 4 Scout is designed to handle complex reasoning tasks while maintaining high efficiency, making it perfect for use cases requiring long-context comprehension and image grounding. It offers cutting-edge performance in image-related tasks and is particularly well-suited for applications requiring both text and visual understanding.
    Starting Price: Free
  • 12
    Claude Max

    Claude Max

    Anthropic

    The Max Plan from Anthropic's Claude platform is designed for users who require extended access and higher usage limits for their AI-powered collaboration. Ideal for frequent and demanding tasks, the Max Plan offers up to 20 times higher usage than the standard Pro plan. With flexible usage levels, users can select the plan that fits their needs—whether they need additional usage for complex data, large documents, or extended conversations. The Max Plan also includes priority access to new features and models, ensuring users always have the latest tools at their disposal.
    Starting Price: $100/month
  • 13
    GPT-4.1 mini
    GPT-4.1 mini is a compact version of OpenAI’s powerful GPT-4.1 model, designed to provide high performance while significantly reducing latency and cost. With a smaller size and optimized architecture, GPT-4.1 mini still delivers impressive results in tasks such as coding, instruction following, and long-context processing. It supports up to 1 million tokens of context, making it an efficient solution for applications that require fast responses without sacrificing accuracy or depth.
    Starting Price: $0.40 per 1M tokens (input)
  • 14
    GPT-4.1 nano
    GPT-4.1 nano is the smallest and most efficient version of OpenAI's GPT-4.1 model, optimized for low-latency, cost-effective AI processing. Despite its compact size, GPT-4.1 nano delivers strong performance with a 1 million token context window, making it ideal for applications like classification, autocompletion, and smaller-scale tasks that require fast responses. It provides a highly efficient solution for businesses and developers who need an AI model that balances speed, cost, and performance.
    Starting Price: $0.10 per 1M tokens (input)
  • 15
    Qwen3

    Qwen3

    Alibaba

    Qwen3, the latest iteration of the Qwen family of large language models, introduces groundbreaking features that enhance performance across coding, math, and general capabilities. With models like the Qwen3-235B-A22B and Qwen3-30B-A3B, Qwen3 achieves impressive results compared to top-tier models, thanks to its hybrid thinking modes that allow users to control the balance between deep reasoning and quick responses. The platform supports 119 languages and dialects, making it an ideal choice for global applications. Its pre-training process, which uses 36 trillion tokens, enables robust performance, and advanced reinforcement learning (RL) techniques continue to refine its capabilities. Available on platforms like Hugging Face and ModelScope, Qwen3 offers a powerful tool for developers and researchers working in diverse fields.
    Starting Price: Free
  • 16
    Mistral Medium 3
    Mistral Medium 3 is a powerful AI model designed to deliver state-of-the-art performance at a fraction of the cost compared to other models. It offers simpler deployment options, allowing for hybrid or on-premises configurations. Mistral Medium 3 excels in professional applications like coding and multimodal understanding, making it ideal for enterprise use. Its low-cost structure makes it highly accessible while maintaining top-tier performance, outperforming many larger models in specific domains.
    Starting Price: Free
  • 17
    NuExtract

    NuExtract

    NuExtract

    NuExtract is a large language model specialized in extracting structured information from documents of any format, including raw text, scanned images, PDFs, PowerPoints, spreadsheets, and more, supporting over a dozen languages and mixed‑language inputs. It delivers JSON‑formatted output that faithfully follows user‑defined templates, with built‑in verification and null‑value handling to minimize hallucinations. Users define extraction tasks by creating a template, either by describing the desired fields or importing existing schemas—and can improve accuracy by adding document, output examples in the example set. The NuExtract Platform provides an intuitive workspace for designing templates, testing extractions in a playground, managing teaching examples, and fine‑tuning settings such as model temperature and document rasterization DPI. Once validated, projects can be deployed via a RESTful API endpoint that processes documents in real time.
    Starting Price: $5 per 1M tokens
  • 18
    GLM-4.5-Air
    Z.ai is a free AI assistant that brings presentations, writing, and coding into one conversational interface. Leveraging large language models, it lets you generate polished slide decks with AI slides, craft professional‑grade text for emails, reports, or blogs, and write or debug complex code. Beyond content creation, Z.ai supports deep research and information search, helping you gather facts, summarize long documents, and overcome writer’s block, while its code agent can explain snippets, refactor functions, or build scripts from scratch. An intuitive chat interface means no steep learning curves: simply tell Z.ai what you need, a strategic deck, marketing copy, or a data‑analysis script, and get instant, contextually relevant results. With support for multiple languages (including Chinese), native function calling, and up to 128K token context length, Z.ai handles everything from brainstorming ideas to automating repetitive writing or coding.
    Starting Price: Free
  • 19
    ByteDance Seed
    Seed Diffusion Preview is a large-scale, code-focused language model that uses discrete-state diffusion to generate code non-sequentially, achieving dramatically faster inference without sacrificing quality by decoupling generation from the token-by-token bottleneck of autoregressive models. It combines a two-stage curriculum, mask-based corruption followed by edit-based augmentation, to robustly train a standard dense Transformer, striking a balance between speed and accuracy and avoiding shortcuts like carry-over unmasking to preserve principled density estimation. The model delivers an inference speed of 2,146 tokens/sec on H20 GPUs, outperforming contemporary diffusion baselines while matching or exceeding their accuracy on standard code benchmarks, including editing tasks, thereby establishing a new speed-quality Pareto frontier and demonstrating discrete diffusion’s practical viability for real-world code generation.
    Starting Price: Free
  • 20
    GPT-5 mini
    GPT-5 mini is a streamlined, faster, and more affordable variant of OpenAI’s GPT-5, optimized for well-defined tasks and precise prompts. It supports text and image inputs and delivers high-quality text outputs with a 400,000-token context window and up to 128,000 output tokens. This model excels at rapid response times, making it suitable for applications requiring fast, accurate language understanding without the full overhead of GPT-5. Pricing is cost-effective, with input tokens at $0.25 per million and output tokens at $2 per million, providing savings over the flagship model. GPT-5 mini supports advanced features like streaming, function calling, structured outputs, and fine-tuning, but does not support audio input or image generation. It integrates well with various API endpoints including chat completions, responses, and embeddings, making it versatile for many AI-powered tasks.
    Starting Price: $0.25 per 1M tokens
  • 21
    GPT-5 nano
    GPT-5 nano is OpenAI’s fastest and most affordable version of the GPT-5 family, designed for high-speed text processing tasks like summarization and classification. It supports text and image inputs, generating high-quality text outputs with a large 400,000-token context window and up to 128,000 output tokens. GPT-5 nano offers very fast response times, making it ideal for applications requiring quick turnaround without sacrificing quality. Pricing is extremely competitive, with input tokens costing $0.05 per million and output tokens $0.40 per million, making it accessible for budget-conscious projects. The model supports advanced API features such as streaming, function calling, structured outputs, and fine-tuning. While it supports image input, it does not handle audio input or web search, focusing on core text tasks efficiently.
    Starting Price: $0.05 per 1M tokens
  • 22
    DeepSeek V3.1
    DeepSeek V3.1 is a groundbreaking open-weight large language model featuring a massive 685-billion parameters and an extended 128,000‑token context window, enabling it to process documents equivalent to 400-page books in a single prompt. It delivers integrated capabilities for chat, reasoning, and code generation within a unified hybrid architecture, seamlessly blending these functions into one coherent model. V3.1 supports a variety of tensor formats to give developers flexibility in optimizing performance across different hardware. Early benchmark results show robust performance, including a 71.6% score on the Aider coding benchmark, putting it on par with or ahead of systems like Claude Opus 4 and doing so at a far lower cost. Made available under an open source license on Hugging Face with minimal fanfare, DeepSeek V3.1 is poised to reshape access to high-performance AI, challenging traditional proprietary models.
    Starting Price: Free
  • 23
    Hermes 4

    Hermes 4

    Nous Research

    Hermes 4 is the latest evolution in Nous Research’s line of neutrally aligned, steerable foundational models, featuring novel hybrid reasoners that can dynamically shift between expressive, creative responses and efficient, standard replies based on user prompts. The model is designed to respond to system and user instructions, rather than adhering to any corporate ethics framework, producing interactions that feel more humanistic, less lecturing or sycophantic, and encouraging roleplay and creativity. By incorporating a special tag in prompts, users can trigger deeper, internally token-intensive reasoning when tackling complex problems, while retaining prompt efficiency when such depth isn't required. Trained on a dataset 50 times larger than that of Hermes 3, much of which was synthetically generated using Atropos, Hermes 4 shows significant performance improvements.
    Starting Price: Free
  • 24
    K2 Think

    K2 Think

    Institute of Foundation Models

    K2 Think is an open source advanced reasoning model developed collaboratively by the Institute of Foundation Models at MBZUAI and G42. Despite only having 32 billion parameters, it delivers performance comparable to flagship models with many more parameters. It excels in mathematical reasoning, achieving top scores on competitive benchmarks such as AIME ’24/’25, HMMT ’25, and OMNI-Math-HARD. K2 Think is part of a suite of UAE-developed open models, alongside Jais (Arabic), NANDA (Hindi), and SHERKALA (Kazakh), and builds on the foundation laid by K2-65B, the fully reproducible open source foundation model released in 2024. The model is designed to be open, fast, and flexible, offering a web app interface for exploration, and with its efficiency in parameter positioning, it is a breakthrough in compact architectures for advanced AI reasoning.
    Starting Price: Free
  • 25
    DeepSeek-V3.1-Terminus
    DeepSeek has released DeepSeek-V3.1-Terminus, which enhances the V3.1 architecture by incorporating user feedback to improve output stability, consistency, and agent performance. It notably reduces instances of mixed Chinese/English character output and unintended garbled characters, resulting in cleaner, more consistent language generation. The update upgrades both the code agent and search agent subsystems to yield stronger, more reliable performance across benchmarks. DeepSeek-V3.1-Terminus is also available as an open source model, and its weights are published on Hugging Face. The model structure remains the same as DeepSeek-V3, ensuring compatibility with existing deployment methods, with updated inference demos provided for community use. While trained at a scale of 685B parameters, the model includes FP8, BF16, and F32 tensor formats, offering flexibility across environments.
    Starting Price: Free
  • 26
    Qwen3-Max

    Qwen3-Max

    Alibaba

    Qwen3-Max is Alibaba’s latest trillion-parameter large language model, designed to push performance in agentic tasks, coding, reasoning, and long-context processing. It is built atop the Qwen3 family and benefits from the architectural, training, and inference advances introduced there; mixing thinker and non-thinker modes, a “thinking budget” mechanism, and support for dynamic mode switching based on complexity. The model reportedly processes extremely long inputs (hundreds of thousands of tokens), supports tool invocation, and exhibits strong performance on benchmarks in coding, multi-step reasoning, and agent benchmarks (e.g., Tau2-Bench). While its initial variant emphasizes instruction following (non-thinking mode), Alibaba plans to bring reasoning capabilities online to enable autonomous agent behavior. Qwen3-Max inherits multilingual support and extensive pretraining on trillions of tokens, and it is delivered via API interfaces compatible with OpenAI-style functions.
    Starting Price: Free
  • 27
    GLM-4.6

    GLM-4.6

    Zhipu AI

    GLM-4.6 advances upon its predecessor with stronger reasoning, coding, and agentic capabilities: it demonstrates clear improvements in inferential performance, supports tool use during inference, and more effectively integrates into agent frameworks. In benchmark tests spanning reasoning, coding, and agents, GLM-4.6 outperforms GLM-4.5 and shows competitive strength against models such as DeepSeek-V3.2-Exp and Claude Sonnet 4, though it still trails Claude Sonnet 4.5 in pure coding performance. In real-world tests using an extended “CC-Bench” suite across front-end development, tool building, data analysis, and algorithmic tasks, GLM-4.6 beats GLM-4.5 and approaches parity with Claude Sonnet 4, winning ~48.6% of head-to-head comparisons, while also achieving ~15% better token efficiency. GLM-4.6 is available via the Z.ai API, and developers can integrate it as an LLM backend or agent core using the platform’s API.
    Starting Price: Free
  • 28
    DeepSeek-V3.2-Exp
    Introducing DeepSeek-V3.2-Exp, our latest experimental model built on V3.1-Terminus, debuting DeepSeek Sparse Attention (DSA) for faster and more efficient inference and training on long contexts. DSA enables fine-grained sparse attention with minimal loss in output quality, boosting performance for long-context tasks while reducing compute costs. Benchmarks indicate that V3.2-Exp performs on par with V3.1-Terminus despite these efficiency gains. The model is now live across app, web, and API. Alongside this, the DeepSeek API prices have been cut by over 50% immediately to make access more affordable. For a transitional period, users can still access V3.1-Terminus via a temporary API endpoint until October 15, 2025. DeepSeek welcomes feedback on DSA via its feedback portal. In conjunction with the release, DeepSeek-V3.2-Exp has been open-sourced: the model weights and supporting technology (including key GPU kernels in TileLang and CUDA) are available on Hugging Face.
    Starting Price: Free
  • 29
    Gemini Enterprise
    Gemini Enterprise app is an advanced AI-powered platform that brings Google’s AI capabilities to every employee, enabling organizations to automate workflows, analyze data, and create high-quality content across multiple business functions. It securely connects to tools like Microsoft 365, Google Workspace, HubSpot, and Jira, allowing users to search and interact with their business data using natural language. The platform supports prebuilt agents such as NotebookLM and Deep Research, helping teams quickly extract insights and streamline tasks. It also allows users to build custom no-code agents to automate multi-step workflows across different applications. With centralized management, organizations can deploy and monitor all agents from a single interface. Built-in security and governance features ensure data privacy and compliance with enterprise standards. Overall, Gemini Enterprise app enhances productivity by combining AI automation with secure data integration.
    Starting Price: $21 per month
  • 30
    Claude Haiku 4.5
    Anthropic has launched Claude Haiku 4.5, its latest small-language model designed to deliver near-frontier performance at significantly lower cost. The model provides similar coding and reasoning quality as the company’s mid-tier Sonnet 4, yet it runs at roughly one-third of the cost and more than twice the speed. In benchmarks cited by Anthropic, Haiku 4.5 meets or exceeds Sonnet 4’s performance in key tasks such as code generation and multi-step “computer use” workflows. It is optimized for real-time, low-latency scenarios such as chat assistants, customer service agents, and pair-programming support. Haiku 4.5 is made available via the Claude API under the identifier “claude-haiku-4-5” and supports large-scale deployments where cost, responsiveness, and near-frontier intelligence matter. Claude Haiku 4.5 is available now on Claude Code and our apps. Its efficiency means you can accomplish more within your usage limits while maintaining premium model performance.
    Starting Price: $1 per million input tokens
Auth0 Logo