Compare the Top LLM Routers in the UK as of June 2026

What are LLM Routers in the UK?

​LLM routers are systems that intelligently direct queries to the most appropriate Large Language Model (LLM) based on factors like complexity and cost. By analyzing incoming prompts, these routers balance performance with resource expenditure, ensuring efficient and effective responses. They contribute to operational efficiency by optimizing resource allocation, leading to cost savings without compromising quality. Additionally, LLM routers enhance system reliability by managing load distribution and providing fallback options during peak times or outages. Overall, they play a crucial role in maximizing the utility of LLMs across various applications. Compare and read user reviews of the best LLM Routers in the UK currently available using the table below. This list is updated regularly.

  • 1
    OpenRouter

    OpenRouter

    OpenRouter

    OpenRouter is a unified interface for LLMs. OpenRouter scouts for the lowest prices and best latencies/throughputs across dozens of providers, and lets you choose how to prioritize them. No need to change your code when switching between models or providers. You can even let users choose and pay for their own. Evals are flawed; instead, compare models by how often they're used for different purposes. Chat with multiple at once in the chatroom. Model usage can be paid by users, developers, or both, and may shift in availability. You can also fetch models, prices, and limits via API. OpenRouter routes requests to the best available providers for your model, given your preferences. By default, requests are load-balanced across the top providers to maximize uptime, but you can customize how this works using the provider object in the request body. Prioritize providers that have not seen significant outages in the last 10 seconds.
    Starting Price: $2 one-time payment
  • 2
    Anyscale

    Anyscale

    Anyscale

    Anyscale is a unified AI platform built around Ray, the world’s leading AI compute engine, designed to help teams build, deploy, and scale AI and Python applications efficiently. The platform offers RayTurbo, an optimized version of Ray that delivers up to 4.5x faster data workloads, 6.1x cost savings on large language model inference, and up to 90% lower costs through elastic training and spot instances. Anyscale provides a seamless developer experience with integrated tools like VSCode and Jupyter, automated dependency management, and expert-built app templates. Deployment options are flexible, supporting public clouds, on-premises clusters, and Kubernetes environments. Anyscale Jobs and Services enable reliable production-grade batch processing and scalable web services with features like job queuing, retries, observability, and zero-downtime upgrades. Security and compliance are ensured with private data environments, auditing, access controls, and SOC 2 Type II attestation.
    Starting Price: $0.00006 per minute
  • 3
    TrueFoundry

    TrueFoundry

    TrueFoundry

    TrueFoundry is a unified platform with an enterprise-grade AI Gateway - combining LLM, MCP, and Agent Gateway - to securely manage, route, and govern AI workloads across providers. Its agentic deployment platform also enables GPU-based LLM deployment along with agent deployment with best practices for scalability and efficiency. It supports on-premise and VPC installations while maintaining full compliance with SOC 2, HIPAA, and ITAR standards.
    Starting Price: $5 per month
  • 4
    Inworld

    Inworld

    Inworld

    The developer platform for AI characters. Get a fully integrated platform for AI characters that goes beyond large language models (LLMs), and adds configurable safety, knowledge, memory, narrative controls, multimodality, and more. Craft characters with distinct personalities and contextual awareness that stay in-world or on brand. Seamlessly integrate into real-time applications, with optimization for scale and performance built-in. Optimized for real-time experiences, Inworld offers low-latency interactions that scale with your application. Orchestrating across LLMs allows us to deliver high-quality interactions with faster inference and lower costs. Every interaction has a context and models need to be aware of yours. Add custom knowledge, content and safety guardrails, and narrative controls to keep your AI in character, in-world, or on brand. Put personality at the center of your AI. Our multimodal AI mimics the full range of human expression.
    Starting Price: $20 per month
  • 5
    Unify AI

    Unify AI

    Unify AI

    Explore the power of choosing the right LLM for your needs and how to optimize for quality, speed, and cost-efficiency. Access all LLMs across all providers with a single API key and a standard API. Setup your own cost, latency, and output speed constraints. Define a custom quality metric. Personalize your router for your requirements. Systematically send your queries to the fastest provider, based on the very latest benchmark data for your region of the world, refreshed every 10 minutes. Get started with Unify with our dedicated walkthrough. Discover the features you already have access to and our upcoming roadmap. Just create a Unify account to access all models from all supported providers with a single API key. Our router balances output quality, speed, and cost based on user-specific preferences. The quality is predicted ahead of time using a neural scoring function, which predicts how good each model would be at responding to a given prompt.
    Starting Price: $1 per credit
  • 6
    Not Diamond

    Not Diamond

    Not Diamond

    Call the right model at the right time with the world's most powerful AI model router. Make the most of every model with relentless precision and speed. Not Diamond works out of the box with no setup, or train your own custom router with your evaluation data and benefit from model routing optimized to your use case. Select the right model in less time than it takes to stream a single token. Efficiently leverage faster and cheaper models without degrading quality. Program the best prompt for each LLM so you always call the right model with the right prompt. No more manual tweaking and experimentation. Not Diamond is not a proxy and all requests are made client-side. Enable fuzzy hashing on our API or deploy directly to your infra for maximum security. For any input, Not Diamond automatically determines which model is best suited to respond, delivering a state-of-the-art performance that beats every foundation model on every major benchmark.
    Starting Price: $100 per month
  • 7
    Vercel AI Gateway
    Vercel AI Gateway is a unified AI infrastructure platform that allows developers to access, manage, and route requests across hundreds of AI models and providers through a single API interface. Built as part of the Vercel AI ecosystem, the platform supports text, image, and video generation models from providers such as OpenAI, Anthropic, xAI, and others while simplifying authentication, billing, observability, and failover management. Developers can use one API key and centralized dashboard to integrate multiple AI providers into applications without managing separate provider accounts or infrastructure. The platform also includes built-in routing, automatic failovers, usage tracking, unified billing, and compatibility with SDKs such as the Vercel AI SDK, enabling faster development and more resilient AI-powered applications.
  • 8
    LiteLLM

    LiteLLM

    LiteLLM

    ​LiteLLM is a versatile platform designed to streamline interactions with over 100 Large Language Models (LLMs) through a unified interface. It offers both a Proxy Server (LLM Gateway) and a Python SDK, enabling developers to integrate various LLMs seamlessly into their applications. The Proxy Server facilitates centralized management, allowing for load balancing, cost tracking across projects, and consistent input/output formatting compatible with OpenAI standards. This setup supports multiple providers. It ensures robust observability by generating unique call IDs for each request, aiding in precise tracking and logging across systems. Developers can leverage pre-defined callbacks to log data using various tools. For enterprise users, LiteLLM offers advanced features like Single Sign-On (SSO), user management, and professional support through dedicated channels like Discord and Slack.
    Starting Price: Free
  • 9
    Pruna AI

    Pruna AI

    Pruna AI

    Pruna uses generative AI to enable companies to produce professional-grade visual content quickly and affordably. By eliminating the traditional need for studios and manual editing, it empowers brands to create consistent, customized images for advertising, product displays, and digital campaigns with minimal effort.
    Starting Price: $0.40 per runtime hour
  • 10
    LangDB

    LangDB

    LangDB

    LangDB offers a community-driven, open-access repository focused on natural language processing tasks and datasets for multiple languages. It serves as a central resource for tracking benchmarks, sharing tools, and supporting the development of multilingual AI models with an emphasis on openness and cross-linguistic representation.
    Starting Price: $49 per month
  • 11
    LLM Gateway

    LLM Gateway

    LLM Gateway

    LLM Gateway is a fully open source, unified API gateway that lets you route, manage, and analyze requests to any large language model provider, OpenAI, Anthropic, Gemini Enterprise Agent Platform, and more, using a single, OpenAI-compatible endpoint. It offers multi-provider support with seamless migration and integration, dynamic model orchestration that routes each request to the optimal engine, and comprehensive usage analytics to track requests, token consumption, response times, and costs in real time. Built-in performance monitoring lets you compare models’ accuracy and cost-effectiveness, while secure key management centralizes API credentials under role-based controls. You can deploy LLM Gateway on your own infrastructure under the MIT license or use the hosted service as a progressive web app, and simple integration means you only need to change your API base URL, your existing code in any language or framework (cURL, Python, TypeScript, Go, etc.)
    Starting Price: $50 per month
  • 12
    TensorBlock

    TensorBlock

    TensorBlock

    TensorBlock is an open source AI infrastructure platform designed to democratize access to large language models through two complementary components. It has a self-hosted, privacy-first API gateway that unifies connections to any LLM provider under a single, OpenAI-compatible endpoint, with encrypted key management, dynamic model routing, usage analytics, and cost-optimized orchestration. TensorBlock Studio delivers a lightweight, developer-friendly multi-LLM interaction workspace featuring a plugin-based UI, extensible prompt workflows, real-time conversation history, and integrated natural-language APIs for seamless prompt engineering and model comparison. Built on a modular, scalable architecture and guided by principles of openness, composability, and fairness, TensorBlock enables organizations to experiment, deploy, and manage AI agents with full control and minimal infrastructure overhead.
    Starting Price: Free
  • 13
    OrcaRouter

    OrcaRouter

    OrcaRouter

    OrcaRouter is an OpenAI-compatible AI model router that sends each prompt to the right model across OpenAI, Anthropic, Gemini, DeepSeek, Qwen, Kimi, and 200+ frontier and open source models. It is built to preserve frontier answer quality while reducing AI inference spend by grading every prompt and routing hard reasoning to frontier models and routine work to lower-cost open-source models. The routing is quality-graded, never a blind, cheap-model swap, and each request shows the difficulty grade, selected model, provider, and cost so routes are visible, auditable, and reproducible. Developers can switch by changing the API base URL, while existing SDKs, model names, and streaming behavior continue to work as before. OrcaRouter supports automatic failover, so if a provider goes down mid-stream, traffic can switch transparently, and the application avoids user-facing errors. It also includes API key management with spend caps, model allowlists, rate limits, budget enforcement, and more.
    Starting Price: $29 per month
  • 14
    Factory Router

    Factory Router

    Factory Router

    Factory Router is an automatic model-selection system for autonomous software engineering workflows, designed to deliver frontier performance at lower cost and with higher reliability. Instead of expecting engineers to manually choose the best model for every task, Factory Router automatically selects the right model for each Droid session, drawing from a diverse pool of frontier and efficient models. Simple questions, mechanical refactors, documentation updates, small bug fixes, search-heavy investigations, and other routine work can be handled by efficient models, while harder work that genuinely needs deeper reasoning can stay on frontier models. If the selected model struggles to complete a task, Factory Router can move the session to a more capable model to reliably preserve high-quality outcomes. It also routes across models, providers, and capacity sources when endpoints degrade, rate limits hit, or capacity becomes constrained, helping Droid sessions keep working.
    Starting Price: Free
  • 15
    Portkey

    Portkey

    Portkey.ai

    Launch production-ready apps with the LMOps stack for monitoring, model management, and more. Replace your OpenAI or other provider APIs with the Portkey endpoint. Manage prompts, engines, parameters, and versions in Portkey. Switch, test, and upgrade models with confidence! View your app performance & user level aggregate metics to optimise usage and API costs Keep your user data secure from attacks and inadvertent exposure. Get proactive alerts when things go bad. A/B test your models in the real world and deploy the best performers. We built apps on top of LLM APIs for the past 2 and a half years and realised that while building a PoC took a weekend, taking it to production & managing it was a pain! We're building Portkey to help you succeed in deploying large language models APIs in your applications. Regardless of you trying Portkey, we're always happy to help!
    Starting Price: $49 per month
  • 16
    Manifest

    Manifest

    Manifest

    Manifest is a Backend-as-a-Service (BaaS) designed to accelerate app development by simplifying the backend part. With a focus on developer efficiency, Manifest allows developers to get a complete backend that fits into only 1 YMAL file, enabling teams to go from idea to deployment faster. It integrates seamlessly with any front-end and scales effortlessly. Built with flexibility in mind, Manifest supports multiple use cases, from MVPs to production-grade applications. Developers can focus on building projects while Manifest takes care of the backend.
    Starting Price: $0
  • 17
    Substrate

    Substrate

    Substrate

    Substrate is the platform for agentic AI. Elegant abstractions and high-performance components, optimized models, vector database, code interpreter, and model router. Substrate is the only compute engine designed to run multi-step AI workloads. Describe your task by connecting components and let Substrate run it as fast as possible. We analyze your workload as a directed acyclic graph and optimize the graph, for example, merging nodes that can be run in a batch. The Substrate inference engine automatically schedules your workflow graph with optimized parallelism, reducing the complexity of chaining multiple inference APIs. No more async programming, just connect nodes and let Substrate parallelize your workload. Our infrastructure guarantees your entire workload runs in the same cluster, often on the same machine. You won’t spend fractions of a second per task on unnecessary data roundtrips and cross-region HTTP transport.
    Starting Price: $30 per month
  • 18
    RouteLLM
    Developed by LM-SYS, RouteLLM is an open-source toolkit that allows users to route tasks between different large language models to improve efficiency and manage resources. It supports strategy-based routing, helping developers balance speed, accuracy, and cost by selecting the best model for each input dynamically.
  • 19
    FastRouter

    FastRouter

    FastRouter

    FastRouter is a unified API gateway that enables AI applications to access many large language, image, and audio models (like GPT-5, Claude 4 Opus, Gemini 2.5 Pro, Grok 4, etc.) through a single OpenAI-compatible endpoint. It features automatic routing, which dynamically picks the optimal model per request based on factors like cost, latency, and output quality. It supports massive scale (no imposed QPS limits) and ensures high availability via instant failover across model providers. FastRouter also includes cost control and governance tools to set budgets, rate limits, and model permissions per API key or project, and it delivers real-time analytics on token usage, request counts, and spending trends. The integration process is minimal; you simply swap your OpenAI base URL to FastRouter’s endpoint and configure preferences in the dashboard; the routing, optimization, and failover functions then run transparently.
  • 20
    Martian

    Martian

    Martian

    By using the best-performing model for each request, we can achieve higher performance than any single model. Martian outperforms GPT-4 across OpenAI's evals (open/evals). We turn opaque black boxes into interpretable representations. Our router is the first tool built on top of our model mapping method. We are developing many other applications of model mapping including turning transformers from indecipherable matrices into human-readable programs. If a company experiences an outage or high latency period, automatically reroute to other providers so your customers never experience any issues. Determine how much you could save by using the Martian Model Router with our interactive cost calculator. Input your number of users, tokens per session, and sessions per month, and specify your cost/quality tradeoff.
  • 21
    Requesty

    Requesty

    Requesty

    Requesty is a cutting-edge platform designed to optimize AI workloads by intelligently routing requests to the most appropriate model based on the task at hand. With advanced features like automatic fallback mechanisms and queuing, Requesty ensures uninterrupted service delivery, even during model downtimes. The platform supports a wide range of models such as GPT-4, Claude 3.5, and DeepSeek, and offers AI application observability, allowing users to track model performance and optimize their usage. By reducing API costs and improving efficiency, Requesty empowers developers to build smarter, more reliable AI applications.
  • 22
    Sudo

    Sudo

    Sudo

    Sudo offers “one API for all models”, a unified interface so developers can integrate multiple large language models and generative AI tools (for text, image, audio) through a single endpoint. It handles routing between different models to optimize for things like latency, throughput, cost, or whatever criteria you choose. The platform supports flexible billing and monetization options; subscription tiers, usage-based metered billing, or hybrids. It also supports in-context AI-native ads (you can insert context-aware ads into AI outputs, controlling relevance and frequency). Onboarding is quick: you create an API key, install their SDK (Python or TypeScript), and start making calls to the AI endpoints. They emphasize low latency (“optimized for real-time AI”), better throughput compared with some alternatives, and avoiding vendor lock-in.
  • 23
    PromptUnit

    PromptUnit

    PromptUnit

    PromptUnit is an AI inference proxy that reduces AI costs automatically by sitting between an app and its AI providers with no code changes required. Teams swap the base URL, keep the same SDK, endpoints, response parsing, and error handling, then PromptUnit handles routing, failover, cost tracking, and quality validation. It logs every API call by model, feature, user segment, token count, latency, and cost, giving real-time visibility into where AI spend is going before any routing changes go live. In observation mode, PromptUnit watches traffic, shadow-classifies requests, forecasts savings, and explains routing decisions so teams can see exact savings before enabling live routing. Once enabled, Smart Routing uses task classification to route each request to the cheapest model that clears the configured quality bar. PromptUnit also includes prompt compression, token inflation defense, prompt efficiency scoring, semantic request caching, and multi-model consensus.
  • 24
    nexos.ai

    nexos.ai

    nexos.ai

    nexos.ai is an all-in-one AI platform that helps drive secure organization wide AI adoption. Teach leaders set policies & guardrails and oversee AI usage. Business teams use any AI models they need. Our platform consists of two powerful products: AI Gateway and AI Workspace. AI Gateway integrates multiple LLMs seamlessly, while AI Workspace offers a secure, web-based environment for working with AI. Founded by the team behind Europe's fastest-growing businesses, nexos.ai has already secured an $8 million investment from industry leaders and angel investors, including Index Ventures.
  • 25
    Bifrost

    Bifrost

    Maxim AI

    Bifrost is a high-performance AI gateway that unifies access to 20+ providers OpenAI, Anthropic, AWS, Bedrock, Google Vertex, Azure, and more, through a unified API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade governance. In sustained benchmarks at 5,000 requests per second, Bifrost adds only 11 µs of overhead per request.
  • Previous
  • You're on page 1
  • Next
Auth0 Logo