Alternatives to EXAONE
Compare EXAONE alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to EXAONE in 2024. Compare features, ratings, user reviews, pricing, and more from EXAONE competitors and alternatives in order to make an informed decision for your business.
-
1
Gopher
DeepMind
Language, and its role in demonstrating and facilitating comprehension - or intelligence - is a fundamental part of being human. It gives people the ability to communicate thoughts and concepts, express ideas, create memories, and build mutual understanding. These are foundational parts of social intelligence. It’s why our teams at DeepMind study aspects of language processing and communication, both in artificial agents and in humans. As part of a broader portfolio of AI research, we believe the development and study of more powerful language models – systems that predict and generate text – have tremendous potential for building advanced AI systems that can be used safely and efficiently to summarise information, provide expert advice and follow instructions via natural language. Developing beneficial language models requires research into their potential impacts, including the risks they pose. -
2
Med-PaLM 2
Google Cloud
Healthcare breakthroughs change the world and bring hope to humanity through scientific rigor, human insight, and compassion. We believe AI can contribute to this, with thoughtful collaboration between researchers, healthcare organizations, and the broader ecosystem. Today, we're sharing exciting progress on these initiatives, with the announcement of limited access to Google’s medical large language model, or LLM, called Med-PaLM 2. It will be available in the coming weeks to a select group of Google Cloud customers for limited testing, to explore use cases and share feedback as we investigate safe, responsible, and meaningful ways to use this technology. Med-PaLM 2 harnesses the power of Google’s LLMs, aligned to the medical domain to more accurately and safely answer medical questions. As a result, Med-PaLM 2 was the first LLM to perform at an “expert” test-taker level performance on the MedQA dataset of US Medical Licensing Examination (USMLE)-style questions. -
3
Phi-2
Microsoft
We are now releasing Phi-2, a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters. On complex benchmarks Phi-2 matches or outperforms models up to 25x larger, thanks to new innovations in model scaling and training data curation. With its compact size, Phi-2 is an ideal playground for researchers, including for exploration around mechanistic interpretability, safety improvements, or fine-tuning experimentation on a variety of tasks. We have made Phi-2 available in the Azure AI Studio model catalog to foster research and development on language models. -
4
PanGu-α
Huawei
PanGu-α is developed under the MindSpore and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task to 2048 processors efficiently, including data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism and rematerialization. To enhance the generalization ability of PanGu-α, we collect 1.1TB high-quality Chinese data from a wide range of domains to pretrain the model. We empirically test the generation ability of PanGu-α in various scenarios including text summarization, question answering, dialogue generation, etc. Moreover, we investigate the effect of model scales on the few-shot performances across a broad range of Chinese NLP tasks. The experimental results demonstrate the superior capabilities of PanGu-α in performing various tasks under few-shot or zero-shot settings. -
5
Alpa
Alpa
Alpa aims to automate large-scale distributed training and serving with just a few lines of code. Alpa was initially developed by folks in the Sky Lab, UC Berkeley. Some advanced techniques used in Alpa have been written in a paper published in OSDI'2022. Alpa community is growing with new contributors from Google. A language model is a probability distribution over sequences of words. It predicts the next word based on all the previous words. It is useful for a variety of AI applications, such the auto-completion in your email or chatbot service. For more information, check out the language model wikipedia page. GPT-3 is very large language model, with 175 billion parameters, that uses deep learning to produce human-like text. Many researchers and news articles described GPT-3 as "one of the most interesting and important AI systems ever produced". GPT-3 is gradually being used as a backbone in the latest NLP research and applications.Starting Price: Free -
6
Sarvam AI
Sarvam AI
We are developing efficient large language models for India's diverse linguistic culture and enabling new GenAI applications through bespoke enterprise models. We are building an enterprise-grade platform that lets you develop and evaluate your company’s GenAI apps. We believe in the power of open-source to accelerate AI innovation and will be contributing to open-source models and datasets, as well be leading efforts for large-scale data curation in public-good space. We are a dynamic and close-knit team of AI pioneers, blending expertise in research, engineering, product design, and business operations. Our diverse backgrounds unite under a shared commitment to excellence in science and the creation of societal impact. We foster an environment where tackling complex tech challenges is not just a job, but a passion. -
7
Mathstral
Mistral AI
As a tribute to Archimedes, whose 2311th anniversary we’re celebrating this year, we are proud to release our first Mathstral model, a specific 7B model designed for math reasoning and scientific discovery. The model has a 32k context window published under the Apache 2.0 license. We’re contributing Mathstral to the science community to bolster efforts in advanced mathematical problems requiring complex, multi-step logical reasoning. The Mathstral release is part of our broader effort to support academic projects, it was produced in the context of our collaboration with Project Numina. Akin to Isaac Newton in his time, Mathstral stands on the shoulders of Mistral 7B and specializes in STEM subjects. It achieves state-of-the-art reasoning capacities in its size category across various industry-standard benchmarks. In particular, it achieves 56.6% on MATH and 63.47% on MMLU, with the following MMLU performance difference by subject between Mathstral 7B and Mistral 7B. -
8
InstructGPT
OpenAI
InstructGPT is an open-source framework for training language models to generate natural language instructions from visual input. It uses a generative pre-trained transformer (GPT) model and the state-of-the-art object detector, Mask R-CNN, to detect objects in images and generate natural language sentences that describe the image. InstructGPT is designed to be effective across domains such as robotics, gaming and education; it can assist robots in navigating complex tasks with natural language instructions, or help students learn by providing descriptive explanations of processes or events.Starting Price: $0.0200 per 1000 tokens -
9
Medical LLM
John Snow Labs
John Snow Labs' Medical LLM is an advanced, domain-specific large language model (LLM) designed to revolutionize the way healthcare organizations harness the power of artificial intelligence. This innovative platform is tailored specifically for the healthcare industry, combining cutting-edge natural language processing (NLP) capabilities with a deep understanding of medical terminology, clinical workflows, and regulatory requirements. The result is a powerful tool that enables healthcare providers, researchers, and administrators to unlock new insights, improve patient outcomes, and drive operational efficiency. At the heart of the Healthcare LLM is its comprehensive training on vast amounts of healthcare data, including clinical notes, research papers, and regulatory documents. This specialized training allows the model to accurately interpret and generate medical text, making it an invaluable asset for tasks such as clinical documentation, automated coding, and medical research. -
10
Cohere
Cohere AI
Build natural language understanding and generation into your product with a few lines of code. The Cohere API provides access to models that read billions of web pages and learn to understand the meaning, sentiment, and intent of the words we use. Use the Cohere API to write human-like text by completing a prompt or filling in blanks. You can write copy, generate code, summarize text, and more. Compute the likelihood of text and retrieve representations from the model. Use the likelihood API to filter text based on chosen categories or selected criteria. With representations, you can train your own downstream models on a wide variety of domain-specific natural language tasks. The Cohere API can compute the similarity between pieces of text, and make categorical predictions by comparing the likelihood of different text options. The model has multiple lenses through which to view ideas, so that it can recognize abstract similarities between concepts as distinct as DNA and computers.Starting Price: $0.40 / 1M Tokens -
11
Alpaca
Stanford Center for Research on Foundation Models (CRFM)
Instruction-following models such as GPT-3.5 (text-DaVinci-003), ChatGPT, Claude, and Bing Chat have become increasingly powerful. Many users now interact with these models regularly and even use them for work. However, despite their widespread deployment, instruction-following models still have many deficiencies: they can generate false information, propagate social stereotypes, and produce toxic language. To make maximum progress on addressing these pressing problems, it is important for the academic community to engage. Unfortunately, doing research on instruction-following models in academia has been difficult, as there is no easily accessible model that comes close in capabilities to closed-source models such as OpenAI’s text-DaVinci-003. We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA 7B model. -
12
ESMFold
Meta
ESMFold shows how AI can give us new tools to understand the natural world, much like the microscope, which enabled us to see into the world at an infinitesimal scale and opened up a whole new understanding of life. AI can help us understand the immense scope of natural diversity, and see biology in a new way. Much of AI research has focused on helping computers understand the world in a way similar to how humans do. The language of proteins is one that is beyond human comprehension and has eluded even the most powerful computational tools. AI has the potential to open up this language to our understanding. Studying AI in new domains such as biology can also give insight into artificial intelligence more broadly. Our work reveals connections across domains: large language models that are behind advances in machine translation, natural language understanding, speech recognition, and image generation are also able to learn deep information about biology.Starting Price: Free -
13
YandexGPT
Yandex
Take advantage of the capabilities of generative language models to improve and optimize your applications and web services. Get an aggregated result of accumulated textual data whether it be information from work chats, user reviews, or other types of data. YandexGPT will help both summarize and interpret the information. Speed up text creation as you improve their quality and style. Create template texts for newsletters, product descriptions for online stores and other applications. Develop a chatbot for your support service: teach the bot to answer various user questions, both common and more complicated. Use the API to integrate the service with your applications and automate processes. -
14
GPT-J
EleutherAI
GPT-J is a cutting-edge language model created by the research organization EleutherAI. In terms of performance, GPT-J exhibits a level of proficiency comparable to that of OpenAI's renowned GPT-3 model in a range of zero-shot tasks. Notably, GPT-J has demonstrated the ability to surpass GPT-3 in tasks related to generating code. The latest iteration of this language model, known as GPT-J-6B, is built upon a linguistic dataset referred to as The Pile. This dataset, which is publicly available, encompasses a substantial volume of 825 gibibytes of language data, organized into 22 distinct subsets. While GPT-J shares certain capabilities with ChatGPT, it is important to note that GPT-J is not designed to operate as a chatbot; rather, its primary function is to predict text. In a significant development in March 2023, Databricks introduced Dolly, a model that follows instructions and is licensed under Apache.Starting Price: Free -
15
Megatron-Turing
NVIDIA
Megatron-Turing Natural Language Generation model (MT-NLG), is the largest and the most powerful monolithic transformer English language model with 530 billion parameters. This 105-layer, transformer-based MT-NLG improves upon the prior state-of-the-art models in zero-, one-, and few-shot settings. It demonstrates unmatched accuracy in a broad set of natural language tasks such as, Completion prediction, Reading comprehension, Commonsense reasoning, Natural language inferences, Word sense disambiguation, etc. With the intent of accelerating research on the largest English language model till date and enabling customers to experiment, employ and apply such a large language model on downstream language tasks - NVIDIA is pleased to announce an Early Access program for its managed API service to MT-NLG mode. -
16
Giga ML
Giga ML
We just launched X1 large series of Models. Giga ML's most powerful model is available for pre-training and fine-tuning with on-prem deployment. Since we are Open AI compatible, your existing integrations with long chain, llama-index, and all others work seamlessly. You can continue pre-training of LLM's with domain-specific data books or docs or company docs. The world of large language models (LLMs) rapidly expanding, offering unprecedented opportunities for natural language processing across various domains. However, some critical challenges have remained unaddressed. At Giga ML, we proudly introduce the X1 Large 32k model, a pioneering on-premise LLM solution that addresses these critical issues. -
17
Granite Code
IBM
We introduce the Granite series of decoder-only code models for code generative tasks (e.g., fixing bugs, explaining code, documenting code), trained with code written in 116 programming languages. A comprehensive evaluation of the Granite Code model family on diverse tasks demonstrates that our models consistently reach state-of-the-art performance among available open source code LLMs. The key advantages of Granite Code models include: All-rounder Code LLM: Granite Code models achieve competitive or state-of-the-art performance on different kinds of code-related tasks, including code generation, explanation, fixing, editing, translation, and more. Demonstrating their ability to solve diverse coding tasks. Trustworthy Enterprise-Grade LLM: All our models are trained on license-permissible data collected following IBM's AI Ethics principles and guided by IBM’s Corporate Legal team for trustworthy enterprise usage.Starting Price: Free -
18
Mistral Large 2
Mistral AI
Mistral Large 2 has a 128k context window and supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Mistral Large 2 is designed for single-node inference with long-context applications in mind – its size of 123 billion parameters allows it to run at large throughput on a single node. We are releasing Mistral Large 2 under the Mistral Research License, that allows usage and modification for research and non-commercial usages.Starting Price: Free -
19
Claude
Anthropic
Claude is an artificial intelligence large language model that can process and generate human-like text. Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. Large, general systems of today can have significant benefits, but can also be unpredictable, unreliable, and opaque: our goal is to make progress on these issues. For now, we’re primarily focused on research towards these goals; down the road, we foresee many opportunities for our work to create value commercially and for public benefit.Starting Price: Free -
20
ChatGPT Enterprise
OpenAI
Enterprise-grade security & privacy and the most powerful version of ChatGPT yet. 1. Customer prompts or data are not used for training models 2. Data encryption at rest (AES-256) and in transit (TLS 1.2+) 3. SOC 2 compliant 4. Dedicated admin console and easy bulk member management 5. SSO and Domain Verification 6. Analytics dashboard to understand usage 7. Unlimited, high-speed access to GPT-4 and Advanced Data Analysis* 8. 32k token context windows for 4X longer inputs and memory 9. Shareable chat templates for your company to collaborate -
21
Llama 3.2
Meta
The open-source AI model you can fine-tune, distill and deploy anywhere is now available in more versions. Choose from 1B, 3B, 11B or 90B, or continue building with Llama 3.1 Llama 3.2 is a collection of large language models (LLMs) pretrained and fine-tuned in 1B and 3B sizes that are multilingual text only, and 11B and 90B sizes that take both text and image inputs and output text. Develop highly performative and efficient applications from our latest release. Use our 1B or 3B models for on device applications such as summarizing a discussion from your phone or calling on-device tools like calendar. Use our 11B or 90B models for image use cases such as transforming an existing image into something new or getting more information from an image of your surroundings.Starting Price: Free -
22
AI21 Studio
AI21 Studio
AI21 Studio provides API access to Jurassic-1 large-language-models. Our models power text generation and comprehension features in thousands of live applications. Take on any language task. Our Jurassic-1 models are trained to follow natural language instructions and require just a few examples to adapt to new tasks. Use our specialized APIs for common tasks like summarization, paraphrasing and more. Access superior results at a lower cost without reinventing the wheel. Need to fine-tune your own custom model? You're just 3 clicks away. Training is fast, affordable and trained models are deployed immediately. Give your users superpowers by embedding an AI co-writer in your app. Drive user engagement and success with features like long-form draft generation, paraphrasing, repurposing and custom auto-complete.Starting Price: $29 per month -
23
GPT-3
OpenAI
Our GPT-3 models can understand and generate natural language. We offer four main models with different levels of power suitable for different tasks. Davinci is the most capable model, and Ada is the fastest. The main GPT-3 models are meant to be used with the text completion endpoint. We also offer models that are specifically meant to be used with other endpoints. Davinci is the most capable model family and can perform any task the other models can perform and often with less instruction. For applications requiring a lot of understanding of the content, like summarization for a specific audience and creative content generation, Davinci is going to produce the best results. These increased capabilities require more compute resources, so Davinci costs more per API call and is not as fast as the other models.Starting Price: $0.0200 per 1000 tokens -
24
GPT-3.5
OpenAI
GPT-3.5 is the next evolution of GPT 3 large language model from OpenAI. GPT-3.5 models can understand and generate natural language. We offer four main models with different levels of power suitable for different tasks. The main GPT-3.5 models are meant to be used with the text completion endpoint. We also offer models that are specifically meant to be used with other endpoints. Davinci is the most capable model family and can perform any task the other models can perform and often with less instruction. For applications requiring a lot of understanding of the content, like summarization for a specific audience and creative content generation, Davinci is going to produce the best results. These increased capabilities require more compute resources, so Davinci costs more per API call and is not as fast as the other models.Starting Price: $0.0200 per 1000 tokens -
25
DBRX
Databricks
Today, we are excited to introduce DBRX, an open, general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state-of-the-art for established open LLMs. Moreover, it provides the open community and enterprises building their own LLMs with capabilities that were previously limited to closed model APIs; according to our measurements, it surpasses GPT-3.5, and it is competitive with Gemini 1.0 Pro. It is an especially capable code model, surpassing specialized models like CodeLLaMA-70B in programming, in addition to its strength as a general-purpose LLM. This state-of-the-art quality comes with marked improvements in training and inference performance. DBRX advances the state-of-the-art in efficiency among open models thanks to its fine-grained mixture-of-experts (MoE) architecture. Inference is up to 2x faster than LLaMA2-70B, and DBRX is about 40% of the size of Grok-1 in terms of both total and active parameter counts. -
26
LFM-3B
Liquid AI
LFM-3B delivers incredible performance for its size. It positions itself as first place among 3B parameter transformers, hybrids, and RNN models, but also outperforms the previous generation of 7B and 13B models. It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller. LFM-3B is the ideal choice for mobile and other edge text-based applications. -
27
Pixtral 12B
Mistral AI
Pixtral 12B is a pioneering multimodal AI model developed by Mistral AI, designed to process and interpret both text and image data seamlessly. This model marks a significant advancement in the integration of different data types, allowing for more intuitive interactions and enhanced content creation capabilities. With a foundation built upon Mistral's NeMo 12B text model, Pixtral 12B incorporates an additional vision adapter that adds approximately 400 million parameters, expanding its ability to handle visual inputs up to 1024 x 1024 pixels in size. This model supports a variety of applications, from detailed image analysis to answering questions about visual content, showcasing its versatility in real-world applications. Pixtral 12B's architecture not only supports a large context window of 128k tokens but also employs innovative techniques like GeLU activation and 2D RoPE for its vision components, making it a robust tool for developers and enterprises aiming to leverage AI.Starting Price: Free -
28
Stable LM
Stability AI
Stable LM: Stability AI Language Models. The release of Stable LM builds on our experience in open-sourcing earlier language models with EleutherAI, a nonprofit research hub. These language models include GPT-J, GPT-NeoX, and the Pythia suite, which were trained on The Pile open-source dataset. Many recent open-source language models continue to build on these efforts, including Cerebras-GPT and Dolly-2. Stable LM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content. We will release details on the dataset in due course. The richness of this dataset gives Stable LM surprisingly high performance in conversational and coding tasks, despite its small size of 3 to 7 billion parameters (by comparison, GPT-3 has 175 billion parameters). Stable LM 3B is a compact language model designed to operate on portable digital devices like handhelds and laptops, and we’re excited about its capabilities and portability.Starting Price: Free -
29
Inflection AI
Inflection AI
Inflection AI is a cutting-edge artificial intelligence research and development company focused on creating advanced AI systems designed to interact with humans in more natural, intuitive ways. Founded in 2022 by entrepreneurs such as Mustafa Suleyman, one of the co-founders of DeepMind, and Reid Hoffman, co-founder of LinkedIn, the company's mission is to make powerful AI more accessible and aligned with human values. Inflection AI specializes in building large-scale language models that enhance human-AI communication, aiming to transform industries ranging from customer service to personal productivity through intelligent, responsive, and ethically designed AI systems. The company's focus on safety, transparency, and user control ensures that their innovations contribute positively to society while addressing potential risks associated with AI technology.Starting Price: Free -
30
PaLM 2
Google
PaLM 2 is our next generation large language model that builds on Google’s legacy of breakthrough research in machine learning and responsible AI. It excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation better than our previous state-of-the-art LLMs, including PaLM. It can accomplish these tasks because of the way it was built – bringing together compute-optimal scaling, an improved dataset mixture, and model architecture improvements. PaLM 2 is grounded in Google’s approach to building and deploying AI responsibly. It was evaluated rigorously for its potential harms and biases, capabilities and downstream uses in research and in-product applications. It’s being used in other state-of-the-art models, like Med-PaLM 2 and Sec-PaLM, and is powering generative AI features and tools at Google, like Bard and the PaLM API. -
31
LaMDA
Google
LaMDA, our latest research breakthrough, adds pieces to one of the most tantalizing sections of that puzzle: conversation. While conversations tend to revolve around specific topics, their open-ended nature means they can start in one place and end up somewhere completely different. A chat with a friend about a TV show could evolve into a discussion about the country where the show was filmed before settling on a debate about that country’s best regional cuisine. That meandering quality can quickly stump modern conversational agents (commonly known as chatbots), which tend to follow narrow, pre-defined paths. But LaMDA — short for “Language Model for Dialogue Applications” — can engage in a free-flowing way about a seemingly endless number of topics, an ability we think could unlock more natural ways of interacting with technology and entirely new categories of helpful applications. -
32
NVIDIA NeMo Megatron
NVIDIA
NVIDIA NeMo Megatron is an end-to-end framework for training and deploying LLMs with billions and trillions of parameters. NVIDIA NeMo Megatron, part of the NVIDIA AI platform, offers an easy, efficient, and cost-effective containerized framework to build and deploy LLMs. Designed for enterprise application development, it builds upon the most advanced technologies from NVIDIA research and provides an end-to-end workflow for automated distributed data processing, training large-scale customized GPT-3, T5, and multilingual T5 (mT5) models, and deploying models for inference at scale. Harnessing the power of LLMs is made easy through validated and converged recipes with predefined configurations for training and inference. Customizing models is simplified by the hyperparameter tool, which automatically searches for the best hyperparameter configurations and performance for training and inference on any given distributed GPU cluster configuration. -
33
Mixtral 8x22B
Mistral AI
Mixtral 8x22B is our latest open model. It sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. It is fluent in English, French, Italian, German, and Spanish. It has strong mathematics and coding capabilities. It is natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernization at scale. Its 64K tokens context window allows precise information recall from large documents. We build models that offer unmatched cost efficiency for their respective sizes, delivering the best performance-to-cost ratio within models provided by the community. Mixtral 8x22B is a natural continuation of our open model family. Its sparse activation patterns make it faster than any dense 70B model.Starting Price: Free -
34
Command R
Cohere
Command’s model outputs come with clear citations that mitigate the risk of hallucinations and enable the surfacing of additional context from the source materials. Command can write product descriptions, help draft emails, suggest example press releases, and much more. Ask Command multiple questions about a document to assign a category to the document, extract a piece of information, or answer a general question about the document. Where answering a few questions about a document can save you a few minutes, doing it for thousands of documents can save a company years. This family of scalable models balances high efficiency with strong accuracy to enable enterprises to move from proof of concept into production-grade AI. -
35
Mistral NeMo
Mistral AI
Mistral NeMo, our new best small model. A state-of-the-art 12B model with 128k context length, and released under the Apache 2.0 license. Mistral NeMo is a 12B model built in collaboration with NVIDIA. Mistral NeMo offers a large context window of up to 128k tokens. Its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category. As it relies on standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B. We have released pre-trained base and instruction-tuned checkpoints under the Apache 2.0 license to promote adoption for researchers and enterprises. Mistral NeMo was trained with quantization awareness, enabling FP8 inference without any performance loss. The model is designed for global, multilingual applications. It is trained on function calling and has a large context window. Compared to Mistral 7B, it is much better at following precise instructions, reasoning, and handling multi-turn conversations.Starting Price: Free -
36
Command R+
Cohere
Command R+ is Cohere's newest large language model, optimized for conversational interaction and long-context tasks. It aims at being extremely performant, enabling companies to move beyond proof of concept and into production. We recommend using Command R+ for those workflows that lean on complex RAG functionality and multi-step tool use (agents). Command R, on the other hand, is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks, as well as applications where price is a major consideration.Starting Price: Free -
37
Martian
Martian
By using the best-performing model for each request, we can achieve higher performance than any single model. Martian outperforms GPT-4 across OpenAI's evals (open/evals). We turn opaque black boxes into interpretable representations. Our router is the first tool built on top of our model mapping method. We are developing many other applications of model mapping including turning transformers from indecipherable matrices into human-readable programs. If a company experiences an outage or high latency period, automatically reroute to other providers so your customers never experience any issues. Determine how much you could save by using the Martian Model Router with our interactive cost calculator. Input your number of users, tokens per session, and sessions per month, and specify your cost/quality tradeoff. -
38
Qwen-7B
Alibaba
Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant, which is trained with alignment techniques. The features of the Qwen-7B series include: Trained with high-quality pretraining data. We have pretrained Qwen-7B on a self-constructed large-scale high-quality dataset of over 2.2 trillion tokens. The dataset includes plain texts and codes, and it covers a wide range of domains, including general domain data and professional domain data. Strong performance. In comparison with the models of the similar model size, we outperform the competitors on a series of benchmark datasets, which evaluates natural language understanding, mathematics, coding, etc. And more.Starting Price: Free -
39
Doubao
ByteDance
Doubao is an intelligent language model developed by ByteDance. It has been providing useful answers and insights to users across a wide range of topics. Doubao can handle complex questions, offer detailed explanations, and engage in meaningful conversations. With its advanced language understanding and generation capabilities, it continues to assist people in seeking knowledge, solving problems, and exploring new ideas. Whether for academic inquiries, creative inspiration, or simply having a conversation, Doubao is a valuable tool for users looking for accurate and helpful information.Starting Price: Free -
40
DeepSeek LLM
DeepSeek
Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community. -
41
Claude 3 Opus
Anthropic
Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence. All Claude 3 models show increased capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.Starting Price: Free -
42
OpenAI
OpenAI
OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity. We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome. Apply our API to any language task — semantic search, summarization, sentiment analysis, content generation, translation, and more — with only a few examples or by specifying your task in English. One simple integration gives you access to our constantly-improving AI technology. Explore how you integrate with the API with these sample completions. -
43
T5
Google
With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself. -
44
PanGu-Σ
Huawei
Significant advancements in the field of natural language processing, understanding, and generation have been achieved through the expansion of large language models. This study introduces a system which utilizes Ascend 910 AI processors and the MindSpore framework to train a language model with over a trillion parameters, specifically 1.085T, named PanGu-{\Sigma}. This model, which builds upon the foundation laid by PanGu-{\alpha}, takes the traditionally dense Transformer model and transforms it into a sparse one using a concept known as Random Routed Experts (RRE). The model was efficiently trained on a dataset of 329 billion tokens using a technique called Expert Computation and Storage Separation (ECSS), leading to a 6.3-fold increase in training throughput via heterogeneous computing. Experimentation indicates that PanGu-{\Sigma} sets a new standard in zero-shot learning for various downstream Chinese NLP tasks. -
45
GPT-4
OpenAI
GPT-4 (Generative Pre-trained Transformer 4) is a large-scale unsupervised language model, yet to be released by OpenAI. GPT-4 is the successor to GPT-3 and part of the GPT-n series of natural language processing models, and was trained on a dataset of 45TB of text to produce human-like text generation and understanding capabilities. Unlike most other NLP models, GPT-4 does not require additional training data for specific tasks. Instead, it can generate text or answer questions using only its own internally generated context as input. GPT-4 has been shown to be able to perform a wide variety of tasks without any task specific training data such as translation, summarization, question answering, sentiment analysis and more.Starting Price: $0.0200 per 1000 tokens -
46
Qwen
Alibaba
Qwen LLM refers to a family of large language models (LLMs) developed by Alibaba Cloud's Damo Academy. These models are trained on a massive dataset of text and code, allowing them to understand and generate human-like text, translate languages, write different kinds of creative content, and answer your questions in an informative way. Here are some key features of Qwen LLMs: Variety of sizes: The Qwen series ranges from 1.8 billion to 72 billion parameters, offering options for different needs and performance levels. Open source: Some versions of Qwen are open-source, which means their code is publicly available for anyone to use and modify. Multilingual support: Qwen can understand and translate multiple languages, including English, Chinese, and French. Diverse capabilities: Besides generation and translation, Qwen models can be used for tasks like question answering, text summarization, and code generation.Starting Price: Free -
47
Gemini Ultra
Google
Gemini Ultra is a powerful new language model from Google DeepMind. It is the largest and most capable model in the Gemini family, which also includes Gemini Pro and Gemini Nano. Gemini Ultra is designed for highly complex tasks, such as natural language processing, machine translation, and code generation. It is also the first language model to outperform human experts on the Massive Multitask Language Understanding (MMLU) test, obtaining a score of 90%. -
48
Mixtral 8x7B
Mistral AI
Mixtral 8x7B is a high-quality sparse mixture of experts model (SMoE) with open weights. Licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.Starting Price: Free -
49
Aya
Cohere AI
Aya is a new state-of-the-art, open-source, massively multilingual, generative large language research model (LLM) covering 101 different languages — more than double the number of languages covered by existing open-source models. Aya helps researchers unlock the powerful potential of LLMs for dozens of languages and cultures largely ignored by most advanced models on the market today. We are open-sourcing both the Aya model, as well as the largest multilingual instruction fine-tuned dataset to-date with a size of 513 million covering 114 languages. This data collection includes rare annotations from native and fluent speakers all around the world, ensuring that AI technology can effectively serve a broad global audience that have had limited access to-date. -
50
Jurassic-1
AI21 Labs
Jurassic-1 models come in two sizes, where the Jumbo version, at 178B parameters, is the largest and most sophisticated language model ever released for general use by developers. AI21 Studio is currently in open beta, allowing anyone to sign up and immediately start querying Jurassic-1 using our API and interactive web environment. Our mission at AI21 Labs is to fundamentally reimagine the way humans read and write by introducing machines as thought partners, and the only way we can achieve this is if we take on this challenge together. We’ve been researching language models since our Mesozoic Era (aka 2017 😉). Jurassic-1 builds on this research, and it is the first generation of models we’re making available for widespread use.