Alternatives to Alpa

Compare Alpa alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Alpa in 2024. Compare features, ratings, user reviews, pricing, and more from Alpa competitors and alternatives in order to make an informed decision for your business.

  • 1
    Adept

    Adept

    Adept

    Adept is an ML research and product lab building general intelligence by enabling humans and computers to work together creatively. Designed and trained specifically for taking actions on computers in response to your natural language commands. ACT-1 is our first step towards a foundation model that can use every software tool, API and website that exists. Adept is building an entirely new way to get things done. It takes your goals, in plain language, and turns them into actions on the software you use every day. We believe that AI systems should be built with users at the center — where machines work together with people in the driver's seat, discovering new solutions, enabling more informed decisions, and giving us more time for the work we love.
  • 2
    GPT-4

    GPT-4

    OpenAI

    GPT-4 (Generative Pre-trained Transformer 4) is a large-scale unsupervised language model, yet to be released by OpenAI. GPT-4 is the successor to GPT-3 and part of the GPT-n series of natural language processing models, and was trained on a dataset of 45TB of text to produce human-like text generation and understanding capabilities. Unlike most other NLP models, GPT-4 does not require additional training data for specific tasks. Instead, it can generate text or answer questions using only its own internally generated context as input. GPT-4 has been shown to be able to perform a wide variety of tasks without any task specific training data such as translation, summarization, question answering, sentiment analysis and more.
    Starting Price: $0.0200 per 1000 tokens
  • 3
    GPT-NeoX

    GPT-NeoX

    EleutherAI

    An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training.
    Starting Price: Free
  • 4
    Azure OpenAI Service
    Apply advanced coding and language models to a variety of use cases. Leverage large-scale, generative AI models with deep understandings of language and code to enable new reasoning and comprehension capabilities for building cutting-edge applications. Apply these coding and language models to a variety of use cases, such as writing assistance, code generation, and reasoning over data. Detect and mitigate harmful use with built-in responsible AI and access enterprise-grade Azure security. Gain access to generative models that have been pretrained with trillions of words. Apply them to new scenarios including language, code, reasoning, inferencing, and comprehension. Customize generative models with labeled data for your specific scenario using a simple REST API. Fine-tune your model's hyperparameters to increase accuracy of outputs. Use the few-shot learning capability to provide the API with examples and achieve more relevant results.
    Starting Price: $0.0004 per 1000 tokens
  • 5
    Megatron-Turing
    Megatron-Turing Natural Language Generation model (MT-NLG), is the largest and the most powerful monolithic transformer English language model with 530 billion parameters. This 105-layer, transformer-based MT-NLG improves upon the prior state-of-the-art models in zero-, one-, and few-shot settings. It demonstrates unmatched accuracy in a broad set of natural language tasks such as, Completion prediction, Reading comprehension, Commonsense reasoning, Natural language inferences, Word sense disambiguation, etc. With the intent of accelerating research on the largest English language model till date and enabling customers to experiment, employ and apply such a large language model on downstream language tasks - NVIDIA is pleased to announce an Early Access program for its managed API service to MT-NLG mode.
  • 6
    Claude

    Claude

    Anthropic

    Claude is an artificial intelligence large language model that can process and generate human-like text. Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. Large, general systems of today can have significant benefits, but can also be unpredictable, unreliable, and opaque: our goal is to make progress on these issues. For now, we’re primarily focused on research towards these goals; down the road, we foresee many opportunities for our work to create value commercially and for public benefit.
    Starting Price: Free
  • 7
    LUIS

    LUIS

    Microsoft

    Language Understanding (LUIS): A machine learning-based service to build natural language into apps, bots, and IoT devices. Quickly create enterprise-ready, custom models that continuously improve. Add natural language to your apps. Designed to identify valuable information in conversations, LUIS interprets user goals (intents) and distills valuable information from sentences (entities), for a high quality, nuanced language model. LUIS integrates seamlessly with the Azure Bot Service, making it easy to create a sophisticated bot. Powerful developer tools are combined with customizable pre-built apps and entity dictionaries, such as Calendar, Music, and Devices, so you can build and deploy a solution more quickly. Dictionaries are mined from the collective knowledge of the web and supply billions of entries, helping your model to correctly identify valuable information from user conversations. Active learning is used to continuously improve the quality of the models.
  • 8
    Codestral Mamba

    Codestral Mamba

    Mistral AI

    As a tribute to Cleopatra, whose glorious destiny ended in tragic snake circumstances, we are proud to release Codestral Mamba, a Mamba2 language model specialized in code generation, available under an Apache 2.0 license. Codestral Mamba is another step in our effort to study and provide new architectures. It is available for free use, modification, and distribution, and we hope it will open new perspectives in architecture research. Mamba models offer the advantage of linear time inference and the theoretical ability to model sequences of infinite length. It allows users to engage with the model extensively with quick responses, irrespective of the input length. This efficiency is especially relevant for code productivity use cases, this is why we trained this model with advanced code and reasoning capabilities, enabling it to perform on par with SOTA transformer-based models.
  • 9
    Sarvam AI

    Sarvam AI

    Sarvam AI

    We are developing efficient large language models for India's diverse linguistic culture and enabling new GenAI applications through bespoke enterprise models. We are building an enterprise-grade platform that lets you develop and evaluate your company’s GenAI apps. We believe in the power of open-source to accelerate AI innovation and will be contributing to open-source models and datasets, as well be leading efforts for large-scale data curation in public-good space. We are a dynamic and close-knit team of AI pioneers, blending expertise in research, engineering, product design, and business operations. Our diverse backgrounds unite under a shared commitment to excellence in science and the creation of societal impact. We foster an environment where tackling complex tech challenges is not just a job, but a passion.
  • 10
    NVIDIA NeMo Megatron
    NVIDIA NeMo Megatron is an end-to-end framework for training and deploying LLMs with billions and trillions of parameters. NVIDIA NeMo Megatron, part of the NVIDIA AI platform, offers an easy, efficient, and cost-effective containerized framework to build and deploy LLMs. Designed for enterprise application development, it builds upon the most advanced technologies from NVIDIA research and provides an end-to-end workflow for automated distributed data processing, training large-scale customized GPT-3, T5, and multilingual T5 (mT5) models, and deploying models for inference at scale. Harnessing the power of LLMs is made easy through validated and converged recipes with predefined configurations for training and inference. Customizing models is simplified by the hyperparameter tool, which automatically searches for the best hyperparameter configurations and performance for training and inference on any given distributed GPU cluster configuration.
  • 11
    EXAONE
    EXAONE is a large language model developed by LG AI Research with the goal of nurturing "Expert AI" in multiple domains. The Expert AI Alliance was formed as a collaborative effort among leading companies in various fields to advance the capabilities of EXAONE. Partner companies within the alliance will serve as mentors, providing skills, knowledge, and data to help EXAONE gain expertise in relevant domains. EXAONE, described as being akin to a college student who has completed general elective courses, requires additional intensive training to become an expert in specific areas. LG AI Research has already demonstrated EXAONE's abilities through real-world applications, such as Tilda, an AI human artist that debuted at New York Fashion Week, as well as AI applications for summarizing customer service conversations and extracting information from complex academic papers.
  • 12
    PaLM 2

    PaLM 2

    Google

    PaLM 2 is our next generation large language model that builds on Google’s legacy of breakthrough research in machine learning and responsible AI. It excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation better than our previous state-of-the-art LLMs, including PaLM. It can accomplish these tasks because of the way it was built – bringing together compute-optimal scaling, an improved dataset mixture, and model architecture improvements. PaLM 2 is grounded in Google’s approach to building and deploying AI responsibly. It was evaluated rigorously for its potential harms and biases, capabilities and downstream uses in research and in-product applications. It’s being used in other state-of-the-art models, like Med-PaLM 2 and Sec-PaLM, and is powering generative AI features and tools at Google, like Bard and the PaLM API.
  • 13
    BERT

    BERT

    Google

    BERT is a large language model and a method of pre-training language representations. Pre-training refers to how BERT is first trained on a large source of text, such as Wikipedia. You can then apply the training results to other Natural Language Processing (NLP) tasks, such as question answering and sentiment analysis. With BERT and AI Platform Training, you can train a variety of NLP models in about 30 minutes.
    Starting Price: Free
  • 14
    Qwen2

    Qwen2

    Alibaba

    Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud. Qwen2 is a series of large language models developed by the Qwen team at Alibaba Cloud. It includes both base language models and instruction-tuned models, ranging from 0.5 billion to 72 billion parameters, and features both dense models and a Mixture-of-Experts model. The Qwen2 series is designed to surpass most previous open-weight models, including its predecessor Qwen1.5, and to compete with proprietary models across a broad spectrum of benchmarks in language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning.
    Starting Price: Free
  • 15
    ChatGPT

    ChatGPT

    OpenAI

    ChatGPT is a language model developed by OpenAI. It has been trained on a diverse range of internet text, allowing it to generate human-like responses to a variety of prompts. ChatGPT can be used for various natural language processing tasks, such as question answering, conversation, and text generation. ChatGPT is a pre-trained language model that uses deep learning algorithms to generate text. It was trained on a large corpus of text data, allowing it to generate human-like responses to a wide range of prompts. The model has a transformer architecture, which has been shown to be effective in many NLP tasks. In addition to generating text, ChatGPT can also be fine-tuned for specific NLP tasks such as question answering, text classification, and language translation. This allows developers to build powerful NLP applications that can perform specific tasks more accurately. ChatGPT can also process and generate code.
  • 16
    Cohere

    Cohere

    Cohere AI

    Build natural language understanding and generation into your product with a few lines of code. The Cohere API provides access to models that read billions of web pages and learn to understand the meaning, sentiment, and intent of the words we use. Use the Cohere API to write human-like text by completing a prompt or filling in blanks. You can write copy, generate code, summarize text, and more. Compute the likelihood of text and retrieve representations from the model. Use the likelihood API to filter text based on chosen categories or selected criteria. With representations, you can train your own downstream models on a wide variety of domain-specific natural language tasks. The Cohere API can compute the similarity between pieces of text, and make categorical predictions by comparing the likelihood of different text options. The model has multiple lenses through which to view ideas, so that it can recognize abstract similarities between concepts as distinct as DNA and computers.
    Starting Price: $0.40 / 1M Tokens
  • 17
    AI21 Studio

    AI21 Studio

    AI21 Studio

    AI21 Studio provides API access to Jurassic-1 large-language-models. Our models power text generation and comprehension features in thousands of live applications. Take on any language task. Our Jurassic-1 models are trained to follow natural language instructions and require just a few examples to adapt to new tasks. Use our specialized APIs for common tasks like summarization, paraphrasing and more. Access superior results at a lower cost without reinventing the wheel. Need to fine-tune your own custom model? You're just 3 clicks away. Training is fast, affordable and trained models are deployed immediately. Give your users superpowers by embedding an AI co-writer in your app. Drive user engagement and success with features like long-form draft generation, paraphrasing, repurposing and custom auto-complete.
    Starting Price: $29 per month
  • 18
    ERNIE 3.0 Titan
    Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge enhanced models and trained a model with 10 billion parameters. ERNIE 3.0 outperformed the state-of-the-art models on various NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. Furthermore, We design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts.
  • 19
    Aya

    Aya

    Cohere AI

    Aya is a new state-of-the-art, open-source, massively multilingual, generative large language research model (LLM) covering 101 different languages — more than double the number of languages covered by existing open-source models. Aya helps researchers unlock the powerful potential of LLMs for dozens of languages and cultures largely ignored by most advanced models on the market today. We are open-sourcing both the Aya model, as well as the largest multilingual instruction fine-tuned dataset to-date with a size of 513 million covering 114 languages. This data collection includes rare annotations from native and fluent speakers all around the world, ensuring that AI technology can effectively serve a broad global audience that have had limited access to-date.
  • 20
    Qwen

    Qwen

    Alibaba

    Qwen LLM refers to a family of large language models (LLMs) developed by Alibaba Cloud's Damo Academy. These models are trained on a massive dataset of text and code, allowing them to understand and generate human-like text, translate languages, write different kinds of creative content, and answer your questions in an informative way. Here are some key features of Qwen LLMs: Variety of sizes: The Qwen series ranges from 1.8 billion to 72 billion parameters, offering options for different needs and performance levels. Open source: Some versions of Qwen are open-source, which means their code is publicly available for anyone to use and modify. Multilingual support: Qwen can understand and translate multiple languages, including English, Chinese, and French. Diverse capabilities: Besides generation and translation, Qwen models can be used for tasks like question answering, text summarization, and code generation.
    Starting Price: Free
  • 21
    VideoPoet

    VideoPoet

    Google

    VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator. It contains a few simple components. An autoregressive language model learns across video, image, audio, and text modalities to autoregressively predict the next video or audio token in the sequence. A mixture of multimodal generative learning objectives are introduced into the LLM training framework, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio. Furthermore, such tasks can be composed together for additional zero-shot capabilities. This simple recipe shows that language models can synthesize and edit videos with a high degree of temporal consistency.
  • 22
    ALBERT

    ALBERT

    Google

    ALBERT is a self-supervised Transformer model that was pretrained on a large corpus of English data. This means it does not require manual labelling, and instead uses an automated process to generate inputs and labels from raw texts. It is trained with two distinct objectives in mind. The first is Masked Language Modeling (MLM), which randomly masks 15% of words in the input sentence and requires the model to predict them. This technique differs from RNNs and autoregressive models like GPT as it allows the model to learn bidirectional sentence representations. The second objective is Sentence Ordering Prediction (SOP), which entails predicting the ordering of two consecutive segments of text during pretraining.
  • 23
    LLaMA

    LLaMA

    Meta

    LLaMA (Large Language Model Meta AI) is a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI. Smaller, more performant models such as LLaMA enable others in the research community who don’t have access to large amounts of infrastructure to study these models, further democratizing access in this important, fast-changing field. Training smaller foundation models like LLaMA is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others’ work, and explore new use cases. Foundation models train on a large set of unlabeled data, which makes them ideal for fine-tuning for a variety of tasks. We are making LLaMA available at several sizes (7B, 13B, 33B, and 65B parameters) and also sharing a LLaMA model card that details how we built the model in keeping with our approach to Responsible AI practices.
  • 24
    Alpaca

    Alpaca

    Stanford Center for Research on Foundation Models (CRFM)

    Instruction-following models such as GPT-3.5 (text-DaVinci-003), ChatGPT, Claude, and Bing Chat have become increasingly powerful. Many users now interact with these models regularly and even use them for work. However, despite their widespread deployment, instruction-following models still have many deficiencies: they can generate false information, propagate social stereotypes, and produce toxic language. To make maximum progress on addressing these pressing problems, it is important for the academic community to engage. Unfortunately, doing research on instruction-following models in academia has been difficult, as there is no easily accessible model that comes close in capabilities to closed-source models such as OpenAI’s text-DaVinci-003. We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA 7B model.
  • 25
    BLOOM

    BLOOM

    BigScience

    BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it is able to output coherent text in 46 languages and 13 programming languages that is hardly distinguishable from text written by humans. BLOOM can also be instructed to perform text tasks it hasn't been explicitly trained for, by casting them as text generation tasks.
  • 26
    CodeQwen

    CodeQwen

    QwenLM

    CodeQwen is the code version of Qwen, the large language model series developed by the Qwen team, Alibaba Cloud. It is a transformer-based decoder-only language model pre-trained on a large amount of data of codes. Strong code generation capabilities and competitive performance across a series of benchmarks. Supporting long context understanding and generation with the context length of 64K tokens. CodeQwen supports 92 coding languages and provides excellent performance in text-to-SQL, bug fixes, etc. You can just write several lines of code with transformers to chat with CodeQwen. Essentially, we build the tokenizer and the model from pre-trained methods, and we use the generate method to perform chatting with the help of the chat template provided by the tokenizer. We apply the ChatML template for chat models following our previous practice. The model completes the code snippets according to the given prompts, without any additional formatting.
    Starting Price: Free
  • 27
    Qwen-7B

    Qwen-7B

    Alibaba

    Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant, which is trained with alignment techniques. The features of the Qwen-7B series include: Trained with high-quality pretraining data. We have pretrained Qwen-7B on a self-constructed large-scale high-quality dataset of over 2.2 trillion tokens. The dataset includes plain texts and codes, and it covers a wide range of domains, including general domain data and professional domain data. Strong performance. In comparison with the models of the similar model size, we outperform the competitors on a series of benchmark datasets, which evaluates natural language understanding, mathematics, coding, etc. And more.
    Starting Price: Free
  • 28
    GPT-3.5

    GPT-3.5

    OpenAI

    GPT-3.5 is the next evolution of GPT 3 large language model from OpenAI. GPT-3.5 models can understand and generate natural language. We offer four main models with different levels of power suitable for different tasks. The main GPT-3.5 models are meant to be used with the text completion endpoint. We also offer models that are specifically meant to be used with other endpoints. Davinci is the most capable model family and can perform any task the other models can perform and often with less instruction. For applications requiring a lot of understanding of the content, like summarization for a specific audience and creative content generation, Davinci is going to produce the best results. These increased capabilities require more compute resources, so Davinci costs more per API call and is not as fast as the other models.
    Starting Price: $0.0200 per 1000 tokens
  • 29
    GPT-J

    GPT-J

    EleutherAI

    GPT-J is a cutting-edge language model created by the research organization EleutherAI. In terms of performance, GPT-J exhibits a level of proficiency comparable to that of OpenAI's renowned GPT-3 model in a range of zero-shot tasks. Notably, GPT-J has demonstrated the ability to surpass GPT-3 in tasks related to generating code. The latest iteration of this language model, known as GPT-J-6B, is built upon a linguistic dataset referred to as The Pile. This dataset, which is publicly available, encompasses a substantial volume of 825 gibibytes of language data, organized into 22 distinct subsets. While GPT-J shares certain capabilities with ChatGPT, it is important to note that GPT-J is not designed to operate as a chatbot; rather, its primary function is to predict text. In a significant development in March 2023, Databricks introduced Dolly, a model that follows instructions and is licensed under Apache.
    Starting Price: Free
  • 30
    Code Llama
    Code Llama is a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software. Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts. Code Llama is free for research and commercial use. Code Llama is built on top of Llama 2 and is available in three models: Code Llama, the foundational code model; Codel Llama - Python specialized for Python; and Code Llama - Instruct, which is fine-tuned for understanding natural language instructions.
    Starting Price: Free
  • 31
    LTM-1

    LTM-1

    Magic AI

    Magic’s LTM-1 enables 50x larger context windows than transformers. Magic's trained a Large Language Model (LLM) that’s able to take in the gigantic amounts of context when generating suggestions. For our coding assistant, this means Magic can now see your entire repository of code. Larger context windows can allow AI models to reference more explicit, factual information and their own action history. We hope to be able to utilize this research to improve reliability and coherence.
  • 32
    GPT4All

    GPT4All

    Nomic AI

    GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade CPUs. The goal is simple - be the best instruction-tuned assistant-style language model that any person or enterprise can freely use, distribute and build on. A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All open-source ecosystem software. Nomic AI supports and maintains this software ecosystem to enforce quality and security alongside spearheading the effort to allow any person or enterprise to easily train and deploy their own on-edge large language models. Data is one the most important ingredients to successfully building a powerful, general-purpose large language model. The GPT4All community has built the GPT4All open source data lake as a staging ground for contributing instruction and assistant tuning data for future GPT4All model trains.
    Starting Price: Free
  • 33
    Medical LLM

    Medical LLM

    John Snow Labs

    John Snow Labs' Medical LLM is an advanced, domain-specific large language model (LLM) designed to revolutionize the way healthcare organizations harness the power of artificial intelligence. This innovative platform is tailored specifically for the healthcare industry, combining cutting-edge natural language processing (NLP) capabilities with a deep understanding of medical terminology, clinical workflows, and regulatory requirements. The result is a powerful tool that enables healthcare providers, researchers, and administrators to unlock new insights, improve patient outcomes, and drive operational efficiency. At the heart of the Healthcare LLM is its comprehensive training on vast amounts of healthcare data, including clinical notes, research papers, and regulatory documents. This specialized training allows the model to accurately interpret and generate medical text, making it an invaluable asset for tasks such as clinical documentation, automated coding, and medical research.
  • 34
    Cerebras-GPT

    Cerebras-GPT

    Cerebras

    State-of-the-art language models are extremely challenging to train; they require huge compute budgets, complex distributed compute techniques and deep ML expertise. As a result, few organizations train large language models (LLMs) from scratch. And increasingly those that have the resources and expertise are not open sourcing the results, marking a significant change from even a few months back. At Cerebras, we believe in fostering open access to the most advanced models. With this in mind, we are proud to announce the release to the open source community of Cerebras-GPT, a family of seven GPT models ranging from 111 million to 13 billion parameters. Trained using the Chinchilla formula, these models provide the highest accuracy for a given compute budget. Cerebras-GPT has faster training times, lower training costs, and consumes less energy than any publicly available model to date.
    Starting Price: Free
  • 35
    Baichuan-13B

    Baichuan-13B

    Baichuan Intelligent Technology

    Baichuan-13B is an open source and commercially available large-scale language model containing 13 billion parameters developed by Baichuan Intelligent following Baichuan -7B . It has achieved the best results of the same size on authoritative Chinese and English benchmarks. This release contains two versions of pre-training ( Baichuan-13B-Base ) and alignment ( Baichuan-13B-Chat ). Larger size, more data : Baichuan-13B further expands the number of parameters to 13 billion on the basis of Baichuan -7B , and trains 1.4 trillion tokens on high-quality corpus, which is 40% more than LLaMA-13B. It is currently open source The model with the largest amount of training data in the 13B size. Support Chinese and English bilingual, use ALiBi position code, context window length is 4096.
    Starting Price: Free
  • 36
    Smaug-72B

    Smaug-72B

    Abacus

    Smaug-72B is a powerful open-source large language model (LLM) known for several key features: High Performance: It currently holds the top spot on the Hugging Face Open LLM leaderboard, surpassing models like GPT-3.5 in various benchmarks. This means it excels at tasks like understanding, responding to, and generating human-like text. Open Source: Unlike many other advanced LLMs, Smaug-72B is freely available for anyone to use and modify, fostering collaboration and innovation in the AI community. Focus on Reasoning and Math: It specifically shines in handling reasoning and mathematical tasks, attributing this strength to unique fine-tuning techniques developed by Abacus AI, the creators of Smaug-72B. Based on Qwen-72B: It's technically a fine-tuned version of another powerful LLM called Qwen-72B, released by Alibaba, further improving upon its capabilities. Overall, Smaug-72B represents a significant step forward in open-source AI.
    Starting Price: Free
  • 37
    LongLLaMA

    LongLLaMA

    LongLLaMA

    This repository contains the research preview of LongLLaMA, a large language model capable of handling long contexts of 256k tokens or even more. LongLLaMA is built upon the foundation of OpenLLaMA and fine-tuned using the Focused Transformer (FoT) method. LongLLaMA code is built upon the foundation of Code Llama. We release a smaller 3B base variant (not instruction tuned) of the LongLLaMA model on a permissive license (Apache 2.0) and inference code supporting longer contexts on hugging face. Our model weights can serve as the drop-in replacement of LLaMA in existing implementations (for short context up to 2048 tokens). Additionally, we provide evaluation results and comparisons against the original OpenLLaMA models.
    Starting Price: Free
  • 38
    OPT

    OPT

    Meta

    Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
  • 39
    Gemma 2

    Gemma 2

    Google

    A family of state-of-the-art, light-open models created from the same research and technology that were used to create Gemini models. These models incorporate comprehensive security measures and help ensure responsible and reliable AI solutions through selected data sets and rigorous adjustments. Gemma models achieve exceptional comparative results in their 2B, 7B, 9B, and 27B sizes, even outperforming some larger open models. With Keras 3.0, enjoy seamless compatibility with JAX, TensorFlow, and PyTorch, allowing you to effortlessly choose and change frameworks based on task. Redesigned to deliver outstanding performance and unmatched efficiency, Gemma 2 is optimized for incredibly fast inference on various hardware. The Gemma family of models offers different models that are optimized for specific use cases and adapt to your needs. Gemma models are large text-to-text lightweight language models with a decoder, trained in a huge set of text data, code, and mathematical content.
  • 40
    GPT-4V (Vision)
    GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4 and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.
  • 41
    LaMDA

    LaMDA

    Google

    LaMDA, our latest research breakthrough, adds pieces to one of the most tantalizing sections of that puzzle: conversation. While conversations tend to revolve around specific topics, their open-ended nature means they can start in one place and end up somewhere completely different. A chat with a friend about a TV show could evolve into a discussion about the country where the show was filmed before settling on a debate about that country’s best regional cuisine. That meandering quality can quickly stump modern conversational agents (commonly known as chatbots), which tend to follow narrow, pre-defined paths. But LaMDA — short for “Language Model for Dialogue Applications” — can engage in a free-flowing way about a seemingly endless number of topics, an ability we think could unlock more natural ways of interacting with technology and entirely new categories of helpful applications.
  • 42
    ESMFold

    ESMFold

    Meta

    ESMFold shows how AI can give us new tools to understand the natural world, much like the microscope, which enabled us to see into the world at an infinitesimal scale and opened up a whole new understanding of life. AI can help us understand the immense scope of natural diversity, and see biology in a new way. Much of AI research has focused on helping computers understand the world in a way similar to how humans do. The language of proteins is one that is beyond human comprehension and has eluded even the most powerful computational tools. AI has the potential to open up this language to our understanding. Studying AI in new domains such as biology can also give insight into artificial intelligence more broadly. Our work reveals connections across domains: large language models that are behind advances in machine translation, natural language understanding, speech recognition, and image generation are also able to learn deep information about biology.
    Starting Price: Free
  • 43
    Med-PaLM 2

    Med-PaLM 2

    Google Cloud

    Healthcare breakthroughs change the world and bring hope to humanity through scientific rigor, human insight, and compassion. We believe AI can contribute to this, with thoughtful collaboration between researchers, healthcare organizations, and the broader ecosystem. Today, we're sharing exciting progress on these initiatives, with the announcement of limited access to Google’s medical large language model, or LLM, called Med-PaLM 2. It will be available in the coming weeks to a select group of Google Cloud customers for limited testing, to explore use cases and share feedback as we investigate safe, responsible, and meaningful ways to use this technology. Med-PaLM 2 harnesses the power of Google’s LLMs, aligned to the medical domain to more accurately and safely answer medical questions. As a result, Med-PaLM 2 was the first LLM to perform at an “expert” test-taker level performance on the MedQA dataset of US Medical Licensing Examination (USMLE)-style questions.
  • 44
    Eternity AI

    Eternity AI

    Eternity AI

    Eternity AI is building an HTLM-7B, a machine learning model that knows what the internet is and how to access it to generate responses. Humans don't make decisions based on 2-year-old data. For a model to think like a human, it needs to get access to real-time knowledge and everything about how humans behave. Members of our team have previously published white papers and articles on topics related to on-chain vulnerability coordination, GPT database retrieval, decentralized dispute resolution, etc.
  • 45
    XLNet

    XLNet

    XLNet

    XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks including question answering, natural language inference, sentiment analysis, and document ranking.
    Starting Price: Free
  • 46
    Phi-2

    Phi-2

    Microsoft

    We are now releasing Phi-2, a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters. On complex benchmarks Phi-2 matches or outperforms models up to 25x larger, thanks to new innovations in model scaling and training data curation. With its compact size, Phi-2 is an ideal playground for researchers, including for exploration around mechanistic interpretability, safety improvements, or fine-tuning experimentation on a variety of tasks. We have made Phi-2 available in the Azure AI Studio model catalog to foster research and development on language models.
  • 47
    InstructGPT

    InstructGPT

    OpenAI

    InstructGPT is an open-source framework for training language models to generate natural language instructions from visual input. It uses a generative pre-trained transformer (GPT) model and the state-of-the-art object detector, Mask R-CNN, to detect objects in images and generate natural language sentences that describe the image. InstructGPT is designed to be effective across domains such as robotics, gaming and education; it can assist robots in navigating complex tasks with natural language instructions, or help students learn by providing descriptive explanations of processes or events.
    Starting Price: $0.0200 per 1000 tokens
  • 48
    Mistral Large 2

    Mistral Large 2

    Mistral AI

    Mistral Large 2 has a 128k context window and supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Mistral Large 2 is designed for single-node inference with long-context applications in mind – its size of 123 billion parameters allows it to run at large throughput on a single node. We are releasing Mistral Large 2 under the Mistral Research License, that allows usage and modification for research and non-commercial usages.
    Starting Price: Free
  • 49
    Gopher

    Gopher

    DeepMind

    Language, and its role in demonstrating and facilitating comprehension - or intelligence - is a fundamental part of being human. It gives people the ability to communicate thoughts and concepts, express ideas, create memories, and build mutual understanding. These are foundational parts of social intelligence. It’s why our teams at DeepMind study aspects of language processing and communication, both in artificial agents and in humans. As part of a broader portfolio of AI research, we believe the development and study of more powerful language models – systems that predict and generate text – have tremendous potential for building advanced AI systems that can be used safely and efficiently to summarise information, provide expert advice and follow instructions via natural language. Developing beneficial language models requires research into their potential impacts, including the risks they pose.
  • 50
    Upstage

    Upstage

    Upstage

    Use the Chat API to create a simple conversational agent with Solar. Function Calling is now supported, the way to connect LLM to external tools. The embedding vectors can be used for tasks such as retrieval and classification. Context-aware English-Korean translation that leverages previous dialogues to ensure unmatched coherence and continuity in your conversations. Verifies whether the answers provided by the LLM are appropriately generated, based on the user's question and search results. Developing a healthcare LLM to automate patient communication, personalize treatment plans, aid in clinical decision support and support medical transcription. Aims to enable business owners and companies to deploy generative AI chatbots on websites and mobile apps easily, providing human-like services in customer support and engagement.
    Starting Price: $0.5 per 1M tokens