Audience
Developers interested in a solution for generalized autoregressive pretraining for language understanding
About XLNet
XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks including question answering, natural language inference, sentiment analysis, and document ranking.
Other Popular Alternatives & Related Software
GPT-4
GPT-4 (Generative Pre-trained Transformer 4) is a large-scale unsupervised language model, yet to be released by OpenAI. GPT-4 is the successor to GPT-3 and part of the GPT-n series of natural language processing models, and was trained on a dataset of 45TB of text to produce human-like text generation and understanding capabilities. Unlike most other NLP models, GPT-4 does not require additional training data for specific tasks. Instead, it can generate text or answer questions using only its own internally generated context as input. GPT-4 has been shown to be able to perform a wide variety of tasks without any task specific training data such as translation, summarization, question answering, sentiment analysis and more.
Learn more
LLaVA
LLaVA (Large Language-and-Vision Assistant) is an innovative multimodal model that integrates a vision encoder with the Vicuna language model to facilitate comprehensive visual and language understanding. Through end-to-end training, LLaVA exhibits impressive chat capabilities, emulating the multimodal functionalities of models like GPT-4. Notably, LLaVA-1.5 has achieved state-of-the-art performance across 11 benchmarks, utilizing publicly available data and completing training in approximately one day on a single 8-A100 node, surpassing methods that rely on billion-scale datasets. The development of LLaVA involved the creation of a multimodal instruction-following dataset, generated using language-only GPT-4. This dataset comprises 158,000 unique language-image instruction-following samples, including conversations, detailed descriptions, and complex reasoning tasks. This data has been instrumental in training LLaVA to perform a wide array of visual and language tasks effectively.
Learn more
RoBERTa
RoBERTa builds on BERT’s language masking strategy, wherein the system learns to predict intentionally hidden sections of text within otherwise unannotated language examples. RoBERTa, which was implemented in PyTorch, modifies key hyperparameters in BERT, including removing BERT’s next-sentence pretraining objective, and training with much larger mini-batches and learning rates. This allows RoBERTa to improve on the masked language modeling objective compared with BERT and leads to better downstream task performance. We also explore training RoBERTa on an order of magnitude more data than BERT, for a longer amount of time. We used existing unannotated NLP datasets as well as CC-News, a novel set drawn from public news articles.
Learn more
Megatron-Turing
Megatron-Turing Natural Language Generation model (MT-NLG), is the largest and the most powerful monolithic transformer English language model with 530 billion parameters. This 105-layer, transformer-based MT-NLG improves upon the prior state-of-the-art models in zero-, one-, and few-shot settings. It demonstrates unmatched accuracy in a broad set of natural language tasks such as, Completion prediction, Reading comprehension, Commonsense reasoning, Natural language inferences, Word sense disambiguation, etc.
With the intent of accelerating research on the largest English language model till date and enabling customers to experiment, employ and apply such a large language model on downstream language tasks - NVIDIA is pleased to announce an Early Access program for its managed API service to MT-NLG mode.
Learn more
Pricing
Starting Price:
Free
Free Version:
Free Version available.
Integrations
Company Information
XLNet
Founded: 2019
github.com/zihangdai/xlnet
Other Useful Business Software
Auth0 for AI Agents now in GA
Connect your AI agents to apps and data more securely, give users control over the actions AI agents can perform and the data they can access, and enable human confirmation for critical agent actions.
Product Details
Platforms Supported
Cloud
On-Premises
Training
Documentation