RoBERTa-large is a robustly optimized transformer model for English, pretrained by Facebook AI with a masked language modeling (MLM) objective. Unlike BERT, which was pretrained only on BookCorpus and English Wikipedia, RoBERTa was trained on 160GB of text drawn from BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories, with masks re-sampled dynamically during training rather than fixed once at preprocessing time. It uses a byte-level BPE tokenizer and was trained with a sequence length of 512 and a batch size of 8K across 1024 V100 GPUs.

RoBERTa improves on BERT across multiple NLP tasks by dropping the next-sentence prediction objective and training longer with larger batches on more data. With 355 million parameters, it learns bidirectional representations of sentences and performs strongly on downstream tasks such as sequence classification, token classification, and question answering. However, it reflects social biases present in its training data, so caution is advised when deploying it in sensitive contexts.
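Below is a minimal usage sketch with the Hugging Face Transformers `pipeline` API, assuming the `transformers` package is installed; it fills RoBERTa's `<mask>` token using the pretrained MLM head.

```python
# Minimal fill-mask sketch (assumes: pip install transformers).
from transformers import pipeline

# Load roberta-large together with its pretrained masked-language-modeling head.
unmasker = pipeline("fill-mask", model="roberta-large")

# RoBERTa uses "<mask>" (not BERT's "[MASK]") as its mask token.
for prediction in unmasker("The goal of life is <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```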
## Features
- Pretrained using masked language modeling on 160GB of text
- 355M parameters for rich contextual language understanding
- Learns bidirectional representations (unlike GPT)
- Dynamic token masking per epoch for better generalization (see the masking sketch after this list)
- Optimized for downstream tasks like QA and classification (see the fine-tuning sketch after this list)
- Compatible with PyTorch, TensorFlow, and JAX
- Case-sensitive vocabulary with 50K BPE tokens
- Open-source under the MIT license
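As a hedged illustration of the dynamic masking feature, the sketch below uses `DataCollatorForLanguageModeling` from Transformers, which re-draws masked positions every time a batch is built. RoBERTa itself was pretrained with fairseq, so this only mirrors the idea rather than reproducing the original training pipeline.

```python
# Sketch of dynamic masking: masked positions are randomly re-drawn on every call,
# so the same sentence typically gets different masks in different epochs.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15  # 15% masking rate, as in BERT/RoBERTa
)

example = tokenizer("Dynamic masking re-samples the masked tokens for every batch.")
for epoch in range(2):
    batch = collator([example])
    # The positions holding the <mask> token id usually differ between the two printouts.
    print(f"epoch {epoch}:", batch["input_ids"][0].tolist())
```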
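For the downstream-task feature, here is a hedged fine-tuning sketch using the Transformers `Trainer`; the GLUE SST-2 dataset, output path, and hyperparameters are illustrative assumptions, not values specified by this model card.

```python
# Illustrative fine-tuning sketch for binary sequence classification
# (assumes: pip install transformers datasets).
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

# SST-2 from GLUE is an illustrative choice; any labeled text dataset works the same way.
dataset = load_dataset("glue", "sst2")
dataset = dataset.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, max_length=512),
    batched=True,
)

args = TrainingArguments(
    output_dir="roberta-large-sst2",   # hypothetical output directory
    learning_rate=1e-5,                # illustrative hyperparameters
    per_device_train_batch_size=16,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pad dynamically per batch
)
trainer.train()
```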