bert-base-multilingual-cased is a multilingual version of BERT pre-trained on Wikipedia articles in the 104 languages with the largest Wikipedias, using the masked language modeling (MLM) and next sentence prediction (NSP) objectives. Unlike uncased models, it preserves case distinctions (e.g., "english" ≠ "English"). Because it is trained in a self-supervised fashion, the model learns deep bidirectional language representations that can be fine-tuned for a wide range of natural language understanding tasks across languages, including sequence classification, token classification, and question answering. It uses a shared WordPiece vocabulary of 110,000 tokens and is compatible with PyTorch, TensorFlow, and JAX.
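The snippet below is a minimal sketch of how the model's original MLM pretraining objective can be exercised directly, assuming the Hugging Face transformers library (and PyTorch) is installed; the example sentences are illustrative only.

```python
# Minimal sketch: fill in a [MASK] token with bert-base-multilingual-cased.
# Assumes: pip install transformers torch
from transformers import pipeline

# The fill-mask pipeline reuses the masked language modeling head the model
# was pre-trained with, so no fine-tuning is needed for this demo.
unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")

# The same checkpoint handles masked tokens in different languages.
print(unmasker("Paris is the capital of [MASK]."))
print(unmasker("Paris est la capitale de la [MASK]."))
```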
Features
- Pretrained on Wikipedia data in 104 languages
- Case-sensitive (e.g., differentiates "Apple" from "apple")
- Trained with both Masked Language Modeling and Next Sentence Prediction
- Uses WordPiece tokenization with a shared multilingual vocabulary
- Supports downstream tasks like classification and QA (see the sketch after this list)
- Works with PyTorch, TensorFlow, and JAX
- 179M parameters and 12-layer Transformer architecture
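As a sketch of the case-sensitive tokenization and of reusing the checkpoint for a downstream task, the example below loads the WordPiece tokenizer and attaches a fresh sequence classification head via the Hugging Face transformers API; the two-label setup and the German example sentence are illustrative assumptions, not part of the released model.

```python
# Minimal sketch: case-sensitive tokenization and loading the encoder for
# sequence classification fine-tuning. Assumes: pip install transformers torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Cased WordPiece vocabulary: "Apple" and "apple" are tokenized differently
# (the exact wordpiece splits may vary, but the token ids will not match).
print(tokenizer.tokenize("Apple"))
print(tokenizer.tokenize("apple"))

# Pretrained encoder plus a randomly initialized classification head;
# num_labels=2 is an illustrative choice for a binary task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

# Any of the 104 pretraining languages can be fed through the same model.
inputs = tokenizer("Ein Beispielsatz auf Deutsch.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])
```

The head is untrained at this point, so the logits are meaningless until the model is fine-tuned on labeled data for the target task.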