bert-base-multilingual-cased is a multilingual version of BERT pre-trained on Wikipedia articles in the 104 languages with the largest Wikipedias, using the masked language modeling (MLM) and next sentence prediction (NSP) objectives. Unlike uncased models, it preserves case distinctions (e.g., "english" ≠ "English"). Because it is trained in a self-supervised fashion, the model learns deep bidirectional language representations that can be fine-tuned for a wide range of natural language understanding tasks across languages, including sequence classification, token classification, and question answering. It uses a shared WordPiece vocabulary of 110,000 tokens and is compatible with PyTorch, TensorFlow, and JAX.
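The snippet below is a minimal sketch of how the model's original MLM pretraining objective can be exercised directly, assuming the Hugging Face transformers library (and PyTorch) is installed; the example sentences are illustrative only.

```python
# Minimal sketch: fill in a [MASK] token with bert-base-multilingual-cased.
# Assumes: pip install transformers torch
from transformers import pipeline

# The fill-mask pipeline reuses the masked language modeling head the model
# was pre-trained with, so no fine-tuning is needed for this demo.
unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")

# The same checkpoint handles masked tokens in different languages.
print(unmasker("Paris is the capital of [MASK]."))
print(unmasker("Paris est la capitale de la [MASK]."))
```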
Features
- Pretrained on Wikipedia data in 104 languages
- Case-sensitive (e.g., differentiates "Apple" from "apple")
- Trained with both Masked Language Modeling and Next Sentence Prediction
- Uses WordPiece tokenization with a shared multilingual vocabulary
- Supports downstream tasks like classification and QA (see the sketch after this list)
- Works with PyTorch, TensorFlow, and JAX
- 179M parameters and 12-layer Transformer architecture
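As a sketch of the case-sensitive tokenization and of reusing the checkpoint for a downstream task, the example below loads the WordPiece tokenizer and attaches a fresh sequence classification head via the Hugging Face transformers API; the two-label setup and the German example sentence are illustrative assumptions, not part of the released model.

```python
# Minimal sketch: case-sensitive tokenization and loading the encoder for
# sequence classification fine-tuning. Assumes: pip install transformers torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Cased WordPiece vocabulary: "Apple" and "apple" are tokenized differently
# (the exact wordpiece splits may vary, but the token ids will not match).
print(tokenizer.tokenize("Apple"))
print(tokenizer.tokenize("apple"))

# Pretrained encoder plus a randomly initialized classification head;
# num_labels=2 is an illustrative choice for a binary task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

# Any of the 104 pretraining languages can be fed through the same model.
inputs = tokenizer("Ein Beispielsatz auf Deutsch.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])
```

The head is untrained at this point, so the logits are meaningless until the model is fine-tuned on labeled data for the target task.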