paraphrase-multilingual-mpnet-base-v2 is a sentence-transformers model that generates dense vector representations of sentences and paragraphs in more than 50 languages. Developed by the Sentence Transformers team, it is well suited to semantic search, clustering, and paraphrase detection. The model maps input text to a 768-dimensional vector space, so the semantic similarity of two texts can be measured by comparing their vectors, for example with cosine similarity. It uses an XLM-RoBERTa encoder trained through multilingual knowledge distillation to reproduce the embeddings of the English paraphrase-mpnet-base-v2 model, which is where the "mpnet" in its name comes from. Because sentences with the same meaning in different languages land close together in the vector space, it also supports cross-lingual comparison. It can be used through the sentence-transformers library, which handles tokenization and pooling, or directly through Hugging Face Transformers with a custom mean-pooling step. The model is available in multiple formats, including PyTorch, TensorFlow, ONNX, and OpenVINO. With over 3 million downloads per month, it’s widely adopted in both research and production environments.
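As a rough illustration of the sentence-transformers route, the sketch below loads the model and compares two embeddings with cosine similarity. The helper names `embed` and `cosine_similarity` are made up for this example, the Hub id is assumed to be `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`, and the sentence-transformers package is assumed to be installed.

```python
import math

# Assumed Hugging Face Hub id for this model.
MODEL_ID = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (pure Python)."""
    dot = sum(float(x) * float(y) for x, y in zip(u, v))
    norm_u = math.sqrt(sum(float(x) ** 2 for x in u))
    norm_v = math.sqrt(sum(float(y) ** 2 for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embed(sentences, model_id=MODEL_ID):
    """Encode a list of sentences into one 768-dimensional vector each."""
    # Imported lazily so cosine_similarity works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)  # downloads the weights on first use
    return model.encode(sentences)         # array of shape (len(sentences), 768)
```

A call such as `vectors = embed(["How are you?", "Wie geht es dir?"])` followed by `cosine_similarity(vectors[0], vectors[1])` would then score an English and a German sentence against each other; values closer to 1 indicate closer meaning.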
Features
- Generates 768-dimensional sentence embeddings
- Supports 50+ languages for multilingual applications
- XLM-RoBERTa encoder distilled from the English paraphrase-mpnet-base-v2 model
- Fine-tuned for semantic similarity and paraphrasing
- Compatible with PyTorch, TensorFlow, ONNX, and OpenVINO
- Can be used with sentence-transformers or Hugging Face Transformers
- Uses mean pooling over token embeddings to produce the sentence vector
- Ideal for clustering, search, and cross-lingual NLP tasks
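When the model is used through Hugging Face Transformers instead of sentence-transformers, the mean pooling step has to be applied by hand. The following is a minimal sketch under a few assumptions: the Hub id is `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`, `torch` and `transformers` are installed, and the function names `masked_mean` and `embed_with_transformers` are invented for illustration. `masked_mean` is a pure-Python reference for one sentence that shows what the tensor code computes: padding tokens are excluded by the attention mask before averaging.

```python
# Assumed Hugging Face Hub id for this model.
MODEL_ID = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def masked_mean(token_vectors, mask):
    """Average the token vectors whose mask entry is 1 (one sentence)."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_with_transformers(sentences, model_id=MODEL_ID):
    """Encode sentences with plain Transformers plus manual mean pooling."""
    # Imported lazily so masked_mean can be used without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)

    encoded = tokenizer(sentences, padding=True, truncation=True,
                        return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)

    token_embeddings = output.last_hidden_state              # (batch, seq, 768)
    mask = encoded["attention_mask"].unsqueeze(-1).float()   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)            # ignore padding
    counts = mask.sum(dim=1).clamp(min=1e-9)                 # avoid divide by 0
    return summed / counts                                   # (batch, 768)
```

The tensor version is the batched equivalent of `masked_mean`: multiplying by the mask zeroes out padding positions, and dividing by the mask sum averages over real tokens only.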