all-mpnet-base-v2 is a sentence embedding model from the Sentence-Transformers library that maps English sentences and paragraphs to dense 768-dimensional vectors. Built on the microsoft/mpnet-base transformer, it was fine-tuned on roughly 1.17 billion sentence pairs with a contrastive learning objective, making it well suited to semantic search, information retrieval, clustering, and similarity detection. The model is distributed in PyTorch, ONNX, and OpenVINO formats and can be used through the SentenceTransformers API or through Hugging Face Transformers with a custom pooling step; input longer than 384 word-piece tokens is truncated. Its training data spans a wide variety of sources, including Reddit, WikiAnswers, StackExchange, and MS MARCO. Originally trained during Hugging Face’s Community Week using JAX/Flax on TPUs, it delivers high-quality semantic embeddings suitable for production-scale NLP applications.
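Since using the model through Hugging Face Transformers requires a custom pooling step, here is a minimal numpy sketch of mask-aware mean pooling over token embeddings. The tensors are random stand-ins; in a real pipeline they would be the model's last hidden states and the tokenizer's attention mask.

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over
    # the 768-dimensional hidden axis
    mask = attention_mask[:, :, None].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Dummy batch: 2 sequences, 4 token positions, 768-dim hidden size
token_embeddings = np.random.rand(2, 4, 768)
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 = padding
sentence_embeddings = mean_pooling(token_embeddings, attention_mask)
print(sentence_embeddings.shape)  # (2, 768)
```

Averaging only over positions where the attention mask is 1 keeps padding tokens from diluting the sentence embedding, which is why pooling on the raw hidden states alone is not enough.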
Features
- Maps sentences to 768-dimensional dense vectors
- Trained on 1.17B sentence pairs using contrastive learning
- Fine-tuned from microsoft/mpnet-base
- Optimized for sentence similarity, clustering, and retrieval
- Truncates input longer than 384 tokens
- Available in PyTorch, ONNX, and OpenVINO formats
- Efficient inference using Sentence-Transformers or Transformers
- Fine-tuned on diverse datasets including Reddit, S2ORC, StackExchange
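To illustrate the similarity and retrieval use cases listed above, the following sketch ranks a small corpus by cosine similarity to a query vector. The 768-dimensional vectors here are random stand-ins for embeddings the model would produce via `model.encode(...)`.

```python
import numpy as np

def cosine_similarity(a, b):
    # Normalize rows to unit length, then a matrix product gives all
    # pairwise cosine similarities at once
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
corpus = rng.standard_normal((5, 768))               # 5 "document" embeddings
query = corpus[2] + 0.01 * rng.standard_normal(768)  # query close to doc 2
scores = cosine_similarity(query[None, :], corpus)[0]
best = int(scores.argmax())
print(best)  # 2
```

Because the embeddings live in a shared vector space, nearest-neighbor search by cosine similarity is all that semantic search, clustering, and duplicate detection need on top of the model.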