bge-base-en-v1.5 is an English sentence embedding model from BAAI, part of the BGE (BAAI General Embedding) family and optimized for dense retrieval. It is a fine-tuned BERT-based model designed to produce high-quality, semantically meaningful embeddings for tasks such as semantic similarity, information retrieval, classification, and clustering. Version 1.5 improves retrieval performance and yields a more stable similarity score distribution, without requiring instruction-based prompts.

With 768 embedding dimensions and a maximum sequence length of 512 tokens, it performs strongly across MTEB benchmarks, nearly matching larger models while remaining efficient. It can be used via SentenceTransformers, Hugging Face Transformers, FlagEmbedding, and ONNX for a range of deployment scenarios. Typical usage normalizes the output embeddings so that cosine similarity for ranking reduces to a simple dot product.
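For instance, here is a minimal retrieval sketch with SentenceTransformers; the queries and passages are illustrative placeholders:

```python
# Minimal sketch: with normalize_embeddings=True, cosine similarity
# is equivalent to a plain dot product of the embedding vectors.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

queries = ["how do transformers work?"]
passages = [
    "Transformers process tokens in parallel using self-attention.",
    "An electrical transformer converts voltage levels via induction.",
]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# Dot product of unit-length vectors == cosine similarity; higher = more relevant
scores = q_emb @ p_emb.T
print(scores)
```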
Features
- Fine-tuned English embedding model for semantic search and dense retrieval
- Embedding size of 768 with support for sequences up to 512 tokens
- Outperforms other base-size embedding models on the MTEB benchmark, with strong generalization
- Compatible with FlagEmbedding, SentenceTransformers, and Hugging Face Transformers (see the sketches after this list)
- No instruction prefix needed for queries or passages
- ONNX format and inference support for efficient deployment (ONNX sketch below)
- Built on BERT architecture with 109M parameters
- MIT-licensed and free for commercial use
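When using Hugging Face Transformers directly, BGE models are typically pooled by taking the [CLS] token embedding and L2-normalizing it. A minimal sketch, with placeholder sentences:

```python
# Sketch of direct Transformers usage: [CLS]-token pooling + L2 normalization.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-base-en-v1.5")
model.eval()

sentences = [
    "what is dense retrieval?",
    "Dense retrieval encodes queries and documents as vectors.",
]
inputs = tokenizer(sentences, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS] pooling: take the first token's hidden state, then normalize to unit length
embeddings = torch.nn.functional.normalize(outputs.last_hidden_state[:, 0], dim=-1)

# Cosine similarity reduces to a dot product on normalized vectors
similarity = embeddings[0] @ embeddings[1]
print(similarity.item())
```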
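For ONNX deployment, one option is Hugging Face Optimum's ONNX Runtime backend. This is a sketch under the assumption that `optimum[onnxruntime]` is installed; `export=True` converts the checkpoint to ONNX on the fly if no export is already available:

```python
# Hedged ONNX sketch via Optimum; the same [CLS] pooling + normalization applies.
import torch
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
model = ORTModelForFeatureExtraction.from_pretrained(
    "BAAI/bge-base-en-v1.5", export=True  # export to ONNX if needed
)

inputs = tokenizer(["The quick brown fox"], padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
outputs = model(**inputs)
embedding = torch.nn.functional.normalize(outputs.last_hidden_state[:, 0], dim=-1)
```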