BGE-M3 is a text embedding model from BAAI, named for its three strengths: multi-functionality, multi-linguality, and multi-granularity. A single model supports dense retrieval, sparse retrieval (lexical weighting), and multi-vector retrieval (ColBERT-style), which makes it well suited to hybrid search in retrieval-augmented generation (RAG) pipelines.

The model covers more than 100 languages and accepts inputs of up to 8192 tokens, so it can embed anything from short queries to full documents. It was trained with self-knowledge distillation and unified fine-tuning to align its performance across all three retrieval modes, and it reports state-of-the-art results on several multilingual and long-document retrieval benchmarks, surpassing OpenAI embedding models in some tests. BGE-M3 integrates with vector databases such as Milvus and Vespa for hybrid retrieval, and its outputs can feed downstream re-ranking models for final scoring.
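In a hybrid pipeline, each of the three retrieval modes produces its own relevance score for a query-document pair, and the scores are fused, typically with a weighted sum. A minimal sketch of that fusion step follows; the weights and per-document scores are illustrative, not values prescribed by BGE-M3.

```python
def hybrid_score(s_dense: float, s_sparse: float, s_colbert: float,
                 w_dense: float = 0.4, w_sparse: float = 0.2,
                 w_colbert: float = 0.4) -> float:
    """Weighted fusion of the three relevance signals (illustrative weights)."""
    return w_dense * s_dense + w_sparse * s_sparse + w_colbert * s_colbert

# Hypothetical per-document scores: (dense, sparse, multi-vector).
docs = {
    "doc_a": (0.82, 0.10, 0.75),
    "doc_b": (0.60, 0.55, 0.58),
}
ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d]), reverse=True)
```

In practice the weights are tuned per corpus; lexical-heavy domains (e.g. code or IDs) benefit from a larger sparse weight.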
Features
- Supports dense, sparse, and multi-vector retrieval in one model
- Multilingual coverage across 100+ languages
- Handles sequences up to 8192 tokens
- Fine-tuned for document, sentence, and passage embeddings
- Reported to outperform OpenAI embedding models on several multilingual benchmarks
- Plug-and-play integration with Vespa and Milvus for hybrid search
- Trained with self-knowledge distillation and efficient batching for long texts
- Open-sourced under the MIT license with full documentation and examples
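The multi-vector mode listed above scores documents ColBERT-style: each query token embedding is matched against its most similar document token embedding, and the per-token maxima are aggregated. A small sketch of that late-interaction ("MaxSim") scoring, assuming unit-normalized token vectors (the aggregation here averages over query tokens; the original ColBERT formulation sums):

```python
import numpy as np

def maxsim_score(q_vecs: np.ndarray, d_vecs: np.ndarray) -> float:
    # q_vecs: (num_query_tokens, dim); d_vecs: (num_doc_tokens, dim).
    # Vectors are assumed L2-normalized, so dot products are cosines.
    sim = q_vecs @ d_vecs.T               # token-level similarity matrix
    return float(sim.max(axis=1).mean())  # best match per query token, averaged

# Toy example: two query token vectors vs. two document token vectors.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.6, 0.8]])
score = maxsim_score(q, d)
```

Because every token embedding is kept, multi-vector retrieval is more storage-hungry than dense retrieval, which is why it is often used as a refinement stage over candidates from the dense or sparse modes.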