MTEB download | SourceForge.net

Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be applied equally well to other tasks such as clustering or reranking. This makes progress in the field difficult to track, as new models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. By benchmarking 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks.

Features

Dataset selection
Datasets can be selected by providing a list of dataset names
You can also specify which languages to load for multilingual/crosslingual tasks
You can restrict evaluation to the test splits of all tasks
Use a custom model
Evaluate on a custom task
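The features above can be illustrated with a minimal sketch of a custom model. It assumes, based on the project's README around the time of this listing, that a custom model only needs an `encode` method taking a list of sentences and returning one embedding per sentence. The class `HashingBagOfWordsModel` and the helper `run_mteb` are hypothetical names introduced for illustration; the `MTEB(tasks=..., task_langs=...)` constructor and `run(..., eval_splits=...)` call reflect that era's documented usage and may differ in newer releases of the library.

```python
import hashlib
import math
from typing import List


class HashingBagOfWordsModel:
    """Toy embedding model (hypothetical) exposing an `encode` method.

    Each whitespace token is hashed into one of `dim` buckets and the
    resulting count vector is L2-normalized. It is only meant to show
    the shape of the interface, not to produce useful embeddings.
    """

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def _embed(self, sentence: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in sentence.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # An empty sentence stays a zero vector to avoid dividing by zero.
        return [v / norm for v in vec] if norm > 0 else vec

    def encode(self, sentences: List[str], batch_size: int = 32, **kwargs) -> List[List[float]]:
        # batch_size and kwargs are accepted for interface compatibility
        # and ignored by this toy model.
        return [self._embed(s) for s in sentences]


def run_mteb(model, task_names: List[str], output_folder: str = "results/toy"):
    """Assumed usage sketch; requires the `mteb` package to be installed."""
    from mteb import MTEB  # imported lazily so the class above works without it

    # Select datasets by name and restrict multilingual tasks to English.
    evaluation = MTEB(tasks=task_names, task_langs=["en"])
    # Evaluate only on the test splits.
    return evaluation.run(model, eval_splits=["test"], output_folder=output_folder)
```

With `mteb` installed, a call such as `run_mteb(HashingBagOfWordsModel(), ["Banking77Classification"])` would be the intended entry point. Some evaluators may expect NumPy arrays rather than nested lists, so converting the return value of `encode` with `numpy.asarray` is a reasonable adjustment if needed.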

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Neural Search Software

Registered

2023-08-21
