jina-embeddings-v3 is a multilingual, multi-task text embedding model developed by Jina AI, designed to generate highly adaptable representations across a wide range of natural language processing tasks. Built on a modified XLM-RoBERTa architecture with Rotary Position Embeddings (RoPE), it supports long inputs of up to 8192 tokens. The model ships five task-specific LoRA adapters: separate adapters for query-side and passage-side retrieval, plus adapters for classification, clustering, and text matching, allowing users to optimize embeddings for different applications. jina-embeddings-v3 also supports Matryoshka embeddings, so users can truncate vectors to a chosen size (32–1024 dimensions) to trade accuracy against storage and compute. It performs well across 94 languages, with focused tuning on 30 languages including English, Chinese, Arabic, and Spanish. The model is compatible with the Hugging Face Transformers, ONNX, and sentence-transformers libraries, and can be fine-tuned via its LoRA adapters or by training the full model.
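The Matryoshka property described above means the leading dimensions of an embedding carry most of the signal, so a vector can simply be cut short and re-normalized. A minimal sketch, using a random unit vector as a stand-in for a real 1024-dimensional model output (the helper `truncate_embedding` is hypothetical, not part of the model's API):

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka embedding
    and re-normalize to unit length for cosine-similarity use."""
    trimmed = np.asarray(vec, dtype=np.float64)[:dim]
    norm = np.linalg.norm(trimmed)
    return trimmed / norm if norm > 0 else trimmed

# Placeholder for a real 1024-d embedding from the model.
rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
full /= np.linalg.norm(full)

# Shrink to 256 dimensions, e.g. to cut vector-store footprint 4x.
small = truncate_embedding(full, 256)
```

Any size in the supported 32–1024 range works the same way; smaller vectors lose some accuracy but reduce index size and search cost proportionally.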
Features
- Multilingual support across 94 languages
- Supports input sequences up to 8192 tokens via RoPE
- Task-specific LoRA adapters for flexible embeddings
- Matryoshka embeddings with customizable vector dimensions
- Fine-tuning via SentenceTransformerTrainer or full model training
- High compatibility with Transformers, ONNX, and sentence-transformers
- Optimized for symmetric/asymmetric retrieval, clustering, classification
- 572 million parameters and strong MTEB benchmark performance
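For the retrieval tasks listed above, ranking typically reduces to cosine similarity between a query embedding (from the query-side adapter) and passage embeddings (from the passage-side adapter). A small sketch of that scoring step, with toy 3-d vectors standing in for real model outputs:

```python
import numpy as np

def cosine_scores(query, passages):
    """Cosine similarity between one query vector and a matrix of
    passage vectors (one row per passage)."""
    q = query / np.linalg.norm(query)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    return p @ q

# Toy vectors; in practice these come from the model's
# retrieval.query and retrieval.passage adapters.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.9, 0.1, 0.0],   # on-topic passage
    [0.0, 1.0, 0.0],   # unrelated passage
])

scores = cosine_scores(query, passages)
best = int(np.argmax(scores))  # index of the highest-scoring passage
```

Symmetric tasks (text matching, clustering) use the same similarity but embed both sides with the same adapter.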