| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Parent folder | | | |
| 2.13.2 source code.tar.gz | 2026-05-14 | 10.8 MB | |
| 2.13.2 source code.zip | 2026-05-14 | 13.0 MB | |
| README.md | 2026-05-14 | 9.9 kB | |
| Totals: 3 Items | | 23.8 MB | 2 |
## 2.13.2 (2026-05-14)

### CI

- ci: Speedup CI (#4648)
  - speedup ci
  - pin uv version in pyproject
  - upd makefile (31ae267)
### Fix

- fix: OgmaWrapper inherits from AbsEncoder (#4670)

  OgmaWrapper was declared as a bare class, so `isinstance(model, EncoderProtocol)` returned `False` and RetrievalEvaluator rejected it with a `TypeError`. Inheriting from AbsEncoder picks up the default `similarity` / `similarity_pairwise` implementations driven by `ModelMeta.similarity_fn_name` (already set to COSINE for all Ogma models). No change to encoding behaviour.

  Reported by @Samoed in embeddings-benchmark/results#525. (8d92537)
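The protocol failure described above can be sketched with simplified stand-ins; the real `EncoderProtocol` / `AbsEncoder` live in mteb and differ in detail, so treat every definition here as an assumption:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class EncoderProtocol(Protocol):
    """Simplified stand-in for mteb's EncoderProtocol (assumed shape)."""

    def encode(self, sentences, **kwargs): ...
    def similarity(self, a, b): ...


class AbsEncoder:
    # Stand-in for the default similarity driven by
    # ModelMeta.similarity_fn_name (COSINE for all Ogma models).
    def similarity(self, a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)


class BareOgmaWrapper:  # the bug: no base class, so no similarity()
    def encode(self, sentences, **kwargs):
        return [[0.0, 1.0] for _ in sentences]


class OgmaWrapper(AbsEncoder):  # the fix: inherit the defaults
    def encode(self, sentences, **kwargs):
        return [[0.0, 1.0] for _ in sentences]


print(isinstance(BareOgmaWrapper(), EncoderProtocol))  # False: missing similarity
print(isinstance(OgmaWrapper(), EncoderProtocol))      # True
```

A `runtime_checkable` protocol's `isinstance` check only tests that the named methods exist, which is why merely inheriting the defaults is enough to satisfy the evaluator.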
### Unknown

- Move directory creation to write-time call. (#4667)
  - Do not mkdir as a side effect of accessing a property.
  - Move directory creation to write-time call. (33f9ea4)
- Model: Add new model revision of Querit/Querit (#4669)

  Update Querit model implementation

  Co-authored-by: zhongyunfei <zhongyunfei@baidu.com> (a6ab84e)
- fix: per-model num_frames for V-JEPA 2 variants, matching fpc value (#4668)

  The wrapper defaulted to FPS-mode sampling (fps=2.0, max_frames=64), but V-JEPA 2's own demo (facebookresearch/vjepa2) and HF config use fixed-count uniform sampling, where the frame count is the model's pretrained "frames per clip" (fpc) value, encoded in the model name and also in model.config.frames_per_clip:

  - fpc64 variants (5): vitl-fpc64-256, vith-fpc64-256, vitg-fpc64-256, vitg-fpc64-384, vitg-fpc64-384-ssv2 -> 64
  - fpc32 variants (2): vitg-fpc32-384-diving48, vitl-fpc32-256-diving48 -> 32
  - fpc16 variant (1): vitl-fpc16-256-ssv2 -> 16

  Drop the FPS-mode defaults from the wrapper (all three set to None) and pass num_frames per ModelMeta via loader_kwargs. Users can still override at eval time via meta.load_model(num_frames=...) or get_model(...). (b947278)
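Since the fpc value is spelled out in each checkpoint name, the per-model mapping above can be recovered mechanically. A minimal sketch, assuming only the naming scheme from the list above; `fpc_from_name` is an illustrative helper, not mteb code:

```python
import re


def fpc_from_name(model_name: str) -> int:
    """Extract the pretrained frames-per-clip count encoded in a
    V-JEPA 2 checkpoint name such as 'vitl-fpc64-256'."""
    m = re.search(r"fpc(\d+)", model_name)
    if m is None:
        raise ValueError(f"no fpc tag in {model_name!r}")
    return int(m.group(1))


# One representative from each fpc group listed in the changelog entry:
for name in ["vitl-fpc64-256", "vitg-fpc32-384-diving48", "vitl-fpc16-256-ssv2"]:
    print(name, "->", fpc_from_name(name))
```

In the PR itself the value is simply written into each ModelMeta's `loader_kwargs` (e.g. `loader_kwargs=dict(num_frames=64)`), which is more explicit than parsing names at load time.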
- model: adds ogma models (#4620)
  - Add Axiotic Ogma embedding models
  - Fix: switch to AutoTokenizer and resolve truncation of special tokens.
  - Fix: defer special-token handling to OgmaTokenizerFast.

    The previous wrapper called tokenizer.encode(text, add_special_tokens=False), manually shifted ids by n_special_tokens, and prepended/appended raw 2/3 (Ogma <s>/</s>). That placed the wrong special-token embeddings at the sequence boundaries: the model was trained with the inner tokenizer's [CLS]/[SEP] (shifted to ids 9/10), with the task token prepended by the model itself.

    Switch to AutoTokenizer's OgmaTokenizerFast, which runs the post-processor (adding [CLS]/[SEP]) and applies the +N_SPECIAL shift only on attention_mask==1 positions, leaving padding at 0. It also handles truncation correctly, so special tokens are preserved at max_seq_len.

    Bumps each Ogma revision to the HF main sha that ships tokenization_ogma.OgmaTokenizerFast. Smoke-tested: matches the canonical model.embed() output bit-for-bit.
  - Fix: route Retrieval/Reranking through QRY on both sides.

    Ogma's published MTEB scores are produced with task=QRY for both queries and corpus on retrieval-style tasks. The previous wrapper used DOC for the corpus side, which underperforms (QRY, QRY) by ~4pp on ArguAna and ~4pp on AskUbuntuDupQuestions (ogma-micro); the HF README numbers were not reproducible end-to-end without this change.

    Dispatch on task_metadata.type instead of prompt_type:
    - Retrieval / Reranking / InstructionRetrieval / InstructionReranking -> QRY (both sides)
    - Classification / Clustering / PairClassification / STS / Summarization -> SYM

    Smoke-tested: bit-exact parity with model.embed(task='qry') on the retrieval-corpus side and model.embed(task='sym') on STS. (15a7175)
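The masked id shift in the tokenizer fix above can be illustrated with a toy sketch. `shift_ids` and `N_SPECIAL = 7` are hypothetical, inferred only from the commit message's example that the inner tokenizer's [CLS]/[SEP] end up at ids 9/10; Ogma's actual code may differ:

```python
# Apply the +N_SPECIAL id shift only where attention_mask == 1,
# leaving padding positions untouched at 0.
N_SPECIAL = 7  # assumed number of reserved special-token ids


def shift_ids(input_ids, attention_mask):
    return [
        tok + N_SPECIAL if mask == 1 else tok  # padding (mask == 0) stays 0
        for tok, mask in zip(input_ids, attention_mask)
    ]


# Inner tokenizer output ([CLS]=2, two word ids, [SEP]=3, right padding):
print(shift_ids([2, 100, 101, 3, 0, 0], [1, 1, 1, 1, 0, 0]))
# -> [9, 107, 108, 10, 0, 0]
```

Masking on the attention positions is what keeps the padding id at 0, which the old wrapper's unconditional shift did not guarantee.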
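The task-type dispatch above amounts to a small lookup. A hedged sketch, assuming string task types; the real wrapper dispatches on `task_metadata.type` and the function name here is illustrative:

```python
# Task types routed to QRY on both the query and corpus sides:
QRY_TYPES = {
    "Retrieval", "Reranking", "InstructionRetrieval", "InstructionReranking",
}
# Symmetric tasks routed to SYM:
SYM_TYPES = {
    "Classification", "Clustering", "PairClassification", "STS", "Summarization",
}


def ogma_task(task_type: str) -> str:
    if task_type in QRY_TYPES:
        return "qry"  # same task token for queries and corpus
    if task_type in SYM_TYPES:
        return "sym"
    raise ValueError(f"unhandled task type: {task_type}")


print(ogma_task("Retrieval"))  # qry, applied to both sides
print(ogma_task("STS"))        # sym
```

Dispatching on task type rather than `prompt_type` is what makes the corpus side receive QRY instead of DOC on retrieval-style tasks.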
- Support experiments for submit results (#4660)
  - support experiments for submit results
  - upd link (817857b)
- add vlm2vec2 (#4567)
  - add vlm2vec2
  - fix import
  - fix device
  - try to fix
  - use finetuned config
  - add frames collator (b8ade6e)
- add jepa (#4565)
  - add jepa
  - Update mteb/models/model_implementations/facebook_jepa.py
  - use mean pooling
  - fix device
  - set max frames

  Co-authored-by: AdnanElAssadi56 <115242814+AdnanElAssadi56@users.noreply.github.com> (1906877)
- add penguin vl (#4581)
  - add penguin vl
  - update requirements
  - fix unpack (0db0853)
- dataset: Add SWEbenchCodeRetrieval task for RTEB (#4365)
  - feat: Add SWEbenchCodeRetrieval task for RTEB

    Adds a new code retrieval task based on SWE-bench Verified (500 real GitHub issues from 12 popular Python repos). Queries are issue descriptions, the corpus is Python source files, and relevance is given by the gold patch files. Dataset at embedding-benchmark/SWEbenchCodeRetrieval on HF.
    - New task: SWEbenchCodeRetrieval (58K corpus docs, 500 queries)
    - Added to RTEB(beta), RTEB(eng, beta), RTEB(Code, beta)
  - Re-upload dataset via push_dataset_to_hub, remove custom load_data
    - Re-uploaded dataset to embedding-benchmark/SWEbenchCodeRetrieval in standard MTEB format (_id, title, text columns)
    - Removed custom load_data(); the default retrieval loader handles it
    - Updated revision hash to match the new upload
  - add descriptive statistics for SWEbenchCodeRetrieval
  - Remove SWEbenchCodeRetrieval from RTEB benchmarks for now

    Keep the task definition but don't add it to the benchmark lists yet. Results will be added in a follow-up PR once model coverage is complete. (dcc5965)
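The standard MTEB retrieval format mentioned above can be illustrated with placeholder rows. Only the column names (_id, title, text) come from the changelog entry; every id and string below is invented:

```python
# Hypothetical corpus row: one Python source file from a repo snapshot.
corpus_row = {
    "_id": "repo-a/src/module.py",     # invented document id
    "title": "src/module.py",          # file path as title
    "text": "def handler(event): ...", # file contents
}

# Hypothetical query row: a GitHub issue description.
query_row = {
    "_id": "issue-001",
    "text": "handler() crashes on empty input when the event dict is missing keys",
}

# Relevance: the files touched by the issue's gold patch are the positives.
qrels = {query_row["_id"]: {corpus_row["_id"]: 1}}
print(qrels)
```

With the data in this shape, mteb's default retrieval loader can ingest it directly, which is why the custom load_data() override became unnecessary.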
- Deprecate Model2VecModel and use SentenceTransformerEncoderWrapper instead (#4649)
  - Deprecate Model2VecModel and use SentenceTransformerEncoderWrapper instead
  - update revision for all NeuML models
  - removed model2vec extra group
  - revert pyproject.toml and remove extra_requirement_groups
  - revert loader for 100k, 500k, and 1M variants to Model2VecModel
  - bump revisions
  - updated revision for the rest of the models

  Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (0a1f8b6)
- Fix: Jina v5 omni video/audio kwargs (#4652)
  - fix: pass video and audio processing kwargs to Jina v5 omni configs

    JinaV5OmniWrapper.encode reads self.fps / max_frames / target_sampling_rate / max_samples to build the VideoCollator / AudioCollator, but neither jina_embeddings_v5_omni_small nor jina_embeddings_v5_omni_nano was setting them in loader_kwargs. Without these, videos passed through unbounded and audio used a fallback 16 kHz rate with no truncation.

    Set the same defaults the other video models in MTEB use:
    - fps=2.0 (matches Jina processor_config.json "fps": 2)
    - max_frames=64 (MTEB convention; 64 frames @ 2 fps = 32 s, aligned with the audio window; Jina's processor ceiling is 768)
    - target_sampling_rate=16000 (Whisper feature extractor in audio_config)
    - max_samples=30 * 16000 (Whisper's native 30-second window; matches audio_config.max_source_positions = 1500)
  - refactor: move Jina v5 omni defaults into wrapper init

    The prior commit added fps/max_frames/target_sampling_rate/max_samples in each ModelMeta's loader_kwargs. The repo convention (PE-AV, Qwen Omni, Omni-Embed-Nemotron) is to put these defaults in the wrapper class's init instead, so per-model configs stay uncluttered and any future Jina v5 omni variant inherits the same defaults. Same numeric values, different place.
  - refactor: use explicit positional args in JinaV5OmniWrapper.init

    Mirror the OmniEmbedNemotronWrapper pattern (the closest analog, which also inherits from SentenceTransformerMultimodalEncoderWrapper): explicit model/revision/device positional args instead of *args. Same behavior; it just makes the signature self-documenting and consistent across the two omni wrappers that share this parent.
  - review: move video/audio defaults from wrapper init to loader_kwargs

    Per review feedback (@Samoed): these values describe the model (Jina's processor_config.json fps=2; the Whisper backbone at 16 kHz with a native 30 s window; max_frames=64 as the MTEB eval convention), so they read more naturally on each ModelMeta than as wrapper defaults. Drop the JinaV5OmniWrapper.init override and put the kwargs on both omni configs explicitly. (aee4ed7)
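The final state of the PR above can be sketched as the kwargs now set on both omni ModelMetas. The numeric values come from the entry itself; the dict layout and variable names are illustrative:

```python
TARGET_SAMPLING_RATE = 16_000  # Whisper feature extractor rate (audio_config)

# Sketch of the loader_kwargs placed on both Jina v5 omni model configs:
loader_kwargs = dict(
    fps=2.0,                                 # Jina processor_config.json "fps": 2
    max_frames=64,                           # 64 frames @ 2 fps = 32 s of video
    target_sampling_rate=TARGET_SAMPLING_RATE,
    max_samples=30 * TARGET_SAMPLING_RATE,   # Whisper's native 30 s window
)

# The video cap and the audio cap cover the same wall-clock span:
video_seconds = loader_kwargs["max_frames"] / loader_kwargs["fps"]   # 32.0
audio_seconds = loader_kwargs["max_samples"] / TARGET_SAMPLING_RATE  # 30.0
print(video_seconds, audio_seconds)
```

Keeping the two windows roughly aligned (32 s of frames vs. 30 s of audio) is the rationale the entry gives for max_frames=64 at fps=2.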
- Update codefuse_models.py (#4643) (a34d49d)