2.13.5 (2026-05-16)

Fix

  • fix: match UME-R1's official video config (#4678)

  • fix: match UME-R1's official video config (fps=1, max_pixels=360*420)

UME-R1's HF model card shows a local-file video example with: {"type": "video", "video": "...", "max_pixels": 360 * 420, "fps": 1.0}

Bring the wrapper in line:

  • fps: 2.0 -> 1.0. The MTEB convention is 2.0 across most video wrappers, but UME-R1's authors prescribe 1.0 in their inference example, and fps is just a sampling rate (doesn't change model behavior structurally) so matching the model's config is fine here.
  • max_pixels: not set previously -> set to 360*420. The preprocessor config default is 2.3M (15x larger); without the override video frames are processed at much higher resolution than the authors used.

max_frames=64 cap stays — not in the official docs but needed for batch eval memory safety; matches every other MTEB video wrapper.

  • review: keep fps=2.0 (MTEB convention)

fps is a sampling rate, not a model config: Qwen2-VL handles variable frame counts natively. fps=2.0 gives finer temporal resolution for the short clips MTEB video tasks use (MSR-VTT 15s, AudioCaps 10s, DiDeMo 25s) while staying well within UME-R1's total_pixels budget: 64 frames * 360*420 = 9.7M pixels < 20480 * 28*28 = 16M pixels.

Aligns with every other MTEB video wrapper (PE-AV, Jina, Qwen Omni, e5, OmniEmbedNemotron, VLM2Vec-V2). The fps=1.0 in UME-R1's HF demo is just one example choice, not a structural model requirement.
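The budget argument above can be checked with a few lines of arithmetic. This is a minimal sketch, not MTEB's sampling code: the helper `sampled_frames` is hypothetical and assumes frames are taken at a fixed rate and simply capped at max_frames, which ignores any rounding the real video loader may apply.

```python
# Illustrative arithmetic only; names below are not taken from the MTEB wrapper.
FPS = 2.0                        # MTEB convention kept by this review
MAX_FRAMES = 64                  # frame cap used by the wrapper
FRAME_PIXELS = 360 * 420         # per-frame max_pixels from the local-file example
TOTAL_PIXELS = 20480 * 28 * 28   # total_pixels budget quoted in the commit message


def sampled_frames(duration_s: float, fps: float = FPS, max_frames: int = MAX_FRAMES) -> int:
    """Assumed sampling rule: duration * fps frames, capped at max_frames."""
    return min(int(duration_s * fps), max_frames)


# Clip lengths as listed in the commit message (seconds).
clips = {"MSR-VTT": 15, "AudioCaps": 10, "DiDeMo": 25}
for name, seconds in clips.items():
    n = sampled_frames(seconds)
    print(f"{name}: {n} frames, {n * FRAME_PIXELS:,} pixels")

# Worst case: every one of the 64 frames uses the full per-frame pixel limit.
worst_case = MAX_FRAMES * FRAME_PIXELS
print(f"worst case: {worst_case:,} of {TOTAL_PIXELS:,} pixels")
```

Under these assumptions none of the listed clips reaches the 64-frame cap at fps=2.0, and even the capped worst case stays below the total budget.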

  • review: use UME-R1's URL-example pixel budget (4/256 * 28*28)

UME-R1's README shows three video configs. The URL example is the most deliberate one (full min/max/total triple, all patch-aligned to 28*28, which matches Qwen2-VL's merged-token size). Use that instead of the local-file example's ad-hoc 360*420.

  • min_pixels = 4 * 28 * 28 (>= 4 tokens per frame)
  • max_pixels = 256 * 28 * 28 (<= 256 tokens per frame)

With max_frames=64 this gives at most 64 * 256 = 16,384 tokens across all frames, comfortably under the URL example's total_pixels budget of 20,480 * 28*28 pixels (20,480 tokens).
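The token counts follow directly from the 28*28 patch size. As a rough sketch, with constant and function names chosen here for illustration rather than taken from the wrapper:

```python
# Pixels covered by one merged visual token, per the commit message.
PATCH = 28 * 28

MIN_PIXELS = 4 * PATCH        # at least 4 tokens per frame
MAX_PIXELS = 256 * PATCH      # at most 256 tokens per frame
TOTAL_PIXELS = 20480 * PATCH  # total budget from the URL example
MAX_FRAMES = 64


def tokens(pixels: int) -> int:
    """Number of merged tokens needed to cover `pixels` (patch-aligned input)."""
    return pixels // PATCH


max_video_tokens = MAX_FRAMES * tokens(MAX_PIXELS)

print(MIN_PIXELS, MAX_PIXELS)                    # 3136 200704
print(max_video_tokens, tokens(TOTAL_PIXELS))    # 16384 20480
```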

  • review: pass min/max_pixels to AutoProcessor.from_pretrained (@Samoed)

Cleaner than mutating self.processor.image_processor attributes after load. Matches the pattern in gme_v_models.py:

    processor = AutoProcessor.from_pretrained(
        model_name, min_pixels=..., max_pixels=..., ...
    )

Same end behavior, no hasattr branch. (2b565bf)
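A minimal sketch of that loading pattern with the pixel bounds from this release filled in. The wrapper function `load_processor` is hypothetical, and the further arguments elided in the commit message are left out here; only the min_pixels and max_pixels keywords come from the text above.

```python
# Pixel bounds from the URL example: 4 to 256 merged tokens per frame.
MIN_PIXELS = 4 * 28 * 28
MAX_PIXELS = 256 * 28 * 28


def load_processor(model_name: str):
    """Load a processor with pixel bounds set at construction time.

    Hypothetical helper: the real wrapper may pass additional arguments.
    """
    # Imported lazily so the module can be imported without `transformers`.
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(
        model_name,
        min_pixels=MIN_PIXELS,
        max_pixels=MAX_PIXELS,
    )
```

Calling `load_processor` with the checkpoint name downloads the processor config, so it needs network access or a local cache. Passing the bounds at load time means there is no need to check afterwards whether the image processor exposes those attributes.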

Source: README.md, updated 2026-05-16