v3.4.0 - Resolved memory leak when deleting a model & trainer; add Matryoshka & Cached loss compatibility; small features & bug fixes

This release resolves a memory leak when deleting a model & trainer, adds compatibility between the Cached... losses and the Matryoshka loss modifier, resolves numerous bugs, and adds several small features.

Install this version with

:::bash
# Training + Inference
pip install sentence-transformers[train]==3.4.0

# Inference only, use one of:
pip install sentence-transformers==3.4.0
pip install sentence-transformers[onnx-gpu]==3.4.0
pip install sentence-transformers[onnx]==3.4.0
pip install sentence-transformers[openvino]==3.4.0

Matryoshka & Cached loss compatibility (#3068, #3107)

It is now possible to combine the strong Cached losses (CachedMultipleNegativesRankingLoss, CachedGISTEmbedLoss, CachedMultipleNegativesSymmetricRankingLoss) with the Matryoshka loss modifier:

:::python
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses
from datasets import Dataset

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = Dataset.from_dict({
    "anchor": ["It's nice weather outside today.", "He drove to work."],
    "positive": ["It's so sunny.", "He took the car to the office."],
})
# Use a Cached loss with a small mini-batch size to keep memory usage low
loss = losses.CachedMultipleNegativesRankingLoss(model, mini_batch_size=16)
# Wrap it with the Matryoshka modifier to also train smaller embedding dimensionalities
loss = losses.MatryoshkaLoss(model, loss, [768, 512, 256, 128, 64])

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()

See, for example, tomaarsen/mpnet-base-gooaq-cmnrl-mrl, which was trained with CachedMultipleNegativesRankingLoss (CMNRL) combined with the Matryoshka loss modifier (MRL).
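
At inference time, a model trained this way can produce truncated embeddings directly. A minimal sketch, assuming the model above and the truncate_dim argument of SentenceTransformer:

:::python
from sentence_transformers import SentenceTransformer

# Load the Matryoshka-trained model and truncate all embeddings to 256 dimensions
model = SentenceTransformer("tomaarsen/mpnet-base-gooaq-cmnrl-mrl", truncate_dim=256)
embeddings = model.encode(["It's nice weather outside today.", "It's so sunny."])
print(embeddings.shape)  # (2, 256)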

Resolve memory leak when Model and Trainer are reinitialized (#3144)

Due to a circular dependency chain, SentenceTransformerTrainer -> SentenceTransformer -> SentenceTransformerModelCardData -> SentenceTransformerTrainer, deleting the trainer and model did not actually free them via garbage collection. I've moved a lot of components around, and SentenceTransformerModelCardData no longer needs to store the SentenceTransformerTrainer, breaking the cycle.

We ran the seed optimization script (which frequently creates and deletes models and trainers):

  • Before: approximate highest recorded VRAM: 16332MiB / 24576MiB
  • After: approximate highest recorded VRAM: 8222MiB / 24576MiB
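
The pattern that previously leaked looks roughly like the following sketch, not the actual seed optimization script; the one-row dataset and the loss are stand-ins. The point is that del plus garbage collection now actually releases the VRAM:

:::python
import gc

import torch
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

train_dataset = Dataset.from_dict({
    "anchor": ["It's nice weather outside today."],
    "positive": ["It's so sunny."],
})

for trial in range(4):
    # Each trial builds a fresh model & trainer, like the seed optimization script does
    model = SentenceTransformer("microsoft/mpnet-base")
    loss = losses.MultipleNegativesRankingLoss(model)
    trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
    trainer.train()
    del trainer, loss, model
    gc.collect()  # as of v3.4.0, no reference cycle keeps the model alive
    torch.cuda.empty_cache()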

Small Features

  • Add Matthews Correlation Coefficient to the BinaryClassificationEvaluator in #3051 (see the sketch after this list).
  • Add a triplet margin parameter to the TripletEvaluator in #2862.
  • Put dataset information in the automatically generated model card in "expanding sections" blocks if there are many datasets in #3088.
  • Add multi-GPU (and CPU multi-process) support for mine_hard_negatives in #2967.
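
A minimal sketch of the MCC addition, assuming a small labeled pair set; the evaluator name and the exact metric key below are illustrative:

:::python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")
dev_evaluator = BinaryClassificationEvaluator(
    sentences1=["It's nice weather outside today.", "He drove to work."],
    sentences2=["It's so sunny.", "She walked to the store."],
    labels=[1, 0],  # 1: similar pair, 0: dissimilar pair
    name="pair_dev",  # hypothetical evaluator name
)
results = dev_evaluator(model)
# As of this release, the results also include Matthews Correlation Coefficient
# entries, e.g. a key like "pair_dev_cosine_mcc" (exact key name assumed)
print(results)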

Notable Bug Fixes

  • Subsequent batches were identical when using the no_duplicates Batch Sampler (#3069). This has been resolved in #3073.
  • The old-style model.fit() training with write_csv on an evaluator would crash (#3062). This has been resolved in #3066.
  • The output types of some evaluators were np.float instead of float (#3075). This has been resolved in #3076 and #3096.
  • It was not possible to specify a revision or cache_dir when loading a PEFT Adapter model (#3061). This has been resolved in #3079 and #3174 (see the sketch after this list).
  • The CrossEncoder was lazily placed on the incorrect device and did not respond to model.to() (#3078). This has been resolved in #3104.
  • If a model used a custom module with custom kwargs, those kwargs keys were not saved in modules.json correctly, e.g. relevant for jina-embeddings-v3 (#3111). This has been resolved in #3112.
  • HfArgumentParser(SentenceTransformerTrainingArguments) would crash due to the typing of the prompts argument (#3090). This has been resolved in #3178.
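
A minimal sketch of the PEFT adapter fix, using a hypothetical adapter repository; the repo id, revision, and cache path are placeholders, and note that SentenceTransformer exposes the cache location as cache_folder:

:::python
from sentence_transformers import SentenceTransformer

# Hypothetical PEFT adapter repository; revision and cache location now
# propagate correctly when the model being loaded is an adapter
model = SentenceTransformer(
    "my-username/my-peft-adapter",  # placeholder adapter repo id
    revision="main",
    cache_folder="/tmp/st_cache",
)
embeddings = model.encode(["It's nice weather outside today."])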

Example Updates

  • Update the quantization script in #3070.
  • Update the seed optimization script in #3092.
  • Update the TSDAE scripts in #3137.
  • Add PEFT Adapter script in #3180.

Documentation Updates

All Changes

New Contributors

An explicit thanks to @JINO-ROHIT, who has made a large number of contributions to this release.

Full Changelog: https://github.com/UKPLab/sentence-transformers/compare/v3.3.1...v3.4.0
